Xen Paravirtualization on x86 Architecture
The following figure describes the architecture of Xen and its mapping onto a classic x86 privilege
model.
A Xen-based system is managed by the Xen hypervisor, which runs in the most privileged
mode and controls the access of guest operating systems to the underlying hardware. Guest operating systems are executed within domains, which represent virtual machine instances. In addition, specific control software, which has privileged access to the host and controls all the
other guest operating systems, is executed in a special domain called Domain 0.
Many x86 implementations support four different security levels, called rings, where
Ring 0 represents the level with the highest privileges and Ring 3 the level with the lowest.
Almost all popular operating systems, except OS/2, use only two levels: Ring 0 for the
kernel code and Ring 3 for user applications and nonprivileged OS code. This gives Xen the opportunity
to implement virtualization by executing the hypervisor in Ring 0 and running Domain 0 and all
the other domains hosting guest operating systems (generally referred to as Domain U) in Ring 1,
while user applications still run in Ring 3. Because applications stay in Ring 3, Xen keeps the application binary interface (ABI) unchanged,
which makes the switch to Xen-virtualized solutions easy from an application point of view.
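The ring assignment described above can be written down as a small lookup table. This is only an illustrative sketch: the dictionary keys and the helper `is_more_privileged` are names made up for this example, and the ring numbers are the ones given in the text.

```python
# Lower ring number = higher privilege (Ring 0 is the most privileged).
# Component names are hypothetical labels chosen for this sketch.
NATIVE_LAYOUT = {
    "os_kernel": 0,
    "user_applications": 3,
}

XEN_PV_LAYOUT = {
    "xen_hypervisor": 0,
    "domain_0_kernel": 1,
    "domain_u_kernel": 1,
    "user_applications": 3,
}


def is_more_privileged(layout, component, other):
    """Return True if `component` runs in a more privileged ring than `other`."""
    return layout[component] < layout[other]


# Applications stay in Ring 3 in both layouts, which is why the ABI is unchanged.
apps_unchanged = (
    NATIVE_LAYOUT["user_applications"] == XEN_PV_LAYOUT["user_applications"]
)

# The hypervisor sits above every guest kernel.
hypervisor_above_guests = is_more_privileged(
    XEN_PV_LAYOUT, "xen_hypervisor", "domain_u_kernel"
)
```

The only component that moves is the guest kernel, from Ring 0 to Ring 1; everything an application can observe stays the same.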
Because of the structure of the x86 instruction set, some instructions allow code executing in Ring 3 to jump
into Ring 0 (kernel mode). Such a transition is performed at the hardware level, so within
a virtualized environment it results in a trap or a silent fault. This prevents the guest operating system,
which now runs in Ring 1 instead of Ring 0, from operating normally. The condition is generally triggered
by a subset of the system calls.
To avoid this situation, the implementation of the guest operating system needs to be changed,
and the sensitive system calls need to be reimplemented with hypercalls,
which are specific calls exposed by the virtual machine interface of Xen. With the use of hypercalls,
the Xen hypervisor is able to catch the execution of all the sensitive instructions, manage
them, and return control to the guest operating system by means of a supplied handler.
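The hypercall mechanism can be illustrated with a toy model. The sketch below is not Xen's real interface: `ToyHypervisor`, `PrivilegeFault`, and the hypercall name `update_page_table` are all hypothetical, and the hardware privilege check is simulated with a plain `if`. It only shows the idea that a guest kernel in Ring 1 cannot perform the sensitive operation directly and must ask the hypervisor to do it.

```python
class PrivilegeFault(Exception):
    """Raised when code outside Ring 0 attempts a privileged operation."""


class ToyHypervisor:
    """Toy stand-in for a hypervisor running in Ring 0 (not Xen's real API)."""

    def __init__(self):
        self.page_tables = {}  # domain name -> list of installed mappings
        # Table of hypercalls exposed to guests (hypothetical names).
        self._hypercalls = {"update_page_table": self._update_page_table}

    def privileged_update(self, ring, domain, mapping):
        # Simulated hardware check: only Ring 0 may touch page tables.
        if ring != 0:
            raise PrivilegeFault(f"ring {ring} cannot update page tables")
        self.page_tables.setdefault(domain, []).append(mapping)

    def hypercall(self, domain, name, *args):
        # Entry point used by modified (paravirtualized) guest kernels.
        handler = self._hypercalls[name]
        return handler(domain, *args)

    def _update_page_table(self, domain, mapping):
        # The hypervisor performs the operation on the guest's behalf,
        # running in Ring 0, and then returns control to the guest.
        self.privileged_update(0, domain, mapping)
        return "ok"


xen = ToyHypervisor()
mapping = ("0x1000", "0x9000")

# An unmodified guest kernel in Ring 1 tries the operation directly and faults.
direct_attempt_failed = False
try:
    xen.privileged_update(ring=1, domain="domU-1", mapping=mapping)
except PrivilegeFault:
    direct_attempt_failed = True

# A paravirtualized guest issues a hypercall instead and succeeds.
result = xen.hypercall("domU-1", "update_page_table", mapping)
```

In this model, the change required in the guest kernel is exactly the replacement of the direct call with the `hypercall` call, which is why the guest's source code must be available.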
Paravirtualization needs the operating system codebase to be modified, and hence not all operating
systems can be used as guests in a Xen-based environment. More precisely, this condition holds
in a scenario where it is not possible to leverage hardware-assisted virtualization, which allows running
the hypervisor in Ring -1 and the guest operating system in Ring 0.
Therefore, Xen exhibits some limitations in the case of legacy hardware and legacy operating systems.
Legacy operating systems cannot be modified to run safely in Ring 1 because their codebase is not accessible and,
at the same time, legacy hardware does not provide any support for running the hypervisor in a more privileged
mode than Ring 0.
Open-source operating systems such as Linux can be easily modified, since their
code is publicly available, and Xen provides full support for their virtualization. Operating systems
of the Windows family, by contrast, are generally not supported by Xen unless hardware-assisted virtualization
is available.
This problem is becoming less and less crucial, since
new releases of operating systems are designed to be virtualization aware and new
hardware supports x86 virtualization.
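The guest-support rule described in this section can be condensed into a few lines. This is a simplified sketch of the reasoning in the text, not a description of how Xen itself decides anything: the function name, its parameters, and the returned labels are invented for illustration, and preferring paravirtualization when both options are available is an arbitrary choice of the sketch.

```python
def xen_guest_mode(os_is_modified_for_xen, hardware_assist_available):
    """Return how an OS could run as a Xen guest, or None if it cannot.

    Hypothetical helper: both arguments are booleans describing the guest
    OS and the host hardware.
    """
    if os_is_modified_for_xen:
        # Guest kernel runs in Ring 1 and uses hypercalls.
        return "paravirtualized"
    if hardware_assist_available:
        # Unmodified guest kernel runs in Ring 0, hypervisor in Ring -1.
        return "hardware-assisted"
    # Closed-source OS on legacy hardware: neither option applies.
    return None


linux_on_legacy_hw = xen_guest_mode(True, False)
windows_on_legacy_hw = xen_guest_mode(False, False)
windows_on_new_hw = xen_guest_mode(False, True)
```

The three calls correspond to the cases discussed above: a modified Linux guest, a Windows guest on legacy hardware, and a Windows guest on hardware with virtualization support.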
VMware Full Virtualization Reference Model
VMware’s technology is based on the concept of full virtualization, where the underlying hardware
is replicated and made available to the guest operating system, which runs unaware of such abstraction
layers and does not need to be modified. VMware implements full virtualization either in the desktop environment, by means of Type II hypervisors, or in the server environment, by means of Type I hypervisors. In both cases, full virtualization is made possible by means of direct execution
(for nonsensitive instructions) and binary translation (for sensitive instructions), thus allowing the
virtualization of architectures such as x86.
The x86 architecture design does not satisfy the first theorem of virtualization, since the set of sensitive instructions is not a subset of the privileged instructions. As a result, such instructions behave differently when they are not executed in Ring 0, which is the normal case in a virtualization scenario where the guest OS runs in Ring 1. Generally, a trap is generated, and the way it is managed differentiates the solutions that implement virtualization on x86 hardware. In the case of dynamic binary translation, the trap triggers the translation of the offending instructions into an equivalent set of instructions that achieves the same goal without generating exceptions. Moreover, to improve performance, the equivalent set of instructions is cached so that translation is no longer necessary for further occurrences of the same instructions. The above figure gives an idea of the process.
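The combination of direct execution, translation, and caching can be illustrated with a toy model. The sketch below is not how VMware's translator is implemented: instructions are plain strings, the `SENSITIVE` set is a small illustrative subset (POPF and SGDT are commonly cited examples of sensitive but unprivileged x86 instructions), and `translate` and `ToyBinaryTranslator` are hypothetical names.

```python
# Illustrative subset of sensitive instructions; instructions are just strings.
SENSITIVE = {"popf", "sgdt"}


def translate(instr):
    """Return a safe replacement for a sensitive instruction (toy version)."""
    return f"emulate_{instr}"


class ToyBinaryTranslator:
    def __init__(self):
        self.cache = {}        # sensitive instruction -> translated form
        self.translations = 0  # how many times translate() actually ran

    def run(self, instructions):
        executed = []
        for instr in instructions:
            if instr not in SENSITIVE:
                # Direct execution: nonsensitive instructions run unchanged.
                executed.append(instr)
                continue
            if instr not in self.cache:
                # First occurrence: translate once and remember the result.
                self.cache[instr] = translate(instr)
                self.translations += 1
            # Later occurrences reuse the cached translation.
            executed.append(self.cache[instr])
        return executed


bt = ToyBinaryTranslator()
out = bt.run(["add", "popf", "mov", "popf", "sgdt"])
```

Here `popf` appears twice but is translated only once; the second occurrence is served from the cache, which is the performance benefit the paragraph above describes.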


