Mirror of https://github.com/mmueller41/genode.git (synced 2026-01-21 12:32:56 +01:00)
Documentation changes on account of the book
This patch removes the outdated doc/architecture.txt since the topics are covered by the book. We keep repos/os/doc/init.txt because it contains a few details not present in the book (yet). The patch streamlines the terminology a bit. Furthermore, it slightly adjusts a few source-code comments to improve the book's functional specification chapter.
Committed by Christian Helmuth
Parent: 3e6308e83b
Commit: 97a41394b4
@@ -1,789 +0,0 @@

====================================
Design of the Genode OS Architecture
====================================


Norman Feske and Christian Helmuth

Abstract
########

In the software world, high complexity of a problem solution comes along with a
high risk of bugs and vulnerabilities.
This correlation is particularly troubling for today's commodity operating
systems with their tremendous complexity.
The numerous approaches to increase the user's confidence in the correct
functioning of software comprise exhaustive tests, code auditing, static code
analysis, and formal verification.
Such quality-assurance measures are either rather shallow or scale badly
with increasing complexity.

The operating-system design presented in this document addresses the root of
the problem by providing means to minimize the underlying system complexity for
each security-sensitive application individually.
On the other hand, we want to enable multiple applications to execute on the
system at the same time, while each application may have different functional
requirements of the operating system.
Today's operating systems provide a functional superset of the requirements of
all applications and thus violate the principle of minimalism for each single
application.
We resolve the conflict between the principle of minimalism and the versatility
of the operating system by decomposing the operating system into small
components and by providing a way to execute those components isolated and
independent from each other.
Components can be device drivers, protocol stacks such as file systems and
network stacks, native applications, and containers for executing legacy
software.
Each application depends only on the functionality of a bounded set of
components that we call _application-specific_trusted_computing_base_(TCB)_.
If the TCBs of two applications are executed completely _isolated_ and
_independent_ from each other, we consider both TCBs as minimal.

In practice, however, we want to share physical resources between multiple
applications without sacrificing their independence.
Therefore, the operating-system design has to enable the assignment of physical
resources to each application and its TCB to maintain independence from other
applications.
Furthermore, rather than living in complete isolation, components need to
communicate with each other to cooperate.
The operating-system design must enable components to create other components
and get them to know each other while maintaining isolation from uninvolved
parts of the system.

First, we narrow our goals and pose our major challenges in Section [Goals and Challenges].
Section [Interfaces and Mechanisms] introduces our fundamental concepts and
protocols that apply to each component in the system.
In Section [Core - the root of the process tree], we present the one component
that is a mandatory part of each TCB, enables the bootstrapping of the system,
and provides abstractions for the lowest-level resources.
We exercise the composition of the presented mechanisms by means of process
creation in Section [Process creation].
;Section [Framework infrastructure]


Goals and Challenges
####################

The Genode architecture is designed to securely accommodate the following types
of components concurrently on one machine:

:Device drivers:

Device drivers translate the facilities of raw physical devices to
device-class-specific interfaces to be used by other components.
They contain no security policies and provide their services
to only one client component per device.

:Services that multiplex resources:

To make one physical resource (e.g., a device) usable by multiple
components at the same time, the physical resource must be translated
to multiple virtual resources. For example, a
frame buffer provided by a device driver can only be used by one
client at a time. A window system multiplexes this physical
resource to make it available to multiple clients. Other examples
are an audio mixer or a virtual network hub.
In contrast to a device driver, a _resource multiplexer_ deals with multiple
clients and therefore plays a crucial role in maintaining the independence
and isolation of its clients from each other.

:Protocol stacks:

Protocol stacks translate low-level protocols to a higher and more readily
usable level of abstraction.
For example, a file system translates a block-device protocol to a file
abstraction, a TCP/IP stack translates network packets to a socket
abstraction, and a widget set maps high-level GUI elements to pixels.
Compared to resource multiplexers, protocol stacks are typically an
order of magnitude more complex.
Protocol stacks may also act as resource multiplexers. In this case, however,
their high complexity puts the independence and isolation of multiple
clients at a high risk.
Therefore, our design should enable the instantiation of protocol stacks per
application.
For example, instead of letting a security-sensitive application share one
TCP/IP stack with multiple other (untrusted) applications, it could use a
dedicated instance of a TCP/IP stack to increase its independence and
isolation from the other applications.

:Containers for executing legacy software:

A _legacy container_ provides an environment for the execution of existing
legacy software. This can be achieved by means of a virtual machine
(e.g., a Java VM, a virtual PC), a compatible programming API (e.g., POSIX,
Qt), a language environment (e.g., LISP), or a script interpreter.
In the majority of cases, we regard legacy software as an untrusted black box.
One particular example of legacy software is an untrusted legacy device driver.
In this case, the container has to protect the physical hardware from
potentially malicious device accesses by the untrusted driver.
Legacy software may be extremely complex and resource-demanding, for example,
the Firefox web browser executed on top of the X window system and the Linux
kernel inside a virtualized PC.
In this case, the legacy container may locally implement sophisticated
resource-management techniques such as virtual memory.

:Small custom security-sensitive applications:

Alongside legacy software, small custom applications implement crucial
security-sensitive functionality.
In contrast to legacy software, which we mostly regard as untrusted anyway,
a low TCB complexity for custom applications is of extreme importance.
Given the special liability of such an application, it is very carefully
designed to have low complexity and require as little infrastructure as
possible.
A typical example is a cryptographic component that protects credentials
of the user.
Such an application does not require swapping (virtual memory), a POSIX API,
or a complete C library.
Instead, the main objectives of such an application are to keep as much code
as possible out of its TCB and to keep its requirements at a minimum.

Our design must be able to create and destroy subsystems that are composed of
multiple such components.
The _isolation_ requirement as stated in the introduction raises the
question of how to organize the locality of name spaces and how to distribute
access from components to other components within the system.
The _independence_ requirement demands the assignment of physical resources
to components such that different applications do not interfere.
Instead of managing access control and physical resources from a central
place, we desire a distributed way of applying policy for trading and revoking
resources and for delegating rights.



Interfaces and Mechanisms
#########################

The system is structured as a tree.
The nodes of the tree are processes.
A node for which sub-nodes exist is called the _parent_ of these sub-nodes
(_children_).
The parent creates children out of its own resources and defines
their execution environment.
Each process can announce services to its parent.
The parent, in turn, can mediate such a service to its other children.
When a child is created, its parent provides the initial contact to the
outer world via the following interface:

! void exit(int exit_value);
!
! Session_capability session(String service_name,
!                            String args);
!
! void close(Session_capability session_cap);
!
! int announce(String service_name,
!              Root_capability service_root_cap);
!
! int transfer_quota(Session_capability to_session_cap,
!                    String amount);


:'exit': is called by a child to request its own termination.

:'session': is called by a child to request a connection to the specified
service as known by its parent, where 'service_name' is the name
of the desired service _interface_.
The way of resolving or even denying a 'session' request depends on
the policy of the parent.
The 'args' parameter contains construction arguments for the session
to be created.
In particular, 'args' contains a specification of resources that the
process is willing to donate to the server during the session lifetime.

:'close': is called by a child to inform its parent that the specified
session is no longer needed.
The parent should close the session and hand back donated
resources to the child.

:'announce': is called by a child to register a locally implemented
service at its parent. Hence, this child is a server.

:'transfer_quota': enables a child to extend its resource donation
to the server that provides the specified session.

We provide a detailed description and motivation for the different functions
in Sections [Servers] and [Quota].

Servers
=======

Each process may implement services and announce them via the 'announce'
function of the parent interface.
When announcing a service, the server specifies a _root_ capability for
the implemented service.
The interface of the root capability enables the parent to create, configure,
and close sessions of the service:

! Session_capability session(String args);
!
! int transfer_quota(Session_capability to_session_cap,
!                    String amount);
!
! void close(Session_capability session_cap);


[image announce 60%]
Announcement of a service by a child (server).
Colored circles at the edge of a component represent remotely accessible
objects. Small circles inside a component represent a reference (capability)
to a remote object. A cross-component reference to a remote object is
illustrated by a dashed arrow. An opaque arrow symbolizes an RPC call/return.

Figure [announce] illustrates an announcement of a service.
Initially, each child has a capability to its parent.
After Child1 announces its service "Service", its parent knows the
root capability of this service under the local name 'srv1_r' and stores
the root capability with the announced service name in its _root_list_.
The root capability is intended to be used and kept by the parent only.

[image request 60%]
Service request by a client.

When a parent calls the 'session' function of the root interface of a server
child, the server creates a new client session and returns the corresponding
'client_session' capability.
This session capability provides the actual service-specific interface.
The parent can use it directly or pass it to other processes, in
particular to another child that requested the session.
In Figure [request], Child2 initiates the creation of a "Service" session
by a 'session' call at its parent capability (1).
The parent uses its root list to look up the root capability that matches the
service name "Service" (2) and calls the 'session' function at the
server (3).
Child1, being the server, creates a new session ('session1') and returns the
session capability as the result of the 'session' call (4).
The parent now knows the new session under the local name 'srv1_s1' (5) and
passes the session capability as the return value of Child2's initial 'session'
call (6).
The parent maintains a _session_list_, which stores the interrelation between
children and their created sessions.
Now, Child2 has a direct communication channel to 'session1' provided by
the server (Child1) (7).

The 'close' function of the root interface instructs the server to
destroy the specified session and to release all session-specific resources.

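To make the announce-request-close protocol concrete, the following C++ fragment models the parent's _root_list_ and _session_list_ inside a single process. It is a hypothetical sketch rather than Genode code. Capabilities are plain integers, with 0 denoting a denied request. The requesting child is passed by name instead of being implied by the parent capability it invokes. The names 'Root', 'Toy_server', 'Toy_parent', and 'new_cap' are made up for illustration, and quota donations are omitted.

```cpp
#include <cassert>
#include <map>
#include <string>

using Session_cap = int;   // hypothetical stand-in; 0 denotes a denied request

// Hands out unique capability values (stands in for the kernel).
inline Session_cap new_cap()
{
    static Session_cap next = 1;
    return next++;
}

// Root interface as announced by a server (simplified: no quota argument).
struct Root
{
    virtual Session_cap session(std::string const &args) = 0;
    virtual void close(Session_cap cap) = 0;
    virtual ~Root() = default;
};

// A server child that creates sessions on request and tracks the open ones.
struct Toy_server : Root
{
    std::map<Session_cap, std::string> sessions;   // open session -> its args

    Session_cap session(std::string const &args) override
    {
        Session_cap cap = new_cap();
        sessions[cap] = args;
        return cap;
    }

    void close(Session_cap cap) override { sessions.erase(cap); }
};

// The parent keeps a root list (service name -> root) and a session list
// (session -> requesting child and the root that created the session).
struct Toy_parent
{
    std::map<std::string, Root *> root_list;

    struct Entry { std::string child; Root *root; };
    std::map<Session_cap, Entry> session_list;

    void announce(std::string const &service, Root &root)
    {
        root_list[service] = &root;
    }

    Session_cap session(std::string const &child, std::string const &service,
                        std::string const &args)
    {
        auto it = root_list.find(service);
        if (it == root_list.end())
            return 0;                            // policy: deny unknown services
        Session_cap cap = it->second->session(args);
        session_list[cap] = Entry { child, it->second };
        return cap;
    }

    void close(std::string const &child, Session_cap cap)
    {
        auto it = session_list.find(cap);
        if (it == session_list.end() || it->second.child != child)
            return;                              // only the owner may close
        it->second.root->close(cap);
        session_list.erase(it);
    }
};
```

A typical sequence mirrors Figure [request]. Child1 announces "Service", and Child2 requests a session through the parent. A later 'close' by the owning child is relayed to the server's root, whereas a 'close' issued by an uninvolved child is ignored.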
; Via 'set_quota', the parent can instruct a service to limit the resource
; usage of a given 'client_session'. The resource accounting is described in
; more detail in Section [Quota].

[image twolevels 80%]
Announcement and request of a service in a subsystem.
For simplicity, parent capabilities are not displayed.

Even though the prior examples involved only one parent,
the announce-request mechanism can be used recursively for tree
structures of any depth and thus allows for partitioning
the system into subsystems that can cooperate with each other, while
parents always remain in complete control over the communication
and resource usage of their children (and their subsystems).

Figure [twolevels] depicts a nested subsystem on the left.
Child1 announces its service named "Service" at its parent (Parent1) that, in
turn, announces a service named "Service" at the Grandparent.
The service names do not need to be identical.
Their meaning extends to the immediate parent only, and there
may be a name remapping at each hierarchy level.
Each parent can decide by itself whether or not to further announce
services of its children to the outer world.
The parent can announce Child1's service to the Grandparent
by creating a new root capability to a local service that forwards
session-creation and closing requests to Child1.
Both Parent1 and the Grandparent keep their local root lists.
In a second step, Parent2 initiates the creation of a session to
the service by issuing a 'session' request at the Grandparent (1).
The Grandparent uses its root list to look up the service-providing child,
which, from the Grandparent's local view, is Parent1 (2).
Parent1, in turn, does not implement the service by itself but delegates
the 'session' request to Child1 by calling the 'session' function
of the actual "Service" root interface (3).
The session capability, created by Child1 (4), can now be passed to Parent2
as the return value of the nested 'session' calls (5, 6).
Each involved node keeps the local knowledge about the created session
such that, later, the session can be closed in the same nested fashion.

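The forwarding step performed by Parent1 can be sketched with a hypothetical C++ model. Parent1 announces a root of its own that relays 'session' and 'close' requests to the root announced by Child1. 'Child1_root', 'Forwarding_root', and 'Root_list' are illustrative names, and capabilities are plain integers. To show that names are local to each level, Parent1 announces the service to the Grandparent as "Child-service" although Child1 announced it as "Service". "Child-service" is an arbitrary choice.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

using Session_cap = int;

struct Root
{
    virtual Session_cap session(std::string const &args) = 0;
    virtual void close(Session_cap cap) = 0;
    virtual ~Root() = default;
};

// Child1: actually implements the service and counts its open sessions.
struct Child1_root : Root
{
    Session_cap next = 1;
    int open = 0;

    Session_cap session(std::string const &) override { ++open; return next++; }
    void close(Session_cap) override { --open; }
};

// Parent1's own root: relays requests to the root announced by Child1 and
// remembers the forwarded sessions so that 'close' can be relayed as well.
struct Forwarding_root : Root
{
    Root &child_root;
    std::set<Session_cap> forwarded;

    explicit Forwarding_root(Root &r) : child_root(r) { }

    Session_cap session(std::string const &args) override
    {
        Session_cap cap = child_root.session(args);
        forwarded.insert(cap);
        return cap;
    }

    void close(Session_cap cap) override
    {
        if (forwarded.erase(cap))
            child_root.close(cap);
    }
};

// Root list kept by each parent node; names are local to that node.
struct Root_list
{
    std::map<std::string, Root *> roots;

    void announce(std::string const &name, Root &r) { roots[name] = &r; }

    Root *lookup(std::string const &name) const
    {
        auto it = roots.find(name);
        return it == roots.end() ? nullptr : it->second;
    }
};
```

Because every node only knows its own root list, the Grandparent never learns that the session is actually implemented by Child1.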
Quota
=====

Each process that provides services to other processes consumes resources on
behalf of its clients.
Such a server requires memory to maintain session-specific state, processing
time to perform the actual service function, and possibly further system
resources (e.g., bus bandwidth) depending on client requests.
To avoid denial-of-service problems, a server must not allocate such
resources from its own budget but let the client pay.
Therefore, a mechanism for donating resource quotas from the client to the
server is required.
Both client and server may be arbitrary nodes in the process tree.
In the following, we examine the trading of resource quotas within
the recursive system structure using memory as an example.

When creating a child, the parent assigns a part of its own memory quota
to the new child.
During the lifetime of the child, the parent can further transfer
quota back and forth between the child's and its own account.
Because the parent creates its children out of its own resources,
it has a natural interest in managing child quotas correctly.
When a child requests a session to a service, it can bind a part
of its quota to the new session by specifying a resource donation
as an argument.
When receiving a session request, the parent has to distinguish
three different cases, depending on where the corresponding server
resides:

:Parent provides service:

If the parent provides the requested service by itself,
it transfers the donated amount of memory quota from the
requesting child's account to its own account to compensate
for the session-specific memory allocation on behalf of its own
child.

:Server is another child:

If there exists a matching entry in the parent's root list,
the requested service is provided by another child (or a
node within the child subsystem). In this case, the parent
transfers the donated memory quota from the requesting child
to the service-providing child.

:Delegation to grandparent:

The parent may decide to delegate the session request to
its own parent because the requested service is not provided
within its own subsystem.
Thus, the parent will request a session on behalf of its child.
The grandparent neither knows nor cares about the actual
origin of the request and will simply decrease the memory
quota of the parent.
For this reason, the parent transfers the donated memory
quota from the requesting child to its own account before
calling the grandparent.

This algorithm works recursively.
Once the server receives the session request, it checks whether
the donated memory quota suffices for storing the session-specific
data and, on success, creates the session.
If the initial quota donation turns out to be too scarce during
the lifetime of a session, the client may make further donations
via the 'transfer_quota' function of the parent interface, which
works analogously.

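The three cases can be condensed into one routing decision on the parent's side. The following C++ sketch is a hypothetical model that only moves numbers between accounts. 'Account', 'transfer', 'Route', 'route_donation', and 'server_accepts' are illustrative names, not Genode interfaces. In the grandparent case, the 'server' argument stands for whatever account the grandparent ends up charging. The server's sufficiency check is reduced to a comparison against an assumed session-state size.

```cpp
#include <cassert>
#include <cstddef>

// A memory-quota account (stand-in for the quota of a RAM session).
struct Account
{
    std::size_t quota;
};

// Moves 'amount' from 'from' to 'to'; fails if 'from' cannot afford it.
bool transfer(Account &from, Account &to, std::size_t amount)
{
    if (from.quota < amount)
        return false;
    from.quota -= amount;
    to.quota   += amount;
    return true;
}

// Where the parent found the requested service.
enum class Route { PARENT, OTHER_CHILD, GRANDPARENT };

// Parent-side handling of the donation attached to a session request.
// 'server' is the service-providing child (OTHER_CHILD) or the account
// the grandparent charges (GRANDPARENT); it is unused for PARENT.
bool route_donation(Route route, Account &child, Account &parent,
                    Account *server, std::size_t donation)
{
    switch (route) {

    case Route::PARENT:
        // The parent allocates the session state itself.
        return transfer(child, parent, donation);

    case Route::OTHER_CHILD:
        // The donation goes straight to the service-providing child.
        return server && transfer(child, *server, donation);

    case Route::GRANDPARENT:
        // The grandparent charges the parent, so the parent first takes
        // the donation from the requesting child. The second transfer
        // cannot fail because the parent has just received 'donation'.
        if (!server || !transfer(child, parent, donation))
            return false;
        return transfer(parent, *server, donation);
    }
    return false;
}

// Server-side check before creating the session.
bool server_accepts(std::size_t donation, std::size_t session_state_size)
{
    return donation >= session_state_size;
}
```

Because each node applies the same rule locally, chaining such decisions along the path from client to server reproduces the recursive behavior described above.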
If a child requests to close a session, the parent must distinguish
the same three cases as above.
Once the server receives the session-close request from its parent,
it is responsible for releasing all resources that were used for this session.
After the server releases the session-specific resources, the
server's quota can be decreased to the prior state.
However, an ill-behaving server may fail to release those resources, either
by malice or because of a bug.

If the misbehaving service was provided by the parent itself,
it has the full authority to not hand back session quota to
its child.
If the misbehaving service was provided by the grandparent,
the parent (and its whole subsystem) has to accept the loss.
If, however, the service was provided by another child and the
child refuses to release resources, decreasing its quota after
closing the session will fail.
It is up to the policy of the parent to handle such a failure either by
punishing the server (e.g., killing it) or by granting more of its
own quota.
Generally, misbehavior is against the server's own interests, and
each server would obey the parent's 'close' request to avoid intervention.


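The following hypothetical C++ model covers the case in which the server is another child. It sketches the parent's attempt to take the donation back after 'close'. The server's account tracks its quota and the amount it still has in use, and withdrawing succeeds only for unused quota. The two values of 'Policy' correspond to the options named above: killing the server or granting it more of the parent's quota. We assume that destroying the server frees all of its quota. All names are illustrative.

```cpp
#include <cassert>
#include <cstddef>

// Account of the server child: 'used' is what it currently has allocated.
struct Server_account
{
    std::size_t quota = 0;
    std::size_t used  = 0;
    bool        alive = true;
};

// Withdrawing quota succeeds only for the part the server does not use.
bool withdraw(Server_account &server, std::size_t amount)
{
    if (server.quota - server.used < amount)
        return false;
    server.quota -= amount;
    return true;
}

enum class Policy { KILL_SERVER, GRANT_QUOTA };

// Called by the parent after the server has processed the 'close' request.
// Returns true if the donation was taken back from the server.
bool reclaim_after_close(Server_account &server, std::size_t donation,
                         Policy policy)
{
    if (withdraw(server, donation))
        return true;    // the server released its session state as asked

    if (policy == Policy::KILL_SERVER) {
        // Destroying the server frees all of its resources, which returns
        // its entire quota (donation included) to the parent.
        server.alive = false;
        server.quota = 0;
        server.used  = 0;
        return true;
    }

    // Policy::GRANT_QUOTA: the parent tolerates the misbehavior and
    // leaves the donation with the server.
    return false;
}
```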
Successive policy management
============================

To support a wide variety of security policies for access control, we
require a way to bind properties and restrictions to sessions. For example,
a file service may want to restrict the access to files according to an
access-control policy that is specific to each client session.
On session creation, the 'session' call takes an 'args' argument that can be
used for that purpose. It is a list of tag-value pairs describing the session
properties. By convention, the list is ordered by attribute priority, starting
with the most important property.
The server uses these 'args' as construction arguments for the new
session and enforces the security policy as expressed by 'args' accordingly.
Whereas the client defines its desired session-construction arguments, each
node that is involved in the session creation can alter these arguments in
any way and may add further properties.
This effectively enables each parent to impose any desired restrictions on
sessions created by its children.
This concept works recursively and enables each node in the process hierarchy
to control exactly the properties that it knows and cares about. As a side
note, the specification of resource donations as described in Section
[Quota] is performed with the same mechanism. A resource donation is a property
of a session.

[image incremental_restrictions]
Successive application of policies at the creation time of a new session.

Figure [incremental_restrictions] shows an example scenario. A user
application requests the creation of a new session at the 'GUI' server and
specifies its wish for reading user input and using the string "Terminal" as
window label (1).
The parent of the user application is the user manager that introduces
user identities into the system and wants to ensure that each displayed window
gets tagged with the user and the executed program. Therefore, it overrides the
'label' attribute with more accurate information (2). Note that the modified
argument is now the head of the argument list.
The parent of the user manager, in turn, implements further policies. In the
example, Init's policy prohibits the user-manager subtree from reading
input (for example, to disable access to the system beyond official working hours)
by redefining the 'input' attribute and leaving all other attributes unchanged (3).
The actual GUI server observes the final result of the successively changed
session-construction arguments (4) and is responsible for enforcing the specified
policy for the lifetime of the session.
Once a session has been established, its properties are fixed and cannot be changed.


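The successive restriction of session arguments can be illustrated with a small tag-value list. In the C++ sketch below, a lookup returns the first entry with a matching tag. An override removes any previous value of the tag, puts the new one at the head of the list, and leaves all other attributes unchanged. The list representation, the helper names, and the concrete values (such as the format of the label) are assumptions made for this example. They do not reflect Genode's actual argument syntax.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Session-construction arguments: tag-value pairs, most important first.
using Arg  = std::pair<std::string, std::string>;
using Args = std::vector<Arg>;

// Returns the value of the first entry with the given tag, or an empty
// string if the tag is absent.
std::string lookup(Args const &args, std::string const &tag)
{
    for (Arg const &entry : args)
        if (entry.first == tag)
            return entry.second;
    return std::string();
}

// Replaces any existing value of 'tag' and puts the new value at the head
// of the list. All other attributes keep their values and relative order.
Args override_arg(Args args, std::string const &tag, std::string const &value)
{
    args.erase(std::remove_if(args.begin(), args.end(),
                              [&] (Arg const &e) { return e.first == tag; }),
               args.end());
    args.insert(args.begin(), std::make_pair(tag, value));
    return args;
}

// Policy of the user manager: tag the window with user and program
// (the label format is made up for this example).
Args user_manager_policy(Args const &args)
{
    return override_arg(args, "label", "user=alice program=terminal");
}

// Policy of init: deny input to the user-manager subtree.
Args init_policy(Args const &args)
{
    return override_arg(args, "input", "no");
}
```

Replaying the scenario of Figure [incremental_restrictions], the client's list passes through 'user_manager_policy' and then 'init_policy' before the GUI server evaluates it. Each stage touches only the attribute it cares about.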
Core - the root of the process tree
###################################

Core is the first user-level program that takes control when starting up the
system. It has access to the raw physical resources and converts them to
abstractions that enable multiple programs to use these resources.
In particular, core converts the physical address space to higher-level
containers called _dataspaces_.
A dataspace represents a contiguous physical address-space region with an
arbitrary size (at page-size granularity).
Multiple processes can make the same dataspace accessible in their
local address spaces.
The system on top of core never deals with physical memory pages but
uses this uniform abstraction to work with memory, memory-mapped I/O
regions, and ROM areas.

*Note:* _Using only contiguous dataspaces may lead to fragmentation of the_
_physical address space. Contiguity is, however, only required in_
_a few rare cases (e.g., DMA transfers). Therefore, later versions of the_
_design will support non-contiguous dataspaces._

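Page-size granularity means that a request of arbitrary size is rounded up to whole pages and served by one contiguous physical region. The following hypothetical C++ sketch assumes a page size of 4 KiB and a trivial bump allocator over a single physical range. It illustrates only these two properties and is not core's actual allocator. 'Dataspace' and 'Bump_allocator' are illustrative names.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t PAGE_SIZE = 4096;   // assumption: 4 KiB pages

constexpr std::uint64_t round_up_to_page(std::uint64_t size)
{
    return (size + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1);
}

// A dataspace: one contiguous physical region, page-aligned and page-sized.
struct Dataspace
{
    std::uint64_t phys_base;
    std::uint64_t size;
};

// Hands out contiguous regions from [next, end) in ascending order.
struct Bump_allocator
{
    std::uint64_t next;
    std::uint64_t end;

    // Returns false if no contiguous region of the rounded size is left.
    bool alloc(std::uint64_t size, Dataspace &ds)
    {
        std::uint64_t const rounded = round_up_to_page(size);
        if (rounded == 0 || end - next < rounded)
            return false;
        ds = Dataspace { next, rounded };
        next += rounded;
        return true;
    }
};
```

The bump allocator never reuses freed regions. This makes it easy to see how requests of odd sizes consume whole pages and why fragmentation becomes an issue once contiguity is mandatory.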
Furthermore, core provides all prerequisites to bootstrap the process tree.
These prerequisites comprise services for creating processes and threads,
for allocating memory, for accessing boot-time-present files, and for managing
address-space layouts.
Core is almost free from policy. There are no configuration options.
The only policy of core is the startup of the init process to which core
grants all available resources.

In the following, we explain the session interfaces of core's services in
detail.


RAM - allocator for physical memory
===================================

A RAM session is a quota-bounded allocator of blocks from physical memory.
There are no RAM-specific session-construction arguments.
Immediately after the creation of a RAM session, its quota is zero.
To make the RAM session functional, it must be loaded with quota from
another already existing RAM session, which we call the _reference account_.
The reference account of a RAM session can be defined initially via:
!int ref_account(Ram_session_capability ram_session_cap);
Once the reference account is defined, quota can be transferred back and
forth between the reference account and the new RAM session with:
!int transfer_quota(Ram_session_capability ram_session_cap,
!                   size_t amount);
Provided that the RAM session has enough quota, a dataspace of a given size
can be allocated with:
!Ram_dataspace_capability alloc(size_t size);
The result value of 'alloc' is a capability to the RAM-dataspace
object implemented in core. This capability can be communicated to other
processes and can be used to make the dataspace's physical-memory region
accessible from these processes.
An allocated dataspace can be released with:
!void free(Ram_dataspace_capability ds_cap);
The 'alloc' and 'free' calls track the used-quota information of the RAM
session accordingly.
Current statistical information about the quota limit and the
used quota can be retrieved by:
!size_t quota();
!size_t used();
Closing a RAM session implicitly destroys all allocated dataspaces.


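The quota bookkeeping of a RAM session can be modeled as follows. The C++ class is a hypothetical simulation of the accounting only. It reuses the function names of the interface above but takes plain object references instead of capabilities. It represents dataspaces by integer IDs, returns 0 on success and -1 on error (an assumed convention), and allocates no physical memory. Two further aspects are assumptions of the sketch: the constructor argument that gives an account initial quota, and the rule that the reference account can be set only once.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

class Ram_session
{
    Ram_session *_ref   = nullptr;             // reference account
    std::size_t  _quota = 0;                   // zero right after creation
    std::size_t  _used  = 0;
    int          _next_ds = 1;
    std::map<int, std::size_t> _dataspaces;    // dataspace ID -> size

public:
    explicit Ram_session(std::size_t initial_quota = 0) : _quota(initial_quota) { }

    int ref_account(Ram_session &ref)
    {
        if (_ref) return -1;                   // can be defined only once
        _ref = &ref;
        return 0;
    }

    // Moves 'amount' of unused quota from this session to 'to'. Transfers
    // are allowed only between a session and its reference account.
    int transfer_quota(Ram_session &to, std::size_t amount)
    {
        bool const related = (to._ref == this) || (_ref == &to);
        if (!related || _quota - _used < amount) return -1;
        _quota    -= amount;
        to._quota += amount;
        return 0;
    }

    // Returns a dataspace ID, or 0 if the quota does not suffice.
    int alloc(std::size_t size)
    {
        if (size == 0 || _quota - _used < size) return 0;
        _used += size;
        _dataspaces[_next_ds] = size;
        return _next_ds++;
    }

    void free(int ds)
    {
        auto it = _dataspaces.find(ds);
        if (it == _dataspaces.end()) return;
        _used -= it->second;
        _dataspaces.erase(it);
    }

    std::size_t quota() const { return _quota; }
    std::size_t used()  const { return _used; }
};
```

In this model, a transfer back to the reference account fails while a dataspace still occupies part of the quota. For this reason, freeing precedes the withdrawal of quota.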
ROM - boot-time-file access
===========================

A ROM session represents a boot-time-present read-only file. This may be a
module provided by the boot loader or a part of a static ROM image. On session
construction, a file identifier must be specified as a session argument using the
tag 'filename'. The available filenames are not fixed but depend on the actual
deployment. On some platforms, core may provide logical files for special memory
objects such as the GRUB multiboot info structure or a kernel info page. The
ROM session enables the actual read access to the file by exporting the file as
a dataspace:
!Rom_dataspace_capability dataspace();


IO_MEM - memory-mapped I/O access
=================================

With IO_MEM, core provides a dataspace abstraction for non-memory parts of the
physical address space such as memory-mapped I/O regions or BIOS areas. In
contrast to a memory block that is used for storing information and whose
physical location in memory is of no importance, a non-memory object has
special semantics attached to its location within the physical address space.
Its location is either fixed (by standard) or can be determined at runtime, for
example, by scanning the PCI bus for PCI resources. If the physical location of
such a non-memory object is known, an IO_MEM session can be created by
specifying 'base' and 'size' as session-construction arguments.
The IO_MEM session then provides the specified physical memory area as a
dataspace:
!Io_mem_dataspace_capability dataspace();


IO_PORT - access to I/O ports
=============================

For platforms that rely on I/O ports for device access, core's IO_PORT service
enables the fine-grained assignment of port ranges to individual processes.
Each IO_PORT session corresponds to the exclusive access right to a
port range as specified with the 'io_port_base' and 'io_port_size'
session-construction arguments. Core creates the new IO_PORT session
only if the specified port range does not overlap with an already existing
session. This ensures that each I/O port is driven by only one
process at a time. The IO_PORT session interface resembles the
physical I/O port access instructions. Reading from an I/O port
can be performed via an 8-bit, 16-bit, or 32-bit access:
!unsigned char inb(unsigned short address);
!unsigned short inw(unsigned short address);
!unsigned inl(unsigned short address);
Vice versa, there exist functions for writing to an I/O port via
an 8-bit, 16-bit, or 32-bit access:
!void outb(unsigned short address, unsigned char value);
!void outw(unsigned short address, unsigned short value);
!void outl(unsigned short address, unsigned value);
The address arguments of the I/O-port access functions are absolute
port addresses that must be within the port range of the session.


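The following hypothetical C++ sketch models the two rules stated above: IO_PORT sessions must not overlap, and port addresses must lie within the session's range. 'Port_range' and 'Io_port_registry' are illustrative names. The 64K port space corresponds to the 16-bit 'address' type of the interface. We assume that a 16-bit or 32-bit access touches consecutive ports that must all lie within the range. No real 'in' or 'out' instruction is issued.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Port_range
{
    std::uint32_t base;   // corresponds to 'io_port_base'
    std::uint32_t size;   // corresponds to 'io_port_size'

    // Assumption: a 'width'-byte access covers ports [port, port + width).
    bool contains(std::uint32_t port, std::uint32_t width = 1) const
    {
        return port >= base && port + width <= base + size;
    }

    bool overlaps(Port_range const &o) const
    {
        return base < o.base + o.size && o.base < base + size;
    }
};

// Tracks the port ranges of all existing IO_PORT sessions.
class Io_port_registry
{
    std::vector<Port_range> _sessions;

public:
    // Creates a session only for a valid range that does not overlap
    // with an existing session.
    bool create_session(Port_range r)
    {
        if (r.size == 0 || r.base + r.size > 0x10000)
            return false;             // outside the 16-bit port space
        for (Port_range const &s : _sessions)
            if (s.overlaps(r))
                return false;
        _sessions.push_back(r);
        return true;
    }
};
```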
IRQ - handling device interrupts
|
||||
================================
|
||||
|
||||
The IRQ service of core provides processes with an interface to
|
||||
device interrupts. Each IRQ session corresponds to an attached
|
||||
interrupt. The physical interrupt number is specified via the
|
||||
'irq_number' session-construction argument. A physical interrupt
|
||||
number can be attached to only one session. The IRQ session
|
||||
interface provides a blocking function to wait for the next
|
||||
interrupt:
|
||||
!void wait_for_irq();
|
||||
While the 'wait_for_irq' function blocks, core unmasks the
|
||||
interrupt corresponding to the IRQ session.
|
||||
On function return, the corresponding interrupt line is masked
|
||||
and acknowledged.
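The C++ sketch below models two things: the exclusive attachment of interrupt numbers, and the unmask/block/mask protocol of 'wait_for_irq'. The sketch is hypothetical. 'Irq_line', 'Irq_registry', the acknowledgement counter and the 'raise' function are assumptions, not parts of core's interface. 'raise' stands in for a device asserting a level-triggered line.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <set>

// Hypothetical model of one level-triggered interrupt line (not Genode code)
class Irq_line
{
    std::mutex              _mutex;
    std::condition_variable _cond;
    bool     _masked   = true;    // the line starts masked
    bool     _asserted = false;   // set when the device raises the interrupt
    unsigned _acks     = 0;

  public:

    // Called on behalf of the device
    void raise()
    {
        std::lock_guard<std::mutex> guard(_mutex);
        _asserted = true;
        _cond.notify_all();
    }

    // Unmask, block until the line is asserted, then mask and acknowledge
    void wait_for_irq()
    {
        std::unique_lock<std::mutex> lock(_mutex);
        assert(_masked && "only one thread waits per IRQ session");
        _masked = false;
        _cond.wait(lock, [&] { return _asserted; });
        _masked   = true;
        _asserted = false;   // acknowledge: the pending request is consumed
        _acks++;
    }

    bool masked()
    {
        std::lock_guard<std::mutex> guard(_mutex);
        return _masked;
    }

    unsigned acks()
    {
        std::lock_guard<std::mutex> guard(_mutex);
        return _acks;
    }
};

// Each physical interrupt number can be attached to only one session
class Irq_registry
{
    std::set<unsigned> _attached;

  public:

    bool attach(unsigned irq_number) { return _attached.insert(irq_number).second; }
    void detach(unsigned irq_number) { _attached.erase(irq_number); }
};
```

In this model, an interrupt raised while the line is masked stays pending, so the next call of 'wait_for_irq' returns immediately. A driver would typically call 'wait_for_irq' in a loop and service the device after each return.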

;*Note:* _The interface of the IRQ service is going to be changed_
;_with the planned addition of signals to the framework._


RM - managing address space layouts
===================================

RM is a _region manager_ service that allows for constructing address-space
layouts (_region maps_) from dataspaces and that provides support for assigning
region maps to processes by paging the process' threads.
Each RM session corresponds to one region map. After creating a new RM session,
dataspaces can be attached to the region map via:
!void *attach(Dataspace_capability ds_cap,
!             size_t size=0, off_t offset=0,
!             bool use_local_addr = false,
!             addr_t local_addr = 0);
The 'attach' function inserts the specified dataspace into the region map and
returns the start position actually used within the region map.
When called with the default arguments, the region manager chooses a free
position that is large enough to hold the whole dataspace.
Alternatively, the caller of 'attach' can attach any sub-range of the dataspace,
selected via 'size' and 'offset', at a specific target position by enabling
'use_local_addr' and passing the target address as 'local_addr'.
Note that the interface allows the same dataspace to be attached not only to
multiple region maps but also multiple times to the same region map.
As the counterpart to 'attach', 'detach' removes dataspaces from the region map:
!void detach(void *local_addr);
The region manager determines the dataspace at the specified 'local_addr' (not
necessarily its start address) and removes the whole attached dataspace from the
region map.
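The following C++ sketch models the bookkeeping of a region map. 'attach' either uses a caller-supplied 'local_addr' or picks the lowest free position large enough for the region. 'detach' accepts any address inside a region. This is a hypothetical illustration, not Genode's region manager. It rests on several assumptions:

- dataspaces are reduced to IDs with a known size;
- the map has a fixed extent;
- 'attach' returns 'std::optional' instead of a pointer;
- placement uses a first-fit search.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>

using addr_t = std::size_t;

struct Dataspace { int id; std::size_t size; };   // hypothetical stand-in
struct Region    { Dataspace ds; std::size_t size; std::size_t offset; };

class Region_map
{
    std::map<addr_t, Region> _regions;   // keyed by start address
    addr_t const             _limit;     // extent of the address space

    bool _is_free(addr_t start, std::size_t size) const
    {
        if (start + size > _limit) return false;
        for (auto const &[base, r] : _regions)
            if (start < base + r.size && base < start + size) return false;
        return true;
    }

  public:

    explicit Region_map(addr_t limit) : _limit(limit) { }

    std::optional<addr_t> attach(Dataspace ds, std::size_t size = 0,
                                 std::size_t offset = 0,
                                 bool use_local_addr = false,
                                 addr_t local_addr = 0)
    {
        if (offset >= ds.size) return std::nullopt;
        if (size == 0) size = ds.size - offset;   // default: rest of the dataspace
        if (offset + size > ds.size) return std::nullopt;

        addr_t start = local_addr;
        if (!use_local_addr) {
            // first fit: candidates are 0 and the end of each existing region
            start = 0;
            for (auto const &[base, r] : _regions) {
                if (_is_free(start, size)) break;
                start = base + r.size;
            }
        }
        if (!_is_free(start, size)) return std::nullopt;
        _regions[start] = Region { ds, size, offset };
        return start;
    }

    // Removes the whole region containing 'addr' (not necessarily its start)
    bool detach(addr_t addr)
    {
        auto it = _regions.upper_bound(addr);
        if (it == _regions.begin()) return false;
        --it;
        if (addr >= it->first + it->second.size) return false;
        _regions.erase(it);
        return true;
    }

    std::size_t count() const { return _regions.size(); }
};
```

First-fit placement is only one possible policy. The text merely requires that the chosen position be large enough for the whole dataspace.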
To enable the use of an RM session by a process, we must associate it with
each thread running in the process. The function
!Thread_capability add_client(Thread_capability thread);
returns a thread capability for a _pager_ that handles the page faults of the
specified 'thread' according to the region map.
As the pager resolves the page faults subsequently caused by the thread, the
address-space layout described by the region map becomes effective for the
process that is executing the thread.
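A pager can use a region map to resolve a fault by looking up the faulting address and finding the backing dataspace and the offset within it. The sketch below is hypothetical. 'Mapping', 'resolve_fault', the 4 KiB page size and page-aligned region starts are assumptions. A real pager would install the mapping in the faulting thread's protection domain rather than return it.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>

using addr_t = std::size_t;

constexpr std::size_t PAGE_SIZE = 4096;   // assumed page size

struct Region  { int ds_id; std::size_t size; std::size_t ds_offset; };
struct Mapping { addr_t virt; int ds_id; std::size_t ds_offset; };

// Returns the page-aligned mapping that resolves a fault at 'fault_addr',
// or nothing if no region covers the address (an unresolvable fault)
std::optional<Mapping> resolve_fault(std::map<addr_t, Region> const &regions,
                                     addr_t fault_addr)
{
    auto it = regions.upper_bound(fault_addr);
    if (it == regions.begin()) return std::nullopt;
    --it;

    addr_t const  base = it->first;
    Region const &r    = it->second;
    if (fault_addr >= base + r.size) return std::nullopt;

    // assumes page-aligned region starts, so 'page' lies within the region
    addr_t const page = fault_addr & ~(PAGE_SIZE - 1);
    return Mapping { page, r.ds_id, r.ds_offset + (page - base) };
}
```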


CPU - allocator for processing time
===================================

A CPU session is an allocator for processing time that allows for the creation,
control, and destruction of threads of execution.
The CPU session takes no session arguments.
Threads are created and killed via two functions:
!Thread_capability create_thread(const char *name);
!void kill_thread(Thread_capability thread_cap);
The 'create_thread' function takes a symbolic thread name (used only
for debugging purposes) and returns a capability to the new thread.
Furthermore, the CPU session provides the following functions for operating
on threads:
!int set_pager(Thread_capability thread_cap,
!              Thread_capability pager_cap);

!int cancel_blocking(Thread_capability thread_cap);

!int start(Thread_capability thread_cap,
!          addr_t ip, addr_t sp);

!int state(Thread_capability thread,
!          Thread_state *out_state);
The 'set_pager' function registers the thread's pager. Here, 'pager_cap'
(obtained by calling 'add_client' on an RM session) refers to the RM session
whose region map serves as the thread's address-space layout.
To start the actual execution of the thread, its initial instruction
pointer ('ip') and stack pointer ('sp') must be passed to the 'start'
operation.
Conversely, the 'state' function provides the current thread state, including
the current instruction pointer and stack pointer.
The 'cancel_blocking' function causes the specified thread to cancel a
blocking operation it is currently executing, such as waiting for an incoming
message or acquiring a lock. The framework uses this function to gracefully
destroy threads.
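The thread life cycle exposed by the CPU session can be modeled as a small state machine. This C++ sketch is hypothetical and rests on several assumptions made for illustration, not documented behavior of core:

- integer IDs stand in for 'Thread_capability';
- the -1/0 return codes are invented;
- 'start' fails for a thread without a registered pager.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

using addr_t    = std::uintptr_t;
using Thread_id = unsigned;   // stands in for Thread_capability

struct Thread_state { addr_t ip = 0, sp = 0; };

class Cpu_session_model
{
    struct Thread
    {
        std::string  name;   // only used for debugging
        bool         has_pager = false;
        bool         started   = false;
        Thread_state state;
    };

    std::map<Thread_id, Thread> _threads;
    Thread_id                   _next_id = 1;

  public:

    Thread_id create_thread(const char *name)
    {
        _threads[_next_id] = Thread { name };
        return _next_id++;
    }

    void kill_thread(Thread_id id) { _threads.erase(id); }

    int set_pager(Thread_id id, Thread_id /* pager_cap */)
    {
        auto it = _threads.find(id);
        if (it == _threads.end()) return -1;
        it->second.has_pager = true;
        return 0;
    }

    // Assumption: a thread cannot run without a pager resolving its faults
    int start(Thread_id id, addr_t ip, addr_t sp)
    {
        auto it = _threads.find(id);
        if (it == _threads.end() || !it->second.has_pager || it->second.started)
            return -1;
        it->second.started = true;
        it->second.state   = Thread_state { ip, sp };
        return 0;
    }

    // The modeled thread never executes, so 'state' reports the initial values
    int state(Thread_id id, Thread_state *out_state) const
    {
        auto it = _threads.find(id);
        if (it == _threads.end()) return -1;
        *out_state = it->second.state;
        return 0;
    }
};
```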

*Note:* _Future versions of the CPU service will provide means to further control the_
_thread during execution (e.g., pausing it or executing a single instruction),_
_to acquire a more comprehensive thread state (e.g., the current registers), and_
_to configure scheduling parameters._


PD - providing protection domains
=================================

A PD session corresponds to a memory protection domain. Together
with one or more threads and an address-space layout (RM session), it forms a
process.
The PD session takes no session arguments. After session creation, the PD contains no
threads. Once a new thread has been created from a CPU session, it can be assigned
to the PD by calling:
!int bind_thread(Thread_capability thread);
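The following hypothetical C++ model tracks which threads belong to which protection domain. 'Pd_registry', the integer IDs and the -1/0 return codes are assumptions. The rule that a thread can be bound to at most one PD is also an assumption. The text above states only that a fresh PD contains no threads and that threads are added via 'bind_thread'.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>

using Thread_id = unsigned;   // stands in for Thread_capability
using Pd_id     = unsigned;   // stands in for Pd_session_capability

class Pd_registry
{
    std::map<Pd_id, std::set<Thread_id>> _threads_of;   // PD -> bound threads
    std::map<Thread_id, Pd_id>           _pd_of;        // thread -> its PD
    Pd_id                                _next_pd = 1;

  public:

    // A new PD contains no threads
    Pd_id create_pd()
    {
        _threads_of.emplace(_next_pd, std::set<Thread_id> { });
        return _next_pd++;
    }

    // Assumption: a thread can be bound to at most one PD
    int bind_thread(Pd_id pd, Thread_id thread)
    {
        auto it = _threads_of.find(pd);
        if (it == _threads_of.end() || _pd_of.count(thread)) return -1;
        it->second.insert(thread);
        _pd_of[thread] = pd;
        return 0;
    }

    std::size_t thread_count(Pd_id pd) const
    {
        auto it = _threads_of.find(pd);
        return it == _threads_of.end() ? 0 : it->second.size();
    }
};
```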


CAP - allocator for capabilities
================================

A capability is a system-wide unique object identity that typically refers to a
remote object implemented by a service. For each object to be made remotely
accessible, the service creates a new capability associated with the local
object. CAP is a service to allocate and free capabilities:
!Capability alloc(Capability ep_cap);
!void free(Capability cap);
The 'alloc' function takes as argument the capability of the entrypoint that
receives the invocations of the new capability's RPC interface.
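A capability allocator essentially hands out unique identities and remembers which entrypoint receives the invocations for each of them. The C++ sketch below is a hypothetical model. Capabilities are plain integers, and 'lookup' is an invented helper used to route an invocation. Real capabilities are kernel-protected rather than forgeable integers.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>

using Capability = std::uint64_t;   // stands in for a kernel-protected capability

class Cap_allocator
{
    std::map<Capability, Capability> _entrypoint_of;   // cap -> receiving entrypoint
    Capability                       _next = 1;        // 0 denotes an invalid cap

  public:

    // Allocates a fresh identity, never reused, bound to entrypoint 'ep_cap'
    Capability alloc(Capability ep_cap)
    {
        Capability const cap = _next++;
        _entrypoint_of[cap] = ep_cap;
        return cap;
    }

    void free(Capability cap) { _entrypoint_of.erase(cap); }

    // Returns the entrypoint that receives RPCs invoked on 'cap'
    std::optional<Capability> lookup(Capability cap) const
    {
        auto it = _entrypoint_of.find(cap);
        if (it == _entrypoint_of.end()) return std::nullopt;
        return it->second;
    }
};
```

Never reusing identities is a deliberate choice in this model. It prevents a stale reference to a freed capability from silently addressing a newly created object.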


LOG - debug output facility
===========================

The LOG service is used by the lowest-level system components, such as the init
process, for printing debug output.
Each LOG session takes a 'label' string as session argument,
which is used to prefix the debug output of this session.
This enables developers to distinguish multiple producers of debug output.
The function
!size_t write(const char *string);
outputs the specified 'string' to the debug-output backend of core.
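The following C++ sketch models LOG sessions that share a string backend, showing how the label prefix separates the output of several clients. Several details are assumptions: the name 'Log_session_model', the '[label] ' prefix format, and 'write' returning the number of characters of 'string' consumed. The text above specifies neither the exact format nor the return value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical model of a LOG session: each write is prefixed with the label
class Log_session_model
{
    std::string         _label;
    std::ostringstream &_backend;   // stands in for core's debug-output backend

  public:

    Log_session_model(std::string label, std::ostringstream &backend)
    : _label(std::move(label)), _backend(backend) { }

    std::size_t write(const char *string)
    {
        _backend << "[" << _label << "] " << string;
        return std::strlen(string);
    }
};
```

For example, two sessions labeled "init" and "ps2_drv" (hypothetical labels) writing to the same backend produce interleaved but distinguishable lines.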


Process creation
################

The previous section presented the services implemented by core.
In this section, we show how to combine these basic mechanisms to create and
execute a process.
Process creation serves as a prime example of our general approach: first
provide very simple functional primitives, and then solve complex problems by
composing these primitives.
We use slightly simplified pseudo code to illustrate this procedure.
The 'env()' object refers to the environment of the creating process, which
contains its RM session and RAM session.

:Obtaining the executable ELF binary:

If the binary is available as a ROM object, we can access its data by creating
a ROM session with the binary's name as argument and attaching the session's
dataspace to our local address space:
!Rom_session_capability file_cap;
!file_cap = session("ROM", "filename=init");
!Rom_dataspace_capability ds_cap;
!ds_cap = Rom_session_client(file_cap).dataspace();
!
!void *elf_addr = env()->rm_session()->attach(ds_cap);

The variable 'elf_addr' now points to the start of the binary data.

:ELF binary decoding and creation of the new region map:

We create a new region map using the RM service:
!Rm_session_capability rm_cap;
!rm_cap = session("RM");
!Rm_session_client rsc(rm_cap);
Initially, this region map is empty.
The ELF binary contains CODE, DATA, and BSS sections.
For each section, we add a dataspace to the region map.
For read-only sections (CODE and read-only DATA), we attach the corresponding
ranges of the original ELF dataspace ('ds_cap'):
!rsc.attach(ds_cap, size, offset, true, addr);
The 'size' and 'offset' arguments specify the location of the section within
the ELF image. The 'addr' argument defines the desired start position within the
region map.
For each BSS section and each writable DATA section, we allocate a writable RAM dataspace
!Ram_dataspace_capability rw_cap;
!rw_cap = env()->ram_session()->alloc(section_size);
and assign its initial content (zeros for BSS sections, a copy of the section
content for DATA sections) via a temporary attachment to our own address space.
Finally, we attach the RAM dataspace to the new region map at the section's
start position 'addr':
!void *sec_addr = env()->rm_session()->attach(rw_cap);
! ... /* write to buffer at sec_addr */
!env()->rm_session()->detach(sec_addr);
!
!rsc.attach(rw_cap, 0, 0, true, addr);
After iterating through all ELF sections, the region map of the new process
is completely initialized.
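The per-section decisions above can be condensed into a self-contained C++ model that turns a list of sections into a description of the new region map. This is a hypothetical sketch. The names 'Section', 'Attachment', 'Kind' and 'load_sections' are assumptions, as are the byte vectors standing in for the ELF dataspace and the newly allocated RAM dataspaces. A real loader would obtain the section list by parsing the ELF headers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using addr_t = std::uintptr_t;

enum class Kind { CODE, RO_DATA, DATA, BSS };

struct Section
{
    Kind        kind;
    addr_t      addr;     // desired start position in the new region map
    std::size_t offset;   // location within the ELF image (unused for BSS)
    std::size_t size;
};

struct Attachment
{
    bool                      from_elf;   // true: range of the original ELF dataspace
    addr_t                    addr;
    std::size_t               offset;     // offset within the ELF dataspace
    std::size_t               size;
    std::vector<std::uint8_t> ram;        // content of a new RAM dataspace
};

std::vector<Attachment> load_sections(std::vector<std::uint8_t> const &elf,
                                      std::vector<Section> const &sections)
{
    std::vector<Attachment> layout;
    for (Section const &s : sections) {
        if (s.kind == Kind::CODE || s.kind == Kind::RO_DATA) {
            // read-only: reference the ELF image directly, without copying
            layout.push_back(Attachment { true, s.addr, s.offset, s.size, {} });
            continue;
        }
        // writable: allocate zero-filled RAM, copy DATA content from the image
        std::vector<std::uint8_t> ram(s.size, 0);
        if (s.kind == Kind::DATA)
            for (std::size_t i = 0; i < s.size; i++)
                ram[i] = elf.at(s.offset + i);
        layout.push_back(Attachment { false, s.addr, 0, s.size, std::move(ram) });
    }
    return layout;
}
```

Attaching read-only sections directly avoids a copy. Only writable sections consume freshly allocated RAM.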

:Creating the first thread:

To create the main thread of the new process, we open a new CPU session and
allocate the thread from it:
!Cpu_session_capability cpu_cap = session("CPU");
!Cpu_session_client csc(cpu_cap);
!Thread_capability thread_cap = csc.create_thread("main");
When the thread starts its execution and fetches its first instruction, it
will immediately trigger a page fault. Therefore, we need to assign a
page-fault handler (pager) to the thread. By resolving subsequent page faults, the
pager populates the address space in which the thread executes with
memory mappings according to the region map:
!Thread_capability pager_cap = rsc.add_client(thread_cap);
!csc.set_pager(thread_cap, pager_cap);

:Creating a protection domain:

The new process' protection domain corresponds to a PD session:
!Pd_session_capability pd_cap = session("PD");
!Pd_session_client pdsc(pd_cap);

:Assigning the first thread to the protection domain:

!pdsc.bind_thread(thread_cap);

:Starting the execution:

Now that we have defined the relationships between the process' region map, its
main thread, and its protection domain, we can start the process by specifying
the initial instruction pointer and stack pointer as obtained from the ELF
binary:
!csc.start(thread_cap, ip, sp);

; supplying the parent capability to the new process