|
From http://www.agorics.com/Library/Joule/bouncer.html:
Proving Properties of
Instruction-Level Programs
Preface
Copyright 1995 Agorics, Inc.
While Agorics, Inc. reserves all copyrights, the information in this document
is to be considered public information and is available for use without
restriction. Agorics disclaims any warranty as to the utility, accuracy
or effectiveness of the information contained in this document and specifically
disclaims any liability for consequential damages that may result directly
or indirectly from use of the information in this document.
DRAFT
Many programming languages reduce each module of program code to a set of
assembly language-level instructions (bytecodes or native assembler instructions).
When the time comes to load a given module, its instructions must be checked
to make sure they will not violate certain properties of the higher-level
language, such as memory boundaries. This paper presents a technique for
implementing such load-time checking, as exemplified by the Bouncer; component
of Joule.
While Joule applies this bouncing technique to bytecodes, the mechanism
is as applicable to other software-implemented instruction sets and to native
machine instructions. The technique is equally appropriate for strongly
typed languages such as Ada, Pascal, Modula 3, C, and C++, and for latently
typed languages such as Smalltalk, Scheme, and Joule.
The "bouncing" technique described here is not new, but is rather derived
straightforwardly from previous work, including:
- type state checking, by which a variable's value can be statically
determined as unreferenced and therefore deletable.
- abstract interpretation, in which properties of programs (deadlock
freedom, computational complexity, etc.) are computed by interpreting
instructions using only abstract properties of the actual data values.
- a machine-code verifier created by L. Peter Deutsch for the 940 operating
system to verify security properties of instruction-level programs.
This document contains the following sections and subsections:
1.1. About Modules
1.2. The Bouncer as Load-Time Checker
1.2.1. Bouncer Rules
1.2.2. Bouncer State
1.2.3. Control-Flow Checking
1.2.4. Representation Type Checking
1.3. Summary
1.4. Acknowledgments
1.1 About Modules
In this paper, a "module" is the smallest unit of compiled code that can
be loaded into a runtime environment. In the Joule language, this unit of
code corresponds to the Module form. In Joule, a module
of code is reduced to bytecodes by the compiler.
1.2 The Bouncer as Load-Time Checker
The Bouncer is a program that statically checks each module before it is
loaded, rejecting any module that does not pass specific static checks.
The name "Bouncer" is inspired by the doorman of a nightclub who keeps out
unsavory characters.
1.2.1 Bouncer Rules
The Bouncer maintains internal state which changes as it processes each
instruction. First, it checks each instruction to make sure the preconditions
for that instruction have been established by prior instructions in the
module. Then it alters its state to reflect the mutating impact of the instruction,
and finally tests to make sure its own invariants are intact.
Thus, for each type of instruction, a set of rules have been specified for
comparing the state of the abstract machine against known requirements.
The rules for checking the preconditions of an instruction are called its
"test" rules. The rules for checking the state-mutating impact of the instruction
are called its "mutate" rules.
In this way, the Bouncer protects Joule's memory safety and security. A
module that fails to satisfy the rules is rejected, or "bounced." Acceptable
code is passed by the Bouncer to be interpreted or translated by other components
of the Joule abstract machine.
The Bouncer does not attempt to prove the correctness of every possible
correct program, which is theoretically impossible. Rather, it proves a
verifiable subset of programs and rejects all others. In this respect, it
is similar to type checking in languages such as Pascal. No loss of expressive
power results from this constriction of possibilities, though the set of
alternative implementations of a given solution may be reduced.
1.2.2 Bouncer State
The Bouncer state machine has a set of static registers. These exist at
bounce time, not at run time; they represent the state of what the Bouncer
knows about the actions of the program code before and after each instruction.
Some of the static Bouncer registers are:
- pc--The current value of the Joule abstract machine's program
counter.
numLocals--The number of local variables allocated in the current
activation frame.
writable[numLocals]--An array of booleans indicating, for each
local variable, whether it is legal to assign it.
types[numLocals]--An array of static types, providing knowledge
about the runtime representation type ("uninitialized", "tagged pointer",
"facet pointer", etc.) of each local variable of the current Activation.
The actual local variables are in the array locals[numLocals],
which exist only at run time, not at bounce time.
In these terms, the instruction
OP_MOVE (3, 5)
means to move the value in locals[5] into the position locals[3]. It is
subject to this set of rules:
Test writable[3]
Mutate types[3] := types[5];
This tests that local variable number three can be assigned to, and specifies
how to modify the typestate used to test the next instruction. All of the
array indexing is implicitly bounds checked, so if numLocals is less
than 6, this load attempt is also bounced.
1.2.3 Control-Flow Checking
First, the Bouncer derives a model of the state of the abstract machine
before and after each instruction in the module being checked that is consistent
with all forks and joins called for by the instructions.
1.2.3.1 Forking
A program fork results from the evaluation of some Boolean expression, which
may be true, false, or a pointer to some other value.
The instruction that implements forking,
OP_TEST_IF (cond, onFalse, onFail)
falls through to the next instruction if locals[cond] is primitively
true, branches to onFalse if locals[cond] is primitively
false, and branches to onFail otherwise. In the Bouncer, this means
that we propagate the current Bouncer state to all three locations.
1.2.3.2 Joining
Following a fork, it may be necessary to interpret the re-joining of the
two execution paths into a single consistent state.
Figure 1: Joining -- L and R represent the two (possibly
inconsistent) prior states of the Bouncer; J represents the resulting
state.
Treating this join as if it were an instruction, the Bouncer applies these
rules (where the subscripts L and R designate the static
values in each of the two prior states):
Test numLocalsL== numLocalsR
Mutate for (i = 0; i < numLocals; i++) {
if (writableL[i] == writableR[i]) {
writableJ[i] := writableL[i];
} else {
writableJ[i] := FALSE;
}
if (typesL[i] == typesR[i]) {
typesJ[i] := typesL[i];
} else {
typesJ[i] := VOID;
}
}
That is, if the two prior states agree on the type of a slot, the slot is
considered to be of that type in the state representation J; if not,
the slot is considered uninitialized, and cannot be read by code following
the join until it is assigned.
1.2.3.3 Looping
If any loops occur in the module, the instructions within the loop again
need to be checked against the Bouncer state as it exists at the time of
the backward jump. If multiple possible states loop back to the same point,
they are treated as joins.
The proliferation of states resulting from multiple forks and joins is controlled
because inconsistent slots collapse to "uninitialized" when joined. This
means that, on any iteration through the module, there will be strictly
fewer joins required than on the previous iteration, because any inconsistencies
previously encountered have collapsed the affected slots to "uninitialized".
The Bouncer inevitably either rejects the program for being unsafe or converges
to a single consistent model describing the state of the Joule abstract
machine prior to each instruction in the module. This representation must
converge to a fixed-point state of the Bouncer because the collapse of inconsistent
slots to "uninitialized" is monotonic, and the number of slots is finite.
1.2.4 Representation Type Checking
There are two Bouncer phases. After the Bouncer has derived a single consistent
state history of the Joule abstract machine based on the control-flow checking,
the Bouncer then passes through the instructions of the module and applies
the test and mutate rules to each, based on the state arrived at in the
control-flow check for each instruction. If any instruction generates an
unacceptable state of the Bouncer by failing to conform to the rules, the
module is rejected and an exception is signaled.
1.3 Summary
The smallest unit of program code that can be loaded is called a module.
Each module consists of a set of assembly-level instructions, which may
be either bytecodes (as in Joule) or native instructions.
When a module is loaded, its instructions must be checked to make sure they
will not violate certain properties of the higher-level language, such as
memory boundaries. This paper presented a technique for implementing such
load-time checking, using the Bouncer component of the Joule abstract machine
as an example.
While Joule applies this bouncing technique to bytecodes, the mechanism
is equally applicable to native assembler instructions. The technique can
be extended to verify more sophisticated properties, such as statically
known reference counts, static array bounds checking, type structure for
objects in the heap, and real-time constraints.
1.4 Acknowledgments
Future drafts will include the many references appropriate to credit the
early development of these ideas. |
|