Unserialization as Evaluation(This is approximately an unabridged form of the Unserialization as Evaluation section presented previously.) As shown above, unserialization can be thought of, or even implemented as, expression evaluation [ref Rees, XMLEncoder]. A depiction is an expression in some programming language, the unserializer is the eval function, the exit references to be reconnected are free variable references, the values to reconnect them to come from the scope (i.e., environment) provided to eval, and the root of the reconstructed subgraph is the value the expression evaluates to. Serialization is the logically inverse process, in which an uneval function is applied to a root and an unscope, and writes out an expression that, were it evaluated in the corresponding scope, would reconstruct the subgraph. Data-E is the subset of E used for depicting a subgraph as an expression. Ignoring precedence, it consists of the following productions: Fundamental Data-E Constructs
E Syntactic Shorthands generated by deSrcKitSince we use deSrcKit to build the depictions we present for expository purposes, we need to know the shorthands it builds, which are a subset of the shorthands recognized and expanded by E. Going the other way, all E syntactic shorthands, including those below, are recognized by deSrcKit, since it uses the E parser to parse and expand its input. Any valid E expression that expands only into the above Data-E primitives is a valid Data-E expression with the same meaning. Likewise any valid Data-E expression is a valid E expression with the same meaning.
Using several cases together: ? pragma.syntax("0.8") ? def root := [1, root, 1, <import:java.lang.makeStringBuffer>] # value: [1, <***CYCLE***>, 1, <makeStringBuffer>] ? def surgeon := <elib:serial.makeSurgeon>.withSrcKit("de: ") ? def depiction := surgeon.serialize(root) # value: "de: def t__0 := [def t__2 := 1, # t__0, # t__2, # <import:java.lang.makeStringBuffer>]" ? surgeon.unserialize(depiction) # value: [1, <***CYCLE***>, 1, <makeStringBuffer>]
For those familiar with Java, Data-E should be mostly familiar, but with a few important differences:
Data-E is a true subset of E, both syntactically and semantically. Since E is a secure object-capability language, the security issues surrounding evaluation of E programs are already well understood. By using a subset of E in this way, we get to leverage this understanding. Recognizers and BuildersThe following table shows all the current representations of Data-E and the kits that support them.
We have already encountered the first and last row. Like the first row, the middle rows are all depictions -- they contain nothing but portable bits. All the depiction rows are just straightforward engineering, and operate in a canonical way with no policy choices. All expression of policy and mixing of intent occurs on the subgraph row, mostly during recognition.
For all the depiction kits other than deASTKit, what you get out (during recognition) is the same as what you put in (during building). The deASTKit has one special feature of benefit to the readability of the other depictions -- it compacts and simplifies its input a bit, removing unnecessary temporary variables and turning defrecs into defines when it can. ? def cycle := ['a', cycle, 'a', -3] # value: ['a', <***CYCLE***>, 'a', -3] ? def deSubgraphKit := <elib:serial.deSubgraphKit> ? def deSrcKit := <elib:serial.deSrcKit> ? deSubgraphKit.recognize(cycle, deSrcKit.makeBuilder()) # value: "def t__0 := [def t__2 := \'a\', t__0, t__2, def t__4 := -3]" In our first preview example, several temporary variables were defined but none were used. Here, all but t__4 are used. deASTKit removes the definition of t__4 and weakens the defrec of t__2 into a define. ? def deASTKit := <elib:serial.deASTKit> # value: <deASTKit> ? def ast := deSubgraphKit.recognize(cycle, > deASTKit.makeBuilder()) # value: term`defrec(0, # call(import("__makeList"), # "run", # [define(2, 'a'), # ibid(0), # ibid(2), # -3]))` The result is a kind of tree structure known as a term tree, but for present purposes may as well be S-Expressions, XML, or any other adequate system for representing trees of symbols. In this case, we can ignore it, as it is just an intermediate representation on the way to the simplification we're interested in: ? deASTKit.recognize(ast, deSrcKit.makeBuilder()) # value: "def t__0 := [def t__2 := \'a\', t__0, t__2, -3]" This shows the chaining of recognizers and builders, much as one chains Unix filters using pipes. Because deASTKit is particularly useful as a "filter", it provides a convenient wrap(..) method which takes a builder and returns a similar builder, but with deASTKit's simplifications interposed in front of the argument builder: ? deSubgraphKit.recognize(cycle, > deASTKit.wrap(deSrcKit.makeBuilder())) # value: "def t__0 := [def t__2 := \'a\', t__0, t__2, -3]" We make a reasonably compact binary serialization as follows: ? def deBytecodeKit := <elib:serial.deBytecodeKit> # value: <deBytecodeKit> ? (def builder := deASTKit.wrap(deBytecodeKit.makeBuilder()) > def code := deSubgraphKit.recognize(cycle, builder) > code.size()) # value: 34 The variable code now holds a list of 34 bytes. To see what it says, we can disassemble it: ? def deAssemblyKit := <elib:serial.deAssemblyKit> # value: <deAssemblyKit> ? deBytecodeKit.recognize(code, deAssemblyKit.makeBuilder()) # value: "OP_PROMISE # [t__0, t__1] # OP_IMPORT(\"__makeList\") # OP_LIT_CHAR(\'a\') # OP_DEFINE # t__2 # OP_IBID(0) # OP_IBID(2) # OP_LIT_NEGINT(3) # OP_CALL(\"run\", 4) # OP_DEFREC(1) # OP_ROOT # "
or decompile it: ? deBytecodeKit.recognize(code, deSrcKit.makeBuilder()) # value: "def t__0 := [def t__2 := \'a\', t__0, t__2, -3]" We can unserialize by interpreting these instructions: ? deBytecodeKit.recognize(code, deSubgraphKit.makeBuilder()) # value: ['a', <***CYCLE***>, 'a', -3] The Event-Based DEBuilder APIWe see above that we effectively have many converters between different representations of our subgraphs: serialization, unserialization, assembly, disassembly, decompilation, etc. We define each converter by composing a recognizer with a builder for the same reason many compiler suites compose a compiler front-end (lexer, parser, etc) with a compiler back-end (code generator) -- to provide all needed converters with n+m code rather than n*m code. (Or, in our case, 2n rather than n**2.) We do this with the same trick used by many compilers: we define a reusable API which sits between multiple front ends and multiple back ends [ref gcc]. A front-end recognizes patterns in the input representation, and reports by calling this API. A backend implements this API. In response to being called, it builds an output representation. Our reusable API is what Data-E builders all implement, so it defines the interface type DEBuilder. (For present purposes, we ignore the differences between guards and types.) With the DEBuilder API in the middle, we can convert from any of these representations to any other by composing any recognizer with any builder. This provides some novel flexibility. For example, sometimes the need for serialization and for unserialization are not separated in space or time, as when we only want a deep-copy-with-differences operation. In this case we can cut out the middle-man, and hook a subgraph recognizer directly to a subgraph builder, without ever creating an intermediate depiction.DEBuilder is actually a parameterized type defined by the following E code: def DEBuilderOf(Node :Guard, Root :Guard) :Guard { interface DEBuilder { to getNodeType() :Guard to getRootType() :Guard to buildRoot(top :Node) :Root to buildLiteral(value :(int | float64 | char | String)) :Node to buildImport(varName :String) :Node to buildIbid(tempIndex :int) :Node to buildCall(rec :Node, verb :String, args :Node[]) :Node to buildDefine(rValue :Node) :Tuple[Node, int] to buildPromise() :int to buildDefrec(resolverIndex :int, rValue :Node) :Node } }
Node and Root are type parameters of the function DEBuilderOf. When this function is called with two actual type arguments, it returns a DEBuilder type defined in terms of those types arguments. Grammar of Valid Sequences of Data-E Building CallsDEBuilder must deal well both with sequence-based and with tree-based representations. Were we only building sequences, the Nodes returned by the build methods might not matter, but the sequence of calls would matter. In particular, to easily generate instructions for a stack machine, these methods must be called in postfix order (reverse polish). When generating a tree, the sequence might not matter, but the returned and argument Nodes would. To build a node of an AST, one would first build the children. The Nodes returned from those build calls would be the arguments to the call to build their parent. Therefore, since this interface can be used to generate either a tree or a sequence, its clients must obey both constraints, and the builder can make use of either regularity. The allowed sequences of calls are described by the following "grammar":
Where nodeN is the value returned by exprN, and resolverN is one more than the integer returned by promiseN. In other words, if promise0 is from a buildPromise() which returned a 37, then resolver0 would be 38. The 37'th temporary variable will hold a promise, and the 38th will hold the corresponding Resolver. It is the responsibility of the builder to maintain a counter of temporary variable indices, and have buildDefine(..) and buildPromise() allocate and return the next sequential number. It is the responsibility of the caller to feed previous results back in as arguments according to this numbering. Rather than burdening each individual recognizer and builder with the task of checking that its counterparty obeys this grammar, we expect instead to create a transparent validating builder wrapper that both does this checking and forwards these calls to its wrapped builder. We may then code the other recognizers and builders assuming the counterparty is correct. When this assumption in inappropriate, we can interpose the validator. This kind of event-based interface for generating or consuming trees is in a tradition most prominently represented by SAX -- the" Simple API for XML" [ref SAX events]. If DEBuilder is like SAX, then Data-E AST is like DOM and Data-E Source is like XML surface syntax. For tree structured data, or any syntax expressed in BNF, many of these issues are eternal and perpetually reinvented. However, SAX -- and most of the APIs that call themselves event-based -- provide only the sequence of calls (both entry and exit events, allowing both prefix and postfix processing), but not the passing of previous results back in as construction arguments. To build a DOM tree from SAX events, the builder must keep track of the stack itself. A closer precedent is provided by parser generators such as yacc [ref], in which the (building) actions happen in postfix order, and each action is provided with the "semantic value" (our "Node") returned by the actions associated with its child productions. A yacc generated parser calling semantic actions corresponds well to our notion of a recognizer calling a builder. |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Unless stated otherwise, all text on this page which is either unattributed or by Mark S. Miller is hereby placed in the public domain.
|