Object Transport

Last updated: [98/05/29 Bob]

Introduction

In order to build a distributed system and to provide persistent storage, Neocosm will need an underlying serialization methodology. It is expected that Java serialization will provide all the necessary infrastructure.

Requirements

Object Transport should be as insensitive as possible to version changes in both our code base and the JVM.
It should support proxy object semantics rather than making them difficult. This shouldn't be a problem, since the decision for an object to be a proxy should be a simple one.
If possible, it should be transparent to 'application programmers' -- some simple choice (e.g. inherit from Proxyable and *poof* you're proxied) should be available with appropriate generic behavior. Only a very few cases (two? -- sturdyrefs undergoing cryptohashing, and objects with private fields that never go over the wire?) should require special handling code.

Architecture

There are three kinds of serialization that we know of at this time:

To save an object's state (writing out the vat, saving instantiations in catalogs)
To send an object across the wire (arguments to a message; even instantiating an avatar amounts to sending arguments to a create message)
To write the immutable parts of an object out (to a byte array or a stream) to generate a cryptohash (sturdyrefs)

There are also three occasions where una get serialized (hand waving based on the current DObject model and various hallway discussions about future una directions):

In response to a request for the unum to replicate itself
In response to a request for the unum to persist itself
As a side-effect of being included in an E.Send that ended up going through the comm system

(XXX UnumTransport.gif not found)

In each case, the desired result is known, regardless of the stream. The only case where there are platform requirements on the contents of the result of the serialization is the third case, where we expect a "proxy" to be generated. The only requirement on the serialization results of the other requests is that an unum of the same class can use the stream to (re)instantiate itself.

Nota bene: upon further discussion, the third case may very well be 'never do this.' If an unum is always explicitly asked to serialize itself, then it is invalid for it to be dumped on a random stream at the low level. This means that vat saves need to explicitly ask each unum to persist itself, rather than 'merely' serialize a collection.

There have been problems in the past with 'hints vectors' and serialization, but with an appropriate rethinking of how hints are handled I believe we can sidestep all of the known issues (adding a 'getHintsVector' to parallel the 'getCerts' call in the repository/resource manager would be one way).

Current Architecture Overview

A strawman proposal for an E object comm system based as closely as possible on the current architecture, as explained in an e-mail from Bill Frantz:

The basic E message consists of: <target> <verb> <args[]> <opt-resolver>. This may also be described as <target> <envelope> or even <instance of ERun>. (See org.erights.e.elib.prim.E)

For the comm system we should plan on creating a subclass of java.io.ObjectOutputStream, call it CommObjectOutputStream. To send an E message to another machine, we then pass an <envelope> to the local proxy for the remote object that is to receive the message: proxy.send(envelope).

The proxy has, as part of its state, an ObjectConnection which leads to the remote object. The proxy performs:

myObjectConnection.send(myRemoteObjectID, envelope).

The ObjectConnection has as part of its state the VatTPConnection which leads to the remote object. (It also implements a message handler for incoming envelopes.) The ObjectConnection does:

myByteArrayOutputStream.reset();
CommObjectOutputStream coos
= new CommObjectOutputStream(myByteArrayOutputStream);
coos.write...(remoteObjectID);
coos.writeObject(envelope);
myVatTPConnection.sendMsg(myByteArrayOutputStream.toByteArray());

* We may be able to move the allocation of new CommObjectOutputStream out
of the per message loop.

On the receiving end, the message handler for incoming envelopes does:

ByteArrayInputStream bais = new ByteArrayInputStream(message);
bais.read(); // Skip message type code
CommObjectInputStream cois = new CommObjectInputStream(bais);
// ...hands waving wildly here: get the local object (targetObject) via the proxy mechanism.
Envelope e = (Envelope)cois.readObject();
ERun.deliver(targetObject, e);

I think that's "all" there is to it. (There is the issue of stopping serialization at the leaves, and special line behavior which will be handled by implementing writeObject and readObject methods which test for instance of CommObjectOutputStream in the affected classes.)
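A minimal sketch of the stream test described above, assuming a marker subclass named CommObjectOutputStream as in the e-mail. The Avatar class, its fields, and LeafDemo are hypothetical names introduced only for illustration. The sketch writes a presence flag so that the reading side does not need a matching test on the input stream; that is a choice made here, not something the e-mail specifies.

```java
import java.io.*;

// Hypothetical marker subclass; the real one would add comm-specific behavior.
class CommObjectOutputStream extends ObjectOutputStream {
    CommObjectOutputStream(OutputStream out) throws IOException { super(out); }
}

// Hypothetical class with state that should be saved locally but never sent.
class Avatar implements Serializable {
    private static final long serialVersionUID = 1L;
    String name;
    transient byte[] localCache;   // skipped by default serialization, written by hand below

    Avatar(String name, byte[] localCache) {
        this.name = name;
        this.localCache = localCache;
    }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();                          // writes 'name'
        boolean overWire = out instanceof CommObjectOutputStream;
        out.writeBoolean(!overWire);                       // flag: does the cache follow?
        if (!overWire) {
            out.writeObject(localCache);                   // only for non-comm streams
        }
    }

    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        if (in.readBoolean()) {
            localCache = (byte[]) in.readObject();
        }
    }
}

class LeafDemo {
    // Serializes and deserializes 'a' through either a comm stream or a plain one.
    static Avatar roundTrip(Avatar a, boolean overWire)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ObjectOutputStream out = overWire
                ? new CommObjectOutputStream(bytes)
                : new ObjectOutputStream(bytes);
        out.writeObject(a);
        out.flush();
        ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()));
        return (Avatar) in.readObject();
    }
}
```

Calling LeafDemo.roundTrip(avatar, true) should yield a copy whose localCache is null, while roundTrip(avatar, false) keeps it.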

The CommObjectBlahStream objects can easily implement our form of compression.
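The send and receive paths from the e-mail can be sketched end to end as follows. This is an illustration under several assumptions, not the actual comm system: the remote object ID is taken to be a long written with writeLong (the e-mail leaves the write call unspecified), the sender prepends a one-byte type code standing in for whatever VatTPConnection adds, and Envelope, Incoming, and ObjectConnectionSketch are hypothetical classes.

```java
import java.io.*;

class CommObjectOutputStream extends ObjectOutputStream {
    CommObjectOutputStream(OutputStream out) throws IOException { super(out); }
}

class CommObjectInputStream extends ObjectInputStream {
    CommObjectInputStream(InputStream in) throws IOException { super(in); }
}

// Hypothetical stand-in for the <verb> <args[]> part of an E message.
class Envelope implements Serializable {
    private static final long serialVersionUID = 1L;
    final String verb;
    final Object[] args;
    Envelope(String verb, Object[] args) { this.verb = verb; this.args = args; }
}

// Hypothetical holder for what the receiver decodes.
class Incoming {
    final long targetID;
    final Envelope envelope;
    Incoming(long targetID, Envelope envelope) {
        this.targetID = targetID;
        this.envelope = envelope;
    }
}

class ObjectConnectionSketch {
    static final int MSG_TYPE_ENVELOPE = 1;   // assumed type code value

    // Reused across messages, as in the e-mail.
    private final ByteArrayOutputStream myByteArrayOutputStream = new ByteArrayOutputStream();

    // Sender side: builds the bytes that would be handed to VatTPConnection.sendMsg.
    byte[] encode(long remoteObjectID, Envelope envelope) throws IOException {
        myByteArrayOutputStream.reset();
        myByteArrayOutputStream.write(MSG_TYPE_ENVELOPE);
        CommObjectOutputStream coos = new CommObjectOutputStream(myByteArrayOutputStream);
        coos.writeLong(remoteObjectID);        // assumption: IDs are longs
        coos.writeObject(envelope);
        coos.flush();                          // push buffered data into the byte array
        return myByteArrayOutputStream.toByteArray();
    }

    // Receiver side: mirrors the message handler for incoming envelopes.
    static Incoming decode(byte[] message) throws IOException, ClassNotFoundException {
        ByteArrayInputStream bais = new ByteArrayInputStream(message);
        bais.read();                           // skip message type code
        CommObjectInputStream cois = new CommObjectInputStream(bais);
        long targetID = cois.readLong();
        Envelope e = (Envelope) cois.readObject();
        return new Incoming(targetID, e);
    }
}
```

Because a new CommObjectOutputStream is created per message, each message carries its own stream header and class descriptors. Hoisting the stream out of the per-message path would save those bytes, but the two ends would then have to share handle state.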

Proposed Architecture Overview

The proposal is to use native Java serialization as of JDK 1.1 (i.e. we're not going to rely on any 1.2 features). For proxy objects, we will implement a default behavior that the application programmers can use easily (merely by inheriting from Proxyable, for example). The few specific objects that need special handling will be dealt with on a case by case basis. Other than proxies, the only definitively known special case is that of sturdyrefs -- we don't want to include the hints vector in the byte array that is used to generate the cryptohash.
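One possible shape for the default proxy behavior, sketched under assumptions: Proxyable is modeled here as an interface with a hypothetical exportID() method, and the output stream swaps any Proxyable for a small ProxyDescriptor using the replaceObject hook of java.io.ObjectOutputStream. ProxyDescriptor, ProxyingOutputStream, Door, and ProxyDemo are invented for this illustration. The hook appears to predate 1.2, which should be confirmed against the JDK 1.1 documentation.

```java
import java.io.*;

// Hypothetical marker: objects that cross the wire as a proxy, not by value.
interface Proxyable {
    long exportID();
}

// Hypothetical wire form of a proxied object.
class ProxyDescriptor implements Serializable {
    private static final long serialVersionUID = 1L;
    final long remoteObjectID;
    ProxyDescriptor(long remoteObjectID) { this.remoteObjectID = remoteObjectID; }
}

class ProxyingOutputStream extends ObjectOutputStream {
    ProxyingOutputStream(OutputStream out) throws IOException {
        super(out);
        enableReplaceObject(true);   // turn on the replaceObject hook
    }

    @Override
    protected Object replaceObject(Object obj) throws IOException {
        if (obj instanceof Proxyable) {
            // Serialization stops here: only the descriptor is written.
            return new ProxyDescriptor(((Proxyable) obj).exportID());
        }
        return obj;
    }
}

// Hypothetical application class; deliberately not Serializable.
class Door implements Proxyable {
    private final long id;
    Door(long id) { this.id = id; }
    public long exportID() { return id; }
}

class ProxyDemo {
    // Writes 'args' through the proxying stream and reads them back with a plain stream.
    static Object[] sendAndReceive(Object[] args)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ProxyingOutputStream out = new ProxyingOutputStream(bytes);
        out.writeObject(args);
        out.flush();
        ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()));
        return (Object[]) in.readObject();
    }
}
```

With this arrangement the application class only has to declare itself Proxyable. A receiving stream could use the matching resolveObject hook to turn each descriptor into a local proxy, which is omitted here.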

For the purposes of simplification, we could assume a superclass that contains only the immutable data. Before the cryptohash operation, an instance of the superclass is created (using the copy constructor?), serialized out, and hashed. This appears (to me, at least) to be less work than writing a custom writeObject that will only be used in a special case (constructing a CryptoHashStream, for example, and checking for that in the custom writeObject for SturdyRefs).
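The superclass idea can be illustrated with a short sketch. The class names SturdyRefCore and SturdyRef, the fields vatID and swissNumber, and the choice of SHA-256 are all assumptions made for the example; the document does not specify the contents of a sturdyref or the hash algorithm.

```java
import java.io.*;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical superclass holding only the immutable data.
class SturdyRefCore implements Serializable {
    private static final long serialVersionUID = 1L;
    final String vatID;        // hypothetical field
    final long swissNumber;    // hypothetical field

    SturdyRefCore(String vatID, long swissNumber) {
        this.vatID = vatID;
        this.swissNumber = swissNumber;
    }

    // Copy constructor: yields an object whose class is exactly SturdyRefCore.
    SturdyRefCore(SturdyRefCore other) {
        this(other.vatID, other.swissNumber);
    }
}

// The full sturdyref adds the hints, which must stay out of the hash.
class SturdyRef extends SturdyRefCore {
    private static final long serialVersionUID = 1L;
    final List<String> hints;

    SturdyRef(String vatID, long swissNumber, List<String> hints) {
        super(vatID, swissNumber);
        this.hints = new ArrayList<String>(hints);
    }

    // Serializes a copy of the immutable part only, then hashes those bytes.
    byte[] cryptoHash() throws IOException, NoSuchAlgorithmException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(bytes);
        out.writeObject(new SturdyRefCore(this));   // hints never reach the stream
        out.flush();
        return MessageDigest.getInstance("SHA-256").digest(bytes.toByteArray());
    }
}
```

Two SturdyRef instances that differ only in their hints should produce the same hash, since only the SturdyRefCore copy is ever written. The hash does depend on the serialized form of SturdyRefCore, so it is sensitive to changes in that class.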

Off the shelf alternatives

Standard Java serialization is available, and should be general enough to support our needs.

Other Design Objectives, Constraints and Assumptions

Lists any special objectives and assumptions of the code e.g. reusability, thread safety, security, performance, use of resources, compatibility with existing code etc. This section gives important context for reviewers

Current implementation

This section should give details of the major classes and interfaces.

Is it JavaDoc'ed?

In many cases, this section can link to JavaDoc output from actual Java classes and interfaces. This saves writing documentation twice (the designers will have to JavaDoc their interfaces anyway). The JavaDoc should be linked into the design document. Chip's JavaDoc style guidelines (XXX file not found) explain how to use JavaDoc effectively.

Examples

Are there examples?

Testing and Debugging

(Optional) Lists any tests and debugging utilities which are to be developed to help test the design (e.g. test classes, trace categories, etc)

Design Issues

Resolved Issues

History of issues raised and resolved during initial design, or during design inspections. Can also include alternative designs, with the reasons why they were rejected

Open Issues

Is Java serialization powerful enough for us to use? Do we currently implement some arcane feature that isn't supported in Java?
How much of the default behavior can we use? Breaking the serialization tree may require us to specify the member variables that get serialized, which will require considerable care with respect to planning for future upgrade strategies. Hopefully Java's upgrade story will work for us too.
Are there any more special cases in our current code that we're not aware of, that won't fit into a strategy that uses Java serialization?
It appears from reading the 1.2 source that simply subclassing ObjectBlahStream will allow us to use compressed representations of int, long, short, float, String etc. The only thing we have to be careful about with this technique is to use super.writeByte to write our data. The conversion from RtEncoderDataOutputStream will be easy.
The E runtime requires that strings that are used as method selectors be interned. This logic will have to live somewhere between reading the byte array and placing the envelope on the E run queue.
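The last two issues can be illustrated together in a rough sketch. It assumes a variable-length int encoding (7 bits per byte, which is not necessarily the RtEncoderDataOutputStream format) written through super.writeByte, and a hypothetical readSelector() helper that interns a selector string as it comes off the stream. The class names are invented. As far as can be told from current JDK sources, these overrides affect only explicit writeInt/readInt calls, such as those made inside custom writeObject methods; fields written by default serialization bypass them.

```java
import java.io.*;

class CompactObjectOutputStream extends ObjectOutputStream {
    CompactObjectOutputStream(OutputStream out) throws IOException { super(out); }

    // Writes 7 bits per byte, low bits first; a set high bit means more bytes follow.
    @Override
    public void writeInt(int value) throws IOException {
        int v = value;
        while ((v & ~0x7F) != 0) {
            super.writeByte((v & 0x7F) | 0x80);
            v >>>= 7;
        }
        super.writeByte(v);
    }
}

class CompactObjectInputStream extends ObjectInputStream {
    CompactObjectInputStream(InputStream in) throws IOException { super(in); }

    // Inverse of CompactObjectOutputStream.writeInt.
    @Override
    public int readInt() throws IOException {
        int result = 0;
        for (int shift = 0; shift < 35; shift += 7) {
            int b = super.readByte();
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
        throw new StreamCorruptedException("compressed int is too long");
    }

    // Hypothetical helper: reads a method selector and interns it for the E runtime.
    String readSelector() throws IOException {
        return readUTF().intern();
    }
}

class CompactDemo {
    // Writes a selector followed by some ints using the compact encoding.
    static byte[] encode(String selector, int... values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        CompactObjectOutputStream out = new CompactObjectOutputStream(bytes);
        out.writeUTF(selector);
        for (int v : values) {
            out.writeInt(v);
        }
        out.flush();
        return bytes.toByteArray();
    }
}
```

Small non-negative values take one byte instead of four, while negative values take five, so a real encoding would probably treat the sign separately. Both ends must use the matching stream classes, because library classes that call writeInt in their own writeObject methods are affected as well.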