An object is classically considered to be a combination of state and behavior, and this perspective is directly reflected in the Fat Pointer implementation technique. A Fat Pointer designates an E object, and is composed of two pointer-sized machine words rather than one. The first word is always the "pointer-to-Script". The Script holds the various pointers-to-C-functions that constitute the E object's behavior. The second word is data to be interpreted according to the above Script. Fat Pointers can be seen as a unification of the "tagged pointer" technique common to dynamic symbolic language implementations, such as Lisps and Smalltalks, and the "vtable" (or "class pointer") technique common to object languages with run-time polymorphism, such as Smalltalks and C++. In addition, Fat Pointers more directly accomodate multi-faceted objects, an essential programming style for capability-based computation. For a simple language implementation on the scale of ENative or Python, it isn't clear whether Fat Pointers are a net performance win. We haven't measured, but our best guess is that they cause code to run somewhat faster, but data to take more memory. Both effects should be less than a factor of two.
Boxed and Unboxed DataWhen the data word holds the entirety of the E object's state information, we say the E object is unboxed. When the data word is interpreted as a pointer to the state information, we say the E object is boxed. Boxed state is typically heap-allocated.
Historical AnalogsIn distinguishing between the kinds of scalars, and to distinguish whether something is scalar, the Fat Pointer technique clearly corresponds to traditional (Lisps, Smalltalks) tagged pointers. Instead of trying to squeeze a few extra bits into a single machine word, and masking them off to get the data, we use a full word each for the tag and the data. This allows the tag (the Script pointer) to distinguish among vastly more cases, and allows access to both tag and data without masking. Of course, the price we pay is that our pointers are twice as big. As we will see, this less than doubles total memory use. Object languages with run-time polymorphism (Smalltalk, C++) are normally implemented by a heap-allocated record holding both the instance variable state and a pointer to a data structure (vtable or class pointer) representing the shared behavior of all instances of the same class. Since our Fat Pointer's tag-equivalent is now a full word, it can point at this behavior directly, freeing the object's state record from doing so itself. In this analogy, the Script pointer acts like the vtable pointer. Without static type analysis, a conventional vtable-based implementation would always invoke an object by a double indirection: one to get to the object record, and one to get to the vtable. To an object's clients, the vtable is farther away than the data, but the client only ever invokes the vtable, leaving it to the behavior to deal with the data. The Fat Pointer technique saves us both the storage of the vtable pointer in the object record, and the extra indirection. Better, it saves us an indirection in the fast-path, which is where it counts. On modern processors, it can be more informative to count expected cache-line faults than supposed memory accesses. If we can arrange for our Fat Pointers to be 64-bit aligned, then a single cache-line fault will bring both halves of a Fat Pointer into the data cache, whereas the vtable technique requires two faults for the same job. On the other hand, fewer Fat Pointers fit into a cache line, so the net effect is unclear. Facets and Fat PointersIn E, when multiple object expressions appear in the same scope, the resulting objects share (subsets of) the same state, but each implements a different behavior using that state. In the getter & setter example, the shared state is the variable named "value". ENative implements this situation by having a single heap-allocated state-array per instantiation of such a composite, rather than a separate heap-allocated record per object. The Fat Pointer designating the getter object pairs the Script representing the getter behavior with the array representing this instantiation of the composite's state. Similarly, the Fat Pointer designating setter object pairs the setter Script with this same state. A vtable-based implementation would be unable to achieve this economy without resorting to complex techniques, such as that used by C++ multiple inheritance. (Note: the Orbit compiler does this kind of aggregation, but it doesn't look ugly there.) Without resorting to these techniques, a vtable-based implementation would place the vtable into the object-record rather than the pointer. To associate the same state with multiple vtables would require allocating multiple objects. By pairing different behavior with a common state, these two objects also provide differing authority over common state. In capability programming style, one often needs to define and hand out differing authority to common state. Remember that old adage, "an object is a combination of state and behavior"? Conventional object-oriented multiple-instatiation shows the flexibility of combining multiple states with same behavior. Facets show the flexibility of combining multiple behaviors with the same state. Our getter/setter is an example of a multiply-instatiated multiple-facet composite, and so does both in combination. Fat Pointers, by directly reflecting the truth of this adage, supports this implied flexibility of object programming in a simple and direct manner. At level of ambition of implementations this simple, they also support this flexibility is an efficient manner.
|
||||||||||||
Unless stated otherwise, all text on this page which is either unattributed or by Mark S. Miller is hereby placed in the public domain.
|