ERights Home data / serial / jhu-paper 
Back to: Persistence and Upgrade On to: Conclusions

Related Work


*** incoherent notes here to the end of this file. Do Not Read ****

Mozart / OZ

[Duchier98]

All data structures reside in an abstract store, whose states take the form of a directed graph. The nodes of the graph represent entities like integers, records, variants (obtained by constructors), assignable references cells, procedures (i.e., closures), and operations (i.e., built-in procedures). Procedures are nodes whose departing links point to lexically bound nodes. For instance, the SML expression let val x = fn y => z(y) in ... end will bind x to a procedure node with one departing link that points to the node bound to z. Given an execution state, the data structure associated with a node x in the abstract store is the maximal subgraph of the abstract store that is reachable from x. We distinguish between two types of operations (i.e., built-in procedures). Global operations have the same semantics in every VM (e.g., addition of numbers or creation of threads). Local operations affect the resources of a particular VM and are hence tied to a particular VM. All operations that are available through the base environment at compile time are global. Access to local operations can only be obtained through system modules, which are tied to the VM in which they exist. A data structure in a VM can be pickled if and only if it does not contain futures or local operations. By excluding futures, we make sure that they are not cut off from the threads that are supposed to eliminate them. By excluding local operations, we ensure that loading a pickle will not create a data structure that captures local operations of the loading VM. This is an essential security property of our model [...]. If the pickle operation encounters a future, it requests it and blocks. Once the future is eliminated it resumes. This way pickling is compatible with lazy linking. If the pickle operation encounters a local operation, it raises an error exception. Next we make precise what we mean by a clone of a data structure. Let x be the original data structure and y be a clone of x. Then the graphs reachable from x and y must be graph isomorphic with respect to their roots x and y. Moreover, the reference cells reachable from x must all be different from the reference cells reachable from y. Finally, if we redirect all external links into the graph rooted by x to the respective nodes of the graph rooted by y, no difference must be observable when the VM proceeds.

Cardelli, Vijay

Sun, NeXT, Kaleida Labs, Waterken

Rees' uneval

XMLEncoder

javadoc jdj article sun article

java.beans.Expression

JOSS

 

In JOSS, implementing Serializable declares that one's state should be divulged to all serializers, independent of whether one is transparent in general. We can say that such an object is transparent to all serializers. To be transparent to all serializers without being transparent in general makes sense only if all serializers are only wielded by objects that are universally more trusted than are normal clients. While this may make sense for supporting applets, since we are trying to support mutual suspicion, our idealized model instead only distinguishes transparent and opaque.

 

  • A non-scalar object is a combination of state and behavior. In a conventional object language like Smalltalk or Java, the behavior corresponds to a class, and the state of an object consists of a map from the name of each instance variable to the corresponding value of that variable. (The obvious optimization -- based on knowing that all instances of the same behavior will have the same instance variable layout -- is left as an exercise for the reader.)

    In E, the behavior corresponds more closely to a lambda expression, but for present purposes these can be mapped to Java classes with no loss, so we proceed assuming this mapping. The state of an E object is based on its free variables -- variable names used within the behavior but defined in the scope enclosing the behavior. This is the lambda-equivalent of instance variables. This state is a mapping from the name or mangled name of each of free variable to the corresponding state of that variable. When the variable is defined to be final (using "def"), then the association is from the name to the value. When the variable is potentially mutable (defined using "var"), then, since it may be shared, the association is from a mangle of the variable name to the Slot object containing the current value of the variable.

     

    To accommadate schema evolution, about which more below, if the state maps from names unused by the behavior, those associations are ignored. If the state is missing some needed associations, by default a behavior will complain by throwing an exception (which will terminate unserialization). But a clever behavior may have means to cope with this situation.

  • uneval(behavior) => <import:fully-qualified-name>

    Following JOSS, in order to accommodate schema evolution, covered below, we don't actually serialize the behavior itself. Rather, we serialize a expression containing an identifier that corresponds to the behavior. We ignore here JOSS's serialVersionUID, and assume the fully qualified name is used instead. Therefore, the expression we serialize is simply one that imports this behavior from the import__uriGetter. Crucially, only safe classes may be imported by the import__uriGetter, so the result of evaling / unserializing such an expression will be born with any authority. (This restriction is essential, but is not yet implemented.) If the evaling / unserializing environment cannot find a behavior of the specified name, unserialization fails.

    (In Java, the identity of a class corresponds to a pair of a fully qualified name and a ClassLoader. However, JOSS serializes the class's identifier but not its ClassLoader. This may be fine when the identifier is a serialVersionUID, but when it is a fully qualified name, our idealization of JOSS is failing to represent different behaviors by different expressions. We could repair this by explicitly representing the Loader as well. In fact, the import__uriGetter is a Loader, very much like a ClassLoader. Once we've presented the Gordian Scalpel, we will have enough machinery to consider serializing Loaders as well.)

    E's syntax discourages but does not prohibit the creation of anonymous behavior definitions -- the moral equivalent of Java's anonymous inner classes, Scheme's anonymous closures, or Smalltalk's BlockClosures. No system we are aware of supports the schema evolution of any such anonymous objects. (Unfortunately, JOSS acts incoherently in this case, rather than rejecting it.) We believe that such support is absent because it is a bad idea: since any name automatically assigned doesn't have any programmer-aware stability across program versions. In order to support schema evolution, anonymous behaviors should not be considered to have a fully qualified name valid for use in serialization. Any behavior defined within an anonymous behavior also has no valid fully qualified name, since an invalid name is on its name-path. Attempting to serialize a behavior without a valid fully qualified name must cause serialization to fail.

Putting together some of these elements into a silly example, if p is a point created by the following code in module foo.bar:

var count :int := 0
def makePoint(x :int, y :int) :any {
    count += 1
    def point implements Transparent {
        # ... required self revelation ...
        to getX()     :int { x }
        to getY()     :int { y }

        /** How many points have been created? */
        to getCount() :int { count }
    }
}
def p := makePoint(3, 4)

then uneval(p) would yield the string:

<import:foo.bar$makePoint__C$point__C>.instatiate([
    "x"           => 3,
    "y"           => 4,
    "count__Slot" => settable.makeSlot(1)
])

Given the decision to save data but not code, it seems a bit ironic that the eval technique saves the data by turning it into code.

(This is based on the mapping: E module => top level Java class. E behavior (ie, object definition expression) => Java nested class. Since nothing in E maps onto static methods of these nested classes, we are free to specify that each of these nested classes has the static instantiate method assumed above.)

 

Replacement

JOSS serialization has many customization hooks, but that provided by the replaceObject method seems to be universal. All the other serialization-customizations that concern us can be expressed, perhaps at a loss in efficiency, by overriding only this one method.

In the example shown earlier, we specified the null customization by providing the identity function as the replace function. What if we provide a different function?

? pragma.syntax("0.8")

? def makeSerializer := <elib:serial.makeSerializer>
# value: <makeSerializer>

We define the replace function to be one that replaces an integer with the next higher integer, but acts otherwise as an identity function.

? def sillyReplacer(obj) :any {
>     if (obj =~ i :int) {
>         i + 1
>     } else {
>         obj
>     }
> }
# value: <sillyReplacer>

? def sillySerializer := makeSerializer(sillyReplacer)
# value: <a Serializer>

? def desc := sillySerializer.record([1, 2, 'c', "foo"]); desc.size()
# value: 360

As we see, the reconstruction is just like the original except that each integer, no matter where it may appear, has been replaced by its successor.

This is specified as a mostly trivial variation of our earlier model of serialization.

As a first approximation, whereas previously the serialization cases may have looked like:

def uneval(ref) :any {
    switch (ref) {
        # ... the above cases using ref ...
    }
}

we would instead have

def uneval(originalRef) :any {
    def ref := replace(originalRef)
    switch (ref) {
        # ... the above cases using ref ...
    }
}

We now need to adjust this to form broader equivalence classes for our notion of "having already serialized" a reference. In a call to uneval(A1), uneval will call its replace function which, let's say, returns ARep, which then gets serialized as expression AData. The serializer then remembers both A1 and ARep as having been seen, and as having produced expression AData. If either are seen again, then a variable is generated as before, but this variable is associated with both references. If A1 is novel but ARep has already been seen, then A1 is added to the equivalence pool for the ARep, and a variable reference is generated.

(*** In the case of a previously seen variable, is replace called? Should it be? Replace is called when the original is already transparent. Should it be?)

A serializer can thereby be cutomized to serialize graphs containing opaque objects, so long as the replacement for these objects is transparent. We will make much use of this ability. Traversal proceeds using the fields of the transparent replacement.

By the above steps, the serializer traverses a virtual graph of transparent replacements. This graph is never represented as an actual graph of objects, and often it cannot be. If replacement ARep's instance variable myB points at original B1, and if the replacement for B1 is BRep, then in the virtual replacement graph ARep's myB points at BRep, even though myB may be of a type incompatible with BRep. it is fine for a painter to depict a scene that's not literally possible, so long as its meant for viewers who know how to interpret this depiction -- so long as corresponding unserializers are customized to reconstruct a possible graph.

 

Cycles vs. Resolution

Unfortunately, this simple story faces a terribly hard problem. When the depicted XRep's myY points at the depicted YRep, then we must initialize the evolvedX's myY to point at the appointed resolution of Y, YRes. So long as there are no cycles, JOSS does exactly this, as it should, by instantiating and resolving objects in child-to-parent order. But when there are cycles, JOSS fails in unacceptable ways:

Note - The readResolve method is not invoked on the object until the object is fully constructed, so any references to this object in its object graph will not be updated to the new object nominated by readResolve. [...] Therefore in cases where an object being serialized nominates a replacement object whose object graph has a reference to the original object, deserialization will result in an incorrect graph of objects. [ref JOSS3.6]

When building a system for the coexistence of mutually suspicious objects, any spec that says "under these conditions an incorrect result will be returned" must be read as an opportunity for an adversary. Were they to purposely induce an incorrect result, how might they disrupt the assumptions of others in ways they might exploit? These issues are usually hard enough to think about that it is better to fix the spec to remove the opportunity.

Unfortunately, the above problem seems unavoidable in Java, since a Java reference cannot exist prior to the existence of the object it designates. If evolvedX is to be initialized with YRes, and evolvedY is to be initialized with XRes, and if a fully initialized evolvedX is going to participate in the calculation of XRes, and if a fully initialized evolvedY is going to participate in the calculation of YRes, there would seem to be no order that would satisfy all these constraints.

Our idealization mostly solves the problem by using E's promises to cheat on the notion of "fully initialized". E's promises -- like logic variables in logic programming languages, or futures [ref Actors], or Channels [ref Joule] -- are references which exist, can be passed around, and can be held before it is determined which object they designate. Until their target object is determined, we say that the promise is unresolved. Once the target is determined, the promise is fulfilled and is then equivalent to any normal reference to that target.

When the unserializer, walking from child-to-parent, encounters a depicted reference to an ancestor of the current object, it creates a promise to represent that reference, and uses that promise to initialize that field of the evolved object. Later, when the appointed resolution of the ancestor is produced, the promise is resolved to this appointed resolution. Unfortunately, if the field being initialized is declared to require a non-promise type, which can happen in either Java or E, then unserialization will still fail inappropriately with a thrown exception. But this algorithm will never silently return an incorrect graph, which is the important property for security. The remaining problem simply gives an adversary an opportunity to mount a denial of service attack, which E generally does not claim to defend against.

 
Unless stated otherwise, all text on this page which is either unattributed or by Mark S. Miller is hereby placed in the public domain.
ERights Home data / serial / jhu-paper 
Back to: Persistence and Upgrade On to: Conclusions
Download    FAQ    API    Mail Archive    Donate

report bug (including invalid html)

Golden Key Campaign Blue Ribbon Campaign