ERights Home data / serial / jhu-paper 
Back to: Manipulating Identity at the Entries On to: Related Work

Persistence and Upgrade

Lessons of Persistence and Upgrade

(*** To be written)

In Praise of Manual Persistence

Computers crash too often. We wish our applications, data, and activities to last much longer. To achieve this, traditionally, programmers had to encode their application's representations twice -- once as a live runtime data structure and once as schema: as a file or database format to be saved to disk. This was the pain of manual persistence. Transparent orthogonal persistence was first conceived as a way to avoid this error-prone redundancy, essentially by masking the crash [KeyKOS, EROS, PJama]. A process running in such a system proceeds essentially as if no crash had happened. Such a process has an easy immortality, and the original persistence problem is solved.

The upgrade problem occurs when we wish our application data to survive a different kind of trauma -- the upgrade of the code the application instantiates. When using manual persistence, the upgrade problem is properly the schema evolution problem -- to design, along with a new release of an application, means for converting its persistent schema from an old representation to the new one. For example, later releases of an editor typically know how to read documents written by earlier releases.
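
As a minimal sketch of the editor example, assume a hypothetical editor whose release 1 saved a title and a body, and whose release 2 adds an author. The class name `EditorDocument`, the two layouts, and the `"unknown"` default are illustrative assumptions, not anything taken from E or from a real editor. The saved format carries an explicit version number so that the newer release can read both layouts.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical document type as of release 2 of an editor.
// Release 1 saved a title and a body; release 2 adds an author.
final class EditorDocument {
    static final int CURRENT_VERSION = 2;

    final String title;
    final String body;
    final String author; // new in schema version 2

    EditorDocument(String title, String body, String author) {
        this.title = title;
        this.body = body;
        this.author = author;
    }

    // Always writes the newest schema, tagged with its version number.
    byte[] save() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(CURRENT_VERSION);
            out.writeUTF(title);
            out.writeUTF(body);
            out.writeUTF(author);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads every schema version this release knows about.
    // Java evaluates constructor arguments left to right, so the fields
    // are read in the order they were written.
    static EditorDocument load(byte[] saved) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(saved));
            int version = in.readInt();
            switch (version) {
                case 1:
                    // Old documents have no author: pick the closest meaning.
                    return new EditorDocument(in.readUTF(), in.readUTF(), "unknown");
                case 2:
                    return new EditorDocument(in.readUTF(), in.readUTF(), in.readUTF());
                default:
                    throw new IOException("unsupported schema version " + version);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Stands in for what release 1 would have written (version-1 layout).
    static byte[] saveAsVersion1(String title, String body) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(1);
            out.writeUTF(title);
            out.writeUTF(body);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The conversion logic lives entirely in `load`, and it has one case per schema version rather than one case per possible runtime state.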

However, if one uses transparent orthogonal persistence instead of manual persistence, then the entire runtime representation becomes the schema that needs to evolve. This amplifies the difficulty of the upgrade problem, often to a fatal extent. Manual persistence provides a source of great leverage for upgrade, and one that's easy to miss: By encoding their representation twice, programmers naturally bring to each encoding those concerns specific to the purpose of that encoding, often without thinking about this dichotomy explicitly.

Do you, Programmer,
take this Object to be part of the persistent state of your application,
to have and to hold,
through maintenance and iterations,
for past and future versions,
as long as the application shall live?

--Arturo Bejar [ref StateBundles]

Normally, we view the design of the runtime representation of our application as the "real" one, and wish the persistent state to be derived from it. But as this quote from Arturo suggests, the kind of commitment one needs to invest in persistent state isn't appropriate for runtime data structures, and shouldn't be. Runtime data structures are often delicate complex machines in motion, with many complex distributed consistency assumptions between the parts, designed to interact efficiently with an ongoing world of users or devices, and encoding meaning in ways that are largely undocumented. The complexity of this runtime world traditionally relies on the program itself staying constant while the process is executing.

By contrast, when programmers design schema for manual persistence, Arturo's question is properly uppermost in their minds. These schema are stable representations, designed with little redundancy and few opportunities for distributed inconsistency. They carry little penalty for inefficient representation, encode only the essential application state that needs to survive across time, and are much more likely to be well documented. Runtime representations emphasize the operational, whereas schema emphasize the declarative.

Once the customers of an application accumulate their own privately held persistent state from this application, such as their own private documents, then Arturo's question becomes unavoidable. To release a new version of an application without losing old customers, one must enable those customers to revive their old state into an instantiation of the new version of the application reliably -- with no per-instance programmer intervention.

Smalltalk, with its easy support for live upgrade, is not a counter-example. This support cannot be made reliable, and is instead designed for programmers-as-customers who know how to recover from inconsistencies.

If the programmers were using only transparent orthogonal persistence to give the application's data long life, then this upgrade problem resembles maintenance on an operational (though suspended) machine whose workings may be largely mysterious. Worse, since upgrades must happen in an automated way on customer data without programmers present, it more closely resembles building an upgrade-robot that will reliably perform this maintenance on any possible state such a machine may be in. With machines of great complexity, the feasible changes will usually only be minor tweaks and adjustments, not major design changes. The difficulty of upgrade will place a severe limit on the speed with which a vendor will be able to improve their program. This kind of persistence indeed provides a process with easy immortality, but only as a living fossil.

If, on the other hand, the programmers were using manual persistence (whether through foresight, habit, or lack of an alternative), then, when they wish to release a new version, the total number of semantically significant cases in the schema should usually be small enough that they can each be thought about carefully, in order to see how to convert its meaning into the closest appropriate meaning in the application's new version. The upgrade-robot arrives with parameterized blueprints (the new version of the program) for building a new running machine (instantiating a new running process). The schema provides the arguments needed to complete the blueprint. The old machine is scrapped and a new machine is freshly built around these arguments.
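
The blueprint-and-arguments picture can be sketched as follows, assuming a hypothetical address book whose schema is nothing more than a list of `name=phone` lines. The class `AddressBook` and its methods are invented for illustration. The lookup index is runtime machinery that is rebuilt on every revival and never saved.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical address book. Its schema is only a list of "name=phone" lines.
final class AddressBook {
    private final List<String[]> entries = new ArrayList<>();
    // Derived runtime structure: rebuilt on revival, never persisted.
    private final Map<String, String> phoneByName = new HashMap<>();

    // The "blueprint": build a fresh running instance from schema arguments.
    static AddressBook revive(List<String> schemaLines) {
        AddressBook book = new AddressBook();
        for (String line : schemaLines) {
            int eq = line.indexOf('=');
            if (eq < 0) {
                throw new IllegalArgumentException("malformed entry: " + line);
            }
            book.add(line.substring(0, eq), line.substring(eq + 1));
        }
        return book;
    }

    void add(String name, String phone) {
        entries.add(new String[] {name, phone});
        phoneByName.put(name, phone);
    }

    String lookup(String name) {
        return phoneByName.get(name);
    }

    // Extracts only the essential state; the index is left behind.
    List<String> toSchema() {
        List<String> lines = new ArrayList<>();
        for (String[] entry : entries) {
            lines.add(entry[0] + "=" + entry[1]);
        }
        return lines;
    }
}
```

A later release could replace the `HashMap` index with any other structure without touching saved data, because the old instance is discarded and only `toSchema()` output crosses the version boundary.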

As another analogy, if the runtime representation is the application instance's phenotype, then the schema is the instance's genotype. Biological evolution works partially because it operates only on the genotype, where a genotype unfolds into the vastly more complex phenotype via the indirect operational process of embryology. Like an ephemeral live process instantiating an application (i.e., a vat incarnation), each phenotype operates only from a fixed snapshot of its genotype. Evolution only happens in the transition between generations. While we needn't take these analogies too seriously, they can significantly aid our intuitions.

Having made the greater initial investment in engineering two representations, the programmers using manual persistence will then be able to improve their application much faster without losing their customers, perhaps overtaking the head start of the harder-to-evolve but faster time-to-market singly represented alternative.

The first step in dealing with the schema evolution problem is to mostly avoid the problem by saving vastly smaller schema.

Persistence and E

*** Basic E orientation, including Vats, distributed objects, live refs and SturdyRefs, and object-capability security.

*** Assumption of per-vat persistence by E computational model.

Manual Revival as Zero-Delta Upgrade

Note that none of the above discussion assumes that transparent orthogonal persistence and largely manual persistence are exclusive options. A system may well use both: transparent orthogonal persistence to mask crashes efficiently [KeyKOS, EROS], and largely manual persistence only when upgrading. A future E-on-EROS may very well operate in this mode. In this scenario, performance need not be a goal of the largely manual system.

In the absence of support for high-speed transparent orthogonal persistence, a system may very well use a largely manual persistence mechanism for both purposes. Each post-crash revival is then a degenerate zero-delta upgrade: every revival runs through the upgrade-supporting logic, even when no upgrade is actually occurring. The current E, running on Java running on stock OSes, operates in this mode. Performance therefore should be a goal of E's persistence mechanisms, but is not at this time.
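
One way to picture a zero-delta upgrade is a revival routine that always walks a chain of migration steps, where the chain happens to be empty when the saved version is already current. This is a sketch under assumptions: the `Reviver` class, the use of a string map as schema, and the numbering of versions from 1 are hypothetical, and this is not how E's persistence is implemented.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical reviver: every revival goes through the same upgrade path.
final class Reviver {
    // migrations.get(i) converts schema version i+1 into version i+2.
    private final List<UnaryOperator<Map<String, String>>> migrations = new ArrayList<>();

    void addMigration(UnaryOperator<Map<String, String>> step) {
        migrations.add(step);
    }

    int currentVersion() {
        return migrations.size() + 1;
    }

    // Applies zero or more steps. A post-crash revival with no upgrade
    // is simply the case where the loop body never runs.
    Map<String, String> revive(int savedVersion, Map<String, String> saved) {
        if (savedVersion < 1 || savedVersion > currentVersion()) {
            throw new IllegalArgumentException("unknown schema version " + savedVersion);
        }
        Map<String, String> state = new HashMap<>(saved);
        for (int v = savedVersion; v < currentVersion(); v++) {
            state = migrations.get(v - 1).apply(state);
        }
        return state;
    }
}
```

Because crash recovery and upgrade share one code path, the upgrade logic is exercised on every revival rather than only on release day.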

Mechanism / Policy Separation

Of course, by definition, anything a program does is automated, so what do we even mean by "manual" persistence? We are not arguing against automation, abstraction, and reuse. Rather, the issue is whether to build a primitively provided, inescapable, comprehensive solution vs. a toolkit of reusable tools from which one can roll one's own solution, or several co-existing ones. When one can design a single solution adequate for the needed range of uses, often one should, as the uniformity of a single comprehensive solution can bring great benefits. When one size doesn't fit all, we should instead turn to the tradition of mechanism / policy separation. A toolkit can serve as the mechanisms out of which one may build a variety of persistence systems embodying a (limited) range of policy choices.

What we mean by "manual" persistence is that the E kernel does not itself provide a primitive persistence system, but rather provides primitive tools out of which persistence systems may be fashioned. The E system as a whole does provide a default persistence system built "manually" from these tools, but this has the status of library code rather than fundamental primitives. Multiple such libraries can coexist, and the default one is in no sense special.
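
The library-rather-than-primitive arrangement might look like the following sketch. All names here (`SnapshotStore`, `SavePolicy`, and the two policy classes) are hypothetical and do not correspond to E's actual interfaces. The store is a mechanism that knows nothing about when to save, and each policy is ordinary library code built on top of it.

```java
import java.util.HashMap;
import java.util.Map;

// Mechanism: a place to put named snapshots. It decides nothing about
// what to save or when.
interface SnapshotStore {
    void put(String key, String snapshot);
    String get(String key); // null if absent
}

final class InMemoryStore implements SnapshotStore {
    private final Map<String, String> saved = new HashMap<>();

    public void put(String key, String snapshot) {
        saved.put(key, snapshot);
    }

    public String get(String key) {
        return saved.get(key);
    }
}

// Policy: decides when a change becomes persistent.
interface SavePolicy {
    void changed(String key, String snapshot, SnapshotStore store);
    void checkpoint(SnapshotStore store);
}

// One policy: write through on every change.
final class SaveEveryChange implements SavePolicy {
    public void changed(String key, String snapshot, SnapshotStore store) {
        store.put(key, snapshot);
    }

    public void checkpoint(SnapshotStore store) {
        // Nothing is ever pending under this policy.
    }
}

// Another policy: buffer changes until an explicit checkpoint.
final class SaveOnCheckpoint implements SavePolicy {
    private final Map<String, String> pending = new HashMap<>();

    public void changed(String key, String snapshot, SnapshotStore store) {
        pending.put(key, snapshot);
    }

    public void checkpoint(SnapshotStore store) {
        for (Map.Entry<String, String> entry : pending.entrySet()) {
            store.put(entry.getKey(), entry.getValue());
        }
        pending.clear();
    }
}
```

Both policies can be used in the same program against different stores, and neither has any privileged status over the other.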

*** incoherent notes here to the end of this file. Do Not Read ****

The tool most central to such a toolkit is a serialization / unserialization system. E currently uses Java's serialization streams, which have a rather flexible and mature set of customization hooks for building streams embodying a wide range of serialization policies. Most of the goals of our toolkit are already achieved by Java's serialization design, so this paper proceeds from there.
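
Two of the customization hooks in question are `ObjectOutputStream.replaceObject` and `ObjectInputStream.resolveObject`, which let a stream subclass substitute objects as they are written and read. The sketch below uses them to save a token in place of a live resource and to reconnect on revival. The classes `LiveClock`, `ClockToken`, `Alarm`, and the two stream subclasses are hypothetical illustrations, not E's actual stream configuration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical live resource: deliberately not Serializable.
final class LiveClock {
    long now() { return System.currentTimeMillis(); }
}

// Serializable stand-in written to the stream in place of a LiveClock.
final class ClockToken implements Serializable {
    private static final long serialVersionUID = 1L;
}

// Application object that holds a reference to the live resource.
final class Alarm implements Serializable {
    private static final long serialVersionUID = 1L;
    final String label;
    final LiveClock clock;

    Alarm(String label, LiveClock clock) {
        this.label = label;
        this.clock = clock;
    }
}

// Stream policy for saving: never save the clock, save a token instead.
final class TokenizingOutput extends ObjectOutputStream {
    TokenizingOutput(OutputStream out) throws IOException {
        super(out);
        enableReplaceObject(true);
    }

    @Override
    protected Object replaceObject(Object obj) {
        return (obj instanceof LiveClock) ? new ClockToken() : obj;
    }
}

// Stream policy for revival: reconnect each token to the clock supplied
// by whoever is performing the revival.
final class ReconnectingInput extends ObjectInputStream {
    private final LiveClock clock;

    ReconnectingInput(InputStream in, LiveClock clock) throws IOException {
        super(in);
        this.clock = clock;
        enableResolveObject(true);
    }

    @Override
    protected Object resolveObject(Object obj) {
        return (obj instanceof ClockToken) ? clock : obj;
    }
}

final class StreamPolicyDemo {
    static byte[] save(Object root) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new TokenizingOutput(bytes)) {
                out.writeObject(root);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Object revive(byte[] saved, LiveClock clock) {
        try (ObjectInputStream in =
                 new ReconnectingInput(new ByteArrayInputStream(saved), clock)) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The policy lives in the stream subclasses rather than in `Alarm`, so the same application class could be saved under a different policy by a different pair of streams.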

Here, we wish to support a range of compromises between the explicitness and separation of concerns of manual persistence vs. the economy of expression provided by automating aspects of persistence. Why a range of compromises? Why not try to find one good compromise and just build that? Because there are too many kinds of persistence policies that plausibly need to be supported.

  • What to save/restore vs. what to reconstruct vs. what to reconnect. (More on this below.)
  • Where to save persistent state? Files? Databases?
  • When to revive saved state? On process (vat) revival, or faulting on-demand?
  • Fail-stop vs. best efforts. When problems are hit, either saving or restoring, should one give up or make do?
  • What is a consistent state, and how does one obtain access to such a state? Does such a state include messages in flight?
  • Transactions: When is a saved state a basis for commitment? How does one abort and fall back to a previous state? As separate subsystems asynchronously snapshot, how is consistency recovered when they revive from different times?
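
To make one of these choices concrete, the fail-stop vs. best-efforts question can be sketched as a parameter to revival. The enum `FailurePolicy`, the class `EntryReviver`, and the `key=value` entry format are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical policy switch for what to do when a saved entry is unusable.
enum FailurePolicy { FAIL_STOP, BEST_EFFORTS }

final class EntryReviver {
    // Parses saved "key=value" entries. Under FAIL_STOP the first bad entry
    // aborts the whole revival; under BEST_EFFORTS bad entries are recorded
    // in 'skipped' and the rest are revived.
    static List<String[]> revive(List<String> saved,
                                 FailurePolicy policy,
                                 List<String> skipped) {
        List<String[]> revived = new ArrayList<>();
        for (String line : saved) {
            int eq = line.indexOf('=');
            if (eq < 0) {
                if (policy == FailurePolicy.FAIL_STOP) {
                    throw new IllegalStateException("cannot revive entry: " + line);
                }
                skipped.add(line);
                continue;
            }
            revived.add(new String[] {line.substring(0, eq), line.substring(eq + 1)});
        }
        return revived;
    }
}
```

Neither answer is right for every application, which is why the choice is left as a parameter here rather than fixed.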

Faced with such a variety, we use the traditional answer: mechanism / policy separation. We have built persistence support into E in two layers:

  • A set of building blocks from which an application developer can (within limits) build a persistence system embodying those policies that serve their needs, including a fully manual system if desired. These mechanisms must not allow an unprivileged persistence subsystem to violate any of E's security properties, while still allowing operation that's reasonable for a subsystem holding a given set of authorities.
  • One example persistence system, built only from these building blocks, embodying a set of policy choices that won't be suitable for all applications, but is nevertheless designed to be widely reused.

    7.6. Persistence and Mutual Suspicion


Unless stated otherwise, all text on this page which is either unattributed or by Mark S. Miller is hereby placed in the public domain.