ERights Home elang 
Back to: Block Structure 1st child: Literal Expressions On to: Semantics of

Kernel-E Reference


(The old PDF document, The Kernel-E Language Reference Manual, is now out of date, but still contains information we have not yet migrated to these pages. It is deprecated but useful.)

The E language is specified in layers. At the bottom is the Kernel-E language. The Kernel-E language is a subset of the regular E language -- every program written in Kernel-E is also a valid E program with the same meaning. The remainder of E's grammar outside the kernel subset is E's sugar (see The E Language Grammer). The semantics of the sugar is defined by canonical expansion to Kernel-E. This expansion happens during parsing -- E parse trees only contain nodes defined by Kernel-E -- so only these are executed by the virtual machine.

To give a semantics of Kernel-E it suffices to write an executable specification of the virtual machine as an interpreter of such parse trees. Following a venerable tail-biting tradition, this chapter presents such an interpreter written in the full E language. (An interpreter written in the same language it interprets is called a Meta-Interpreter.)

Unfortunately, this does cause some circular-definition ambiguity. In Brian Smith's terminology[?], the interpreter absorbs some issues by mapping them onto the same issues in the language in which the interpreter is written. When these are the same languages, this leaves some issues unresolved. For the moment, we resolve these ambiguities only informally in the text. (The bootstrap E interpreter is essentially a transliteration into Java of the interpreter presented here. Since Java's definition does not depend on E, and the source to this bootstrap interpreter is available, this should adequately resolve these circular ambiguities for present engineering purposes.)

Since this chapter is not concerned about surface syntax, the BNF statements of the kernel productions leave out non-structural grammatical detail, such as issues of precedence and associativity. See The E Language Grammar for these.

E Kernel Language Quick Reference Card

In the pseudo-BNF used here, Terminals (tokens emitted by the lexer) are either quoted bold-faced strings, or are names that begin with an upper-case letter. Non-terminals (forms defined by this grammar) are names that begin with a lower-case letter. Square brackets around a form means the appearance of the form is optional. An asterisk suffix means zero or more repetitions of that form, and a plus sign suffix means one or more repetitions.

Since we define our semantics in terms of parse trees (technically, abstract syntax trees), we need to be concrete about their form. As explained here, E is moving towards using Minimal-XML DOM trees as its normal parse tree representation, including for E itself. The Kernel-E DTD is the current draft XML DTD for representing Kernel-E programs. Each of the following pages displays the part of this DTD that pertains to it.

The above paragraph is obsolete, as E is no longer planning to use XML or Minimal-XML for this purpose. As explained here, E is planning to use Term trees instead.

Productions beginning with an upper-case letter correspond to terminals in the grammer, and to Minimal-XML elements containing character data. Productions beginning with a lower-case letter correspond to non-terminals, and to Minimal-XML elements containing other elements.

Kernel-E-Program:
seqExpr
eExpr: any of the following
literalExpr:
Integer | Float64 | Char | String
Noun:
Identifier
slotExpr:
"&" Identifier
assignExpr:
Noun ":=" eExpr
seqExpr:
eExpr ("\n", eExpr)*
matchBindExpr:
eExpr "=~" pattern
defineExpr:
"def" pattern ":=" eExpr
hideExpr:
"{" eExpr "}"
ifExpr:
"if" "(" eExpr ")" "{"
    eExpr
"}" "else" "{"
    eExpr
"}"
escapeExpr:
"escape" pattern "{"
    eExpr
"}" "catch" pattern "{"
    eExpr
"}"
catchExpr:
"try" "{"
    eExpr
"}" "catch" pattern "{"
    eExpr
"}"
finallyExpr:
"try" "{"
    eExpr
"}" "finally" "{"
    eExpr
"}"
callExpr:
eExpr "." Verb "(" eExpr ("," eExpr)* ")"
sendExpr:
eExpr "<-" Verb "(" eExpr ("," eExpr)* ")"
objectExpr:
"/**" docComment
  "*/" "def" (String | "_") auditors behavior

# objectExpr is therefore methodicalExpr | plumbingExpr

  methodicalExpr:
"/**" docComment
  "*/" "def" (String | "_") auditors eScript
  plumbingExpr:
"/**" docComment
  "*/" "def" (String | "_") auditors matcher
pattern: any of the following
simplePattern:
finalPattern | varPattern | ignorePattern
finalPattern:
Noun ":" eExpr
varPattern:
"var" Noun ":" eExpr
ignorePattern:
"_"
suchThatPattern:
pattern "?" eExpr
listPattern:
"[" [pattern ("," pattern)*] "]"
cdrPattern:
listPattern "+" pattern
Helper Productions
auditors:
["implements" eExpr ("," eExpr)*]
eMethod:
"/**" docComment
  "*/" "method" Verb "(" pattern ("," pattern)* ")" ":" eExpr "{"
    eExpr
"}"
matcher:
"match" pattern "{" eExpr "}"
eScript:
"{" eMethod* matcher? "}"
behavior:
eScript | matcher
docComment:
(Text | DocTag)*
Terminals
(in addition to the literals and quoted terminals above)
Verb:
Identifier
Identifier
  (IdentStart)(IdentPart)*
| "_"(IdentPart)+

Since "_" by itself is a keyword

Text
Arbitrary text within a docComment other than DocTags.
DocTag
"@" Identifier Text

Meta-Interpreter Setup

Specification by Meta-Interpreter Enhancement

Defining a computational model that deals with security, upgrade, and debugging in one step is too hard. Rather, we take it in stages. There is a danger of losing security when introducing support for either upgrade or debugging, so we first present -- as the contents of this chapter -- a meta-interpreter for a secure but non-upgradable, non-debuggable E.

In Brian Smith's terminology, this meta-interpreter reifies eval but absorbs apply and capability security. Borrowing some terminology from Jonathan Rees, eval is the means by which an object changes it state and decides what actions it would like to take on the world outside of itself -- it is what happens inside an object. apply is the means by which an object takes such external actions -- it is what happens between objects. By absorbing apply, interpreted subworlds can work transparently with non-interpreted contexts. In E, the apply functionality is provided by the callExpr and sendExpr constructs.

(This is the also basis for the transparent inter-operation of Java and E objects: ELib implements E's inter-object semantics, and it is used both as the E language's runtime library, and directly as a library called by Java programs. Neither caller nor callee knows whether the other is implemented in E or Java.)

This enables us to define upgrade and debugging support as enhanced meta-interpreters, such that we can design and understand the resulting security properties. A real implementation can then provide the behavioral equivalent of allowing (not requiring) code to be run under such an enhanced interpreter. This clearly continues to be a faithful and secure implementation of the semantics specified by the unenhanced interpreter, since one could have run the enhanced interpreter on top of it.

The resulting enhanced interpreter gives us clear answers to the "who is allowed to do-or-see what?" issues needed for debugging securely. Basically, the instantiator of an interpreted subworld holds the only debugging capabilities to that world. For anyone else to have debugging access, they must get it from the interpreter-starter. However, the interpreter-starter can only obtain a debugger's-eye-view on an object if they also have the object. Not surprisingly, this follows KeyKOS discipline rather well. Ironically, these same debugging hooks also enhance security, by providing us a discretion check enabling confinement. (** explain & refer somewhere **)

Name Spaces

  1. Keywords are reserved for use by the grammar, and eventually, perhaps by user-defined macros (syntactic extensions). Each keyword is its own Terminal type, and these are not Identifiers. Some keywords are reserved for future use.

  2. Nouns are E's term for variable names, since they are a way to refer to things. Every use-occurence of a variable name is a nounExpr, and corresponds to a defining-occurence -- a finalPattern or varPattern -- of the same name. The rules for matching nounExpr to a finalPattern or varPattern are purely static.

    In the kernel language, each use-occurence statically corresponds to exactly one defining-occurence by a simple rule: the scope of a finalPattern or varPattern extends textually left-to-right from the point of its definition until the corresponding close curly, except where it is shadowed by an inner definition of the same name. However, the definition of corresponding is a bit tricky. (E's syntactic sugar usually follows the same rule, with the exception of the for-loop, conditional-or expression, and quasi-literal pattern.)

  3. Verbs are E's terms for method or selector names, since they are a way to request actions. The defining-occurence appears in a method definition, and the use-occurence occurs in a call or send expression. The correspondence between the two is many-to-many and not statically determinable.

  4. Behavior-names. For all the elegance of anonymous closures, to my knowledge no existing systems upgrade old instances of anonymous lambda expression to behave according to a newer versions of the lambda expression. How would it figure out which new lambda corresponds to which old lambda? By contrast, behavior-names in E's object and plumbing expressions enable such upgrades. Behavior names are composed from finalPatterns and varPatterns according to the rules in Upgrading Behaviors (XXX need to write).

    The following meta-circular interpreter ignores upgrade and debugging-support issues. Without these issues,

            def foo { ... }

    is equivalent to

            def foo := def _ { ... }

    since the remaining significance of foo is only as a finalPattern (or varPattern). Therefore, the meta-interpreter ignores behavior names.

Meta-Interpreter Skeleton: Evaluation

As with the Lambda Calculus and related languages (most notably Scheme), the heart of computation within an object is the evaluation of an expression in a lexical scope. E's expressions are exactly those kernel parse node types defined by the eExpr production above. In the Lambda Calculus, a scope is an immutable mapping from names (in E, nouns) to values. In the non-debuggable, non-upgradable E presented here, a scope is also an immutable mapping from nouns to values, but these values are termed slots, since they are expected to hold a variable's value. Slots are similar to locations in Scheme's semantics, except that the E programmer can bind whatever object they want to be a noun's slot, and can obtain the slot of an in-scope noun. A primitive mutable slot type is available -- and used by default in the expansion of the sugar language -- enabling classic side effects by assignment.

So altogether, we have four distinct indirections to get from a variable name to the object it designates:

1) Static Correspondence

nounExpr -> finalPattern or varPattern

The statically analyzable correspondence between use-occurences of a variable names (noun expressions) and a defining occurence of the same name (a finalPattern or varPattern). In the kernel language, this strictly follows the left-to-right definer to corresponding close curly rule. The interpreter presented below makes less use of this static analyzability than it probably should, instead managing scopes carefully so that the run-time name/scope uniqueness exactly matches the static correspondence rule.

2) Lexical Loopup

scope[identifier] -> slot

A scope is an immutable mapping from names to slots. Each time an expression is evaluated, a distinct scope is provided, so it will often look up a distinct slot. For example, the equivalent of object's instance variable storage is a scope allocated per-instance that maps instance variable names to slots holding the values. (Note that E has no distinct notion of instance variables. The effect described is an outcome of the semantics presented here.)

3) Mutable State

slot.get() -> reference # reading a variable's value

slot.put(expr) # assignment

When a variable name is used in a noun expression (ther than on the left-side of an assignment), this turns into a slot lookup as described above, followed by asking that slot for its current value. Assignment is turned into a slot lookup, followed by asking the slot to change its value. A slot-expression skips step three, and just returns the slot itself as the value of the expression. Variables can vary precisely because slots can change their value over time.

4) Designation object

reference -> object

The value returned by step 3 shouldn't be thought of as the object itself, but as a reference (or pointer, or capability) to the object. When drawing a heap of objects, we typically depict the objects themselves as circles or blobs, and objects references as arrows. If object Alice points at object Bob, then Alice holds the tail of an arrow whose head is attached to Bob. Since E is a distibuted language, we give these arrows more semantics than usual: in particular, Alice and Bob can be on different machines, in which case the reference will span machines. Such EVENTUAL references have a restricted semantics that reflect the inescapable difficulties of distributed computing -- like partial failure. This is documented in the E Reference Mechanics.


In Lambda Calculus, expression evaluation takes in an expression and a scope, and produces a value. In E, expression evaluation similarly takes in an expression and a scope, but produces an outcome. The three forms of outcome are

  1. Success, in which case eval returns a pair of a resulting value and a resulting scope. The resulting scope is a superset of the input scope, but possibly containing further bindings. (** need to specify shadowing rules **)
  2. Failure, in which case eval throws an object as the exception indicating what problem occured. See catchExpr and finallyExpr.
  3. Escape, as documented in escapeExpr. Both failure and escape are forms of non-local exit.

The following meta-interpreter absorbs the call-return stack discipline, so that returning successfully in the interpreted language is represented by eval returning successfully, and non-local exit in the interpreted language is represented by eval performing the same non-local exit.

The natural object-oriented way to define the eval function would be to distribute it into a set of eval methods defined on each expr-parse-node type. (Indeed, the bootstrap interpreter written in Java does exactly this.) However, this has the wrong extensibility property: we wish to hold the definition of the kernel language fixed over long periods of time, while we also expect to define enhanced semantics for executing E -- especially for debugging and upgrade -- by enhancing this interpreter. Therefore we need to enable multiple interpreters to co-exist in one system, and to allow the others to be defined by incremental modifications to this one. To do this, we write the interpreter as a simple function containing a big switch. Each branch of the switch matches one of the expression types above, and is shown in the corresponding section:

def eval(expr, scope) {
    switch (expr) {
        ...
        match e'...' {
            ...
        }
        ...
    }
}

Meta-Interpreter Skeleton: Pattern Matching

In the Lambda Calculus, defining occurences of nouns only appear in parameter lists. Most programming languages also have a variable declaration/definition construct that can appear in a block of code. Where classically one has defining occurences of variable names, E instead has patterns. Where one classically binds a name to an initial value -- by argument matching or initialization -- E instead pattern matches the pattern against the initial value. When the pattern is the expansion of a simple Identifier, the effect is identical to the classical case. In other words, the classic parameter variable declarations and local variable definitions are degenerate (but common!) forms of E's pattern matching mechanism.

As with eval we define E's pattern matching routine, testMatch, with a global function that dispatches (using E's switch) on the type of the pattern. The Java implementation of the bootstrap interpreter instead distributes the clauses into the Pattern parse-node types.

def testMatch(patt, scope, specimen) {
    switch (patt) {
        ...
        match e'...' {
            ...
        }
        ...
    }
}

The testMatch function takes in a Pattern parse-tree, a scope as the context in which to perform the match, and a run-time value to match it against. The outcomes:

  1. If the match is successful, testMatch returns a one-element array containing a scope derived from the imput scope, but also containing any bindings resulting from the match.
  2. If the match is unsuccessful, testMatch returns null.
  3. Various components of the match may perform a non-local exit (failure or escape), in which case testMatch is so exited.

As we will see below, by returning a one-element array for the success case, rather than just returning the result directly, we enable the use of matchBind expressions in the interpreter to both test for success and bind the result in a compact fashion.

Miscellaneous Interpreter Functions

The mustMatch function is like testMatch, except that an unsuccessful match results in failure rather than returning null, and therefore a successful match can just return the scope directly. If mustMatch returns, it will always return a scope.

def mustMatch(patt, scope, specimen) {
    def [result] := testMatch(patt, scope, specimen)
    result
}

Defining Behaviors: Method & Matcher

Classically, an object is an encapsulated package of state and behavior, and E's semantics reflects this faithfully.

Objects are defined by the object expressions, of which there are two kinds: methodical expressions -- for defining methodical objects, and plumbing expressions, for defining objects that act as message plumbing. An object expression evaluates in a scope to an object that behaves as described by the object expression. For example, the following program in the full E language

def makePoint(x, y) :any {
    def self {
        to getX() :any { return x }
        to getY() :any { return y }
    }
    return self
}

Sample command-line interaction in the context of the above definition:

    ? def pt := makePoint(3, 5)
    ? pt.getX()
    # value: 3

expands into the following Kernel E program:

def Point {
    to run(x y) :any {
        def self {
            to getX() :any { return x }
            to getY() :any { return y }
        }
        return self
    }
}
    ? def pt := makePoint.run(3, 5)
    ? pt.getX()
    # value: 3

In the original program, it seems we've defined Point to be a function. In the expansion, we see that Point is actually an object with a single run method that takes two arguments. After all, kernel-E is a pure object language: all values are objects, and all inter-object interaction (with the exception of equality) is purely by message sending. However, E's sugar streamlines the use of objects to express conventional functional/procedural patterns. run is E's default verb. If left out, it will be supplied in the expansion to the kernel language. An apparent function definition actually defines an object with a run method.

The behavior of Point's run method is to define and return an object (named self within the scope of the run method, and therefore named self to itself). By behavior, we mean this is what Point does in response to a run message of two arguments. This returned object's behavior is similarly described by two methods: getX and getY. (Another tasty bit of sugar is that zero-argument argument lists may be left out of both definition and call. However, you may not leave out both the verb and the argument list.)

This returned object acts like a conventional point object, but where does the state of the object come from? Somehow, x and y are acting like its classical instance variables, but there doesn't seem to be anything special about their declaration. Indeed there isn't. The scope of x and y extends from their definition until the close curly at the end of the run method. The self object expression within this scope can therefore refer to these slots by using their names.

So the state-nouns of an object expression are those nouns used within the object expression that statically correspond (see above) to noun definitions outside the object expression. An object expression evaluates in a scope to an object. The object's state is the subset of this scope containing the object expression's state-nouns. When an object receives a message, it executes a corresponding eMethod or matcher in a scope which is a child of this state.

 
Unless stated otherwise, all text on this page which is either unattributed or by Mark S. Miller is hereby placed in the public domain.
ERights Home elang 
Back to: Block Structure 1st child: Literal Expressions On to: Semantics of
Download    FAQ    API    Mail Archive    Donate

report bug (including invalid html)

Golden Key Campaign Blue Ribbon Campaign