2010/04/30

Simple Attributes

Each Beta pattern declares zero or more attributes, which correspond to class members in other OO programming languages, but are more general. This paper describes the Java implementations of enclosed patterns and static and dynamic references. Variable-pattern declarations, repetitions, and virtual patterns are discussed elsewhere.

A pattern declared within another pattern is represented by a Java inner class. Thus the Beta syntax:

Foo: (#
  Bar: (# ... #)
#)

is represented as:

public class Foo {
  public class Bar {
    ...
  }
}
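Below is a minimal compilable sketch of what the inner-class representation provides. Every Bar object is created relative to one particular Foo object and can see that object's attributes, much as a nested Beta pattern can refer to the attributes of its enclosing object. The attribute `name` and the class `NestingDemo` are hypothetical. The superclass declaration discussed next is left out for brevity.

```java
// Sketch: a nested Beta pattern represented as a Java inner class.
// The attribute "name" and class NestingDemo are hypothetical illustrations.
class Foo {
    String name = "unnamed";

    class Bar {
        // Resolves to Foo.this.name, the Foo object this Bar was created within.
        String ownerName() {
            return name;
        }
    }
}

class NestingDemo {
    public static void main(String[] args) {
        Foo first = new Foo();
        first.name = "first";
        Foo second = new Foo();
        second.name = "second";

        // An inner-class instance must be created through an enclosing instance.
        Foo.Bar a = first.new Bar();
        Foo.Bar b = second.new Bar();
        System.out.println(a.ownerName() + " " + b.ownerName()); // prints "first second"
    }
}
```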

The Beta superpattern, if present, is declared as a Java superclass. Note, however, that Beta patterns without a specified superpattern do not extend java.lang.Object, but rather the Beta pattern called Object (which, considered as a Java class, extends java.lang.Object like all other Java classes). Java would otherwise default to java.lang.Object, so this superclass must be declared explicitly in the Java code.
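The following sketch shows both superclass cases. A single-file example cannot easily supply a class named Object in package beta, so it uses a hypothetical stand-in named `BetaObject`. The pattern names `Vehicle` and `Car` are also illustrative.

```java
// Hypothetical stand-in for the runtime class beta.Object; a real Loki runtime
// would supply it as class Object in package beta.
class BetaObject {
}

// Beta: Vehicle: (# ... #)      no superpattern, so the Beta Object is extended explicitly
class Vehicle extends BetaObject {
}

// Beta: Car: Vehicle (# ... #)  the superpattern becomes the Java superclass
class Car extends Vehicle {
}

class SuperpatternDemo {
    public static void main(String[] args) {
        System.out.println(Vehicle.class.getSuperclass().getSimpleName()); // BetaObject
        System.out.println(Car.class.getSuperclass().getSimpleName());     // Vehicle
        System.out.println(BetaObject.class.getSuperclass().getName());    // java.lang.Object
    }
}
```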

Static and dynamic references are implemented as instance variables. Unlike other implementations of Beta, Loki does not physically incorporate static references into the pattern to which they belong. (At least, not as far as Java source code is concerned. Java compilers are hypothetically free to do so if they can prove that no user-visible changes result.) The only difference between the two Java declarations is that static references are declared final, which tells the Java compiler that their values never change. A static reference is initialized by an appropriate expression (see below). Dynamic references have no initializers, so Java automatically sets them to null (the Java equivalent of "none").
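Here is a sketch of the two declarations side by side. The patterns `Point` and `Shape`, their attributes, and the `BetaObject` stand-in for beta.Object are hypothetical. The Beta declarations in the comments assume the usual `@` (static) and `^` (dynamic) reference notation.

```java
// Hypothetical stand-in for beta.Object.
class BetaObject {
}

class Point extends BetaObject {
    int x;
    int y;
}

class Shape extends BetaObject {
    // Beta: origin: @Point   static reference: final, created along with the Shape
    public final Point origin = new Point();

    // Beta: next: ^Shape     dynamic reference: no initializer, so Java sets it to null (NONE)
    public Shape next;
}

class ReferenceDemo {
    public static void main(String[] args) {
        Shape s = new Shape();
        System.out.println(s.origin != null); // true
        System.out.println(s.next == null);   // true
        s.next = new Shape();                 // dynamic references may be reassigned
        // s.origin = new Point();            // would not compile: origin is final
    }
}
```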

The exact form of the initializer of a static reference depends on the form of the ObjectDescriptor that declares it. If there is no superpattern, the initializer is new beta.Object() { ... }, which creates an instance of a Java anonymous class whose body (the ...) is the compiled content of the (# ... #). If the superpattern is a local (non-remote) reference, the initializer is new PatternName() { ... }. However, if the superpattern is remote, Java requires a more complex construction. The Beta form:

Foo: (#
  Bar: @A.B.C (# ... #)
#)

appears as:

public class Foo {
  public final A.B.C Bar = new A().new B().new C() { ... };
}

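The form "new A.B.C()" will not compile: B and C are inner classes, so each one needs an enclosing instance, which must be supplied through a qualified creation expression of the form outer.new Inner().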

If the ObjectDescriptor has no MainPart (no (# ... #) portion), the braces and the class body they enclose are simply omitted, leaving an ordinary instance creation expression such as new PatternName().
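The sketch below puts all four initializer forms into one compilable class. It assumes a hypothetical `BetaObject` stand-in for beta.Object and gives it an illustrative `describe()` hook so the anonymous class bodies have something to override. The names `A`, `B`, `C`, `Counter`, `p`, `q`, `r`, and `s` are likewise made up.

```java
// Hypothetical stand-in for beta.Object, with an illustrative describe() hook.
class BetaObject {
    String describe() {
        return "object";
    }
}

// A remote pattern chain A.B.C: each level is an inner class of the one before.
class A extends BetaObject {
    class B extends BetaObject {
        class C extends BetaObject {
            @Override
            String describe() {
                return "C";
            }
        }
    }
}

class Counter extends BetaObject {
    int count;
}

class Foo extends BetaObject {
    // Beta: p: @(# ... #)          no superpattern
    final BetaObject p = new BetaObject() {
        @Override
        String describe() {
            return "anonymous";
        }
    };

    // Beta: q: @Counter (# ... #)  local superpattern
    final Counter q = new Counter() {
        {
            count = 10; // illustrative body content in an instance initializer
        }
    };

    // Beta: r: @A.B.C (# ... #)    remote superpattern; new A.B.C() would not compile
    final A.B.C r = new A().new B().new C() {
        @Override
        String describe() {
            return "remote " + super.describe();
        }
    };

    // Beta: s: @Counter            no MainPart, so no braces at all
    final Counter s = new Counter();
}
```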

These instantiation rules apply not only to static attributes, but also to explicit instantiation with the & operator and to implicit instantiation (a so-called "inserted object").

Lexical Issues

Loki requires a systematic means of mapping Beta identifiers to Java identifiers.  This mapping must give each Beta identifier a unique Java equivalent, while still leaving a rich space of Java identifiers reserved to the Loki implementation.  Some reserved identifiers belong directly to the implementation; others are generated by the Loki compiler from Beta source.  These must not overlap.

The general principle is that every Beta identifier is translated into an identical Java identifier as far as possible.  Thus the mapping for the identifier "Beta" is "Beta", and for the identifier "Java" is "Java".  However, to provide the space of reserved identifiers, the following systematic change is made:  Any Beta identifier ending with an underscore has an additional underscore appended.  Thus the mapping for "Beta_" is "Beta__".
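A small sketch of the underscore rule follows. It includes the inverse mapping to show that the rule is one-to-one. The class and method names are hypothetical, and the special mapping for Java-reserved names described later is not handled here.

```java
// Sketch of the trailing-underscore rule; class and method names are hypothetical.
final class UnderscoreRule {
    private UnderscoreRule() {
    }

    // Beta identifier -> Java identifier: append one '_' if it already ends in '_'.
    static String toJava(String betaIdentifier) {
        return betaIdentifier.endsWith("_") ? betaIdentifier + "_" : betaIdentifier;
    }

    // The inverse, which exists because the mapping is one-to-one.
    // A single trailing '_' can never come from toJava, so such names are rejected.
    static String toBeta(String javaIdentifier) {
        if (javaIdentifier.endsWith("__")) {
            return javaIdentifier.substring(0, javaIdentifier.length() - 1);
        }
        if (javaIdentifier.endsWith("_")) {
            throw new IllegalArgumentException("reserved to the implementation: " + javaIdentifier);
        }
        return javaIdentifier;
    }

    public static void main(String[] args) {
        for (String id : new String[] {"Beta", "Beta_", "Beta__"}) {
            System.out.println(id + " -> " + toJava(id));
        }
        // Beta -> Beta, Beta_ -> Beta__, Beta__ -> Beta___
    }
}
```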

As a result, every mapped Beta identifier ends either with zero underscores or with two or more underscores.  Every Java identifier ending with exactly one underscore is reserved to the implementation.  Of these, the ones whose single trailing underscore immediately follows a Latin capital letter (A-Z) are reserved for compiler-generated code.  Thus "Beta_" would be reserved for the implementation, and "BetaA_", "BetaB_", ...  "BetaZ_" would be reserved for generated code.  Each such capital letter represents a distinct group of generated names:  thus, "TestM_" would represent the name of the method associated with the name "Test", if "M" were the suffix reserved for method names.  (It is.)
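The three-way split can be sketched as a classifier over Java identifiers. The enum, the class, and both methods are hypothetical names. Only the method-name group letter 'M' comes from the text above.

```java
// Hypothetical classification of Java identifiers under the reservation scheme.
enum NameKind {
    MAPPED_BETA,     // zero, or two or more, trailing underscores
    IMPLEMENTATION,  // exactly one trailing underscore, not preceded by A-Z
    GENERATED        // exactly one trailing underscore, preceded by A-Z
}

final class ReservedNames {
    private ReservedNames() {
    }

    static NameKind classify(String javaIdentifier) {
        if (!javaIdentifier.endsWith("_") || javaIdentifier.endsWith("__")) {
            return NameKind.MAPPED_BETA;
        }
        int n = javaIdentifier.length();
        char before = n >= 2 ? javaIdentifier.charAt(n - 2) : ' ';
        return (before >= 'A' && before <= 'Z') ? NameKind.GENERATED : NameKind.IMPLEMENTATION;
    }

    // Builds a generated name in the given group, e.g. group 'M' for method names.
    static String generated(String base, char group) {
        if (group < 'A' || group > 'Z') {
            throw new IllegalArgumentException("group must be A-Z: " + group);
        }
        return base + group + "_";
    }
}
```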

Beta identifiers are case-blind, whereas Java identifiers are case-sensitive.  It is necessary, therefore, to have a systematic case rule when generating Java identifiers.  Since Beta programmers may use casing to convey human-readable information, the rule is to preserve the casing that appeared when the Beta identifier was defined (any language construct where the identifier is followed by a colon).  For comparison purposes within Loki, however, the identifier is stored in a fully lowercase format, since Unicode provides many lowercase letters without an uppercase equivalent.
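One way to realize this rule is a table keyed by the lowercased identifier that remembers the spelling used at the definition. This is a sketch under simplifying assumptions: it ignores scoping, and the class and method names are hypothetical. Lowercasing with `Locale.ROOT` is a design choice of the sketch. It keeps the comparison key independent of the user's default locale, under which "I" could otherwise fold to a dotless i.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Sketch of case-blind lookup that keeps the defining spelling; scoping is ignored.
final class CaseBlindNames {
    private final Map<String, String> definingSpelling = new HashMap<>();

    // Comparison key: the identifier folded to lowercase, independent of the default locale.
    static String key(String identifier) {
        return identifier.toLowerCase(Locale.ROOT);
    }

    // Record a definition (in Beta, the identifier followed by a colon).
    void define(String spelling) {
        String previous = definingSpelling.putIfAbsent(key(spelling), spelling);
        if (previous != null) {
            throw new IllegalArgumentException("already defined as " + previous + ": " + spelling);
        }
    }

    // Any use, in any casing, is emitted with the spelling from its definition.
    String javaSpelling(String use) {
        String spelling = definingSpelling.get(key(use));
        if (spelling == null) {
            throw new IllegalArgumentException("undefined: " + use);
        }
        return spelling;
    }
}
```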

The following (case-blind) Beta identifiers, plus any identifier ending in "exception" or "error", need to be given an idiosyncratic mapping, since they have uses within Java that cannot or should not be overridden.  These are the defined keywords and literals of Java, the classes of package java.lang, and the methods of class java.lang.Object.  The mapping is to append "\u03b2_" to them.  (This \u03b2 is GREEK SMALL LETTER BETA.)

abstract         false            notifyall        synchronized
boolean          final            null             system
break            finalize         number           thread
byte             finally          object           threaddeath
case             float            package          threadgroup
catch            getclass         private          throw
char             goto             process          throwable
character        hashcode         protected        throws
class            implements       public           tostring
classloader      import           return           transient
clone            instanceof       runnable         true
cloneable        int              runtime          try
compiler         integer          securitymanager  void
const            interface        short            volatile
continue         long             static           wait
default          math             string           while
double           native           stringbuffer
equals           new              super
extends          notify           switch 
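A sketch of this special mapping appears below. The set holds exactly the names in the list above. The class name is hypothetical, and `Set.of` requires Java 9 or later. The original casing of the Beta identifier is kept, and the suffix is appended only after a case-blind match.

```java
import java.util.Locale;
import java.util.Set;

// Sketch of the special mapping for names Java already uses; the class name is hypothetical.
final class JavaClashes {
    private JavaClashes() {
    }

    // The case-blind list above, stored in lowercase.
    private static final Set<String> CLASHING = Set.of(
        "abstract", "boolean", "break", "byte", "case", "catch", "char", "character",
        "class", "classloader", "clone", "cloneable", "compiler", "const", "continue",
        "default", "double", "equals", "extends", "false", "final", "finalize", "finally",
        "float", "getclass", "goto", "hashcode", "implements", "import", "instanceof",
        "int", "integer", "interface", "long", "math", "native", "new", "notify",
        "notifyall", "null", "number", "object", "package", "private", "process",
        "protected", "public", "return", "runnable", "runtime", "securitymanager",
        "short", "static", "string", "stringbuffer", "super", "switch", "synchronized",
        "system", "thread", "threaddeath", "threadgroup", "throw", "throwable", "throws",
        "tostring", "transient", "true", "try", "void", "volatile", "wait", "while");

    // Append GREEK SMALL LETTER BETA and an underscore to clashing names; keep the casing.
    static String map(String betaIdentifier) {
        String lower = betaIdentifier.toLowerCase(Locale.ROOT);
        boolean clashes = CLASHING.contains(lower)
                || lower.endsWith("exception")
                || lower.endsWith("error");
        return clashes ? betaIdentifier + "\u03b2_" : betaIdentifier;
    }
}
```

The results end in a single underscore preceded by a Greek letter rather than A-Z. They therefore fall within the names reserved to the implementation, not those reserved for generated code.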

Character Sets

The character set used for Java programming is Unicode.  The Mjolner compiler allows only ASCII characters.  Both ASCII and Latin-1 (ISO 8859-1) are embedded in Unicode, so upward compatibility is maintained.  Unicode allows programmers who are not English speakers to write identifier names, comments, and text strings in their own languages.  Loki will extend this ability to Beta programmers as well.  A trivial change to the Mjolner compiler (not involving extending it to Unicode!) will permit easy interchange between Mjolner and Loki Beta programs.

The Java compiler accepts programs in one of two transformation formats: UTF-8 and Unicode escape mode.  Both of these have the useful property that ASCII characters are represented by themselves, so ASCII-only programs are immediately compatible.  UTF-8 is sufficiently documented elsewhere, and I will simply say that Loki will accept it.

Unicode escape mode is more interesting.  Each character outside the ASCII range is represented by an escape sequence "\uxxxx", where "xxxx" is four hexadecimal digits.  (Characters beyond U+FFFF are represented by two consecutive escape sequences, one for each half of a UTF-16 surrogate pair.)  These sequences are interpreted immediately when the source code is read, so they may be used anywhere: in identifiers, comments, or strings.  It is legal in Java to use values of "xxxx" that represent an ASCII character (0000-007f), but I propose to forbid this usage in Beta code.  To the Java compiler, "\u002c" is equivalent to a comma in every way: it can separate arguments in a method call or serve any other purpose a comma serves.  Such usage only confuses the reader.
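The proposed restriction can be sketched as an escape decoder. It follows Java's eligibility rule, under which a backslash starts an escape only when an even number of raw backslashes precedes it. Java also allows one or more u's after the backslash. The decoder then rejects any escape that denotes an ASCII character. The class name and error messages are hypothetical, and this is not the Mjolner or javac reader.

```java
// Sketch of a Loki-style Unicode-escape reader that forbids escapes for ASCII characters.
final class UnicodeEscapes {
    private UnicodeEscapes() {
    }

    static String decode(String source) {
        StringBuilder out = new StringBuilder(source.length());
        int backslashes = 0; // raw backslashes immediately preceding position i
        int i = 0;
        while (i < source.length()) {
            char c = source.charAt(i);
            boolean eligible = c == '\\' && backslashes % 2 == 0;
            if (eligible && i + 1 < source.length() && source.charAt(i + 1) == 'u') {
                int j = i + 1;
                while (j < source.length() && source.charAt(j) == 'u') {
                    j++; // Java permits more than one 'u'
                }
                if (j + 4 > source.length()) {
                    throw new IllegalArgumentException("truncated escape at index " + i);
                }
                int code = 0;
                for (int k = j; k < j + 4; k++) {
                    int digit = hexValue(source.charAt(k));
                    if (digit < 0) {
                        throw new IllegalArgumentException("bad hex digit at index " + k);
                    }
                    code = code * 16 + digit;
                }
                if (code < 0x80) {
                    throw new IllegalArgumentException(
                        String.format("escape for ASCII U+%04X at index %d is forbidden", code, i));
                }
                out.append((char) code); // each surrogate half arrives as its own escape
                backslashes = 0;
                i = j + 4;
            } else {
                out.append(c);
                backslashes = (c == '\\') ? backslashes + 1 : 0;
                i++;
            }
        }
        return out.toString();
    }

    // ASCII hex digits only, as Java requires.
    private static int hexValue(char ch) {
        if (ch >= '0' && ch <= '9') {
            return ch - '0';
        }
        if (ch >= 'a' && ch <= 'f') {
            return ch - 'a' + 10;
        }
        if (ch >= 'A' && ch <= 'F') {
            return ch - 'A' + 10;
        }
        return -1;
    }
}
```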

Introduction

The purpose of the Loki Project, which this blog documents, is to provide a compiler and support classes for translating Beta into the Java language.  Although Beta and Java are very different languages in syntax as well as semantics, Java is flexible enough that compiling Beta into it does not seem an utterly impossible proposition.

The Loki Papers, when complete, will serve as a specification of such a compiler.  Obviously, there's more than one way to do it:  these papers reflect only my personal views.  When I say "Loki does this" or "Loki will have that", that is only for conciseness.

I do not have the resources to actually write and deliver Loki to the astonished world.  I wish I did.  However, I am hoping that someone with the necessary time will be able to make a stab at a real implementation.  Such an effort should be open source, available under either the GNU General Public License or another standard open-source license.  Obviously, a closed-source implementation is also possible, and I cannot prevent it.

I will assume a fair knowledge of Java.