2010/04/30

Lexical issues

Loki requires a systematic means of mapping Beta identifiers to Java identifiers.  This must provide each Beta identifier with a unique Java mapping, while still providing a rich space of Java identifiers reserved to the Loki implementation.  Some reserved identifiers belong directly to the implementation; others are generated by the Loki compiler from Beta source.  These two groups must not overlap.

The general principle is that every Beta identifier is translated into an identical Java identifier as far as possible.  Thus the mapping for the identifier "Beta" is "Beta", and for the identifier "Java" is "Java".  However, to provide the space of reserved identifiers, the following systematic change is made:  Any Beta identifier ending with an underscore has an additional underscore appended.  Thus the mapping for "Beta_" is "Beta__".

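The underscore rule above can be sketched in Java as follows.  This is a minimal illustration only; the class name UnderscoreRule and the method name toJava are hypothetical and are not taken from the Loki sources.

```java
// Hypothetical sketch of the trailing-underscore rule; not actual Loki code.
final class UnderscoreRule {
    private UnderscoreRule() {}

    // Maps a Beta identifier to a Java identifier using only the
    // trailing-underscore rule: identifiers that end with an underscore
    // get one more appended; all others are passed through unchanged.
    static String toJava(String beta) {
        return beta.endsWith("_") ? beta + "_" : beta;
    }
}
```

Under this sketch, toJava("Beta") gives "Beta", toJava("Beta_") gives "Beta__", and toJava("Beta__") gives "Beta___", so no mapped name ever ends with exactly one underscore.
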
As a result, every mapped Beta identifier ends either with zero underscores or with two or more underscores.  Every Java identifier ending with a single underscore is reserved to the implementation.  Of these, those in which the final underscore is immediately preceded by a Latin capital letter are reserved for compiler-generated code.  Thus "Beta_" would be reserved for the implementation, and "BetaA_", "BetaB_", ... "BetaZ_" would be reserved for generated code.  Each such capital letter represents a distinct group of generated names:  thus, "TestM_" would represent the name of the method associated with the name "Test", if "M" were the suffix reserved for method names.  (It is.)
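
The partition of the Java name space described above might be checked with something like the following.  This is a sketch under assumptions: the class ReservedNames, the enum Kind and its constant names are invented here for illustration, and "Latin capital letter" is taken to mean the ASCII range A to Z.

```java
// Hypothetical sketch; names are illustrative and not from the Loki sources.
final class ReservedNames {
    private ReservedNames() {}

    enum Kind { BETA_SPACE, IMPLEMENTATION, GENERATED }

    // Classifies a Java identifier by its trailing underscores.
    static Kind classify(String javaName) {
        // Find where the run of trailing underscores begins.
        int stem = javaName.length();
        while (stem > 0 && javaName.charAt(stem - 1) == '_') {
            stem--;
        }
        int trailing = javaName.length() - stem;

        // Zero, or two or more, trailing underscores: space of mapped Beta names.
        if (trailing != 1) {
            return Kind.BETA_SPACE;
        }
        // Exactly one trailing underscore preceded by A..Z (assumed ASCII):
        // reserved for compiler-generated code.
        if (stem > 0) {
            char last = javaName.charAt(stem - 1);
            if (last >= 'A' && last <= 'Z') {
                return Kind.GENERATED;
            }
        }
        // Any other name with exactly one trailing underscore.
        return Kind.IMPLEMENTATION;
    }
}
```

With this sketch, classify("Beta__") falls in the Beta space, classify("Beta_") is reserved to the implementation, and classify("TestM_") is reserved for generated code.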

Beta identifiers are case-blind, whereas Java identifiers are case-sensitive.  It is necessary, therefore, to have a systematic case rule when generating Java identifiers.  Since Beta programmers may use casing to convey human-readable information, the rule is to preserve the casing that appeared when the Beta identifier was defined (that is, in any language construct where the identifier is followed by a colon).  For comparison purposes within Loki, however, the identifier is stored in a fully lowercase format; lowercase is chosen because Unicode provides many lowercase letters without an uppercase equivalent.
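
One way to picture the case rule is a table keyed by the lowercase form that remembers the spelling seen at the point of definition.  The sketch below is illustrative only: the class CaseBlindTable and its methods are hypothetical, Locale.ROOT is an assumed choice for locale-independent lowercasing, and keeping the first definition when a name is defined twice is an assumption, since the text does not say how redefinition is handled.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch; not actual Loki code.
final class CaseBlindTable {
    // Lowercase comparison key -> spelling used where the identifier was defined.
    private final Map<String, String> definedSpelling = new HashMap<String, String>();

    // Comparison key: the fully lowercased identifier (locale-independent).
    static String key(String ident) {
        return ident.toLowerCase(Locale.ROOT);
    }

    // Records the defining spelling; an earlier definition is kept (assumption).
    void define(String ident) {
        definedSpelling.putIfAbsent(key(ident), ident);
    }

    // Returns the spelling to emit in Java for any casing of the identifier,
    // or null if the identifier has not been defined.
    String javaSpelling(String ident) {
        return definedSpelling.get(key(ident));
    }
}
```

After define("MyCount"), both javaSpelling("MYCOUNT") and javaSpelling("mycount") return "MyCount".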

The following (case-blind) Beta identifiers, plus any identifier ending in "exception" or "error", need to be given an idiosyncratic mapping, since they have uses within Java that cannot or should not be overridden.  The listed identifiers are the defined keywords and literals of Java, the classes of package java.lang, and the methods of class java.lang.Object.  The mapping is to append "\u03b2_" to them, so that "class" maps to "class\u03b2_".  (This \u03b2 is GREEK SMALL LETTER BETA.)

abstract         false            notifyall        synchronized
boolean          final            null             system
break            finalize         number           thread
byte             finally          object           threaddeath
case             float            package          threadgroup
catch            getclass         private          throw
char             goto             process          throwable
character        hashcode         protected        throws
class            implements       public           tostring
classloader      import           return           transient
clone            instanceof       runnable         true
cloneable        int              runtime          try
compiler         integer          securitymanager  void
const            interface        short            volatile
continue         long             static           wait
default          math             string           while
double           native           stringbuffer
equals           new              super
extends          notify           switch 
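
Putting the special mapping together with the underscore rule, a complete mapper might look roughly like this.  It is a sketch, not the Loki implementation: the class SpecialMapping and the method toJava are hypothetical names, and it assumes that the casing of the defining spelling is preserved in front of the appended suffix.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch; not actual Loki code.
final class SpecialMapping {
    private SpecialMapping() {}

    // The identifiers from the table above, in lowercase.
    private static final Set<String> SPECIAL = new HashSet<String>(Arrays.asList((
          "abstract boolean break byte case catch char character class "
        + "classloader clone cloneable compiler const continue default double "
        + "equals extends false final finalize finally float getclass goto "
        + "hashcode implements import instanceof int integer interface long "
        + "math native new notify notifyall null number object package private "
        + "process protected public return runnable runtime securitymanager "
        + "short static string stringbuffer super switch synchronized system "
        + "thread threaddeath threadgroup throw throwable throws tostring "
        + "transient true try void volatile wait while").split(" ")));

    // Maps a Beta identifier (in its defining spelling) to a Java identifier.
    static String toJava(String beta) {
        // Comparison is case-blind, so test the lowercase form.
        String key = beta.toLowerCase(Locale.ROOT);
        if (SPECIAL.contains(key)
                || key.endsWith("exception")
                || key.endsWith("error")) {
            // Append GREEK SMALL LETTER BETA and an underscore.
            return beta + "\u03b2_";
        }
        // Otherwise apply the ordinary trailing-underscore rule.
        return beta.endsWith("_") ? beta + "_" : beta;
    }
}
```

Under these assumptions, toJava("class") gives "class\u03b2_", toJava("MyError") gives "MyError\u03b2_", and toJava("Beta_") still gives "Beta__".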
