The Java compiler accepts programs in one of two transformation formats: UTF-8 and Unicode escape mode. Both have the useful property that ASCII characters are represented by themselves, so ASCII-only programs are immediately compatible. UTF-8 is sufficiently documented elsewhere, and I will simply say that Loki will accept it.
Unicode escape mode is more interesting. Most characters outside the ASCII range are represented by an escape sequence "\uxxxx", where "xxxx" is four hexadecimal digits. (Some characters, namely those outside the Basic Multilingual Plane, are represented by two consecutive escape sequences forming a surrogate pair.) These sequences are interpreted immediately on reading in the source code, before it is split into tokens, and thus they may be used anywhere: in identifiers, comments, or strings. It is legal in Java to use values of "xxxx" that represent an ASCII character (0000-007f), but I propose to forbid this usage in Beta code. To the Java compiler, "\u002c" is equivalent to a comma in every way: it can be used to separate arguments in a method call or for any other purpose. This usage does nothing but confuse the reader.
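To make the point concrete, here is a small Java sketch. The first half shows the escaped comma doing the work of a real one. The second half is a rough guess at how the proposed ban might be checked. The class name UnicodeEscapeDemo and the method findAsciiEscapes are my own inventions for illustration, not part of any existing tool, and the checker assumes Java's rule that a backslash starts an escape only when it is preceded by an even number of backslashes.

```java
import java.util.ArrayList;
import java.util.List;

public class UnicodeEscapeDemo {
    static int add(int a, int b) {
        return a + b;
    }

    // The escape for U+002C is translated to a comma before tokenizing,
    // so the compiler sees an ordinary two-argument call here.
    static int sumWithEscapedComma() {
        return add(1 \u002c 2);
    }

    // Hypothetical checker (an assumption, not an existing API): returns the
    // offset of each backslash that begins an escape in the ASCII range
    // 0000-007f. A backslash counts only if it ends an odd-length run of
    // backslashes; one or more 'u' characters may precede the four digits.
    static List<Integer> findAsciiEscapes(String source) {
        List<Integer> offsets = new ArrayList<>();
        int i = 0;
        while (i < source.length()) {
            if (source.charAt(i) != '\\') {
                i++;
                continue;
            }
            int runStart = i;
            while (i < source.length() && source.charAt(i) == '\\') {
                i++;
            }
            if ((i - runStart) % 2 == 0) {
                continue; // even run: every backslash here is escaped
            }
            int digits = i;
            while (digits < source.length() && source.charAt(digits) == 'u') {
                digits++;
            }
            if (digits == i || digits + 4 > source.length()) {
                continue; // no 'u' marker, or too few characters left
            }
            int value = 0;
            boolean valid = true;
            for (int k = digits; k < digits + 4; k++) {
                int d = Character.digit(source.charAt(k), 16);
                if (d < 0) {
                    valid = false;
                    break;
                }
                value = value * 16 + d;
            }
            if (valid && value <= 0x7f) {
                offsets.add(i - 1); // offset of the backslash that starts the escape
            }
        }
        return offsets;
    }

    public static void main(String[] args) {
        System.out.println(sumWithEscapedComma());
        // The doubled backslash keeps the escape literal inside this string.
        System.out.println(findAsciiEscapes("f(1 \\u002c 2)"));
    }
}
```

The checker only flags escapes; it does not touch escapes above 007f, which remain the legitimate use of this mode. A real implementation would sit in the same early pass that translates escapes, so it would also see backslashes that were themselves written as escapes, which this sketch ignores.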