
Lexical analysis




Dear Peter,

We are currently implementing a lexical analyser for
our Isabelle CASL parser (up to now, we have used
Isabelle's standard lexical analysis, but this
is not exactly what is needed for CASL).

This raises the following questions and problems:

- Are the following recognized as complete TOKENs or not?
  ->  ->?  =e=  {}  ×
  If they are, they should be listed together with
  < * ? ! and / on p. C-10; if not, they should be listed
  together with : :? ::= etc. on p. C-10.

- A NUMBER can simultaneously be a WORDS.
  We currently resolve this by scanning it as a NUMBER,
  and adding
     TOKEN ::= WORDS | SIGNS | DOT-WORDS | NUMBER
  and perhaps also
     SIMPLE-ID ::= WORDS | NUMBER
  but the latter seems not to be very useful.
  By the way, "5a5" and "'5a" are recognized as WORDS as well.
  Was this intended, or should a WORDS start with a letter?
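
  The resolution order described above can be sketched as follows
  (in Python, purely for illustration). The character classes are
  assumptions chosen so that "5a5" and "'5a" come out as WORDS, as
  reported; they are not quoted from the CASL summary. The names
  classify and classify_strict are hypothetical.

```python
import re

# Assumed character classes: a WORDS is made of runs of letters,
# digits and primes, separated by single underscores.
WORD = r"[A-Za-z0-9']+"
WORDS_RE = re.compile(rf"{WORD}(?:_{WORD})*")
# Hypothetical stricter variant: a WORDS must start with a letter.
STRICT_WORDS_RE = re.compile(r"[A-Za-z][A-Za-z0-9']*(?:_[A-Za-z0-9']+)*")
NUMBER_RE = re.compile(r"[0-9]+")


def classify(lexeme, words_re=WORDS_RE):
    """Return the token class of a lexeme, trying NUMBER before WORDS."""
    if NUMBER_RE.fullmatch(lexeme):
        return "NUMBER"
    if words_re.fullmatch(lexeme):
        return "WORDS"
    return None


def classify_strict(lexeme):
    """Same resolution order, but with the letter-first WORDS variant."""
    return classify(lexeme, STRICT_WORDS_RE)
```

  With these assumed patterns, classify("42") gives NUMBER and
  classify("5a5") gives WORDS, whereas classify_strict rejects both
  "5a5" and "'5a".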

- There is no syntax for PATH and URL. Should we follow some
  international standard here? If so, which one, and how can we
  obtain a precise description of the syntax?

- A WORDS can (probably - see above) simultaneously be a PATH.
  We currently resolve this by scanning it as a WORDS,
  and adding
     LIB-ID ::= URL | PATH | WORDS

- More seriously, x/zero is currently recognized as a PATH,
  but within a TERM, it should be recognized as three lexical 
  tokens, namely WORDS SIGNS WORDS. There is no way to distinguish
  these cases at the lexical level, and by the longest match rule,
  we always get a PATH.
  Moreover, probably other SIGNS (like ".") will be allowed in a PATH,
  leading to similar problems.
  One way out would be to disallow SIGNS in a PATH and reintroduce
  the necessary SIGNS, such as "/" and ".", via the grammar. But this would
  allow PATHs to be written interspersed with spaces, such as
    CASLdir / examples / file1 . casl
  while we would probably like to force the user to write
    CASLdir/examples/file1.casl
  The other possibility would be to require a PATH to be quoted, e.g.
    "CASLdir/examples/file1.casl"

  The same problem also occurs for URLs.
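
  The longest-match behaviour and the two ways out can be illustrated
  with a small sketch (again in Python, for illustration only). All
  patterns are assumptions: in particular, the PATH pattern is
  invented, since no syntax for PATH is given, and the names scan,
  PATTERNS, NO_PATH and QUOTED are hypothetical.

```python
import re

WORDS = ("WORDS", re.compile(r"[A-Za-z0-9']+(?:_[A-Za-z0-9']+)*"))
SIGNS = ("SIGNS", re.compile(r"[/.+*<>=-]+"))
# Invented PATH syntax: name components separated by "/" or ".".
PATH = ("PATH", re.compile(r"[A-Za-z0-9'_]+(?:[/.][A-Za-z0-9'_]+)+"))
# Alternative: a PATH must be enclosed in double quotes.
QUOTED_PATH = ("PATH", re.compile(r'"[^"]*"'))

PATTERNS = [PATH, WORDS, SIGNS]      # current situation
NO_PATH = [WORDS, SIGNS]             # PATH rebuilt in the grammar
QUOTED = [QUOTED_PATH, WORDS, SIGNS]  # PATH only when quoted


def scan(text, patterns=PATTERNS):
    """Split text into (class, lexeme) pairs by the longest match rule."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        best = None
        for name, rx in patterns:
            m = rx.match(text, pos)
            # Keep the longest match; on a tie, the earlier pattern wins.
            if m and (best is None or m.end() > best[1]):
                best = (name, m.end())
        if best is None:
            raise ValueError(f"no token at position {pos}")
        tokens.append((best[0], text[pos:best[1]]))
        pos = best[1]
    return tokens
```

  Under these assumptions, scan("x/zero") yields a single PATH, while
  scan("x/zero", NO_PATH) and scan("x/zero", QUOTED) both yield
  WORDS SIGNS WORDS. With NO_PATH, the spaced and unspaced spellings
  of CASLdir/examples/file1.casl produce the same token sequence, so
  the grammar cannot tell them apart.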


We also have two problems with the grammar:

- There is no syntax for TOKEN-PLACES on page C-5 bottom.
  We assume that the syntax is
     TOKEN-PLACES ::= PLACE ... PLACE TOKEN PLACE ... PLACE
                    | TOKEN PLACE ... PLACE
                    | PLACE ... PLACE TOKEN 
                    | TOKEN
  (if there is not exactly one TOKEN in a TOKEN-PLACES,
   it becomes unclear where to attach the components of
   the compound id).
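
  The "exactly one TOKEN" condition can be checked as in the following
  sketch (Python, for illustration). It assumes that a TOKEN-PLACES
  has already been split into its components and that PLACE is
  spelled "__"; the function name token_index is hypothetical.

```python
PLACE = "__"  # assumed concrete spelling of PLACE


def token_index(components):
    """Return the position of the single TOKEN in a TOKEN-PLACES.

    Raises ValueError if the components contain no TOKEN or more
    than one, since it would then be unclear where to attach the
    components of a compound id.
    """
    positions = [i for i, c in enumerate(components) if c != PLACE]
    if len(positions) != 1:
        raise ValueError(
            f"expected exactly one TOKEN, found {len(positions)}"
        )
    return positions[0]
```

  For example, token_index(["__", "+", "__"]) returns 1, whereas
  ["if", "__", "then", "__"] is rejected because it contains two TOKENs.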

- The production
     SIMPLE-TERM ::= ID | ....
  has to be replaced by 
     SIMPLE-TERM ::= TOKEN-ID | ....
  because a MIXFIX-ID should not be a legal SIMPLE-TERM
  (and we would run into ambiguity problems).

Greetings and happy new year,
Till and Kolyang