[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Problems and questions about CASL concrete syntax




Together with the submitted note T-7, I rediscovered the following
problems and questions about the CASL concrete and abstract grammars. 

> From: Axel Schairer <schairer@dfki.de>
> Date: Wed, 28 Apr 1999 20:40:25 +0200 (MEST)
> ...
> Another thing I have promised to send you is a list of things that
> struck me when I implemented the INKA frontend.  This list in plain
> ASCII is also included below.  However, these comments are not worked
> out really.
> 
> I hope the information is interesting and of use.

Dear Axel,

Thanks for your message, and sorry for the things that have been
unclear - as well as for not addressing your questions until now...

> ======================================================================
> Problems I encountered which I think are not caused by the tools I
> use.
> 
> - There seems to me to be a problem with the definition of symbol
> items: Should
> 
> 	{ ... } hide zero: nat
> 
> hide the unary predicate zero on naturals or the nullary function
> (constant) zero of sort nat?  I think the concrete grammar allows
> both.  (At least I got a conflict in my parse tables that I think I
> tracked down to this phenomenon.)

The ambiguity here is supposed to be resolved by a later phase.
Unfortunately that may involve transforming an OP-TYPE into a
PRED-TYPE, but this is not the only situation where a transformation
is required (e.g., MIXFIX to TERM).

Could you persuade bison to prefer one parse to the other?  If not, we
may have to reconsider a previously-rejected proposal to use the same
syntax for qualified symbols in such lists of symbols as in terms,
i.e., allow all of:

  { ... } hide zero
  { ... } hide (op zero: nat)
  { ... } hide (pred zero: nat)

possibly allowing also: 

  { ... } hide (sort zero)

(It is unlikely that there will be much use of qualified symbols in
practical examples of specifications, but of course parsers have to
deal with the complete language!)

> - SYMB-ITEMS are used in a comma-separated list in RESTRICTION.
> SYMB-ITEMS internally can contain a list of comma-separated SYMBs at
> the end as well.  So basically the grammar allows
> 
> 	hide x, op y, z
> 
> to be grouped as in (curly braces are meta-symbols)
> 
> 	hide x, {op y}, z    or    hide x, {op y, z}
> 
> Obviously, the second is the intended grouping, but the grammar does
> not reflect this.  (This can be handled by forcing the context free
> parser to use the first grouping consistently and then massaging the
> AST later to get the second grouping.  I haven't found a way yet to
> force the second straight away with my tools.)

The intention here was that all groupings should be semantically
equivalent, so the parser is free to choose any of them.

> - The disambiguation rules for extensions and unions of specifications
> say that `then' has lower precedence than `and'.  However the
> productions in the concrete grammar for extensions and unions do not
> say that the the longest match should be used for an and-separated
> list of specs.  I.e.
> 
> 	{...} and {...} and {...}
> 
> can either be produced by the concrete grammar as (square brackets are 
> meta symbols indicating grouping)
> 
> 	[ {...} and {...} ] and {...}    or
> 	{...} and [ {...} and {...} ]    or
>         [ {...} and {...} and {...} ]
> 
> Maybe this is addressed in the paper specifying the mapping from
> concrete to abstract syntax but the core concrete syntax does allow
> all of these as far as I can see.

It would be useful to add the point about longest match to the
explanation of disambiguation.  However, the different groupings are
semantically equivalent (at least in the absence of annotations), so
the user needn't worry about which is chosen by the parser -
especially if they are mapped to the same abstract syntax.

> ======================================================================
> Technical problems in the grammar I came across when I constructed a
> bison-style grammar for CASL.
> 
> - The Summary (on C-10) says that `<', `*', `?', and `!' should be
> recognized as complete tokens.  This means that these signs act as two
> distinct lexeme types: the lexeme used to form profiles for functions
> and tokens used to name things.
> 
> Given the tools I use: in a bison-style grammar `*' has to be returned
> as a special lexical symbol, call it star, because otherwise the rule
> for, say, SOME-SORTS is problematic.  For bison one needs
> 
> 	SOME-SORTS ::= SORT | SORT star SOME-SORTS
> 
> where star is distinct from anything else but a single star.  If this
> lexeme is acceptable in other places as well it has to be mentioned
> there explicitly.  So there is a rule saying that
> 
> 	TOKEN ::= ... | `<' | `*' | ... 
> 
> This might not be an issue for other parser technologies, it took me
> quite some time to figure this out, however, no matter how obvious it
> looks to me now I have found it.

I'm afraid that this complication arising from the double usage of
symbols such as `<' is something we have to accept.

> - The disambiguation rules say that `=>' groups to the right and `if'
> groups to the left.  Also `A => B if C' is not allowed. This does not
> go together with the way in which precedence and association is
> specified in bison: Each keyword (like %left or %right) introduces a
> new precedence level.  The order in which the declarations appear
> determine which one binds tighter.  So I wouldn't know how one could
> get the intended behaviour of `=>' and `if' in the context free
> parsing stage.  

I wonder how this is achieved in the other generated parsers?
Frederic, Till & Markus, Mark: would you please let Axel know
(if you haven't already done so).

> (One can, of course, group these later, after the
> context free parsing.)  The same applies to some other disambiguiation
> rules, e.g. `/\' vs. `\/', `and' vs. `then'.
> 
> - There is a slight problem with the optional semicolons.  (This one
> is a bit longer because I have written this earlier to convince myself
> that what I was doing makes sense :-)
> 
> Consider
> 
> 	vars i, j: int; 
> 	     m, n: nat{;}
> 
> where the semicolon in braces is optional, and
> 
> 	vars i, j: int; 
> 	     m, n: nat{;}			%% ***
> 	     . i = j
> 
> where the semicolon in braces is illegal.  

Markus recently mentioned to me that he'd prefer it to be made
legal... but purely from the language design point of view, rather
than to solve parsing problems.

> This is a special case of
> the following: Consider some element type, say X, and sequences of X
> separated by `;', i.e.
> 
> 	SEQ ::= X | X `;' SEQ
> 
> At this point it would also be possible to write down a right
> recursive solution (i.e.~replace the second right hand side by SEQ `;'
> X) but this will not work for the optional semicolons.  So we stick
> with the left recursive productions for the time being.
> 
> Now suppose that we need to be able to parse all of the following
> 
> 	\alpha SEQ \beta
> 	\alpha SEQ \gamma
> 	\alpha SEQ `;' \gamma
> 	
> in the same situation, where \beta and \gamma cannot start with the
> same terminal.  My naive solution was the following:
> 
> 	A ::= \alpha SEQ \beta
> 	    | \alpha SEQ \gamma
> 	    | \alpha SEQ `;' \gamma
> 
> The problem here is: When the parser, after having handled the \alpha,
> has read a number of X elements separated by semicolons and looks
> ahead at a semicolon, it does not know whether the semicolon precedes
> another X element or whether it is the optional semicolon before the
> \gamma.
> 
> I came up with the following solution.  Compact the last two
> productions and introduce another nonterminal SEQ-SEMIOPT that is a
> sequence optionally followed by a semicolon.
> 
> 	A ::= \alpha SEQ \beta
> 	    | \alpha SEQ-SEMIOPT \gamma
> 
> 	SEQ-SEMIOPT ::= X | X `;' | X `;' SEQ-SEMIOPT
> 
> This works because in the situation above the parser can simply shift
> the semicolon.  There is no need to decide with which rule it should
> reduce.

Good - but perhaps the difficulty you had lends support to Markus's
proposal to make the semicolon always legal at the end of a list of
VAR-DECLs?  Would it give problems for the other parsers to generalize
the language this way?

> ======================================================================
> General problems I came across when trying to understand the grammar
> 
> 
> - It is as yet unclear to me whether, and if yes how, a specification
> (not in a library) may use libraries.  It seems that download-items
> are only allowed inside libraries.  Am I missing something here?

References to specification names are always interpreted in the
context of a global environment, assumed to be provided by the
preceding part of the library in which the enclosing specification
definition occurs.  Download-items allow specification definitions to
be copied from other libraries, and affect the global environment.
It would be more complicated if they could occur in the middle of a
specification - also involving the semantics of libraries in that of
structured specifications.

> - How does one get at the text of a specification library given its
> path or URL?  These two are supposed to be directories, so there
> should be a rule that says in which file the current specification
> (maybe considering the version) resides.  Is there a document
> describing this I have missed?

Sorry, no document yet...  The idea is for each specification in a
library to reside in a separate file, and for an index file to
determine the order in which they occur.  But (as for concrete syntax)
the details are open to adjustment in the light of implementation
attempts.

> - I have found the abstract syntax confusing at first because it
> conveys two relations with the same notation: On the one hand it
> defines a subsort relation on tree sorts (e.g. AXIOM is-a FORMULA).
> On the other hand it defines a representation of these trees
> (PRED-NAME is-part-of PRED-DEFN).  I found it difficult to read from
> the abstract syntax the intended representation of the trees using a
> type system like, e.g, ML's.

I think that the (somewhat longer) grammar in App. A of the Summary
has the property that each production can easily be identified as of
one type or the other.  App. B is more concise, but not so good in
that respect.

Best regards,

-- Peter
_________________________________________________________
Dr. Peter D. Mosses             International Fellow  (*)

Computer Science Laboratory     mailto:mosses@csl.sri.com
SRI International               phone: +1 (650)  859-2200
333 Ravenswood Avenue           fax:   +1 (650)  859-2844
Menlo Park, CA 94025, USA       http://www.brics.dk/~pdm/

(*) on leave from DAIMI & BRICS, University of Aarhus, DK
    also affiliated to CS Department, Stanford University
_________________________________________________________