[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Axel's Problems [long and technical]




Here are some anwers to Axel and Peter's questions. I have almost the same 
problems as Axel since I basically  used the same parsing technology

> > - There seems to me to be a problem with the definition of symbol
> > items: Should
> > 
> > 	{ ... } hide zero: nat
> > 
> > hide the unary predicate zero on naturals or the nullary function
> > (constant) zero of sort nat?  I think the concrete grammar allows
> > both.  (At least I got a conflict in my parse tables that I think I
> > tracked down to this phenomenon.)
> 
> The ambiguity here is supposed to be resolved by a later phase.
> Unfortunately that may involve transforming an OP-TYPE into a
> PRED-TYPE, but this is not the only situation where a transformation
> is required (e.g., MIXFIX to TERM).
> 
> Could you persuade bison to prefer one parse to the other?  If not, we
> may have to reconsider a previously-rejected proposal to use the same
> syntax for qualified symbols in such lists of symbols as in terms,
> i.e., allow all of:
> 

Actually I first parse such input as a constant and hope that a later
semantic part makes the difference using local context. This means that in
my grammar I had to break regularity and introduce rules for nullary predicates
and predicates with more than one sort, to let the remaining case be parsed
as a constant.

> > 	hide x, op y, z
> > 
> > to be grouped as in (curly braces are meta-symbols)
> > 
> > 	hide x, {op y}, z    or    hide x, {op y, z}
> > 
> 

It has been said in some meeting that in that case the "op" (or "pred"
or "sort") applied till the end of the list or some further keyword.
I solved the problem by having a marker (i.e. some hidden global variable)
remembering if some explicit kind (op, pred,...) has been previously seen in
the list or if the kind is still unspecified. However my grammar for that part
is quite different form the one in the official document since the above means
that I only deals with "flat" lists with optional keywords in the middle.


> > - The disambiguation rules for extensions and unions of specifications
> > say that `then' has lower precedence than `and'.  However the
> > productions in the concrete grammar for extensions and unions do not
> > say that the the longest match should be used for an and-separated
> > list of specs.  I.e.
> > 
> > 	{...} and {...} and {...}
> > 

I (probably unduly) thought that __and__ and __then__ were left-associative !

> > Technical problems in the grammar I came across when I constructed a
> > bison-style grammar for CASL.
> > 
> > - The Summary (on C-10) says that `<', `*', `?', and `!' should be
> > recognized as complete tokens.  This means that these signs act as two
> > distinct lexeme types: the lexeme used to form profiles for functions
> > and tokens used to name things.
> > 
> > Given the tools I use: in a bison-style grammar `*' has to be returned
> > as a special lexical symbol, call it star, because otherwise the rule
> > for, say, SOME-SORTS is problematic.  For bison one needs
> > 
> > 	SOME-SORTS ::= SORT | SORT star SOME-SORTS
> > 
> > where star is distinct from anything else but a single star.  If this
> > lexeme is acceptable in other places as well it has to be mentioned
> > there explicitly.  So there is a rule saying that
> > 
> > 	TOKEN ::= ... | `<' | `*' | ... 
> > 
> > This might not be an issue for other parser technologies, it took me
> > quite some time to figure this out, however, no matter how obvious it
> > looks to me now I have found it.
> 
> I'm afraid that this complication arising from the double usage of
> symbols such as `<' is something we have to accept.


My favourite topic :-)

I have evoked that problem - and the long list of non reserved symbols like {,
}, *, ->,  and (too) many others - several times but never got a "sympathetic"
answer. I'm afraid that Axel will have a much larger list of productions
that he probably expect. The only other way is to let the parser drives the
scanner by marking which kind of symbols is expected (for instance '*' is
always to be accepted as part od a SIGN, except in some particular locations;
however this is much harder for other symbols like {, }, ->, ...; and the
more of these tricks you have, the less robust system you get;
Interaction between scanner and parser is less easy that one usually think
because of lookahead (you had to turn some switch "ON" before the
corresponding token is "looked at" the first time (for controlling some
prior reduction), not just before the time it will be actually shifted.

> 
> > - The disambiguation rules say that `=>' groups to the right and `if'
> > groups to the left.  Also `A => B if C' is not allowed. This does not
> > go together with the way in which precedence and association is
> > specified in bison: Each keyword (like %left or %right) introduces a
> > new precedence level.  The order in which the declarations appear
> > determine which one binds tighter.  So I wouldn't know how one could
> > get the intended behaviour of `=>' and `if' in the context free
> > parsing stage.  
> 
> I wonder how this is achieved in the other generated parsers?
> Frederic, Till & Markus, Mark: would you please let Axel know
> (if you haven't already done so).

By not usign precedence levels but explicit non-terminals:

<Formula1> = <Implication> ;
<Formula1> = <Equivalence> ;
<Formula1> = <Formula2> ;


<IfFormula> = <IfFormula> "if" <Formula2> ;
<IfFormula> = <Formula2> "if" <Formula2> ;


<Implication> = <Formula2> "=>" <Implication> ;
<Implication> = <Formula2> "=>" <Formula2> ;


> > (One can, of course, group these later, after the
> > context free parsing.)  The same applies to some other disambiguiation
> > rules, e.g. `/\' vs. `\/', `and' vs. `then'.
> > 

Same solution. Actually I did not succeed in using the disambiguation
notation for the tool I'm using, especially in connection with the TERM level.


> > - There is a slight problem with the optional semicolons.  (This one
> > is a bit longer because I have written this earlier to convince myself
> > that what I was doing makes sense :-)
> > 
> > Consider
> > 
> > 	vars i, j: int; 
> > 	     m, n: nat{;}
> > 
> > where the semicolon in braces is optional, and
> > 
> > 	vars i, j: int; 
> > 	     m, n: nat{;}			%% ***
> > 	     . i = j
> > 
> > where the semicolon in braces is illegal.  
> 
> Markus recently mentioned to me that he'd prefer it to be made
> legal... but purely from the language design point of view, rather
> than to solve parsing problems.


> 
> > This is a special case of
> > the following: Consider some element type, say X, and sequences of X
> > separated by `;', i.e.
> >
> > [technical stuff deleted - fv]
> > 
> > 	SEQ-SEMIOPT ::= X | X `;' | X `;' SEQ-SEMIOPT
> > 
> 
> Good - but perhaps the difficulty you had lends support to Markus's
> proposal to make the semicolon always legal at the end of a list of
> VAR-DECLs?  Would it give problems for the other parsers to generalize
> the language this way?

I'm not sure I understand everything, but it seems that by using
right-recursive productions it also works. Except for a few special
cases, most productions look like

<SortItemSequence> = <Label> <SortItem> <OptSemiColon> ;
<SortItemSequence> = <Label> <SortItem> ";" <SortItemSequence> ;

Beware (as far as I remember) that it worked before one introduced
architectural specifications, and specially lambda expressions, where the
optional ";" strikes again and the conflict cannot be solved with one
character lookahead (no matter left or right recursive productions).
The problem had been mentioned but (voluntarily) ignored because syntax for
architectural specs. was tentative and known to need some tuning once real
examples had be written.

if you make the semicolon legal in that place, make sure it does not
introduce new conflicts or break some uniformity - i.e. what about
   f(x:nat; y: nat;) = ...
Allowed or not ?

>