
Re: concrete syntax problems in arch spec and views



I have heavily edited Peter's answer to my original mail to keep this message
short, I hope without altering the meaning.

> From mosses@csl.sri.com  Sun Sep 13 00:24:06 1998
> > From Frederic.Voisin@lri.fr
> Subject: Re: concrete syntax problems in arch spec and views

> > Here is a short list of the current problems with the concrete syntax
> > for Casl.

For architectural spec. and views, here is the current state of work from my
point of view:

1. I have adopted the suggestions proposed by Christophe, Michel and Peter in
July for some of the problems:

- in architectural spec. one could not embed arbitrary BASIC-SPEC without
bracketing, i.e. the original productions
	UNIT-SPEC ::= UNIT-SPEC-NAME
		    | SPEC
		    | SPEC * ... * SPEC -> SPEC
and     VIEW-TYPE ::= SPEC -> SPEC

would be replaced by

	UNIT-SPEC ::= GROUPED-SPEC
		    | GROUPED-SPEC * ... * GROUPED-SPEC -> GROUPED-SPEC

and     VIEW-TYPE ::= GROUPED-SPEC -> GROUPED-SPEC

Production UNIT-SPEC ::= UNIT-SPEC-NAME	has been removed since it is
a special case of  GROUPED-SPEC (one cannot make the distinction between
a SPEC-NAME and a UNIT-SPEC-NAME at the syntactic level).


- As proposed by Michel, the UNIT-TERM-list after the "given" keyword
in UNIT-DECL has been restricted to a single UNIT-TERM. Maybe using
"then" instead of "," plus fixing a precedence level might solve
the problem if a list is really needed (or we rely on the "and" ??)

- Once the above is adopted, most of the problems are related to the
precedence level of UNIT-TERM (RESTRICTION, AND, ..) By adopting the same
precedence levels as in structured specs. most of them disappear in the LALR
version.

- Still there is a problem with optional semi-colons. Consider:

arch spec Name =
     unit UnitName = lambda
		        UName : arch spec unit <UnitDeclDefnList>
					 result <UnitExpression>
			    ; 			<--- Problem here !!!
			    ....
When about to parse the semi-colon, one cannot know if it is the optional
semi-colon that ends the BASIC-ARCH-SPEC declared by "units.... result..."
or if that construction ends without the semi-colon, in which case
the semi-colon is the one between multiple UNIT-BINDIND in the lambda
expression... This may not be the only case where the problem occurs.

I tried using an optional "end", or using "," instead of ";", at various
locations, but the only solution I found is by DISALLOWING the optional
";" at the end of the production for BASIC-ARCH-SPEC...

Any clever idea is welcome.


With the above propositions, I have no more conflicts for arch. spec.
than with structured specs alone (that does not imply that the
work is done :-( )


Do you think that the above restrictions are reasonable or not ? Let me know
before I go further...


> > There are also problems with the syntax of PATHs when naming specifications.

> I don't see anything called PATH in the grammars in the Summary v0.99;
> I guess you mean the use of "/" in LIB-ID?  Notice that this is not
> allowed when referring to a spec, only to a library.
> 

Well, in fact it was Christophe who was having the problem in his LL(2)
parser... For me it seems to work.

Note however that by restricting the various parts in a LIB-ID to
SIMPLE-ID, we cannot deal with an arbitrary UNIX path :-(.
Consider for instance any file named .../ASF+SDF/... , because of the '+'.

This is why we have suggested using a special lexical form to denote such
a path, like "arbitrary_unix_path_enclosed_in_double__quotes" or
whatever you like, as long as it is scannable as a single token without
having to control much of its content.
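
To make the suggestion concrete, here is a minimal sketch in Python. The
regular expressions are assumptions for illustration only: SIMPLE_ID is a rough
approximation of SIMPLE-ID, and QUOTED_PATH is a hypothetical quoted form, not
something taken from the Summary.

```python
import re

# Assumption: SIMPLE-ID approximated as a letter followed by letters,
# digits, underscores or primes; the real lexical rules are richer.
SIMPLE_ID = r"[A-Za-z][A-Za-z0-9_']*"

# Hypothetical quoted path: anything between double quotes, scanned as
# one token with no inspection of its content.
QUOTED_PATH = r'"[^"\n]*"'

# A LIB-ID is either a quoted path or SIMPLE-IDs separated by "/".
LIB_ID = re.compile(rf"{QUOTED_PATH}|{SIMPLE_ID}(?:/{SIMPLE_ID})*")


def is_lib_id(text: str) -> bool:
    """Return True if the whole text is accepted as a LIB-ID."""
    return LIB_ID.fullmatch(text) is not None


print(is_lib_id("Basic/Numbers"))          # plain SIMPLE-ID parts
print(is_lib_id("tools/ASF+SDF/lib"))      # rejected because of the '+'
print(is_lib_id('"tools/ASF+SDF/lib"'))    # accepted once quoted
```

With the quoted form the scanner only has to find the closing double quote,
so the '+' no longer matters.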


> > Micro-problems/ambiguities in the current document

> > - In a definition by extension, spaces are needed around the '.'
> >   NO: 	Odd = {n:Nat.odd n}  %% Note that ".odd" is a valid token
> >   Yes:	Odd = {n:Nat . odd n}
> 
> I see the need for the second space, but not for the first one
> Moreover, isn't it just the same situation as with QUANTIFICATION:

True for both points. The space before the "." in the above example is
not technically needed, only the one after it ! I put a space both before and
after the "." because it seems more readable.

It is clear that the same problem occurs also for QUANTIFICATION
and "bulletized formulae". It is a purely lexical problem of knowing whether the
"." is separate from the adjacent identifier, or not.
	
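The lexical effect can be reproduced with a small longest-match scanner. This
is only a sketch in Python: the token classes below are a much simplified
assumption, not the actual CASL lexical syntax.

```python
import re

# Assumption: a tiny token set, just enough to show the DOT-WORDS issue.
# Alternatives are tried in order, so a dot glued to a word wins over
# a lone dot.
TOKEN = re.compile(r"""
    \.[A-Za-z][A-Za-z0-9_']*   # DOT-WORDS: a dot immediately followed by a word
  | [A-Za-z][A-Za-z0-9_']*     # WORDS
  | [{}=:.]                    # a few single-character symbols
""", re.VERBOSE)


def tokenize(text):
    """Return the list of tokens found in text; white space is skipped."""
    return TOKEN.findall(text)


print(tokenize("{n:Nat.odd n}"))    # ".odd" comes out as a single token
print(tokenize("{n:Nat. odd n}"))   # "." and "odd" are separate tokens
print(tokenize("{n:Nat . odd n}"))  # same tokens as the previous line
```

Only the space after the "." changes the token stream; the space before it
is purely a matter of readability.
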
> Of course, some might prefer to remove the rather odd-looking
> DOT-WORDS from the language altogether, since it also prevents using
> "." as an infix operation, and might hinder the introduction of the
> notorious "dot-notation" in extensions of CASL.  But let's not waste
> our precious time on this relatively minor issue, which has already
> been debated at considerable length - extensions will in any case have
> to restrict the CASL lexical syntax for ID so as to reserve new
> keywords, and they might just as well be allowed to remove DOT-WORDS
> at the same time.

Agreed, of course !!


> > - <Sort> is defined as <TokenId>. Probably it should be a <SimpleId>
> 
> NO!  In structured specs, TOKEN-ID allows for compound ids, such as
> "List[Elem]", whereas SIMPLE-ID is still merely WORDS.

OK, sorry I have been too restrictive and missed my point: the current 
SORT ::= TOKEN-ID derives into TOKEN that itself derives into SIGNS or
DOT-WORDS, besides WORDS. I'm not sure it was intended...


> > - Are both ->? and -> ? needed ? Or which form is the right one ?
 
> For the ASF+SDF CASL v0.99 parser (see ftp://ftp.brics.dk/Projects/CoFI/
> Documents/CASL/SyntaxExamples/ASF+SDF/ZCasl-BasicItems.syn) Bjarke
> only allowed "->?".

Probably I can live with any solution for that problem, as long as everyone
agrees about it. Let us say that only "->?" is correct.

> > - The "display annotations" must probably be allowed also for symbols
> >   declared in renamings, not only in "declarations"
> 
> Right - because new symbols can be introduced there too.  Thanks,
> I hadn't noticed that.  The same goes for fittings in instantiations;
> so I suggest to provide these annotations uniformly for SYMB-MAP-ITEMS.

or only for the targets in SYMB-MAP (not the source).


> > - the tokens -> and ->? are not described as being complete reserved tokens
> 
> Neither is "*"; presumably the treatment of these symbols should be
> analogous?  Perhaps the explanation in App. C.4:
>   `(since they all have to be recognized as terminal symbols)'
> is misleading.  As far as I can see, one needs to make sure that a
> terminal symbol which can terminate an ID, TERM, or FORMULA cannot
> also be a valid complete TOKEN.  That motivates reserving:
>   :  :?  =  =>  <=>  .  |  |->  \/  /\  {  }  [  ]
> as well as most of the keywords.  (Maybe "::=" should be left
> unreserved, although I can't imagine anyone wanting to use it.)  I
> don't see any *technical* reason for reserving tokens such as "*" and
> "->", since when they are used in FUN-TYPE in TERM, they always follow
> a SORT, which is a single TOKEN-ID.  

Let me answer with my current focus: "Technical" = parsing technology using
standard tools !
With parsers using lookahead techniques, one has to tell the scanner
"well in advance" (1 or 2 tokens before the next move) if a token like -> or
* is to be scanned as a SIGN or as a special token, depending on the
syntactical context... and this is a source of potential bugs and increased
complexity for the grammar. The fewer such symbols there are, the fewer bugs
we will probably have !

It is clear to me that we need special cases like "*". If people really feel
the need for -> and ->? (and similar symbols), ok, but otherwise let us try to
keep the tools as simple as possible.

> However, perhaps "->" and "->?" should be reserved to avoid problems
> with parsing arising in a higher-order extension of CASL (where the
> distinction between SORT and TERM may disappear).  The question is
> then whether "*" should be reserved too - I hope not!  In fact I'd
> prefer to regard all of "->", "->?" and "*" as ordinary tokens usable
> as infix operators (but getting a predefined interpretation when
> applied to types in HO-CASL).

-------------------------------
 
> > - Is the operator {{ __ }} allowed, since in an application like
> >   {{ ( a ) }}, from a grammatical point of view we have different IDs
> >   "{{" and "}}" and the reference manual aks each ID to have balanced
> >   occurrences of {.. } (or [... ]).
> 
> I don't understand your "grammatical" point of view: "{{" and "}}" are
> simply separate TOKENs, and *not* valid as complete IDs.  The ID 
> "{{ __ }}" is allowed, and since it is balanced wrt "{" and "}", so is
> any TERM (or FORMULA) built from it.
> 

"complete" ID ???

Probably only a question of wording (and apologies to all of you who will find
this part quite esoteric and boring). What do we call an ID ? A sequence
of TOKEN-OR-PLACE (let us consider only mixfix) as in the grammar ? In that
case for an application of a mixfix operator like {{ ( a ) }}, and because
of the parentheses, I see three chunks ("ID" in the grammar... sorry) namely
"{{", "a", "}}", separated by "(" and ")". With that (narrow) view, neither of
the IDs "{{" and "}}" is balanced by itself, as required by the paragraph in C.4.

If what we call an ID in C.4 is indeed an operator/predicate name, like
{{ __ }}, of course it is balanced.
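
The two readings can be compared with a small balance check. This is a sketch
in Python; the function name is_balanced and the representation of an ID as a
list of token strings are assumptions made for illustration.

```python
def is_balanced(tokens):
    """Check that { } and [ ] are properly nested across a list of tokens."""
    closing = {"}": "{", "]": "["}
    stack = []
    for tok in tokens:
        for ch in tok:
            if ch in "{[":
                stack.append(ch)
            elif ch in closing:
                # A closing bracket must match the most recent open one.
                if not stack or stack.pop() != closing[ch]:
                    return False
    return not stack


# Narrow view: each chunk of the application is checked on its own.
print(is_balanced(["{{"]), is_balanced(["}}"]))

# Operator-name view: the whole name {{ __ }} is checked at once.
print(is_balanced(["{{", "__", "}}"]))
```

Under the narrow view both chunks fail the check, while the complete operator
name passes it.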


> > - I have doubts about the possibility of allowing [ and ] in IDs since
> >   the symbols are also used for marking compound ids.
> 
> I had hoped that e.g. "[ __ ]" and "__ [ __ ]" would be valid IDs,
> but that may well lead to problems when trying to group a list of IDs
> into a TERM.  E.g., whether "f[elem]" is a constant compound ID or
> mixfix notation for "__[__](f,elem)" cannot be determined
> context-freely.  

Exactly.

> However, a solution there may be for context-free parsers to leave the
> recognition of compound ids to the static analysis.  After all,
> "f[elem]" is only legal as a compound id when it has been declared as
> such (possibly by the renaming that is implicit in the instantiation
> of a generic spec); in specs not using compound ids at all, e.g.,
> those arising from translation of OBJ3 specs, it would be annoying to
> have unnecessary restrictions on mixfix symbols.  
> 
> In a CASL sublanguage not allowing mixfix notation, however, terms
> should be fully parsed, context-freely, including compound ids.  Maybe
> a parser annotation could be provided, to declare whether mixfix
> notation is to be used in a spec or not.

A parser annotation will not change what has been fixed in the context-free
grammar.


Frederic