
Re: concrete syntax problems in arch spec and views



Sorry that I didn't have time to respond to Frederic's message until
now.  However, I have of course tried to take account of the points he
raised while revising the concrete syntax for CASL.  Here's some
explanation of how.  (As it's been some time since Frederic sent his
message, I'm citing the open points almost in full.)

Frederic wrote:
> ...
> For architectural spec. and views, here is the current state of work from my
> point of view:
> 
> 1. I have adopted the suggestions proposed by Christophe, Michel and Peter in
> July for some of the problems:
> 
> - in architectural spec. one could not embed arbitrary BASIC-SPEC without
> bracketing, i.e. the original productions
> 	UNIT-SPEC ::= UNIT-SPEC-NAME
> 		    | SPEC
> 		    | SPEC * ... * SPEC -> SPEC
> and     VIEW-TYPE ::= SPEC -> SPEC
> 
> would be replaced by
> 
> 	UNIT-SPEC ::= GROUPED-SPEC
> 		    | GROUPED-SPEC * ... * GROUPED-SPEC -> GROUPED-SPEC
> 
> and     VIEW-TYPE ::= GROUPED-SPEC -> GROUPED-SPEC

OK

> Production UNIT-SPEC ::= UNIT-SPEC-NAME	has been removed since it is
> a special case of  GROUPED-SPEC (one cannot make the distinction between
> a SPEC-NAME and a UNIT-SPEC-NAME at the syntactic level).

OK - and I've removed UNIT-SPEC-NAME altogether from the abstract
syntax too.

> - As proposed by Michel, the UNIT-TERM-list after the "given" keyword
> in UNIT-DECL has been restricted to a single UNIT-TERM. If a list is
> really needed, using "then" instead of "," plus fixing a precedence level
> might solve the problem (or could we rely on "and"??)

I see the problem (a UNIT-TERM might terminate with a HIDING, which is
also a comma-separated list) but I've taken a different solution, more
in line with the way similar problems are avoided in other constructs:
to allow grouping of unit terms.  (Perhaps one might prefer to use
ordinary parentheses rather than braces for grouping here?)  I hope
that this eliminates the lookahead problem here; this should
preferably be checked before 15 October....
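To illustrate the lookahead problem and why grouping helps, here is a toy recursive-descent sketch in Python. It assumes a drastically simplified grammar (a UNIT-TERM is just a name, optionally followed by "hide" and a comma-separated symbol list, or a braced group); the function names are hypothetical, and this is not any of the parsers currently being developed.

```python
def parse_unit_term(toks, i):
    """Toy grammar: UNIT-TERM ::= NAME [hide SYM, ..., SYM] | { UNIT-TERM }.
    Returns (tree, next_index)."""
    if toks[i] == "{":
        term, i = parse_unit_term(toks, i + 1)
        if i >= len(toks) or toks[i] != "}":
            raise SyntaxError("expected '}'")
        return ("group", term), i + 1
    name, i = toks[i], i + 1
    if i < len(toks) and toks[i] == "hide":
        syms, i = [toks[i + 1]], i + 2
        # With one token of lookahead, a comma here can only be taken as
        # continuing the hiding list, so the list swallows what follows.
        while i < len(toks) and toks[i] == ",":
            syms.append(toks[i + 1])
            i += 2
        return ("hide", name, syms), i
    return ("unit", name), i


def parse_given(toks):
    """Toy grammar: given UNIT-TERM, ..., UNIT-TERM"""
    if not toks or toks[0] != "given":
        raise SyntaxError("expected 'given'")
    term, i = parse_unit_term(toks, 1)
    terms = [term]
    while i < len(toks) and toks[i] == ",":
        term, i = parse_unit_term(toks, i + 1)
        terms.append(term)
    return terms
```

Without grouping, "given T hide a , U" yields a single term hiding both a and U; with grouping, "given { T hide a } , U" yields the two intended unit terms, and the comma after "}" is unambiguous.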

> - Once the above is adopted, most of the problems are related to the
> precedence level of UNIT-TERM (RESTRICTION, AND, ..) By adopting the same
> precedence levels as in structured specs, most of them disappear in the LALR
> version.
> 
> - Still there is a problem with optional semi-colons. Consider:
> 
> arch spec Name =
>      unit UnitName = lambda
> 		        UName : arch spec unit <UnitDeclDefnList>
> 					 result <UnitExpression>
> 			    ; 			<--- Problem here !!!
> 			    ....
> When about to parse the semi-colon, one cannot know if it is the optional
> semi-colon that ends the BASIC-ARCH-SPEC declared by "units.... result..."
> or if that construction ends without the semi-colon, in which case
> the semi-colon is the one separating multiple UNIT-BINDINGs in the lambda
> expression... This may not be the only case where the problem occurs.
> 
> I tried using an optional "end", or using "," instead of ";", at various
> locations, but the only solution I found is by DISALLOWING the optional
> ";" at the end of the production for BASIC-ARCH-SPEC...
> 
> Any clever idea is welcome.

Not particularly clever: just introduce grouping for ARCH-SPEC too...

> With the above proposals, I have no more conflicts for
> arch. specs than for structured specs alone (which does not imply that the
> work is done :-( )
> 
> 
> Do you think that the above restrictions are reasonable or not? Let me know
> before I go further...

With the concrete syntax in App C of the final draft v1.0 summary, I
think that I've managed to circumvent the proposed restrictions, at
the expense of introducing grouping.  I regard the latter as the
obvious way to eliminate ambiguity and look-ahead problems, and not a
burden for the user who makes sensible use of the possibilities for
naming entities of the various kinds.  Of course one needs precedence
too.  (Oops, I guess I forgot to add the precedence rules for arch specs
to App C.3, sorry!  I'll draft what I think should be inserted, and check
it with Frederic before submitting it as an objection :-) to cofi-language.)

> > > There are also problems with the syntax of PATHs when naming specifications.
> 
> > I don't see anything called PATH in the grammars in the Summary v0.99;
> > I guess you mean the use of "/" in LIB-ID?  Notice that this is not
> > allowed when referring to a spec, only to a library.
> > 
> 
> Well, in fact it was Christophe who was having the problem with his LL(2)
> parser... For me it seems to work.
> 
> Note, however, that by restricting the various parts of a LIB-ID to
> SIMPLE-ID, we cannot deal with an arbitrary UNIX path :-(.
> Consider, for instance, any file under .../ASF+SDF/..., because of the '+'.
> 
> This is why we have suggested using a special lexical form to denote such
> paths, like "arbitrary_unix_path_enclosed_in_double__quotes" or
> whatever you like, as long as it can still be scanned as a single token
> without having to check much of its content.

The contexts in which a library identifier can occur are so limited
that I see little motivation for reserving a special lexical syntax
for them (one that cannot be confused with the other lexical tokens).

In fact URL is just as much a problem as PATH.  My suggestion is to
scan a general URL (I haven't checked it against the official definition;
maybe it needs some restrictions) where one is expected, and to use the
presence or absence of a transport protocol specifier to determine whether
it is intended as a direct or an indirect link.
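A minimal Python sketch of that suggestion, assuming that a "transport protocol specifier" means an RFC 3986-style scheme followed by "://", and that a library identifier can simply be scanned as the maximal run of non-whitespace characters where one is expected. The function names are hypothetical, and no further URL validation is attempted.

```python
import re

# Assumption: a protocol specifier is a scheme such as "http" followed by
# "://"; the rest of the URL syntax is deliberately not checked here.
PROTOCOL = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*://")


def scan_lib_id(source, pos):
    """Scan a library identifier starting at pos as a single token.
    Simplification: the token is the maximal run of non-whitespace
    characters, so '+' and '/' need no special treatment."""
    end = pos
    while end < len(source) and not source[end].isspace():
        end += 1
    return source[pos:end], end


def classify_lib_id(lib_id):
    """'direct' if a protocol specifier is present, otherwise 'indirect'."""
    return "direct" if PROTOCOL.match(lib_id) else "indirect"
```

For example, scanning at position 5 of "from tools/ASF+SDF/Lib get X" (the surrounding text is only illustrative) yields the single token "tools/ASF+SDF/Lib", classified as indirect, whereas "http://www.brics.dk/~pdm/Lib" is classified as direct.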

> > > Micro-problems/ambiguities in the current document
> ...
> > > - <Sort> is defined as <TokenId>. Probably it should be a <SimpleId>
> > 
> > NO!  In structured specs, TOKEN-ID allows for compound ids, such as
> > "List[Elem]", whereas SIMPLE-ID is still merely WORDS.
> 
> OK, sorry, I was too restrictive and didn't make my point clearly: the current
> SORT ::= TOKEN-ID derives TOKEN, which itself derives SIGNS or
> DOT-WORDS as well as WORDS. I'm not sure that was intended...

Not really, but it doesn't seem to do any harm.  I'd rather avoid
(re-)introducing WORDS-ID, which restricts TOKEN-ID to WORDS
components.  But it's easy enough if there's a real need for it.

> ...
> > > - the tokens -> and ->? are not described as being complete reserved tokens
> > 
> > Neither is "*"; presumably the treatment of these symbols should be
> > analogous?  Perhaps the explanation in App. C.4:
> >   `(since they all have to be recognized as terminal symbols)'
> > is misleading.  As far as I can see, one needs to make sure that a
> > terminal symbol which can terminate an ID, TERM, or FORMULA cannot
> > also be a valid complete TOKEN.  That motivates reserving:
> >   :  :?  =  =>  <=>  .  |  |->  \/  /\  {  }  [  ]
> > as well as most of the keywords.  (Maybe "::=" should be left
> > unreserved, although I can't imagine anyone wanting to use it.)  I
> > don't see any *technical* reason for reserving tokens such as "*" and
> > "->", since when they are used in FUN-TYPE in TERM, they always follow
> > a SORT, which is a single TOKEN-ID.  
> 
> Let me answer from my current perspective: "technical" = parsing technology
> using standard tools!
> With parsers using lookahead techniques, one has to tell the scanner
> "well in advance" (1 or 2 tokens before the next move) whether a token like
> -> or * is to be scanned as a SIGN or as a special token, depending on the
> syntactic context... This is a source of potential bugs and increases the
> complexity of the grammar. The fewer such symbols there are, the fewer bugs
> we will probably have!
> 
> It is clear to me that we need special cases like "*". If people really feel
> the need for reserving -> and ->? (and similar symbols), OK, but otherwise
> let us try to keep the tools as simple as possible.

My point was precisely that *, ->, and ->? occur as reserved symbols
in the SAME contexts (OP-TYPE and PRED-TYPE), so it's actually more
systematic to treat them the same way than to treat * as an anomaly.
Moreover:

> > However, perhaps "->" and "->?" should be reserved to avoid problems
> > with parsing arising in a higher-order extension of CASL (where the
> > distinction between SORT and TERM may disappear).  The question is
> > then whether "*" should be reserved too - I hope not!  In fact I'd
> > prefer to regard all of "->", "->?" and "*" as ordinary tokens usable
> > as infix operators (but getting a predefined interpretation when
> > applied to types in HO-CASL).

- treating them differently in a parser might undermine extending it
to HO CASL (which is probably somebody else's problem, but still!).
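To make the "same contexts" point concrete, here is a Python sketch assuming a maximal-munch scanner in which *, -> and ->? are ordinary sign tokens, and in which only the parser for an OP-TYPE (simplified so that each SORT is a single token) gives them their role. SIGN_CHARS is an assumed, illustrative character set rather than CASL's official one, and the function names are hypothetical.

```python
import re

# Reserved symbols from the list quoted above (keywords omitted).
RESERVED = {":", ":?", "=", "=>", "<=>", ".", "|", "|->", "\\/", "/\\",
            "{", "}", "[", "]"}
# Assumption: an illustrative set of sign characters, not CASL's official one.
SIGN_CHARS = "+-*/\\&=<>!?:.$@#^~|"
TOKEN_RE = re.compile(r"[A-Za-z0-9_']+|[" + re.escape(SIGN_CHARS) + r"]+|\S")


def scan(text):
    """Split text into words and maximal runs of sign characters; "*",
    "->" and "->?" come out as ordinary tokens, like any other SIGNS."""
    return TOKEN_RE.findall(text)


def parse_op_type(tokens):
    """Parse SORT * ... * SORT -> SORT (or ->?). The symbols get their
    role purely from their position right after a SORT (one token)."""
    args, i = [], 0
    while True:
        if i + 1 >= len(tokens) or tokens[i] in RESERVED:
            raise SyntaxError("expected a sort followed by '*', '->' or '->?'")
        args.append(tokens[i])
        sym = tokens[i + 1]
        if sym == "*":
            i += 2
        elif sym in ("->", "->?"):
            if i + 3 != len(tokens) or tokens[i + 2] in RESERVED:
                raise SyntaxError("expected exactly one result sort")
            return args, tokens[i + 2], sym == "->?"
        else:
            raise SyntaxError(f"unexpected {sym!r} after sort {tokens[i]!r}")
```

Here parse_op_type(scan("Nat * Nat ->? Nat")) gives (["Nat", "Nat"], "Nat", True): the scanner never needs to know in advance whether "*" or "->?" is special.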

> -------------------------------
>  
> > > - Is the operator {{ __ }} allowed, since in an application like
> > >   {{ ( a ) }}, from a grammatical point of view we have different IDs
> > >   "{{" and "}}" and the reference manual asks each ID to have balanced
> > >   occurrences of {.. } (or [... ]).
> > 
> > I don't understand your "grammatical" point of view: "{{" and "}}" are
> > simply separate TOKENs, and *not* valid as complete IDs.  The ID 
> > "{{ __ }}" is allowed, and since it is balanced wrt "{" and "}", so is
> > any TERM (or FORMULA) built from it.
> > 
> 
> "complete" ID ???
> 
> Probably only a question of wording (and apologies to those of you who will
> find this part quite esoteric and boring). What do we call an ID? A sequence
> of TOKEN-OR-PLACE (let us consider only mixfix), as in the grammar? In that
> case, for an application of a mixfix operator like {{ ( a ) }}, and because
> of the parentheses, I see three chunks ("ID" in the grammar... sorry), namely
> "{{", "a", "}}", separated by "(" and ")". With that (narrow) view, neither
> "{{" nor "}}" is balanced by itself, as required by the paragraph in C.4.
> 
> If what we call an ID in C.4 is indeed an operator/predicate name, like
> {{ __ }}, of course it is balanced.

I've adjusted the wording in C.4 to refer to a "declared ID".  The
"chunks" referred to above are instances of TOKEN, and not to be
parsed as complete IDs.
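A small Python sketch of the check on a declared ID, assuming that "balanced" means properly nested occurrences of braces and square brackets (not merely equal counts); the function name is hypothetical.

```python
# Closing bracket -> matching opening bracket.
MATCHING = {"}": "{", "]": "["}


def is_balanced(declared_id):
    """Check that braces and square brackets in a declared ID are
    properly nested (assumed to be what "balanced" requires)."""
    stack = []
    for ch in declared_id:
        if ch in "{[":
            stack.append(ch)
        elif ch in MATCHING:
            if not stack or stack.pop() != MATCHING[ch]:
                return False
    return not stack
```

The declared ID "{{ __ }}" passes, while the TOKEN chunks "{{" and "}}" would each fail; but the check is only applied to declared IDs, never to such chunks.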

> > > - I have doubts about the possibility of allowing [ and ] in IDs since
> > >   the symbols are also used for marking compound ids.
> > 
> > I had hoped that e.g. "[ __ ]" and "__ [ __ ]" would be valid IDs,
> > but that may well lead to problems when trying to group a list of IDs

(I meant TOKENs, interspersed with parentheses and other reserved symbols.)

> > into a TERM.  E.g., whether "f[elem]" is a constant compound ID or
> > mixfix notation for "__[__](f,elem)" cannot be determined
> > context-freely.  
> 
> Exactly.
> 
> > However, a solution there may be for context-free parsers to leave the
> > recognition of compound ids to the static analysis.  After all,
> > "f[elem]" is only legal as a compound id when it has been declared as
> > such (possibly by the renaming that is implicit in the instantiation
> > of a generic spec); in specs not using compound ids at all, e.g.,
> > those arising from translation of OBJ3 specs, it would be annoying to
> > have unnecessary restrictions on mixfix symbols.  
> > 
> > In a CASL sublanguage not allowing mixfix notation, however, terms
> > should be fully parsed, context-freely, including compound ids.  Maybe
> > a parser annotation could be provided, to declare whether mixfix
> > notation is to be used in a spec or not.
> 
> Parser annotation will not change what has been fixed in the context-free
> grammar.

That seems to be missing the point.  For those who avoid mixfix
notation one should perhaps provide a specialized parser that not only
rejects mixfix notation for applications, but also provides a complete
analysis of terms, including compound ids.  On the other hand, the
general parsers that allow mixfix notation - the ones currently being
developed, I hope - should avoid trying to do too much, probably NEVER
recognizing an application or a compound id, leaving this to a
subsequent context-sensitive analysis where the mixfix declarations
are available.
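A toy Python sketch of why "f[elem]" can only be resolved once the declarations are available: it lists every reading of a flat token sequence given the compound ids and mixfix ids in scope. It assumes each place "__" matches exactly one token (real mixfix grouping is far more involved), and the function names and declaration encoding are hypothetical.

```python
PLACE = "__"


def fits(pattern, tokens):
    """Each place matches exactly one token (a deliberate simplification)."""
    return len(pattern) == len(tokens) and all(
        p == PLACE or p == t for p, t in zip(pattern, tokens))


def readings(tokens, compound_ids, mixfix_ids):
    """List every reading of a flat token sequence allowed by the
    declarations in scope (both given as sets of token tuples)."""
    found = []
    if tuple(tokens) in compound_ids:
        found.append(("constant", "".join(tokens)))
    for pattern in sorted(mixfix_ids):
        if fits(pattern, tokens):
            args = [t for p, t in zip(pattern, tokens) if p == PLACE]
            found.append(("application", "".join(pattern), args))
    return found
```

With both "f[elem]" declared as a compound id and "__[__]" declared as a mixfix op, the tokens f [ elem ] have two readings; with only the mixfix op declared, there is just the application __[__](f,elem). A context-free parser cannot make this choice on its own.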

(I seem to recall that Bjarke discovered that Christophe's parser was
indeed a bit too eager in this respect, always recognizing f(x) as an
application even when f and (x) might be the arguments of a mixfix op
such as "__ __"; I thought that he'd already pointed this out to
the others developing parsers - otherwise, sorry for the lateness...)

Anyway, surely languages (e.g. SML) that allow users to declare the
precedence of infix operators can be parsed using standard technology,
so at least annotations should be able to influence the parsing!

One further point concerning mixfix parsing:

Thanks to linear visibility, one has the anomaly that a mixfix
application that parsed OK may become ambiguous due to later
declarations - even in the same basic spec!  It might be better to be
more strict, taking account of all the mixfix patterns declared in a
basic spec (as well as the local environment), regardless of their
order.  This slightly increases the chance of rejection due to
ambiguity of grouping, but ensures that rejection is independent of
the order of basic items.  Linear visibility can still be enforced by
subsequently rejecting use before declaration (except in datatype
declarations, of course...).  This should perhaps have been made
explicit somewhere in App C?
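A toy Python sketch of the two-phase check suggested above, assuming terms are flat token lists and that each place matches exactly one token (again far simpler than real mixfix grouping). The item encoding and function names are hypothetical.

```python
PLACE = "__"


def fits(pattern, term):
    """Each place matches exactly one token (a deliberate simplification)."""
    return len(pattern) == len(term) and all(
        p == PLACE or p == t for p, t in zip(pattern, term))


def check_basic_spec(items):
    """items: ("decl", pattern) or ("use", term) pairs in textual order.
    Returns (index, verdict) for each use."""
    # Phase 1: group uses against ALL patterns declared in the basic spec,
    # so that ambiguity does not depend on the order of the items.
    decls = [(j, p) for j, (kind, p) in enumerate(items) if kind == "decl"]
    verdicts = []
    for i, (kind, term) in enumerate(items):
        if kind != "use":
            continue
        matches = [j for j, p in decls if fits(p, term)]
        if not matches:
            verdicts.append((i, "no parse"))
        elif len(matches) > 1:
            verdicts.append((i, "ambiguous"))
        # Phase 2: enforce linear visibility separately.
        elif matches[0] > i:
            verdicts.append((i, "used before declaration"))
        else:
            verdicts.append((i, "ok"))
    return verdicts
```

For the items: declare "__ + __", use "x + y", then declare "__ + y", the use is reported as ambiguous here, although it would have been accepted under plain linear visibility; a use of "x + y" that precedes the only matching declaration is rejected in the second phase instead.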

> Frederic

-- Peter
_________________________________________________________
Dr. Peter D. Mosses             International Fellow  (*)

Computer Science Laboratory     mailto:mosses@csl.sri.com
SRI International               phone: +1 (650)  859-2200 ! CHANGED
333 Ravenswood Avenue           fax:   +1 (650)  859-2844
Menlo Park, CA 94025, USA       http://www.brics.dk/~pdm/

(*) on leave from DAIMI & BRICS, University of Aarhus, DK
    also affiliated to CS Department, Stanford University
_________________________________________________________