[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: concrete syntax problems [with mixfix notation]




Frederic wrote:
 
> Well : Peter now inclines on one way, Bjarke on the other. Honestly, I do
> not know for myself (probably differently as a user or as a parser
> implementer).

There are indeed arguments pulling in both directions.  Thanks to
Frederic for citing Christophe's examples below - had he put them in
his previous message, I'd probably not have wavered, as I (still)
find that they make a convincing argument in favour of DEFERRING all
mixfix grouping analysis until after the initial parse.  Thus I'm now
back to supporting what Bjarke has implemented in ASF+SDF...  

The bottom line is that I think that Frederic should simplify his
parser to stop recognizing applicative structure in atomic formulae
and terms.

> > From: "Bjarke Wedemeijer" <wedemeij@wins.uva.nl> (edited !!)
> > Christophe's approach is wrong since (sub)terms of the following
> > form "f(x)" are always recognized as possible function application. I
> > tested his parser in june. Maybe he made a new version already?
> 
> I didn't check but probably not, 

However, it appears that Christophe's analysis of "f(x)" is reported
as "f"("x"), and one can't see whether an application has been
recognized or not... somewhat at variance with the discussion of this
point in Section 3 of:

  http://www.lsv.ens-cachan.fr/~tronche/cofi/csf-proposal.html

> but this is not the problem: do we consider
> it as a bug or as the normal behavior ! Currently I do like Christophe
> for "f(x)" on purpose... I can revert the choice but we have to choose the
> desired behavior.
> The problem was evoked in Christophe's mail (March 11) with the following
> example and constraint:
> 
> - we want to make life as easy as possible for people/parsers who use only
> ordinary functional application, and to recognize it as a special case

I'm not so sure about the "as easy as possible"!

> - however things are not that easy since with
>     t gt s (x + y)  , s is probably  a function
>     f xx * (x + y)  , * is probably a binary operator not a unary function.
> here we rather need to know the whole signature before choosing what to
> accept.

I think that the latter is the crucial point here.

> The same is true for f(x) as noted in the current thread of messages.
> 
> > > Therefore, I would say that "f(x)" is to be recognized at
> > > parsing time as a potential function application, the fact that
> > > "f" does exist or not (with the adequate signature) being
> > > checked during "static semantics" checks ? 
> > 
> > I fully disagree. It is possible to do a correct recognisition at parsing
> > time (see my mixfix parser in ASF+SDF). As far as I recall  the problem 
> > in Christophe's parser is that he favours "f(x)" too much as a function
> > application. As you have shown are there some situations where "f(x)"
> > is a single constant f and a parenthesized constant x.
> 
> I know that it is possible to do differently and I did that in 
> Asspegique's parser long time ago. But what is "parsing time" ? 
> "Context-free" parsing ? For "parsing" do we need to know the 
> full signature ?

Let's agree that unless otherwise qualified, "parsing" is indeed
regarded as context-free, without knowledge of the declared symbols.

Perhaps we may use the phrase "mixfix analysis" to refer to the
subsequent context-sensitive analysis of the exact structure of terms?

> What is the meaning of precedence levels:
> 
> Choice A: If, given the current global signature, there is only one structure
> for terms with appropriate sort checking, then use it (the structure depends
> on the environement) . If there are more than one, use the precedence levels
> to discard some of them.

Forget about the sort checking: sorts are IGNORED, it is only the
patterns of tokens in the declared IDs that are used.

> Choice B: First use the precedence level to structure the terms. Then use
> the computed structure as the starting point for the rest.

In my view, "precedence" is a relation between constructs, not between
a construct and a string of tokens.  I think B above is quite strange.
Its only advantage seems to be that it gives the right result straight
away in the special case that no mixfix IDs are used at all.  But this
efficiency might be obtained in other ways, if needed; see below.

> For terms, A and B make difference only for parenthesis since one cannot
> distinguish "context-freely" between prefix/postfix/mixfix for instance.
> 
> The current document does not clearly say (in my view) what choice
> has been made.  For the structuring operations, this is clearly
> B. For Formulae, I think there is no choice. 

I guess you mean regarding just the built-in notation for quantification,
conjuction, etc. - applications of predicates may be mixfix, and raise
the same problems as terms, I believe.  BTW, formulae can also occur
within terms (but are clearly bracketed by "when" and "else").

> For terms ? The
> explanations is given like for the other items, but it is also
> written at the start of C.3 "the exact applicative structure fo
> these constructs is left unspecified here, since it is inherently
> context-dependent".

I agree that we have amend C.3 to clarify what is intended.

> Let's have a last chance to all agreed upon what we want.

We seem to agree now that if (context-free) parsing recognizes "f(x)"
as an application of "f" to "x", and some mixfix symbols have been
declared, this may turn out to be the wrong analysis.  The examples
given by Christophe suggest that this may even happen quite often.

So my proposal must be that (context-free) parsing should do NO
analysis at all of the applicative structure of atomic formulae and
terms (except between "when" and "else" in conditional terms).  The
result of parsing an atomic formula or term should be simply a string
of TOKENs, interspersed with parentheses, commas, and keywords.  Even
compound identifiers should be left unrecognized at this stage.
Optionally, one might introduce some nesting structure in the string,
as determined by parentheses, according to the grammar given in
App C.2.1; but it's not clear that this partial structure is at all
useful.

The subsequent (context-dependent) mixfix analysis should consider
only the potentially-legal applicative structures - involving
applications only of declared patterns of TOKENs.  The precedence
information indicated in App C.3 (together with any precedence and
associativity annotations) should then be used to discard applicative
structures that don't conform.  A proposal for rewording App C.3 is
appended (suggestions for improvements are welcome).

So how might one get the right result more efficiently when one knows
that no mixfix notation is used in a spec?  Only by SPECIALIZING the
composition of the general parser and the mixfix analyser to that
case, I think.  But this composition is equivalent to an alternative
version of the parser that recognizes only ordinary notation for
application (rejecting mixfix notation altogether) with no need for a
mixfix analysis phase; one may just as well implement it directly.

One could either leave the choice of parser(+mixfix analyser?) to the
user, or introduce a parsing annotation (to occur at the beginning of
the specification/library) that could be inspected and used to select
the right parser.  The latter approach would seem to be the more user-
friendly.

Personally, I wouldn't invest effort in optimizing the composition of
parsing and mixfix analysis for a restricted language unless its
inefficiency turns out to be a significant problem.  We may return to
this point if it turns out that we get lots of CASL specs not using
mixfix notation.  We should perhaps defer the introduction of the
related parsing annotation until then too.

> Depending on the answer, some work I have to do will move from the
> "yacc" side to a later tool (not context free) side, or the reverse.
> 
> Frederic

The consequences of my proposal would be for Frederic's parser to be
changed so as to always to treat terms and atomic formulae essentially
as lists of tokens etc., not recognizing function or predicate
applications.

Bjarke has already written a prototype mixfix analyser - but this will
now need to be made stand-alone.  Till has written a sort checker.
Hopefully, these can eventually be combined with the parsers being
developed to give several full CASL analysers.


PROPOSAL FOR REWORDING:

  C.3 Disambiguation

  The given grammar for input syntax is quite ambiguous. This section
  provides some precedence rules for disambiguation. 
! 
  At the level of structured specifications, ambiguities of grouping
  are resolved as follows:
  ...
  The declaration of infix, prefix, postfix, and general mixfix
  operation symbols may introduce further potential ambiguities, which
  are partially resolved as follows (remaining ambiguities have to be
  eliminated by explicit use of grouping parentheses in terms, or by use
! of parsing annotations).  Note, however, that the recognition of the   
! exact applicative structure of mixfix atomic formulae and terms is
! inherently context-dependent.  Consequently, the precedence rules 
! for atomic formulae and terms given below cannot be incorporated in
! context-free parsing, but must be left to a subsequent phase of the 
! analysis.  For specifications that use only the standard prefix
! notation for application of predicates and operations, the
! precedence rules below are irrelevant, and the applicative structure
! of atomic formulae and terms is unambiguous.  A parsing annotation 
! may be introduced to indicate whether a particular specification or
! library uses mixfix notation or not.

-- Peter
_________________________________________________________
Dr. Peter D. Mosses             International Fellow  (*)

Computer Science Laboratory     mailto:mosses@csl.sri.com
SRI International               phone: +1 (650)  859-2200 ! CHANGED
333 Ravenswood Avenue           fax:   +1 (650)  859-2844
Menlo Park, CA 94025, USA       http://www.brics.dk/~pdm/

(*) on leave from DAIMI & BRICS, University of Aarhus, DK
    also affiliated to CS Department, Stanford University
_________________________________________________________