
Re: Literal syntax



Dear Till,

> may I remind you that there is a proposal for the concrete syntax
> of literals (numbers, string, lists, ...) in CASL, available under
> 
> 
http://www.informatik.uni-bremen.de/~cofi/LiteralExtension/literal.[tex,dvi,ps,ps.gz]

The title, date, and label of the above document are the same as for
the original Summary-Changes document, which is a bit confusing.

The full text of the proposal for literals is reproduced below, to
facilitate citing parts of it in any further e-mail discussion.

> The deadline for comments is (as for the list of changes to the summary)
> 
>         Monday, July the 5th
> 
> So far we have not received any comments, but we would like
> to have a discussion on literal syntax: it is not likely
> that our proposal cannot be improved.

Sorry for the lateness of this response...

In summary: I'm sympathetic towards most of the proposed syntax for
literals themselves, but I'd much prefer to use annotations (rather
than attributes and new forms of datatype declarations) for linking
the literal syntax to declarations of operations.

In more detail (my comments are delimited by solid lines like this):
________________________________________________________________________

                                     CASL
                   The Common Algebraic Specification Language
                                     Summary
                      -- DRAFT LIST OF CHANGES: LITERALS --

                               by Till Mossakowski

                                    May 1999

  Changes

  App. C.2.1, p. C-4 and p. C-6

  Add:

            TERM   ::= ... | LITERAL

            TOKEN  ::= ... | DIGIT | ""

            NUMBER ::= DIGIT | NNUMBER

  App. C.4

  Replace:

       A WORD consists of a sequence of upper- and/or lower- case letters
       (A,...,Z and all the ISO Latin-1 national and accented letters
       except for the Icelandic `eth' and `thorn'), digits (0,...,9), and
       primes (').

  by:

       A WORD consists of a sequence of upper- and/or lower- case letters
       (A,...,Z and all the ISO Latin-1 national and accented letters
       except for the Icelandic `eth' and `thorn'), digits (0,...,9), and
       primes ('). It may not entirely consist of digits, nor may it
       contain an `E' preceding or following a digit. This special syntax
       allows the binary operation E to be used within floating point numbers
       without the need to insert spaces between the E and the digits.
________________________________________________________________________

I agree that it would be nice to allow standard notation like 2.7E6 to
be written without spaces.  But I don't see that this requires banning
the use of `E' following or preceding a digit in all words.  My
suggestion is to replace the last two sentences above by:

       It may not start with a digit (various sequences of characters
       starting with a digit are reserved for literal numbers).

This should simplify the grammar for WORD below considerably.  Perhaps
NNUMBER can then be eliminated altogether too, with single digits
being reserved for literal numbers?  

One may then add something like:

            FLOATING       ::= NUMBER . NUMBER 'E' OPT-SIGN NUMBER
            OPT-SIGN       ::= + | - | 

Use of this notation would require that an operation for E has been
declared.  (Without the above addition of FLOATING, 2.7E6 would have
to be written as `2.7E 6' or `2.7E+6'.)
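
To make the suggestion concrete, here is a rough sketch (in Python, not
CASL) of how a lexer might classify tokens once words are forbidden to
start with a digit.  The regular expressions are my own assumptions: they
cover only ASCII letters (the ISO Latin-1 letters are omitted), and the
function name `classify' is purely illustrative.

```python
import re

# Assumed token shapes (a sketch, not the proposal's grammar):
# a WORD may contain letters, digits and primes, but may not start with a digit;
# FLOATING is NUMBER . NUMBER 'E' OPT-SIGN NUMBER, as suggested above.
FLOATING = re.compile(r"[0-9]+\.[0-9]+E[+-]?[0-9]+")
FRACTION = re.compile(r"[0-9]+\.[0-9]+")
NUMBER = re.compile(r"[0-9]+")
WORD = re.compile(r"[A-Za-z'][A-Za-z0-9']*")


def classify(token):
    """Return the lexical class of a complete token, or "UNKNOWN"."""
    for name, pattern in (("FLOATING", FLOATING), ("FRACTION", FRACTION),
                          ("NUMBER", NUMBER), ("WORD", WORD)):
        if pattern.fullmatch(token):
            return name
    return "UNKNOWN"
```

With these assumptions, `2.7E6' is a single FLOATING token, while words
such as `E2' or `x2E3' remain ordinary WORDs, since no word can be
confused with the start of a number.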

It might be good to cater also for binary, octal, and hexadecimal
notation for numbers.  Isn't there a rather standard notation for
indicating the base, by prefixing a sequence of digits by `2#', `8#',
or `16#'?  It should be easy to add this (now or in some later
extension), provided that words are not allowed to start with a digit;
otherwise, one might need to remove further patterns of letters and
digits from WORD...
________________________________________________________________________


       This is specified by the grammar:

            WORD ::= E NON-DIGIT-WORD
                   | NON-E-WORD

            NON-E-WORD ::= NON-DIGIT-E-WORD
                         | DIGIT NON-E-WORD

            NON-DIGIT-E-WORD ::= NON-E-LETTER NNUMBER
                               | NON-E-LETTER DIGIT
                               | NON-E-LETTER WORD
                               | NON-E-LETTER

            NON-DIGIT-WORD ::= NON-DIGIT-E-WORD
                             | E NON-DIGIT-WORD


  Furthermore, add:

            FRACTION       ::= NUMBER . NUMBER
            NNUMBER ::= DIGIT DIGIT | DIGIT NNUMBER
            NUMBER ::= DIGIT | NNUMBER

            CHAR ::= " " | ! | '\"' | # | $ | ... | '\'' | ... | '\\' | ...
                   | \n | \t | \r | ...
                   | \000 ... | \255
            QUOTED-CHAR ::= ' CHAR '
            STRING ::= '"' CLOSE-STRING
            CLOSE-STRING ::= CHAR '"'
                           | CHAR CLOSE-STRING

       Only WORD, FRACTION, NNUMBER, QUOTED-CHAR, STRING are the
       non-terminals of the lexical syntax. The other non-terminals are
       just auxiliary.

       (N.B. There is a difficulty in using double quotation marks in the
       grammar: sometimes, they have to stand for themselves. In this
       case, I enclosed them into single quotes. Till)

  The following section is meant to be added as a separate section to 
  appendix C.


  SYNTAX FOR LITERALS

  In this section, several attributes for operations are introduced that can
  be used to provide a literal syntax for numbers, strings and lists.


  Literal syntax for numbers

       LITERAL ::= NNUMBER
________________________________________________________________________

Each single digit could be reserved too - or is it important to allow
the use of the constants `0' and `1' in non-numerical specifications?
________________________________________________________________________

       OP-ATTR ::= ... | concatdigits

  The attribute for declaring an operation to be used for concatenation of
  digits within a number is written `concatdigits'. Only one binary operation
  f within a specification SPEC is allowed to have this attribute, otherwise,
  the specification is ill-formed.

  The attribute has the effect that an NNUMBER of form d1 ...dn (where n>1 and
  each di is a DIGIT) is translated to the (abstract syntax of) the term
  f(f(...f(t_1,t_2) ...,t_n-1),t_n) where t_i is the abstract syntax tree for
  di.

  Vice versa, an abstract syntax tree corresponding to a term of the above
  form which is maximal (i.e., it is not a subterm of a larger term of the
  same form) is expected to be printed as d1 ...dn.
________________________________________________________________________

It seems that the proposed attribute cannot be merely part of concrete
syntax, but must be added to the abstract syntax of CASL too.  Thus
corresponding changes would be needed in Apps. A and B, and in the
text of the body of the summary.  This should perhaps have been stated
explicitly in the proposal.

An alternative might be to regard the proposed attribute as an
ANNOTATION.  Parsers and formatters could take account of it -
complaining if they found more than one such annotation - and there
would be no need to let it affect well-formedness: the abstract syntax
tree produced by the above translation would determine the entire
semantics of the specification.
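
As an illustration of treating this purely as a parsing and printing
matter, the translation can be sketched as a left fold (in Python, not
CASL).  Terms are modelled as nested tuples, and the operation name `f'
stands for whatever operation the annotation identifies; both are
assumptions made only for this sketch.

```python
from functools import reduce

DIGITS = "0123456789"


def number_to_term(digits, f="f"):
    """Translate d1...dn (n > 1) to f(f(...f(d1, d2)..., dn-1), dn)."""
    if len(digits) < 2 or any(d not in DIGITS for d in digits):
        raise ValueError("expected a sequence of at least two digits")
    # Left fold: each step wraps the term built so far with the next digit.
    return reduce(lambda left, d: (f, left, d), digits[1:], digits[0])


def term_to_number(term, f="f"):
    """Print a left-nested term of the above form back as d1...dn."""
    if isinstance(term, str):
        return term
    op, left, digit = term
    if op != f or not isinstance(digit, str):
        raise ValueError("term is not of the expected form")
    return term_to_number(left, f) + digit
```

Nothing in this sketch depends on well-formedness of the specification:
the tree it produces is an ordinary application of a declared operation.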

If there are vital reasons for preferring attributes to annotations
here, kindly explain them.

Similar objections apply to the other parts of the proposal below.
________________________________________________________________________

  If there is no operation with a `concatdigits' attribute, then an NNUMBER is
  not recognized as a well-formed LITERAL.

       LITERAL ::= ... | FRACTION
       OP-ATTR ::= ... | decimalpoint

  The attribute for declaring an operation to be used for evaluating the
  decimal point within a sequence of digits within a number is written
  `decimalpoint'. Only one binary operation f within a specification SPEC is
  allowed to have this attribute, otherwise, the specification is ill-formed.

  The attribute has the effect that a FRACTION of form n1.n2 (where each ni is
  a NUMBER) is translated to the (abstract syntax of) the term f(t_1,t_2)
  where ti is the abstract syntax tree for ni, i=1,2.

  Vice versa, an abstract syntax tree corresponding to a term of the above
  form which is maximal (i.e., it is not a subterm of a larger term of the
  same form) is expected to be printed as n1.n2.

  If there is no operation with a `decimalpoint' attribute, then a FRACTION is
  not recognized as a well-formed LITERAL.

  Due to the special status of the character E in the lexical syntax of WORDS,
  it is possible without introducing further syntax to write exponentiations of
  form FRACTION E NUMBER in the case that there is a binary infix operation E
  in the signature.
________________________________________________________________________

The point above is that you want to avoid the need for spaces around
the E.

If E is also used as a variable (as in E=mc^2 :-), the declaration of
E as an infix operation may give ambiguities in the presence of an
invisible `__ __' operation.  Since many users may want to import the
basic datatype for numbers, one should be careful to avoid making
assumptions about their general style of notation.

If there is no special status of E in the lexical syntax for words,
one could identify the binary operation to use to interpret E in
FLOATING (see above) in the same way as for the dot in FRACTION.
This would give extra uniformity to the proposal.
________________________________________________________________________

  Literal syntax for strings

       LITERAL ::= ... | STRING
       OP-ATTR ::= ... | concatchars

  The attribute for declaring an operation to be used for concatenation of
  characters within a string is written `concatchars'. Only one binary
  operation f within a specification SPEC is allowed to have this attribute,
  otherwise, the specification is ill-formed.

  The attribute has the effect that a STRING of form "c1 ...cn" (where n>1
  and each ci is a CHAR) is translated to the (abstract syntax of) the term
  f(t_1,f(t_2, ...f(t_n,t)...)) where ti is the abstract syntax tree for ci,
  and t is the abstract syntax tree for "".
________________________________________________________________________

Is "" supposed to be declared as a constant?  It would seem more
uniform to let the declared constant have an arbitrary name, and
identify it in the same way as the concatenation operation.
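
A rough sketch of what I mean (in Python, not CASL): the translation is a
right fold in which both the concatenation operation and the constant for
the empty string are supplied by name.  The names `f' and `eps' are
hypothetical, terms are modelled as nested tuples, and escape sequences
are ignored, so each Python character counts as one CHAR.

```python
def string_to_term(chars, concat="f", empty="eps"):
    """Translate "c1...cn" to f(c1, f(c2, ... f(cn, eps)...))."""
    term = empty
    # Right fold: start from the empty-string constant and prepend
    # the characters from last to first.
    for c in reversed(chars):
        term = (concat, c, term)
    return term


def term_to_string(term, concat="f", empty="eps"):
    """Print a right-nested term of the above form back as c1...cn."""
    chars = []
    while term != empty:
        op, c, term = term
        if op != concat:
            raise ValueError("term is not of the expected form")
        chars.append(c)
    return "".join(chars)
```

Here the empty string "" simply translates to the named constant, so no
special declaration of a constant called "" is needed.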
________________________________________________________________________

  Vice versa, an abstract syntax tree corresponding to a term of the above
  form which is maximal (i.e., it is not a subterm of a larger term of the
  same form) is expected to be printed as "c1 ...cn".

  If there is no operation with a `concatchars' attribute, then a STRING is
  not recognized as a well-formed LITERAL.


  Literal syntax for lists

       DATATYPE-DECL   ::= SORT "::=" ALTERNATIVE "|"..."|"
                                      ALTERNATIVE , LIST-BRACKETS
       LIST-BRACKETS ::= brackets SIGNS-BRACKETS PLACE SIGNS-BRACKETS

  The attribute for declaring a list-like syntax for a datatype is written
  `brackets b1 __ b2'. The enclosing datatype declaration declaring the sort
  s must consist of exactly two ALTERNATIVEs: the first has to be a constant
  c of arbitrary argument type, while the second is a constructor f with
  exactly two argument sorts, the second of which has to be s.
________________________________________________________________________

Although the above proposal allows the introduction of neat notation
for literals of various sorts of collections (e.g. {e1,...,en} for
sets, as well as [e1,...,en] for lists), I find the means used to
determine the translation quite ugly.  In particular, it seems strange
to REQUIRE the use of a DATATYPE-DECL; what about sub-languages that
don't support this construct?

Couldn't one instead use annotations for indicating that any outfix
operation `b1 __ b2' is a unary bracketing constructor, determining at
the same time the intended `nil' and `cons' (or `conc') operations
used in translating non-unary applications?  How the latter
constructors have been declared is then irrelevant.
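
For instance (a sketch in Python, not CASL), a parser could keep a table
built from such annotations, mapping each bracket pair to its `nil' and
`cons' operations.  The table contents and all operation names below are
hypothetical, and terms are modelled as nested tuples.

```python
# Hypothetical annotation table: bracket pair -> (nil operation, cons operation).
BRACKETS = {
    ("[", "]"): ("nil", "cons"),
    ("{", "}"): ("empty", "insert"),
}


def brackets_to_term(b1, b2, terms, annotations=BRACKETS):
    """Translate b1 t1,...,tn b2 (n >= 0) to cons(t1, cons(t2, ... cons(tn, nil)...))."""
    if (b1, b2) not in annotations:
        raise ValueError("no annotation for brackets %s __ %s" % (b1, b2))
    nil, cons = annotations[(b1, b2)]
    result = nil
    # Right fold over the element terms, starting from the nil constant.
    for t in reversed(terms):
        result = (cons, t, result)
    return result
```

The translation looks only at the annotation table, so it does not matter
whether `nil' and `cons' were introduced by a DATATYPE-DECL or by plain
operation declarations.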
________________________________________________________________________

  The attribute leads to an extension of the syntax for LITERALs:

       LITERAL ::= ... | b1 b2
                       | b1 TERM , ... , TERM b2

  A list of form b1 t1,...,tn b2 (where n>=0 and each ti is a TERM) is
  translated to the (abstract syntax of) the term f(u_1,f(u_2,
  ...f(u_n,d)...)) where ui is the abstract syntax tree for ti, and d is the
  abstract syntax tree for c.

  Vice versa, an abstract syntax tree corresponding to a term of the above
  form which is maximal (i.e., it is not a subterm of a larger term of the
  same form) is expected to be printed as "b1 t1,...,tn b2".
  ------------------------------------------------------------------------

I hope that the above may help you clarify your proposal.  I'm
intending to finalize the new release of the Summary by this coming
weekend, and announce it early next week.  I'll ask Bernd to decide
whether your (perhaps revised) proposal for literals should be
included in the new release or not.  In the meantime, I look forward
to your reactions to my comments.

Cheers,

-- Peter
_________________________________________________________
Dr. Peter D. Mosses             International Fellow  (*)

Computer Science Laboratory     mailto:mosses@csl.sri.com
SRI International               phone: +1 (650)  859-2200
333 Ravenswood Avenue           fax:   +1 (650)  859-2844
Menlo Park, CA 94025, USA       http://www.brics.dk/~pdm/

(*) on leave from DAIMI & BRICS, University of Aarhus, DK
    also affiliated to CS Department, Stanford University
_________________________________________________________