Go backward to 3 Character encodings
Go up to Top
Go forward to 5 Markup and display formats

4 SGML markup for CASL

SGML stands for standard generic markup language. Its basic idea is to provide a format for defining grammars that can be used to identify structural elements of texts. The unstructured parts will be left as character sequences between the structural elements. The more formal the text, the more of it can be captured as structural components and less as unstructured sequences of characters. Programming and specification languages are examples of very formal texts, so is an individual program or an individual specification.

SGML basically defines LL(1) grammars with very simple syntactic properties from the lexeme and upwards. The syntactic property is to some extent user definable. An SGML document consists of three parts: SGML parameter part, the DTD (document type definition), and the document contents itself. The SGML parameter part defines things such as character sets, length of SGML identifiers being used, whether shorthands should be allowed, etc. The DTD is perhaps the most important part of a document data base, as it defines what structural properties the body may have, i.e., the actual grammar for the body. The DTD is flexible in that probably most LL(1) notations may be defined using an appropriate DTD and utilising shorthands. An example is the use of "text" to e.g. stand for <quote>text</quote>. This use requires heavy coding of the DTD, so normally the <TAG>...</TAG> tagged block notation will prevail in a document body.

The DTD provides the document structure. What this should be is in general by no means obvious, and this is also non-obvious even in the case for the CASL language. Immediately one would define a DTD for the abstract syntax of CASL, using the symbols of the abstract grammar as suitable tags. This gives a fairly primitive markup for any specific specification. Alternatively, we know that the declarations of a specification actually defines a context-free grammar, which has every type-correct term as its language (basic construction of the term-algebra). This language (barring complications of overloading) can be made LL(1) quite easily¹. Thus it is feasible to encode the full structure of an individual specification as SGML code, with no unstructured text elements (except for comments, that is) in the body of the document. This requires a specific DTD for every specification, but this may probably be achieved using a technique called HUB files.

A CASL specification S can then be represented in SGML the following way: A generic CASL-DTD defines the CASL language as such, and the CASL specification augments this DTD, giving a combined S-CASL-DTD. The S-CASL-DTD defines all the syntactic declarations of the specification S. The body of the SGML-document contains the axioms and expressions of the specification, and the whole SGML-document represents the complete specification S.

This gives a (linear) concrete syntax for every CASL specification, which also is the specification's interchange format. The format is fully parsable using standard SGML tools, and may be presented by supplying various formatting information packages for standard SGML formatting tools. It is also possible to utilise standard SGML-editors to input and edit specifications (on the syntactic level), providing a more user-friendly interface to writing specifications. However, this format will be much more restrictive than BKB-CASL.

CoFI Note: T-1 ---- 7 April 1997.
Comments to Magne.Haveraaen@ii.uib.no