ERights Home data / common-syntax 
Back to: The Wysiwyg-ASCII Format No Next Sibling

Common Lexing Spec


Common Character Classes

Name        ::=  Definition               ASCII subset
hspace      ::=  ' ' | '\t'               ' ' | '\t'
whitespace  ::=  ' ' | '\t' | '\n'        ' ' | '\t' | '\n'
digit10     ::=  isDigit                  '0'..'9'
digit8      ::=                           '0'..'7'
digit16     ::=                           '0'..'9' | 'a'..'f' | 'A'..'F'
uric        ::=    IETF-URICs               'a'..'z' | 'A'..'Z' | '0'..'9'
                 | '\\' | '|' | '#'       | anyof("_$.-;/?:@&=+,!~*'()%\\|#")

The non-ASCII cases are not yet tested in the current implementations.

In the uric production, each '\\' (backslash) character is converted to '/', and each '|' (vertical bar) character is converted to ':'. Therefore, the possible semantic values associated with this production do not include the backslash or vertical bar characters.
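The conversion described above can be sketched in Python. This is only an illustration restricted to the ASCII subset of uric; the names is_uric and normalize_uric are hypothetical and do not come from any E implementation.

```python
# Punctuation in the ASCII subset of the uric class, copied from the table above.
URIC_PUNCT = set("_$.-;/?:@&=+,!~*'()%\\|#")


def is_uric(ch: str) -> bool:
    """True if ch is a single character in the ASCII subset of uric."""
    return len(ch) == 1 and ch.isascii() and (ch.isalnum() or ch in URIC_PUNCT)


def normalize_uric(text: str) -> str:
    """Return the semantic value of a run of uric characters (hypothetical helper).

    Each backslash becomes '/' and each vertical bar becomes ':', so the
    result never contains a backslash or a vertical bar.
    """
    for ch in text:
        if not is_uric(ch):
            raise ValueError(f"not a uric character: {ch!r}")
    return text.replace("\\", "/").replace("|", ":")
```

For example, normalize_uric applied to the characters c|\docs\a.txt yields c:/docs/a.txt.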

Common Token Types

In the following table, the names with initial capitals are the token types. The others are supporting productions.

Name       ::=  Definition                                    Denotes

digit10s   ::=  digit10 ('_'? digit10)*

Integer    ::=    '-'? '0x' digit16 ('_'? digit16)*           Precision-unlimited integer.
                | '-'? '0' ('_'? digit8)*  # not yet implemented
                / '-'? digit10s

wholePart  ::=  '-'? digit10s

fraction   ::=  '.' digit10s

exponent   ::=  ('e' | 'E') '-'? digit10s

Real64     ::=    wholePart fraction exponent?                A real number representable in
                | wholePart fraction? exponent                IEEE double precision.

charConst  ::=    '\\' anyof("btnfr\"'\\")
                | '\\x' digit16*2  # not yet implemented
                | !'\'' !'"' .

Char       ::=  '\'' (charConst | '"') '\''                   A Unicode character.

String     ::=  '"' (charConst                                A string of Unicode characters.
                     | '\''
                     | '\\' '\n'
                    )* '"'

In a literal string, a backslash followed by a newline is ignored -- the backslash eats the newline.
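As a rough illustration of the String production and the backslash-newline rule, the following Python sketch decodes a double-quoted literal. The function name decode_string_literal and its error handling are assumptions made for this example. The '\\x' escape, which the table marks as not yet implemented, is left out.

```python
# Escape letters from the charConst production and the characters they denote.
ESCAPES = {"b": "\b", "t": "\t", "n": "\n", "f": "\f", "r": "\r",
           '"': '"', "'": "'", "\\": "\\"}


def decode_string_literal(literal: str) -> str:
    """Decode a double-quoted literal, given with its surrounding quotes."""
    if len(literal) < 2 or literal[0] != '"' or literal[-1] != '"':
        raise ValueError("literal must be enclosed in double quotes")
    body = literal[1:-1]
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == '"':
            raise ValueError("unescaped double quote inside literal")
        if ch != "\\":
            out.append(ch)  # ordinary character, including an unescaped '
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("dangling backslash")
        nxt = body[i + 1]
        if nxt == "\n":
            pass  # the backslash eats the newline: nothing is emitted
        elif nxt in ESCAPES:
            out.append(ESCAPES[nxt])
        else:
            raise ValueError(f"unknown escape: \\{nxt}")
        i += 2
    return "".join(out)
```

A literal whose body is "one ", a backslash, a newline, and "two" decodes to "one two", with neither the backslash nor the newline in the result.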

Note that Real64 includes both 0.0 and -0.0. These are distinct, even though they represent the same real number.
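The Real64 production and the distinction between the two zeros can be sketched as follows. This is a minimal Python illustration using the ASCII digit subset; the names REAL64, parse_real64, and same_real64 are hypothetical, and Python's float stands in for an IEEE double.

```python
import math
import re

_DIGITS = r"[0-9](?:_?[0-9])*"   # digit10s
_WHOLE = rf"-?{_DIGITS}"         # wholePart
_FRACTION = rf"\.{_DIGITS}"      # fraction
_EXPONENT = rf"[eE]-?{_DIGITS}"  # exponent

# wholePart fraction exponent? | wholePart fraction? exponent,
# factored as: wholePart (fraction exponent? | exponent)
REAL64 = re.compile(rf"{_WHOLE}(?:{_FRACTION}(?:{_EXPONENT})?|{_EXPONENT})")


def parse_real64(text: str) -> float:
    """Parse text as a Real64 literal, rejecting anything else."""
    if REAL64.fullmatch(text) is None:
        raise ValueError(f"not a Real64 literal: {text!r}")
    return float(text.replace("_", ""))


def same_real64(a: float, b: float) -> bool:
    """Compare two parsed values so that 0.0 and -0.0 count as different."""
    return a == b and math.copysign(1.0, a) == math.copysign(1.0, b)
```

Under ordinary numeric comparison the values parsed from "0.0" and "-0.0" are equal, while same_real64 tells them apart by sign. A bare "12" is rejected here because it matches Integer rather than Real64.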

Rationale

We allow '_' (underbar) characters within digit sequences so that long digit sequences can be broken up for readability. For example, the number of cents in 1.3 million dollars can be written as "1_300_000_00". Perl numeric literals allow the same.
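To make the underbar rule concrete, here is a small Python sketch of the digit10s production, restricted to ASCII digits. The helper name parse_digit10s is an assumption for this example. Because the production is digit10 ('_'? digit10)*, an underbar may appear only between two digits, never at either end and never doubled.

```python
import re

# digit10s ::= digit10 ('_'? digit10)*
DIGIT10S = re.compile(r"[0-9](?:_?[0-9])*")


def parse_digit10s(text: str) -> int:
    """Return the value of a digit10s sequence, ignoring its underbars."""
    if DIGIT10S.fullmatch(text) is None:
        raise ValueError(f"not a digit10s sequence: {text!r}")
    # Python integers are arbitrary precision, matching "precision-unlimited".
    return int(text.replace("_", ""))
```

With this sketch, "1_300_000_00" parses to 130000000, while "_1", "1_", and "1__0" are rejected.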

For convenience, we allow but do not require single quotes to be escaped in double quoted literals, and vice versa.

For convenience, we allow multi-line string literals without per-line delimiters, even though reviewers can become confused about what they're looking at. Syntax highlighting SHOULD be used to make literals visibly distinct from non-literal source text during reviews.

 
Unless stated otherwise, all text on this page which is either unattributed or by Mark S. Miller is hereby placed in the public domain.