This document specifies what it means for a sequence of ASCII characters to be in Wysiwyg-ASCII Format. This spec is used as a component of the E family Common Syntactic Elements spec and adopts its Conformance language.

Typical use: For E family languages, the check of whether text conforms to Wysiwyg-ASCII Format is expected to occur after UTF-J4 Encoding (which produces ASCII containing Unicode escape sequences) and before UTF-J4 Decoding and newline canonicalization.

No Invisible Control Characters

Wysiwyg-ASCII text MAY contain the whitespace characters:
but MUST NOT contain any other control characters (characters whose Unicode general category is "Cc"). In particular, Wysiwyg-ASCII text MUST NOT contain any '\t' (tab) characters.

Ephemeral Newline Canonicalization

In the typical use of this spec, carriage returns will disappear later during newline canonicalization. So that the checks below see the same line structure, we unfortunately perform essentially[1] the same newline canonicalization calculation here, and throw its results away once conformance to Wysiwyg-ASCII is determined. The following steps occur after this ephemeral canonicalization.

No Trailing Whitespace

Wysiwyg-ASCII text MUST NOT contain the sequence ' ' '\n' (a space immediately followed by a newline).

Wysiwyg-ASCII text MUST end with a '\n'.

This last newline MUST NOT be immediately preceded by whitespace.
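The rules above can be combined into a single conformance check. The following Python sketch is illustrative only: the function names `wysiwyg_ascii_errors` and `canonicalize_newlines` are hypothetical; the set of permitted control characters ('\n' and '\r') is an assumption, since the whitespace list is not reproduced here; and the ephemeral canonicalization is assumed to map "\r\n" and a lone "\r" to "\n".

```python
import unicodedata

# Assumption: '\n' and '\r' are the only permitted control characters.
# Space is category "Zs", not "Cc", so it never trips the control check.
ALLOWED_CONTROLS = {"\n", "\r"}


def canonicalize_newlines(text: str) -> str:
    """Assumed ephemeral canonicalization: "\r\n" and lone "\r" become "\n"."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def wysiwyg_ascii_errors(text: str) -> list[str]:
    """Hypothetical checker: returns violations; an empty list means conformant."""
    errors = []
    for i, ch in enumerate(text):
        if ord(ch) > 0x7F:
            errors.append(f"non-ASCII character at index {i}")
        elif unicodedata.category(ch) == "Cc" and ch not in ALLOWED_CONTROLS:
            errors.append(f"control character {ch!r} at index {i}")

    # The canonicalized copy is used only for the checks below, then discarded.
    canon = canonicalize_newlines(text)
    if " \n" in canon:
        errors.append("space immediately followed by newline")
    if not canon.endswith("\n"):
        errors.append("text does not end with a newline")
    elif len(canon) >= 2 and canon[-2] in " \n":
        # Tabs and other controls are already rejected, so the only whitespace
        # that can precede the final newline is a space or another newline.
        errors.append("final newline is preceded by whitespace")
    return errors
```

For example, `wysiwyg_ascii_errors("a\r\nb\r\n")` returns an empty list, while `"abc\n\n"` is rejected. Under these assumptions, the last rule effectively forbids a trailing blank line at the end of the text.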
[1] It doesn't necessarily give the same results as the actual newline canonicalization, since the actual one is performed after UTF-J4 decoding, which can introduce, for example, new carriage return characters. Source text SHOULD NOT introduce carriage returns this way, so an advisor SHOULD issue an informative warning for all such cases. (This lends further weight to the argument that newline canonicalization should happen between UTF-J4 encoding and decoding. Does Java really do it after decoding Unicode escapes?)
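An advisor implementing the warning in the footnote might scan the still-encoded ASCII text for escapes that would decode to a carriage return. This Python sketch assumes that UTF-J4 escapes follow Java's `\uXXXX` rules: the escape's backslash must be preceded by an even number of backslashes, and one or more 'u' characters may follow it. That assumption and the function name `cr_escape_offsets` are hypothetical and are not taken from the UTF-J4 spec.

```python
import re

# Assumption: Java-style Unicode escapes. Group 1 consumes pairs of literal
# backslashes, so the escape's own backslash is preceded by an even count.
UNICODE_ESCAPE = re.compile(r"(?<!\\)((?:\\\\)*)\\u+([0-9A-Fa-f]{4})")

CARRIAGE_RETURN = 0x000D


def cr_escape_offsets(ascii_text: str) -> list[int]:
    """Hypothetical advisor check: offsets of escapes that decode to '\\r'."""
    offsets = []
    for m in UNICODE_ESCAPE.finditer(ascii_text):
        if int(m.group(2), 16) == CARRIAGE_RETURN:
            # m.end(1) is the index of the backslash that starts the escape.
            offsets.append(m.end(1))
    return offsets
```

For example, for the raw text `x = '\u000D';`, the sketch reports offset 5, while `\\u000D` (an escaped backslash followed by `u000D`) is not treated as an escape.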
Unless stated otherwise, all text on this page which is either unattributed or by Mark S. Miller is hereby placed in the public domain.