The Power of Scheme's Ellipses

/ elang / quasi / terms

The Power of Ellipses
Borrowed from Scheme

Like regular expressions, our quasi-literal term-tree patterns and expressions have quantifiers. These quanitifiers annotate the pattern or expression to its left.

'?' means optional -- zero or one occurrence.
'+' means one or more occurrences.
'*' means zero or more occurrences.

These quantifiers normally just annotate patterns. Why do we also apply them to expressions? In order to incorporate the expressiveness of Scheme's "..." notation. This is best explained by transposing Scheme's examples into term trees:

The text in the boxes below is from the Scheme FAQ.

5.6. How are ellipses (...) "counted" during macro expansion?

Section 4.3.2 of R5RS states

Pattern variables that occur in subpatterns followed by one or more instances of the identifier ... are allowed only in subtemplates that are followed by as many instances of .... They are replaced in the output by all of the subforms they match in the input, distributed as indicated. It is an error if the output cannot be built up as specified.

It is important to understand what is meant by "followed" in the above. Take for example the pattern

(a (b c ...) (d e ...) ...)

What matters is how many ellipsis structurally follow a variable rather than lexically. The variables a, b, c, d and e are followed by three, three, three, two and two ellipses respectively, but struturally they are followed by none, none, one, one and two ellipses respectively. The difference between the structural and lexical ellipsis counts arises because ellipses only apply to the pattern/template immediatetly preceeding them. Thus, the above pattern can supply the input for the template

((a c ...) (b d ...) (e ...) ...)

The structural ellipsis count of the template variables matches that of the pattern, whereas the lexical template count is completely different.

Likewise for term trees of course. For us, the corresponding example quasi-literal expression ("template" above) might be

term`[[$a, $c*], [$b, $d*], [$e*]*]`

Note that the square brackes are just syntactic sugar for a term with ".tuple." for a functor name. So the above is equivalent to:

term`.tuple.(.tuple.($a, $c*), .tuple.($b, $d*), .tuple.($e*)*)`

Here's a full example where we use a quasi-literal pattern ("pattern" above) to bind the variables, which we then use as input for the above expression ("template"):

? pragma.syntax("0.8")

? def term`[@a, [@b, @c*], [@d, @e*]*]` := term`[1, [2, 3, 4], [5], [6, 7, 8]]`
# value: term`[1,
#              [2, 3, 4],
#              [5],
#              [6, 7, 8]]`
 
? a
# value: term`1`
 
? b
# value: term`2`
 
? c
# value: [term`3`, term`4`]
 
? d
# value: [term`5`, term`6`]
 
? e
# value: [[], [term`7`, term`8`]]
 
? term`[[$a, $c*], [$b, $d*], [$e*]*]`
# value: term`[[1, 3, 4],
#              [2, 5, 6],
#              [],
#              [7, 8]]`

One issue that is not addressed by the R5RS explanation of macro expansion is how ellipses templates are expanded when they combine variables that matched input of different size. The canonical example for this is

(define-syntax foo
  (syntax-rules ()
    ((foo (a ...) (b ...)) '((a b) ...))))
(foo (1 2) (3 4 5)) ;=> ?

In different Schemes this is either an error or produces the result '((1 3) (2 4)), i.e. the "oversized" matches are automatically shortend as needed.

When these variables are different sizes, since this is a likely indicator of a bug, and because its easily repaired when it's intentional, E considers this case to be an error.

? def term`foo([@a*],[@b*])` := term`foo([1,2],[3,4,5])`
# value: term`foo([1, 2],
#                 [3, 4, 5])`
 
? a
# value: [term`1`, term`2`]
 
? b
# value: [term`3`, term`4`, term`5`]
 
? term`[[$a,$b]*]`
# problem: Failed: Inconsistent shape: 2 vs 3

So let's shorten b and see what success looks like

? def b2 := b(0,2)
# value: [term`3`, term`4`]
 
? term`[[$a,$b2]*]`
# value: term`[[1, 3],
#              [2, 4]]`

But when the variable is too flat, so to speak (insufficient dimensions or rank), E takes the same permissive attitude as shown by the following Scheme example.

Some Schemes relax the above "matching ellipses count" requirement by allowing allowing template variables to be followed by at least as many ellipses as their corresponding pattern variables. The resulting expansion repeats the matched input. Thus

(define-syntax foo
  (syntax-rules ()
    ((foo (a ...) (b ...) ...) '(((a b) ...) ...))))
(foo (1 2) (3 4) (5 6 7)) ;=> '(((1 3) (1 4)) ((2 5) (2 6) (2 7)))

produces the desired result. Note however that in any ellipses sub-template there must be at least one template variable for which the ellipsis count is exactly the same as in the pattern, otherwise the macro expander would not be able to determine how many times to repeat the matched input of the template variables with ellipsis counts higher than in the pattern.

? def term`foo([@a*], [@b*]*)` := term`foo([1,2], [3,4], [5,6,7])`
# value: term`foo([1, 2],
#                 [3, 4],
#                 [5, 6, 7])`
 
? a
# value: [term`1`, term`2`]
 
? b
# value: [[term`3`, term`4`], [term`5`, term`6`, term`7`]]
 
? term`[[[$a,$b]*]*]`
# value: term`[[[1, 3],
#               [1, 4]],
#              [[2, 5],
#               [2, 6],
#               [2, 7]]]`

Since literal data is repeated as many times as necessary, variables that are too flat, like a above, are treated in a way that's intermediate between the treatment of literal data and the treatment of variables, like b above, of adequate dimensionality.

What are some other good examples of the power of Scheme's "..." that we should also use here?

Acknowledgements

Who should we acknowledge for Scheme's "..." system? Darius Bacon writes:

I'm not sure, but the earliest I've seen it was in a Compiler Construction volume around 1972 -- there was a paper (by William Waite? can't remember) about an essentially equivalent rewriting system with a different syntax. The Scheme report did not acknowledge it, so I guess it's a reinvention.

(any further information about the origin of the idea would be appreciated)

We also thank Matthias Radestock, the editor of the FAQ we quote from above.

Dean Tribble, who suggested incorporating the power of Scheme's "..." into the quasi-literal handing of Term trees.

Unless stated otherwise, all text on this page which is either unattributed or by Mark S. Miller is hereby placed in the public domain.

/ elang / quasi / terms

ELib E Language Smart Contracts Related

Download FAQ API Mail Archive Donate

report bug (including invalid html)