REGULAR EXPRESSION TYPES FOR XML

                           Benjamin C. Pierce
                       University of Pennsylvania

The recent rush to adopt XML can be attributed, in part, to the hope that
the static typing provided by DTDs (or more sophisticated mechanisms such
as XML Schema) will improve the robustness of data exchange and
processing.  However, although XML DOCUMENTS can be checked for
conformance with DTDs, current XML processing languages offer no way of
verifying that PROGRAMS operating on these documents will always produce
conforming outputs.

We are designing a domain-specific language called XDuce for XML
processing.  The main novelties of XDuce are:

  1) A type system based on REGULAR EXPRESSION TYPES.  Regular expression
     types are a natural generalization of DTDs, describing structures in
     XML documents using regular expression operators (@*@, @?@, @|@,
     etc.) and supporting a powerful notion of subtyping.

  2) A corresponding notion of REGULAR EXPRESSION PATTERN MATCHING, which
     supports very concise patterns for extracting information "from the
     middle" of structured sequences.
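
To make the two ideas above concrete, here is a minimal sketch in Python.
It is NOT XDuce syntax and not the XDuce algorithm.  It assumes that a
document is a tree of (tag, children) pairs with text content omitted, and
that each tag's content model is an ordinary regular expression over the
tags of its children.  The address-book vocabulary and the names
CONTENT_MODELS, conforms, and tel_index are hypothetical, chosen only for
illustration.  The sketch covers DTD-style conformance and one
sequence pattern; it does not attempt subtyping.

```python
import re

# Hypothetical content models: each tag maps to a regular expression over
# the tags of its children, where every child tag is followed by a space.
CONTENT_MODELS = {
    "addrbook": r"(person )*",
    "person": r"name (email )*(tel )?",
    "name": r"",
    "email": r"",
    "tel": r"",
}


def child_tags(children):
    """Encode a sequence of (tag, children) nodes as a string of tags."""
    return "".join(child[0] + " " for child in children)


def conforms(node):
    """Check that a (tag, children) tree matches CONTENT_MODELS at every level."""
    tag, children = node
    if tag not in CONTENT_MODELS:
        return False
    if re.fullmatch(CONTENT_MODELS[tag], child_tags(children)) is None:
        return False
    return all(conforms(child) for child in children)


def tel_index(children):
    """Return the position of the optional tel child of a person, or None.

    The pattern skips a name and any number of email elements, so the tel
    is located without stepping through the sequence by hand.
    """
    pattern = r"(?P<before>name (email )*)(?P<tel>tel )?"
    m = re.fullmatch(pattern, child_tags(children))
    if m is None or m.group("tel") is None:
        return None
    # Each child before the tel contributes exactly one space.
    return m.group("before").count(" ")


person = ("person", [("name", []), ("email", []), ("email", []), ("tel", [])])
book = ("addrbook", [person])
```

With these definitions, conforms(book) holds, while a person whose tel
precedes its name is rejected.  Checking one document this way is the easy
part; the type system described above goes further by checking, at compile
time, that a program's outputs conform for all inputs.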

[Joint work with Haruo Hosoya and Jerome Vouillon, with lots of
contributions from Phil Wadler and Peter Buneman.]