Mappings between presentation markup and semantic markup for
variable size objects
Bill Naylor
Ontario Research Centre for Computer Algebra
The University of Western Ontario
London Ontario CANADA N6A 5B7
bill@orcca.on.ca
Abstract:
In this paper we give a brief overview of the paper "Meta Stylesheets
for the Conversion of Mathematical Documents into Multiple Forms" [1].
We propose an extension to the techniques of [1] in order to overcome
the problem of specifying notation for objects of varying size, e.g.
matrices or vectors. The extensions proposed in this paper are based on
ideas from regular expressions and XML Schema. We discuss the basics of
using XSLT to perform the calculations defined by the template
functions. We discuss how we may use the XSLT document function to
access external processes, which may be required to perform calculations
for which XSLT is ill-suited. Finally, we suggest a way in which a user
of the extended MathML we propose may specify different notation styles
at different levels within an object.
It is an unfortunate fact that due to the varied history of
mathematics, mathematical notation is ambiguous. There is a
many-to-many relationship between the meaning of a mathematical object
and the notation which mathematicians use to write it down (as noted in
chapter 4 of the MathML specification [2]). A document
marked up using the OpenMath ([4] and [3])
markup language is a possible alternative approach which does not
suffer from this problem. However, the OpenMath concept does not
address the issue of presentation markup. Ideally, we want some markup
which gives flexible presentation markup for semantically unambiguous
mathematics.
We discuss a possible solution to this quandary in our paper [1],
which proposes a method whereby a set of ntn files (which may be held
in a parallel directory structure to the OpenMath content dictionaries,
in a similar way to the sts small type system files) holds information
declaring a mapping between an OpenMath symbol and a variety of
different presentations for that symbol, marked up using presentation
MathML. These different presentations are matched against a semantic
template, which provides a prototype for the mathematical object. The
prototype will typically be an OMA element whose first child is the
symbol for which we are giving notations. The ordering of the
parameters for a symbol is preserved by means of xref and id attributes
on the presentation and the semantic side of the mapping respectively.
Example: 1
We give an example of the entries of the Notation element for the
BesselJ symbol, which detail the notation given in the traditional
style, $J_\nu(z)$, and that used by Maple 7, BesselJ(n,m):
<Notation>
<version precedence="10"> <!-- The default style -->
<math><mrow>
<msub><mi>J</mi>
<mrow>
<value xref="arg1">ν</value>
</mrow>
</msub>
<mo>⁡</mo>
<mfenced>
<value xref="arg2">z</value>
</mfenced>
</mrow></math>
</version>
<version precedence="10" style="maple"> <!-- The Maple Notation -->
<math><mrow>
<mi>BesselJ</mi><mo>⁡</mo>
<mfenced>
<value xref="arg1">n</value>
<value xref="arg2">z</value>
</mfenced>
</mrow></math>
</version>
<semantic_template><OMOBJ>
<OMA><OMS cd="bessel" name="BesselJ"/>
<OMV name="nu" id="arg1"/>
<OMV name="x" id="arg2"/>
</OMA>
</OMOBJ></semantic_template>
</Notation>
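The role of the xref and id attributes can be sketched in Python. This is only an illustration under stated assumptions, not the stylesheet mechanism of [1]: the helper names bind_arguments and instantiate are hypothetical, the Maple version is simplified (the invisible function application operator is left out), and wrapping every actual argument in an mn element is an assumption made for brevity.

```python
import xml.etree.ElementTree as ET

# Prototype from the semantic template: the ids name the argument slots.
TEMPLATE = """<OMA><OMS cd="bessel" name="BesselJ"/>
  <OMV name="nu" id="arg1"/>
  <OMV name="x" id="arg2"/>
</OMA>"""

# Simplified Maple-style version: each <value> points at a slot via xref.
VERSION = """<mrow>
  <mi>BesselJ</mi>
  <mfenced>
    <value xref="arg1">n</value>
    <value xref="arg2">z</value>
  </mfenced>
</mrow>"""


def bind_arguments(template_xml, actual_args):
    """Map each id in the prototype to the actual argument in the same position."""
    prototype = ET.fromstring(template_xml)
    slots = [child.get("id") for child in list(prototype)[1:]]  # skip the OMS head
    return dict(zip(slots, actual_args))


def instantiate(version_xml, bindings):
    """Replace every <value xref="..."> by an <mn> holding the bound argument."""
    root = ET.fromstring(version_xml)
    for parent in list(root.iter()):  # snapshot, since children are replaced below
        for index, child in enumerate(list(parent)):
            if child.tag == "value":
                number = ET.Element("mn")  # assumption: arguments are numbers
                number.text = str(bindings[child.get("xref")])
                number.tail = child.tail
                parent[index] = number
    return root


bindings = bind_arguments(TEMPLATE, [1, 2])
presentation = instantiate(VERSION, bindings)
print(ET.tostring(presentation, encoding="unicode"))
```

Because the binding is positional on the semantic side and by name on the presentation side, a version is free to display the arguments in any order without losing track of which is which.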
Occasionally part of the presentation of a notation for a particular
symbol does not correspond to any particular part of the prototype in
the semantic template. For example, the superscript n in the partial
derivative presentation
$$\frac{\partial^{n}}{\partial x^{n_1}\,\partial y^{n_2}} f(x,y)$$
is redundant (implicitly $n = n_1 + n_2$), and so has no corresponding
part in an efficient scheme for a semantic template. As a solution to
this problem, we have proposed that a template function, marked
up in OpenMath, should be present in the ntn file. The function
may have parameters, although the example above would have a nullary
template function. The calculation represented in the body of the
function is an (OpenMath) expression involving its parameters and
possibly references (via xref pointers) to the prototype. For the
example above the template function would be $n_1 + n_2$, where $n_1$
and $n_2$ would be references to objects in the prototype (the XML
markup for this particular case is given in Appendix 1). The particular
presentation which is required in a specific document will be specified
either by a particular attribute, or via a set of defaulting mechanisms
(which are detailed in [1]). This mechanism allows an author the
freedom to use the notation he (or she) wants, whilst retaining precise
mathematical meaning.
Example: 2
In this example we assume the availability of the Notation element
given in example 1. An author may select the Maple style of
notation by giving the following markup. It should be noted that the
ordering of the parameters will be as given in the respective content
dictionary. The following shall be displayed as BesselJ(1,2):^{1}
<xm:apply><xm:BesselJ style="maple"/>
<xm:mn> 1 </xm:mn> <xm:mn> 2 </xm:mn>
</xm:apply>
It would appear that these mechanisms are sufficient for providing a
presentation/semantics mapping for mathematical objects which take a
specific number of parameters. However, the situation is more complex
in the case of mathematical operators which take a variable, or
arbitrary, number of parameters, for example n-ary functions like plus,
which in OpenMath terminology is represented by the symbol:
<OMS cd="arith1" name="plus"/>
or a constructor of objects with no predetermined size, for example
the basic vector constructor, which in OpenMath is represented by the symbol:
<OMS cd="linalg2" name="vector"/>
Some attempt has been made to address this problem in the OMDoc [5]
specification with the introduction of the presentation element. The
style of the presentation may be controlled via a number of attributes
which govern various aspects of the presentation, e.g. the bracketing
style, fixity etc. However, notations exist for which there is no
obvious way to include the relevant presentation markup in a
presentation element, apart from including XSLT in an XML CDATA section
in the OMDoc. This is discouraged even in the OMDoc specification,
which recognizes that hand-coding XSLT is "a tedious and error-prone
process". An example of such a presentation is the
standard presentation for a simple continued fraction:
$$a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3 + \cdots}}}$$
A regular grammar allows specification of classes of sentences in a
string. For example:
- The regular expression ''.'' matches any single character.
- The regular expression ''.*'' matches any sequence of characters.
- The regular expression ''.\{4,6\}'' matches any sequence of
characters which is 4, 5 or 6 characters long.
It is matches of the final type in which we shall be primarily interested.
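The three kinds of match can be checked with Python's standard re module. This is a small sketch; note that Python writes the bounded repetition as .{4,6}, without the backslashes used in the basic regular expression syntax above, and the function name bounded is hypothetical.

```python
import re

# "." matches exactly one character.
assert re.fullmatch(r".", "a") is not None
assert re.fullmatch(r".", "ab") is None

# ".*" matches any sequence of characters, including the empty one.
assert re.fullmatch(r".*", "") is not None
assert re.fullmatch(r".*", "matrix") is not None


def bounded(text: str) -> bool:
    """True when text is a sequence of 4, 5 or 6 characters."""
    return re.fullmatch(r".{4,6}", text) is not None


for candidate in ["abc", "abcd", "abcdef", "abcdefg"]:
    print(candidate, bounded(candidate))
```

The bounded form is the one of interest here because it fixes a lower and an upper limit on the number of repetitions, which is exactly what is needed to describe an object of variable but known size.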
In a similar way XML Schema [6] allows
specification of a class of objects in an XML document in an XML
format. One may specify sequences of elements with particular types,
containing children of particular types. It is also possible in XML
Schema to restrict the names and values of attributes belonging to a
specific element. Text content of an element may be restricted in an
exact manner. The subset of the regular expression grammar that we are
especially interested in when expressed in the XML Schema framework,
is shown by the following example:^{2}
<xsd:sequence minOccurs="1" maxOccurs="5">
<xsd:element name="mn" type="xsd:decimal"/>
</xsd:sequence>
This fragment would match a sequence consisting of between 1 and 5
occurrences of the element "<mn> nn.nn </mn>", where nn.nn denotes some
decimal content.
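The effect of this fragment can be imitated in a few lines of Python. This is only a sketch of the occurrence constraint, not an XML Schema validator: the function name matches_sequence is hypothetical, and the pattern for decimal content is a simplified reading of the xsd:decimal lexical form.

```python
import re
import xml.etree.ElementTree as ET

# Simplified lexical form of xsd:decimal: optional sign, digits, optional fraction.
DECIMAL = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)")


def matches_sequence(fragment, low=1, high=5):
    """True when fragment is between low and high <mn> elements with decimal content."""
    items = list(ET.fromstring(f"<root>{fragment}</root>"))
    if not low <= len(items) <= high:
        return False
    return all(
        item.tag == "mn" and DECIMAL.fullmatch((item.text or "").strip())
        for item in items
    )


print(matches_sequence("<mn> 12.50 </mn><mn>3</mn>"))
print(matches_sequence("<mn>1</mn>" * 6))
```

Such a check only answers whether a sequence belongs to the class; it gives no name to the individual occurrences, which is the limitation addressed by the changes described next.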
We incorporate some of the ideas from XML Schema in the extended
MathML language we propose. However, we have necessarily made some
changes, for the following reason. In order to use the template
function technology outlined above, it is not sufficient merely to
have a way of specifying the sets of objects we would like to operate
on (as with XML Schema or regular grammars); we must also have a
handle on the individual instances of the objects. We also realise
that we do not require the full power of a regular grammar (or XML
Schema) for our purposes; for example, we do not require the ability
to choose members of a class.
3 Extension to the Specification Presentation MathML
With the considerations of the previous section in mind we propose a
slight change to the part of the XML Schema language that we shall
utilise. A DTD fragment detailing this is shown below:
<!ELEMENT sequence (minOccurs,maxOccurs,particle)>
<!ATTLIST sequence var NMTOKEN #REQUIRED>
<!ELEMENT minOccurs (#PCDATA)>
<!ATTLIST minOccurs val NMTOKEN #IMPLIED
xref NMTOKEN #IMPLIED>
<!ELEMENT maxOccurs (#PCDATA)>
<!ATTLIST maxOccurs val NMTOKEN #IMPLIED
xref NMTOKEN #IMPLIED>
<!ELEMENT particle ANY>
The meaning of, and the reasons for including, these different elements follow:
- The sequence element is the container for the
specification of a repeated sequence.
- The minOccurs and maxOccurs elements have similar purposes
to the minOccurs and maxOccurs attributes of XML Schema
and specify lower and upper bounds respectively to the values that the
identifier variable var may take. Together these two elements
imply the number of repetitions of the body within the particle
element which must occur.
- The particle element holds the body of the MathML fragment which
is to be repeated.
- The attribute var of the sequence element holds an
identifier to identify the repeated instances of the body held in the
particle element.
- The attribute val (or xref) of the elements minOccurs and maxOccurs
respectively holds the minimum and maximum values (or references to
template functions which calculate these values) taken by the
repetition index (identified by the attribute var).
Example: 3
We give an example of a Notation element for the n-ary
function plus.
<Notation name="plus">
<version precedence="10">
<math>
<value xref="itheltPlus">1</value>
<sequence var="i">
<minOccurs val="2"/><maxOccurs xref="sizePlus"/>
<particle>
<mo>+</mo><value xref="itheltPlus">i</value>
</particle>
</sequence>
</math>
</version>
<semantic_template>
<OMOBJ><OMA>
<OMS cd="arith1" name="plus"/>
<OMV name="seq" id="sequencePlus"/>
</OMA></OMOBJ>
<OMBIND id="sizePlus"><!-- return the length of a sequence -->
</OMBIND>
<OMBIND id="itheltPlus"> <!-- return the i'th element of a sequence -->
</OMBIND>
</semantic_template>
</Notation>
The template functions in the preceding example perform utility
operations such as selecting the i'th parameter of an n-ary operator,
or selecting the i'th element of a vector.
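How a processor might expand such a sequence element can be sketched in Python. This is a hedged illustration rather than the meta stylesheet approach of [1]: the function expand is hypothetical, the two template functions sizePlus and itheltPlus are stood in for by Python lambdas (their OpenMath bodies are left empty in the example above), every argument is assumed to be a number rendered with mn, and only flat particles are handled.

```python
import xml.etree.ElementTree as ET

NOTATION = """<math>
  <value xref="itheltPlus">1</value>
  <sequence var="i">
    <minOccurs val="2"/><maxOccurs xref="sizePlus"/>
    <particle><mo>+</mo><value xref="itheltPlus">i</value></particle>
  </sequence>
</math>"""


def expand(notation_xml, args):
    """Expand the sequence element for the given arguments into a flat mrow."""
    # Stand-ins for the template functions referenced by xref (assumed behaviour).
    functions = {
        "sizePlus": lambda: len(args),
        "itheltPlus": lambda i: args[i - 1],  # 1-based index into the arguments
    }
    out = ET.Element("mrow")

    def bound(elem):
        # A bound is either a literal (val) or computed by a template function (xref).
        if elem.get("val") is not None:
            return int(elem.get("val"))
        return functions[elem.get("xref")]()

    def emit(node, env):
        if node.tag == "value":
            key = node.text.strip()
            index = env[key] if key in env else int(key)
            ET.SubElement(out, "mn").text = str(functions[node.get("xref")](index))
        elif node.tag == "sequence":
            low = bound(node.find("minOccurs"))
            high = bound(node.find("maxOccurs"))
            for i in range(low, high + 1):
                for child in node.find("particle"):
                    emit(child, {**env, node.get("var"): i})
        else:
            ET.SubElement(out, node.tag).text = node.text

    for child in ET.fromstring(notation_xml):
        emit(child, {})
    return out


print(ET.tostring(expand(NOTATION, [1, 2, 3]), encoding="unicode"))
# prints <mrow><mn>1</mn><mo>+</mo><mn>2</mn><mo>+</mo><mn>3</mn></mrow>
```

With a single argument the range from minOccurs to maxOccurs is empty, so the particle is never emitted and only the first value appears.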
More complex situations, for example notation specification for the
structured matrices specified in the content dictionary linalg5,
involve complicated mappings between the presentation and semantics;
these often involve a two-dimensional nesting of the XML Schema-like
expressions.
It is sometimes necessary to specify different styles at different
nesting levels within the notation for some mathematical objects. For
example, it may be required to print a matrix with style="round" whose
entries are rational numbers which should appear with
style="display_style". This is no problem if all the content of the
matrix appears in the Extended MathML markup. For example, in order to
mark up a matrix which is to appear as
$$\begin{pmatrix} \dfrac{1}{2} & \dfrac{1}{3} \\ \dfrac{1}{5} & \dfrac{1}{7} \end{pmatrix}$$
(we are assuming that this is the display intended by the options
style="round" for matrix and style="display_style" for rationals),
one could use the following Extended MathML markup:
<xm:apply>
<xm:matrix style="round"/>
<xm:apply>
<xm:matrixrow/>
<xm:apply>
<xm:rational style="display_style"/>
<xm:mn>1</xm:mn><xm:mn>2</xm:mn>
</xm:apply>
<xm:apply>
<xm:rational style="display_style"/>
<xm:mn>1</xm:mn><xm:mn>3</xm:mn>
</xm:apply>
</xm:apply>
<xm:apply>
<xm:matrixrow/>
<xm:apply>
<xm:rational style="display_style"/>
<xm:mn>1</xm:mn><xm:mn>5</xm:mn>
</xm:apply>
<xm:apply>
<xm:rational style="display_style"/>
<xm:mn>1</xm:mn><xm:mn>7</xm:mn>
</xm:apply>
</xm:apply>
</xm:apply>
However we do have a problem if the elements of the output are
implicit in the style for that object. For example, the markup for
a 3x3 identity matrix with style="square" in Extended MathML would be:
<xm:apply>
<xm:identity style="square"/>
<cn>3</cn> <!-- N.B. this element does not occur in the presentation, -->
<!-- so we use content MathML. -->
</xm:apply>
This gives us no freedom to specialise the specification of the '0's
and '1's which will appear in the output for this markup. One could
give a different version associated with the identity symbol for each
different style of '0' or '1', but this was thought undesirable, as it
would mean that the number of version elements necessary to describe
every presentation for a symbol would be exponential in the depth of
implicit features in the notation. Another reason why we do not proceed
in this direction is that it is conceivable that there may exist
notations which have some recursive nature to their implicit features,
and this would imply that an infinite number of versions were
necessary. Clearly it is impossible to provide these. In order to
avoid this problem we propose supplying this information in the value
of the style attribute, which must be supplied to specify anything but
a default style.
The value of the style attribute should be a concatenation of the value
used to specify the style for the outermost element, the '['
character, a sequence of strings to specify styles for inner elements
separated by the ';' character, and the ']' character. The strings
used to specify inner elements should be modeled on the string
"name:val", where name is the name of the element and val is the value
used to specify the style attribute for this element (which may have
internal elements, in which case the same scheme is used recursively
to provide a value for val), i.e.
style="val[name1:val1;name2:val2]"
where val is the style value for the outermost element, and name1,
name2, etc. are the names of the inner elements, which have values
val1, val2, etc.
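For example, if it was required to display a 3x3 identity matrix as:
$$\begin{bmatrix} \mathbf{1} & \mathit{0} & \mathit{0} \\ \mathit{0} & \mathbf{1} & \mathit{0} \\ \mathit{0} & \mathit{0} & \mathbf{1} \end{bmatrix}$$
we would use the following markup: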
<xm:apply>
<xm:identity style="square[zero:italics;one:bold]"/>
<cn> 3 </cn>
</xm:apply>
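The recursive structure of the style value lends itself to a small recursive-descent parser. The following Python sketch is one possible reading of the scheme, under the assumption that names and values never contain the characters '[', ']', ';' or ':'; the function names parse_style and _parse are hypothetical.

```python
def parse_style(text):
    """Return (value, inner), where inner maps element names to parsed styles."""
    value, inner, pos = _parse(text, 0)
    if pos != len(text):
        raise ValueError(f"unexpected text at position {pos}: {text[pos:]!r}")
    return value, inner


def _parse(text, pos):
    # Read the style value up to the next delimiter.
    start = pos
    while pos < len(text) and text[pos] not in "[];:":
        pos += 1
    value = text[start:pos]
    inner = {}
    if pos < len(text) and text[pos] == "[":
        pos += 1
        while True:
            colon = text.index(":", pos)  # raises ValueError if "name:" is missing
            name = text[pos:colon]
            sub_value, sub_inner, pos = _parse(text, colon + 1)  # recursive case
            inner[name] = (sub_value, sub_inner)
            if pos >= len(text):
                raise ValueError("missing ']'")
            if text[pos] == "]":
                pos += 1
                break
            if text[pos] != ";":
                raise ValueError(f"expected ';' or ']' at position {pos}")
            pos += 1
    return value, inner, pos


print(parse_style("square[zero:italics;one:bold]"))
# prints ('square', {'zero': ('italics', {}), 'one': ('bold', {})})
```

A plain value such as "maple" parses to the value with an empty mapping, so the default single-level styles remain a special case of the same grammar.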
The main theme of the paper [1] concerns the process of producing meta
stylesheets to construct XSLT stylesheets [7] for converting documents
written in an extended MathML into another form, for example
presentation MathML. One of the most difficult problems encountered in
the construction of these meta stylesheets is how to deal with the
template functions. It is clear that different template functions may
require different meta stylesheets in order to perform the desired
translation (though not always, as certain mathematical objects are
structurally similar with respect to their components, e.g. consider a
vector and a list object). This could be seen as a never-ending job if
one is allowed total freedom in the OpenMath symbols used in the
template functions (especially since OpenMath is an extensible markup
mechanism). It is therefore necessary to specify a restricted
vocabulary for use within the template functions. It may also occur
that XSLT is ill-suited for performing some of the calculations
implied by the template functions. We may use the XSLT document
function in this case.
One use of the document function would be to execute, via a servlet, a
Java method designed to perform the required calculation. The Java
method could take XML elements as its parameters and return XML as its
return value, e.g. it might operate on an OpenMath expression and
return a minimal form.
By using the extensions suggested in section 3 to the scheme suggested
in [1], we may specify effective presentation/semantic mappings for
mathematical objects which are not of a fixed size. Given a database of
notation/content dictionary mappings written using this scheme, it
will then be possible to specify mathematical notation which has an
unambiguous meaning. We also suggest an effective markup which allows
specification of different styles for implicit objects appearing at
any level within a notation. With respect to implementing conversions
of this extended markup to some other form (for example, presentation
MathML), we suggest the use of the XSLT document function, in
conjunction with servlets, to communicate with external processes for
performing calculations which are ill-suited to XSLT.
^{3} We display a Notation element for the partialdiff symbol. The
version of the notation used here is that which will display as
$$\frac{\partial^{n}}{\partial x^{n_1}\,\partial y^{n_2}} f(x,y),$$
where the function f is differentiated with respect to x and y. The
orders of the differentiation are $n_1$ and $n_2$ with respect to x and
y respectively, and $n = n_1 + n_2$.
<Notation>
<version precedence="100">
<image src="partialDiffDefault.gif"/>
<tex>
\frac{\partial^
...
</tex>
<math>
<mrow>
<mfrac>
<msup><mo>∂</mo><mi xref="sum">n</mi></msup>
<mrow>
<msup>
<mrow><mo>∂</mo><mi xref="x">x</mi></mrow>
<msub xref="n1"><mi>n</mi><mn>1</mn></msub>
</msup>
<msup>
<mrow><mo>∂</mo><mi xref="y">y</mi></mrow>
<msub xref="n2"><mi>n</mi><mn>2</mn></msub>
</msup>
</mrow>
</mfrac>
<mi>f</mi>
<mo>⁡</mo>
<mfenced>
<mi xref="x">x</mi>
<mi xref="y">y</mi>
</mfenced>
</mrow>
</math>
</version>
<semantic_template>
<OMOBJ> <!-- this OMOBJ is the prototype -->
<OMA><OMS cd="calculus2" name="partialdiff"/>
<OMA><OMS cd="list1" name="list"/>
<OMA><OMS cd="list1" name="list"/>
<OMI id='x'> 1 </OMI>
<OMV name="n1" id="n1"/>
</OMA>
<OMA><OMS cd="list1" name="list"/>
<OMI id='y'> 2 </OMI>
<OMV name="n2" id="n2"/>
</OMA>
</OMA>
<OMV name="f" id="functionName"/> <!-- this is the function -->
</OMA>
</OMOBJ>
<!-- the following function specification calculates the sum of the two
orders located in the prototype -->
<OMBIND id="sum">
<OMS cd="fns1" name="lambda"/>
<OMBVAR>
</OMBVAR>
<OMA>
<OMS cd="arith1" name="plus"/>
<OMV name="n1" xref="n1"/>
<OMV name="n2" xref="n2"/>
</OMA>
</OMBIND>
</semantic_template>
</Notation>
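To illustrate how the nullary template function "sum" could be evaluated against a concrete instance, here is a minimal Python sketch. It is an assumption-laden illustration, not the XSLT-based processing described in the paper: the names OPERATIONS, evaluate and apply_template_function are hypothetical, the vocabulary is restricted to the single symbol arith1:plus, and the values bound to n1 and n2 are supplied directly as a dictionary instead of being extracted from a document.

```python
import xml.etree.ElementTree as ET

SUM_FUNCTION = """<OMBIND id="sum">
  <OMS cd="fns1" name="lambda"/>
  <OMBVAR></OMBVAR>
  <OMA>
    <OMS cd="arith1" name="plus"/>
    <OMV name="n1" xref="n1"/>
    <OMV name="n2" xref="n2"/>
  </OMA>
</OMBIND>"""

# Restricted vocabulary: only symbols listed here may occur in a function body.
OPERATIONS = {("arith1", "plus"): sum}


def evaluate(node, prototype_values):
    """Evaluate an OpenMath expression, resolving xref pointers into the prototype."""
    if node.tag == "OMI":
        return int(node.text)
    if node.tag == "OMV":
        return prototype_values[node.get("xref")]
    if node.tag == "OMA":
        head, *operands = list(node)
        operation = OPERATIONS[(head.get("cd"), head.get("name"))]
        return operation([evaluate(operand, prototype_values) for operand in operands])
    raise ValueError(f"unsupported element: {node.tag}")


def apply_template_function(function_xml, prototype_values):
    """Evaluate the body (the OMA child) of a nullary template function."""
    body = ET.fromstring(function_xml).find("OMA")
    return evaluate(body, prototype_values)


# Orders 2 in x and 1 in y give the superscript n = 3.
print(apply_template_function(SUM_FUNCTION, {"n1": 2, "n2": 1}))
```

Keeping the table of permitted operations explicit mirrors the need for a restricted vocabulary within template functions: any symbol outside the table is rejected with a KeyError, and nothing is silently ignored.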
We give some definitions of symbols which have been used in the
preceding, but which are not part of the standard OpenMath content
dictionaries at present.
- partialdiff: This symbol represents the partial differentiation
of a function with respect to a number of variables. The arguments
should be given in the following form:
  - first argument - A list of pairs, where the first element of
    each pair is an index to the variable with respect to which the
    differentiation is taken, and the second element is the order of
    that differentiation.
  - second argument - The function on which the partial
    differentiation is taking place.
[1] Bill Naylor, Stephen Watt: Meta Stylesheets for the Conversion of
Mathematical Documents into Multiple Forms, Proceedings of the first
International Workshop on Mathematical Knowledge Management,
September 2001
[2] MathML: http://www.w3.org/TR/MathML2
[3] O. Caprotti, D. P. Carlisle, A. M. Cohen: The OpenMath Standard,
February 2000
[4] The OpenMath Society: http://www.openmath.org/
[5] OMDoc, A standard for Open Mathematical Documents:
http://www.mathweb.org/omdoc/
[6] XML Schema: http://www.w3.org/XML/Schema
[7] XSLT: http://www.w3.org/TR/xslt
Footnotes
- ...BesselJ(1,2):^{1} In this markup it is assumed that the prefix
xm has been bound (in an enclosing element) to a URI indicating
the Extended MathML namespace.
- ... example:^{2} The xsd prefixes which appear on some elements are
assumed to be bound to the XML Schema namespace,
http://www.w3.org/2001/XMLSchema. This declaration should appear in
some ancestor element.
- ... ^{3} We note here that the symbol partialdiff from the content
dictionary calculus2 is not part of the OpenMath standard at present.
Bill Naylor
2002-03-30