1. Introduction
The goal of XML is to provide many of SGML′s benefits not available in HTML and to provide them in a language that is easier to learn and use than complete SGML. These benefits include user-defined tags, nested elements, and an optional validation of document structure with respect to a Document Type Descriptor (DTD). 数据挖掘研究院
One important application of XML is the interchange of electronic data (EDI) between two or more data sources on the Web. Electronic data is primarily intended for computer, not human, consumption. For example, search robots could integrate automatically information from related sources that publish their data in XML format, e.g., stock quotes from financial sites, sports scores from news sites; businesses could publish data about their products and services, and potential customers could compare and process this information automatically; and business partners could exchange internal operational data between their information systems on secure channels. New opportunities will arise for third parties to add value by integrating, transforming, cleaning, and aggregating XML data. In this paper, we focus on XML′s application to EDI. Specifically, we take a database view, as opposed to document view, of XML. We consider an XML document to be a database and a DTD to be a database schema. 数据挖掘研究院
EDI applications require tools that support the following tasks:
- extraction of data from large XML documents,
- conversion of data between relational or object-oriented databases and XML data,
- transformation of data from one DTD to a different DTD, and/or
- integration of multiple XML data sources.
Data extraction, conversion, transformation, and integration are all well-understood database problems. Their solutions rely on a query language, either relational (SQL) or object-oriented (OQL). We present a query language for XML, called XML-QL, which we argue is suitable for performing the above tasks. XML-QL has the following features: 数据挖掘研究院
- It is declarative.
- It is ``relational complete′′; in particular, it can express joins.
- It is simple enough that known database techniques for query optimization, cost estimation, and query rewriting could be extended to XML-QL.
- It can extract data from existing XML documents and construct new XML documents.
- It can support both ordered and unordered views on an XML document.
An initial draft of the query language is a W3C note, http://www.w3.org/TR/NOTE-xml-ql/. 数据挖掘研究院
One salient question is why not adapt SQL or OQL to query XML. The answer is that XML data is fundamentally different than relational and object-oriented data, and therefore, neither SQL nor OQL is appropriate for XML. The key distinction between data in XML and data in traditional models is that is XML is not rigidly structured. In the relational and object-oriented models, every data instance has a schema, which is separate from and independent of the data. In XML, the schema exists with the data as tag names. For example, in the relational model, a schema might define the relation person with attribute names name and address, e.g., person(name, address). An instance of this schema would contain tuples such as ("Smith", "Philadelphia"). The relation and attribute names are separate from the data and are usually stored in a database catalog.
In XML, the schema information is stored with the data. Structured values are called elements. Attributes, or element names, are called tags, and elements may also have attributes whose values are always atomic. For instance, <person><name>Smith</name><address>Philadelphia</address></person>. is well-formed XML. Thus, XML data is self-describing and can naturally model irregularities that cannot be modeled by relational or object-oriented data. For example, data items may have missing elements or multiple occurrences of the same element; elements may have atomic values in some data items and structured values in others; and collections of elements can have heterogeneous structure. Even XML data that has an associated DTD is self-describing (the schema is always stored with the data) and, except for restrictive forms of DTDs, may have all the irregularities described above. Most importantly, this flexibility is crucial for EDI applications. 数据挖掘研究院
Self-describing data has been considered recently in the database research community. Researchers have found this data to be fundamentally different from relational or object-oriented data, and called it semistructured data. Semistructured data is motivated by the problems of integrating heterogeneous data sources and modeling sources such as biological databases, Web data, and structured text documents, such as SGML and XML. Research on semistructured data has addressed data models, query-language design, query processing and optimization, schema languages, and schema extraction. The key observation in this paper is that XML data is an instance of semistructured data.
In designing XML-QL, we drew from other query languages for semistructured data [1, 2, 5]: tutorials describing some of the work on semistructured data can be found in [3] and [4]. XML-QL includes most features found in these languages, but it differs from all of them in several important respects. Specifically, this paper makes the following contributions: 数据挖掘研究院
- We propose a data model for XML data that extends the semistructured-data model with order. This extension is necessary for XML documents, which are ordered.
- We design a syntax for XML-QL that combines elements of the XML syntax with traditional query-language syntax.
- We propose a novel semantics for XML-QL to support order in the input and output data.
- We combine two powerful data-construction mechanisms, nested queries and Skolem functions, in a novel way.
- We illustrate that XML-QL can be used for the tasks it has been designed for, such as data extraction, transformation, and integration.
In Sec. 2, we introduce XML-QL through examples, and in Sec. 3 , we describe formally its underlying data models. We show in Sec. 4 how XML-QL can be used for the tasks of data extraction, translation, and integration. Sec. 5 describes future extensions and discusses open issues.
2. Examples in XML-QL
The simplest XML-QL queries extract data from an XML document. Our example XML input is in the document www.a.b.c/bib.xml, and we assume that it contains bibliography entries that conform to the following DTD:
<!ELEMENT book (author+, title, publisher)> <!ATTLIST book year CDATA> <!ELEMENT article (author+, title, year?, (shortversion|longversion))> <!ATTLIST article type CDATA> <!ELEMENT publisher (name, address)> <!ELEMENT author (firstname?, lastname)> 数据挖掘研究院 |
Matching Data Using Patterns. XML-QL uses element patterns to match data in an XML document. This example produces all authors of books whose publisher is Addison-Wesley in the XML document www.a.b.c/bib.xml. Any URI (uniform resource identifier) that represents an XML-data source may appear on the right-hand side of IN.
WHERE <book> <publisher><name>Addison-Wesley</name></publisher> <title> $t </title> <author> $a </author> </book> IN "www.a.b.c/bib.xml" CONSTRUCT $a |
For convenience, we abbreviate </element> by </>. This abbreviation is a relaxation of the XML syntax that will be handy later, when we introduce tag variables and regular expressions over tags. Thus the query above can be rewritten as:
WHERE <book>
<publisher><name>Addison-Wesley</></>
<title> $t </>
<author> $a </>
</> IN "www.a.b.c/bib.xml"
CONSTRUCT $a 数据挖掘研究院
|
WHERE <book>
<publisher> <name>Addison-Wesley </> </>
<title> $t </>
<author> $a </>
</> IN "www.a.b.c/bib.xml"
CONSTRUCT <result>
<author> $a </>
<title> $t </>
</> 数据挖掘研究院
|
<bib>
<book year="1995">
<!-- A good introductory text -->
<title>An Introduction to Database Systems </title>
<author><lastname>Date </lastname></author>
<publisher><name>Addison-Wesley </name > </publisher>
</book>
<book year="1998">
<title>Foundation for Object/Relational Databases</title>
<author><lastname>Date </lastname></author>
<author><lastname>Darwen </lastname></author>
<publisher><name>Addison-Wesley </name > </publisher>
</book>
</bib> 数据挖掘研究院
|
| Figure 1. Example bibliographic data |
<result> <author><lastname>Date </lastname></author> <title>An Introduction to Database Systems </title> </result> <result> <author><lastname>Date </lastname></author> <title>Foundation for Object/Relational Databases</title> </result> <result> <author><lastname>Darwen </lastname></author> <title>Foundation for Object/Relational Databases</title> </result> |
WHERE <book > $p </> IN "www.a.b.c/bib.xml", <title > $t </> <publisher> <name> Addison-Wesley </> </> IN $p CONSTRUCT <result> <title> $t </> WHERE <author> $a </> IN $p CONSTRUCT <author> $a </> </> 数据挖掘研究院 |
Since binding the content of an element and matching patterns in the bound element is a common idiom, we introduce a syntactic shorthand that preserves the original nesting. The CONTENT_AS $p following a pattern binds the content of the matching element to the variable $p: 数据挖掘实验室
WHERE <book> <title> $t </> <publisher> <name> Addison-Wesley </> </> </> CONTENT_AS $p IN "www.a.b.c/bib.xml" CONSTRUCT <result> <title> $t </> WHERE <author> $a </> IN $p CONSTRUCT <author> $a </> </>
On the example data, the query would produce:
<result>
<title> An Introduction to Database Systems </title>
<author> <lastname> Date </lastname> </author>
</result>
<result>
<title> Foundation for Object/Relational Databases</title>
<author> <lastname> Date </lastname> </author>
<author> <lastname> Darwen </lastname> </author>
</result> 数据挖掘研究院
|
WHERE <article>
<author>
<firstname> $f </> // firstname $f
<lastname> $l </> // lastname $l
</>
</> CONTENT_AS $a IN "www.a.b.c/bib.xml",
<book year=$y>
<author>
<firstname> $f </> // join on same firstname $f
<lastname> $l </> // join on same lastname $l
</>
</> IN "www.a.b.c/bib.xml",
y > 1995
CONSTRUCT <article> $a </> 数据挖掘实验室
|
WHERE <article> <author> <firstname> $f</> // firstname $f <lastname> $l</> // lastname $l </> </> ELEMENT_AS $e IN "www.a.b.c/bib.xml" ... CONSTRUCT $e
3. A Data Model for XML
To better understand XML-QL, it is necessary to describe its data models. We describe an unordered data model in detail first and then an ordered data model. 数据挖掘研究院
Unordered Model. To understand XML-QL′s data model, consider the example book elements in Figure 1. The first record has three elements (one title, one author, and one publisher) and the second record has two authors. The XML document, however, contains additional information that is not directly relevant to the data itself, such as, 数据挖掘研究院
- the comment at the beginning of the first book element,
- the fact that that title precedes authors, and
- the fact that the 1995 book precedes the 1998 book.
This information is not always relevant to an EDI application that ignores comments and treats its data as unordered structured values. We assume that a distinction can be made between information that is intrinsic to the data and information, such as document layout specification that is not. Our unordered model ignores comments and relative order between elements, but preserves all other essential data. These compromises simplify XML-QL′s semantics and can effect the efficiency of a query interpretor.
Definition. An unordered XML Graph consists of:
- A graph, G, in which each node is represented by a unique string called an object identifier (OID),
- G′s edges are labeled with element tags,
- G′s nodes are labeled with sets of attribute-value pairs,
- G′s leaves are labeled with one string value, and
- G has a distinguished node called the root.
For example, the example data would be represented by the XML graph in Figure 2. Attributes are associated with nodes and elements are represented by edge labels. We use the terms node and object interchangeably. We omit object identifiers to reduce clutter.
|
|
It is important to note that XML graphs are not only derived from XML documents, but are also generated by queries.
Ordered Model. An ordered XML graph is an XML graph in which there is a total order on all nodes in the graph. For graphs constructed from XML documents a natural order for nodes is their document order. Given a total order on nodes, we can enforce a local order on the outgoing edges of each node. In an ordered model, we may have arbitrarily many edges with the same source, same edge label, and same destination value. For example, the example data would be represented by the ordered graph in Figure 3. Nodes are labeled by their index (parenthesized integers) in the total node order and edge labels are labeled with their local order (bracketed integers). In Section 4.4, we discuss XML-QL′s features that support the ordered model.
|
|
3.1 Element Identity, IDs, and ID References
To support element sharing, XML reserves an attribute of type ID (often called ID) to specify a unique key for an element. An attribute of type IDREF allows an element to refer to another element with the designated key, and an attribute of type IDREFS may refer to multiple elements. In the data model, we treat these attributes differently from all others. For example, assume attributes ID and author have types ID and IDREFS respectively:
<!ATTLIST person ID ID #REQUIRED> <!ATTLIST article author IDREFS #IMPLIED>
For example, in the XML fragment below, the first element associates the keys o123 and o234 with the two <person> elements, and the second defines an <article> element whose authors refer to the these persons. 数据挖掘研究院
<person ID="o123"> <firstname>John</firstname> <lastname>Smith<lastname> </person> <person ID="o234"> . . . </person> <article author="o123 o234"> <title> ... </title> <year> 1995 </year> </article> 数据挖掘研究院
In an XML graph, every node has a unique object identifier (OID), which is the ID attribute associated with the corresponding element or is a generated OID, if no ID attribute exists. Elements can refer to other elements by IDREF or IDREFS attributes. 数据挖掘研究院
Unlike all other attributes, which are name-value pairs associated with nodes, an IDREF attribute is represented by an edge from the referring element to the referenced element; the edge is labeled by the attribute name. ID attributes are also treated specially, because they become the node′s OID. For example, the elements above are represented by the XML graph in Fig. 4:
|
| Figure 4: Representation of IDREF attributes in XML graph. |
WHERE <article><author><lastname>$n</></></> IN "abc.xml" 数据挖掘研究院
It is also possible to write explicit join expressions on ID and IDREF values to access referenced objects. For example, the following query produces all lastname, title pairs by joining the author element′s IDREF attribute value with the person element′s ID attribute value.
WHERE <article author=$i> <title></> ELEMENT_AS $t </>, <person ID=$i> <lastname></> ELEMENT_AS $l </> IN "abc.xml" CONSTRUCT <result> $t $l</>
We note here that the idiom <title></> ELEMENT_AS $t binds $t to a <title> element with arbitrary contents. The element expression <title/> matches a <title> element with empty contents. 数据挖掘研究院
We note that although we choose to represent both elements and IDREF attributes by labeled edges it is possible, given an XML graph generated from an XML document, to recover the original document. To do this, we distinguish between the types lexically, by assuming that all attributes of type IDREF are distinct from all tags in the document; this restriction could be enforced by the DTD. Therefore, given a graph node with multiple incoming edges, we can distinguish between the node′s unique element name and the (possibly) multiple IDREF attributes that refer to the node.
3.2 Scalar Values
Only leaf nodes in the XML graph may contain values, and they may have only one value. As a consequence, the XML fragment: 数据挖掘研究院
<title>A Trip to <titlepart> the Moon </titlepart></title>
cannot be represented directly as an XML graph. In such cases, we introduce extra edges to enforce the invariant of one value per leaf node. The data above would be transformed into:
<title><CDATA>A Trip to <CDATA><titlepart><CDATA> the Moon <CDATA></titlepart></title>
which is modeled by the XML graph:
|
| Figure 5. Representation of CDATA values. |
3.3 Mapping XML Graphs into XML Documents
XML graphs may result from parsing an XML document or from querying or transforming an existing XML graph. In general, XML graphs do not have a unique representation as an XML document, because
- element order is unspecified (in the unordered model), i.e., we have to choose an order for the outgoing edges of each graph node, and
- shared nodes with multiple incoming edges do not have a unique representation in XML.
Both issues can be handled when the graph is emitted as an XML document. Given a DTD for a query′s result, an XML graph can be emitted as an XML document that conforms to that DTD and therefore all elements will be ordered. We do not address the algorithmic issues of externalizing an XML graph here. 数据挖掘研究院
4. Advanced Examples in XML-QL
The following examples introduce advanced features of XML-QL. These include tag variables, which allow a variable to be bound to any edge (element) in an XML graph, and regular-path expressions, which can specify arbitrary paths through an XML graph. We also illustrate how XML-QL can be used to integrate and transform XML data.
4.1 Tag Variables
XML-QL supports querying of element tags using tag variables, which support querying of the data′s schema. For example, the following query finds all publications published in 1995 in which Smith is either an author or an editor:
WHERE <$p>
<title> $t </title>
<year> 1995 </>
<$e> Smith </>
</> IN "www.a.b.c/bib.xml",
$e IN {author, editor}
CONSTRUCT <$p>
<title> $t </title>
<$e> Smith </>
</>
|
4.2 Regular-path Expressions
XML data can specify nested and cyclic structures, such as trees, directed-acyclic graphs, and arbitrary graphs. Element sharing is specified using element IDs and IDREFs as described above. Querying such structures often requires traversing arbitrary paths through XML elements. For example, consider the following DTD that defines the self-recursive element part:
<!ELEMENT part (name brand part*)>
<!ELEMENT name CDATA>
<!ELEMENT brand CDATA> 数据挖掘研究院
Any part element can contain other nested part elements to an arbitrary depth. To query such a structure, XML-QL provides regular-path expressions, which can specify element paths of arbitrary depth. For example, the following query produces the name of every part element that contains a brand element equal to Ford, regardless of the nesting level at which r occurs.
WHERE <part*> <name> $r </> <brand> Ford </> </> IN "www.a.b.c/bib.xml" CONSTRUCT <result> $r </> |
<part*> <name> $r </> <brand> Ford </> </>
is equivalent to the union of the following infinite sequence of patterns: 数据挖掘研究院
<name> $r </> <brand> Ford </> <part> <name> $r </> <brand> Ford </> </> <part> <part> <name> $r </> <brand> Ford </> </> </> <part> <part> <part> <name> $r </> <brand> Ford </> </> </> </> . . .
The wildcard $ matches any tag and can appear wherever a tag is permitted. For example, this query is like the one above, but matches any sequence of elements, not just part:
WHERE <$*> <name>$r</> <brand>Ford</> </> IN "www.a.b.c/bib.xml", CONSTRUCT <result>$r</> 数据挖掘研究院
We abbreviate $* by *. Also, . denotes concatenation of regular expressions, hence the pattern <*.brand> Ford </> would match the brand Ford at any depth in the XML graph. The notation *.brand is reminiscent of Unix-like wild-cards in file names: it means ``anything followed by brand′′. 数据挖掘研究院
XML-QL′s regular-path expressions provide the alternation (|), concatenation (.), and Kleene-star (*) operators, similar to those used in regular expressions. A regular-path expression is permitted wherever XML permits an element. For example, this query considers any sequence of nested <part> elements and produces any nested <subpart> element or any <piece> element contained in a nested <component> element. The + operator means ``one or more′′, e.g., part+ is equivalent to part.part*.
WHERE <part+.(subpart|component.piece)>$r</> IN "www.a.b.c/parts.xml" CONSTRUCT <result> $r</> 数据挖掘实验室 |
4.3 Transforming XML data with Skolem functions
An important use of XML-QL is transforming XML data, for example, from one DTD into another. To illustrate, assume that in addition to our bibliography DTD, we have a DTD that defines a person: 数据挖掘实验室
<!ELEMENT person (lastname, firstname, address?, phone?, publicationtitle*)> 数据挖掘实验室
We can write a query to transform data that conforms to the bibliography DTD into data that conforms to the person DTD. This query extracts authors from the bibliography database and transforms them into <person> elements conforming to the person DTD:
WHERE <___FCKpd___25gt; <author> <firstname> $fn </>
<lastname> $ln </>
</>
<title> $t </>
</> IN "www.a.b.c/bib.xml",
CONSTRUCT <person ID=PersonID($fn, $ln)>
<firstname> $fn </>
<lastname> $ln </>
<publicationtitle> $t </>
</> 数据挖掘实验室
|
Note that in our discussion of Skolem functions we assume an unordered data model. The ordered data model has a more complex and somewhat less intuitive semantics, which we describe in Sec. 4.5.
4.4 Integrating data from multiple XML sources
In XML-QL, we can query several sources simultaneously and produce an integrated view of their data. In this example, we produce all pairs of names and social-security numbers by querying the sources www.a.b.c/data.xml and www.irs.gov/taxpayers.xml. The two sources are joined on the social-security number, which is bound to ssn in both expressions. The result contains only those elements that have both a name element in the first source and an income element in the second source.
WHERE <person> <name></> ELEMENT_AS $n <ssn> $ssn </> </> IN "www.a.b.c/data.xml", <taxpayer> <ssn> $ssn </> <income></> ELEMENT_AS $i </> IN "www.irs.gov/taxpayers.xml" CONSTRUCT <result> <ssn> $ssn </> $n $i </> 数据挖掘研究院 |
{ WHERE <person>
<name> </> ELEMENT_AS $n
<ssn> $ssn </>
</> IN "www.a.b.c/data.xml"
CONSTRUCT <result ID=SSNID($ssn)> <ssn> $ssn </> $n </>}
{ WHERE <taxpayer>
<ssn> $ssn </>
<income> </> ELEMENT_AS $i
</> IN "www.irs.gov/taxpayers.xml"
CONSTRUCT <result ID=SSNID($ssn)> ssn> $ssn </> $i </>}
|
This query contains, in addition to the previous query, all persons that are not taxpayers, and vice versa. This is the outer join of "www.a.b.c/data.xml" and "www.irs.gov/taxpayers.xml".
XML-QL queries are structured into blocks and subblocks: the latter are enclosed in braces ({ ... }), and their semantics is related to that of outer joins in relational databases. In this example, the root block is empty and there are two subblocks. In the first subblock, all persons′ names are produced. Each result has a unique OID given by that person′s SSN. In the second subblock, persons and their incomes are produced. Wherever there is a match with a previously generated OID, the <income> will be appended to the object rather than included in a new <person>. 数据挖掘研究院
Query blocks are powerful. The following query retrieves all titles published in 1995 from the bibliography database, and it also retrieves the publication month for journal articles and the publisher for books.
WHERE <$e> <title> $t </><year> 1995 </> </> CONTENT_A $p IN "www.a.b.c/bib.xml"
CONSTRUCT <result ID=ResultID($p)> <title> $t </> </>
{ WHERE $e = "journal-paper", <month> $m </> IN $p
CONSTRUCT <result ID=ResultID($p)> <month> $m </> </>
}
{ WHERE $e = "book", <publisher>$q </> IN $p
CONSTRUCT <result ID=ResultID($p)> <publisher>$q </> </>
}
|
4.5 Support for ordered model
Recall that in the ordered data model, there exists a total order on nodes, usually document order, which induces a local order on outgoing edges of each node. In Figure 3, nodes are labeled by their index (parenthesized integers) in the total node order and edge labels are labeled with their local order (bracketed integers). The order of the outgoing edges is determined by the order of their destination nodes.
Under the ordered data model, the patterns in the WHERE clause are still interpreted as unordered. For example, the pattern<a> <b> $x </b> <c> $y </c> </a> will match both the XML data : <a> <b> string1 </b> <c> string2 </c> </a> and the data : <a> <c> string2 </c> <b> string1 </b> </a>.
<a> <b> string1 </b> <c> string2 </c> <b> string3 </b> <c> string4 </c> </a> 数据挖掘实验室 |
$x $y string1 string2 string1 string4 string3 string2 string3 string4 |
<a> <c> $y </c> <b> $x </b> </a>
then the variables are bound in a different order, sorted first by $y, then by $x.
XML-QL supports element-order variables, which extract the index of some element. For example, the patterns: 数据挖掘实验室
<a[$i]> ... </>
<$x[$j]> ... </> 数据挖掘实验室
bind $i or $j to an integer 0, 1, 2, ... that represents the index in the local order of the edges. For example, the following query retrieves all persons whose lastname precedes the firstname:
WHERE <person> $p </> in "www.a.b.c/people.xml", <firstname[$i]> $x </> in $p, <lastname[$j]> $y </> in $p, $j < $i CONSTRUCT <person> $p </> 数据挖掘实验室
WHERE <pub> $p </> in "www.a.b.c/bib.xml", <title> $t </> in $p, <year> $y </> in $p <month> $z </> in $p ORDER-BY value($y), value($z) CONSTRUCT <result> $t </>
The value function produces the CDATA value of a node. In the absence of value function, the results would be ordered by the OIDs of the nodes bound to $y, $z. For a more complex example, the following query reverses the order of all authors in a publication:
WHERE <pub> $p </> in "www.a.b.c/bib.xml" CONSTRUCT <pub> WHERE <author[$i]> $a </> in $p ORDER-BY $i DESCENDING CONSTRUCT <author> $a </> WHERE <$e> $v </> in $p, $e != "author" CONSTRUCT <$e> $v </> </pub>
4.6 Function definitions and DTD′s
XML-QL supports functions, which take XML documents as arguments. Functions can increase query reuse, because the same query can be applied to multiple documents. This code defines a function that takes two XML documents as inputs:
FUNCTION findDeclaredIncomes($Taxpayers, $Employees) {
WHERE <taxpayer> <ssn> $s </>
<income> $x </> </> IN $Taxpayers,
<employee> <ssn> $s </>
<name> $n </> </> IN $Employees
CONSTRUCT <result> <name> $n </>
<income> $x </> </>
}
|
findDelcaredIncomes("www.irs.gov/taxpayers.xml", "www.a.b.c/employees.xml") 数据挖掘研究院
The result of a function is always well-formed XML data. Function arguments and results can be restricted by DTDs. For example, the function above could be re-written as:
FUNCTION findDeclaredIncomes($Taxpayers:"www.irs.gov/tp.dtd",
$Employees:"www.employees.org/employees.dtd") :
"www.my.site.com/myresult.dtd" {
WHERE ...
CONSTRUCT ...
}
which specifies that the two inputs must conform to tp.dtd and employees.dtd respectively and that the function′s result must conform to myresult.dtd. Determining that the output of a query conforms to the output DTD statically, i.e., at query compile time, is an open research problem and the subject of future work. Currently, the XML-QL query processor checks at runtime that the input documents conform to their respective DTDs. 数据挖掘研究院
5. Future work
Future work on XML-QL falls into two categories: extending the expressiveness of the language and providing better support for existing XML standards.
5.1 Query language features
We expect to support aggregates like those provided in SQL. For example, the following query retrieves the lowest and highest price for each publisher:
WHERE <book>
<publisher> </> ELEMENT_AS $p
<price> $x </>
</>
GROUP-BY $p
CONSTRUCT <result ID=f($p)> $p
<lowest-price> $min($x) </>
<highest-price> $max($x) </>
 

