1. Introduction
To enable resource discovery of audiovisual documents over the WWW, it will be necessary to define content description standards, or metadata standards, for complex, multi-layered, time-dependent, information-rich data streams. This is the primary goal of MPEG-7, the "Multimedia Content Description Interface" [1], currently under development by the MPEG group.
A number of papers have considered the application of Dublin Core (DC) and the Resource Description Framework (RDF) to video indexing [2, 3, 4, 5]. An example of such an application is described briefly below. However, very little work has been done on defining schemas which are capable of actually validating and constraining video descriptions and their associated data models. Such schemas will be necessary for the development of cost-efficient, user-friendly, semi-automatic metadata generation and editing tools for video. Such a schema would also provide a solution for the Description Definition Language (DDL) component of the MPEG-7 requirements.
This paper first briefly presents a video description scheme based on Dublin Core and MPEG-7. From this description format, a list of schema requirements is generated. The paper then compares the ability of a number of existing schemas and schema proposals, including the RDF Schema, XML DTDs, DCD and SOX, to support descriptions of hierarchical video structures. Examples of schema definitions are given to illustrate their capabilities.
Finally, this paper proposes a hybrid schema, based on specific features from each of these schemas and schema proposals, which would satisfy the MPEG-7 Description Definition Language (DDL) requirements.
--------------------------------------------------------------------------------
2. Proposed Video Description Scheme
Dublin Core was designed specifically for generating metadata to facilitate the resource discovery of textual documents. Although a number of workshops have been held to discuss the applicability of Dublin Core to non-textual documents such as images, sound and moving images, they have primarily focused on extensions to the 15 core elements through the use of subelements and schemes specific to audiovisual data, to describe bibliographic-type information rather than the actual content.
It has been shown [2] that it is possible to describe both the structure and fine-grained details of video content by using the fifteen Dublin Core elements plus qualifiers and encoding this within RDF. This "pure Dublin Core" approach provides multiple levels of descriptive information. At the top level, the 15 basic Dublin Core elements can be used to describe the bibliographic-type information about the complete document (e.g. Title, Author, Contributor, Date etc.). This enables non-specialist inter-disciplinary searching, independent of media type. Extensions or qualifiers to specific DC elements (Type, Description, Relation, Coverage) can be applied at the lower levels (scenes, shots, frames) to provide fine-grained, discipline- and media-specific searching (e.g. Description.Camera.Angle). The disadvantage of this approach is that the semantic refinement of Dublin Core through the use of qualifiers eventually leads to a loss of semantic interoperability.
The alternative is a "hybrid" approach in which RDF (or some other framework) is used to combine both simple unqualified Dublin Core and MPEG-7 descriptors within a single description container. Dublin Core can be used for generic media-independent search and retrieval, while MPEG-7 can be used for object-specific fine-grained queries. Our future research will compare and evaluate these two approaches for multimedia resource discovery and determine the best balance between semantic interoperability, extensibility and modularity. At this stage, we don't know the specific attributes of each level; we can only assume that each structural component will possess a set of Dublin Core attributes plus a set of MPEG-7 attributes, as illustrated in Figure 1 below.
For example, if DC.Type = "Image.Moving.TV.News.Scene" then valid descriptors will include both the DC simple elements plus MPEG-7 descriptors such as script, transcript, editlist, keyframe etc. If DC.Type = "Image.Moving.TV.News.Scene.Shot" then valid descriptors will include both the DC elements plus keyframe, camera_distance, camera_angle, camera_motion, opening_transition, closing_transition. If DC.Type = "Image.Moving.TV.News.Scene.Shot.Frame" then the valid descriptors will be the DC elements plus colour_histogram.
Figure 1 shows the logical structure, the structural components and their associated Dublin Core attributes and some assumed MPEG-7 attributes for the proposed video description scheme.
Figure 1: Multilayered Hierarchical Structure and Attributes of Video
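The dependency between DC.Type and the set of valid descriptors can be sketched as a simple lookup. This is only an illustration under stated assumptions: the function names valid_descriptors and invalid_descriptors are hypothetical, and the level-specific sets contain only the descriptors named in the examples above (the scene list ends in "etc." in the text, so it is incomplete here).

```python
# The 15 simple Dublin Core elements, valid at every level.
DC_ELEMENTS = frozenset({
    "Title", "Creator", "Subject", "Description", "Publisher",
    "Contributor", "Date", "Type", "Format", "Identifier",
    "Source", "Language", "Relation", "Coverage", "Rights",
})

# Level-specific descriptors, limited to those named in the text above.
# The real MPEG-7 descriptor sets are not yet known (assumption).
LEVEL_DESCRIPTORS = {
    "Image.Moving.TV.News.Scene": {
        "script", "transcript", "editlist", "keyframe",
    },
    "Image.Moving.TV.News.Scene.Shot": {
        "keyframe", "camera_distance", "camera_angle", "camera_motion",
        "opening_transition", "closing_transition",
    },
    "Image.Moving.TV.News.Scene.Shot.Frame": {"colour_histogram"},
}


def valid_descriptors(dc_type):
    """Return the DC elements plus the descriptors specific to dc_type."""
    return DC_ELEMENTS | LEVEL_DESCRIPTORS.get(dc_type, set())


def invalid_descriptors(dc_type, used):
    """Return the descriptors in `used` that are not valid for dc_type."""
    return set(used) - valid_descriptors(dc_type)
```

A description of a frame that uses camera_angle would be flagged by invalid_descriptors, whereas the same descriptor on a shot would be accepted.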
--------------------------------------------------------------------------------
3. Video Metadata Schema Requirements
In order to represent the video structure and Dublin Core descriptors outlined in Figure 1, a suitable schema must be able to support the following:
Hierarchical structure definitions. The schema must be able to constrain the structure to a precise hierarchy in which complete video documents sit at the top level. These in turn contain sequences, which contain scenes, which contain shots, which contain frames, which contain objects or actors. Figure 1 illustrates this hierarchy.
Each level (or class) within the hierarchy must be constrained to possess only specific attributes. In our description scheme, we assume that each layer possesses the 15 simple, optional and extensible DC elements plus a set of class-specific attributes unique to that layer. These represent the set of MPEG-7 descriptors for that class when they become available.
Element and attribute inheritance. It should be possible to specify sub-classing with inheritance of attributes and elements from the upper to lower classes. In addition, sub-classes should be able to have their own additional attributes and elements. This allows efficient reuse and customization of document schemas.
Data Typing. It must be possible to constrain the values of attributes to certain data types. Data types supported should include primitive data types as well as Schemes (e.g. SMPTE), enumerated data types, controlled vocabularies, file types (images), URIs and complex data types (e.g. colour histograms, 3D vectors, graphs, RGB values etc.). It should also be possible to specify multiple alternative schemes or data types for a particular attribute.
Cardinality. It must be possible to specify that an attribute can have zero, one or multiple values. Ideally the minimum and maximum number of occurrences should also be specifiable, e.g. a scene must contain between 2 and 5 shots.
Spatio-temporal specifications. The schema must be able to support the specification of temporal characteristics, e.g. begin and end time of segments and their duration. Similarly, it should be able to support spatial representation, e.g. regions within an image or motion along a line.
Spatial, temporal and conceptual relations. Spatial relations such as neighbouring objects and temporal relations such as sequential or parallel segments should be supported. Given such a relationship between two classes, it should also be possible to constrain specific attribute values of these classes. For example, the start and end times of scenes contained within a sequence, must lie within the start and end time of that sequence.
Human-readability. It is desirable rather than mandatory that both the schema and the description output from the schema should be human-readable.
Availability of supporting technologies such as parsers (capable of validating input descriptions), databases and query languages.
These requirements are similar to, and compatible with, the DDL requirements listed in section 4.1.1 of the MPEG-7 Requirements Document [7].
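Two of the requirements above, cardinality bounds and the temporal containment constraint, can be made concrete with a small checker. The following is a minimal sketch, not part of any of the schemas discussed: the Segment class, the CHILD_BOUNDS table and the check function are hypothetical names, and the bound of 2 to 5 shots per scene is simply the example given in the cardinality requirement.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Segment:
    """A node in the video hierarchy (sequence, scene, shot, ...)."""
    kind: str
    start: float
    end: float
    children: List["Segment"] = field(default_factory=list)


# Hypothetical (min, max) bounds on the number of children per parent kind.
# A max of None means unbounded.
CHILD_BOUNDS = {"scene": (2, 5)}


def check(segment):
    """Return a list of constraint violations found in segment's subtree."""
    problems = []
    low, high = CHILD_BOUNDS.get(segment.kind, (0, None))
    count = len(segment.children)
    if count < low or (high is not None and count > high):
        problems.append(
            f"{segment.kind}: {count} children outside [{low}, {high}]"
        )
    for child in segment.children:
        # Temporal containment: a child must lie within its parent.
        if child.start < segment.start or child.end > segment.end:
            problems.append(
                f"{child.kind} [{child.start}, {child.end}] lies outside "
                f"{segment.kind} [{segment.start}, {segment.end}]"
            )
        problems.extend(check(child))
    return problems
```

A schema language that satisfies the requirements would let such rules be declared rather than coded, which is the capability the following sections look for.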
--------------------------------------------------------------------------------
4. Resource Description Framework (RDF) Schema
The Resource Description Framework (RDF) enables interoperability between applications which exchange machine-understandable information on the Web. A model for representing metadata, as well as an XML-based syntax for encoding RDF, has been defined in the RDF Model and Syntax Specification document [8].
RDF is based on a resource and property data model system. A collection of classes (typically authored for a specific purpose or domain) and the definition of their properties (attributes) and corresponding semantics represent an RDF schema. A schema defines not only the properties of the resource or class (Title, Author, Subject, Size, Color etc.) but also may define the kinds of resources being described (books, webpages, people, companies, etc.). The details of RDF schemas have been defined in the RDF Schema Specification document [9].
Classes are organized in a hierarchy, and offer extensibility through subclass refinement. In order to create a schema slightly different from an existing one, one need only provide incremental modifications to the base schema. Through the sharability of schemas, RDF will support the reusability of metadata definitions. Due to RDF's incremental extensibility, agents processing metadata will be able to trace the origins of schemes they are unfamiliar with back to known schemes, and perform meaningful actions on metadata they weren't originally designed to process. The sharability and extensibility of RDF also allow metadata authors to use multiple inheritance to "mix" definitions and to provide multiple views of their data, taking advantage of work done by others. The XML namespace mechanism serves to identify different RDF Schemas.
RDF schemas can be compared to XML Document Type Definitions (DTDs). Unlike an XML DTD, which gives specific constraints on the syntactical structure of a document, an RDF schema provides semantic information about the interpretation of the statements given in an RDF data model. Given its goals, RDF appears to be the ideal approach for supporting descriptors from multiple description schemes simultaneously, as required by the MPEG-7 DDL.
--------------------------------------------------------------------------------
4.1 Example of a Suitable RDF Schema
This section describes an RDF schema definition that attempts to map to the diagram in Figure 1 and support the requirements listed above.
Since we want the DC simple attributes to be applicable to every component or layer, videos, sequences, scenes, shots, frames and objects are all sub-classes of a top level document class which possesses the DC attributes. In addition each sub-class has its own additional descriptive properties or attributes which will correspond to MPEG-7 descriptors when they become available.
<rdf:RDF
  xmlns:rdf="http://www.w3.org/TR/WD-rdf-syntax#"
  xmlns:rdfs="http://www.w3.org/TR/WD-rdf-schema#"
  xmlns:dc="http://purl.org/metadata/dublin_core#">
<rdfs:Class ID="MM_document">
  <rdfs:comment>Class for representing a generic multimedia document</rdfs:comment>
</rdfs:Class>
<rdfs:comment>Define all of the DC elements for MM_document</rdfs:comment>
<rdf:PropertyType ID="Title">
  <rdfs:comment>This is the DC Title element</rdfs:comment>
  <rdfs:domain rdf:resource="#MM_document"/>
  <rdfs:range rdf:resource="http://purl.org/metadata/dublin_core#Title"/>
</rdf:PropertyType>
<rdf:PropertyType ID="Creator">
  <rdfs:comment>This is the DC Creator element</rdfs:comment>
  <rdfs:domain rdf:resource="#MM_document"/>
  <rdfs:range rdf:resource="http://purl.org/metadata/dublin_core#Creator"/>
</rdf:PropertyType>
.
etc.
.
<rdfs:Class ID="Video">
  <rdfs:comment>Class for representing a video document. It is a subclass of MM_document</rdfs:comment>
  <rdfs:subClassOf rdf:resource="#MM_document"/>
</rdfs:Class>
<rdfs:Class ID="Sequence">
  <rdfs:comment>Class for representing a sequence from a video document. It is a subclass of MM_document</rdfs:comment>
  <rdfs:subClassOf rdf:resource="#MM_document"/>
</rdfs:Class>
<rdfs:Class ID="Scene">
  <rdfs:comment>Class for representing a scene. It is a subclass of MM_document</rdfs:comment>
  <rdfs:subClassOf rdf:resource="#MM_document"/>
</rdfs:Class>
<rdfs:Class ID="Shot">
  <rdfs:comment>Class representing a shot</rdfs:comment>
  <rdfs:subClassOf rdf:resource="#MM_document"/>
</rdfs:Class>
<rdfs:Class ID="Frame">
  <rdfs:comment>Represents a single frame. It is a subclass of #MM_document</rdfs:comment>
  <rdfs:subClassOf rdf:resource="#MM_document"/>
</rdfs:Class>
<rdfs:Class ID="Object">
  <rdfs:comment>Represents an object within a frame. It is a subclass of #MM_document</rdfs:comment>
  <rdfs:subClassOf rdf:resource="#MM_document"/>
</rdfs:Class>
One of the problems with RDF is how to create a generic property such as contains by which the hierarchical structure can be defined, i.e. videos contain sequences, which contain scenes, which contain shots, which contain frames, which contain objects and actors. If you create a property contains for #Video, how do you also apply it to #Sequence, #Scene and #Shot? Since each property is limited to a single range, a generic relationship such as contains cannot be used. Instead, a separate property must be defined for each domain-range pair, which is tedious and repetitive. The lack of class-specific constraints on the domain and range of properties is a major limitation of RDF, particularly when applied to complex multilayered documents in which you want to specify constraints on structural, spatial, temporal and conceptual relationships between components.
<rdf:PropertyType ID="contains_sequences">
  <rdfs:comment>Property related to a video asset stating that a video consists of a number of sequences.</rdfs:comment>
  <rdfs:domain rdf:resource="#Video"/>
  <rdfs:range rdf:resource="#Sequence"/>
</rdf:PropertyType>
<rdf:PropertyType ID="contains_scenes">
  <rdfs:comment>Property related to a sequence asset stating that a sequence consists of a number of scenes.</rdfs:comment>
  <rdfs:domain rdf:resource="#Sequence"/>
  <rdfs:range rdf:resource="#Scene"/>
</rdf:PropertyType>
<rdf:PropertyType ID="contains_shots">
  <rdfs:comment>Property related to a scene asset stating that a scene consists of a number of shots.</rdfs:comment>
  <rdfs:domain rdf:resource="#Scene"/>
  <rdfs:range rdf:resource="#Shot"/>
</rdf:PropertyType>
<rdf:PropertyType ID="contains_frames">
  <rdfs:comment>Property related to a shot asset stating that a shot consists of a number of frames.</rdfs:comment>
  <rdfs:domain rdf:resource="#Shot"/>
  <rdfs:range rdf:resource="#Frame"/>
</rdf:PropertyType>
<rdf:PropertyType ID="contains_objects">
  <rdfs:comment>Property related to a frame asset stating that a frame consists of a number of objects.</rdfs:comment>
  <rdfs:domain rdf:resource="#Frame"/>
  <rdfs:range rdf:resource="#Object"/>
</rdf:PropertyType>
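The repetitiveness of these declarations is easy to see when they are generated mechanically. The sketch below is purely illustrative: the LEVELS list, the TEMPLATE string and the contains_properties function are hypothetical helpers, and the naive pluralisation (appending "s") only works for the class names used in this paper.

```python
# Class names of the hierarchy, ordered from top to bottom.
LEVELS = ["Video", "Sequence", "Scene", "Shot", "Frame", "Object"]

# One declaration is needed per adjacent (parent, child) pair because
# a property can only be given a single range.
TEMPLATE = (
    '<rdf:PropertyType ID="contains_{plural}">\n'
    '  <rdfs:domain rdf:resource="#{parent}"/>\n'
    '  <rdfs:range rdf:resource="#{child}"/>\n'
    '</rdf:PropertyType>'
)


def contains_properties(levels):
    """Return one 'contains_*' declaration per adjacent pair of levels."""
    return [
        TEMPLATE.format(plural=child.lower() + "s", parent=parent, child=child)
        for parent, child in zip(levels, levels[1:])
    ]


if __name__ == "__main__":
    print("\n".join(contains_properties(LEVELS)))
```

Six levels require five near-identical declarations; a schema language with class-specific ranges would need only one.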
Another problem is the limited data typing within RDF. There are three ways of specifying data types within RDF:
Use the primitive Literal data type available within the RDF schema definition. This is any quoted string.
Implement a kind of enumerated data type by defining the range to be a class with a number of predefined instance values. This is used in the example below to define the possible values for shot transitions.
Point to a separate namespace in which the data types have been defined. In the example below we refer to "http://www.w3.org/TR/datatypes" for any data types other than Literal. This namespace doesn't currently exist, but it is expected to be defined by the W3C XML Schema Working Group [10], which has recently been set up.
Below is an example of the RDF Schema code defining some of the scene, shot, frame and object properties. It illustrates the data typing methods available.
<rdf:PropertyType ID="startTime">
  <rdfs:domain rdf:resource="#Scene"/>
  <rdfs:domain rdf:resource="#Shot"/>
  <rdfs:range rdf:resource="http://www.w3.org/TR/datatypes#Time"/>
</rdf:PropertyType>
<rdf:PropertyType ID="keyFrame">
  <rdfs:domain rdf:resource="#Scene"/>
  <rdfs:domain rdf:resource="#Shot"/>
  <rdfs:range rdf:resource="http://www.w3.org/TR/datatypes#Image"/>
</rdf:PropertyType>
<rdf:PropertyType ID="openTrans">
  <rdfs:domain rdf:resource="#Shot"/>
  <rdfs:range rdf:resource="#Transitions"/>
</rdf:PropertyType>
<rdf:PropertyType ID="closeTrans">
  <rdfs:domain rdf:resource="#Shot"/>
  <rdfs:range rdf:resource="#Transitions"/>
</rdf:PropertyType>
<!-- Enumerated type: a class whose predefined instances are the permitted values -->
<rdfs:Class ID="Transitions"/>
<Transitions ID="Cut"/>
<Transitions ID="Fade"/>
<Transitions ID="Wipe"/>
<Transitions ID="Dissolve"/>
<rdf:PropertyType ID="position">
  <rdfs:domain rdf:resource="#Object"/>
  <rdfs:range rdf:resource="http://www.w3.org/TR/datatypes#Point"/>
</rdf:PropertyType>
<rdf:PropertyType ID="shape">
  <rdfs:domain rdf:resource="#Object"/>
  <rdfs:range rdf:resource="http://www.w3.org/TR/datatypes#Polygon"/>
</rdf:PropertyType>
<rdf:PropertyType ID="colorHistogram">
  <rdfs:domain rdf:resource="#Frame"/>
  <rdfs:domain rdf:resource="#Object"/>
  <rdfs:range rdf:resource="http://www.w3.org/TR/datatypes#Histogram"/>
</rdf:PropertyType>
</rdf:RDF>
--------------------------------------------------------------------------------
4.2 Advantages of the RDF Schema for Video Metadata
RDF Schemas, within the context of this application, have the following advantages:
RDF Schema is able to provide meanings to elements or semantic structure not possible using purely syntactic schemas such as XML DTDs. However, the machine-understandable meanings that the current version of RDF Schema can express are very limited, so the advantage of "semantic validation" is virtually negligible.
The other schemas really only provide implicit child or contains relationships between elements. With RDF you can specify any relationship type explicitly through properties, but this is limited by the need to specify a single range. It isn't possible to constrain a particular relationship to multiple domain/range pairs, e.g. sequences can only contain scenes, which can only contain shots, etc.
Multiple namespaces. This enables the same feature to have different descriptors which correspond to different domains or description schemes. The ability to mix classification vocabularies within one XML-based encoding allows video authors or others to deliver richer domain-specific content descriptions thus increasing the re-usability of the video on the Web. This is a key requirement of the MPEG-7 DDL.
Inheritance is supported through sub-classes and sub-properties. This provides easy extensibility and reuse of code.
A simple RDF parser (SiRPAC [11] ) exists but it has limited validation capabilities, checking only that the domain and range constraints are satisfied.
It is human-readable, simple to understand and thus simple to extend or customize.
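The domain and range checking mentioned above can be illustrated with a few lines of code. This is a rough sketch of the idea only, not of how SiRPAC is implemented: the SCHEMA and TYPES tables, the resource names (scene1, shot1) and the violations function are all hypothetical, and statements are modelled as plain (subject, property, object) tuples.

```python
# Property -> (allowed domain classes, single range class), following the
# property definitions in section 4.1.
SCHEMA = {
    "contains_shots": ({"Scene"}, "Shot"),
    "openTrans": ({"Shot"}, "Transitions"),
    "closeTrans": ({"Shot"}, "Transitions"),
}

# Class of each resource appearing in a description (hypothetical instances).
TYPES = {"scene1": "Scene", "shot1": "Shot", "Cut": "Transitions"}


def violations(statements):
    """Return (subject, property, object, kind) for each failed constraint."""
    found = []
    for subject, prop, obj in statements:
        domains, range_class = SCHEMA[prop]
        if TYPES.get(subject) not in domains:
            found.append((subject, prop, obj, "domain"))
        if TYPES.get(obj) != range_class:
            found.append((subject, prop, obj, "range"))
    return found
```

Because the enumerated type Transitions is just a class with predefined instances, the same range check also rejects a transition value that is not one of those instances.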
--------------------------------------------------------------------------------
4.3 Limitations of the RDF Schema for Video Metadata
RDF Schema has the following problems or limitations:
Unstable. The RDF Schema specifications are still under development and change frequently.
Limited or no data typing. Almost all data typing will need to be provided by external namespaces, which don't yet exist.
No cardinality. It isn't possible to specify optional, zero or multiple values for an attribute.
Range constraints such as minimum and maximum values are not supported.
Class-specific range constraints are not possible. Only one range is possible for a given property. The only way to provide multiple ranges is to create multiple properties, e.g. secs_start_time, frame_start_time, SMPTE_start_time.
RDF Schema can't describe multilayered structures using a single generic "contains" property. This requires multiple specific "contains" properties, i.e. "contains_sequences", "contains_scenes", "contains_shots", "contains_frames". The alternative is to implement code outside of the schema which understands DC.Relation.HasParts semantics and can perform the validation.
Property-centricity makes readability difficult. The link between properties and classes is defined within the property definitions not the class definitions.
No query language exists for RDF. Given a video structure, to find videos with similar structures, you need to be able to store RDF structures in a directed graph with associated attribute values in a database.
The simplest way to specify spatial and temporal relationships is via the Collection elements: Seq, Bag and Alt, but these provide limited semantics. Since no <Par> element exists within RDF, the <Bag> element must be used to specify parallelism. For spatial relationships such as neighbours, if the list of neighbours is in a collection, can we assume that the first one is the nearest neighbour?
Relationship-type properties between classes cannot be mapped to constraints on the attribute values of the classes involved. For example, if two scenes abut then their respective end and start frame numbers must be consecutive. If a sequence "contains" a scene, then the start and end times of the scene must lie within the start and end times of the sequence. This is not supported by RDF Schema.
RDF Schema is an incomplete mapping of the RDF Syntax and Data Model. There are very useful features available within the RDF Model and Syntax Specification which aren't supported in the RDF Schema.
--------------------------------------------------------------------------------
5. XML DTDs
The Extensible Markup Language (XML) is a subset of SGML for describing documents, and Document Type Definitions (DTDs) are its mechanism for constraining document structure. XML was developed by the XML Working Group under the World Wide Web Consortium (W3C) in 1996. The complete XML specification is available from the W3C [12].
Each XML document has both a logical and a physical structure. Physically, the document is composed of units called entities. An entity may refer to other entities to cause their inclusion in the document. A document begins in a "root" or document entity. Logically, the document is composed of declarations, elements, comments, character references, and processing instructions, all of which are indicated in the document by explicit markup. The logical and physical structures must nest properly.
The function of the markup in an XML document is to describe its storage and logical structure and to associate attribute-value pairs with its logical structures. XML provides the document type declaration, to define constraints on the logical structure and to support the use of predefined storage units. An XML document is valid if it has an associated document type declaration and if the document complies with the constraints expressed in it. Document type declarations are made in a Document Type Definition (DTD) file. The DTD file then contains a formal definition of a particular type of document outlining the element names and the structure of the document.
--------------------------------------------------------------------------------
5.1 An Example of an XML DTD for Video Documents
The structure is defined in the element definitions at the top of the DTD. Each element has a set of associated attributes. All elements have an ID attribute plus the DC attributes. In addition, sequences, scenes and shots also have a set of time attributes (begin, end, duration). Each element also has its own set of level-specific attributes (which will correspond to the MPEG-7 descriptors when they become available).
<?xml version="1.0"?>
<!DOCTYPE videodoc [
<!-- hierarchical structure of videodoc -->
<!ELEMENT videodoc (sequence*)>
<!ELEMENT sequence (scene*)>
<!ELEMENT scene (shot*)>
<!ELEMENT shot (frame*)>
<!ELEMENT frame (object*)>
<!ELEMENT object (object*)>
<!-- ID attribute for every element -->
<!ENTITY % id_attr "id ID #IMPLIED">
<!-- Set of Dublin Core Attributes -->
<!ENTITY % dc_attr "
  Title CDATA #IMPLIED
  Creator CDATA #IMPLIED
  Subject CDATA #IMPLIED
  Description CDATA #IMPLIED
  Publisher CDATA #IMPLIED
  Contributor CDATA #IMPLIED
  Date CDATA #IMPLIED
  Type CDATA #IMPLIED
  Format CDATA #IMPLIED
  Identifier CDATA #IMPLIED
  Source CDATA #IMPLIED
  Language CDATA #IMPLIED
  Relation CDATA #IMPLIED
  Coverage CDATA #IMPLIED
  Rights CDATA #IMPLIED">
<!-- Level-specific attributes for scenes -->
<!ENTITY % scene_attr "
  Transcript CDATA #IMPLIED
  Script CDATA #IMPLIED
  EditList CDATA #IMPLIED
  Keyframe CDATA #IMPLIED
  Locale CDATA #IMPLIED
  Cast CDATA #IMPLIED
  Objects CDATA #IMPLIED">
<!-- Level-specific attributes for shots -->
<!ENTITY % shot_attr "
  Keyframe CDATA #IMPLIED
  CameraDist NMTOKEN #IMPLIED
  CameraAngle NMTOKEN #IMPLIED
  CameraMotion NMTOKEN #IMPLIED
  Lighting NMTOKEN #IMPLIED
  OpenTrans NMTOKEN #IMPLIED
  CloseTrans NMTOKEN #IMPLIED">
<!-- Level-specific attributes for frames -->
<!ENTITY % frame_attr "
  Image CDATA #IMPLIED
  Timestamp CDATA #IMPLIED
  ColourText NMTOKEN #IMPLIED
  ColourHistogram CDATA #IMPLIED
  Texture CDATA #IMPLIED
  Annotation CDATA #IMPLIED
  Anno_Position CDATA #IMPLIED">
<!-- Level-specific attributes for objects -->
<!ENTITY % object_attr "
  Position CDATA #IMPLIED
  Shape CDATA #IMPLIED
  Trajectory CDATA #IMPLIED
  Speed CDATA #IMPLIED
  ColourText NMTOKEN #IMPLIED
  ColourHistogram CDATA #IMPLIED
  Texture CDATA #IMPLIED
  Volume CDATA #IMPLIED
  Annotation CDATA #IMPLIED
  Anno_Position CDATA #IMPLIED">
<!-- Time attributes shared by sequences, scenes and shots -->
<!ENTITY % time_attr "
  begin CDATA #IMPLIED
  end CDATA #IMPLIED
  dur CDATA #IMPLIED">
<!-- Attribute lists per element. Note: XML 1.0 only permits parameter-entity
     references inside a declaration in the external DTD subset, so a strict
     parser requires these declarations to be placed in an external DTD file. -->
<!ATTLIST videodoc
  %id_attr;
  %dc_attr;>
<!ATTLIST sequence
  %id_attr;
  %dc_attr;
  %time_attr;>
<!ATTLIST scene
  %id_attr;
  %dc_attr;
  %scene_attr;
  %time_attr;>
<!ATTLIST shot
  %id_attr;
  %dc_attr;
  %shot_attr;
  %time_attr;>
<!ATTLIST frame
  %id_attr;
  %dc_attr;
  %frame_attr;>
<!ATTLIST object
  %id_attr;
  %dc_attr;
  %object_attr;>
]>
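To show what a description conforming to this DTD might look like, the sketch below embeds a small sample instance and checks its element nesting against the ELEMENT declarations. Several things here are assumptions: all attribute values in SAMPLE are invented for illustration, ALLOWED_CHILD and nesting_errors are hypothetical names, and the check is hand-written because Python's xml.etree.ElementTree parses XML but does not validate against a DTD.

```python
import xml.etree.ElementTree as ET

# The single child element type allowed under each parent, mirroring the
# ELEMENT declarations of the DTD.
ALLOWED_CHILD = {
    "videodoc": "sequence",
    "sequence": "scene",
    "scene": "shot",
    "shot": "frame",
    "frame": "object",
    "object": "object",
}

# Hypothetical instance document; every attribute value is made up.
SAMPLE = """
<videodoc id="v1" Title="Evening News">
  <sequence id="seq1" begin="0" end="120">
    <scene id="sc1" begin="0" end="60" Locale="studio">
      <shot id="sh1" begin="0" end="10" CameraAngle="eye-level">
        <frame id="f1" Timestamp="0"/>
      </shot>
    </scene>
  </sequence>
</videodoc>
"""


def nesting_errors(element):
    """Return a message for every child that its parent may not contain."""
    errors = []
    for child in element:
        if ALLOWED_CHILD.get(element.tag) != child.tag:
            errors.append(f"<{child.tag}> not allowed inside <{element.tag}>")
        errors.extend(nesting_errors(child))
    return errors


root = ET.fromstring(SAMPLE.strip())
print(nesting_errors(root))
```

Note that the check is purely structural: nothing in it, or in the DTD, can verify that the begin and end values of a scene fall within those of its sequence.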
--------------------------------------------------------------------------------
5.2 Advantages of XML DTDs for Video Metadata
Work is progressing on query languages for XML, e.g. XML-QL [13].
XML parsers exist.
Simplicity associated with a single namespace. Users only have to understand one namespace.
XML is simpler than SGML, HyTime etc.
XML DTDs are easy to read and understand. Short and sweet without all that data typing.
Hierarchical structures are supported but only on a syntactical basis.
--------------------------------------------------------------------------------
5.3 Disadvantages of XML DTDs for Video Metadata
No namespaces. Since namespaces are not supported, definitions such as Dublin Core attributes will need to be redefined unless external entities are used. External entities provide a similar capability to namespaces. An external entity can be retrieved from an external DTD via a URI to this DTD and the entity's ID.
Cardinality of attributes is zero or one in XML DTDs. This creates problems with DC attributes which are optional and repeatable. They may need to be declared as elements.
There is very limited support for data typing. Only three kinds of attribute types are supported: a string type, a set of tokenized types and enumerated types. However Bray [14] has shown that it is possible to attach strong type declarations to XML elements using reserved attributes.
It is a purely 'syntactic' machine-understandable schema which can't provide any of the semantics associated with complex structured multimedia data or support object-oriented data modelling concepts.
There is no inheritance.
There are no relationships possible other than the implicit contains.
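The attribute cardinality limitation listed above is enforced by XML itself: an attribute name may appear at most once per element. A short sketch using Python's standard xml.etree.ElementTree shows the effect for a repeatable DC element such as Creator. The parses helper and the author names are hypothetical.

```python
import xml.etree.ElementTree as ET


def parses(xml_text):
    """Return True if xml_text is well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Two creators as attributes: rejected, since an attribute cannot repeat.
repeated_attribute = '<videodoc Creator="A. Author" Creator="B. Author"/>'

# Two creators as child elements: accepted.
repeated_element = (
    "<videodoc>"
    "<Creator>A. Author</Creator>"
    "<Creator>B. Author</Creator>"
    "</videodoc>"
)

print(parses(repeated_attribute), parses(repeated_element))
```

This is why repeatable Dublin Core descriptors may need to be declared as elements rather than attributes.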
--------------------------------------------------------------------------------
6. Document Content Description (DCD) for XML
The Document Content Description (DCD) [15] facility for XML is an RDF vocabulary designed for describing constraints to be applied to the structure and content of XML documents. It consists of a set of properties used to constrain the types of elements and names of attributes that may appear in an XML document, the contents of the elements and the values of the attributes. It was designed to provide semantics over and above the purely syntactical XML DTDs. It was also designed to be conformant with the RDF Model and Syntax Specification (with some simplifications). DCD also incorporates a subset of an earlier submission to W3C, the XML-Data Submission [16].
The introduction to the XML-Data Submission says that it "describes an XML vocabulary for schemas, that is, for defining and documenting object classes. It can be used for classes which are strictly syntactic (for example, XML) or those which indicate concepts and relations among concepts (as used in relational databases, KR graphs and RDF). The former are called 'syntactic schemas;' the latter 'conceptual schemas'." Thus, XML-Data and DCD add object-oriented and data modelling concepts such as class inheritance to purely syntactic schemas such as XML DTDs.
DCD Schemas are based on elements and attributes. Elements correspond to RDF property types. DCD declarations constrain the content and attributes of elements in document instances, by assigning properties to objects of type ElementDef and AttributeDef.
--------------------------------------------------------------------------------
6.1 Example of a DCD Schema
The DCD Schema below is based on the following assumptions:
The Dublin Core elements are all described in a separate name space.
The root element video_doc contains video_sequences, which contain video_scenes, etc.
The Dublin Core elements apply to every level.
In addition the sequence, scene and shot elements possess start_time, end_time and duration elements.
In addition, each level has its own unique elements/attributes corresponding to MPEG-7 descriptors.
<DCD
xmlns:RDF="http://www.w3.org/TR/WD-rdf-syntax#"
xmlns:DC="http://purl.org/metadata/dublin_core#"
xmlns:CDT="http://www.w3.org/TR/complex_datatypes#">
<?DCD syntax="explicit"?>
<Description>Example of a Video Document DCD</Description>
<Namespace>http://www.dstc.edu.au/schemas/videodcd</Namespace>
<ElementDef Type="videodoc" Model="Elements" Root="True">
<Description>A video document structure.</Description>
<Group RDF:Order="Seq">
<Element>dc_values</Element>
<Group Occurs="ZeroOrMore" RDF:Order="Seq">
<Element>sequence</Element>
</Group>
</Group>
</ElementDef>
<ElementDef Type="sequence" Model="Elements">
<Description>Description of a video sequence element</Description>
<AttributeDef Name="seqID" Occurs="Required"/>
<Group RDF:Order="Seq">
<Element>dc_values</Element>
<Element>time_attribs</Element>
<Group Occurs="ZeroOrMore" RDF:Order="Seq">
<Element>scene</Element>
</Group>
</Group>
</ElementDef>
<ElementDef Type="scene" Model="Elements">
<Description>Description of a video scene element</Description>
<AttributeDef Name="sceneID" Occurs="Required"/>
<Group RDF:Order="Seq">
<Element>dc_values</Element>
<Element>time_attribs</Element>
<Element>transcript</Element>
<Element>keyframe</Element>
<Group Occurs="ZeroOrMore" RDF:Order="Seq">
<Element>shot</Element>
</Group>
</Group>
</ElementDef>
<ElementDef Type="shot" Model="Elements">
<Description>Description of a video shot element</Description>
<AttributeDef Name="shotID" Occurs="Required"/>
<Group RDF:Order="Seq">
<Element>dc_values</Element>
<Element>time_attribs</Element>
<Element>camera_distance</Element>
<Element>camera_angle</Element>
<Element>camera_motion</Element>
<Element>lighting</Element>
<Element>open_transition</Element>
<Element>close_transition</Element>
<Group Occurs="ZeroOrMore" RDF:Order="Seq">
<Element>frame</Element>
</Group>
</Group>
</ElementDef>
<ElementDef Type="frame" Model="Elements">
<Description>Description of a video frame element</Description>
<AttributeDef Name="frameID" Occurs="Required"/>
<Group RDF:Order="Seq">
<Element>dc_values</Element>
<Element>timestamp</Element>
<Element>CDT:colourhistogram</Element>
<Element>CDT:texture</Element>
<Element>annotation</Element>
<Element>CDT:anno_position</Element>
<Group Occurs="ZeroOrMore" RDF:Order="Seq">
<Element>object</Element>
</Group>
</Group>
</ElementDef>
<ElementDef Type="object" Model="Elements">
<Description>Description of a video object/actor element</Description>
<AttributeDef Name="objectID" Occurs="Required"/>
<Group RDF:Order="Seq">
<Element>dc_values</Element>
<Element>CDT:position</Element>
<Element>CDT:shape</Element>
<Element>CDT:colourhistogram</Element>
<Element>CDT:texture</Element>
<Element>CDT:trajectory</Element>
<Element>annotation</Element>
<Element>CDT:anno_position</Element>
<Group Occurs="ZeroOrMore" RDF:Order="Seq">
<Element>object</Element>
</Group>
</Group>
</ElementDef>
<ElementDef Type="dc_values" Model="Elements">
<Description>List of Dublin Core Elements</Description>
<Group RDF:Order="Seq">
<Element>DC:Title</Element>
<Element>DC:Creator</Element>
<Element>DC:Subject</Element>
......
</Group>
</ElementDef>
<ElementDef Type="time_attribs" Model="Elements">
<Group RDF:Order="Seq">
<Element>start_time</Element>
<Element>end_time</Element>
<Element>duration</Element>
......
</Group>
</ElementDef>
<ElementDef Type="transcript" Model="Data" Datatype="string">
</ElementDef>
<ElementDef Type="keyframe" Model="Data" Datatype="uri">
</ElementDef>
<ElementDef Type="camera_distance" Model="Data" Datatype="enumeration">
<Values>close-up medium-shot long-shot</Values>
</ElementDef>
<ElementDef Type="camera_angle" Model="Data" Datatype="enumeration">
<Values>low eye-level high</Values>
</ElementDef>
<ElementDef Type="open_transition" Model="Data" Datatype="enumeration">
<Values>cut fade wipe dissolve</Values>
</ElementDef>
<ElementDef Type="annotation" Model="Data" Datatype="string">
</ElementDef>
</DCD>
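To make the effect of the definitions above concrete, the following is a minimal sketch of what a description of a single scene might look like if it conformed to the scene, dc_values and time_attribs definitions. It assumes that a conforming description is serialised as plain XML elements in the declared sequence order. All values, the sceneID, the keyframe URL and the DC namespace URI are illustrative assumptions, and the parts that the schema itself elides are marked with comments rather than filled in.

```xml
<!-- Illustrative instance only: all values and the namespace URI are assumptions -->
<scene sceneID="scene01" xmlns:DC="http://purl.org/metadata/dublin_core#">
  <dc_values>
    <DC:Title>Opening interview</DC:Title>
    <DC:Creator>Example Productions</DC:Creator>
    <DC:Subject>interview</DC:Subject>
    <!-- further Dublin Core elements are elided in the schema -->
  </dc_values>
  <time_attribs>
    <start_time>00:00:00;00</start_time>
    <end_time>00:01:30;00</end_time>
    <duration>00:01:30;00</duration>
    <!-- further time attributes are elided in the schema -->
  </time_attribs>
  <transcript>Welcome to the programme.</transcript>
  <keyframe>http://www.example.org/video/scene01/key.jpg</keyframe>
  <!-- zero or more shot elements would follow here, in order -->
</scene>
```

The required sceneID attribute and the fixed order of the child elements are the constraints a DCD-aware validator would be expected to check; the format of the time values is not constrained by the definitions shown.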
--------------------------------------------------------------------------------
6.2 Advantages of DCD for Video Metadata
Human-readable and simple.
Provides better data typing than RDF Schema and XML DTDs, although it is still only basic. Also provides upper and lower bound constraints on attribute values.
Provides cardinality.
Supports multiple namespaces.
As an RDF vocabulary, it inherits the advantages of the data modelling concepts in RDF, plus constructs such as RDF:Seq and RDF:Alt.
--------------------------------------------------------------------------------
6.3 Disadvantages of DCD for Video Metadata
There is currently no subclassing or inheritance, although this is planned. The proposal is to create subclasses from existing elements through an extends property.
Only basic data typing is supported, not complex data types. There is no support for multiple alternate data types, except by creating alternate elements with different data types, e.g. a start_time value may be given as SMPTE, seconds or a frame number (int). There is also no support for constraining the values of certain attributes of related elements.
There is no support for data types such as points, lines, polygons or colour histograms. These would all have to be described in a separate namespace, e.g. "http://www.w3.org/TR/complex_datatypes". It is not possible to specify that only an element's datatype is taken from another namespace; the element itself must be declared as wholly described in that namespace.
Only Seq and Alt groups are available; Bag is not a legal value for the RDF:Order property. Seq is fine for specifying sequential components, but multimedia also needs groups of elements which run in parallel. In the absence of any Par value, the RDF Bag element would be the most suitable for specifying this, but it is not supported in DCD.
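The workaround for alternate data types mentioned above could be sketched roughly as follows, using only the constructs that appear in the DCD example. This is a fragment intended to sit inside the DCD root element, not a complete schema. The element names start_smpte, start_secs and start_frame are hypothetical, the Datatype values float and int are assumed to be available, and an Alt group is assumed to mean that exactly one of the listed elements appears.

```xml
<!-- Hypothetical workaround: one alternate element per data type -->
<ElementDef Type="start_time" Model="Elements">
<Description>Start time, given in one of three alternate forms</Description>
<Group RDF:Order="Alt">
<Element>start_smpte</Element>
<Element>start_secs</Element>
<Element>start_frame</Element>
</Group>
</ElementDef>
<!-- Each alternate carries its own basic data type -->
<ElementDef Type="start_smpte" Model="Data" Datatype="string">
</ElementDef>
<ElementDef Type="start_secs" Model="Data" Datatype="float">
</ElementDef>
<ElementDef Type="start_frame" Model="Data" Datatype="int">
</ElementDef>
```

The cost of this approach is visible in the sketch: three extra element definitions are needed for a single logical value, and the same pattern would have to be repeated for end_time and duration.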
--------------------------------------------------------------------------------
7. Schema for Object-Oriented XML (SOX)
Schema for Object-Oriented XML (SOX) [17] provides a facility for defining the structure, content and semantics of XML documents to enable XML validation and automated content checking.
SOX provides an alternative to XML DTDs for modeling markup relationships. The introduction to the SOX specification says that it provides the following advantages over XML DTDs:
More efficient software development processes for distributed applications;
Basic intrinsic datatypes;
An extensible datatyping mechanism;
Content model and attribute interface inheritance;
A powerful namespace mechanism;
Embedded documentation.
SOX supports three varieties of datatypes: scalar datatypes, enumerated datatypes and format datatypes. Scalar datatypes are derived from the basic number datatype, and support specification of the number of digits and decimal places, minimum and maximum value range, and a mask. An enumerated datatype may be derived from any of the intrinsic datatypes, and may specify an enumeration of valid values. A format datatype may be derived from any of the intrinsic datatypes, and must specify a mask.
In SOX, element types may inherit their content models and attribute definitions directly from another named element type. An element type may also inherit and extend an attribute list. Specialization of attribute definitions allows refinement and restriction of attribute datatype, enumeration list and default value. Additionally, an attribute value may be defined to be inherited from the identically named attribute on a parent or older ancestor element. Thus, for example, namespaces can be inherited from superordinate elements.
The SOX namespace facility enables Objects from any identifiable namespace to be used in building a SOX document. That is, any element, attribute, datatype, enumeration, entity, interface, notation, parameter, or processing instruction may be imported from any namespace.
A SOX document is a valid XML document, according to the SOX DTD. The schema designer is free to employ the same XML tools used for traditional XML documents. This means that a SOX document can be processed by a validating XML parser, formatted according to an XSL stylesheet, and managed by any DOM-compliant or SAX-compliant application.
--------------------------------------------------------------------------------
7.1 SOX Example
In this example, the structural elements, video_doc, video_sequence, video_scene, video_shot, video_frame and video_object are declared first. They each possess the DC attributes, plus their own specific elements and attributes.
<schema name="video_doc" namespace="http://www.dstc.edu.au/schemas/video_doc.xml" >
<h1>Video Metadata Document</h1>
<h2>Imported namespaces</h2>
<namespace name="dc" namespace="http://purl.org/metadata/dublin_core#"/>
<namespace name="dcq" namespace="http://purl.org/metadata/dublin_core_qualifiers#"/>
<h2>Structural Elements</h2>
<elementtype name="video_doc">
<model>
<sequence>
<element name="dc_attributes"/>
<element name="video_sequence" occurs="*"/>
</sequence>
</model>
</elementtype>
<elementtype name="video_sequence">
<model>
<sequence>
<element name="seqID"/>
<element name="dc_attributes"/>
<element name="time_attributes"/>
<element name="video_scene" occurs="*"/>
</sequence>
</model>
</elementtype>
<elementtype name="video_scene">
<model>
<sequence>
<element name="sceneID"/>
<element name="dc_attributes"/>
<element name="time_attributes"/>
<element name="transcript"/>
<element name="key_frame"/>
<element name="video_shot" occurs="*"/>
</sequence>
</model>
</elementtype>
<elementtype name="video_shot">
<model>
<sequence>
<element name="shotID"/>
<element name="dc_attributes"/>
<element name="time_attributes"/>
<element name="camera_distance"/>
<element name="camera_angle"/>
<element name="camera_motion"/>
<element name="lighting"/>
<element name="open_trans"/>
<element name="close_trans"/>
<element name="video_frame" occurs="*"/>
</sequence>
</model>
</elementtype>
<elementtype name="video_frame">
<model>
<sequence>
<element name="frameID"/>
<element name="dc_attributes"/>
<element name="timestamp"/>
<element name="colour_histogram"/>
<element name="texture"/>
<element name="video_object" occurs="*"/>
</sequence>
</model>
</elementtype>
<elementtype name="video_object">
<model>
<sequence>
<element name="objectID"/>
<element name="dc_attributes"/>
<element name="position"/>
<element name="shape"/>
<element name="colour"/>
<element name="texture"/>
<element name="anno_text"/>
<element name="anno_posn"/>
<element name="video_object" occurs="*"/>
</sequence>
</model>
</elementtype>
The next step is to break down the elements to sub-elements and eventually data types. SOX supports both intrinsic basic datatypes as well as user-defined scalar, enumeration and formatted datatypes, derived from the intrinsic datatypes. The code below illustrates some of the capabilities of SOX data typing for video description.
<h2>Attribute Elements</h2>
<elementtype name="dc_attributes">
<model>
<sequence>
<element namespace="dc" name="Title"/>
<element namespace="dc" name="Creator"/>
<element namespace="dc" name="Subject"/>
.....
</sequence>
</model>
</elementtype>
<elementtype name="time_attributes">
<model>
<sequence>
<element name="start_time"/>
<element name="end_time"/>
<element name="duration"/>
</sequence>
</model>
</elementtype>
<elementtype name="start_time">
<instanceof name="time_val"/>
</elementtype>
<elementtype name="end_time">
<instanceof name="time_val"/>
</elementtype>
<elementtype name="duration">
<instanceof name="time_val"/>
</elementtype>
<elementtype name="time_val">
<model>
<choice occurs="1">
<element name="frame_num"/>
<element name="SMPTE"/>
<element name="abs_time"/>
</choice>
</model>
</elementtype>
<elementtype name="frame_num">
<model>
<string datatype="frame"/>
</model>
</elementtype>
<datatype name="frame">
<scalar datatype="int" min="1" max="25"/>
</datatype>
<elementtype name="SMPTE">
<model>
<string>
<mask>##:##:##;##</mask>
</string>
</model>
</elementtype>
<elementtype name="abs_time">
<model>
<string datatype="time"/>
</model>
</elementtype>
<elementtype name="key_frame">
<model>
<string datatype="URI"/>
</model>
</elementtype>
<elementtype name="camera_dist">

