Introduction
Many collections of scientific data in particular disciplines, e.g., the environmental sciences, are available today around the world. Much of this data conforms to some agreed-upon standard for data exchange, i.e., a standard schema and its semantics. However, sharing this data among a global community of users is still difficult because standards are lacking for the following necessary functions:
- data providers need a standard for describing or publishing available sources of data;
- data administrators need a standard for discovering the published data;
- users need a standard for accessing this discovered data.
We are building a system, WebSemantics, that accomplishes the above tasks. We describe an architecture and protocols for publishing, discovering, and accessing scientific data; both extend the World Wide Web architecture and protocols.
Objectives of WebSemantics
To provide an architecture that supports the sharing of data among a world-wide community of users.
WebSemantics provides mechanisms for:
- describing the data that is available;
- discovering the existence of data relevant to a problem;
- accessing discovered relevant data.
WSQL, the WebSemantics Query Language, has constructs for:
- source discovery via controlled Web navigation;
- source registration in domain-specific catalogs;
- associative selection of sources from existing catalogs;
- uniform access to data stored in heterogeneous sources.
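The registration and associative selection constructs can be pictured with a small in-memory catalog. This is only a sketch in Python, not WSQL syntax (which is not shown in this section). The `SourceEntry` and `Catalog` names, their fields, and the second source are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceEntry:
    """Hypothetical catalog record; the field names are assumptions."""
    wrapper: str                      # location of the wrapper code
    addr: str                         # network address of the source
    types: frozenset = frozenset()    # data types exported by the source
    domains: frozenset = frozenset()  # application domains it covers


class Catalog:
    """Hypothetical domain-specific catalog kept in memory."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        # Keyed by address, so re-registering a source replaces its record.
        self._entries[entry.addr] = entry

    def select(self, domain=None, data_type=None):
        """Associative selection: entries matching every given criterion."""
        return [
            e for e in self._entries.values()
            if (domain is None or domain in e.domains)
            and (data_type is None or data_type in e.types)
        ]


catalog = Catalog()
catalog.register(SourceEntry(
    wrapper="www.oracle.com/pub/WS-Oracle.class",
    addr="server.env.org:8001",
    types=frozenset({"Measurement"}),
    domains=frozenset({"water pollution"}),
))
# Second source is invented purely to make the selection non-trivial.
catalog.register(SourceEntry(
    wrapper="example.org/wrappers/Hypothetical.class",
    addr="air.example.org:9000",
    types=frozenset({"Measurement"}),
    domains=frozenset({"air quality"}),
))

matches = catalog.select(domain="water pollution")
```

Selecting by content (domain or type) rather than by address is what makes the lookup associative: the caller does not need to know where a source lives before asking for it.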
Architecture
The WebSemantics system has a layered architecture of interdependent components:
- The World Wide Web Layer
  - Web documents, some containing source connection information;
- The Catalog Layer
  - catalog storing information about sources: location, types, domains;
- The Query Processing Layer
  - the Query Processor gives integrated access to the collection of sources registered in a specific catalog.
The Four Layers of WebSemantics
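As a rough illustration of how the layers listed above could fit together, the sketch below lets a query processor iterate over the sources of one catalog and reach each of them through a wrapper. It is written in Python under several assumptions: the `QueryProcessor` class, the two wrapper functions, the second address, and all sample values are hypothetical, and a plain substring filter stands in for a real query.

```python
from typing import Callable, Dict, Iterable, List

# A wrapper is modelled as a function from a query string to result rows.
Wrapper = Callable[[str], Iterable[dict]]


class QueryProcessor:
    """Hypothetical query processing layer over a single catalog."""

    def __init__(self, catalog: Dict[str, Wrapper]):
        # Stand-in for the catalog layer: source address -> wrapper.
        self.catalog = catalog

    def run(self, query: str) -> List[dict]:
        results = []
        for addr, wrapper in self.catalog.items():
            for row in wrapper(query):
                # Tag each row with its origin so sources stay distinguishable.
                results.append({"source": addr, **row})
        return results


# Two stand-in wrappers that hide different storage formats (made-up data).
def relational_wrapper(query):
    table = [("nitrate", 3.2), ("lead", 0.01)]
    return [{"parameter": p, "value": v} for p, v in table if query in p]


def flat_file_wrapper(query):
    lines = ["nitrate;2.9", "ph;7.1"]
    rows = (line.split(";") for line in lines)
    return [{"parameter": p, "value": float(v)} for p, v in rows if query in p]


processor = QueryProcessor({
    "server.env.org:8001": relational_wrapper,
    "files.example.org:8002": flat_file_wrapper,
})
rows = processor.run("nitrate")
```

The point of the design is that the caller sees one row format regardless of how each source stores its data; only the wrappers know the difference.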
Publishing Data Sources in Web Documents
WebSemantics defines a new HTML tag for describing Source Connection Information (WSSCI).
Example:
<HTML>
<HEAD>
<TITLE>Environmental Data for Paris</TITLE>
<WSSCI WRAPPER="www.oracle.com/pub/WS-Oracle.class"
       ADDR="server.env.org:8001"
       USER="guest" PASSWD="1234">
</HEAD>
<BODY>
This repository contains daily measurements of water pollution parameters ...
etc.
</BODY>
</HTML>
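A client that discovers sources by navigating the Web would need to pull the WSSCI attributes out of such a page. The following is a minimal sketch of that step using Python's standard `html.parser` module; the `WSSCIExtractor` class and `extract_sources` function are hypothetical names, not part of WebSemantics.

```python
from html.parser import HTMLParser


class WSSCIExtractor(HTMLParser):
    """Collects the attributes of every <WSSCI> tag in a document."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag names and attribute names.
        if tag == "wssci":
            self.sources.append(dict(attrs))


def extract_sources(html_text):
    """Return one dict of connection attributes per WSSCI tag found."""
    parser = WSSCIExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.sources


# The example page from the text, reproduced as a string.
PAGE = """<HTML>
<HEAD>
<TITLE>Environmental Data for Paris</TITLE>
<WSSCI WRAPPER ="www.oracle.com/pub/WS-Oracle.class"
ADDR = "server.env.org:8001"
USER = "guest" PASSWD = "1234" >
</HEAD>
<BODY>
This repository contains daily measurements of water pollution parameters ...
</BODY>
</HTML>"""

sources = extract_sources(PAGE)
```

Because the parser normalizes names to lower case, the extracted keys are `wrapper`, `addr`, `user`, and `passwd`, whatever capitalization the page author used.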
Interaction between Components