Date: 2008-11-23

XML in Databases

 YoonJoon Lee (李潤俊), Korea Advanced Institute of Science and Technology (KAIST)

 

Contents

•    What is XML?

•    XML Data vs Documents

•    Store and retrieve XML in RDB

•    GML

 

 

What is XML?

–   A markup language that you can use to create your own tags

–   Created by W3C to overcome the limitations of HTML

–   Based on SGML (Standard Generalized Markup Language, jokingly expanded as "Sounds great, maybe later"), which is used in the publishing industry

–   Designed with the Web in mind

Origins of XML

–    In 1996, Jon Bosak convinced the W3C to let him form a committee on using SGML on the Web.

–    By November of that year, the committee had created the beginnings of a simplified form of SGML, which became XML.

–    In March 1997, Bosak released a paper “XML, Java and the Future of the Web.”

–    SGML was created for general document structuring; HTML is an application of SGML for Web documents; XML is a simplification of SGML for general Web use.

A Sample XML document

<address>

<name>

  <title>Mrs.</title>

  <first-name>Mary</first-name>

  <last-name>McGoon</last-name>

</name>

<street>1401 Main Street</street>

<city state="NC">Anytown</city>

<postal-code>34829</postal-code>

</address>

 

Tags, elements and attributes
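
The three terms can be seen by parsing the sample address document. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` module; the variable names are illustrative and not part of the original slides.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<address>
  <name>
    <title>Mrs.</title>
    <first-name>Mary</first-name>
    <last-name>McGoon</last-name>
  </name>
  <street>1401 Main Street</street>
  <city state="NC">Anytown</city>
  <postal-code>34829</postal-code>
</address>"""

root = ET.fromstring(SAMPLE)

# A tag is the name in angle brackets; an element is the start tag,
# the end tag, and everything between them.
city = root.find("city")
print(city.tag)     # the tag name
print(city.text)    # the element's text content
print(city.attrib)  # the attributes, as a dict of name/value pairs

# Elements nest: <name> contains three child elements.
child_tags = [child.tag for child in root.find("name")]
print(child_tags)
```

Here `state` is an attribute of the `city` element, while `title`, `first-name` and `last-name` are child elements of `name`.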

DTD (1/2)

–    Document type definition

–    The "extensible" in XML: each DTD defines a dialect of XML

•   RDF, HL7 SGML/XML, MathML, XML/EDI, FDX

–    Describes what tags the markup language has, what attributes those tags may have, and how they may be combined.

–    Specifies very clearly what information may or may not be included in the markup language.

–    DTD syntax is different from ordinary XML syntax.

DTD (2/2)

<!-- address.dtd -->

<!ELEMENT address (name, street, city, state, postal-code)>

<!ELEMENT name (title?, first-name, last-name)>

<!ELEMENT title (#PCDATA)>

<!ELEMENT first-name (#PCDATA)>

<!ELEMENT last-name (#PCDATA)>

<!ELEMENT street (#PCDATA)>

<!ELEMENT city (#PCDATA)>

<!ELEMENT state (#PCDATA)>

<!ELEMENT postal-code (#PCDATA)>
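
A content model in a DTD behaves like a regular expression over an element's children. The sketch below illustrates that idea by hand-translating the two structured declarations of address.dtd into Python regular expressions; it is not a real DTD validator (Python's standard library does not include one), and the names `CONTENT_MODELS`, `children_conform` and `conforms` are hypothetical. Note that the sample document shown earlier carries the state as an attribute of `city`, so it would not satisfy this DTD, which expects a `state` child element.

```python
import re
import xml.etree.ElementTree as ET

# Content models from address.dtd, hand-translated into regular expressions
# over a comma-terminated list of child tag names (an assumed encoding).
CONTENT_MODELS = {
    "address": r"name,street,city,state,postal-code,",
    "name": r"(title,)?first-name,last-name,",  # title is optional
}

def children_conform(element):
    """Return True if the element's children match its content model."""
    model = CONTENT_MODELS.get(element.tag)
    if model is None:
        return True  # (#PCDATA) elements: not checked in this sketch
    sequence = "".join(child.tag + "," for child in element)
    return re.fullmatch(model, sequence) is not None

def conforms(root):
    """Check every element in the tree against its content model."""
    return all(children_conform(el) for el in root.iter())

with_state_element = ET.fromstring(
    "<address><name><first-name>Mary</first-name>"
    "<last-name>McGoon</last-name></name>"
    "<street>1401 Main Street</street><city>Anytown</city>"
    "<state>NC</state><postal-code>34829</postal-code></address>"
)
with_state_attribute = ET.fromstring(
    "<address><name><first-name>Mary</first-name>"
    "<last-name>McGoon</last-name></name>"
    "<street>1401 Main Street</street><city state='NC'>Anytown</city>"
    "<postal-code>34829</postal-code></address>"
)

print(conforms(with_state_element))    # title omitted, <state> child present
print(conforms(with_state_attribute))  # no <state> child element
```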

 

Is XML a DB?

•    “collection of data”

•    Advantages: self-describing, portable, data in tree or graph structure

•    Disadvantages: verbose, slow access

+ Provides: storage, schemas, query languages, programming interfaces, …

- Lacks: efficient storage, indexes, security, transactions and data integrity, multi-user access, triggers, queries across multiple documents, …
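
To make the missing indexes and cross-document queries concrete, here is a small sketch in Python. It treats a list of XML strings as the "database" and answers a query by parsing and scanning every document. The documents and the function `last_names_in_state` are made up for illustration; the path expressions use the limited XPath subset that `xml.etree.ElementTree` supports.

```python
import xml.etree.ElementTree as ET

# Hypothetical collection of address documents (illustrative data only).
DOCUMENTS = [
    "<address><name><last-name>McGoon</last-name></name>"
    "<city state='NC'>Anytown</city></address>",
    "<address><name><last-name>Smith</last-name></name>"
    "<city state='CA'>Elsewhere</city></address>",
    "<address><name><last-name>Jones</last-name></name>"
    "<city state='NC'>Otherville</city></address>",
]

def last_names_in_state(documents, state):
    """Scan every document; with no index, cost grows with the collection."""
    matches = []
    for text in documents:
        root = ET.fromstring(text)  # each document is re-parsed per query
        if root.find(f"city[@state='{state}']") is not None:
            matches.append(root.findtext("name/last-name"))
    return matches

print(last_names_in_state(DOCUMENTS, "NC"))
```

A database system would typically answer the same question from an index on the state value rather than by reading every document.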

Why DB?

•    Want to expose legacy data

•    Looking for a place to store web pages

•    Database used by an e-commerce application in which XML is used as a data transport format

•    Interested in data or in documents?

Data vs. Documents

•    Used simply as a data transport between the database and an application?

•    Integral use as in the case of XHTML and DocBook documents?
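
The first case, XML as a transport format between a database and an application, can be sketched as follows. This is only an illustration under stated assumptions: the `orders` table, its columns, and the functions `rows_to_xml` and `xml_to_rows` are hypothetical, and an in-memory SQLite database stands in for the relational database.

```python
import sqlite3
import xml.etree.ElementTree as ET

def rows_to_xml(connection):
    """Serialize every row of the hypothetical `orders` table as XML."""
    root = ET.Element("orders")
    cursor = connection.execute(
        "SELECT id, customer, total FROM orders ORDER BY id"
    )
    for order_id, customer, total in cursor:
        order = ET.SubElement(root, "order", id=str(order_id))
        ET.SubElement(order, "customer").text = customer
        ET.SubElement(order, "total").text = str(total)
    return ET.tostring(root, encoding="unicode")

def xml_to_rows(xml_text):
    """Parse the transport document back into plain tuples."""
    root = ET.fromstring(xml_text)
    return [
        (
            int(order.get("id")),
            order.findtext("customer"),
            float(order.findtext("total")),
        )
        for order in root.findall("order")
    ]

connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
)
connection.executemany(
    "INSERT INTO orders (id, customer, total) VALUES (?, ?, ?)",
    [(1, "Mary McGoon", 19.5), (2, "John Doe", 7.25)],
)

document = rows_to_xml(connection)  # database -> XML
print(document)
print(xml_to_rows(document))        # XML -> application data
```

In this usage the XML exists only in transit: the data lives in table rows, and the document's element names and nesting are derived from the table layout.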

Data-Centric Documents (1/2)

•    For machine consumption

Ex) sales orders, flight schedules, …

•    Fairly regular structure, fine-grained data and little or no mixed content, no significant or