XML in Databases
YoonJoon Lee (李潤俊), Korea Advanced Institute of Science and Technology (KAIST)
Contents
• What is XML?
• XML Data vs Documents
• Store and retrieve XML in RDB
• GML
What is XML?
– A markup language that you can use to create your own tags
– Created by W3C to overcome the limitations of HTML
– Based on SGML (Standard Generalized Markup Language, jokingly “Sounds great, maybe later”), used in the publishing industry
– Designed with the Web in mind
Origins of XML
– In 1996, Jon Bosak convinced the W3C to let him form a committee on using SGML on the Web.
– By November, the committee had created the beginnings of a simplified form of SGML; this was XML.
– In March 1997, Bosak released the paper “XML, Java and the Future of the Web.”
– SGML was created for general document structuring, HTML is an application of SGML for Web documents, and XML is a simplification of SGML for general Web use.
A Sample XML document
<address>
<name>
<title>Mrs.</title>
<first-name>Mary</first-name>
<last-name>McGoon</last-name>
</name>
<street>1401 Main Street</street>
<city state="NC">Anytown</city>
<postal-code>34829</postal-code>
</address>
Tags, elements and attributes
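The three terms can be illustrated against the sample address document. The following is a minimal Python sketch using the standard library's xml.etree.ElementTree module; the variable names are illustrative and not part of the original slides.

```python
import xml.etree.ElementTree as ET

# The sample address document from the previous slide.
ADDRESS_XML = """
<address>
  <name>
    <title>Mrs.</title>
    <first-name>Mary</first-name>
    <last-name>McGoon</last-name>
  </name>
  <street>1401 Main Street</street>
  <city state="NC">Anytown</city>
  <postal-code>34829</postal-code>
</address>
"""

root = ET.fromstring(ADDRESS_XML)

# An element is a start tag, its content, and the matching end tag.
city = root.find("city")

# The tag is the element's name.
city_tag = city.tag              # "city"

# The content between the start and end tags is the element's text.
city_text = city.text            # "Anytown"

# An attribute is a name="value" pair inside the start tag.
city_state = city.get("state")   # "NC"

# Elements nest: <name> contains three child elements.
name_children = [child.tag for child in root.find("name")]
```

Here `<city state="NC">` is a start tag, `state` is an attribute, and the whole span from `<city ...>` to `</city>` is one element.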
DTD (1/2)
– Document type definition
– The “extensible” in XML: each DTD defines a dialect of XML
• RDF, HL7 SGML/XML, MathML, XML/EDI, FDX
– Describes what tags the markup language has, what attributes those tags may have, and how they may be combined.
– Specifies very clearly what information may or may not be included in the markup language.
– DTD syntax is different from ordinary XML syntax.
DTD (2/2)
<!-- address.dtd -->
<!ELEMENT address (name, street, city, state, postal-code)>
<!ELEMENT name (title?, first-name, last-name)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT first-name (#PCDATA)>
<!ELEMENT last-name (#PCDATA)>
<!ELEMENT street (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT state (#PCDATA)>
<!ELEMENT postal-code (#PCDATA)>
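A DTD content model behaves like a regular expression over child element names. The following Python sketch makes that concrete for the name element (an optional title followed by first-name and last-name). It is not a DTD validator: the pattern NAME_MODEL is translated by hand, and the helper children_match is a hypothetical name introduced only for this illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hand-translated content model for <name>: an optional title, then
# first-name, then last-name. Each child name is followed by one space
# so that adjacent names cannot run together.
NAME_MODEL = r"(title )?first-name last-name "

def children_match(element, model):
    """Return True if the element's child tags, in order, match the model."""
    sequence = "".join(child.tag + " " for child in element)
    return re.fullmatch(model, sequence) is not None

with_title = ET.fromstring(
    "<name><title>Mrs.</title><first-name>Mary</first-name>"
    "<last-name>McGoon</last-name></name>")
without_title = ET.fromstring(
    "<name><first-name>Mary</first-name><last-name>McGoon</last-name></name>")
wrong_order = ET.fromstring(
    "<name><last-name>McGoon</last-name><first-name>Mary</first-name></name>")

print(children_match(with_title, NAME_MODEL))     # True
print(children_match(without_title, NAME_MODEL))  # True
print(children_match(wrong_order, NAME_MODEL))    # False
```

The `?` after title plays the same role in the DTD as in the regular expression: the child may appear zero or one times.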
Is XML a DB?
• “collection of data”
• Advantages: self-describing, portable, data in tree or graph structure
• Disadvantages: verbose, slow access
+ Has: storage, schemas, query languages, programming interfaces, …
- Lacks: efficient storage, indexes, security, transactions and data integrity, multi-user access, triggers, queries across multiple documents, …
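To make the trade-off concrete, the sketch below queries a small collection of XML documents using the limited XPath subset in Python's xml.etree.ElementTree. The documents in DOCUMENTS and the function cities_in_state are hypothetical and exist only for this illustration. Note that every document is parsed in full on each query, since plain XML files carry no index.

```python
import xml.etree.ElementTree as ET

# Hypothetical collection of address documents (illustrative data only).
DOCUMENTS = [
    '<address><city state="NC">Anytown</city></address>',
    '<address><city state="CA">Springfield</city></address>',
    '<address><city state="NC">Riverton</city></address>',
]

def cities_in_state(documents, state):
    """Scan every document; with no index, each one is parsed in full."""
    found = []
    for text in documents:
        root = ET.fromstring(text)
        # XPath predicate: <city> elements whose state attribute matches.
        for city in root.findall(f".//city[@state='{state}']"):
            found.append(city.text)
    return found

nc_cities = cities_in_state(DOCUMENTS, "NC")
print(nc_cities)  # ['Anytown', 'Riverton']
```

The query itself is easy to express, which is the "+" side; the linear scan over all documents is the "-" side.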
Why DB?
• Want to expose legacy data
• Looking for a place to store web pages
• Database used by an e-commerce application in which XML is used as a data transport
• Interested in Data or Documents
Data vs. Documents
• Used simply as a data transport between the database and an application?
• Integral use as in the case of XHTML and DocBook documents?
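The first case, XML as a data transport, can be sketched as a round trip between a relational row and an XML element. This is a minimal Python illustration using the standard sqlite3 and xml.etree.ElementTree modules; the table name, column names, and the helpers row_to_xml and xml_to_row are assumptions made for the example, not a mapping prescribed by the slides.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical table and column names, chosen only for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE address (street TEXT, city TEXT, postal_code TEXT)")
conn.execute("INSERT INTO address VALUES (?, ?, ?)",
             ("1401 Main Street", "Anytown", "34829"))

def row_to_xml(cursor, row):
    """Serialize one row as an <address> element, one child per column."""
    element = ET.Element("address")
    for column, value in zip(cursor.description, row):
        child = ET.SubElement(element, column[0])  # column[0] is the name
        child.text = value
    return ET.tostring(element, encoding="unicode")

def xml_to_row(xml_text):
    """Parse an <address> element back into a tuple of column values."""
    return tuple(child.text for child in ET.fromstring(xml_text))

cursor = conn.execute("SELECT street, city, postal_code FROM address")
row = cursor.fetchone()
xml_text = row_to_xml(cursor, row)   # database -> XML for transport
round_trip = xml_to_row(xml_text)    # XML -> values on the receiving side
```

In this usage the XML exists only in transit: the database stores plain columns, and the document structure is discarded once the values are extracted.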
Data-Centric Documents (1/2)
• For machine consumption
Examples: sales orders, flight schedules, …
• Fairly regular structure, fine-grained data and little or no mixed content, no significant or