数据库管理系统:ch10 XML_第1页
数据库管理系统:ch10 XML_第2页
数据库管理系统:ch10 XML_第3页
数据库管理系统:ch10 XML_第4页
数据库管理系统:ch10 XML_第5页
已阅读5页,还剩51页未读 继续免费阅读

下载本文档

版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领

文档简介

1、Chapter 10: XMLXMLStructure of XML DataXML Document SchemaQuerying and TransformationApplication Program Interfaces to XMLStorage of XML DataXML ApplicationsIntroductionXML: Extensible Markup LanguageDefined by the WWW Consortium (W3C)Derived from SGML (Standard Generalized Markup Language), but sim

2、pler to use than SGML Documents have tags giving extra information about sections of the documentE.g. XML Introduction Extensible, unlike HTMLUsers can add new tags, and separately specify how the tag should be handled for displayXML Introduction (Cont.)The ability to specify new tags, and to create

3、 nested tag structures make XML a great way to exchange data, not just documents.Much of the use of XML has been in data exchange applications, not as a replacement for HTMLTags make data (relatively) self-documenting E.g. A-101 Downtown 500 A-101 Johnson XML: MotivationData interchange is critical

4、in todays networked worldExamples:Banking: funds transferOrder processing (especially inter-company orders)Scientific dataChemistry: ChemML, Genetics: BSML (Bio-Sequence Markup Language), Paper flow of information between organizations is being replaced by electronic flow of informationEach applicat

5、ion area has its own set of standards for representing informationXML has become the basis for all new generation data interchange formatsXML Motivation (Cont.)Earlier generation formats were based on plain text with line headers indicating the meaning of fieldsSimilar in concept to email headersDoe

6、s not allow for nested structures, no standard “type” languageTied too closely to low level document structure (lines, spaces, etc)Each XML based standard defines what are valid elements, using XML type specification languages to specify the syntaxDTD (Document Type Descriptors)XML SchemaPlus textua

7、l descriptions of the semanticsXML allows new tags to be defined as requiredHowever, this may be constrained by DTDsA wide variety of tools is available for parsing, browsing and querying XML documents/dataComparison with Relational DataInefficient: tags, which in effect represent schema information

8、, are repeatedBetter than relational tuples as a data-exchange formatUnlike relational tuples, XML data is self-documenting due to presence of tagsNon-rigid format: tags can be addedAllows nested structuresWide acceptance, not only in database systems, but also in browsers, tools, and applicationsSt

9、ructure of XML DataTag: label for a section of dataElement: section of data beginning with and ending with matching Elements must be properly nestedProper nesting . Improper nesting . Formally: every start tag must have a unique matching end tag, that is in the context of the same parent element.Eve

10、ry document must have a single top-level elementExample of Nested Elements Hayes Main Harrison A-102 Perryridge 400 . . Motivation for NestingNesting of data is useful in data transferExample: elements representing customer_id, customer_name, and address nested within an order elementNesting is not

11、supported, or discouraged, in relational databasesWith multiple orders, customer name and address are stored redundantlynormalization replaces nested structures in each order by foreign key into table storing customer name and address informationNesting is supported in object-relational databasesBut

12、 nesting is appropriate when transferring dataExternal application does not have direct access to data referenced by a foreign keyStructure of XML Data (Cont.)Mixture of text with sub-elements is legal in XML. Example: This account is seldom used any more. A-102 Perryridge 400 Useful for document ma

13、rkup, but discouraged for data representationAttributesElements can have attributes A-102 Perryridge 400 Attributes are specified by name=value pairs inside the starting tag of an elementAn element may have several attributes, but each attribute name can only occur onceAttributes vs. SubelementsDist

14、inction between subelement and attributeIn the context of documents, attributes are part of markup, while subelement contents are part of the basic document contentsIn the context of data representation, the difference is unclear and may be confusingSame information can be represented in two ways .

15、A-101 Suggestion: use attributes for identifiers of elements, and use subelements for contentsNamespacesXML data has to be exchanged between organizationsSame tag name may have different meaning in different organizations, causing confusion on exchanged documentsSpecifying a unique string as an elem

16、ent name avoids confusionBetter solution: use unique-name:element-nameAvoid using long unique names all over document by using XML Namespaces Downtown Brooklyn More on XML SyntaxElements without subelements or text content can be abbreviated by ending the start tag with a / and deleting the end tagT

17、o store string data that may contain tags, without the tags being interpreted as subelements, use CDATA as below!CDATA Here, and are treated as just stringsCDATA stands for “character data”XML Document SchemaDatabase schemas constrain what information can be stored, and the data types of stored valu

18、esXML documents are not required to have an associated schemaHowever, schemas are very important for XML data exchangeOtherwise, a site cannot automatically interpret data received from another siteTwo mechanisms for specifying XML schemaDocument Type Definition (DTD)Widely usedXML Schema Newer, inc

19、reasing useDocument Type Definition (DTD)The type of an XML document can be specified using a DTDDTD constraints structure of XML dataWhat elements can occurWhat attributes can/must an element haveWhat subelements can/must occur inside each element, and how many times.DTD does not constrain data typ

20、esAll values represented as strings in XMLDTD syntaxElement Specification in DTDSubelements can be specified asnames of elements, or#PCDATA (parsed character data), i.e., character stringsEMPTY (no subelements) or ANY (anything can be a subelement)Example Subelement specification may have regular ex

21、pressions Notation: “|” - alternatives “+” - 1 or more occurrences “*” - 0 or more occurrencesBank DTD!DOCTYPE bank Attribute Specification in DTDAttribute specification : for each attribute NameType of attribute CDATAID (identifier) or IDREF (ID reference) or IDREFS (multiple IDREFs) more on this l

22、ater Whether mandatory (#REQUIRED)has a default value (value), or neither (#IMPLIED)ExamplesIDs and IDREFsAn element can have at most one attribute of type IDThe ID attribute value of each element in an XML document must be distinctThus the ID attribute value is an object identifierAn attribute of t

23、ype IDREF must contain the ID value of an element in the same documentAn attribute of type IDREFS contains a set of (0 or more) ID values. Each ID value must contain the ID value of an element in the same documentBank DTD with AttributesBank DTD with ID and IDREF attribute types. !DOCTYPE bank-2 dec

24、larations for branch, balance, customer_name, customer_street and customer_cityXML data with ID and IDREF attributes Downtown 500 Joe Monroe Madison Mary Erin Newark Limitations of DTDsNo typing of text elements and attributesAll values are strings, no integers, reals, etc.Difficult to specify unord

25、ered sets of subelementsOrder is usually irrelevant in databases (unlike in the document-layout environment from which XML evolved)(A | B)* allows specification of an unordered set, butCannot ensure that each of A and B occurs only onceIDs and IDREFs are untypedThe owners attribute of an account may

26、 contain a reference to another account, which is meaninglessowners attribute should ideally be constrainted to refer to customer elementsXML SchemaXML Schema is a more sophisticated schema language which addresses the drawbacks of DTDs. SupportsTyping of valuesE.g. integer, string, etcAlso, constra

27、ints on min/max valuesUser-defined, complex typesMany more features, includinguniqueness and foreign key constraints, inheritance XML Schema is itself specified in XML syntax, unlike DTDsMore-standard representation, but verboseXML Schema is integrated with namespaces BUT: XML Schema is significantl

28、y more complicated than DTDs.XML Schema Version of Bank DTD . definitions of customer and depositor .XML Schema Version of Bank DTDChoice of “xs:” was ours - any other namespace prefix could be chosenElement “bank” has type “BankType”, which is defined separatelyxs:complexType is used later to creat

29、e the named complex type “BankType”Element “account” has its type defined in-lineMore features of XML SchemaAttributes specified by xs:attribute tag:adding the attribute use = “required” means value must be specifiedKey constraint: “account numbers form a key for account elements under the root bank

30、 element:Foreign key constraint from depositor to account:Querying and Transforming XML DataTranslation of information from one XML schema to anotherQuerying on XML data Above two are closely related, and handled by the same toolsStandard XML querying/translation languagesXPathSimple language consis

31、ting of path expressionsXSLTSimple language designed for translation from XML to XML and XML to HTMLXQueryAn XML query language with a rich set of featuresTree Model of XML DataQuery and transformation languages are based on a tree model of XML dataAn XML document is modeled as a tree, with nodes co

32、rresponding to elements and attributesElement nodes have child nodes, which can be attributes or subelementsText in an element is modeled as a text node child of the elementChildren of a node are ordered according to their order in the XML documentElement and attribute nodes (except for the root nod

33、e) have a single parent, which is an element nodeXPathXPath is used to address (select) parts of documents using path expressionsA path expression is a sequence of steps separated by “/”Think of file names in a directory hierarchyResult of path expression: set of values that along with their contain

34、ing elements/attributes match the specified path E.g. /bank-2/customer/customer_name evaluated on the bank-2 data we saw earlier returns JoeMaryE.g. /bank-2/customer/customer_name/text( ) returns the same names, but without the enclosing tagsXPath (Cont.)The initial “/” denotes root of the document

35、(above the top-level tag)Path expressions are evaluated left to rightEach step operates on the set of instances produced by the previous stepSelection predicates may follow any step in a path, in E.g. /bank-2/accountbalance 400 returns account elements with a balance value greater than 400/bank-2/ac

36、countbalance returns account elements containing a balance subelementAttributes are accessed using “”E.g. /bank-2/accountbalance 400/account_numberreturns the account numbers of accounts with balance 400IDREF attributes are not dereferenced automatically (more on this later)Functions in XPathXPath p

37、rovides several functionsThe function count() at the end of a path counts the number of elements in the set generated by the pathE.g. /bank-2/accountcount(./customer) 2 Returns accounts with 2 customersAlso function for testing position (1, 2, .) of node w.r.t. siblingsBoolean connectives and and or

38、 and function not() can be used in predicatesIDREFs can be referenced using function id()id() can also be applied to sets of references such as IDREFS and even to strings containing multiple references separated by blanksE.g. /bank-2/account/id(owner) returns all customers referred to from the owner

39、s attribute of account elements.More XPath FeaturesOperator “|” used to implement union E.g. /bank-2/account/id(owner) | /bank-2/loan/id(borrower)Gives customers with either accounts or loansHowever, “|” cannot be nested inside other operators.“/” can be used to skip multiple levels of nodes E.g. /b

40、ank-2/customer_name finds any customer_name element anywhere under the /bank-2 element, regardless of the element in which it is contained.A step in the path can go to parents, siblings, ancestors and descendants of the nodes generated by the previous step, not just to the children“/”, described abo

41、ve, is a short from for specifying “all descendants”“.” specifies the parent.doc(name) returns the root of a named document E.g. doc(“bank.xml”)/bank/accountXQueryXQuery is a general purpose query language for XML data Currently being standardized by the World Wide Web Consortium (W3C)The textbook d

42、escription is based on a January 2005 draft of the standard. The final version may differ, but major features likely to stay unchanged.XQuery is derived from the Quilt query language, which itself borrows from SQL, XQL and XML-QLXQuery uses a for let where order by return syntax for SQL from where S

43、QL where order by SQL order by return SQL select let allows temporary variables, and has no equivalent in SQLFLWOR Syntax in XQuery For clause uses XPath expressions, and variable in for clause ranges over values in the set returned by XPathSimple FLWOR expression in XQuery find all accounts with ba

44、lance 400, with each result enclosed in an . tag for $x in /bank-2/account let $acctno := $x/account_number where $x/balance 400 return $acctno Items in the return clause are XML text unless enclosed in , in which case they are evaluatedLet clause not really needed in this query, and selection can b

45、e done In XPath. Query can be written as:for $x in /bank-2/accountbalance400return $x/account_number JoinsJoins are specified in a manner very similar to SQLfor $a in /bank/account, $c in /bank/customer, $d in /bank/depositor where $a/account_number = $d/account_number and $c/customer_name = $d/cust

46、omer_name return $c $a The same query can be expressed with the selections specified as XPath selections: for $a in /bank/account, $c in /bank/customer, $d in /bank/depositor account_number = $a/account_number and customer_name = $c/customer_name return $c $a Nested QueriesThe following query conver

47、ts data from the flat structure for bank (fig.10.1) information into the nested structure used in bank-1(fig.10.4) for $c in /bank/customer return $c/* for $d in /bank/depositorcustomer_name = $c/customer_name, $a in /bank/accountaccount_number=$d/account_number return $a $c/* denotes all the childr

48、en of the node to which $c is bound, without the enclosing top-level tag$c/text() gives text content of an element without any subelements / tagsSorting in XQuery The order by clause can be used at the end of any expression. E.g. to return customers sorted by name for $c in /bank/customer order by $

49、c/customer_name return $c/* Use order by $c/customer_name to sort in descending orderCan sort at multiple levels of nesting (sort by customer_name, and by account_number within each customer) for $c in /bank/customer order by $c/customer_namereturn $c/* for $d in /bank/depositorcustomer_name=$c/cust

50、omer_name, $a in /bank/accountaccount_number=$d/account_number order by $a/account_number return $a/* Functions and Other XQuery FeaturesUser defined functions with the type system of XMLSchema function balances(xs:string $c) as xs:decimal* for $d in /bank/depositorcustomer_name = $c, $a in /bank/ac

51、countaccount_number = $d/account_number return $a/balance Types are optional for function parameters and return valuesThe * (as in decimal*) indicates a sequence of values of that typeUniversal and existential quantification in where clause predicatessome $e in path satisfies P every $e in path sati

52、sfies P XQuery also supports If-then-else clausesXSLTA stylesheet stores formatting options for a document, usually separately from documentE.g. an HTML style sheet may specify font colors and sizes for headings, etc.The XML Stylesheet Language (XSL) was originally designed for generating HTML from

53、XMLXSLT is a general-purpose transformation language Can translate XML to XML, and XML to HTMLXSLT transformations are expressed using rules called templatesTemplates combine selection using XPath with construction of resultsXSLT TemplatesExample of XSLT template with match and select part The match

54、 attribute of xsl:template specifies a pattern in XPathElements in the XML document matching the pattern are processed by the actions within the xsl:template elementxsl:value-of selects (outputs) specified values (here, customer_name)The template matches all elements that do not match any other temp

55、lateUsed to ensure that their contents do not get output.If an element matches several templates, only one is used based on a complex priority scheme/user-defined prioritiesCreating XML OutputAny text or tag in the XSL stylesheet that is not in the xsl namespace is output as isE.g. to wrap results i

56、n new XML elements. Example output: Joe Mary Creating XML Output (Cont.)Note: Cannot directly insert a xsl:value-of tag inside another tagE.g. cannot create an attribute for in the previous example by directly using xsl:value-ofXSLT provides a construct xsl:attribute to handle this situationxsl:attr

57、ibute adds attribute to the preceding elementE.g. results in output of the form .xsl:element is used to create output elements with computed namesStructural RecursionTemplate action can apply templates recursively to the contents of a matched element Example output: John Mary Joins in XSLTXSLT keys

58、allow elements to be looked up (indexed) by values of subelements or attributesKeys must be declared (with a name) and, the key() function can then be used for lookup. E.g. xsl:value-of select=key(“acctno”, “A-101”)Keys permit (some) joins to be expressed in XSLTSorting in XSLTUsing an xsl:sort dire

59、ctive inside a template causes all elements matching the template to be sorted Sorting is done before applying other templates Application Program InterfaceThere are two standard application program interfaces to XML data:SAX (Simple API for XML)Based on parser model, user provides event handlers fo

60、r parsing events E.g. start of element, end of elementNot suitable for database applicationsDOM (Document Object Model)XML data is parsed into a tree representation Variety of functions provided for traversing the DOM treeE.g.: Java DOM API provides Node class with methods getParentNode( ), getFirst

温馨提示

  • 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
  • 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
  • 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
  • 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
  • 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
  • 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
  • 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。

评论

0/150

提交评论