XML Data Processing in Teradata

Posted by

XML is a markup language that we can use to format data in a wide range of applications. Teradata Database version 14.10 supports XML as a native SQL data type. This allows us to store XML documents in the database and perform common XML processing operations such as query, validation, and transformation of the XML documents. It also supports shredding of XML documents (extracting data from XML documents into database tables) and publishing the SQL query results in XML format.
This blog post presents an overview of the XML processing capabilities in the Teradata Database with code samples to demonstrate the functionality.

Loading XML Format Data into Teradata Database

There are three ways to load XML formatted data into the Teradata Database.

  • As VARCHAR/CLOB data
  • As instances of the XML data type
  • As data values of other SQL predefined types through XML shredding

Loading as VARCHAR/CLOB Data

The XML document is treated like any other character string. This method is appropriate when we want to archive the documents and not query the contents within the document. Thus, when the document is retrieved, it is retrieved as a whole and not as a part.

The method does not support queries that traverse the XML structure; instead, it supports queries that treat XML as text, for example, regular expression, contains, starts-with, etc.

Loading as Instances of the XML Data Type

This method is a new introduction in TD14.10. Here the XML documents are used to construct instances of the XML data type. This is done using either the new XML() constructor or the CREATEXML() function. When XML is loaded as an XML type instance:
(1) It is parsed to ensure it is well-formed
(2) It is stored in a compact internal format (5~10 times smaller)
(3) It is also possible to validate the document before storing

Storing XML documents in this manner enables us to query the internal structure and content through the XPath/XQuery query language. XSLT (eXtensible Stylesheet Language Transformations) processing is also supported through a method on the XML type. All the new functions and stored procedures implemented to support XML processing in TD14.10 also take XML type instances as parameters and return XML type instances.

XML Query

The XML operations demonstrated in this sample include XML query using XQuery/XPath. We use the XMLEXTRACT() method on the XML type instance. Another option is to use the XMLQUERY() function.

XML Data Type

We use the XML data type to represent XML documents in a Relational Database. New XML() constructor or CREATEXML() function are used to construct the XML data type instances.

Let’s better understand the concept with an example:

Both the sentences return an XML type instance.

Creating Tables with XML Datatype Column

Let us create a table with an XML datatype column.

The table has two columns, with the second column capable of storing XML values.

Loading XML Data into XML Datatype Column

BTEQ helps in loading the XML data values into the XML type column. The following steps can do the trick:

  • Create a directory on the file system where the XML files are to be stored and then place the XML file into that directory.
  • Create a text file with a listing of XML documents that are to be loaded over there.
  • Invoke a BTEQ script that uses the IMPORT instruction to load the files into the table.

We can use the following code to load the data:

Loading as Instances of the XML Data Type

We can store the XML data as an instance of the XML data type. To insert, create an XML type instance from a string literal via the new XML() operator.

Below is the code to load the data by file system:

Retrieving XML Values

There can be instances when we may want to retrieve the complete document or parts of the document, along with the data extracted from XML documents.

We can use the following code to retrieve the complete document.

XMLSERIALIZE() function helps us get the return document in a particular encoding.

We have learned to retrieve the complete document, but retrieving parts of the document is also possible. To retrieve the parts of the document, we’ll use XML Query Language in the following manner:

Loading as Data Values of Other SQL Predefined Types through XML Shredding

After learning how to retrieve the complete XML document and parts of the document, let’s learn to retrieve data extracted from XML documents via shredding.

XML Shredding is one of the methods used to retrieve data that is extracted from XML documents. XMLTABLE function provides another solution to shred the data.

For XML shredding, we’ll create another table for the schema, which is used to extract data from XML documents.

Creating the Table

After creating the table, we will need to store the schema document (it is also an XML document) in the table, which will help us validate the target table and extract the data based on schema documents.

Inserting Schema Document through File System

Extracting Data from XML Documents

In the below query, the ISSCHEMAVALID() method is used to validate the XML document in the SBX_UAT.Xav_XMLDOCS table using the schema stored in the SBX_UAT.XAV_SCHEMAS table.

The result of the query is

The ISSCHEMAVALID() method returned 1 as a result, indicating that the XML document is valid and based on the schema.

Parting Lines

This was a quick overview of XML data processing in Teradata, along with some new methods and functions introduced in Teradata Database version 14.10. Sample codes in the blog post are just for reference to understand the common XML processing operations including storage, query and retrieval.

This is it from our side.

Let us know your thoughts in the comments below.

Until next time!

Related Posts

  • Understanding Teradata Wallet

    Teradata Wallet is a facility for storage of sensitive/secret information, such as Teradata Database user passwords. Users are able to save and retrieve items by using this facility. Teradata wallet…

  • Teradata Query Grid : With Database, NoSQL, HadoopTeradata Query Grid : With Database, NoSQL, Hadoop

    Teradata 15.0 has come up with various exciting features and enhanced capabilities, Teradata Query Grid being one of them. Teradata database is now able to connect Hadoop with this Query Grid,…

  • Teradata and JSON – Monetizing the Internet of Things (IoT)Teradata and JSON – Monetizing the Internet of Things (IoT)

    The prevalent influence of technology has resulted in a widespread use of a variety of devices such as cell phones, sensors, web API’s and browsers. These devices generate huge volume…

  • Teradata IoT capabilities & Teradata ListenerTeradata IoT capabilities & Teradata Listener

    What is Teradata: Teradata is a relational database management system (RDBMS) that is: an open system, running on a UNIX MP-RAS or Windows server platform. capable of supporting many concurrent users from various…

  • HAWQ/HDB and Hadoop with Hive and HBaseHAWQ/HDB and Hadoop with Hive and HBase

    Hive: Apache Hive is a data warehouse infrastructure built on top of Hadoop for providing data summarization, query, and analysis. HBase: Apache HBase™ is the Hadoop database, a distributed, scalable, big…

  • Column Store Index in SQL Server 2012

    This post is about the new feature, i.e., Column Store Index which is available since SQL 2012 version. Microsoft has released column store index to improve the performance by 10x.…

Leave a Reply

Your email address will not be published. Required fields are marked *