Post1

Teradata and JSON – Monetizing the Internet of Things (IoT)

Posted by Pankaj Chahar

The prevalent influence of technology has resulted in a widespread use of a variety of devices such as cell phones, sensors, web API’s and browsers. These devices generate huge volume of data and exchange it over the internet using IoT methodology. The generated data, however, is unstructured or in JSON format, making it difficult to store it in a database. This data is believed to have huge benefit for businesses as they can use it to better understand their customers. Therefore, the need of the hour is to find a way to collect this unstructured data and put it into a structured database to be queried and analyzed. The blog post highlights how Teradata can be an ideal database management system to store the non-SQL data generated by devices into an SQL database.

The IoT Opportunities: JSON Data Sources

 

Database Management Systems Supporting JSON

  • MongoDB – it is a NoSQL DBMS and stores the JSON documents in a binary-encoded format called BSON.
  • Apache CouchDB – it is a document oriented open source database management system.
  • Azure DocumentDB – it is a Microsoft’s multi-tenant distributed database service managing JSON documents at Internet scale.
  • OrientDB – it is an open source NoSQL DBMS written in Java and natively supports JSON.
  • Elasticsearch – it is a search engine based on Lucene, schema-free JSON documents.
  • MySQL – it is the world’s most popular open source DBMS. It has native JSONdata types that enable efficient access to data in JSON, automatic validation of JSON documents stored in JSON columns.
  • Oracle Database – it is an object-relational DBMS by Oracle Corporation supporting JSON natively with relational database features.
  • PostgreSQL – it is an ORDBMS with a number of JSON functions and operators that can be used with its two JSON data types(JSON, and JSONB).

JSON Support for Teradata

If You Can’t Join Them, JSON Them

Teradata Database 15 has added JSON to its data warehouse. This means users can now define analytical queries across various systems, integrate and consume JSON data, and write & run non-SQL languages in-database. With Teradata Database 15, the supplier has aimed to fill out the logical data warehouse with a range of an organization’s data sources. The support for JSON is directed at IoT capability – the world of sensors and embedded microprocessors. Thus, Teradata Database 15 is capable of delivering on the unified data architecture promise of enabling users to gain access to data for new, powerful insights.

Multi-structured Data >> Data Warehouse

Teradata Database 15: JSON INTEGRATION

JSON (Javascript Object Notation) is a way to store information in an organized and accessible manner. It is a data interchange format, often used to transmit data in web applications. JSON is easier to read and write for humans and easier to parse and generate for machines than XML. The new Teradata Database version can store and process JSON records as JSON documents or in relational format. Teradata Database extends the following support for JSON data:

  • JSON data type: allows storing of JSON data in text and binary (BSON, UBJSON) formats.
  • Shredding: provides shredding functionality to extract values from JSON documents (which are about 16MB in size) and store this data in relational format.
  • Publishing: allows publishing the results of SQL queries in JSON format.
  • Schema on read, late binding: supports schema-less or dynamic schema to add new attributes without changing the schema. The data stored with the new attributes is immediately available for querying, while the rows without the new attributes are filtered out.
  • Geospatial conversion: provides functionality to convert ST_Geometry object into a GeoJSON value and vice-versa.
  • Compression: allows compression of JSON data of varying lengths.
  • Statistics, JIs for performance: collects statistics on extracted portions of the JSON data type.
  • SQL Queries via JSONPath .dot notation: JSONPath support to provide simple traversal and regular expressions with wildcards to filter and navigate complex JSON documents.
  • There are methods, functions, and stored procedures for parsing and validation that operate on the JSON data type.
  • Uses the existing join indexing structures on extracted portions of the JSON data type.
  • Apply advanced analytics to JSON data.
  • Use standard SQL to query JSON data.

    Teradata Client Application Support for JSON

    • CLI: Full native DBS support.
    • ODBC: The ODBC driver supports LOB Input, Output and InputOutput parameters. Therefore, it can load JSON data. Also, the its Catalog (Data Dictionary) functions support JSON.
    • JDBC: VARCHAR or CLOB values can be inserted into JSON destination columns.
    • .NET Data Provider: No support.
    • Teradata Parallel Transporter (TPT): JSON columns are treated like CLOB columns and are subjected to the same limitations. JSON columns are limited to 16 MB in size. Full support to import and export.
    • BTEQ: Cannot use the JSON keyword in the USING data statement; therefore, JSON values must be referred either as CLOB or VARCHAR, with the value not exceeding 64 KB.
    • Standalone Utilities: No support.

    Benefits of JSON Support in Teradata Database

    • JSON is the most widely accepted and used text-based data interchange format
    • It is independent of any programming languages
    • Nested structure of key-value or name-value pairs
    • Self-defining, easily manipulated data structure
    • New data dimension enables richer analytical insight
    • JSON integration joins existing XML integration
    • Accessed by developer friendly SQL extension
    • Optimizer, performance and availability feature integration
    • Late binding schema on read flexibility adds agility and avoids data warehouse changes

    Teradata Terminology for JSON

    • A JSON document is a string conforming to the JSON format. At the time of discussing JSON values in SQL context, JSON documents are referred to as an instance of the JSON data type.
    • When a JSON document structured is treated as an object it is written inside curly braces {}, while a JSON document structured when treated as an array it is written inside square bracket []. However, both are treated as JSON data type instances when taken in SQL context.
    • The serialized form is called JSON document and the structure of the JSON document is described as “a JSON document structured as an array or object” or JSON array and JSON object.

    The following is an example of a JSON document.

    Examples of Teradata JSON Statements

    **Select statement

    SELECT NEW JSON(‘{“name”:”Al”,”name”:”PankajDBA”}’).name;

    NEW JSON(‘{“name”:”Al”,”name”:”PankajDBA”}’, LATIN).name

    SELECT NEW JSON(‘{“name”:”Al”,”name”:”PankajDBA”}’).JSONExtract(‘$.name’);

    NEW JSON(‘{“name”:”Al”,”name”:”PankajDBA”}’, LATIN).JSONEXTRACT(‘$.name’)

     

    **Create table statement

    CREATE TABLE DBA_table_test (eno INTEGER, edata JSON(100));

     

    **Insert into  statement

    INSERT INTO DBA_table_test VALUES(2,

    ‘{“name” : “Deepak”, “age” : 24}’);

     

    **Insert into JSON column

    INSERT INTO DBA_table_test VALUES(3,

    ‘{“name” : “Deepak”‘);

     

    **Create table Statement with encoding

    CREATE TABLE json_table(id INTEGER, json_j1 JSON CHARACTER SET UNICODE);

     

    **Create table Statement with JSON columns

    CREATE TABLE employee ( id INTEGER, json1 JSON(20), json2 JSON(25), json3 JSON(30), json4 JSON(1000), json5 JSON(5000));

     

    **Update and Set Statement

    UPDATE DBA_table_test SET edata = NEW JSON(‘{“name” : “himanshu”}’) WHERE edata.JSONExtractValue(‘$.name’) = ‘Deepak’;

     

    **Update column value Statement

    UPDATE DBA_table_test SET edata = ‘{“name” : “varun”}’;

     

    **Delete Statement

    DELETE DBA_table_test WHERE CAST (edata.JSONExtractValue(‘$.age’) AS INTEGER) = 24;

    Loading JSON Data

    • Load utility support
      • Supported like LOB data – only in SQL protocol, not bulk loading
      • Choice of TPT script, SQL Assistant (SQLA), ODBC, JDBC, and ARC (backup)
      • Not currently supported: BTEQ, TPT API (which only supports bulk loading protocols)
    • Many ETL tools support JSON data sources

    Enable JSON Support in Teradata

    To enable JSON support in Teradata Database, the DBS Control field EnableJSON must be set to TRUE.

    Rules for JSON Data

    There are certain rules to use Unicode characters in JSON data types, including specifying maximum limits and exponential numbers in JSON instances, along with removing white space inside or outside the root of the JSON object or array.

    • JSON allows inserting Unicode characters to the hexadecimal formats in a character string using ‘\u’ string as an escape sequence. Although it is allowed within the JSON type, the ‘\u’ hex Unicode character is not interpreted; but is merely stored as it is.
    • The white space characters inside the root of the JSON object or array are regarded significant, thus, are not removed. Similarly, the white space outside the root of the JSON object or array is trimmed, and not removed, for all instances of the JSON type.
    • The range of nesting imposed on a JSON instance is between 512 and 32000. It is set using the JSON_MaxDepth DBS Control Field. A nested object or array are not counted in this limit.
    • Teradata Database supports storing exponential numbers using scientific notation in JSON documents. Numbers can have a value between -1e308 and 1e308, non-inclusive, while the range of exponents is between -308 and 308, non-inclusive. Any exponent or number specified outside of the valid range will result in an error.

    The new version of Teradata Database with support for JSON for IoT capabilities have added another feather to the range of benefits of the database management system. Teradata’s ability to store the non-SQL data generated by devices into an SQL database will help companies gain more information and insights about their customers’ needs and purchasing patterns.

    That is all from us for this time. Let us know in comments how you perceive the new opportunities that open up with this amendment to Teradata and what the future holds for the database management system.

    Until next time!

    Credits:                                                                

    Xavient Digital – powered by TELUS International

    Technical team:

    Pankaj Chahar         

    References:

    https://en.wikipedia.org

    https://www.teradata.com/Press-Releases/2015/Teradata-Accelerates-JSON-Data-Query-Performa

    https://info.teradata.com/htmlpubs/DB_TTU_16_00/index.html#page/Teradata_Data_Types/B035-1150-160K/seb1472240527441.html

    http://www.teradatawiki.net/2014/12/Teradata-15-features.html

     

Related Posts

  • Teradata IoT capabilities & Teradata ListenerTeradata IoT capabilities & Teradata Listener

    What is Teradata: Teradata is a relational database management system (RDBMS) that is: an open system, running on a UNIX MP-RAS or Windows server platform. capable of supporting many concurrent users from various…

  • Understanding Teradata Wallet

    Teradata Wallet is a facility for storage of sensitive/secret information, such as Teradata Database user passwords. Users are able to save and retrieve items by using this facility. Teradata wallet…

  • Teradata Query Grid : With Database, NoSQL, HadoopTeradata Query Grid : With Database, NoSQL, Hadoop

    Teradata 15.0 has come up with various exciting features and enhanced capabilities, Teradata Query Grid being one of them. Teradata database is now able to connect Hadoop with this Query Grid,…

  • Teradata NPARC (Named Pipe ARChive)

    Introduction to Teradata: Teradata is a fully scalable relational database management system produced by Teradata Corp. It is widely used to manage large data warehousing operations. The Teradata database system…

  • Content Data Store

    Content Data Store Content Data Store (CDS) is a system to provide storage facilities to massive data sets in the form of images, pdfs, documents and scanned documents. This dataset…

  • DiP (Storm,Spark,Flink and Apex) Co-Dev opportunityDiP (Storm,Spark,Flink and Apex) Co-Dev opportunity

    Real time data ingestion using Data Ingestion Platform (DiP) which harness the powers of Apache Apex, Apache Flink, Apache Spark and Apache Storm to give real time data ingestion and visualization. DiP…

2 comments

Leave a Reply

Your email address will not be published. Required fields are marked *