Overview of Teradata
Teradata is an RDBMS (Relational Database Management System). The system is built entirely on off-the-shelf symmetric multiprocessing (SMP) technology combined with a communication network that connects the SMP systems to form Massively Parallel Processing (MPP) systems. It is widely used to manage large data warehousing operations. Teradata acts as a single data store that can accept a large number of concurrent requests from multiple client applications.
Some of Teradata’s best-known features are parallelism with load distribution shared among several users, execution of complex queries with a maximum of 256 joins, parallel efficiency, complete scalability, and Teradata Intelligent Memory.
Architectural View of Teradata: How Does Shared Nothing Work?
|Shared Nothing Architecture: A distributed computing architecture in which each node is independent and self-sufficient, and there is no single point of contention across the system. More specifically, none of the nodes share memory or disk storage.|
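To make the definition concrete, here is a minimal Python sketch of a shared-nothing layout. It is an illustration only: the `Node` and `Cluster` classes and the `owner_index` function are hypothetical names, and the SHA-256 based placement is an assumption chosen for simplicity, not Teradata's actual row-hashing algorithm.

```python
import hashlib


class Node:
    """Hypothetical shared-nothing node: it owns its rows and shares no memory or disk."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.rows = {}  # private storage, never touched by other nodes

    def put(self, key, row):
        self.rows[key] = row

    def local_count(self):
        return len(self.rows)


def owner_index(key, node_count):
    """Map a key to exactly one node with a stable hash (illustrative choice)."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


class Cluster:
    def __init__(self, node_count):
        self.nodes = [Node(i) for i in range(node_count)]

    def insert(self, key, row):
        self.nodes[owner_index(key, len(self.nodes))].put(key, row)

    def lookup(self, key):
        # Only the owning node is consulted; no shared structure is involved.
        return self.nodes[owner_index(key, len(self.nodes))].rows.get(key)

    def count_all(self):
        # Each node counts its own rows; only the partial results are combined.
        return sum(node.local_count() for node in self.nodes)
```

Because every key has exactly one owner, a lookup touches a single node, while an aggregate such as `count_all` lets each node work on its own data independently.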
What is Teradata Intelligent Memory?
Teradata is a “temperature-aware” database technology.
Intelligent Memory is the latest addition to the temperature-aware features of the Teradata Database. It automatically and transparently keeps the “hottest”, most frequently accessed data in memory for the fastest possible query performance, at a fraction of the cost of an in-memory database. Teradata Intelligent Memory combines RAM and disk to deliver high performance on big data without requiring exclusively in-memory operation.
How is Teradata memory “Intelligent”?
The Teradata Database continuously tracks the “temperature” of all its data. The most frequently accessed data is identified inside the database on a “hot” data list. Whenever data on the very hot list is fetched from disk for query processing and analysis, Teradata Intelligent Memory automatically places it in its extended memory area. When another query needs this data, the database automatically looks in Intelligent Memory first. This eliminates the need to access a physical disk, where I/O is up to 3,000 times slower. Because this placement is driven by the temperature-aware feature rather than by manual tuning, the memory is called “Intelligent”. – Automatic Intelligence
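The behavior described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Teradata's implementation: the class name `TemperatureAwareStore`, the use of raw access counts as "temperature", and the fixed-size hot list are all hypothetical.

```python
from collections import Counter


class TemperatureAwareStore:
    """Toy model: count accesses, keep blocks on the hot list in memory."""

    def __init__(self, disk, hot_list_size):
        self.disk = disk                    # block_id -> data (stand-in for physical disk)
        self.hot_list_size = hot_list_size  # how many blocks count as "very hot"
        self.access_counts = Counter()      # the "temperature" of each block
        self.memory = {}                    # stand-in for the extended memory area
        self.disk_reads = 0

    def hot_list(self):
        return {block for block, _ in self.access_counts.most_common(self.hot_list_size)}

    def read(self, block_id):
        self.access_counts[block_id] += 1
        if block_id in self.memory:
            return self.memory[block_id]    # served from memory, no disk I/O
        data = self.disk[block_id]
        self.disk_reads += 1
        hot = self.hot_list()
        if block_id in hot:
            self.memory[block_id] = data    # hot data fetched from disk is kept
        for cached in list(self.memory):
            if cached not in hot:
                del self.memory[cached]     # blocks that cooled off are dropped
        return data
```

In this model the hot list is only re-evaluated when a read goes to disk, which keeps the sketch short; a real system would age and rank data continuously.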
I/O to fetch data from disk and CPU processing cycles are the two major system resources whose constraints most often limit system performance. Keeping the most frequently used data in memory eliminates much of that I/O, so performance improves. – Relief from I/O Constraints
The Teradata Database file system (TDFS) knows which data is available in memory and automatically uses that in-memory copy, just as it would use data out of cache instead of going to disk. – Make the Most of Memory
Teradata Intelligent Memory builds on this in an innovative way to achieve high database performance without the cost of buying enough memory to hold the entire database. By keeping a copy of the most used data in memory, physical I/O to solid-state disk (SSD) and hard disk drive (HDD) can be reduced dramatically, helping the system run at the speed of business. Ensuring that “hot” data can be accessed quickly improves both query and system performance, which in turn gives business leaders more timely insights to improve decision making. – Rapid Access Equals Better Business Decisions
Today it is possible to put a terabyte or more of memory onto one server, if we are willing to pay the price. For big data analytics, this means we would need tens or even hundreds of servers to hold anything that deserves the name big data entirely in memory, and we would need software that could integrate those resources well.
That is why many companies see in-memory technology mainly as a way to pin OLTP databases in memory and run business intelligence queries off the same database. We can mirror the servers involved and get a huge rise in speed. That may really be the best use of in-memory, or intelligent memory, technology.
Data analysis transactions are not simple. They vary significantly according to the goal and the behavior of the data being analyzed, so we cannot model them the way we model OLTP transactions. But we know a couple of things for sure. First, transactions will be faster if the most frequently accessed data is held in memory and only has to be read from disk once. Second, they will be faster if we employ as much parallelism as possible. With Teradata we have both: parallelism and in-memory access.
And this means that in-memory technology and big data, whether they like it or not, will really play nicely together.
How Does Cache Benefit Teradata Intelligent Memory?
Teradata employs a caching approach as well, which it calls the FSG (File Segment) cache. As with most cache techniques, data is kept there for short periods, seconds or at most minutes. Teradata is smart enough to make sure that data kept in the FSG cache is not also placed in Intelligent Memory, and vice versa.
TIM complements the cache by focusing on long-term data usage. The extended memory area works alongside the existing File Segment cache and holds the data whose access intensity stays high across queries over time, which keeps those queries consistently fast. – Cache Partner
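The division of labor between a short-lived cache and a long-lived hot area can be sketched as follows. This is a hedged illustration: `TwoLevelMemory` is a hypothetical class, the LRU policy stands in for the FSG cache, and the set of very hot block IDs is assumed to be given rather than computed.

```python
from collections import OrderedDict


class TwoLevelMemory:
    """Toy model: a block lives in the LRU cache or the long-lived area, never both."""

    def __init__(self, cache_capacity, hot_blocks):
        self.cache_capacity = cache_capacity
        self.cache = OrderedDict()          # short-lived LRU, stand-in for the FSG cache
        self.hot_blocks = set(hot_blocks)   # IDs considered very hot (assumed known)
        self.long_lived = {}                # stand-in for Intelligent Memory

    def read(self, block_id, load_from_disk):
        if block_id in self.long_lived:
            return self.long_lived[block_id], "intelligent_memory"
        if block_id in self.cache:
            self.cache.move_to_end(block_id)  # mark as most recently used
            return self.cache[block_id], "cache"
        data = load_from_disk(block_id)
        if block_id in self.hot_blocks:
            self.long_lived[block_id] = data  # hot data bypasses the short-lived cache
        else:
            self.cache[block_id] = data
            if len(self.cache) > self.cache_capacity:
                self.cache.popitem(last=False)  # evict the least recently used block
        return data, "disk"
```

Hot blocks stay resident regardless of how much other traffic passes through the cache, while everything else competes for the small LRU space.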
How Does Teradata Categorize Multi-temperature Data?
In Teradata, the frequency at which data is accessed for read/write operations is often described as its “heat intensity” or “temperature”. Analyzing and managing data by its temperature can open up opportunities to provide value across the entire enterprise. Data is therefore categorized by how frequently it is accessed:
|Data Temp|Definition of usage|Business examples|
|---|---|---|
|White Hot|Continuous access, often with expected spikes of repeated access|Live campaign; data repeatedly queried and accessed|
|Hot|Frequently accessed data|Live campaign in its early phase; sales figures or trends analyzed through reports|
|Warm|Data accessed less frequently and usually with less urgency|Campaign changed or closed within a month or two|
|Cold|Historical data, usually seen in data mining and analysis activities|Campaign ended, reports completed; data now accessed in yearly reviews|
|Dormant|Data that has not been touched for a considerable period of time, or at all|Data archived; long-finished campaign|
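As a rough illustration of how such categories could be assigned, the sketch below maps access statistics to the five temperatures in the table. The thresholds (accesses per day, days since last access) and the `classify` function are hypothetical values picked for the example; the article does not state the actual cut-offs Teradata uses.

```python
from enum import Enum


class Temperature(Enum):
    WHITE_HOT = "White Hot"
    HOT = "Hot"
    WARM = "Warm"
    COLD = "Cold"
    DORMANT = "Dormant"


def classify(accesses_per_day, days_since_last_access):
    """Assign a temperature from access statistics (illustrative thresholds only)."""
    if days_since_last_access > 365:
        return Temperature.DORMANT      # untouched for a considerable period
    if accesses_per_day >= 1000:
        return Temperature.WHITE_HOT    # continuous, repeated access
    if accesses_per_day >= 100:
        return Temperature.HOT          # frequently accessed
    if accesses_per_day >= 1:
        return Temperature.WARM         # less frequent, less urgent
    return Temperature.COLD             # historical, occasional reviews
```

For example, a live campaign table queried thousands of times a day would be classified as white hot, while the same table a year after the campaign ended would fall to cold or dormant.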
What happens to the rest of Memory after Teradata Intelligent Memory?
Once the “hot” data has been taken care of, what remains is the “cold” data. That cold data is handled by Teradata Virtual Storage, which compresses it.
And then comes the Compress on Cold feature:
The Teradata Database has improved its use of hybrid storage to achieve more intelligent multi-temperature data management, where “hot” data is the most frequently used and “cold” data is the least used or dormant. It is described as the industry’s only intelligent virtual storage solution that automatically migrates and compresses, or decompresses, data between drive types to achieve optimum performance and storage utilization. This keeps data from turning into dead data. – Block Level Compression (BLC)
The Teradata Database manages data intelligently to maximize performance while optimizing the return on system resources. It automatically compresses the coldest, least frequently used data on the system to save disk space, leaving the freed space available for other purposes. Keeping frequently used data in its natural uncompressed format maximizes performance by avoiding repeated decompression. Automatically compressing the less frequently used data lets the most used data be stored at the most effective cost. No DBA intervention is required with Teradata’s automated, self-managing design.
With big data analytics becoming widespread, there is a critical need for a database smart enough to judge dynamically how “hot” or “cold” the data is for the entire enterprise. The hotter, more popular data needs to be located on the fastest storage devices, while less active, cooler data can be pushed onto slower media. Cold data is compressed by up to five times to minimize storage cost. Teradata Virtual Storage extends this intelligent management by automatically decompressing once-cold data and relocating it onto faster storage as demand for the data heats up. For example, the Teradata Database will recognize when monthly year-over-year data should be cycled in or out of archival media as needed, without laborious database administration.
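The compress-on-cold idea can be sketched with Python's standard `zlib` module. This is only a model of the policy: the `Block` class, the `apply_temperature_policy` function, and the access-count threshold are hypothetical, and `zlib` stands in for whatever block-level compression Teradata actually applies.

```python
import zlib


class Block:
    """Toy data block that can be stored compressed (cold) or uncompressed (hot)."""

    def __init__(self, payload: bytes):
        self._data = payload
        self.compressed = False

    def cool_down(self):
        if not self.compressed:
            self._data = zlib.compress(self._data)
            self.compressed = True

    def heat_up(self):
        if self.compressed:
            self._data = zlib.decompress(self._data)
            self.compressed = False

    def read(self) -> bytes:
        # Cold blocks pay a decompression cost on every read.
        return zlib.decompress(self._data) if self.compressed else self._data

    def stored_size(self) -> int:
        return len(self._data)


def apply_temperature_policy(blocks, access_counts, cold_threshold):
    """Compress blocks accessed fewer than cold_threshold times; decompress the rest."""
    for block_id, block in blocks.items():
        if access_counts.get(block_id, 0) < cold_threshold:
            block.cool_down()
        else:
            block.heat_up()
```

Running the policy again after access counts change models data "heating up": a block that was compressed while cold is stored uncompressed once it is in demand again, so frequent reads no longer pay for decompression.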
Benefits of TIM (Teradata Intelligent Memory)
TIM uses sophisticated algorithms that age, track, and rank data for effective data management and support of user queries. Inside TIM, data can be stored and compressed in both columns and rows, which increases the amount of data that fits in the memory space. TIM places only the very hottest data in the new extended memory area. Organizations benefit by being able to satisfy the vast majority of their queries rapidly from system memory using the most current data, which also yields a better financial return on investment (ROI).
With Virtual Storage, Teradata storage can be expanded at a lower cost. Storage can now be added to a clique on its own, instead of adding both nodes and storage. The ability to mix drive sizes in each clique makes it practical to keep large volumes of archived or other cold data within the Enterprise Data Warehouse (EDW). This extends the EDW to deep-history data analysis and so improves its ROI.
The configuration flexibility of Virtual Storage allows storage in a clique to be expanded in a broad range of size increments, since the restrictions on drives per AMP are eliminated. Expansion is achieved by adding the desired number of drives and performing a restart. The system reconfiguration process is not needed, because the AMP count typically does not change: after new storage is added to a system based on Teradata Virtual Storage (TVS), only a system restart, which takes just a few minutes, is required.
When is TIM not appropriate?
This release of Intelligent Memory does not suit every Teradata solution and is not appropriate for some systems. Many Enterprise Data Warehouse systems that are focused mainly or solely on operational or Active EDW workloads contain mostly hot data. As data ages and cools, it is moved to archive because it no longer offers apparent business value. There is no benefit from Intelligent Memory in this case if the cold data is simply no longer used, since the feature works best for “hot” data, not “dormant” data.
The fat memory is available on the 670 Data Mart Appliance, the 2700 Data Warehouse Appliance, and on the 6700 Active Enterprise Data Warehouse, and will soon be available on the flash-accelerated Extreme Data Appliance.
The future possibilities of TIM:
Although it is not possible to say at this time when or how TIM will be enhanced beyond the current initial release, there are many possibilities for the future. One way to expand the capability of TIM is a greater sharing of data across more disks.
Teradata’s latest in-memory architecture is integrated with its management of data heat intensity. This is significant because the hottest data moves automatically to the in-memory layer, Teradata Intelligent Memory; the next hottest data moves automatically to solid-state disk; and so on. Teradata also provides the column storage and data compression that amplify the value of data in memory. The customer sees increased performance without having to decide which data is placed in memory.
Teradata’s temperature-aware feature accelerates warehouse query performance and increases the value of system space by ensuring that the most frequently used data is kept in memory. It is Teradata’s new approach to multi-temperature data management. Together with Teradata Virtual Storage, it enables flexible configurations of mixed drive capacities within one system and clique, and cost-effective, simple expansion of storage without having to add further Teradata nodes. Specifically, disks of different sizes and types can be mixed in a disk array, and different disk array models can be mixed in a clique. This allows the system to reuse old disks in a new configuration, or to mix and match larger, lower-performance disks with smaller, faster storage.
The idea behind the new Intelligent Memory feature is to make fine adjustments to the underlying Teradata database so that it keeps the very hot data in main memory. With main memory access being on the order of 3,000 times faster than going out to disk drives on a server node running the Teradata parallel database, it makes sense to spend a bit on main memory.
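A back-of-the-envelope calculation shows why the share of reads served from memory matters as much as the raw speed gap. The sketch below uses a simple weighted-average model of access time, which is an assumption of this example; the function names are hypothetical, and the factor of 3,000 is the figure quoted above.

```python
def effective_access_time(hit_ratio, memory_time, disk_time):
    """Average time per access when hit_ratio of reads are served from memory."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * memory_time + (1.0 - hit_ratio) * disk_time


def speedup(hit_ratio, slowdown_factor=3000.0):
    """Speedup over an all-disk system, with memory access time normalized to 1."""
    memory_time = 1.0
    disk_time = slowdown_factor  # disk assumed slowdown_factor times slower than memory
    return disk_time / effective_access_time(hit_ratio, memory_time, disk_time)
```

Under this model, serving 90% of reads from memory gives roughly a 10x speedup rather than 3,000x, because the remaining 10% of reads that still go to disk dominate the average. That is why it pays to make sure the hottest data, which accounts for most accesses, is the data held in memory.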
The Intelligent Memory process automatically and transparently places data on storage according to its thermal characteristics: hot, warm, or cold. It makes good use of large-capacity drives for cold or dormant data. Data placement is optimized automatically and transparently by moving the most frequently accessed data (“hot” data) to faster storage, while moving rarely accessed data (“cold” data) to slower storage units or shared disks.
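Temperature-based placement can be pictured as ranking blocks by access frequency and filling the fastest tiers first. The sketch below is a simplified model under assumptions: `place_by_temperature` is a hypothetical function, tiers are described only by a name and a block capacity, and the slowest tier is assumed to absorb whatever does not fit elsewhere.

```python
def place_by_temperature(access_counts, tiers):
    """Assign blocks to storage tiers, hottest blocks to the fastest tiers.

    access_counts: dict of block_id -> number of accesses
    tiers: list of (tier_name, capacity_in_blocks), fastest first;
           the last tier takes all remaining blocks regardless of capacity.
    """
    # Rank hottest first; ties are broken by block ID for a stable result.
    ranked = sorted(access_counts, key=lambda block: (-access_counts[block], block))
    placement = {}
    position = 0
    for index, (tier_name, capacity) in enumerate(tiers):
        is_last = index == len(tiers) - 1
        chunk = ranked[position:] if is_last else ranked[position:position + capacity]
        for block_id in chunk:
            placement[block_id] = tier_name
        position += len(chunk)
    return placement
```

Re-running the function with updated access counts models data migrating between tiers as it heats up or cools down.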
With Intelligent Memory, Teradata introduces the highest-performing IDW (Integrated Data Warehouse) as part of the UDA (Unified Data Architecture). Intelligent Memory makes the most of main memory to provide the highest query performance without the cost of in-memory databases. It provides the best of both worlds: it keeps the frequently used, current, “hot” data in memory to achieve high query and system performance, without the need to restrict available data to what will fit in the available memory.