Post3

Different Application of Joins in SQL

Posted by

Structurally organized with large quantities of information, databases are the backbone of every organization. That is why companies invest in database applications such as MySQL, PostgreSQL, Microsoft SQL Server and Oracle to store, retrieve, and manage their data. In addition to minimizing paper work, DBMS makes it easy to fetch historical data.

The task of retrieving data is made simple by a range of clauses and commands in SQL. Join clause is arguably the most used clause to retrieve data from the piles of record stored in numerous tables as it helps combine fields from multiple tables by using values common to each. Regardless of how difficult the scenario is to fetch the data, programmers do find a way to get the desired result, from combining different joins in a single SQL command to creating temporary tables to using variables.

That said, there always comes a situation when conventional procedures fail to deliver result. In such situations, programmers have to try different approaches. In this blog post, we are discussing one such scenario.

The Scenario

Imagine a situation in which fetching columns will be decided on a key column value, with source table having multiple records for a day but the target table should have one record per day.

Source Table

CUSTOMER_

ID

CHAR_

ID

EFF_DATE STATUS IND Need to do
1 15 1-Aug-17 A ? Combine (status and IND values) that are coming on a day
1 7 1-Aug-17 ? Y
1 15 2-Aug-17 C ? Get IND value from previous date
1 7 3-Aug-17 ? N Get Status value from previous date
1 15 4-Aug-17 D ? Combine them as these are on same day
1 7 4-Aug-17 ? Y

 

Target Table

CUSTOMER_ID EFF_DATE STATUS IND
1 1-Aug-17 A Y
1 2-Aug-17 C Y
1 3-Aug-17 C N
1 4-Aug-17 D Y

 

Limitations

We have one source and one target table. The source table has jumbled data and we cannot change its structure (DDL) as it is already in use.

Problem Statement

  • For a day, Status column will be populated in target table from row having CHAR_ID column value 15 only.
  • If on that day CHAR_ID=15 record is not present then status will be taken from previous day CHAR_ID=15 record.
  • For a day, IND will be populated in target table from row having CHAR_ID column value 7 only.
  • If on that day CHAR_ID=7 is not present then IND will be taken from previous day IND =7 record.
  • In case on a day, we have both values for CHAR_ID=15 and CHAR=7, values of columns STATUS and IND will be merged into one.

Solution

Firstly, we divide the data in two different sets of sources based on values of column CHAR_ID

Source 1

CUSTOMER_ID CHAR_ID EFF_DATE STATUS
1 15 1-Aug-17 A
1 15 2-Aug-17 C
1 15 4-Aug-17 D

 

Source 2

CUSTOMER_ID CHAR_ID EFF_DATE IND
1 7 1-Aug-17 Y
1 7 3-Aug-17 N
1 7 4-Aug-17 Y

 

Now to fetch all records from source 1, we join both the data sets based on CUSTOMER_ID and EFF_DATE on below condition:

  • Getting 1 record from source 1 table for an effective date and then joining it with the complete source 2
  • Source 1 effective date should be greater than or equal to source 2 effective date
  • This action will result in multiple records, thus, we will need to filter the records. So, the record with a higher source 2 effective date against a source 1 effective date will be picked.

SELECT * FROM (

SELECT S1.CUSTOMER_ID, S1.EFF_DATE, S1.STATUS, S2.EFF_DATE AS EFF_DATE1, S2.IND,

ROW_NUMBER() OVER(PARTITION BY S1.CUSTOMER_ID, S1.EFF_DATE ORDER BY S2.EFF_DATE DESC) AS RW

FROM SOURCE1 S1

LEFT JOIN SOURCE2 S2

ON S1.CUSTOMER_ID= S2.CUSTOMER_ID

AND S1.EFF_DATE>= S2.EFF_DATE

) A

WHERE RW=1;

Above query will internally process data in two steps as depicted below and will provide result as Step 2:

Step 1

CUSTOMER_ID CHAR_ID S1.EFF_DATE STATUS S2.EFF_DATE1 IND
1 15 1-Aug-17 A 1-Aug-17 Y
1 15 2-Aug-17 C 1-Aug-17 Y
1 15 4-Aug-17 D 1-Aug-17 Y
1 15 4-Aug-17 D 3-Aug-17 N
1 15 4-Aug-17 D 4-Aug-17 Y

 

After filtration, we will get one record per day basis.

Step 2

CUSTOMER_ID CHAR_ID S1.EFF_DATE STATUS S2.EFF_DATE1 IND
1 15 1-Aug-17 A 1-Aug-17 Y
1 15 2-Aug-17 C 1-Aug-17 Y
1 15 4-Aug-17 D 4-Aug-17 Y

 

Now again we join these two columns CUSTOMER_ID and EFF_DATE but this time for CHAR_ID = 7 on below condition:

  • Getting 1 record from source 2 table for an effective date and joining with complete source 1
  • Source 2 effective date should be greater than or equal to source 1 effective date.
  • This action will again result in multiple records, thus, we will need to filter the records, but this time the record with a higher source 1 effective date against a source 2 effective date will be picked.

SELECT * FROM (

SELECT S2.CUSTOMER_ID, S2.EFF_DATE, S1.STATUS, S1.EFF_DATE AS EFF_DATE1, S2.IND,

ROW_NUMBER() OVER(PARTITION BY S2.CUSTOMER_ID, S2.EFF_DATE ORDER BY S1.EFF_DATE DESC) AS RW

FROM SOURCE1 S1

LEFT JOIN SOURCE2 S2

ON S1.CUSTOMER_ID= S2.CUSTOMER_ID

AND S2.EFF_DATE>= S1.EFF_DATE

) A

WHERE RW=1

The query will again internally process data in two steps and will provide result as in Step 2:

Step 1

CUSTOMER_ID CHAR_ID S2.EFF_DATE IND S1.EFF_DATE1 STATUS
1 7 1-Aug-17 Y 1-Aug-17 A
1 7 3-Aug-17 N 1-Aug-17 A
1 7 3-Aug-17 N 2-Aug-17 C
1 7 4-Aug-17 Y 1-Aug-17 A
1 7 4-Aug-17 Y 2-Aug-17 C
1 7 4-Aug-17 Y 4-Aug-17 D

 

After filtration, we will get one record per day basis.

Step 2

CUSTOMER_ID CHAR_ID S2.EFF_DATE STATUS S1.EFF_ DATE1 IND
1 7 1-Aug-17 A 1-Aug-17 Y
1 7 3-Aug-17 C 2-Aug-17 N
1 7 4-Aug-17 D 4-Aug-17 Y

 

Thus, by running the two commands we get two result set as:

Source 1

CUSTOMER_ID CHAR_ID EFF_DATE STATUS EFF_DATE1 IND
1 15 1-Aug-17 A 1-Aug-17 Y
1 15 2-Aug-17 C 1-Aug-17 Y
1 15 4-Aug-17 D 4-Aug-17 Y

 

Source 2

CUSTOMER_ID CHAR_ID EFF_DATE STATUS EFF_DATE1 IND
1 7 1-Aug-17 A 1-Aug-17 Y
1 7 3-Aug-17 C 2-Aug-17 N
1 7 4-Aug-17 D 4-Aug-17 Y

 

Now, we’ll simply use the UNION command between above two result set and we will get the final desired result set with columns, CUSTOMER_ID, EFF_DATE, STATUS, and IND

Result:

CUSTOMER_ID EFF_DATE STATUS IND
1 1-Aug-17 A Y
1 2-Aug-17 C Y
1 3-Aug-17 C N
1 4-Aug-17 D Y

 

Through this scenario we witnessed a completely different application of joins in SQL. We first had to divide our source table into two data sets. Then apply left join on both the data sets one after the other and get two additional data sets. And finally use the union clause get the desired result. Although, the approach used to solve the problem is complicated, it helped us achieved the desired result, and is probably the only possible way out. Let us know, if you think otherwise through your comments, we would love to know alternatives.

That’s all for now, until next time!

Related Posts

  • Working with SQL Server Database on Microsoft Azure (Part 2)Working with SQL Server Database on Microsoft Azure (Part 2)

    The huge volume of data generated around us, today, requires optimum storage and management to have any benefit for organizations. With a host of advantages that MS Azure possesses, especially…

  • Working with SQL Server Database on Microsoft Azure (Part 1)Working with SQL Server Database on Microsoft Azure (Part 1)

    One of the trickiest implementation for any organization is the database. They require dedicated server and someone to manage it. Cloud services such as Microsoft Azure give organizations opportunity to…

  • Microsoft SQL Server Execution PlanMicrosoft SQL Server Execution Plan

    In a database environment, certain questions related to performance pop up repeatedly: why is my query running slowly? And why is SQL Server not using the index? etc.  As a…

  • SQL Server 2017 on Ubuntu PlatformSQL Server 2017 on Ubuntu Platform

    Unlike previous releases of SQL Server, this time Microsoft has also given us the choice of picking the operating system on which we wish to install the application. This means…

  • SQL Server – Reporting Services MigrationSQL Server – Reporting Services Migration

    A while back, I got a task in my project, which was to migrate SQL Server Reporting Services from SQL 2008 to SQL 2012. Now, I had two ways of…

  • Column Store Index in SQL Server 2012

    This post is about the new feature, i.e., Column Store Index which is available since SQL 2012 version. Microsoft has released column store index to improve the performance by 10x.…

Leave a Reply

Your email address will not be published. Required fields are marked *