How to track data lineage with SQL Server Graph Tables – Part 3 Populate Graph Tables

In this 3rd article I will show you how to use the procedures that we created in the previous post to populate the graph tables with useful data lineage information.

Data Flow

We will simulate the flow of data from two source systems (Finance and HR) into a data warehouse/business intelligence system as illustrated in the diagram below.

Staging

Most modern BI systems include a staging area, also sometimes referred to as a landing area. Data is collected from various disparate systems and loaded into the staging database with very minimal transformations, typically only data type conversions. The goal is to quickly grab the data from the different source systems and load it into the staging area for further processing so that any performance impact to the source systems is minimized.

To capture the data lineage for this first hop from source system to staging we can use our procedure as follows.

-- JDE to Staging
EXEC	[dbo].[Create_Data_Lineage]
		
		--System level
		@SourceSystem = 'Finance',
		@TargetSystem = 'Business Intelligence System',

		--Database level 
		@SourceDatabase = 'JDE',
		@TargetDatabase = 'Staging',
		
		--Table Level
		@SourceTable = 'F0411',
		@TargetTable = 'F0411',

		--Column Level
		@SourceColumnList = 'RPKCO, RPDOC, RPDCT, RPSFX, RPSFXE, RPDCTA, RPAN8, RPPYE, RPSNTO, RPDIVJ',
		@TargetColumnList = 'RPKCO, RPDOC, RPDCT, RPSFX, RPSFXE, RPDCTA, RPAN8, RPPYE, RPSNTO, RPDIVJ',

		@ProcessName = Load F0411';
GO



-- HR to Staging 
EXEC	[dbo].[Create_Data_Lineage]
		
		--System level
		@SourceSystem = 'HR',
		@TargetSystem = 'Business Intelligence System',

		--Database level 
		@SourceDatabase = 'Payroll',
		@TargetDatabase = 'Staging',
		
		--Table Level
		@SourceTable = 'EMP',
		@TargetTable = 'EMP',

		--Column Level
		@SourceColumnList = 'EMPID, HIREDT, JOBTITLE',
		@TargetColumnList = 'EMPID, HIREDT, JOBTITLE',

		@ProcessName = 'Load EMP';
GO 

As you can see from the code above it is very easy to map an input table to an output table using the procedures we created in the previous steps.

EDW

Next, we will capture the data lineage from Staging to the Enterprise Data Warehouse (EDW)

-- Staging to EDW for Finance tables

EXEC	[dbo].[Create_Data_Lineage]
		
		--System level
		@SourceSystem = 'Business Intelligence System',
		@TargetSystem = 'Business Intelligence System',

		--Database level 
		@SourceDatabase = 'Staging',
		@TargetDatabase = 'EDW',
		
		--Table Level
		@SourceTable = 'F0411',
		@TargetTable = 'Fact Accounts Payable',

		--Column Level
		@SourceColumnList = 'RPDOC, RPDCT, RPSFX, RPSFXE, RPDCTA, RPAN8, RPPYE, RPSNTO, RPDIVJ, RPDIVJ',
		@TargetColumnList = 'Document, Document Type, Document Pay Item, Pay Item Extension Number, Adjusting Document Type, Address Number, Payee Address Number, 	Approvers Address Number, Invoice Date - Julian, Invoice Date - Gregorian',

		@ProcessName = 'Load Fact Accounts Payable';
GO



-- Staging to EDW for HR tables
EXEC	[dbo].[Create_Data_Lineage]
		
		--System level
		@SourceSystem = 'Business Intelligence System',
		@TargetSystem = 'Business Intelligence System',

		--Database level 
		@SourceDatabase = 'Staging',
		@TargetDatabase = 'EDW',
		
		--Table Level
		@SourceTable = 'EMP',
		@TargetTable = 'Dim Employee',

		--Column Level
		@SourceColumnList = 'EMPID, HIREDT, HIREDT, JOBTITLE',
		@TargetColumnList = 'Employee Number, Hire Date, Tenure, Job Title',

		@ProcessName = 'Load Dim Employee';
GO

If you have a keen eye you might have noticed that we are mapping some of the fields twice. The field RPDIVJ in the JDE table F0411 is mapped to both Invoice Date – Julian and Invoice Date – Gregorian. The field HIREDT in the HR EMP table is mapped to both the Hire Date and the Tenure columns. In both scenarios this implies some transformation logic between these columns.

Accounts Payable Model

The final step in our example is the process that loads the SQL Server Analysis Services (SSAS) model.

-- EDW to SSAS Finance
EXEC	[dbo].[Create_Data_Lineage]
		
		--System level
		@SourceSystem = 'Business Intelligence System',
		@TargetSystem = 'Business Intelligence System',

		--Database level 
		@SourceDatabase = 'EDW',
		@TargetDatabase = 'SSAS Accounts Payable Model',
		
		--Table Level
		@SourceTable = 'Fact Accounts Payable',
		@TargetTable = 'Fact Accounts Payable',

		--Column Level
		@SourceColumnList = 'Document, Document Type, Document Pay Item, Pay Item Extension Number, Adjusting Document Type, Address Number,  Invoice Date - Gregorian',
		@TargetColumnList = 'Document, Document Type, Document Pay Item, Pay Item Extension Number, Adjusting Document Type, Address Number,  Invoice Date',

		@ProcessName = 'Process SSAS Accounts Payable Model'



-- EDW to SSAS HR
EXEC	[dbo].[Create_Data_Lineage]
		
		--System level
		@SourceSystem = 'Business Intelligence System',
		@TargetSystem = 'Business Intelligence System',

		--Database level 
		@SourceDatabase = 'EDW',
		@TargetDatabase = 'SSAS Accounts Payable Model',
		
		--Table Level
		@SourceTable = 'Dim Employee',
		@TargetTable = 'Dim Employee',

		--Column Level
		@SourceColumnList = 'Employee Number, Hire Date, Tenure,  Job Title',
		@TargetColumnList = 'Employee Number, Hire Date, Tenure,  Job Title',

		@ProcessName = 'Process SSAS Accounts Payable Model'


GO

As you can see in the code above the Invoice Date – Gregorian from the Finance system is used to populate the Invoice Date and the Invoice Date – Julian did not make it into the SSAS cube.

Summary

The database procedures make it really easy to capture data lineage, they could even be enhanced to accept and parse JSON or XML inputs so that data lineage could be dropped in and maintained using a web application….can someone say Power App

In the next article I will cover how to query the graph tables as well as visualize the data in order to answer data lineage questions.

Hopefully you have found this to be another practical post.

Until next time.

Anthony

2 thoughts on “How to track data lineage with SQL Server Graph Tables – Part 3 Populate Graph Tables

  1. Hi,
    What seems to be missing in your data lineage solution is a mechanism to collect information for source-to-target transformations, for instance by parsing executed SQL-statements.
    Or am I missing something?
    Regards,
    RN

    Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s