In this 3rd article I will show you how to use the procedures that we created in the previous post to populate the graph tables with useful data lineage information.
Data Flow
We will simulate the flow of data from two source systems (Finance and HR) into a data warehouse/business intelligence system as illustrated in the diagram below.

Staging
Most modern BI systems include a staging area, also sometimes referred to as a landing area. Data is collected from various disparate systems and loaded into the staging database with very minimal transformations, typically only data type conversions. The goal is to quickly grab the data from the different source systems and load it into the staging area for further processing so that any performance impact to the source systems is minimized.
To capture the data lineage for this first hop from source system to staging we can use our procedure as follows.
-- JDE to Staging
EXEC [dbo].[Create_Data_Lineage]
--System level
@SourceSystem = 'Finance',
@TargetSystem = 'Business Intelligence System',
--Database level
@SourceDatabase = 'JDE',
@TargetDatabase = 'Staging',
--Table Level
@SourceTable = 'F0411',
@TargetTable = 'F0411',
--Column Level
@SourceColumnList = 'RPKCO, RPDOC, RPDCT, RPSFX, RPSFXE, RPDCTA, RPAN8, RPPYE, RPSNTO, RPDIVJ',
@TargetColumnList = 'RPKCO, RPDOC, RPDCT, RPSFX, RPSFXE, RPDCTA, RPAN8, RPPYE, RPSNTO, RPDIVJ',
@ProcessName = Load F0411';
GO
-- HR to Staging
EXEC [dbo].[Create_Data_Lineage]
--System level
@SourceSystem = 'HR',
@TargetSystem = 'Business Intelligence System',
--Database level
@SourceDatabase = 'Payroll',
@TargetDatabase = 'Staging',
--Table Level
@SourceTable = 'EMP',
@TargetTable = 'EMP',
--Column Level
@SourceColumnList = 'EMPID, HIREDT, JOBTITLE',
@TargetColumnList = 'EMPID, HIREDT, JOBTITLE',
@ProcessName = 'Load EMP';
GO
As you can see from the code above it is very easy to map an input table to an output table using the procedures we created in the previous steps.
EDW
Next, we will capture the data lineage from Staging to the Enterprise Data Warehouse (EDW)
-- Staging to EDW for Finance tables
EXEC [dbo].[Create_Data_Lineage]
--System level
@SourceSystem = 'Business Intelligence System',
@TargetSystem = 'Business Intelligence System',
--Database level
@SourceDatabase = 'Staging',
@TargetDatabase = 'EDW',
--Table Level
@SourceTable = 'F0411',
@TargetTable = 'Fact Accounts Payable',
--Column Level
@SourceColumnList = 'RPDOC, RPDCT, RPSFX, RPSFXE, RPDCTA, RPAN8, RPPYE, RPSNTO, RPDIVJ, RPDIVJ',
@TargetColumnList = 'Document, Document Type, Document Pay Item, Pay Item Extension Number, Adjusting Document Type, Address Number, Payee Address Number, Approvers Address Number, Invoice Date - Julian, Invoice Date - Gregorian',
@ProcessName = 'Load Fact Accounts Payable';
GO
-- Staging to EDW for HR tables
EXEC [dbo].[Create_Data_Lineage]
--System level
@SourceSystem = 'Business Intelligence System',
@TargetSystem = 'Business Intelligence System',
--Database level
@SourceDatabase = 'Staging',
@TargetDatabase = 'EDW',
--Table Level
@SourceTable = 'EMP',
@TargetTable = 'Dim Employee',
--Column Level
@SourceColumnList = 'EMPID, HIREDT, HIREDT, JOBTITLE',
@TargetColumnList = 'Employee Number, Hire Date, Tenure, Job Title',
@ProcessName = 'Load Dim Employee';
GO
If you have a keen eye you might have noticed that we are mapping some of the fields twice. The field RPDIVJ in the JDE table F0411 is mapped to both Invoice Date – Julian and Invoice Date – Gregorian. The field HIREDT in the HR EMP table is mapped to both the Hire Date and the Tenure columns. In both scenarios this implies some transformation logic between these columns.
Accounts Payable Model
The final step in our example is the process that loads the SQL Server Analysis Services (SSAS) model.
-- EDW to SSAS Finance
EXEC [dbo].[Create_Data_Lineage]
--System level
@SourceSystem = 'Business Intelligence System',
@TargetSystem = 'Business Intelligence System',
--Database level
@SourceDatabase = 'EDW',
@TargetDatabase = 'SSAS Accounts Payable Model',
--Table Level
@SourceTable = 'Fact Accounts Payable',
@TargetTable = 'Fact Accounts Payable',
--Column Level
@SourceColumnList = 'Document, Document Type, Document Pay Item, Pay Item Extension Number, Adjusting Document Type, Address Number, Invoice Date - Gregorian',
@TargetColumnList = 'Document, Document Type, Document Pay Item, Pay Item Extension Number, Adjusting Document Type, Address Number, Invoice Date',
@ProcessName = 'Process SSAS Accounts Payable Model'
-- EDW to SSAS HR
EXEC [dbo].[Create_Data_Lineage]
--System level
@SourceSystem = 'Business Intelligence System',
@TargetSystem = 'Business Intelligence System',
--Database level
@SourceDatabase = 'EDW',
@TargetDatabase = 'SSAS Accounts Payable Model',
--Table Level
@SourceTable = 'Dim Employee',
@TargetTable = 'Dim Employee',
--Column Level
@SourceColumnList = 'Employee Number, Hire Date, Tenure, Job Title',
@TargetColumnList = 'Employee Number, Hire Date, Tenure, Job Title',
@ProcessName = 'Process SSAS Accounts Payable Model'
GO
As you can see in the code above the Invoice Date – Gregorian from the Finance system is used to populate the Invoice Date and the Invoice Date – Julian did not make it into the SSAS cube.
Summary
The database procedures make it really easy to capture data lineage, they could even be enhanced to accept and parse JSON or XML inputs so that data lineage could be dropped in and maintained using a web application….can someone say Power App
In the next article I will cover how to query the graph tables as well as visualize the data in order to answer data lineage questions.
Hopefully you have found this to be another practical post.
Until next time.
Anthony