How to track data lineage with SQL Server Graph Tables – Part 1 Create Nodes and Edges

Anthony Presents

Where did this data come from?
How can I trust this data?
What impact will changing this field have on other systems?

If you have ever heard or asked any of these questions, then this series of blog posts is for you.

As data volumes continue to grow, so too does the need to manage the data estate. One critical aspect of managing the data estate is understanding data ancestry. Thankfully, you can leverage SQL Server 2019 graph tables to track the lineage of one of your most valuable assets: your data.

In this series of blog posts, I will show you how you can use Graph Tables in SQL Server 2019 to capture and report on data lineage.

Prerequisites

  • SQL Server 2019 Developer Edition (You can download a free copy for development use from here)
  • SQL Server Management Studio 18 (You can download a…

View original post 679 more words

Extend your information reach without over stretching by virtualizing data using SQL Server 2019 and MongoDB – Part 1 MongoDB


Data virtualization enables a single access point to data without the need to physically copy it from its source into a central repository and/or data management system. This is particularly useful when dealing with “big data”, where the volume, variety, velocity or veracity of the data make it unwieldy to work with. Modern data storage ecosystems seamlessly integrate multiple technologies in order to deliver the best set of capabilities with the broadest reach into internal and external data sources. This is also referred to as polyglot persistence and is premised on the idea of using different data storage technologies to handle different data storage needs for a “best of breed” solution.

In this series of articles, I will show you how to use PolyBase in SQL Server 2019 to access data stored in MongoDB. This will allow you to leverage the flexibility and power of MongoDB’s schema-on-read design to store JSON documents, yet surface them in SQL Server as if they were tables inside a standard relational database management system. This is ideal for situations in which you need to seamlessly integrate structured and unstructured data.
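As a preview of where this series is heading, here is a rough sketch of what that looks like in T-SQL once PolyBase is configured. This is illustrative only; the data source, table and column definitions are assumptions based on the demo data we will create below, and the real setup is covered in the later parts of this series.

-- Illustrative sketch: point PolyBase at the MongoDB server built in this post.
CREATE EXTERNAL DATA SOURCE MongoDbSource
WITH (LOCATION = 'mongodb://<mongodb-vm-ip>:27017');

-- Expose a MongoDB collection (database.collection) as an external table.
CREATE EXTERNAL TABLE dbo.TestData
(
    [_id] NVARCHAR(24) NOT NULL,
    [x] INT
)
WITH (LOCATION = 'test.testData', DATA_SOURCE = MongoDbSource);

-- Query the collection as if it were an ordinary relational table.
SELECT * FROM dbo.TestData;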

In this article we will focus on setting up the Ubuntu Server in Azure, installing MongoDB and adding some data to it.

Overview

The diagram below provides a conceptual overview of the architecture for this tutorial.

As you can see in the image above, we will use two virtual machines (VMs) running in Azure. One VM will house MongoDB and the other SQL Server 2019. It is possible to put SQL Server 2019 and MongoDB on the same box; however, I set up the environment this way to approximate a typical setup for most organizations.

Prerequisites

The following items are required to follow this blog post.

  • An Azure subscription
  • PuTTY and PuTTYgen for SSH connectivity and key generation
  • Studio 3T (or a similar MongoDB client) to test remote connectivity

Spin up an Ubuntu Server in Azure

Microsoft has a pre-built VM in Azure that you can spin up with a couple of clicks. In Azure, search for Ubuntu and select Ubuntu Server 18.04 LTS. It should look similar to the screen clip below.

Fill in the instance details. I used a D2s v3 for my VM size.

For the Administrator Account section, you must use SSH public key.

To generate the SSH public key I used PuTTYgen, which comes with PuTTY. Move your mouse around the blank area to generate the key, then copy the public key text that the tool produces and paste it into the Administrator Account section in Azure.

Once you have generated the key, be sure to click Save private key because you will need this to connect to the VM and install MongoDB later on.
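If you are not on Windows, or simply prefer the command line, an OpenSSH key pair should work just as well as one from PuTTYgen; the file name below is only an example. Paste the contents of the .pub file into Azure and keep the private key for the connection step.

ssh-keygen -t rsa -b 2048 -f ~/.ssh/mongodb_vm
cat ~/.ssh/mongodb_vm.pub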

For the inbound port rules, be sure to allow SSH (port 22) so that you can connect to the VM.

For the Disks I just used Standard HDD.

I did not create any extensions.

Feel free to add tags at your own discretion; they are handy if you have a lot of resources to manage. Click Create to spin up the VM. Next, we will connect to the VM using PuTTY and install MongoDB.

Connect

We will use PuTTY to connect to the VM. Launch PuTTY and fill out the Host Name (or IP address) with the Public IP Address from Azure.

Next expand SSH and select Auth. Browse to the private key file that you saved when generating the RSA key using PuTTYgen.

Click Open and enter the user ID that you created when filling out the Administrator Account section of the VM creation screen.

Install MongoDB

Once connected, run the following commands to register the MongoDB repository and install MongoDB.

# Import the public key used to sign the MongoDB packages.
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 2930ADAE8CAF5059EE73BB4B58712A2291FA4AD5

# Add the MongoDB 3.6 repository to the apt sources list.
echo "deb [ arch=amd64,arm64 ] https://repo.mongodb.org/apt/ubuntu xenial/mongodb-org/3.6 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-3.6.list

# Refresh the package lists and install MongoDB.
sudo apt-get update
sudo apt-get install -y mongodb-org

This will refresh the package lists and install MongoDB. Next, fire up the MongoDB service by running the following command.

sudo service mongod start
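Before going any further, you can optionally confirm that the service is running.

sudo service mongod status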

If everything went smoothly, you should be able to open the MongoDB shell by running the following command.

mongo

This should give you a screen similar to the one below.

Create some data

With the mongo shell open, enter the following command to insert some data into a collection called testData.

for (var i = 1; i <= 150; i++) {
 db.testData.insert( { x : i } )
}
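If the loop ran correctly, counting the documents in the collection should return 150.

db.testData.count()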

Again, for the sake of conciseness I will not bother securing the database by creating user IDs and so on. I do recommend properly securing the database for anything other than demonstration purposes.

Type exit to leave the mongo shell. Next, we need to enable remote access to MongoDB.

Enable remote connectivity to MongoDB

WARNING: this is not recommended for a production environment and is only done for the sake of brevity in this blog post. To properly enable remote connectivity, refer to the security checklist from MongoDB.

MongoDB’s default IP binding only allows connections from the local host. In order to change this, we need to modify the mongod.conf file.

Run the following command.

sudo nano /etc/mongod.conf

This will launch nano, which allows you to modify the configuration file.

As you can see in the image above, we need to set bindIp: 0.0.0.0 and comment out or delete the existing entry. Be sure to save the changes when you exit the editor.
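After the edit, the net section of /etc/mongod.conf should look roughly like this (27017 is MongoDB’s default port):

# network interfaces
net:
  port: 27017
  bindIp: 0.0.0.0

Once the change is saved, it is also a good idea to restart MongoDB by running the following command.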

sudo service mongod restart

Exit the connection to the Ubuntu server. We now need to open a port on the VM to allow outside connections to it. To do this, we go back to the VM in Azure.

Add inbound security rule

From the Azure portal navigate to the VM and click on Networking.

Select Add inbound port rule and fill out the screen as follows.
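If you prefer scripting this step, the Azure CLI can open the MongoDB port as well; the resource group and VM names below are placeholders for your own values.

az vm open-port --resource-group myResourceGroup --name myMongoVM --port 27017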

Once this is done, we can now test connecting to the MongoDB database using Studio 3T or any other tool.

Studio 3T

To connect to the MongoDB database use the public IP that was specified in Azure and also fill out the SSH Tunnel information as follows.

Notice that you need to use the same private key as before. Test the connection and you should see successful results.

In the next post I will cover setting up SQL Server 2019.

Hopefully you have found this to be another practical post.

Until next time.

Anthony

Further References

https://docs.microsoft.com/en-us/azure/cosmos-db/connect-mongodb-account

https://docs.microsoft.com/en-us/azure/virtual-machines/windows/install-mongodb

https://tanzimsaqib.wordpress.com/2015/06/12/installing-and-securing-mongodb-on-windows-linux-on-azure/

http://timmyreilly.azurewebsites.net/running-mongo-on-ubuntu-virtual-machine-in-azure/

Instant insights, automation and action – Part 4 Register Power BI in Azure Active Directory

This is the fourth post in a series of articles in which I explain how to integrate Power BI, Power Apps, Flow, Azure Machine Learning and Dynamics 365 to rapidly build a functioning system which allows users to analyze, insert, automate and action data.

In the previous article I covered building the Power BI Report.

In this article I will cover how to enable data to be pushed into Power BI using Flow. This is a fast, no-code solution.


This is a one-time setup that is required in order to use the Power BI connector in MS Flow. If you do not do this step you will see an error screen in MS Flow like the screen clip below.


Prerequisites

In order to complete this tutorial, you will need permission to register applications in your Azure Active Directory tenant.


For more information on the Azure AD Tenant you can click the following link.

https://docs.microsoft.com/en-us/power-bi/developer/create-an-azure-active-directory-tenant

Power BI Development Center

Log onto the Power BI Development Center to enable API features and get the key needed to register the app in Azure.

Go to the following URL and sign in.

https://dev.powerbi.com/apps


Enter a meaningful name for your app. I called mine AnthonysPowerBIApp, but you can call yours whatever you would like. Choose Native for the Application Type and select Read all datasets and Read and write all datasets for the API Access.


Click on Register. A screen like the one below should pop up. Be sure to copy down the Application ID as this is needed to register the application in Azure.


Azure Portal

Next, log onto the Azure portal using the following URL https://portal.azure.com/#home

Once in the portal, navigate to the Azure Active Directory menu blade.


Next click on App registrations and select the app that we created using the Power BI Development Center.


You can change settings in the app if you wish to tailor it by clicking on Properties.

Now that the Power BI App has been registered in Azure Active Directory you can use it in various Microsoft cloud services such as Flow.


As you can see in the image above, I no longer get a permission error and I am able to select the workspace, dataset and table.


In the next post we will build out the flow so that data is passed from the Power App to an Azure Machine Learning experiment for scoring and then into the Power BI API Enabled Dataset for real-time analytics.

Hopefully you have found this to be another practical post.

Until next time

Anthony

References

Here is the official documentation from Microsoft on how to register Power BI to push data into it using REST API calls.

https://docs.microsoft.com/en-us/power-bi/developer/overview-of-power-bi-rest-api


Instant insights, automation and action – Part 3 Create the Power BI Report

This is the third post in a series of articles in which I explain how to integrate Power BI, Power Apps, Flow, Azure Machine Learning and Dynamics 365 to rapidly build a functioning system which allows users to analyze, insert, automate and action data.

In the previous article I covered building the Power App. In this article I will cover the Power BI report.

We will build out this system in the following order: Power App, Azure Machine Learning, Power BI and, last, MS Flow to connect the components. Before you can begin this tutorial, there are some prerequisites.

Prerequisites

  • Power Apps
  • MS Flow
  • Power BI Pro or Premium
  • Access to Azure Active Directory to register Power BI App
  • Dynamics 365

Create the API Enabled Dataset

Log onto Power BI and create a new app workspace called Customer Segmentation. This step is not required; however, if you are like me you create a lot of different content, so it’s a good habit to get into so that you can better manage your work.


In case you are wondering, the screen clip above is using the new app workspace experience. Next, we will create a new streaming dataset.

On the splash page for the app click Skip at the bottom right corner of the page.


Now select +Create > Streaming dataset.


Select API and click next.


Next create the WholeSaleCustomer dataset.

It will have the following field names and data types:

  • Customer Name (Text)
  • Channel (Number)
  • Region (Number)
  • Fresh (Number)
  • Milk (Number)
  • Grocery (Number)
  • Frozen (Number)
  • Detergents_Paper (Number)
  • Delicassen (Number)
  • Category (Number)


Click the Create button to generate the dataset.

Next, we will leverage the generated PowerShell script to create some test records in our newly formed dataset. Click on PowerShell and copy the code into Notepad.


We will create three test records by running the PowerShell code below. Modify the code you copied into Notepad so that it looks similar to the code below. Before you can run this, you will need to replace <Your Key> with the key displayed in your Power BI service.

$endpoint = "https://api.powerbi.com/beta/8c17d9d4-2652-4573-8a9c-d5dde0750715/datasets/13b74183-5eb2-480b-ba11-c0af0ecbdd26/rows?key=<Your Key>"

$payload = @{
"Customer Name" ="Test1"
"Channel" =1
"Region" =1
"Fresh" =98.6
"Milk" =98.6
"Grocery" =98.6
"Frozen" =98.6
"Detergents_Paper" =98.6
"Delicassen" =98.6
"Category" =0
}


Invoke-RestMethod -Method Post -Uri "$endpoint" -Body (ConvertTo-Json @($payload))
$payload = @{
"Customer Name" ="Test2"
"Channel" =2
"Region" =2
"Fresh" =98.6
"Milk" =98.6
"Grocery" =98.6
"Frozen" =98.6
"Detergents_Paper" =98.6
"Delicassen" =98.6
"Category" =1
}


Invoke-RestMethod -Method Post -Uri "$endpoint" -Body (ConvertTo-Json @($payload))
$payload = @{
"Customer Name" ="Test3"
"Channel" =3
"Region" =3
"Fresh" =98.6
"Milk" =98.6
"Grocery" =98.6
"Frozen" =98.6
"Detergents_Paper" =98.6
"Delicassen" =98.6
"Category" =2
}


Invoke-RestMethod -Method Post -Uri "$endpoint" -Body (ConvertTo-Json @($payload))

To run this, launch PowerShell in Administrator mode and copy and paste the code into the PowerShell console.


The data set now has three records in it and you can start to use it in Power BI. To do this go to the dataset and click the three dots beside the name of the dataset. This will open a new report with a blank canvas. Add a table and drop all of the fields from the data set into the visual.


As you may notice from the screen shot above, the fields Fresh, Milk, Grocery, Frozen, Detergents_Paper and Delicassen are not formatted as currency but should be. Unfortunately, API-enabled datasets only have three data types (Text, Number and DateTime) and no formatting options, so we cannot specify that these fields are currency fields.

Thankfully, we can leverage the report-level measures for live connections to Analysis Services tabular models and Power BI service datasets feature, released in May 2017, to add new measures with the proper currency format defined.
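As a minimal sketch, a report-level measure for the Fresh column might look like the DAX below; the table name assumes the WholeSaleCustomer dataset created earlier, and you would apply the currency format to the measure in the modeling ribbon.

Fresh Spend = SUM ( 'WholeSaleCustomer'[Fresh] )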
Continue reading “Instant insights, automation and action – Part 3 Create the Power BI Report”

Instant insights, automation and action – Part 2 Create Azure Machine Learning Experiment

This is the second post in a series of articles in which I explain how to integrate Power BI, Power Apps, Flow, Azure Machine Learning and Dynamics 365 to rapidly build a functioning system which allows users to analyze, insert, automate and action data.

In the previous article I covered building the Power App. In this article I will cover the Azure Machine Learning Studio Experiment.

We will build out this system in the following order: Power App, Azure Machine Learning, Power BI and, last, MS Flow to connect the components. Before you can begin this tutorial, there are some prerequisites.

Prerequisites

  • Power Apps
  • MS Flow
  • Power BI Pro or Premium
  • Access to Azure Active Directory to register Power BI App
  • Dynamics 365

Build the Azure Machine Learning Experiment

The Azure Machine Learning Studio platform is a powerful cloud service from Microsoft that allows data scientists to rapidly build and deploy machine learning experiments. For the purpose of brevity, we will leverage an existing template from the Azure AI Gallery. The Azure AI Gallery is a great resource for creating and learning about Machine Learning experiments in the Microsoft platform.

Weehyong Tok from Microsoft created an experiment that segments customers based on the Wholesale customers Data Set from the UCI Machine Learning Repository, which is perfect for our purposes.

You can find the experiment here: https://gallery.azure.ai/Experiment/Customer-Segmentation-of-Wholesale-Customers-3

Open the experiment in the Azure Machine Learning Studio by clicking on Open in Studio. Be sure to log in using the same account that you used to build the Power App.

This will launch the Azure Machine Learning Studio platform and create an experiment for you based on Weehyong Tok’s template. You may notice that the experiment has to be updated; click OK.

This is because the Assign to Clusters module has been deprecated and replaced by a new module called Assign Data to Clusters. Thankfully, the upgrade takes care of the necessary changes and we can use the experiment as is without having to modify it.

Click the Run button at the bottom of the page.

Once the experiment has finished running, click on the output of the Assign Data to Clusters module and select Visualize from the drop-down menu.

As you can see in the image the data is grouped into clusters.

This experiment uses the K-Means clustering algorithm to assign the data points to groups. As you can see in the image below, it currently uses 2 centroids, which essentially means that each row will be assigned to one of two groups based on the distance of the row’s data points to each centroid.

Modify the experiment to determine the optimum number of centroids

Now you may wonder whether this is the optimal number of clusters or not. Thankfully, we can use an elbow chart to help determine the optimal number of centroids. To do this we will add an Execute Python Script module so we can drop some code into our experiment.
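For reference, the distortion that the elbow chart plots for $k$ centroids is the average Euclidean distance from each of the $n$ rows to its nearest centroid, which is exactly what the Python code below computes:

$$\operatorname{distortion}(k) = \frac{1}{n} \sum_{i=1}^{n} \min_{1 \le j \le k} \lVert x_i - \mu_j \rVert$$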

Search for the Execute Python Script module and drag it onto the canvas of the experiment. Connect the first output (the one on the left) of the Split Data module to the first input of the Execute Python Script module. Your experiment should look as follows.

Now you will need to add the following code to the Execute Python Script module. Replace the generated code with the code below.

Python Code

# The script MUST contain a function named azureml_main
# which is the entry point for this module.

import matplotlib
matplotlib.use('agg')  # Non-interactive backend so the saved chart renders in Azure ML Studio.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans

# The entry point function can contain up to two input arguments:
#   Param<dataframe1>: a pandas.DataFrame
#   Param<dataframe2>: a pandas.DataFrame
def azureml_main(dataframe1 = None, dataframe2 = None):

    # Fit K-Means for 1 through 9 centroids and record the distortion:
    # the average distance from each row to its nearest cluster centroid.
    distortions = []
    centroids = range(1, 10)
    for i in centroids:
        kmeanModel = KMeans(n_clusters=i).fit(dataframe1)
        distortions.append(
            sum(np.min(cdist(dataframe1, kmeanModel.cluster_centers_, 'euclidean'), axis=1))
            / dataframe1.shape[0])

    # Plot the distortions; the "elbow" in the curve marks the optimal number of centroids.
    plt.plot(centroids, distortions, 'bx-')
    plt.xlabel('Number of centroids')
    plt.ylabel('Distortions')
    plt.title('Elbow chart showing the optimal number of centroids')

    # To see the chart in Azure Machine Learning Studio we need to save the image as a png.
    plt.savefig("elbow.png")

    # Return value must be of a sequence of pandas.DataFrame
    return dataframe1,

Run the experiment, then click on the second output, Python device (Dataset), of the Execute Python Script module and select Visualize. You should see something like the image below.

The optimal number of centroids is at the “elbow” of the chart above which looks to be about 5. Based on this insight we will update the algorithm and change the number of centroids to 5. We will also increase the number of iterations to 500 since we have more centroids.

Run the experiment, click on the output of the Assign Data to Clusters module and select Visualize from the drop-down menu. The output should look like the image below.

Next, we will convert this experiment into a predictive web service. At the bottom of the screen select Set Up Web Service > Predictive Web Service [Recommended].

Once the predictive experiment has been set up, we are going to modify it slightly so that it only returns the Assignments field. To do this we need to drop in the Select Columns in Dataset module and place it between the Assign Data to Clusters module and the web service output.

Launch the column selector and enter the Assignments column as the only value to be passed through to the web service output.

Run the experiment and Deploy Web Service.

This concludes the second part of this series. Next, we will build the API-enabled dataset in Power BI, which will store the data we will use in the Power BI reports and dashboards. Since the dataset is API enabled, we can push data into it using Flow.

Hopefully you have found this to be another practical post.

Until next time

Anthony

References

@Python Programming has a good site for understanding the Python code to plot an elbow chart.

https://pythonprogramminglanguage.com/kmeans-elbow-method/ 

 

Instant insights, automation and action – Part 1 Create Power App

In this series of blog posts, I will explain how you can integrate Power BI, Power Apps, Flow, Azure Machine Learning and Dynamics 365 to rapidly build a functioning system which allows users to analyze, insert, automate and action data. The tutorial will be premised on analyzing wholesale customer purchases using the Wholesale customers Data Set from the UCI Machine Learning Repository.

The conceptual architecture of the system is illustrated below.

We will build out this system in the following order: Power App, Azure Machine Learning, Power BI and, last, MS Flow to connect the components. Before you can begin this tutorial, there are some prerequisites.

Prerequisites

  • Power Apps
  • MS Flow
  • Power BI Pro or Premium
  • Access to Azure Active Directory to register Power BI App
  • Dynamics 365

Build the Power App

First we will build a very simple Power App. The app will allow users to enter new purchase orders directly in a Power BI dashboard by leveraging the Power Apps custom visual for Power BI.

Our app will have fields to capture the following data elements:

  • Customer Name
  • Channel
  • Region
  • Amount spent on FRESH produce
  • Amount spent on MILK produce
  • Amount spent on GROCERY produce
  • Amount spent on FROZEN products
  • Amount spent on DETERGENT and PAPER products
  • Amount spent on DELICASSEN products
  • Category number

The app will look as follows when complete.

Log onto Power Apps https://web.powerapps.com/home and select Create new blank app. Select the portrait layout.

This will open up a blank canvas. Your screen should look similar to the following image below.

Next we will add text input fields for each one of the data entry items listed above.

To do this navigate to Insert > Text > Text input.

Size the input field, enter an appropriate name for the control, remove the default text and add a text hint.

Repeat this for each data entry field.

When finished you should have a text input field for the following data elements:

  • NAME
  • CHANNEL
  • REGION
  • FRESH
  • MILK
  • GROCERY
  • FROZEN
  • DETERGENT
  • DELICASSEN
  • CATEGORY

Your app should now look like the following image.

Next we will add a button. To do this click on Insert > Button. Rename the button to SUBMIT and position it in the bottom right hand corner of the screen.

Your screen should now look as follows.

Save the app and give it an icon. I called mine the Customer Data Entry App.

This concludes the first part of this series. Next, we will build the Azure Machine Learning Studio experiment that we will use to categorize the customer if the customer’s category number has not been filled out in the app.

Hopefully you have found this to be another practical post.

Until next time

Anthony

References

@ChuckSterling has an excellent series of videos on embedding a Power App in a Power BI Dashboard.

https://www.youtube.com/watch?v=xKTPI2pEl9I

https://www.youtube.com/watch?v=dZb3vzp1WFE&list=WL&index=46&t=706s

@NathanPatrickTaylor also has a great video on integrating Power BI, Power Apps and Flow.

https://www.youtube.com/watch?v=au4a3AEIbKw&index=47&list=WL