Extend your information reach without over stretching by virtualizing data using SQL Server 2019 and MongoDB – Part 1 MongoDB

Data virtualization enables a single access point to data without the need to physically copy it from its source into a central repository and/or data management system. This is particularly useful when dealing with “big data” where the volume, variety, velocity or veracity of the data make it unwieldly to work with. Modern data storage ecosystems seamlessly integrate multiple technologies in order to deliver the best set of capabilities with the broadest reach into internal and external data sources. This also referred to as polyglot persistence and is premised on the idea of using different data storage technologies to handle different data storage needs for a “best of breed” solution.

In this series of articles I will show you how to use Polybase in SQL Server 2019 to access data stored in MongoDB. This will allow you to leverage the flexibility and power of MongoDB’s schema on read design to store JSON documents but surface them up in SQL Server as if they were tables inside of a standard relational database management system. This is ideal for situations in which you need to seamlessly integrate structured and unstructured data.

In this article we will focus on setting up the Ubuntu Server in Azure, installing MongoDB and adding some data to it.

Overview

The diagram below provides a conceptual overview of the architecture for this tutorial.

As you can see in the image above, we will use two virtual machines (VM) running in Azure. One VM will house MongDB and the other SQL Server 2019. It is possible to put SQL Server 2019 and MongoDB on the same box however, I setup the environment this way to approximate what a typical setup would be for most organizations.

Prerequisites

The following items are required to follow this blog post.

Spin up an Ubuntu Server in Azure

Microsoft has a pre-built VM in Azure that you can spin up with a couple of clicks. In Azure search for Ubuntu and select the Ubuntu Server 18.04 LTS. It should look similar to the screen clip below.

Fill in the instance details . I used a D2s v3 for my VM size.

For the Administrator Account section, you must use SSH public key.

To generate the SSH public key I used PuTTYgen. It comes with PuTTY. Move your mouse around to generate the key. Copy (below) and paste (above) the text that gets generated by the tool into Azure.

Once you have generated the key be sure to click Save private key because you will need this to connect to the VM and install MongoDB later on.

For the inbound port rules be sure to allow SSH (port 22) to connect to the VM.

For the Disks I just used Standard HDD.

I did not create any extensions.

Feel free to add tags at your own discretion, they are handy if you have a lot of resources to manage. Click create to spin up the VM. Next we will connect to the VM using PuTTY and install MongoDB.

Connect

We will use PuTTY to connect to the VM. Launch PuTTY and fill out the Host Name (or IP address) with the Public IP Address from Azure.

Next expand SSH and select Auth. Pick the file that contains the private key file which you saved when generating the RSA key using PuTTYgen.

Click open and enter the user ID that you created when filling out the Administrator Account section of the VM creation screen.

Install MongoDB

Once connected run the following commands.

sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 2930ADAE8CAF5059EE73BB4B58712A2291FA4AD5


echo "deb [ arch=amd64,arm64 ] https://repo.mongodb.org/apt/ubuntu xenial/mongodb-org/3.6 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-3.6.list


sudo apt-get update


sudo apt-get install -y mongodb-org

This will install MongoDB and apply the latest updates to it. Next fire up the MongoDB service by running the following command.

sudo service mongod start

If everything went smoothly you should be able to open the MongoDB terminal by running the following command.

Mongo

This should give you a screen similar to the one below.

Create some data

With the database open enter the following command to generate some data into a collection called testData.

for (var i = 1; i <= 150; i++) {
 db.testData.insert( { x : i } )
}

Again, for the sake of conciseness I will not bother securing the database by creating user ids etc. I do recommend properly securing the database for any typical workload other than for demonstration purposes.

Type exit to leave Mongo DB. Next we will need to enable remote access to the MongoDB.

Enable remote connectivity to MongoDB

WARNING this is not recommended for a production environment and is only done for the sake of brevity in this blog post. To properly enable remote connectivity refer to the following security checklist from MongoDB.

MongoDB’s default IP binding prevents it from connecting outside of the local host. In order to change this, we need to modify the mongod.conf file.

Run the following command

Sudo nano /etc/mongod.conf

This will launch Nano which allow you to modify the configuration file as follows.

As you can see in the image above we need to add a new bindIp: 0.0.0.0 and commend out or delete the existing one. Be sure to exit and save the changes to the config file. It is also a good idea to restart MongoDB by running the following command.

sudo service mongod restart

Exit the connection to the Ubuntu server. We now need to add a port to the VM to allow for outside connections to it. To do this we need to go back to the VM in Azure.

Add inbound security rule

From the Azure portal navigate to the VM and click on Networking.

Select Add inbound port rule and fill out the screen as follows.

Once this is done, we can now test connecting to the MongoDB database using Studio 3T or any other tool.

Studio 3T

To connect to the MongoDB database use the public IP that was specified in Azure and also fill out the SSH Tunnel information as follows.

Notice that you need to use the same private key as before. Test the connection and you should see successfully results.

In the next post I will cover setting up SQL Server 2019.

Hopefully you have found this to be another practical post.

Until next time.

Anthony

Further References

https://docs.microsoft.com/en-us/azure/cosmos-db/connect-mongodb-account

https://docs.microsoft.com/en-us/azure/virtual-machines/windows/install-mongodb

https://tanzimsaqib.wordpress.com/2015/06/12/installing-and-securing-mongodb-on-windows-linux-on-azure/

http://timmyreilly.azurewebsites.net/running-mongo-on-ubuntu-virtual-machine-in-azure/

Instant insights, automation and action – Part 6 Integrate Power BI, Power Apps, Azure Machine Learning and Dynamics 365 using MS Flow

This is the last article in a 6-part series in which I will explain how you can integrate Power BI, Power Apps, Azure Machine and Dynamics 365 using MS Flow.

For reference here are the descriptions and links to the previous articles.

Instant insights, automation and action – Part 1 Create Power App

Instant insights, automation and action – Part 2 Create Azure Machine Learning Experiment

Instant insights, automation and action – Part 3 Create the Power BI Report

Instant insights, automation and action – Part 4 Register Power BI in Azure Active Directory

Instant insights, automation and action – Part 5 Integrate with MS Flow

In this article I will explain how you can kick off a MS Flow by adding an action to your Power App and then how you can integrate the Power App into a Power BI Dashboard. Data alerts can by tied to tiles in the Power BI Dashboard that can kick off additional flows which will insert records into Dynamics. The complete system is depicted in the diagram below.


Modify the Power APP

In Part 1 of this series we created a simple app that allowed a user to enter new sales data. We now need to go back to this app and modify it. Navigate to Power Apps and edit the app


Once the app is open click on the submit button to select it and then from the Action menu at the top select Flows.


This will open up a new pane in which you can select the flow that we created in Part 5 of this series. Once you have selected the flow enter the following code into the formula expression bar.

PowerApptoAzureMLtoPowerBIbkp.Run(NAME.Text, CHANNEL.Text, REGION.Text, FRESH.Text, MILK.Text, GROCERY.Text, FROZEN.Text, DETERGENT.Text, DELICASSEN.Text,CATEGORY.Text)


This will execute the flow and pass the data values from each of the text input boxes into the flow. You can test the flow by clicking on the play button in the top right-hand corner of the screen.

Save the report and publish it so that the new version with the flow attached to the submit button is available to integrate into Power BI.

Modify the Power BI Report

Next, we will need to modify the Power BI report to drop in a PowerApps visual. Open the Power BI report that we created in Part 3 and add a new custom visual from the marketplace. We need to add the Power App custom visual to the report.


Once the new visual has been successfully added we will add it to a new page in the report. In the Power BI report create a new page and call it Data Entry. We are doing this to keep the report clean and simple. We will integrate various visuals including the Power App in a Power BI Dashboard once we have finished putting the necessary polish in the report.

Drop the new visual onto the canvas of the new page in the report and add any field from the list of fields in the dataset, I used customer name. You should see a screen like the image below.


We are not creating or editing an app since we already built it in Part 1. Click ok and then select Choose app. Select the app we created for entering new whole customer sales data.


Click Add. You may see another warning about creating or editing the app, just ignore this by clicking ok.


New report page should now look like the image below.


Rename Page 1 and call it Wholesale Customer Report. You can spruce up the first page to make it look more appealing. I modified my report to make it look like this.


Once you are happy with the design of the report you need to publish it to Power BI. You can replace the existing report that we created in Part 3. Once the report has been published navigate to the cloud service and go the report that you just published.

Build the Dashboard

It’s now time to build a dashboard. With the report open pin the following visuals to a new dashboard.


To pin a visual to a dashboard click on the visual and select the pin from the menu bar.


A menu like the one below will pop up. Give the new dashboard a name such as Wholesale customer dashboard.


Select pin to create and add the visual to the new dashboard. Repeat this for all of the card visuals in the report except instead of selecting New Dashboard select Existing dashboard and if not already selected pick the Wholesale customer dashboard that we just created.

Next, we will need to pin the Power App visual. Go to the Data Entry page and pin the Power App just like we did for the card visuals. If you are having trouble selecting the pin option you may need to edit the report to pin the visual.

Your dashboard should now look something like this.


Let’s rearrange the tiles and add some new visuals by using Q&A.

First add a new visual by typing the following questions in the Q&A bar at the top of the screen.

Fresh by customer sort by fresh

Pin the visual to the existing Wholesale customer dashboard.


Then place this at the bottom of the dashboard.

Repeat these steps using the following questions:

Milk by customer sort by milk

Grocery by customer sort by grocery

Frozen by customer sort by frozen

Detergent paper by customer sort by detergent paper

Delicassen by customer sort by delicassen

Your dashboard should now look similar to the image below.


Try adding a new customer by using the Power App embedded in the Power BI Dashboard. After you have entered data into each of the input boxes in the Power App hit the submit button and in about 5 seconds or less you should see the customer count go up and your new customer on the dashboard in real-time. Also try entering in a new customer but do not fill out the Category field blank. Notice how even though the field is blank it is still populated by the time it shows up in Power BI, that is because the Azure Machine Learning model is supplying this data.

Integrate with Dynamics 365

The last step is to add a data alert to one of the tiles which will create a record in Dynamics 365. Navigate to the dashboard if not already there and click the … in the top right hand corner of the Fresh tile.


Then select Manage alerts.


This will open a new menu on the right-hand side of the screen. From this screen click + Add alert rule. Create an alert that will fire once the Fresh goes above a certain value. In my case I used 60,000.


For the purposes of this tutorial an alert based on an absolute value is adequate however a better choice would be to create an alert on a relative value such as % change since you do not want to have to go in and modify the alert to increase its threshold every time you surpass it. Click Save and close.

Go back to Manage alerts for this tile (Fresh) and this time select Use Microsoft Flow to trigger additional actions.



This will launch MS Flow. Use the default template to create a new flow triggered from a Power BI alert.


Use the template and select the Alert for Fresh from the Alert id drop down menu. Next select add new step and search for Dynamics 365. Then select Create a new record Dynamics 365.

Your flow should now look like this.


Enter the details for the Dynamics 365 tenant and select the Entity that you want a record created in. For my purposes I created a new task to follow-up with the customer by using the tasks entity. Save the flow and test it out by entering in new sales data using the Power App embedded in the Power BI report. If you have wired up the flow correctly a new record should be created in Dynamics 365 once you have triggered the data alert in your Power BI dashboard.

We have now reached the end of this series hopefully you have realized that by combining Power BI, Power Apps, Flow, Azure Machine Learning and Dynamics 365 you can open up new possibilities which lead to insights, automation and action at the speed of business.

Until next time.

Anthony


Instant insights, automation and action – Part 5 Integrate with MS Flow

This is the fifth post in a series of articles in which I explain how to integrate Power BI, Power Apps, Flow, Azure Machine Learning and Dynamics 365 to rapidly build a functioning system which allows users to analyze, insert, automate and action data.

In the previous article I covered how to create an API enabled dataset.

In this article I will cover how you can use MS Flow to create and automated workflow which will integrate the various components that we have built thus far as illustrated in the diagram below.


Prerequisites

Before you can complete this tutorial you will need to make sure that you have access to use MS Flow as well as registered Power BI with Azure AD so that you can push data into an API enabled data set. For more information on how to register Power BI you can refer to the previous article.

Create the Flow

Log onto MS Flow using the following URL https://us.flow.microsoft.com

Once you have logged into MS Flow click on My Flows and then select New > Create from blank using the drop down menu.

This will land you on a second screen in which you will need to click Create from blank once more.

After clicking create from blank once more you should see a screen like the one below.

The first thing we need to add is a Power Apps Trigger. To do this search for the word PowerApps and select the first result back from the search.

Add Power BI action

Next we will add the an action to our flow. Click on + New Step after the PowerApp connector and search for Power BI. Then select Add rows to dataset (preview)

This will open a new window in which you will select the workspace, dataset, and table name. The workspace will be called Customer Segmentation and corresponds to the Power BI Workspace you used to create the customer segmentation report. The dataset should be called WholeSaleCustomer and corresponds to the name of the API enabled dataset that we created in Power BI. The table name is the default table name for all API enabled datasets that are created in Power BI.

Next, we will use the dynamic content wizard to get the data from PowerApps into each of the appropriate columns in the API enabled dataset.

NOTE You must do this next step correctly and in the right order because as of the time of this writing there is no way to remove elements once they have been created.

Click on Add dynamic content and then in the pop up menu on the right select Power Apps See More.

This will open an additional option to Ask in PowerApps.

Select Ask in PowerApps and notice how flow automatically generates a field in the flow called Addrowstoadataset_CustomerName.

Complete the same steps for each column in the Power BI dataset. Be sure to select Ask in PowerApps for each new field. Once done your action should look like this.

This will automatically create parameters in the flow which will surface in the Power App once the Flow has been added to the Power App. As a sneak peek to what I mean here is a screen clip of the step to call the flow from inside PowerApps, we will do this step later in this tutorial series.

Add Azure Machine Learning action

Now we will add a new action between the PowerApp connector and the Power BI one to call the Azure Machine Learning API. To do this we will use the HTTP event. As of the time of this writing there is no OOTB connector or action to Azure Machine Learning Studio.

After you have added the HTTP event fill out the details as follows.

You can get the URI from the Azure Machine Learning Experiment > Request/Response page. Also, be sure to include the Content-Type and Authorization information. The authorization information needs to include the keyword bearer as well as the API Key which you can get from the Azure Machine Learning Experiment page.

Next, we will add a dynamic body as follows.

This will take the data that comes out of Power Apps and pass it to the Azure Machine Learning API for scoring. The machine learning model responds with the grouping the customer belongs to by using a clustering alogrithm. The Azure Machine Learning service responds with a JSON document that needs to be parsed in order to get the appropriate information.

Parse the JSON response

After the HTTP action we will add a Parse JSON action and take the Body of the HTTP response as input. We will also use the following Schema.

{
	"type": "object",
	"properties": {
		"Results": {
			"type": "object",
			"properties": {
				"output1": {
					"type": "object",
					"properties": {
						"type": {
							"type": "string"
						},
						"value": {
							"type": "object",
							"properties": {
								"ColumnNames": {
									"type": "array",
									"items": {
										"type": "string"
									}
								},
								"ColumnTypes": {
									"type": "array",
									"items": {
										"type": "string"
									}
								},
								"Values": {
									"type": "array",
									"items": {
										"type": "array",
										"items": {
											"type": "string"
										}
									}
								}
							}
						}
					}
				}
			}
		}
	}
}

 

This schema can also be generated by dropping the sample payload generated by the Azure Machine Learning Service Request/Response document in the Sample Response section.

Your action should look like the image below.

parse json

You flow should now look as follows.

complete flow
Complete flow with all components

Using the HTTP event is adequate for this tutorial however a more robust solution would be to use Azure Functions.

You can grab the sample C# code generated in Azure Machine Learning Studio to jump start the development of the “server-less” function.

Add conditional logic to Category data

Last, we will add some conditional logic to the Category field in the Power BI event. Click on the Category field in the Power BI event and select Experssion.

Enter in the following code to the expression box.

if
(
      equals(triggerBody()['Addrowstoadataset_Category'],''),
      first(first(body('Parse_JSON')?['Results']?['output1']?['value']?['Values'])),
      triggerBody()['Addrowstoadataset_Category']
)

This will check to see if the Category field has been filled out in the Power App and use that value otherwise if the category field is null it will use the value from the Azure Machine Learning model.

Test the flow by clicking Test in the top right hand corner of the flow.

In the next post I will show you how you can integrate the flow into the Power App and then integrate the app into a Power BI Dashboard.

Hopefully you have found this to be another practical post.

Until next time

Anthony

Instant insights, automation and action – Part 4 Register Power BI in Azure Active Directory

This is the fourth post in a series of articles in which I explain how to integrate Power BI, Power Apps, Flow, Azure Machine Learning and Dynamics 365 to rapidly build a functioning system which allows users to analyze, insert, automate and action data.

In the previous article I covered building the Power BI Report.

In this article I will cover how to enable data to be pushed into Power BI use Flow. This is a fast no code solution.


This is a one-time setup that is required in order to use the Power BI connector in MS Flow. If you do not do this step you will see an error screen in MS Flow like the screen clip below.


Prerequisites

In order to complete this tutorial, you will need permission to register applications in your Azure Active Directory tenant.


For more information on the Azure AD Tenant you can click the following link.

https://docs.microsoft.com/en-us/power-bi/developer/create-an-azure-active-directory-tenant

Power BI Development Center

Log onto the Power BI Development Center and enable API features and get the key to register the app in Azure.

Go to the following URL and sign in.

https://dev.powerbi.com/apps


Enter in a meaningful name for your app, I called mine AnthonysPowerBIApp but you can call yours whatever you would like. Choose Native for the Application Type and select Read all datasets and Read and write all datasets for the API Access


Click on Register. A screen like the one below should pop up. Be sure to copy down the Application ID as this is needed to register the application in Azure.


Azure Portal

Next log onto the azure portal using the following URL https://portal.azure.com/#home

Once in the portal admin page navigate to the Azure Active Directory menu blade


Next click on App registrations and select the app that we created using the Power BI Development Center.


You can change settings in the app if you whish to tailor it be clicking on Properties.

Now that the Power BI App has been registered in Azure Active Directory you can use it in various Microsoft cloud services such as Flow.


As you can see in the image above, I no longer get a permission error and I am able to select the workspace, dataset and table.


In the next post we will build out the flow so that data is passed from the Power App to an Azure Machine Learning experiment for scoring and then into the Power BI API Enabled Dataset for real-time analytics.

Hopefully you have found this to be another practical post.

Until next time

Anthony

References

Here is the official documentation from Microsoft on how to register Power BI to push data into it using REST API calls.

https://docs.microsoft.com/en-us/power-bi/developer/overview-of-power-bi-rest-api


Instant insights, automation and action – Part 3 Create the Power BI Report

This is the third post in a series of articles in which I explain how to integrate Power BI, Power Apps, Flow, Azure Machine Learning and Dynamics 365 to rapidly build a functioning system which allows users to analyze, insert, automate and action data.

In the previous article I covered building the Power App. In this article I will cover the Power BI report.

We will build out this system in the following order; Power App, Azure Machine Learning, Power BI and then last MS Flow to connect the components. Before you can begin this tutorial there are some required prerequisites.

Prerequisites

  • Power Apps
  • MS Flow
  • Power BI Pro or Premium
  • Access to Azure Active Directory to register Power BI App
  • Dynamics 365

Create the API Enabled Dataset

Log onto Power BI and create a new app workspace called Customer Segmentation. This step is not required however if you are like me you create a lot of different content so it’s a good habitat to get into so that you can better manage your work.


In case you are wondering the screen clip above is using the new App Workspace experience. Next, we will create a new streaming data set.

On the splash page for the app click Skip at the bottom right corner of the page.


Now select +Create > Streaming dataset.


Select API and click next.


Next create the WholeSaleCustomer dataset.

It will have the following field names and data types

Field Name Data Type
Customer Name Text
Channel Number
Region Number
Fresh Number
Milk Number
Grocery Number
Frozen Number
Detergents_Paper Number
Delicassen Number
Category Number


Click the Create button to generate the dataset.

Next, we will leverage the generated PowerShell script to create some test records in our newly formed dataset. Click on PowerShell and copy the code into Notepad.


We will create three test records by running the PowerShell code below. Modify the code you coped into Notepad so that it looks simlar to the code below. Before you can run this you will need to replace <Your Key> with the key displayed in your Power BI service.

$endpoint = "https://api.powerbi.com/beta/8c17d9d4-2652-4573-8a9c-d5dde0750715/datasets/13b74183-5eb2-480b-ba11-c0af0ecbdd26/rows?
key=<Your Key>

$payload = @{
"Customer Name" ="Test1"
"Channel" =1
"Region" =1
"Fresh" =98.6
"Milk" =98.6
"Grocery" =98.6
"Frozen" =98.6
"Detergents_Paper" =98.6
"Delicassen" =98.6
"Category" =0
}


Invoke-RestMethod -Method Post -Uri "$endpoint" -Body (ConvertTo-Json @($payload))
$payload = @{
"Customer Name" ="Test2"
"Channel" =2
"Region" =2
"Fresh" =98.6
"Milk" =98.6
"Grocery" =98.6
"Frozen" =98.6
"Detergents_Paper" =98.6
"Delicassen" =98.6
"Category" =1
}


Invoke-RestMethod -Method Post -Uri "$endpoint" -Body (ConvertTo-Json @($payload))
$payload = @{
"Customer Name" ="Test3"
"Channel" =3
"Region" =3
"Fresh" =98.6
"Milk" =98.6
"Grocery" =98.6
"Frozen" =98.6
"Detergents_Paper" =98.6
"Delicassen" =98.6
"Category" =2
}


Invoke-RestMethod -Method Post -Uri "$endpoint" -Body (ConvertTo-Json @($payload))

To do this launch PowerShell in Administrator mode and copy and paste the code into the PowerShell desktop app.


The data set now has three records in it and you can start to use it in Power BI. To do this go to the dataset and click the three dots beside the name of the dataset. This will open a new report with a blank canvas. Add a table and drop all of the fields from the data set into the visual.


As you may notice from the screen shot above the fields Fresh, Milk, Grocery, Frozen, Detergents_Paper and Delicassen are not formatted as currency but should be. Unfortunately, API enabled data sets only have three data types Text, Number and Date and no formatting options so we cannot specify that these fields are currency fields.

Thankfully we can leverage the Report level measures for live connections to Analysis Services tabular models & Power BI service datasets feature that was released in May 2017 to add new measures with the proper currency data type defined.
Continue reading “Instant insights, automation and action – Part 3 Create the Power BI Report”

Instant insights, automation and action – Part 2 Create Azure Machine Learning Experiment

This is the second post in a series of articles in which I explain how to integrate Power BI, Power Apps, Flow, Azure Machine Learning and Dynamics 365 to rapidly build a functioning system which allows users to analyze, insert, automate and action data.

In the previous article I covered building the Power App. In this article I will cover the Azure Machine Learning Studio Experiment.

We will build out this system in the following order; Power App, Azure Machine Learning, Power BI and then last MS Flow to connect the components. Before you can begin this tutorial there are some required prerequisites.

Prerequisites

  • Power Apps
  • MS Flow
  • Power BI Pro or Premium
  • Access to Azure Active Directory to register Power BI App
  • Dynamics 365

Build the Azure Machine Learning Experiment

The Azure Machine Learning Studio platform is a powerful cloud service from Microsoft that allows data scientists to rapidly build and deploy machine learning experiments. For the purpose of brevity, we will leverage an existing template from the Azure AI Gallery. The Azure AI Gallery is a great resource for creating and learning about Machine Learning experiments in the Microsoft platform.

Weehyong Tok from Microsoft created an experiment that segments customers based on the dataset Wholesale customers Data Set from UCI Machine Learning Repository which is perfect for our purposes.

You can find the experiment here https://gallery.azure.ai/Experiment/Customer-Segmentation-of-Wholesale-Customers-3 .

Open the experiment in the Azure Machine Learning Studio by clicking on Open in Studio. Be sure to log in using the same account that you used to build the Power App.

This will launch the Azure Machine Learning Studio platform and create an experiment for you based on Weehyong Tok template. You may notice that the experiment has to be updated, click ok.

This is because the Assign to Clusters module has been deprecated and replaced by a new module called Assign Data to Clusters. Thankfully the upgrade takes care of the necessary changes and we can use the experiment as is with out having to modify it.

Click the Run button at the bottom of the page.

Once the experiment has finished running click on the output of the Assign to Cluster module and select Visualize from the drop down menu.

As you can see in the image the data is grouped into clusters.

This experiment uses the K-Means clustering algorithm to assign the data points to groups. As you can see in the image below it currently uses 2 centroids which essentially means that each row will be assigned to 1 of 2 groups based on the distance of the data points in the row to the centroid.

Modify the experiment to determine the optimum number of centroids

Now you may wonder if this is the optimal number of clusters or not. Thankfully we can use an elbow chart to help determine the optimal number of centroids. To do this we will add a Python module to drop some code into our experiment.

Search for the Execute Python Script and drop it onto the canvas of the experiment. Connect the first output (the one on the left) of the Split Data module to the first input of the Execute Python Script module. Your experiment should look as follows.

Now you will need to add the following code to the Execute Python Script module. Replace the generated with the code below.

Python Code

# The script MUST contain a function named azureml_main
# which is the entry point for this module.

# imports up here can be used to 
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from math import sin, cos, sqrt, atan2
from sklearn.cluster import KMeans
from sklearn import metrics
from scipy.spatial.distance import cdist

# The entry point function can contain up to two input arguments:
#   Param<dataframe1>: a pandas.DataFrame
#   Param<dataframe2>: a pandas.DataFrame
def azureml_main(dataframe1 = None, dataframe2 = None):

# Execution logic goes here
#print('Input pandas.DataFrame #1:\r\n\r\n{0}'.format(dataframe1)) #We don't need this, we just want the visual.

colors = ['b','g','r']
markers = ['o','v','s']

distortions = []
centroids = range(1, 10)
for i in centroids:
kmeanModel = KMeans(n_clusters=i).fit(dataframe1)
kmeanModel.fit(dataframe1)
distortions.append(sum(np.min(cdist(dataframe1, kmeanModel.cluster_centers_, 'euclidean'),axis=1))/dataframe1.shape[0])

plt.plot(centroids, distortions, 'bx-')
plt.xlabel('Number of centroids')
plt.ylabel('Distortions')
plt.title('Elbow chart showing the optimal number of centroids')
plt.show()

plt.savefig("elbow.png") #To see the chart in Azure Machine Learning Studio we need to save the image as a png.

# If a zip file is connected to the third input port is connected,
# it is unzipped under ".\Script Bundle". This directory is added
# to sys.path. Therefore, if your zip file contains a Python file
# mymodule.py you can import it using:
# import mymodule

# Return value must be of a sequence of pandas.DataFrame
return dataframe1,

Run the experiment and click on the second output, Python device (Dataset), of the Python Script module and select visualize. You should see something like the image below.

The optimal number of centroids is at the “elbow” of the chart above which looks to be about 5. Based on this insight we will update the algorithm and change the number of centroids to 5. We will also increase the number of iterations to 500 since we have more centroids.

Run the experiment and click on the output of the Assign to Cluster module and select Visualize from the drop down menu. The output should look like the image below.

Next, we will convert this experiment into a Predictive Web Service. At the bottom of the screen select Predictive Web Service > Predictive Web Service [Recommended]

Once the predictive experiment has been setup, we are going to modify it slightly so that it only returns the Assignment field. To do this we need to drop in the Select Columns in Dataset module and place it between the Assign to Clusters module and the Web service output.

Launch the column selector and enter in the Assignments column as the only value to get passed through to the web service output.

Run the experiment and Deploy Web Service.

This concludes the second part of this series. Next, we will build the API enabled dataset in Power BI which will store the data that we will use in the Power BI Reports and Dashboards. Since the dataset is API enabled we can push data into it using Flow.

Hopefully you have found this to be another practical post.

Until next time

Anthony

References

@Python Programming has a good site for understanding the Python code to plot an elbow chart.

https://pythonprogramminglanguage.com/kmeans-elbow-method/ 

 

Transforming data into value one blog post at a time

 

 

Thanks for joining me!

My name is Anthony Bulk and I am passionate about making data useful. Whether it’s through data storytelling, business intelligence, artificial intelligence or data storage I will cover it all.

Come join me on this journey down data alley!

Here are the topics that have covered so far…

Last updated March 15, 2019.

Instant insight, automation and action using Power Apps, Power BI, Flow and Azure Machine Learning

Extend your information reach without over stretching by virtualizing data using SQL Server 2019 and MongoDB


“Without data you’re just another person with an opinion.”

W. Edwards Deming

 “If I had only one hour to save the world, I would spend fifty-five minutes defining the problem, and only five minutes finding the solution.”

Albert Einstein

Instant insights, automation and action – Part 1 Create Power App

In this series of blog posts, I will explain how you can integrate Power BI, Power Apps, Flow, Azure Machine Learning and Dynamics 365 to rapidly build a functioning system which allows users to analyze, insert, automate and action data. The tutorial will be premised on analyzing whole sale customer purchases using the Wholesale customers Data Set from UCI Machine Learning Repository.

The conceptual architecture of the system is illustrated below.

We will build out this system in the following order; Power App, Azure Machine Learning, Power BI and then last MS Flow to connect the components. Before you can begin this tutorial there are some required prerequisites.

Prerequisites

  • Power Apps
  • MS Flow
  • Power BI Pro or Premium
  • Access to Azure Active Directory to register Power BI App
  • Dynamics 365

Build the Power App

First we will build a very simple Power App. The app will allow users to enter in new purchase orders directly in a Power BI dashboard by leveraging the Power Apps custom Visual for Power BI.

Our app will have fields to capture the following data elements:

  • Customer Name
  • Channel
  • Region
  • Amount spent on FRESH produce
  • Amount spent on MILK produce
  • Amount spent on GROCERY produce
  • Amount spent on FROZEN product
  • Amount spent on DETERGENT and PAPER products
  • Amount spent on DELICASSEN products
  • Category number

The app will look as follows when complete.

Log onto Power Apps https://web.powerapps.com/home and select create new blank app. Select portrait layout.

This will open up a blank canvas. Your screen should look similar to the following image below.

Next we will add text input fields for each one of the data entry items listed above.

To do this navigate to Insert > Text > Text input.

Size the input field and enter in the appropriate name for the control, remove the default and add a text hint.

Repeat this for each data entry field.

When finished you should have a text input field for the following data elements:

  • NAME
  • CHANNEL
  • REGION
  • FRESH
  • MILK
  • GROCERY
  • FROZEN
  • DETERGENT
  • DELICASSEN
  • CATEGORY

Your app should now look like the following image.

Next we will add a button. To do this click on Insert > Button. Rename the button to SUBMIT and position it in the bottom right hand corner of the screen.

Your screen should now look as follows.

Save the app and give it an icon. I called mine the Customer Data Entry App.

This concludes the first part of this series. Next, we will build the Azure Machine Learning Studio experiment that we will use to categorize the customer if the customers category number has not been filled out in the app.

Hopefully you have found this to be another practical post.

Until next time

Anthony

References

@ChuckSterling has an excellent series of videos on embedding a Power App in a Power BI Dashboard.

https://www.youtube.com/watch?v=xKTPI2pEl9I

https://www.youtube.com/watch?v=dZb3vzp1WFE&list=WL&index=46&t=706s

@NathanPatrickTaylor also has a great video on integrating Power BI, Power Apps and Flow.

https://www.youtube.com/watch?v=au4a3AEIbKw&index=47&list=WL