Instant insights, automation and action – Part 2 Create Azure Machine Learning Experiment

This is the second post in a series of articles in which I explain how to integrate Power BI, Power Apps, Flow, Azure Machine Learning and Dynamics 365 to rapidly build a functioning system which allows users to analyze, insert, automate and action data.

In the previous article I covered building the Power App. In this article I will cover the Azure Machine Learning Studio Experiment.

We will build out this system in the following order: Power App, Azure Machine Learning, Power BI, and finally Microsoft Flow to connect the components. Before you begin this tutorial there are some required prerequisites.

Prerequisites

  • Power Apps
  • MS Flow
  • Power BI Pro or Premium
  • Access to Azure Active Directory to register Power BI App
  • Dynamics 365

Build the Azure Machine Learning Experiment

The Azure Machine Learning Studio platform is a powerful cloud service from Microsoft that allows data scientists to rapidly build and deploy machine learning experiments. For brevity, we will leverage an existing template from the Azure AI Gallery. The Azure AI Gallery is a great resource for creating and learning about machine learning experiments on the Microsoft platform.

Weehyong Tok from Microsoft created an experiment that segments customers based on the Wholesale Customers Data Set from the UCI Machine Learning Repository, which is perfect for our purposes.

You can find the experiment here https://gallery.azure.ai/Experiment/Customer-Segmentation-of-Wholesale-Customers-3 .

Open the experiment in the Azure Machine Learning Studio by clicking on Open in Studio. Be sure to log in using the same account that you used to build the Power App.

This will launch the Azure Machine Learning Studio platform and create an experiment for you based on Weehyong Tok's template. You may be prompted to update the experiment; click OK.

This is because the Assign to Clusters module has been deprecated and replaced by a new module called Assign Data to Clusters. Thankfully the upgrade takes care of the necessary changes and we can use the experiment as is without having to modify it.

Click the Run button at the bottom of the page.

Once the experiment has finished running, click on the output of the Assign Data to Clusters module and select Visualize from the drop-down menu.

As you can see in the image the data is grouped into clusters.

This experiment uses the K-Means clustering algorithm to assign the data points to groups. As you can see in the image below, it currently uses 2 centroids, which essentially means that each row will be assigned to one of two groups based on the distance of the row's data points to each centroid.
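If you want a feel for what the module is doing, K-Means is easy to try locally with scikit-learn. This is only a sketch using made-up two-column data, not the wholesale dataset itself:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical stand-in for the customer rows: two loose blobs of points.
rng = np.random.RandomState(0)
data = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(8, 1, (20, 2))])

# Two centroids, mirroring the template's default setting.
model = KMeans(n_clusters=2, random_state=0).fit(data)

# Each row is assigned to the cluster whose centroid it is closest to.
print(model.labels_)           # one cluster assignment (0 or 1) per row
print(model.cluster_centers_)  # the coordinates of the 2 centroids
```

The `labels_` array is the local equivalent of the Assignments column you saw in the visualized output.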

Modify the experiment to determine the optimum number of centroids

Now you may wonder whether this is the optimal number of clusters. Thankfully, we can use an elbow chart to help determine the optimal number of centroids. To do this we will add an Execute Python Script module and drop some code into our experiment.

Search for the Execute Python Script module and drag it onto the experiment canvas. Connect the first output (the one on the left) of the Split Data module to the first input of the Execute Python Script module. Your experiment should look as follows.

Now you will need to add the following code to the Execute Python Script module, replacing the generated code.

Python Code

# The script MUST contain a function named azureml_main
# which is the entry point for this module.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from scipy.spatial.distance import cdist

# The entry point function can contain up to two input arguments:
#   Param<dataframe1>: a pandas.DataFrame
#   Param<dataframe2>: a pandas.DataFrame
def azureml_main(dataframe1 = None, dataframe2 = None):

    # For each candidate number of centroids, fit a K-Means model and
    # record the average distance of each row to its nearest centroid.
    distortions = []
    centroids = range(1, 10)
    for i in centroids:
        kmeanModel = KMeans(n_clusters=i).fit(dataframe1)
        distortions.append(
            sum(np.min(cdist(dataframe1, kmeanModel.cluster_centers_,
                             'euclidean'), axis=1)) / dataframe1.shape[0])

    plt.plot(centroids, distortions, 'bx-')
    plt.xlabel('Number of centroids')
    plt.ylabel('Distortions')
    plt.title('Elbow chart showing the optimal number of centroids')

    # To see the chart in Azure Machine Learning Studio we need to save
    # the image as a png; plt.show() has no effect in this environment.
    plt.savefig("elbow.png")

    # If a zip file is connected to the third input port, it is unzipped
    # under ".\Script Bundle" and that directory is added to sys.path, so
    # a Python file mymodule.py in the zip can be imported with:
    # import mymodule

    # Return value must be a sequence of pandas.DataFrame
    return dataframe1,

Run the experiment, then click on the second output, Python device (Dataset), of the Execute Python Script module and select Visualize. You should see something like the image below.

The optimal number of centroids is at the “elbow” of the chart above which looks to be about 5. Based on this insight we will update the algorithm and change the number of centroids to 5. We will also increase the number of iterations to 500 since we have more centroids.
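The Studio module is configured through its properties pane, but the change corresponds roughly to these scikit-learn parameters (a sketch with made-up data standing in for the six spending columns):

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical stand-in for the wholesale spending columns.
rng = np.random.RandomState(0)
data = rng.normal(size=(100, 6))

# 5 centroids and up to 500 iterations, matching the updated module settings.
model = KMeans(n_clusters=5, max_iter=500, random_state=0).fit(data)
print(sorted(set(model.labels_)))  # all five cluster ids appear
```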

Run the experiment again, click on the output of the Assign Data to Clusters module, and select Visualize from the drop-down menu. The output should look like the image below.

Next, we will convert this experiment into a Predictive Web Service. At the bottom of the screen select Set Up Web Service > Predictive Web Service [Recommended].

Once the predictive experiment has been set up, we are going to modify it slightly so that it only returns the Assignments field. To do this, drop in a Select Columns in Dataset module and place it between the Assign Data to Clusters module and the Web service output.

Launch the column selector and enter the Assignments column as the only value to be passed through to the web service output.

Run the experiment and then click Deploy Web Service.
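Once deployed, the web service can be called over REST from any client, not just Flow. The sketch below builds a request-response call in the shape classic Azure Machine Learning Studio services expect; the URL, API key, and sample values are placeholders, and the column names must match the input schema shown on your service's API help page:

```python
import json
import urllib.request

# Placeholders: copy the real URL and key from the web service's API help page.
url = ("https://ussouthcentral.services.azureml.net/workspaces/<workspace-id>"
       "/services/<service-id>/execute?api-version=2.0&details=true")
api_key = "<your-api-key>"

# One wholesale-customer row in the classic Studio request format.
body = {
    "Inputs": {
        "input1": {
            "ColumnNames": ["Channel", "Region", "Fresh", "Milk", "Grocery",
                            "Frozen", "Detergents_Paper", "Delicassen"],
            "Values": [["1", "3", "12669", "9656", "7561",
                        "214", "2674", "1338"]],
        }
    },
    "GlobalParameters": {},
}

req = urllib.request.Request(
    url,
    data=json.dumps(body).encode("utf-8"),
    headers={"Content-Type": "application/json",
             "Authorization": "Bearer " + api_key},
)
# With real credentials you would now send it:
#   with urllib.request.urlopen(req) as resp:
#       result = json.loads(resp.read())
# and read the Assignments value out of result["Results"]["output1"].
```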

This concludes the second part of this series. Next, we will build the API-enabled dataset in Power BI, which will store the data used in the Power BI reports and dashboards. Since the dataset is API enabled, we can push data into it using Flow.
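As a preview of where this is going, rows can be pushed into an API-enabled (push) dataset through the Power BI REST API. This is only a sketch: it assumes a dataset and table already exist, the access token and dataset id are placeholders, and the column names here (a hypothetical Segments table) are made up for illustration:

```python
import json
import urllib.request

# Placeholders: a real Azure AD access token and dataset id are required.
access_token = "<aad-access-token>"
dataset_id = "<dataset-id>"
url = ("https://api.powerbi.com/v1.0/myorg/datasets/"
       + dataset_id + "/tables/Segments/rows")

# Hypothetical columns for illustration only.
rows = {"rows": [{"CustomerId": 42, "Assignments": 3}]}

req = urllib.request.Request(
    url,
    data=json.dumps(rows).encode("utf-8"),
    headers={"Content-Type": "application/json",
             "Authorization": "Bearer " + access_token},
    method="POST",
)
# With a real token, urllib.request.urlopen(req) posts the rows;
# Flow performs the equivalent call for us via its Power BI connector.
```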

Hopefully you have found this to be another practical post.

Until next time

Anthony

References

Python Programming Language has a good article explaining the Python code used to plot an elbow chart:

https://pythonprogramminglanguage.com/kmeans-elbow-method/

 
