Extend your information reach without over stretching by virtualizing data using SQL Server 2019 and MongoDB – Part 3 Setup Polybase

This is the third post in a series in which I explain how to use Polybase to connect SQL Server to MongoDB in order to have a modern data ecosystem that caters to both structured and unstructured data. By leveraging Polybase you can centralize access without moving data and deliver a best of breed solution for all of your data needs.

In my previous post I walked through how to spin up a SQL Server 2019 Azure VM and install Polybase. In this post I will show you how to set up Polybase to access a collection in MongoDB. This is the glue between MongoDB and SQL Server 2019. Conceptually this is the arrow in the diagram below.

First, I will show you how to set up an unsecured connection and then I will show you how to secure it with basic authentication.

Verify that MongoDB is up and running

Confirm that MongoDB is up and running. As specified in my prior article, you can use Studio 3T to connect, or you can tunnel directly in using SSH and connect to MongoDB through PuTTY. If you shut down the VM, you will need to restart MongoDB using the following command.

sudo service mongod restart

To run this command you will need to use PuTTY to tunnel in. Details on how to do this can be found in this article.

Configure Polybase

Connect to the SQL Server instance. You can RDP into the VM or connect in from your local workstation. I connected in directly from my laptop using Azure Data Studio, a nice lightweight database tool.

Enable Polybase

Once connected you will need to turn on Polybase. Execute the following command in the MASTER database.

USE master;
GO

EXEC sp_configure @configname = 'polybase enabled', @configvalue = 1;
GO

RECONFIGURE WITH OVERRIDE;
GO
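Before moving on, it can be worth confirming that the feature is installed and that the setting took effect. The following is a minimal check rather than a required step. SERVERPROPERTY returns 1 when Polybase is installed, and value_in_use shows the setting the running instance is actually using.

```sql
-- Returns 1 when the PolyBase feature is installed on this instance, 0 when it is not.
SELECT SERVERPROPERTY('IsPolyBaseInstalled') AS IsPolyBaseInstalled;

-- value is the configured setting; value_in_use is what the running instance is using.
SELECT name, value, value_in_use
FROM sys.configurations
WHERE name = 'polybase enabled';
```

If value_in_use is still 0, the RECONFIGURE step has most likely not been run yet.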

Create Database

Next you will need to create a database. Execute the following SQL command to create a database called TEST.

CREATE DATABASE TEST;
GO

USE TEST;
GO

The following commands all need to be executed in the new Test database that was just created with the command above.

Create External Data Source

Now you will need to create an external data source. I called mine MongoDBSource.

CREATE EXTERNAL DATA SOURCE MongoDBSource
WITH (
LOCATION='mongodb://40.85.255.249:27017',
CONNECTION_OPTIONS ='ssl=false;'
);
GO

Make sure that you modify the LOCATION parameter so that you specify the IP or name of the server that is running MongoDB.

Also, you will need to add CONNECTION_OPTIONS = 'ssl=false;' because the MongoDB instance in this walkthrough does not have SSL enabled. Without it you will see the following error when you try to create the External Table.

Msg 105082, Level 16, State 1, Line 22

105082;Generic ODBC error: [Microsoft][MongoDBODBC] (110) Error from MongoDB Client: No suitable servers found (`serverSelectionTryOnce` set): [connection closed calling ismaster on ‘40.85.255.249:27017’] (Error Code: 13053) Additional error <2>: ErrorMsg: [Microsoft][MongoDBODBC] (110) Error from MongoDB Client: No suitable servers found (`serverSelectionTryOnce` set): [connection closed calling ismaster on ‘40.85.255.249:27017’] (Error Code: 13053), SqlState: HY000, NativeError: 110 .

Create External Table

The last thing we need to do is create an external table.

CREATE EXTERNAL TABLE Test(
[_id] NVARCHAR(24),
[x] FLOAT(53)
)
WITH (
LOCATION='test.testData',
DATA_SOURCE= MongoDBSource
);

Common errors

If you declare the [x] column as INT instead of FLOAT(53) in the external table definition, you will see the following error.

Msg 105083, Level 16, State 1, Line 24

105083;The following columns in the user defined schema are incompatible with the external table schema for table ‘Test2’: user defined column type: ([x] INT) vs. detected external table column type: ([x] FLOAT(53)). The detected external table schema is: ([_id] NVARCHAR(24) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL, [x] FLOAT(53))

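This is because Polybase applies a strongly typed schema when it reads the collection: the declared column types must match the types it detects in MongoDB, and it will not convert them implicitly in the table definition.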
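If you need the values as integers, one option is to keep the declared type as FLOAT(53) and convert in the query instead. This is a sketch that assumes the Test external table created earlier and that the values in x are whole numbers. The alias x_int is just an illustrative name.

```sql
-- Keep the external table column as FLOAT(53) and convert on the SQL Server side.
-- CAST from FLOAT to INT truncates any fractional part.
SELECT [_id],
       CAST([x] AS INT) AS x_int  -- x_int is an illustrative alias, not a field in the collection
FROM [dbo].[Test];
```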

In addition, you cannot create the table without all of the “columns” that reside in the collection. For example, the following SQL command leaves out the [_id] column.

CREATE EXTERNAL TABLE Test2(
 [x] FLOAT(53)
)
WITH (
LOCATION='test.testData',
DATA_SOURCE= MongoDBSource
);

You will get the following error.

Msg 105075, Level 16, State 1, Line 24

105075;The user specified schema column count is 1, which varies from the external table schema column count of 2. The detected external table schema is: ([_id] NVARCHAR(24) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL, [x] FLOAT(53))

Again, this is because the current version of PolyBase in SQL Server requires a table structure that matches the collection structure in MongoDB.
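If you only want to expose some of the fields to the people querying the data, one approach is to declare every column in the external table, as required, and then put an ordinary view on top of it. This is a sketch only, and the view name TestX is hypothetical.

```sql
-- The external table still declares both [_id] and [x]; the view narrows what callers see.
CREATE VIEW [dbo].[TestX]  -- hypothetical view name
AS
SELECT [x]
FROM [dbo].[Test];
GO

SELECT [x]
FROM [dbo].[TestX];
```

CREATE VIEW has to be the first statement in its batch, so run it on its own or put a GO before it when adding it to a longer script.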

Once the table is created you can query it just as you would any other table in SQL Server.

SELECT *
FROM [dbo].[Test];

Your results should look similar to the image below.

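By default, pushdown is enabled for the external data source, so Polybase can hand eligible parts of a query, such as filter predicates, to MongoDB for execution. This allows for optimum performance by distributing the compute to the best location/technology for the job instead of pulling every document into SQL Server first.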
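SQL Server also has query hints for controlling pushdown on a single query. The sketch below assumes the Test table from above, and the filter value 100 is arbitrary. FORCE EXTERNALPUSHDOWN asks for the computation to be pushed to the external source, while DISABLE EXTERNALPUSHDOWN keeps it in SQL Server, which makes it easy to compare the two.

```sql
-- Ask PolyBase to push the computation to MongoDB.
SELECT [_id], [x]
FROM [dbo].[Test]
WHERE [x] > 100
OPTION (FORCE EXTERNALPUSHDOWN);

-- Same query with pushdown turned off, so the filtering happens in SQL Server.
SELECT [_id], [x]
FROM [dbo].[Test]
WHERE [x] > 100
OPTION (DISABLE EXTERNALPUSHDOWN);
```

Both queries should return the same rows. Only where the work is done differs.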

Next let’s add some basic authentication so that our connection is a bit more secure.

Securing the connection with basic Authentication

Using PuTTY, log back into the MongoDB VM. Once connected to the VM, open the MongoDB shell by typing in the following command.

mongo

Once in the MongoDB shell execute the following command.

use test
db.createUser(
{
user: "mytestuser",
pwd: "mytestuser",
roles: [ { role: "readWrite", db: "test" } ]
}
)

You should see the following response if the command is successful.

Successfully added user:
{
 "user" : "mytestuser",
 "pwd" : "mytestuser",
 "roles" : [
 {
 "role" : "readWrite",
 "db" : "test"
 }
 ]
}

This creates a user called mytestuser with the password mytestuser. You should probably use a more secure id and password but this is sufficient for the purposes of this article.

Enable authentication in MongoDB

Exit out of Mongo and run the following command.

sudo nano /etc/mongod.conf

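Modify the configuration file by adding the setting authorization: enabled under the security section. In the default file the security section is commented out (#security:), so remove the leading # and put authorization: enabled on the next line, indented with two spaces (the file is YAML, so do not use tabs).

Save the config file, exit, and then restart MongoDB.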

sudo service mongod restart

Modify SQL Server

Now that we have made the necessary changes in MongoDB we need to add authentication information in SQL Server.

Log into SQL Server and execute the following SQL command to create a master key in the TEST database.

USE TEST

CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'Password123!'
GO

Next create a database scoped credential.

CREATE DATABASE SCOPED CREDENTIAL MongoDB
WITH IDENTITY ='mytestuser', SECRET = 'mytestuser';

GO

As you can see in the command above the IDENTITY and SECRET must match the user and pwd in the MongoDB command we used to create the user.

Last, we will drop and recreate the external table and data source using the following commands. The external table has to be dropped first because it references the data source.

DROP EXTERNAL TABLE Test;
GO

DROP EXTERNAL DATA SOURCE MongoDBSource;
GO

--Notice how the CREDENTIAL argument has been added to the create command below.
CREATE EXTERNAL DATA SOURCE MongoDBSource
WITH (
  LOCATION = 'mongodb://40.85.255.249:27017',
  CREDENTIAL = MongoDB,
  CONNECTION_OPTIONS = 'ssl=false;'
);
GO

CREATE EXTERNAL TABLE Test(
[_id] NVARCHAR(24),
[x] FLOAT(53)
)
WITH (
LOCATION='test.testData',
DATA_SOURCE= MongoDBSource
);
GO
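To confirm that the objects were recreated as expected, you can look at the catalog views. This is a minimal check, assuming it is run in the TEST database. The credential_id on the data source should match the credential_id of the MongoDB credential.

```sql
-- Credentials defined in this database (the secret itself is never shown).
SELECT credential_id, name, credential_identity
FROM sys.database_scoped_credentials;

-- The data source and the credential it references.
SELECT name, location, credential_id
FROM sys.external_data_sources;

-- External tables and the data source each one points at.
SELECT t.name AS table_name, t.location, s.name AS data_source_name
FROM sys.external_tables AS t
JOIN sys.external_data_sources AS s
    ON s.data_source_id = t.data_source_id;
```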

Now you are using an account to access the database in MongoDB. The next step would be to enable SSL in MongoDB so that the CONNECTION_OPTIONS 'ssl=false;' can be removed and a secure connection established between SQL Server and MongoDB. The details of how to do that are beyond the scope of this article. For instructions on how to enable SSL in MongoDB you can refer to the following guide.

https://docs.mongodb.com/manual/tutorial/configure-ssl/

The setup that I have documented in this series can be done on premises or in the cloud using VMs as I have done.

Hopefully you have found this to be another practical post.

Until next time.

Anthony

Extend your information reach without over stretching by virtualizing data using SQL Server 2019 and MongoDB – Part 2 SQL Server 2019

In this series of blog posts, I will explain how you can connect MongoDB to SQL Server 2019 using Polybase so that you have the benefit of both schemaless and relational database technologies, integrated and working together to form a modern data ecosystem that can handle both traditional and “big data”.

In my previous post I explained how to install and configure MongoDB in an Azure VM running Linux. In this post I will walk you through the process of setting up SQL Server 2019, which is the area on the right of the diagram below.

Spin up Azure VM with SQL Server 2019

Microsoft has a pre-built VM with the latest release of SQL Server 2019 (at the time of this writing it is CTP2.3), which makes it really quick and easy to set up. Simply navigate to your Azure portal and search for SQL Server 2019. You should see a Free SQL Server : (CTP2.3) SQL 2019 Developer option. Once you select it you should see the following.

Click create and fill out the subsequent screens as follows.

Step 1 Basics

NOTE Be sure to remember the Username and Password because it will be required later when we connect to the VM and install Polybase.

Step 2 Size

I went for a DS2_v2 but you are free to pick a size that suits your needs. NOTE Microsoft recommends a DS2 or higher for development and functional testing.

Step 3 Settings

You will need to open a public inbound port (3389 RDP) so that you can remotely connect to it and install Polybase.

Step 4 SQL Server settings

This last step is optional. You can enable external connections directly into the SQL Server database which is handy if you want to connect to the database without having to log onto the VM. Once the VM is created we will need to log into it to install Polybase.

Install Polybase

Unfortunately, the pre-built VM does not have Polybase installed on it, so you will need to log onto the VM and install it. To connect to the VM, go to the resource in the Azure Portal and select Connect. You should see a screen like this.

Download the RDP file and enter the credentials you used when you first created the VM.

Once you have logged onto the VM you will need to navigate to the SQL Server 2019 installation software. You can find it in C:\SQLServerFull. Double click on setup.

In the SQL Server Installation Center menu select New SQL Server stand-alone installation or add features to an existing installation.

For the Installation Type select Add features to an existing instance of SQL Server 2019 CTP2.3.

On the Feature Selection screen select PolyBase Query Service for External Data.

On the PolyBase Configuration screen select the first option, which uses this SQL Server as a standalone PolyBase-enabled instance. The other option, a PolyBase scale-out group, is ideal for scenarios in which you have multiple external data sources that you want to connect to and you need to optimize performance.

We will use the defaults for Server Configuration.

Review the summary and click the install button. If all goes well you should see the following screen when complete.

In the next post I will explain how to configure Polybase to connect it to your MongoDB database installed on a separate VM. This setup allows you to extend your reach without overstretching by letting the data stay where it is but still making it available for integration and analytics with line of business application data.

Hopefully you have found this to be another practical post.

Until next time.

Anthony

Extend your information reach without over stretching by virtualizing data using SQL Server 2019 and MongoDB – Part 1 MongoDB


Data virtualization enables a single access point to data without the need to physically copy it from its source into a central repository and/or data management system. This is particularly useful when dealing with “big data” where the volume, variety, velocity or veracity of the data make it unwieldy to work with. Modern data storage ecosystems seamlessly integrate multiple technologies in order to deliver the best set of capabilities with the broadest reach into internal and external data sources. This is also referred to as polyglot persistence and is premised on the idea of using different data storage technologies to handle different data storage needs for a “best of breed” solution.

In this series of articles I will show you how to use Polybase in SQL Server 2019 to access data stored in MongoDB. This will allow you to leverage the flexibility and power of MongoDB’s schema on read design to store JSON documents but surface them up in SQL Server as if they were tables inside of a standard relational database management system. This is ideal for situations in which you need to seamlessly integrate structured and unstructured data.

In this article we will focus on setting up the Ubuntu Server in Azure, installing MongoDB and adding some data to it.

Overview

The diagram below provides a conceptual overview of the architecture for this tutorial.

As you can see in the image above, we will use two virtual machines (VMs) running in Azure. One VM will house MongoDB and the other SQL Server 2019. It is possible to put SQL Server 2019 and MongoDB on the same box; however, I set up the environment this way to approximate what a typical setup would be for most organizations.

Prerequisites

The following items are required to follow this blog post.

Spin up an Ubuntu Server in Azure

Microsoft has a pre-built VM in Azure that you can spin up with a couple of clicks. In Azure search for Ubuntu and select the Ubuntu Server 18.04 LTS. It should look similar to the screen clip below.

Fill in the instance details. I used a D2s v3 for my VM size.

For the Administrator Account section, you must use SSH public key.

To generate the SSH public key I used PuTTYgen. It comes with PuTTY. Move your mouse around to generate the key. Copy (below) and paste (above) the text that gets generated by the tool into Azure.

Once you have generated the key be sure to click Save private key because you will need this to connect to the VM and install MongoDB later on.

For the inbound port rules be sure to allow SSH (port 22) to connect to the VM.

For the Disks I just used Standard HDD.

I did not create any extensions.

Feel free to add tags at your own discretion; they are handy if you have a lot of resources to manage. Click create to spin up the VM. Next we will connect to the VM using PuTTY and install MongoDB.

Connect

We will use PuTTY to connect to the VM. Launch PuTTY and fill out the Host Name (or IP address) with the Public IP Address from Azure.

Next expand SSH and select Auth. Pick the private key file that you saved when generating the RSA key using PuTTYgen.

Click open and enter the user ID that you created when filling out the Administrator Account section of the VM creation screen.

Install MongoDB

Once connected run the following commands.

sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 2930ADAE8CAF5059EE73BB4B58712A2291FA4AD5


echo "deb [ arch=amd64,arm64 ] https://repo.mongodb.org/apt/ubuntu xenial/mongodb-org/3.6 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-3.6.list


sudo apt-get update


sudo apt-get install -y mongodb-org

This will install MongoDB and apply the latest updates to it. Next fire up the MongoDB service by running the following command.

sudo service mongod start

If everything went smoothly you should be able to open the MongoDB terminal by running the following command.

mongo

This should give you a screen similar to the one below.

Create some data

With the database open enter the following command to generate some data into a collection called testData.

for (var i = 1; i <= 150; i++) {
 db.testData.insert( { x : i } )
}

Again, for the sake of conciseness I will not bother securing the database by creating user ids etc. I do recommend properly securing the database for any typical workload other than for demonstration purposes.

Type exit to leave the MongoDB shell. Next we will need to enable remote access to MongoDB.

Enable remote connectivity to MongoDB

WARNING this is not recommended for a production environment and is only done for the sake of brevity in this blog post. To properly enable remote connectivity refer to the following security checklist from MongoDB.

MongoDB’s default IP binding prevents it from accepting connections from outside of the local host. In order to change this, we need to modify the mongod.conf file.

Run the following command.

sudo nano /etc/mongod.conf

This will launch Nano, which allows you to modify the configuration file as follows.

As you can see in the image above we need to add a new bindIp: 0.0.0.0 and comment out or delete the existing one. Be sure to save the changes to the config file before you exit. You then need to restart MongoDB for the change to take effect by running the following command.

sudo service mongod restart

Exit the connection to the Ubuntu server. We now need to add a port to the VM to allow for outside connections to it. To do this we need to go back to the VM in Azure.

Add inbound security rule

From the Azure portal navigate to the VM and click on Networking.

Select Add inbound port rule and fill out the screen as follows.

Once this is done, we can now test connecting to the MongoDB database using Studio 3T or any other tool.

Studio 3T

To connect to the MongoDB database use the public IP that was specified in Azure and also fill out the SSH Tunnel information as follows.

Notice that you need to use the same private key as before. Test the connection and you should see a successful result.

In the next post I will cover setting up SQL Server 2019.

Hopefully you have found this to be another practical post.

Until next time.

Anthony

Further References

https://docs.microsoft.com/en-us/azure/cosmos-db/connect-mongodb-account

https://docs.microsoft.com/en-us/azure/virtual-machines/windows/install-mongodb

https://tanzimsaqib.wordpress.com/2015/06/12/installing-and-securing-mongodb-on-windows-linux-on-azure/

http://timmyreilly.azurewebsites.net/running-mongo-on-ubuntu-virtual-machine-in-azure/