Extend your information reach without over stretching by virtualizing data using SQL Server 2019 and MongoDB – Part 3 Setup Polybase

This is the third post in a series in which I explain how to use Polybase to connect SQL Server to Mongo DB in order to have a modern data ecosystem that caters to both structured and unstructured data. By leveraging Polybase you can centralize access without moving data and deliver a beast of breed solution for all of your data needs.

In my previous post I walked through how to spin up a SQL Server 2019 Azure VM and install Polybase. In this post I will show you how to setup Polybase to access a collection in MongoDB. This is the glue between MongoDB and SQL Server 2019. Conceptually this is the arrow in the diagram below.

First, I will show you how to setup an unsecure connection and then I will show you how to secure it with basic authentication.

Verify that MongoDB is up and running

Confirm that Mongo DB is up and running. As specified in my prior article you can use Studio 3T to connect or you can tunnel directly in using SSH and connect to MongoDB through PuTTY. If you shutdown the VM you will need to restart MongoDB using the command.

sudo service mongod restart

To run this command you will need to use PuTTY to tunnel in. Details on how to do this can be found in this article.

Configure polybase

Connect to the SQL Server instance. You can RDP into the VM or connect in from your local workstation. I connected in directly from my laptop using Azure Data Studio a nice light weight database tool.

Enable Polybase

Once connected you will need to turn on Polybase. Execute the following command in the MASTER database.

USE MASTER
exec sp_configure @configname = 'polybase enabled', @configvalue =1;
GO

RECONFIGURE WITH OVERRIDE;

Create Database

Next you will need to create a database. Execute the following SQL command to create a database called TEST.

CREATE
DATABASE TEST
GO

USE TEST
GO

The following commands all need to be executed in the new Test database that was just created with the command above.

Create External Data Source

Now you will need to create an external data source. I called mine MongoDBSource.

CREATE EXTERNAL DATA SOURCE MongoDBSource
WITH (
LOCATION='mongodb://40.85.255.249:27017',
CONNECTION_OPTIONS ='ssl=false;'
);
GO

Make sure that you modify the LOCATION parameter so that you specify the IP or name of the server that is running MongoDB.

Also you will need to add the CONNECTION_OPTIONS = ‘ssl=false;’ otherwise you will see the following error when you try to create the External Table.

Msg 105082, Level 16, State 1, Line 22

105082;Generic ODBC error: [Microsoft][MongoDBODBC] (110) Error from MongoDB Client: No suitable servers found (`serverSelectionTryOnce` set): [connection closed calling ismaster on ‘40.85.255.249:27017’] (Error Code: 13053) Additional error <2>: ErrorMsg: [Microsoft][MongoDBODBC] (110) Error from MongoDB Client: No suitable servers found (`serverSelectionTryOnce` set): [connection closed calling ismaster on ‘40.85.255.249:27017’] (Error Code: 13053), SqlState: HY000, NativeError: 110 .

Create External Table

The last thing we need to do is create an external table.

CREATE EXTERNAL TABLE Test(
[_id] NVARCHAR(24),
[x] FLOAT(53)
)
WITH (
LOCATION='test.testData',
DATA_SOURCE= MongoDBSource
);

Common errors

If you try to implicitly convert the [x] FLOAT(53) column to an int you will see the following error.

Msg 105083, Level 16, State 1, Line 24

105083;The following columns in the user defined schema are incompatible with the external table schema for table ‘Test2’: user defined column type: ([x] INT) vs. detected external table column type: ([x] FLOAT(53)). The detected external table schema is: ([_id] NVARCHAR(24) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL, [x] FLOAT(53))

This is because SQL Server requires a schema on read with strong type casting.

In addition, if you try to create the table without all of the “columns” that reside in the collection by using the following SQL command.

CREATE
EXTERNAL
TABLE Test2(
 [x] FLOAT(53)
)
WITH (
LOCATION='test.testData',
DATA_SOURCE= MongoDBSource
);

You will get the following error.

Msg 105075, Level 16, State 1, Line 24

105075;The user specified schema column count is 1, which varies from the external table schema column count of 2. The detected external table schema is: ([_id] NVARCHAR(24) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL, [x] FLOAT(53))

Again, this is because the current version of PolyBase in SQL Server requires a table structure that matches the collection structure in MongoDB.

Once the table is created you can query it just as you would any other table in SQL Server.

Select *
from [dbo].[Test]

Your results should look similar to the image below.

By default the PUSHDOWN predicate is applied so that the MongoDB is used for query execution allowing for optimum performance by distributing the compute to the best location/technology for the job.

Next let’s add some basic authentication so that our connection is a bit more secure.

Securing the connection with basic Authentication

Using PuTTY log back in the MongoDB VM. Once connected to the VM log into MongoDB by typing in the following command.

mongo

Once in the MongoDB shell execute the following command.

use test
db.createUser(
{
user: “mytestuser”,
pwd: “mytestuser”,
roles: [ { role: “readWrite”, db: “test” } ]
}
)

You should see the following response if the command is successful.

Successfully added user:
{
 "user" : "mytestuser",
 "pwd" : "mytestuser",
 "roles" : [
 {
 "role" : "readWrite",
 "db" : " test "
 }
 ]
}

This creates a user called mytestuser with the password mytestuser. You should probably use a more secure id and password but this is sufficient for the purposes of this article.

Enable authentication in MongoDB

Exit out of Mongo and run the following command.

sudo nano /etc/mongod.conf

Modify the configuration file by adding the tag authorization: enabled under the security section.

Exit and save the config file and then restart mongo.

sudo service mongod restart

Modify SQL Server

Now that we have made the necessary changes in MongoDB we need to add authentication information in SQL Server.

Log into SQL Server and create a master key. Execute the following SQL command to create a master key.

USE TEST

CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'Password123!'
GO

Next create a database scoped credential.

CREATE DATABASE SCOPED CREDENTIAL MongoDB
WITH IDENTITY ='mytestuser', SECRET = 'mytestuser';

GO

As you can see in the command above the IDENTITY and SECRET must match the user and pwd in the MongoDB command we used to create the user.

Last, we will drop and recreate the external data source and table using the following commands.

DROP EXTERNAL DATA SOURCE MongoDBSource

CREATE EXTERNAL DATA SOURCE MongoDBSource
WITH (
  LOCATION = 'mongodb://40.85.255.249:27017', 
  CREDENTIAL = MongoDB,
  CONNECTION_OPTIONS = 'ssl=false;'
);

GO

--Notice how the CREDENTIAL argument has been added to the create command above.

DROP EXTERNAL TABLE Test
CREATE EXTERNAL TABLE Test(
[_id] NVARCHAR(24),
[x] FLOAT(53)
)
WITH (
LOCATION='test.testData',
DATA_SOURCE= MongoDBSource
);

Now you are using an account to access the database in MongoDB. The next step would be to enable SSL in MongoDB so that the CONNECTION_OPTIONS ‘ssl=false;’ can be removed and a secure connection established between SQL Server and MongoDB. The details on how to do it is beyond the scope of this article for instructions on how to enable SSL in MongoDB you can refer to the following guide.

https://docs.mongodb.com/manual/tutorial/configure-ssl/

The setup that I have documented in this series can be done on premise or in the cloud using VMs as I have done.

Hopefully you have found this to be another practical post.

Until next time.

Anthony