Data virtualization enables a single access point to data without the need to physically copy it from its source into a central repository and/or data management system. This is particularly useful when dealing with “big data” where the volume, variety, velocity or veracity of the data make it unwieldly to work with. Modern data storage ecosystems seamlessly integrate multiple technologies in order to deliver the best set of capabilities with the broadest reach into internal and external data sources. This also referred to as polyglot persistence and is premised on the idea of using different data storage technologies to handle different data storage needs for a “best of breed” solution.
In this series of articles I will show you how to use Polybase in SQL Server 2019 to access data stored in MongoDB. This will allow you to leverage the flexibility and power of MongoDB’s schema on read design to store JSON documents but surface them up in SQL Server as if they were tables inside of a standard relational database management system. This is ideal for situations in which you need to seamlessly integrate structured and unstructured data.
In this article we will focus on setting up the Ubuntu Server in Azure, installing MongoDB and adding some data to it.
Overview
The diagram below provides a conceptual overview of the architecture for this tutorial.
As you can see in the image above, we will use two virtual machines (VM) running in Azure. One VM will house MongDB and the other SQL Server 2019. It is possible to put SQL Server 2019 and MongoDB on the same box however, I setup the environment this way to approximate what a typical setup would be for most organizations.
Prerequisites
The following items are required to follow this blog post.
Spin up an Ubuntu Server in Azure
Microsoft has a pre-built VM in Azure that you can spin up with a couple of clicks. In Azure search for Ubuntu and select the Ubuntu Server 18.04 LTS. It should look similar to the screen clip below.
Fill in the instance details . I used a D2s v3 for my VM size.
For the Administrator Account section, you must use SSH public key.
To generate the SSH public key I used PuTTYgen. It comes with PuTTY. Move your mouse around to generate the key. Copy (below) and paste (above) the text that gets generated by the tool into Azure.
Once you have generated the key be sure to click Save private key because you will need this to connect to the VM and install MongoDB later on.
For the inbound port rules be sure to allow SSH (port 22) to connect to the VM.
For the Disks I just used Standard HDD.
I did not create any extensions.
Feel free to add tags at your own discretion, they are handy if you have a lot of resources to manage. Click create to spin up the VM. Next we will connect to the VM using PuTTY and install MongoDB.
Connect
We will use PuTTY to connect to the VM. Launch PuTTY and fill out the Host Name (or IP address) with the Public IP Address from Azure.
Next expand SSH and select Auth. Pick the file that contains the private key file which you saved when generating the RSA key using PuTTYgen.
Click open and enter the user ID that you created when filling out the Administrator Account section of the VM creation screen.
Install MongoDB
Once connected run the following commands.
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 2930ADAE8CAF5059EE73BB4B58712A2291FA4AD5 echo "deb [ arch=amd64,arm64 ] https://repo.mongodb.org/apt/ubuntu xenial/mongodb-org/3.6 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-3.6.list sudo apt-get update sudo apt-get install -y mongodb-org
This will install MongoDB and apply the latest updates to it. Next fire up the MongoDB service by running the following command.
sudo service mongod start
If everything went smoothly you should be able to open the MongoDB terminal by running the following command.
Mongo
This should give you a screen similar to the one below.
Create some data
With the database open enter the following command to generate some data into a collection called testData.
for (var i = 1; i <= 150; i++) { db.testData.insert( { x : i } ) }
Again, for the sake of conciseness I will not bother securing the database by creating user ids etc. I do recommend properly securing the database for any typical workload other than for demonstration purposes.
Type exit to leave Mongo DB. Next we will need to enable remote access to the MongoDB.
Enable remote connectivity to MongoDB
WARNING this is not recommended for a production environment and is only done for the sake of brevity in this blog post. To properly enable remote connectivity refer to the following security checklist from MongoDB.
MongoDB’s default IP binding prevents it from connecting outside of the local host. In order to change this, we need to modify the mongod.conf file.
Run the following command
Sudo nano /etc/mongod.conf
This will launch Nano which allow you to modify the configuration file as follows.
As you can see in the image above we need to add a new bindIp: 0.0.0.0 and commend out or delete the existing one. Be sure to exit and save the changes to the config file. It is also a good idea to restart MongoDB by running the following command.
sudo service mongod restart
Exit the connection to the Ubuntu server. We now need to add a port to the VM to allow for outside connections to it. To do this we need to go back to the VM in Azure.
Add inbound security rule
From the Azure portal navigate to the VM and click on Networking.
Select Add inbound port rule and fill out the screen as follows.
Once this is done, we can now test connecting to the MongoDB database using Studio 3T or any other tool.
Studio 3T
To connect to the MongoDB database use the public IP that was specified in Azure and also fill out the SSH Tunnel information as follows.
Notice that you need to use the same private key as before. Test the connection and you should see successfully results.
In the next post I will cover setting up SQL Server 2019.
Hopefully you have found this to be another practical post.
Until next time.
Anthony
Further References
https://docs.microsoft.com/en-us/azure/cosmos-db/connect-mongodb-account
https://docs.microsoft.com/en-us/azure/virtual-machines/windows/install-mongodb
http://timmyreilly.azurewebsites.net/running-mongo-on-ubuntu-virtual-machine-in-azure/
2 thoughts on “Extend your information reach without over stretching by virtualizing data using SQL Server 2019 and MongoDB – Part 1 MongoDB”