Integration with Power BI, part 5: Using Microsoft Azure

In this final part of this guide, we’ll guide you on the steps needed to achieve an integration between your SurveyCTO server and Power BI via Microsoft Azure. Please read the introduction before reading this part.

Microsoft Azure is a cloud-based database solution. As it is part of the Microsoft product ecosystem, it is amongst the data sources you can use to provide Power BI with data. As Azure supports digest authentication, it can use SurveyCTO's API to return data from your SurveyCTO server, acting as a bridge into Power BI.

Consider Azure as an option if your organization has long term data collection projects, and many data consumers who need to independently view up to date dashboards and reports about the status of data collection. Azure facilitates this because the database you will setup will automatically update itself, allowing anyone viewing dashboards and reports in Power BI to click on Refresh, to view the very latest data. Azure will also effectively facilitate long term storage of very large datasets over time, larger than Google Sheets or Excel supports, even larger than is advisable for a SurveyCTO form (which is why we recommend auto-purging for any long term integration scenario).

This guide focuses on creating an Azure Data Factory to push data into Azure Blob Storage, which will in turn connect with PowerBI.

There are other options too, including Azure Data Lake Storage or SQL Database, for example. Feel free to explore all your options by reading Microsoft Azure help topics.

The steps below are specific to the SurveyCTO case. By setting up this workflow, you will end up with an Azure subscription, an Azure Resource Group, an Azure Storage Account (including a blob container) and an Azure Data Factory with 1) two Linked Services; 2) two Datasets and 3) one Pipeline.

The workflow will be as follows: in the Azure Data factory, we will create a Pipeline to Copy Data from a Dataset that’s fed by SurveyCTO API requests (Source) to a Dataset stored in a Blob Container (Sink). In Power BI, you’ll then create a Query by Getting Data via Azure Blob Storage. Briefly, here are the steps:

  1. Create an Azure Storage Account
  2. Create an Azure Data Factory
    1. Create 2 Linked Services: 1) SurveyCTO API; 2) Blob Storage
    2. Create 2 Datasets
    3. Create a Pipeline to Copy Data from Dataset 1 to Dataset 2
  3. Create a Query on PowerBI

Note that if you do not have an Azure Subscription, you’ll need to create one before following all these steps. Sign up here.

1. Create an Azure Storage Account

If you already have an Azure Storage Account and it’s the one you want to use to store SurveyCTO data, please skip this step.

  1. In your Azure portal, select All services > Storage > Storage accounts.
  2. In the Storage accounts page, create a new Storage Account.
    1. Click + Add.
    2. Under the Resource group field, Create new or choose an already existing one.
    3. Give your Storage Account a name and click Review + create to keep the default values (it may fail the first time, but try again in a minute). Otherwise, feel free to adjust to your use case, and select Next : Advanced to add other settings.

Note that this guide will follow the steps using a Storage Account with Account kind StorageV2 (general purpose v2) and location East US, West US.

2. Create an Azure Data Factory

If you already have an Azure Data Factory and it’s the one you want to use with the SurveyCTO API, please skip this step. The general steps for creating an Azure Data Factory can be found in this Microsoft documentation. If you prefer doing this using other tools (e.g. Azure PowerShell, Python, etc.), click on the documentation link and change the Quickstart accordingly. Note that as of writing this, the Data Factory UI is supported only in Microsoft Edge and Google Chrome web browsers.

  1. In your Azure Portal, select + Create a resource on the left menu, select Analytics, and then select Data Factory.
01.png newDataFactory1.png
  1. On the new data factory page, enter a Name. The name of the Azure Data Factory must be globally unique. If you see an error, change the name of the data factory and try creating it again.
  2. For Subscription, select the Azure subscription in which you want to create the Data Factory.
  3. Under Resource Group, select Use existing, and select the resource group where your Storage Account was created (the one you created in 1.2).
  4. For Version, select V2.
  5. Choose the location.
  6. If it is checked, uncheck Enable GIT.
  7. Select the blue Create button below.
  8. After the creation is complete, you should see the Data Factory page. If you don’t, select All services > Analytics > Data factories, and click on your recent Data Factory.
  9. Click on the Author & Monitor tile to start the Azure Data Factory user interface (UI) application on a separate tab.

monitordatafactory.png

  1. On the Let's get started page, switch to the Author tab (pencil icon) in the left panel.

pencil.png

I. Create Linked Services

To summarize, this is the data flow this guide is suggesting:

drawing_chart.png

We’re still on the Azure side, so we’ll drop Power BI for a while and focus on the first linkage. Briefly, it’s as if you had two different datasets (the SurveyCTO API and Azure the Blob Container), and you want to copy data from one to another. In an Azure Data Factory, before creating these datasets, you must first create linked services to define the connection information needed for the Data Factory to connect to external resources.

Create two linked services and their two respective datasets.

SurveyCTO API Connection

  1. In the Author Tab (pencil icon) of your Data Factory, click on Connections, and then select the New button on the toolbar.

connection2.png

  1. On the New Linked Service page that appears, switch to the ‘File’ tab and select ‘HTTP’, and then Continue, and a configuration window will open:
    1. Even though you can enter the full URL link here, you’ll have the option to insert the Relative URL later. This way, you’ll be able to use this Linked Service to feed several Datasets, for maximum flexibility.
    2. Give this Linked Service a name of your choice.
    3. For the Base URL, enter https://servername.surveycto.com as per the SurveyCTO API documentation (be sure to replace servername with the name of your server).
    4. Choose Digest Authentication, and enter your SurveyCTO username and password. This must be a user with the permission, Can download data. The Data manager user role would do (learn more here: Managing user roles).
    5. Click on Test connection at the bottom and, if successful, click Finish. Note that the integration runtime will be filled out automatically.
06.png 07.png

Azure Blob Storage Connection

  1. While still in the Data Factory resource, click to add + New Linked Service.
  2. On the New Linked Service page, select Azure Blob Storage, and then select Continue.
  3. Give your Linked Service a name and test the connection by completing the following steps:
    1. For Name, enter “AzureStorageLinkedService” (without quotes).
    2. For Storage account name, select the name of your Azure storage account.
    3. Select Test connection to confirm that the Data Factory service can connect to the storage account.
    4. Select Finish to save the linked service.
BlobStorage.png newlinkedservice.png

II. Create Datasets

In this procedure, you will create two datasets: a.) the input dataset and b.) the output dataset. These datasets are JSON datasets, and they refer to the linked services that you created in the previous section.

The input dataset represents the data pulled by the API, which will copy data into the output dataset stored in the blob container.

As you create this connection, there is no “Save” button. You will create each part (each part will be in its own tab), and then publish them all at once.

Input Dataset

  1. While still in the Azure Data Factory Author page, select the + (plus) button on the top left (right below Factory Resources), and then select Dataset.

10.png

  1. On the New Dataset page, select HTTP, and then select Continue.
  2. Choose the JSON format, and select Continue.
  3. Go to the Connection tab closer to the bottom (it will be between the General and Schema tabs)
  4. For Linked service, choose the previous HTTP Linked service configured.
  5. Add the Relative URL according to the SurveyCTO API documentation. In my case, I will insert /api/v2/forms/data/wide/json/test?date=0 to pull all data from the form with the ID, "test", in JSON format.

ConnectionHttp.png

Alternatively, instead of using date=0, set a date early enough to retrieve all data available. For example, if you started your data collection in August 1, 2019, then any earlier date will fetch all observations. This is the recommended approach as it will avoid any issues/errors caused by rate limiting. You can use any online converter to convert a date into Unix epoch time.

 

  1. Scroll down. For File pattern, select Array of objects.
  2. Select Preview data to check whether data is being pulled correctly. It should preview the submissions for that form. If it says “No Data”, confirm that there are no submissions for that form yet before continuing (although, we advise that you make some test form submissions so that there are some records present). Click on X to get out of the Data Preview.

httpdatasetconnection2.PNG

Output Dataset

  1. Again, select the + (plus) button right below Factory Resources, and then again select Dataset.
  2. On the New Dataset page, select Azure Blob Storage, and then select Continue. Then, choose the JSON format, and select Continue again.
13.png 14..png
  1. Go to the Connection tab between General and Schema
  2. For Linked service, choose the previous Azure Blob Storage Linked service configured
  3. In the File path, insert the folder name and file name you want for your form data. The file name needs to have the correct extension - in this case, .json. Both can only contain lowercase letters, numbers, and hyphens, and begin with a letter or number..
  4. In the File Pattern, choose Array of objects.

III. Create a Pipeline

  1. Select the + (plus) button below Factory Resources, and then select Pipeline.

15.png

  1. In the Activities Menu, open Move & Transform.
  2. Drag Copy Data to the whitespace on the right.
If you cannot access the tabs mentioned in the upcoming steps, you probably accidentally clicked away. To access them again, simply repeat the above step with the Copy Data module again.

copydata.png

  1. You should already be on the General tab. You must give this pipeline a Name; you can leave it as the default.
  2. Now let’s start configuring our connection between the input dataset and the output dataset. In the Source tab (closer to the bottom, between General and Sink), select our input dataset (HttpFile1).
  3. Go to the Sink tab, and select our output dataset (AzureBlob1)
  4. Near the top, click Validate. Ensure there are no errors, and fix any that appear.
  5. Select Debug. Resolve any issues that arise.
  6. On the top-right, select Publish All.Doing so will trigger the connection. Now, you should already have data in your Blob Storage.
  7. Go back to the Azure Portal (it will probably be in a different browser tab, but keep the Data Factory tab open, since you will need it later).
  8. Go to your Storage Account, and access your Blob container by clicking on Blobs under Blob Service or in the Storage Account homepage:

blobs.png

  1. Inside your Blob Storage, you may already see the blob container (form-data) and the JSON file (named "test" in the screenshots):

18.png

19.png

  1. In order to refresh this file with the latest collected data sent to the Server, you need to add a Trigger. Go back to the Pipeline Menu in the other tab, and click on Add trigger > New/Edit.

newtrigger.png

  1. A window will open for you to configure the trigger. Select the Choose trigger dropdown, and select New.
  2. Fill out the Trigger, including how frequently you want to pull data via the API. Note that if you are requesting all submissions (date=0), your trigger should have at least 5 minute intervals.
  3. Once saved and finished, click on Publish All to publish the trigger configuration. If you click on the Monitor tab (below the pencil icon), you’ll be able to monitor the Pipeline Runs and Triggers Runs.

3. Create a Query in Power BI

After configuring everything in Azure, you just need to setup the connection on the Power BI side. Follow these steps:

  1. Open Power BI Desktop and select Get Data from the Introduction Window. Otherwise, go to the Home Tab and click on Get Data.
  2. Click on Azure, choose Azure Blob Storage, and select Connect.

getdatablob.png

  1. Then, it will open a window to insert your Account name or URL. Here, insert your Storage Account Name ("martacto" in the screenshot) and then insert the Account key:
    1. In the Azure portal, go back to All services > Storage > Storage accounts, and select your storage account.

accesskey.png

Source: Manage storage account settings in the Azure portal

    1. In the new side-panel that appears, select Access keys.
    2. Copy the values for the Storage account name and key1 boxes to the clipboard and paste them into the Account key box on PowerBI.
  1. A new Window will open showing all your Blob containers - check the one with your data (forma-data) and click Transform Data.

form-data.png

  1. Here, click on Binary just to double check your data.

binary0.png

  1. Click on To Table. A window will open, click Ok.

binary.png

ToTable.png

  1. Click on the arrows to the right of the column named “Column1”. Here, uncheck the option Use original column name as prefix - this will add unnecessary prefixes to your variable names. If you have an alert with List may be incomplete, click on Load more… and then Ok.

powerbi_loadmore.png

  1. Your dataset will finally appear. Click on Close and Apply, and wait for the changes to apply. Your data is now ready for any Power BI dashboard!

datasetblob.png

  1. From here, you’ll be able to create Reports and Dashboards using the dataset imported via API request. Whenever you want to refresh your tables and graphics so that they’re all illustrating the latest data available on the Server, click on Refresh in the menu bar:

refresh_powerbi.png

Clicking Refresh will run your script, which will trigger a new API request. Your dataset will be updated, as well as all associated graphics and tables. You can learn more about refreshing in these help topics from Microsoft:

You've reached the end of our guide on ways to integrate with Power BI. As in the introduction, these are just a few methods that work with SurveyCTO. Microsoft offers many paths to getting your data into Power BI.

0 Comments

Please sign in to leave a comment.