Working with server dataset XML files

This is considered an "advanced" skill for working with server datasets. If you are unfamiliar with server datasets, check out our support article on using data from another form, section 1. Publishing data to the server dataset.

Most of the time, when you would like to change the behavior of a server dataset, you will Edit it from the Design tab of the server console. However, there are some properties that cannot be changed from the Design tab. Whenever you would like to change those properties, you will have to delete the current server dataset, and create a new one. You can re-create the dataset from scratch, but you can also edit the previous dataset definition XML file.

What is a server dataset definition?

A server dataset definition describes the properties and behavior of a server dataset, including the dataset ID, the column headers, form links, dataset publishing, and more. However, it does not store dataset data. You can modify a server dataset by downloading the dataset definition and data, deleting the dataset from the server, editing the dataset definition, and then re-uploading the dataset definition and data (we will discuss this in detail later).

These are dataset changes where your only option is to edit the server dataset using its XML file:

  • Adding a series of dataset publishing mappings in bulk
  • Changing the dataset ID (discussed in Scenario 1 below)
  • Changing the order of the columns
  • Removing a column that is not needed
  • If a colleague sends you a dataset definition that contains references to form definitions that are not deployed on your server, you can modify the XML file to remove those references so it can be successfully uploaded (discussed in Scenario 2 below)

If you have already started collecting data using a form that uses that server dataset (such as when using data from another form), it is inadvisable to make manual modifications to the server dataset. If you are careful, it is possible to do so safely, but it can still disrupt data collection workflows (e.g. a form will try to download dataset data, but the server dataset is not there). If you can tolerate the server dataset without those modifications, consider leaving it be, instead of making manual changes to the XML dataset definition.

XML file structure

This is a quick overview of the XML features you will need in SurveyCTO, but there are lots of other resources you can use to learn about XML files more thoroughly.

The server dataset definition is an XML file. Take a look at this basic XML file for a server dataset (you can also download it here):

xml-1.png

It is usually easier to view XML files in a code editor, such as Notepad++ for Windows or BBEdit for Mac. It will also make it easier to identify the row numbers, which are frequently referenced in this article. But, you can also use something more basic, such as Notepad for Windows, or TextEdit for Mac. (If you are comfortable with more advanced tools, you can use Visual Studio Code.)

Tags

One of the most important parts of XML is the tags. They define when an element starts and stops. Tag names are entered between angle brackets < and >. For example, in the sample above, the tag <dataset> is on row 2.

Elements

XML elements contain the data. Elements start with a starting tag, and end with an ending tag, and they include everything in-between. The main visual difference between a starting tag and an ending tag is the ending tag starts with a slash / after the first angle bracket. For example, the </dataset> tag on row 14 is the ending tag of the <dataset> element that starts on row 2. That starting tag, that ending tag, and everything in-between is part of that <dataset> element.

You can also have elements within elements. For example, within the <dataset> element, which spans rows 2-14, is another element called <definition>, which spans rows 3-10. You'll notice that it ends on row 10, since the </definition> closing tag with the slash near the beginning is used there.

Elements can also be empty, meaning they have no content. These are defined by having a slash at the end of the tag, right before the angle bracket, instead of near the beginning. For example, the <formLinks/> tag on row 8 ends with a slash since it is an empty tag, since no form links have been defined for this server dataset.

Element text

Elements can also contain text. For example, the <id> element on row 4 contains the text "example_server_dataset", since that is the server dataset's ID.
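If you prefer to inspect a definition with a script instead of by eye, the short Python sketch below shows how tags, nested elements, empty elements, and element text appear to an XML parser. The XML in it is a simplified stand-in for the sample above, not the downloadable file: the title text is made up, and your real file will contain more elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the sample dataset definition.
SAMPLE = """<dataset>
  <definition>
    <id>example_server_dataset</id>
    <title>Example server dataset</title>
    <fieldNames>id_key,name,age,gender,marital</fieldNames>
    <formLinks/>
  </definition>
</dataset>"""

root = ET.fromstring(SAMPLE)           # the outer <dataset> element
definition = root.find("definition")   # an element within an element

# Tag names of everything directly inside <definition>.
child_tags = [child.tag for child in definition]

# Element text: the content between <id> and </id>.
dataset_id = definition.find("id").text

# An empty element has no child elements and no text.
form_links = definition.find("formLinks")
form_links_is_empty = len(form_links) == 0 and form_links.text is None

print(root.tag, child_tags, dataset_id, form_links_is_empty)
```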

Dataset definition XML files

This section references the server dataset used in the case management example article, specifically the dataset definition XML file, which you can download by clicking here.

The server dataset is defined in the XML file using elements. If you change the elements, you will change how the server dataset looks and behaves.

Basics

id: The unique server dataset ID. Used in pulldata() and search() expressions when pre-loading data to identify the dataset.

title: The user-friendly server dataset title.

datasetType: The dataset type. This should never be changed.

fieldNames: The comma-separated list of column headers. If you would like to change the order of the columns, or remove a column, you can do so here. For example, if you were to upload the example server dataset shown above under XML file structure to your server, the column headers would be in this order: id_key,name,age,gender,marital. However, if you would like the gender to be listed before the age, you can change it to id_key,name,gender,age,marital.
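When a dataset has many columns, retyping the list by hand invites typos. As a rough sketch, the Python helper below builds the new fieldNames text and refuses to continue if the new order accidentally adds or drops a column. The function name is made up for this illustration; it only produces the text you would paste between the <fieldNames> tags.

```python
def reorder_field_names(field_names: str, new_order: list[str]) -> str:
    """Return the comma-separated list in new_order, checking nothing was lost."""
    current = field_names.split(",")
    if sorted(current) != sorted(new_order):
        raise ValueError("new_order must contain exactly the existing columns")
    return ",".join(new_order)


reordered = reorder_field_names(
    "id_key,name,age,gender,marital",
    ["id_key", "name", "gender", "age", "marital"],
)
print(reordered)
```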

Form links

The XML file also defines which forms are attached to the server dataset, defined in the <formLinks> element.

Each form that is attached to the server dataset (i.e. each form that pre-loads data from the server dataset) is defined within the <formLinks> element. For each attached form, there is a <formLink> element (different than the <formLinks> tag with an "s"), and within those elements is a <formId> element. Within that <formId> element is the form ID of the form attached to the server dataset.

xml-2.png

For example, in the example dataset definition XML file, if you take a look at rows 10, 13, and 16, you can see that the form definitions with the form IDs "complaint_followup", "school_form", and "water_usage" have this server dataset attached to them for pre-loading.
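To list the attached forms without counting rows, you could read the <formLinks> element with a script. The Python sketch below uses a fragment that mirrors the structure described above; it is typed out for illustration and is not a copy of the downloadable file.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment that follows the <formLinks> structure described above.
FRAGMENT = """<formLinks>
  <formLink>
    <formId>complaint_followup</formId>
  </formLink>
  <formLink>
    <formId>school_form</formId>
  </formLink>
  <formLink>
    <formId>water_usage</formId>
  </formLink>
</formLinks>"""

form_links = ET.fromstring(FRAGMENT)

# One <formLink> per attached form, each holding a single <formId>.
attached_form_ids = [link.findtext("formId") for link in form_links.findall("formLink")]
print(attached_form_ids)
```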

Data links

Dataset publishing is defined in the <dataLinks> element. Both publishing to the server dataset (such as publishing from a form) and publishing from the server dataset (such as to an external Google Sheet) are defined here.

xml-3.png

For each form that publishes to or from the server dataset, within the <dataLinks> element, there will be a <dataLink>. Within that element are a variety of other elements used to define the publishing:

dataLinkClass: This is what type of file is being connected to the server dataset for publishing. For example, the value will be "FORM" if it is publishing data from a form, "SPREADSHEET" if it is publishing to a Google Sheet, and so on.

dataLinkType: Whether data is being published to the server dataset (INCOMING) or being published from the server dataset to another location (OUTGOING).

linkObjectId: ID of the file or form that is being published to or from. If it is a form, then it will be the form ID; if it is a Google Sheet, it will be the Google Drive file ID.

fieldMap: Which fields or columns should publish to which columns. This is in JSON format, so it starts and ends with a curly bracket, each field mapping is separated by a comma, and which fields publish where is indicated with a colon. For example, take a look at this field mapping from row 32 of the example, where the form with the ID "complaint_followup" publishes into the server dataset (spaces have been added so it is easier to read):

{"caseid":"id", "new_visit_num":"visit_num", "new_severity":"severity", "new_severity_en":"severity_enumerator"}

Here, the "caseid" field of the form publishes to the 'id' column, the "new_visit_num" field publishes to the 'visit_num' column, and so on.
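Because the fieldMap is JSON, any JSON parser can turn it into a readable list of mappings. A minimal Python sketch, using the field mapping quoted above (the variable names are only for this illustration):

```python
import json

# The field mapping quoted above, as it would appear as one line of text.
FIELD_MAP_TEXT = (
    '{"caseid":"id","new_visit_num":"visit_num",'
    '"new_severity":"severity","new_severity_en":"severity_enumerator"}'
)

field_map = json.loads(FIELD_MAP_TEXT)

# Each key is a form field; each value is the dataset column it publishes to.
for form_field, column in field_map.items():
    print(f"form field {form_field!r} -> dataset column {column!r}")
```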

joiningField: The field used for identifying rows. When setting up dataset publishing on the server, this is the Form field to identify unique records. For example, on row 33 of the example, the <joiningField> is defined as "caseid". So, based on the field mapping, when a form is submitted, if the value of the field "caseid" matches a value in the 'id' column of the server dataset, then the new data will be published to that row. If the value of the field "caseid" does not exist in the 'id' column, then the data will be added as a new row. This needs to be a field defined in the fieldMap.

relevanceField: The field used to determine when the data should be published. If that field has a value of 1, then it will trigger publishing; otherwise, the data will not be published. When setting up dataset publishing on the server, this is the field specified under Include form submissions whenever this field is 1. This does not have to be a field defined in the fieldMap.
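To make the roles of fieldMap, joiningField, and relevanceField concrete, here is a small Python simulation of the behavior described above. It is only a conceptual sketch, not how the server is implemented: the dataset is modeled as a list of dictionaries, and the relevance field name "publish_ok" is a made-up example.

```python
def publish(rows, submission, field_map, joining_field, relevance_field=None):
    """Simulate publishing one form submission into dataset rows (list of dicts)."""
    # relevanceField: publish only when that field has a value of 1.
    if relevance_field is not None and str(submission.get(relevance_field)) != "1":
        return rows

    # joiningField: the form field whose mapped column identifies the row.
    key_column = field_map[joining_field]
    key_value = submission[joining_field]

    # fieldMap: translate form fields into dataset columns.
    mapped = {col: submission[f] for f, col in field_map.items() if f in submission}

    for row in rows:
        if row.get(key_column) == key_value:
            row.update(mapped)   # matching row found: update it
            return rows
    rows.append(mapped)          # no match: add a new row
    return rows


field_map = {"caseid": "id", "new_visit_num": "visit_num"}
rows = [{"id": "c1", "visit_num": "1"}]

publish(rows, {"caseid": "c1", "new_visit_num": "2", "publish_ok": "1"},
        field_map, "caseid", "publish_ok")   # updates the existing row
publish(rows, {"caseid": "c2", "new_visit_num": "1", "publish_ok": "1"},
        field_map, "caseid", "publish_ok")   # adds a new row
publish(rows, {"caseid": "c3", "new_visit_num": "1", "publish_ok": "0"},
        field_map, "caseid", "publish_ok")   # skipped: relevance field is not 1
print(rows)
```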

You cannot upload a server dataset XML file to the server while there are form links or data links to forms that are not deployed. So, once you know how to identify form and data links, you can delete them from the XML file before you upload the file. For an example, see scenario 2 below.
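Before uploading a definition you received from someone else, it can help to list every form it refers to and compare that against the forms deployed on your server. The Python sketch below does this for a hypothetical fragment; the function names, the Google Drive file ID, and the set of deployed form IDs are placeholders you would replace with your own.

```python
import xml.etree.ElementTree as ET


def referenced_form_ids(definition_xml: str) -> set[str]:
    """Collect form IDs from form links and from data links of class FORM."""
    root = ET.fromstring(definition_xml)
    ids = {el.text for el in root.iter("formId") if el.text}
    for link in root.iter("dataLink"):
        if link.findtext("dataLinkClass") == "FORM":
            object_id = link.findtext("linkObjectId")
            if object_id:
                ids.add(object_id)
    return ids


def missing_forms(definition_xml: str, deployed: set[str]) -> set[str]:
    """Form IDs the definition refers to that are not deployed."""
    return referenced_form_ids(definition_xml) - deployed


# Hypothetical fragment for illustration only.
EXAMPLE = """<definition>
  <formLinks>
    <formLink><formId>complaint_followup</formId></formLink>
    <formLink><formId>school_form</formId></formLink>
  </formLinks>
  <dataLinks>
    <dataLink>
      <dataLinkClass>FORM</dataLinkClass>
      <dataLinkType>INCOMING</dataLinkType>
      <linkObjectId>complaint_followup</linkObjectId>
    </dataLink>
    <dataLink>
      <dataLinkClass>SPREADSHEET</dataLinkClass>
      <dataLinkType>OUTGOING</dataLinkType>
      <linkObjectId>hypothetical-drive-file-id</linkObjectId>
    </dataLink>
  </dataLinks>
</definition>"""

not_deployed = missing_forms(EXAMPLE, deployed={"complaint_followup"})
print(not_deployed)
```

Any form ID the sketch reports is a link you would either remove from the XML file or resolve by deploying a form with that ID.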

Case management options

These are defined in the <caseManagementOptions> tag. To learn more about case management, check out our guide to case management, especially part 3.

displayMode: How the case management menu will be displayed to data collectors, either as a "tree" or a "table".

showFinalizedSentWhenTree: If this value is "true" and the displayMode is "tree", then forms that are marked as finalized or sent will be listed in green. If this value is "false", then they will not be listed at all. This does nothing if the displayMode is "table".

showColumnsWhenTable: When the displayMode is "table", this is the list of columns that will appear in the case management menu. Each column name needs to be in its own <columnNames> element, so between <columnNames> and </columnNames> tags.

Changing the XML file

Generally speaking, it is a lot easier to change the server dataset from the server console's Design tab. But, you can also change the behavior of the server dataset by changing the XML file. To change a dataset, first download the dataset definition XML and data, delete the dataset from the server, make the needed changes to the XML file, then upload the XML file to the server with the CSV file. Here is a breakdown of those steps:

  1. On the Design tab of the server console, scroll to the server dataset you would like to change.
  2. For that server dataset, click Download, then click both Download data and Dataset definition to download the dataset data and definition. Make sure both the CSV dataset data and XML dataset definition have been downloaded successfully.
  3. Once you have confirmed you have both the dataset definition and data, delete the dataset from the server.
  4. Open the XML file in a text editor.
  5. Change the XML file as needed.
  6. Save the file.
  7. On the server console's Design tab, click a + (plus) on the left, then Add server dataset.
  8. Click the New dataset from definition tab.
  9. Under Please choose a definition file to upload, click Select file, and select and open the dataset definition XML file.
  10. Click Upload a dataset definition.
  11. At the popup that says "Dataset created successfully", click OK.
  12. When the dataset definition has been successfully uploaded, click its Upload button.
  13. Under File with new dataset contents (.csv file), click Select file.
  14. Select and open the CSV dataset data file.
  15. Leave Append selected, and click Upload.
  16. At the popup that says "Dataset modified successfully", click OK.
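Between editing the XML file and uploading it, a quick automated check can catch mistakes such as an unclosed tag or a column that no longer matches the data. The Python sketch below is one possible pre-upload check. It assumes that the header row of the downloaded CSV file uses the same column names as <fieldNames>; that assumption is not confirmed by this article, so treat any reported difference as a prompt to look closer, not as a definite error.

```python
import csv
import io
import xml.etree.ElementTree as ET


def check_before_upload(definition_xml: str, csv_text: str) -> list[str]:
    """Return a list of possible problems; an empty list means none were found."""
    # ET.fromstring raises xml.etree.ElementTree.ParseError if the XML is not well-formed.
    root = ET.fromstring(definition_xml)
    field_names = root.findtext(".//fieldNames", default="").split(",")
    header = next(csv.reader(io.StringIO(csv_text)), [])

    problems = []
    for column in header:
        if column not in field_names:
            problems.append(f"CSV column not in fieldNames: {column}")
    for column in field_names:
        if column not in header:
            problems.append(f"fieldNames column not in CSV: {column}")
    return problems


# Hypothetical definition and data for illustration.
DEFINITION = """<dataset><definition>
  <id>example_server_dataset</id>
  <fieldNames>id_key,name,gender,age,marital</fieldNames>
</definition></dataset>"""
DATA = "id_key,name,age,gender,marital\n1,Ana,34,f,married\n"

problems = check_before_upload(DEFINITION, DATA)
print(problems or "No problems found")
```

In this example only the column order differs between the definition and the data, so no problems are reported.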

To learn more about downloading and uploading server datasets, check out our support article on working between servers.

Make sure you do not delete the dataset definition while data collection is still active. Wait for a "rest" period where enumerators are no longer submitting data to the server, since otherwise, there will be no server dataset to publish to. You can also inform all enumerators to stop submitting data (or stop data collection overall) during specified times so you can use that time to delete the old server dataset, and upload the new one.

Exercises

Here, we will walk you through some exercises so you can practice what you have learned. These are both real scenarios you may face with a server dataset.

Scenario 1: Change server dataset ID

You have a server dataset with the dataset ID 'example_server_dataset', but you would like to change it to 'respondent_data'. That way, when you reference it in a form definition's pulldata() or search() expression, it will be easier to understand what it is referring to. Click here to download the dataset definition.

  1. Open the server dataset XML file in a text editor.
  2. Select the content between the <id> tags.
    xml-4.png
  3. Change this to 'respondent_data'.
    xml-5.png
  4. Save the file.
  5. Upload the dataset definition to your server.
  6. Make sure the dataset ID of the dataset you just uploaded is now 'respondent_data'.
    xml-6.png

Tip: While this is the only way you can change the dataset ID, you can easily change the dataset title using the Rename button of the server dataset on the Design tab.
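The same change can be scripted, which may be useful if you need to rename several datasets. The Python sketch below assumes the <id> element sits inside <definition> inside <dataset>, as in the sample file; the function name is made up. Note that a parser may write the file back with slightly different formatting than the original, so editing by hand in a text editor is equally valid.

```python
import xml.etree.ElementTree as ET


def change_dataset_id(definition_xml: str, new_id: str) -> str:
    """Return the definition XML with the text of <id> replaced by new_id."""
    root = ET.fromstring(definition_xml)       # the <dataset> element
    id_element = root.find("definition/id")    # assumed location of <id>
    if id_element is None:
        raise ValueError("no <id> element found under <definition>")
    id_element.text = new_id
    return ET.tostring(root, encoding="unicode")


# Hypothetical, simplified definition for illustration.
ORIGINAL = """<dataset>
  <definition>
    <id>example_server_dataset</id>
    <fieldNames>id_key,name,age,gender,marital</fieldNames>
  </definition>
</dataset>"""

updated = change_dataset_id(ORIGINAL, "respondent_data")
print(updated)
```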

Scenario 2: Remove an attached form

A colleague has sent you a dataset definition that will be used in a future survey (to download the file, open the link, and click the download symbol in the upper-right). However, that dataset definition contains a <formLink> to a form definition that is not deployed on your own server. You cannot upload the dataset definition to your server until that form link has been removed (or until you have deployed a form definition with that form ID to your server). Download the dataset definition here.

  1. Open the server dataset XML file in a text editor.
  2. Scroll to the <formLinks> element on rows 8-12.
    xml-7.png
  3. Select the <formLink> element from rows 9-11.
    xml-8.png
  4. Delete that section.
    xml-9.png
  5. Save the file, and upload it to your server.

The empty element <formLinks></formLinks> is considered to be the exact same as <formLinks/>. If you'd like, you can change the empty <formLinks> tags to that more simplified empty tag, but it is not required.
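For definitions with many form links, the removal can also be scripted. The Python sketch below removes the <formLink> elements for a given set of form IDs and shows that a parser treats both spellings of the empty element the same way. The form ID "old_form" and the function name are hypothetical; substitute the form ID found in your own file.

```python
import xml.etree.ElementTree as ET


def remove_form_links(definition_xml: str, form_ids_to_remove: set[str]) -> str:
    """Return the definition XML without <formLink> elements for the given form IDs."""
    root = ET.fromstring(definition_xml)
    for form_links in list(root.iter("formLinks")):
        for link in list(form_links.findall("formLink")):
            if link.findtext("formId") in form_ids_to_remove:
                form_links.remove(link)
        if len(form_links) == 0:
            form_links.text = None  # drop leftover whitespace so the element is empty
    return ET.tostring(root, encoding="unicode")


# Hypothetical definition with one link to a form that is not deployed.
ORIGINAL = """<dataset>
  <definition>
    <id>example_server_dataset</id>
    <formLinks>
      <formLink>
        <formId>old_form</formId>
      </formLink>
    </formLinks>
  </definition>
</dataset>"""

cleaned = remove_form_links(ORIGINAL, {"old_form"})
remaining_links = ET.fromstring(cleaned).find("definition/formLinks")

# Both spellings of an empty element parse to the same thing.
long_form = ET.fromstring("<formLinks></formLinks>")
short_form = ET.fromstring("<formLinks/>")
same_when_parsed = (len(long_form), long_form.text) == (len(short_form), short_form.text)
print(cleaned)
```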

