Working with server dataset XML files

This is considered an "advanced" skill for working with server datasets. If you are unfamiliar with server datasets, check out our Guide to dataset publishing.

Most of the time, when you would like to change the behavior of a server dataset, you will Edit it from the Design tab of the server console. However, some properties cannot be changed from the Design tab. To change those properties, you have to delete the current server dataset and create a new one. You can re-create the dataset from scratch, but you can also edit the previous dataset definition XML file and upload it again.

What is a server dataset definition?

A server dataset definition defines the dataset's properties and behavior, including the dataset ID, the column headers, form links, dataset publishing, and more. However, it does not store the dataset data. You can modify a server dataset by downloading the dataset definition and data, deleting the dataset from the server, editing the dataset definition, and then re-uploading the dataset definition and data (we will discuss this in detail later).

These are dataset changes where your only option is to edit the server dataset using its XML file:

Adding a series of dataset publishing mappings in bulk
Changing the dataset ID (discussed in Scenario 1 below)
Changing the order of the columns
Removing a column that is not needed
Changing the unique ID field
Removing references to form definitions that are not deployed on your server, such as when a colleague sends you a dataset definition that cannot otherwise be uploaded (discussed in Scenario 2 below)

If you have already started collecting data using a form that uses that server dataset (such as when using data from another form), it is inadvisable to make manual modifications to the server dataset. If you are careful, it is possible to do so safely, but it can still disrupt data collection workflows (e.g. a form will try to download dataset data, but the server dataset is not there). If you can tolerate the server dataset without those modifications, consider leaving it be, instead of making manual changes to the XML dataset definition.

XML file structure

This is a quick overview of the XML features you will need in SurveyCTO, but there are lots of other resources you can use to learn about XML files more thoroughly.

The server dataset definition is an XML file. Take a look at this basic XML file for a server dataset (you can also download it by clicking here):

It is usually easier to view XML files in a code editor, such as Notepad++ for Windows or BBEdit for Mac. A code editor also makes it easier to identify the row numbers, which are frequently referenced in this article. However, you can also use something more basic, such as Notepad for Windows or TextEdit for Mac. (If you are comfortable with more advanced tools, you can use Visual Studio Code.)

Elements

XML elements contain the data. An element starts with a starting tag, ends with an ending tag, and includes everything in between. The main visual difference between the two is that the ending tag has a slash / right after the first angle bracket. For example, the </dataset> tag on row 14 is the ending tag of the <dataset> element that starts on row 2. That starting tag, that ending tag, and everything in between are part of that <dataset> element.

You can also have elements within elements. For example, within the <dataset> element, which spans rows 2-14, is another element called <definition>, which spans rows 3-10. You'll notice that it ends on row 10, since the </definition> closing tag with the slash near the beginning is used there.

Elements can also be empty, meaning they have no content. An empty element is written as a single tag with a slash at the end, right before the closing angle bracket, instead of near the beginning. For example, the <formLinks/> tag on row 8 is an empty element, because no form links have been defined for this server dataset.

Element text

Elements can also contain text. For example, the <id> element on row 4 contains the text "example_server_dataset", since that is the server dataset's ID.
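The ideas above (nested elements, empty elements, and element text) can be sketched with Python's standard xml.etree.ElementTree module. The XML below is a simplified, hypothetical definition written for illustration; it is not the downloadable example file, so its row numbers and title will not match.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified definition; the real example file contains more elements.
SAMPLE_XML = """<dataset>
  <definition>
    <id>example_server_dataset</id>
    <title>Example server dataset</title>
    <fieldNames>id_key,name,age,gender,marital</fieldNames>
    <formLinks/>
  </definition>
</dataset>"""

root = ET.fromstring(SAMPLE_XML)
definition = root.find("definition")        # an element nested inside <dataset>
dataset_id = definition.find("id").text     # the text between <id> and </id>
form_links = definition.find("formLinks")   # an empty element

print(root.tag)         # dataset
print(dataset_id)       # example_server_dataset
print(len(form_links))  # 0 (no child elements)
print(form_links.text)  # None (no text either)
```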

Dataset definition XML files

This section references the server dataset used in the case management example article, specifically the dataset definition XML file, which you can download by clicking here.

The server dataset is defined in the XML file using elements. If you change the elements, you will change how the server dataset looks and behaves.

Basics

id: The unique server dataset ID. Used in pulldata() and search() expressions when pre-loading data to identify the dataset.

title: The user-friendly server dataset title.

datasetType: The dataset type. This should always have a value of "SERVER" (even for cases and enumerator datasets).

fieldNames: The comma-separated list of column headers. If you would like to change the order of the columns, or remove a column, you can do so here. For example, if you were to upload the example server dataset shown above under XML file structure to your server, the column headers would be in this order: id_key,name,age,gender,marital. However, if you would like the gender to be listed before the age, you can change it to id_key,name,gender,age,marital.
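As a rough sketch, the new fieldNames value can be built in Python before pasting it into the XML file. The helper name rewrite_field_names is hypothetical; it only guards against typos by rejecting columns that were not in the original list.

```python
def rewrite_field_names(field_names: str, new_order: list[str]) -> str:
    """Return a comma-separated header list that uses only existing columns."""
    existing = field_names.split(",")
    unknown = [name for name in new_order if name not in existing]
    if unknown:
        raise ValueError(f"Unknown columns: {unknown}")
    if len(set(new_order)) != len(new_order):
        raise ValueError("Each column may appear only once")
    return ",".join(new_order)


original = "id_key,name,age,gender,marital"
reordered = rewrite_field_names(
    original, ["id_key", "name", "gender", "age", "marital"]
)
print(reordered)  # id_key,name,gender,age,marital
```

Leaving a column out of new_order removes it from the list. If you do that, keep the column named in uniqueRecordField, since it must be one of the headers in fieldNames.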

Form links

The XML file also defines which forms are attached to the server dataset, defined in the <formLinks> element.

Each form that is attached to the server dataset (i.e. each form that pre-loads data from the server dataset) is defined within the <formLinks> element. For each attached form, there is a <formLink> element (different from the <formLinks> element, which has an "s"), and within each of those is a <formId> element. That <formId> element contains the form ID of the form attached to the server dataset.

For example, in the example dataset definition XML file, if you take a look at rows 10, 13, and 16, you can see that the form definitions with the form IDs "complaint_followup", "school_form", and "water_usage" have this server dataset attached to them for pre-loading.
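To list the attached forms without counting rows by hand, a short script can read every <formId> inside <formLinks>. This is a sketch under the assumption that the nesting is exactly as described above; the XML string is a trimmed, hypothetical stand-in for the real file, and attached_form_ids is an illustrative name.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt that follows the formLinks > formLink > formId nesting.
SAMPLE_FORM_LINKS = """<dataset>
  <definition>
    <formLinks>
      <formLink><formId>complaint_followup</formId></formLink>
      <formLink><formId>school_form</formId></formLink>
      <formLink><formId>water_usage</formId></formLink>
    </formLinks>
  </definition>
</dataset>"""


def attached_form_ids(xml_text: str) -> list[str]:
    """Return the form IDs found in formLinks/formLink/formId elements."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.findall(".//formLinks/formLink/formId")]


form_ids = attached_form_ids(SAMPLE_FORM_LINKS)
print(form_ids)  # ['complaint_followup', 'school_form', 'water_usage']
```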

Data links

Data links were updated in SurveyCTO 2.80. Server datasets already on your server will be updated automatically. To update a server dataset definition, upload it to a server, and then re-download it.

To learn more about dataset publishing, check out our Guide to dataset publishing.

Dataset publishing is defined in the <dataLinks> element. Both publishing to the server dataset (such as publishing from a form) and publishing from the server dataset (such as to an external Google Sheet) are defined here.

For each form or file that publishes to, or receives data from, the server dataset, there will be a <dataLink> element within the <dataLinks> element. Within that <dataLink> element are a variety of other elements used to define the publishing:

dataLinkClass: What type of file is being connected to the server dataset for publishing. For example, the value will be "FORM" if it is publishing data from a form, and "SPREADSHEET" if it is publishing to a Google Sheet.

dataLinkType: Whether data is being published to the server dataset ("INCOMING") or being published from the server dataset to another location ("OUTGOING").

dataLinkFormat: Has a value of "0" if data will be published in wide format, and a value of "1" if data will be published in long format.

linkObjectId: ID of the form or file that is being published to or from the server dataset. If it is a form, then it will be the form ID; if it is a Google Sheet, it will be the Google Drive file ID.

fieldMap: A JSON array of which fields or columns should publish to which columns. Each value in the array is an object that defines each field mapping. Each object has these properties:

formField: The name of the form field that is being published to the server dataset. If it is a repeated field, the field name will be followed by an asterisk (*).
datasetField: The name of the dataset column the form field will publish to. If the data is being published in wide format, and the form field being published to the server dataset column is a repeated field, then this value will be followed by an asterisk (*).
updateLogicAction: The publishing type. This can be "REPLACE" for Replace publishing, "ADD_TO_NUMERIC_VALUE" for Add publishing, or "CONCATENATE_TO_TEXT" for Concatenate publishing.
updateLogicOptions: An object used by Concatenate publishing. It is an object with two properties:
- separator: Which string value should be used to separate items in the list.
- position: Has a value of "BEGINNING" if new items should be added to the beginning of the list, and a value of "END" if new items should be added to the end of the list.

Both of these values will have a null value if it is for a field mapping that does not use Concatenate publishing. If the server dataset was originally created before SurveyCTO 2.80, then the whole property will have a null value (instead of an object); this is perfectly fine.

joiningField: The form field used for identifying rows. When setting up dataset publishing on the server, this is the Form field to identify unique records. For example, on row 36 of the example, the <joiningField> is defined as "caseid". So, based on the field mapping, when a form is submitted, if the value of the field "caseid" matches a value in the 'id' column of the server dataset, then the new data will be published to that row. If the value of the field "caseid" does not exist in the 'id' column, then the data will be added as a new row. This needs to be a field defined in the fieldMap. If it is a repeated field (used in long format publishing), the field name will be followed by an asterisk (*).

relevanceField: The field used to determine when the data should be published. If that field has a value of 1, then it will trigger publishing; otherwise, the data will not be published. When setting up dataset publishing on the server, this is the field specified under Include form submissions whenever this field is 1. This does not have to be a field defined in the fieldMap. If it is a repeated field (used for long format publishing), the field name will be followed by an asterisk (*).

isAutoConfigured: Boolean value. If this has a value of "true", then dataset publishing is set up automatically whenever a form is attached to this server dataset. This is used for enumerator datasets: when a form is attached to the enumerator dataset, publishing is configured automatically so that new enumerators are added and the 'id' and 'name' columns are updated. It should always be "true" for enumerator datasets, and "false" for all other server datasets.
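The properties above can be read back with a short script, which is a convenient way to review publishing rules before editing them. This is only a sketch: the XML is a hypothetical excerpt, the "status" field is made up, describe_data_links is an illustrative name, and it assumes the fieldMap element holds the JSON array as plain text.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical excerpt; real files may contain more elements per dataLink.
SAMPLE_DATA_LINKS = """<dataLinks>
  <dataLink>
    <dataLinkClass>FORM</dataLinkClass>
    <dataLinkType>INCOMING</dataLinkType>
    <dataLinkFormat>0</dataLinkFormat>
    <linkObjectId>complaint_followup</linkObjectId>
    <fieldMap>[
      {"formField": "caseid", "datasetField": "id",
       "updateLogicAction": "REPLACE", "updateLogicOptions": null},
      {"formField": "status", "datasetField": "status",
       "updateLogicAction": "REPLACE", "updateLogicOptions": null}
    ]</fieldMap>
    <joiningField>caseid</joiningField>
  </dataLink>
</dataLinks>"""


def describe_data_links(xml_text: str) -> list[dict]:
    """Summarize each dataLink: direction, format, and field-to-column mappings."""
    root = ET.fromstring(xml_text)
    summaries = []
    for link in root.iter("dataLink"):
        mappings = json.loads(link.findtext("fieldMap") or "[]")
        columns = {m["formField"]: m["datasetField"] for m in mappings}
        summaries.append({
            "link_object": link.findtext("linkObjectId"),
            "direction": link.findtext("dataLinkType"),
            "format": "long" if link.findtext("dataLinkFormat") == "1" else "wide",
            "columns": columns,
            # joiningField needs to be one of the fields in the fieldMap.
            "joining_field_mapped": link.findtext("joiningField") in columns,
        })
    return summaries


links = describe_data_links(SAMPLE_DATA_LINKS)
print(links[0]["direction"], links[0]["format"])  # INCOMING wide
print(links[0]["columns"])  # {'caseid': 'id', 'status': 'status'}
```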

You cannot upload a server dataset XML file to the server while there are form links or data links to forms that are not deployed. So, once you know how to identify form and data links, you can delete them from the XML file before you upload the file. For an example, see scenario 2 below.
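If a definition has many links, removing them by hand is error-prone. The sketch below drops every <formLink>, and every <dataLink> whose dataLinkClass is "FORM", that points to a form ID you have not deployed. The function name, the sample XML, and the Google Drive file ID are hypothetical, and the set of deployed form IDs is something you would supply yourself.

```python
import xml.etree.ElementTree as ET

SAMPLE_DEFINITION = """<dataset>
  <definition>
    <id>example_server_dataset</id>
    <formLinks>
      <formLink><formId>complaint_followup</formId></formLink>
      <formLink><formId>school_form</formId></formLink>
    </formLinks>
    <dataLinks>
      <dataLink>
        <dataLinkClass>FORM</dataLinkClass>
        <linkObjectId>school_form</linkObjectId>
      </dataLink>
      <dataLink>
        <dataLinkClass>SPREADSHEET</dataLinkClass>
        <linkObjectId>hypothetical-drive-file-id</linkObjectId>
      </dataLink>
    </dataLinks>
  </definition>
</dataset>"""


def strip_undeployed_links(xml_text: str, deployed_form_ids: set[str]) -> str:
    """Remove form links and form data links for forms that are not deployed."""
    root = ET.fromstring(xml_text)
    # list(...) copies are used because elements are removed while looping.
    for form_links in list(root.iter("formLinks")):
        for link in list(form_links):
            if link.findtext("formId") not in deployed_form_ids:
                form_links.remove(link)
    for data_links in list(root.iter("dataLinks")):
        for link in list(data_links):
            is_form_link = link.findtext("dataLinkClass") == "FORM"
            if is_form_link and link.findtext("linkObjectId") not in deployed_form_ids:
                data_links.remove(link)
    return ET.tostring(root, encoding="unicode")


cleaned = strip_undeployed_links(SAMPLE_DEFINITION, {"complaint_followup"})
cleaned_root = ET.fromstring(cleaned)
print([el.text for el in cleaned_root.iter("formId")])         # ['complaint_followup']
print([el.text for el in cleaned_root.iter("dataLinkClass")])  # ['SPREADSHEET']
```

Note that ET.tostring() does not write an XML declaration or preserve comments, so compare the output with the original file before uploading it.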

Case management properties

These properties are exclusive to cases datasets. They are defined in the <caseManagementOptions> element. To learn more about case management, check out our Guide to case management, especially part 3.

displayMode (string): How the case management menu will be displayed to data collectors, either as a "tree" or a "table".

showFinalizedSentWhenTree (bool): Only applies when the displayMode is "tree". If this value is "true", then when enumerators view a case in SurveyCTO Collect, the forms that have been finalized or sent for that case will be listed in green. If this value is "false", then finalized/sent forms will not be listed in "tree" view. This does nothing if the displayMode is "table": when a case is selected from the table, the finalized forms for that case are displayed even if showFinalizedSentWhenTree has a value of "false".

showColumnsWhenTable (XML): When the displayMode is "table", this is the list of columns that will appear in the case management menu. Each column name needs to be inside a <columnNames> element, so between <columnNames> and </columnNames> tags.

otherUserCode (string): When the enumerator property is used by cases, this is the code that needs to be entered to show all enumerators in the enumerator dataset, including hidden enumerators, from the Manage Cases menu.

entryMode (string): Which enumerator selection mode will appear by default when selecting an enumerator from the Manage Cases menu. Has value of "LIST" for list mode, "ENTRY" for entry mode, and "SCAN" for scan mode.

enumeratorDatasetId (string): ID of the enumerator dataset linked to this server dataset, which is used for selecting the enumerator.

To learn more about the enumerator management properties, check out our documentation Managing enumerators.

Enumerator dataset properties

These properties are exclusive to enumerator datasets. They are defined in the <idFormatOptions> element. They are used when generating a new enumerator:

prefix (string): What every new enumerator ID should START with.

suffix (string): What every new enumerator ID should END with.

numberOfDigits (integer): How long the new enumerator ID should be, not including the prefix or suffix.

allowCapitalLetters (bool): Has value of true if the randomly generated enumerator ID can contain not just numbers, but also capital letters. Has value of false if the enumerator ID should just contain numbers (not including the prefix or suffix).
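To see how these four options combine, here is a small sketch that builds an ID in the shape they describe. It is not the server's actual generator, which this article does not document; make_enumerator_id and the sample prefix and suffix are hypothetical.

```python
import random
import string


def make_enumerator_id(prefix: str, suffix: str, number_of_digits: int,
                       allow_capital_letters: bool) -> str:
    """Build prefix + random body + suffix, mirroring the idFormatOptions."""
    alphabet = string.digits
    if allow_capital_letters:
        alphabet += string.ascii_uppercase
    # numberOfDigits counts only the random part, not the prefix or suffix.
    body = "".join(random.choice(alphabet) for _ in range(number_of_digits))
    return f"{prefix}{body}{suffix}"


example_id = make_enumerator_id("EN", "X", 4, allow_capital_letters=False)
print(example_id)  # "EN", then four random digits, then "X"
```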

Other properties

Within the definition element, there are three more properties:

discriminator (string): Has value of "CASES" if this is a cases dataset, and a value of "ENUMERATORS" if this is an enumerator dataset. If it is neither, then this will have a value of "DATA".

uniqueRecordField (string): The Unique ID field, the dataset column with the unique IDs. Must be a column header defined in fieldNames (see above).

allowOfflineUpdates (bool): Has value of "true" if the server dataset can be updated offline, or "false" if not. Will only work with compatible servers.
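Before uploading an edited file, it can help to check the rules stated in this article: datasetType is "SERVER", discriminator is one of the three documented values, and uniqueRecordField names a column in fieldNames. The checker below is a minimal sketch with a hypothetical name and a hypothetical sample; it is not an official validator, and the server may enforce other rules.

```python
import xml.etree.ElementTree as ET

VALID_DISCRIMINATORS = {"CASES", "ENUMERATORS", "DATA"}

SAMPLE_PROPERTIES = """<dataset>
  <definition>
    <datasetType>SERVER</datasetType>
    <fieldNames>id_key,name,age,gender,marital</fieldNames>
    <discriminator>DATA</discriminator>
    <uniqueRecordField>id_key</uniqueRecordField>
  </definition>
</dataset>"""


def check_definition(xml_text: str) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    root = ET.fromstring(xml_text)
    problems = []
    field_names = (root.findtext(".//fieldNames") or "").split(",")
    unique_field = root.findtext(".//uniqueRecordField")
    if unique_field not in field_names:
        problems.append(f"uniqueRecordField {unique_field!r} is not in fieldNames")
    discriminator = root.findtext(".//discriminator")
    if discriminator not in VALID_DISCRIMINATORS:
        problems.append(f"unexpected discriminator {discriminator!r}")
    if root.findtext(".//datasetType") != "SERVER":
        problems.append("datasetType should be SERVER")
    return problems


print(check_definition(SAMPLE_PROPERTIES))  # []
```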

Instance and version

Outside of the definition element, but still inside the dataset element, there is one more element: instance. The instance element contains a single element: version. This is the server dataset version at the time the dataset definition was downloaded (the version on the server goes up whenever the dataset is updated, including by a form submission).

No matter its value in the file, the version will reset to 1 when you upload the definition to your server, so there is no need to change it or worry about it.

Changing the XML file

Generally speaking, it is a lot easier to change the server dataset from the server console's Design tab. However, you can also change the behavior of the server dataset by changing the XML file. To change a dataset, first download the dataset definition XML file and the dataset data, delete the dataset from the server, make the needed changes to the XML file, then upload the XML file to the server, followed by the CSV file. Here is a breakdown of those steps:

On the Design tab of the server console, scroll to the server dataset you would like to change.
For that server dataset, click Download, then click both Download data and Dataset definition to download the dataset data and definition. Make sure both the CSV dataset data and XML dataset definition have been downloaded successfully.
Once you have confirmed you have both the dataset definition and data, delete the dataset from the server.
Open the XML file in a text editor.
Change the XML file as needed.
Save the file.
On the server console's Design tab, click a + (plus) on the left, then Add server dataset.
Click the New dataset from definition tab.
Under Please choose a definition file to upload, click Select file, and select and open the dataset definition XML file.
Click Upload a dataset definition.
At the popup that says "Dataset created successfully", click OK.
When the dataset definition has been successfully uploaded, click its Upload button.
Under File with new dataset contents (.csv file), click Select file.
Select and open the CSV dataset data file.
Leave Append selected, and click Upload.
At the popup that says "Dataset modified successfully", click OK.

To learn more about downloading and uploading server datasets, check out our support article Working between servers.

Make sure you do not delete the server dataset while data collection is still active. Wait for a "rest" period when enumerators are not submitting data to the server, since otherwise there will be no server dataset to publish to. You can also ask all enumerators to stop submitting data (or pause data collection overall) during specified times, and use that time to delete the old server dataset and upload the new one.

Exercises

Here, we will walk you through some exercises so you can practice what you have learned. These are both real scenarios you may face with a server dataset.

Scenario 1: Change server dataset ID

You have a server dataset with the dataset ID 'example_server_dataset', but you would like to change it to 'respondent_data'. That way, when you reference it in a form definition's pulldata() or search() expression, it will be easier to understand what it is referring to. Click here to download the dataset definition.

Open the server dataset XML file in a text editor.
Select the content between the <id> tags.
Change this to 'respondent_data'.
Save the file.
Upload the dataset definition to your server.
Make sure the dataset ID of the dataset you just uploaded is now 'respondent_data'.

Tip: While this is the only way you can change the dataset ID, you can easily change the dataset title using the Rename button of the server dataset on the Design tab.
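The same edit can be scripted, which is handy if you need to rename several datasets. This is a sketch rather than a required workflow: change_dataset_id is a hypothetical helper, the XML is a trimmed stand-in for the downloaded file, and it relies on the <id> element sitting directly inside <definition>, as in the example file.

```python
import xml.etree.ElementTree as ET

SAMPLE_FOR_RENAME = """<dataset>
  <definition>
    <id>example_server_dataset</id>
    <fieldNames>id_key,name,age,gender,marital</fieldNames>
  </definition>
</dataset>"""


def change_dataset_id(xml_text: str, new_id: str) -> str:
    """Return the definition XML with the text of <id> replaced by new_id."""
    root = ET.fromstring(xml_text)
    id_element = root.find("./definition/id")
    if id_element is None:
        raise ValueError("No <id> element found inside <definition>")
    id_element.text = new_id
    return ET.tostring(root, encoding="unicode")


renamed = change_dataset_id(SAMPLE_FOR_RENAME, "respondent_data")
print(ET.fromstring(renamed).findtext("./definition/id"))  # respondent_data
```

ET.tostring() omits the XML declaration and any comments, so review the result before uploading it. Any pulldata() or search() expressions that use the old ID will also need to be updated to 'respondent_data'.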

Scenario 2: Remove an attached form

A colleague has sent you a dataset definition that will be used in a future survey (to download the file, open the link, and click the download symbol in the upper-right). However, that dataset definition contains a <formLink> to a form definition that is not deployed on your own server. You cannot upload the dataset definition to your server until that form link has been removed (or until you have deployed a form definition with that form ID to your server). Download the dataset definition here.

Open the server dataset XML file in a text editor.
Scroll to the <formLinks> element on rows 8-12.
Select the <formLink> element from rows 9-11.
Delete that section.
Save the file, and upload it to your server.

The empty element <formLinks></formLinks> is considered to be the exact same as <formLinks/>. If you'd like, you can change the empty <formLinks> elements to that more simplified empty tag, but it is not required.
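A quick way to confirm that the two spellings are equivalent is to parse both and compare the results. The sketch below uses Python's standard XML parser; how the SurveyCTO server reads the file internally is not covered by this article.

```python
import xml.etree.ElementTree as ET

long_form = ET.fromstring("<formLinks></formLinks>")
short_form = ET.fromstring("<formLinks/>")

# Both parse to an element with the same tag, no children, and no text.
for element in (long_form, short_form):
    print(element.tag, len(element), element.text)  # formLinks 0 None
```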
