Using data from another form

This support article is accompanied by the sample workflows in this folder. If you are a beginner, start with the beginner workflow. For help deploying the workflow, check out our support article on deploying form definitions and server datasets.

There are times where two forms are completed separately, but the second form needs data submitted by the first. For example, what if you are completing a midline survey, and you need the list of household members gathered during the baseline survey?

Using both server dataset publishing and pre-loading, you can publish data from one form to a server dataset, and then pull the data from the server dataset into a second form. In this support article, we will guide you and link you to resources so you can learn:

  1. How to set up server dataset publishing.
  2. How to set up pre-loading from a server dataset.
  3. How to keep your collection devices up-to-date with the latest pre-load data.
  4. Demonstrate a full working example that illustrates the above.

using-data-1.png

In the above chart, you can see that the enumerator fills out Form A, and when it is completed, its data is published to the server Dataset. Then, the data in the server dataset is pulled by Form B and Form C using the pulldata() function.

Note that form A could hypothetically pre-load data from the server dataset too. These features are very flexible.

1. Publishing data to the server dataset

First, you need to publish the data from the first form into a server dataset. If you are not familiar with dataset publishing, check out these resources, then come back here:

  1. Documentation: Introduction to advanced dataset usage
  2. Documentation: Publishing form data into server datasets
  3. Webinar: Build smarter forms with server dataset workflows

Data stored in a server dataset can be used by any other form. So, once server dataset publishing is set up, data published from that first form can be used by other forms.

Tip: It is a good idea for the server dataset column headers to have the same names as the form fields that publish to them. For example, you can set up the "name" field of the form to publish to the 'name' column of the server dataset, set up the "phone_number" field to publish to the 'phone_number' column, etc. This will make it easier to keep track of which fields publish to which columns.

2. Pulling data from the server dataset

Once the device has been updated with the latest server dataset data (which we'll discuss in section 3), the pulldata() function is used in a subsequent form to pull data from that server dataset. If you are not familiar with pre-loading and pulling data, check out these resources:

  1. Documentation: Pre-loading data into a form
  2. Webinar: Learn how to save time and enhance data quality by pre-loading data

Using the pulldata() function, you can pull server dataset data published by the first form, into the second form.

3. Device setup (including automatic updates)

It is important to keep in mind that in order to pull data from the first form, the first form instance has to be submitted to the server and published to the server dataset. Then, once server dataset publishing is complete (this can take up to five minutes), SurveyCTO Collect will need to download and install the latest version of the second form, which will include the updated server dataset. That way, the second form will be able to use the latest data submitted by the first form.

using-data-2.png

Form instances completed using the first form (the publishing form) should be submitted to the server as soon as possible. If they are not submitted, their data will not be published to the server dataset, and the second form will not be able to use that new data.

Similarly, the second form will need to be updated every time the server dataset is updated (e.g. every time new data is published to it) by downloading and installing the latest form definition. If devices are not updated with the latest form and dataset data, they will only be able to pull old, outdated data. Device data updates are required even if the second form (which does the pre-loading) will be completed on the same device that submitted the data.

It is possible to do all of this manually, but this involves making sure all enumerators submit their completed form instances in a timely manner, and also installing the latest form definition every time there is even a tiny change to the server dataset it pulls from. Instead, you can also use SurveyCTO Collect settings to make sure this all happens automatically.

Follow these steps to set up automatic submissions and updates:

  1. From the main menu, tap the three-dot menu in the upper-right.
  2. Tap General Settings.
  3. Checkmark all of the settings under Auto Send/Receive. These are:
    1. Auto send with Wi-Fi
    2. Auto send with network
    3. Auto send with Wi-Fi
    4. Auto download with network
    5. Auto download on demand
    6. Auto install downloaded updates
    7. Display send/receive status (optional, but recommended)

With these settings enabled, whenever a form is marked as finalized, it will be submitted to the server as soon as the device is online. Also, whenever an enumerator opens a form while online, SurveyCTO Collect will check for any updates, and then install them if any are found. That way, enumerators will have the latest baseline data attached to the midline form.

Tips:

  • If mobile internet service will be limited or expensive, you may want to disable the "network" Auto Send/Receive settings so you don't accidentally go over your data limit.
  • If any of those settings are not available, you can enable them from the Admin Settings.
You can apply these settings manually on each device, but you can also use quick setup so when the user logs into the device, these settings are automatically applied. To learn more, check out our documentation on managing device settings (section Default device configurations and quick setup), as well as our best practices for configuring and securing enumerator devices.

4. Examples

These examples include sample workflows. For assistance deploying them to your own server, check out our support article on deploying form definitions and server datasets.

The sample forms ask about the crops grown by the household, the household size, and each household member's name and annual income. When the first form, the baseline form, is submitted to the server, its data is published to a server dataset. Then, the second form, the midline form, uses the pulldata() function to pull that data from the server dataset.

using-data-3.png

Basic workflow

This example is accompanied by the beginner setup in this folder.

In this example, in the household midline form, take a look at the calculate field "pull_crops" on row 13. It uses this calculation to pull the space-separated list of crops grown:

pulldata('beginner_hh_dataset', 'crops_grown', 'hh_key', ${hh_id})

The value of that field is then used in different properties of the field "crops_produced" on row 18:

  1. The choice_filter expression uses the field "pull_crops" to make sure that only crops that were selected in the baseline form appear.
  2. The relevance expression uses the field "pull_crops" so if "None of the above" had been selected, then that field does not appear at all.

A similar process is used to pull other pieces of data from the CSV file. For example:

  1. On row 14, the name of the head of the household is pulled in the field "head_name" using this calculation:
    pulldata('beginner_hh_dataset', 'hh_name', 'hh_key', ${hh_id})
  2. On row 15, the last income reported is pulled in the field “last_income” using this calculation:
    pulldata('beginner_hh_dataset', 'income', 'hh_key', ${hh_id})

The only difference in between these calculations is the second parameter; we are pulling from the same server dataset and row each time, but we are pulling from different columns ('hh_name' and ‘income’, respectively). That way, the enumerator knows the name of the household head so they know whom to ask for when they arrive on site, and the form can calculate the "income_diff" using the last income reported.

This sample only pulls a handful of fields from the server dataset, but you can pre-load as many fields as you need. 

You cannot publish data from encrypted fields. If a form is encrypted, and you would like a field to be published, make sure the field has a publishable value of yes.

Intermediate workflow: Pulling repeated data

This example is accompanied by the intermediate setup in this folder.

When repeated data is published to a server dataset, it is stored in wide format. It takes the original field name, concatenates (adds on) an underscore _ to the end, followed by the repeat index number. For example, take a look at this part of the form definition:

using-data-4.png

The field "name" is a repeated field. So, the first instance of the field "name" will publish to the 'name_1' column, the second instance of "name" will publish to the 'name_2' column, and so on. Here is an example of what that data may look like for households with the 'id' values of 101 and 102:

id name_1 name_2 name_3 name_4
101 Adnan Bhavna Charles Dana
102 Aiden Bailey Calvin Delilah

This has been simplified to only show the 'id' and 'name_#' columns. In practice, there will also be 'age_#' columns, 'income_#' columns, and more.

Learn more about the format of exported (and published) data in the product documentation.

With the above in mind, you can construct an expression that can pre-load repeated data stored in wide format. First, you will need to construct the column name you would like to pull from. In the midline form, the field "hh_index" has a calculation of simply index(). So, the first instance of that field will return 1, the second instance will return 2, and so on. Next, to construct the column name, you will need to combine the original field name with the index using the concat() function. Take a look at this expression:

concat('income_', ${hh_index})

That expression takes the name of the field from the baseline form, "income", and concatenates an underscore to the end of it, followed by the repeat index. That way, the first repeat instance of that field will return "income_1", the second will return 'income_2', and so on. If you look at the server dataset data, you'll recognize these as column headers.

Finally, you can use that generated column header name in a pulldata() function. For example, the calculate field "last_income" on row 21 has this calculation:

pulldata('intermediate_hh_dataset', concat('income_', ${hh_index}), 'hh_key', ${hh_id})

The second parameter (concat('income_', ${hh_index})), instead of using the literal column header name, uses that concat() expression to dynamically build the column header name based on the repeat index number. That way, the first repeat instance of the calculation of the field "last_income" pulls from the 'income_1' column, the second repeat instance pulls from the 'income_2' column, and so on. In other words, the first repeat instance will pull the income of household member 1, the second repeat instance will pull the income of household member 2, and so on. That way, the income difference can be calculated for each individual household member.

Advanced workflow

If you would like to try a more advanced version of publishing and pre-loading data, check out the files in this folder. In that system, it will ask which household members from the baseline form are still household members in the midline form, ask what happened to previous household members, and allow the adding of new household members.

Do you have thoughts on this support article? We'd love to hear them! Feel free to fill out this feedback form.

0 Comments

Article is closed for comments.