Best practices for choice value names in select_multiple and select_one fields

When choosing values for a choice list, it is generally recommended that each choice have a numeric value (e.g. 1, 2, 3), even though “string” values are also allowed (i.e. you can use letters for values). One reason is that most data analysis software packages prefer numeric values. Those packages often allow you to attach labels to values, but some of them, such as Stata, only allow labelling if your choices use numeric values. Consult your application’s documentation to check the preferred format for choice values.

crops | crops_1 | crops_2 | crops_3 | crops_4
1 3 4 | 1       | 0       | 1       | 1

However, if there are no obvious advantages to using numeric values for your choices, strings can be excellent choice values. Good string values can be read and understood immediately, while numeric codes may need to be looked up. For example, ‘apl’ is more readily understood to mean ‘apple’ than ‘1’ is. This also makes the binary columns generated in exports from select_multiple fields easier to understand. These string values shouldn’t be too long, since the name of each binary column generated in exports is a concatenation of the name of the select_multiple field and the choice value (for example, the value ‘apl’ within the select_multiple field ‘crops’ would become the ‘crops_apl’ column).

crops       | crops_apl | crops_ban | crops_car | crops_dat
apl car dat | 1         | 0         | 1         | 1
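To make the relationship between the stored value and the exported columns concrete, here is a minimal Python sketch. It is only a model of the export layout shown above, not SurveyCTO's actual export code; the function name split_select_multiple is hypothetical.

```python
def split_select_multiple(field, value, choices):
    """Expand a space-separated select_multiple value into binary columns."""
    selected = set(value.split())
    row = {field: value}
    for choice in choices:
        # Column name = field name + "_" + choice value, as in the export.
        row[f"{field}_{choice}"] = 1 if choice in selected else 0
    return row


numeric_row = split_select_multiple("crops", "1 3 4", ["1", "2", "3", "4"])
string_row = split_select_multiple("crops", "apl car dat", ["apl", "ban", "car", "dat"])

print(numeric_row)  # columns crops_1 .. crops_4
print(string_row)   # columns crops_apl .. crops_dat
```

Both rows hold the same information; only the column names differ in how readable they are.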

Extended explanation

Numeric values are great for exported data, and they also make various form design methods easier to implement. For example, you can use the numeric index of a repeat group instance (returned by the index() function) to check which numeric values were selected in the select_multiple field. The expression selected(${crops}, index()) works if the value of ‘crops’ is 1 3 4, but not if its value is apl car dat, since index() always returns a number (for a detailed example, check out the Follow-ups sample form, method 4).

[Image: 1.png]
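The following Python sketch models why the index-based check only works with numeric values. It is an illustration rather than form code: the selected() helper imitates the form function of the same name, index() is modelled as a 1-based counter over repeat instances (an assumption), and instances_matching is a hypothetical name.

```python
def selected(value, choice):
    """Imitate selected(): is `choice` among the space-separated values?"""
    return str(choice) in value.split()


def instances_matching(value, repeat_count):
    """Return the repeat indexes for which selected(value, index) is true."""
    # The index is always a number, so it can only match numeric values.
    return [index for index in range(1, repeat_count + 1) if selected(value, index)]


numeric_hits = instances_matching("1 3 4", 4)       # [1, 3, 4]
string_hits = instances_matching("apl car dat", 4)  # []

print(numeric_hits)
print(string_hits)
```

With string values, no repeat instance ever matches, which is why this design method requires numeric choice values.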

However, if there is no real reason to use numeric values, string values may be the better option for your choice values. For example, imagine you have a form with this select_multiple field:

Say an enumerator fills out the form and selects apples, carrots, and dates. When exporting the data, the “crops” field’s data will look like this:

crops | crops_1 | crops_2 | crops_3 | crops_4
1 3 4 | 1       | 0       | 1       | 1

Here, 1 means that the choice in the select_multiple field was selected, and 0 means it was not. But at a glance, you don’t know which header refers to which crop (which crop is “crops_1”?). Instead, we can give each choice a string value:

[Image 2.png: the choice list with the string values apl, ban, car, and dat]

That way, when exporting the data, the headers of each column are easier to understand:

crops       | crops_apl | crops_ban | crops_car | crops_dat
apl car dat | 1         | 0         | 1         | 1

Here, it is obvious even at a glance that the respondent selected apples, carrots, and dates, but not bananas.

Updating form design during data collection

Be sure to settle on choice values BEFORE starting data collection. While you can update the form design mid-way, the exported data may then end up looking something like this (with a new set of columns for the more clearly named split binaries):

crops       | crops_apl | crops_ban | crops_car | crops_dat | crops_1 | crops_2 | crops_3 | crops_4
1 3 4       | 0         | 0         | 0         | 0         | 1       | 0       | 1       | 1
apl car dat | 1         | 1         | 1         | 1         | 0       | 0       | 0       | 0

The first row is before the changes (numeric values), and the second row is after (string values). They don’t line up, which can be confusing during data analysis.
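If you do end up with mixed data like this, the rows can be reconciled during analysis. The Python sketch below shows one possible approach, assuming you know how the old numeric values map to the new string values; the OLD_TO_NEW mapping and the harmonize function are hypothetical and would need to match your own form.

```python
# Hypothetical mapping from the old numeric values to the new string values.
OLD_TO_NEW = {"1": "apl", "2": "ban", "3": "car", "4": "dat"}


def harmonize(row, field="crops", mapping=OLD_TO_NEW):
    """Return a row that uses only the string values and their binary columns."""
    # Translate the stored value itself, e.g. "1 3 4" -> "apl car dat".
    translated = [mapping.get(token, token) for token in row[field].split()]
    clean = {field: " ".join(translated)}
    # Rebuild the binary columns from the translated value.
    for new_value in mapping.values():
        clean[f"{field}_{new_value}"] = 1 if new_value in translated else 0
    return clean


before = {"crops": "1 3 4", "crops_apl": 0, "crops_ban": 0, "crops_car": 0,
          "crops_dat": 0, "crops_1": 1, "crops_2": 0, "crops_3": 1, "crops_4": 1}
after = {"crops": "apl car dat", "crops_apl": 1, "crops_ban": 0, "crops_car": 1,
         "crops_dat": 1, "crops_1": 0, "crops_2": 0, "crops_3": 0, "crops_4": 0}

print(harmonize(before))
print(harmonize(after))
```

After harmonizing, both rows have the same columns and the same values, so they line up again. Settling on choice values before data collection avoids this extra step entirely.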

Value name length

Something else to keep in mind is that while there is no limit to value or field name length in SurveyCTO, other applications, such as Stata, DO limit the length of variable names. So, it is recommended to use short string values in choice lists when using this form design method, since each value is appended to the end of the field name in the export (for example, “crops_ban” is generated from the field name “crops” and the value “ban”).
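One way to catch overly long names before data collection is to generate the export column names and check their length. The Python sketch below assumes a limit of 32 characters, which is Stata's cap on variable names; check the limit for your own analysis software. The function name overlong_columns and the long field name are hypothetical.

```python
MAX_NAME_LENGTH = 32  # assumed limit (Stata's cap on variable names); adjust as needed


def overlong_columns(field, choices, limit=MAX_NAME_LENGTH):
    """List the export column names that would exceed the length limit."""
    names = [f"{field}_{choice}" for choice in choices]
    return [name for name in names if len(name) > limit]


fine = overlong_columns("crops", ["apl", "ban", "car", "dat"])
flagged = overlong_columns("crops_grown_last_season", ["apple_trees", "bananas"])

print(fine)     # no names over the limit
print(flagged)  # names that should be shortened
```

Here, only “crops_grown_last_season_apple_trees” (35 characters) is flagged; shortening either the field name or the choice value would fix it.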

Odd exception - Adding additional choices not pre-loaded from the CSV

If you are pre-loading choices from a CSV file or dataset (see our documentation or this webinar recording for details), and you would like to add further choices that are not in that file, those extra choices must have values that are numeric, not strings.

For example, let’s say you have a choice list pre-loaded from a CSV file. The labels come from the ‘name’ column in the CSV, and the values come from the ‘id_key’ column. You also want a “None of the above” option that is not in the CSV file. The value for that extra option needs to be numeric. If you tried to give “None of the above” a value of “none”, or even just “n”, it would not work because of how the search() function matches strings. The value needs to be numeric, such as “0”, so that it is overlooked in the string matching process.

Advanced example: Repeat group with choice-label()

Another option is to have a repeat group that cycles through each option selected in a select_multiple field, and then use the choice-label() function to get the name of the choice if it was selected. You can also use that repeat group to ask follow-up questions, and with the choice-label() function, even a quick glance at the data will tell you which crop each column refers to. Check out this sample form to see this in practice (feel free to upload it to your own server).
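As a rough model of what that repeat group produces, the Python sketch below creates one record per selected choice and looks up its label. It is not form code: in the form, the label lookup would be done by choice-label(), and the CHOICE_LABELS dictionary, the label text, and the repeat_instances function are assumptions made for illustration.

```python
# Assumed choice list: value -> label.
CHOICE_LABELS = {"apl": "Apple", "ban": "Banana", "car": "Carrot", "dat": "Date"}


def repeat_instances(value, labels):
    """Build one record per selected choice, mimicking a repeat group."""
    rows = []
    for index, choice in enumerate(value.split(), start=1):
        rows.append({
            "index": index,                # position of the repeat instance
            "crop_value": choice,          # the selected choice value
            "crop_label": labels[choice],  # what choice-label() would supply
        })
    return rows


instances = repeat_instances("apl car dat", CHOICE_LABELS)
for instance in instances:
    print(instance)
```

Each repeat instance carries the label of its crop, so the exported data stays readable without a separate codebook.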


Consult our documentation on using expressions in your forms to read more about the index() and choice-label() functions, and loading multiple-choice options from pre-loaded data to read more about the search() function.

