Server datasets are what make SurveyCTO data collection workflows dynamic, supporting a range of functionality such as case management, enumerator management, and pre-loading previously submitted data into new blank forms. For any of this to work, however, data stored in server datasets has to be readable by users with access to the server and the SurveyCTO Collect app.
For instance, case management systems include personally identifying information (PII) to help assigned caseworkers confirm they're speaking to the right person. At the same time, anyone with sufficient permissions to access the same team on a server can also view and manage those cases.
While careful planning and application of the principle of least privilege can limit the scope of risk, the level of data sensitivity or organizational policy may require that you take additional precautions. For this reason, we have introduced a system for encrypting the contents of server datasets at rest, blocking access even for authorized users who have successfully logged in but do not possess the key that was used to encrypt the data. This system is similar to SurveyCTO's support for optional encryption of form data.
With it, you can continue to show your fieldwork team the PII they need while guarding it more strongly against everyone else.
Note: This system augments SurveyCTO’s already high levels of security with additional options for those requiring extra precautions due to highly sensitive data. In general, data stored in server datasets is secure, but this encryption system provides an extra layer of protection.
Overview
This server dataset encryption system has the following parts and requires some technical know-how:
- A Python library for performing the following tasks:
  - Generating an encryption key
  - Encrypting CSV data for use as pre-load data
  - Decrypting CSV data
  - Generating a convenient QR code for the encryption key
- A field plug-in for decrypting pre-loaded data within a form
- A field plug-in for encrypting data before it is published to a server dataset
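As a rough illustration of the key-generation step listed above, the sketch below creates a random key and encodes it as text, which is the kind of value that can be typed into a form or carried in a QR code. It uses only Python's standard library. The `generate_key` helper and the 32-byte, URL-safe base64 format are assumptions made for this example; they are not the scto-encryption package's API or key format, so use the package itself for real keys.

```python
import base64
import secrets


def generate_key(num_bytes: int = 32) -> str:
    """Return a random key as URL-safe base64 text (hypothetical format)."""
    # secrets draws from the operating system's cryptographic random source.
    raw = secrets.token_bytes(num_bytes)
    # Text encoding makes the key easy to type, paste, or place in a QR code.
    return base64.urlsafe_b64encode(raw).decode("ascii")


key = generate_key()
print(f"Generated a key of {len(key)} characters")
```

Because the key is plain text once encoded, it needs the same careful handling as any other secret: anyone who can read it can decrypt the data it protects.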
To see the system in action and understand it more clearly, try the refugee data management use case, which is installable via the Hub.
Encrypting data before upload to a server dataset
If you wish to seed a server dataset with sensitive data that will later be decrypted and displayed in forms, you must encrypt it before uploading it. The scto-encryption Python package supports this with the following capabilities:
- Generating encryption keys
- Encrypting and decrypting CSV data
- Generating QR codes for easily inputting an encryption key into a form
The encryption functionality of this package is selective, allowing you to exclude specified columns from encryption. By selectively encrypting sensitive columns and leaving non-sensitive data unencrypted, you maintain the functionality of your pre-loaded data while enhancing security. Next, we’ll discuss how to handle encryption within forms.
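The idea of selective encryption can be sketched with the standard library alone. The example below applies a cipher function to every column of a CSV except those named in an exclusion list. The `encrypt_columns` and `placeholder_cipher` names are hypothetical and do not come from the scto-encryption package. Note that `placeholder_cipher` is only a reversible marker and is NOT encryption; in a real workflow you would use the package's own encryption in its place.

```python
import base64
import csv
import io
from typing import Callable, Iterable


def encrypt_columns(csv_text: str,
                    cipher: Callable[[str], str],
                    exclude: Iterable[str]) -> str:
    """Apply `cipher` to every cell except those in the `exclude` columns."""
    skip = set(exclude)
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames or [],
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow({
            # Excluded columns pass through unchanged so they stay usable
            # for look-ups; every other cell goes through the cipher.
            name: value if name in skip else cipher(value or "")
            for name, value in row.items()
        })
    return out.getvalue()


def placeholder_cipher(value: str) -> str:
    """NOT encryption: a reversible marker standing in for a real cipher."""
    return "ENC:" + base64.b64encode(value.encode("utf-8")).decode("ascii")


# Made-up sample data: caseid stays readable, name and phone are protected.
source = "caseid,name,phone\n001,Amina,555-0101\n002,Badru,555-0102\n"
protected = encrypt_columns(source, placeholder_cipher, exclude=["caseid"])
print(protected)
```

Leaving an identifier column such as `caseid` unencrypted is what lets a form still look up the right row when pre-loading, while the sensitive columns remain unreadable without the key.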
Encrypting data in a form
Depending on your workflow, new data from the field might need to be encrypted to protect it. The encrypt field plug-in supports this. However, the form design itself should not contain the encryption key, because form definitions are not encrypted. Instead, the form should capture the encryption key manually, either by scanning a QR code or by text input.
The field(s) in the form design that use the field plug-in will store encrypted versions of the sensitive value(s). It is essential that the fields used directly to collect sensitive data and to store the encryption key be protected in the form design through form encryption (otherwise the sensitive data and key will be exposed in the form data regardless).
Decrypting data in a form
If you only need to encrypt data to protect it at the point of collection, SurveyCTO's built-in form encryption feature is enough. However, if you wish to display and/or edit already collected data, you'll need to pre-load it from a server dataset to which it has been published or manually uploaded.
If you're publishing data from form submissions that are sensitive, you'll want to use the encrypt field plug-in to protect it. If you're pre-populating a server dataset with sensitive data through a file upload, you'll want to first encrypt it with the scto-encryption Python package.
Once encrypted data has been pre-loaded from a server dataset, you can decrypt it using the decrypt field plug-in. As above, the form should not store the encryption key in the design itself for security reasons. Instead, the form should support manual entry via QR code scanning or via text input. Once the key is provided, the data can be decrypted using this field plug-in.
As above, it is essential that the field that stores the key for decrypting data be protected using form encryption (or else the key will be exposed as part of submitted form data).
Re-encrypting decrypted data
If your use case involves successively recalling records over time, displaying them, and selectively updating them, you'll want to decrypt the sensitive data you pre-load, display and/or modify it, and encrypt it again. The form would need to be configured to publish that data back to the dataset it was pre-loaded from, so the latest data is available the next time it is needed.
To accomplish such workflows, you'll want the encrypt and decrypt field plug-ins working together as part of the same form workflow.
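The decrypt, edit, and re-encrypt cycle can be modelled in a few lines. The sketch below is not how the field plug-ins are implemented; it only illustrates the order of operations on a single record. All names (`update_record`, `placeholder_encrypt`, `placeholder_decrypt`) are hypothetical, and the placeholder functions are reversible markers rather than real encryption.

```python
import base64
from typing import Callable, Dict, Set


def placeholder_encrypt(value: str) -> str:
    """NOT encryption: stands in for the encrypt step."""
    return "ENC:" + base64.b64encode(value.encode("utf-8")).decode("ascii")


def placeholder_decrypt(value: str) -> str:
    """Reverse of placeholder_encrypt: stands in for the decrypt step."""
    if not value.startswith("ENC:"):
        raise ValueError("value is not marked as encrypted")
    return base64.b64decode(value[len("ENC:"):]).decode("utf-8")


def update_record(record: Dict[str, str],
                  changes: Dict[str, str],
                  sensitive: Set[str],
                  decrypt: Callable[[str], str],
                  encrypt: Callable[[str], str]) -> Dict[str, str]:
    """Decrypt sensitive fields, apply edits, then re-encrypt for publishing."""
    # 1. Decrypt the sensitive values pre-loaded from the dataset.
    plain = {k: decrypt(v) if k in sensitive else v for k, v in record.items()}
    # 2. Apply the edits made while the values are readable.
    plain.update(changes)
    # 3. Re-encrypt before the record is published back to the dataset.
    return {k: encrypt(v) if k in sensitive else v for k, v in plain.items()}


# Made-up record as it might sit in a dataset: phone is stored encrypted.
stored = {"caseid": "001", "phone": placeholder_encrypt("555-0101")}
republished = update_record(stored, {"phone": "555-0199"}, {"phone"},
                            placeholder_decrypt, placeholder_encrypt)
print(republished)
```

The point of the sketch is that the readable value exists only between steps 1 and 3; what is stored before and after the update is always the encrypted form.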
Data access
When configured correctly, data collected using these workflows are accessible only by people with the encryption key who also have access to:
- SurveyCTO Collect with the relevant form(s) installed. All data collection users will need access to the encryption key that works with the encrypt and decrypt plug-ins (they don't need the form encryption key that is also crucial to such workflows). Whether you have one or hundreds of data collection users, they'll need to follow careful procedures not to compromise the security of the key.
- Server console users with access to the team the relevant form(s) and server dataset(s) are located in. Such users can export the encrypted data from the form(s) or server dataset(s) in use for a project and use the scto-encryption package to decrypt it.
As long as access to the key is carefully controlled, this system provides a level of data security comparable to form encryption: the data is hidden even from the SurveyCTO engineering team because it is encrypted at rest in server datasets (as well as in forms).
Conclusion
By selectively encrypting sensitive columns and leaving non-sensitive data unencrypted, you maintain the functionality of your pre-loaded data while enhancing security.