The SurveyCTO platform allows data to be uploaded from outside the platform which can be uploaded in CSV or Excel format. If you are unfamiliar with comma-separated value (CSV) data, or if you are struggling to create .csv files that work properly, this is the article for you. Read on to learn:
- What a .csv file is.
- Editing found .csv files.
- When to use Excel or .csv files.
- Common problems with generating .csv files.
- Tips on how to create working .csv files with popular software.
|What sorts of external data can you upload? Outside data can be uploaded in order to be pre-loaded into forms, which is integral to many data collection workflows including case management workflows. Using server datasets, you can also attach external data to Data Explorer workbooks to visualize pre-existing data alongside newly collected data. Further, custom web form user interface translations can be uploaded.|
What is a .csv file?
A comma-separated values (CSV) file is the most common and basic format for storing data. In a .csv file, a table of values is stored with columns delimited by commas, and with line breaks separating rows. The .csv file format is commonly used to transfer data between software systems, thanks to its simple and consistent format.
On the left you can see a conventional table of data. On the right you can see the same table of data, viewed in CSV format.
Editing found .csv files
Sometimes you'll receive data in CSV format that you need to edit. To edit data in CSV format, first import it into a suitable software program, like Microsoft Excel, Google Sheets, Stata, and similar. Depending on your usage plans (see the next section) you may decide to keep the data in Excel format, which SurveyCTO also accepts.
See our documentation on importing CSV data into Excel for guidance. For other software, you'll want to be sure that the importing software recognises a.) the delimiter that's being used in .csv file (e.g. a comma), b.) and that the file encoding format is UTF-8 if you're working in a non-English script, or a Latin alphabet that uses special characters (discussed in more detail below).
Edit the imported data, save your work, and export again into CSV format if required. Continue reading to learn how to avoid common problems when creating .csv files.
When to use .csv or Excel files
As above, SurveyCTO accepts both CSV and Excel format data uploads. Excel workbooks aren't an internally supported data format but they are converted to .csv on the server as a convenience. The greater the number of rows and columns of data you're uploading, the longer this conversion process will take. As a rule of thumb, if you're uploading a number of records approaching or greater than 10,000, it would probably be worthwhile to convert to CSV yourself before uploading the data.
For special characters to be preserved in an Excel workbook (read the section below) they should be entered in a UTF-8 encoded font. Excel's character encoding is wider than UTF-8, so what looks correct in Excel may not look the same after the conversion process on the server. If you're having trouble with special, and non-English characters being preserved when uploaded from Excel, this could also be a reason to export first to CSV so you can view the CSV data in a text editor before uploading.
Otherwise if you prefer working in Excel, it will be your most convenient format.
Common problems with creating .csv files
Despite being a simple and common file type, there are some common problems SurveyCTO users run into while trying to create .csv files.
Using semicolons instead of commas
Some computers will generate .csv files where values are separated by semicolons (
;) instead of commas (
,). However, SurveyCTO expects true comma-separated data when uploading into the platform, using commas. For example, if you upload a .csv file that uses semicolons into a server dataset, you might see this result:
In the above screenshot, the CSV data uploaded to the server dataset is revealed using the "Edit" interface. Semicolon-separated data populates a single column of a server dataset. As no commas were present, the data is interpreted as having a single column, instead of the four you would expect.
Data stored in this way cannot be retrieved as you would expect using pre-loading.
Special characters not being preserved
If you work with a non-English script, or a Latin alphabet that uses special characters, storing such text in .csv file format might lead to problems displaying it correctly. Either the whole text, or just the special characters will be swapped out for symbols or other characters. This has to do with character encoding. While English characters are supported by older, less flexible encoding standards like ASCII, the broader variety of scripts in use are supported by more modern standards, like UTF-8 which the SurveyCTO platform prefers.
On the left you can see incorrectly encoded French special characters, saved in ASCII format, and on the right, you can see the same text correctly encoded in UTF-8 format.
If you run into encoding problems, SurveyCTO Desktop can help. Try the following to re-encode a .csv file into UTF-8. To do so:
Troubleshooting .csv generation
The first step with troubleshooting .csv file problems is looking at the CSV data in its raw form. We recommend that you open your .csv file in a text editor like Notepad on Windows, or TextEdit on MacOS. While not easily readable when not split across columns and rows, you should be able to identify which separator was used, and whether a character encoding problem exists in the .csv file.
If either problem exists, we suggest you start from the beginning with generating your .csv file.
How to generate CSV data correctly
See below for software-specific advice about generating .csv files, but note that your computer's default settings may have an overriding effect on the delimiter that's used.
Your computer's default settings
Depending on the software you use, and your computer's language and locale settings, .csv files may be generated using semicolons as the delimiter, instead of commas. So, the first thing you'll want to do if you're generating .csv files using desktop software is to check on your computer's system defaults.
On Windows 10, do as follows:
- Go to Control Panel > Time & Language
- In the sidebar, select Region
- Under Related settings on the right, click on Additional date, time & regional settings > Change date, time or number formats (below Region).
- Finally, click on Additional settings, change the List Separator to comma, and make sure your Decimal Symbol is something other than a comma, such as a period.
On MacOS, do as follows:
- Go to System Preferences > Language & Region
- Click on Advanced… > General
- For Number separators, enter a comma for Grouping (we suggest a period for Decimal)
With your computer's default settings correctly configured, you'll need to focus on settings specific to your software.
Google's browser-based spreadsheet software, Google Sheets is probably the easiest and most reliable way to generate compliant .csv files. If you use Google Sheets, the above system default settings are irrelevant, as the locale settings that are relevant are your browser's. However, no matter your locale, Google Sheets should always generate true comma-separated value data, and correctly encode special characters in UTF-8 format.
To download a Google Sheets spreadsheet in .csv format, go to File > Download > Comma separated values.
The desktop version of Excel is governed by the system defaults discussed above on both Windows and MacOS, but you'll also need to select the right option when generating .csv files (as above).
In Microsoft Excel 2016 or higher, go to File > Save as…, and in the File format dropdown menu, pick CSV UTF-8. If you do this after having set the system delimiter, you should get a correctly formatted .csv file.
If you are using an older Microsoft Office version, go to File > Save as…, and in the File format dropdown menu, pick CSV. Then, you should either 1) open the CSV file in a Text editor, like Notepad, and save it with the correct encoding (in the Save As dialog, click the Encoding drop down and select the “UTF-8” option), or 2) use SurveyCTO Desktop to re-encode your CSV file to UTF-8 as above.
|For help importing CSV data into Microsoft Excel, see this article.|
For a free desktop alternative to Excel, you could try LibreOffice. In LibreOffice's Calc application, go to File > Save as, and pick the "Text CSV" File type option.
If you are working on a dataset in Stata that will be your pre-loaded data for a form, you can use the command
export delimited to save your dataset in CSV format. This command contains options that allow you to encode your CSV file correctly (option
encoding()), or specify the delimiter (option
decimalseparator()), if needed. Type help export delimited for more details.
If you happen to use another software package not listed here, we suggest that you Google how to generate .csv files with that software. It is likely to be a common query.
Do you have thoughts on this support article? We'd love to hear them! Feel free to fill out this feedback form.