If you are unfamiliar with comma-separated value (CSV) data, or if you are struggling to create .csv files that work properly, this is the article for you. Read on to learn:
- What a .csv file is.
- Editing found .csv files.
- Common problems with generating .csv files.
- Tips on how to create working .csv files with popular software.
|The SurveyCTO platform uses .csv data to pre-loaded data into forms, and you can also attach external datasets in CSV format to Data Explorer workbooks to visualize pre-existing data alongside newly collected data.|
What is a .csv file?
A comma-separated values (CSV) file is the most common and basic format for storing data. In a .csv file, a table of values is stored with columns delimited by commas, and with line breaks separating rows. The .csv file format is commonly used to transfer data between software systems, thanks to its simple and consistent format.
On the left you can see a conventional table of data. On the right you can see the same table of data, viewed in CSV format.
Editing found .csv files
Sometimes you'll receive data in CSV format that you need to edit. To edit data in CSV format, first import it into a suitable software program, like Microsoft Excel, Google Sheets, Stata, and similar.
See our documentation on importing CSV data into Excel for guidance. For other software, you'll want to be sure that the importing software recognises a.) the delimiter that's being used in .csv file (e.g. a comma), b.) and that the file encoding format is UTF-8 if you're working in a non-English script, or a Latin alphabet that uses special characters (discussed in more detail below).
Edit the imported data, save your work, and export again into CSV format if required. Continue reading to learn how to avoid common problems when creating .csv files.
Common problems with creating .csv files
Despite being a simple and common file type, there are some common problems SurveyCTO users run into while trying to create .csv files.
Using semicolons instead of commas
Some computers will generate .csv files where values are separated by semicolons (
;) instead of commas (
,). However, SurveyCTO expects true comma-separated data when uploading into the platform, using commas. For example, if you upload a .csv file that uses semicolons into a server dataset, you might see this result:
In the above screenshot, the CSV data uploaded to the server dataset is revealed using the "Edit" interface. Semicolon-separated data populates a single column of a server dataset. As no commas were present, the data is interpreted as having a single column, instead of the four you would expect.
Data stored in this way cannot be retrieved as you would expect using pre-loading.
Special characters not being preserved
If you work with a non-English script, or a Latin alphabet that uses special characters, storing such text in .csv file format might lead to problems displaying it correctly. Either the whole text, or just the special characters will be swapped out for symbols or other characters. This has to do with character encoding. While English characters are supported by older, less flexible encoding standards like ASCII, the broader variety of scripts in use are supported by more modern standards, like UTF-8 which the SurveyCTO platform prefers.
On the left you can see incorrectly encoded French special characters, saved in ASCII format, and on the right, you can see the same text correctly encoded in UTF-8 format.
|If you run into encoding problems, SurveyCTO Desktop can help. Try the following to re-encode a .csv file into UTF-8. To do so:
Troubleshooting .csv generation
The first step with troubleshooting .csv file problems is looking at the CSV data in its raw form. We recommend that you open your .csv file in a text editor like Notepad on Windows, or TextEdit on MacOS. While not easily readable when not split across columns and rows, you should be able to identify which separator was used, and whether a character encoding problem exists in the .csv file.
If either problem exists, we suggest you start from the beginning with generating your .csv file.
How to generate CSV data correctly
See below for software-specific advice about generating .csv files, but note that your computer's default settings may have an overriding effect on the delimiter that's used.
Your computer's default settings
Depending on the software you use, and your computer's language and locale settings, .csv files may be generated using semicolons as the delimiter, instead of commas. So, the first thing you'll want to do if you're generating .csv files using desktop software is to check on your computer's system defaults.
On Windows 10, do as follows:
- Go to Control Panel > Time & Language
- In the sidebar, select Region
- Under Related settings on the right, click on Additional date, time & regional settings > Change date, time or number formats (below Region).
- Finally, click on Additional settings, change the List Separator to comma, and make sure your Decimal Symbol is something other than a comma, such as a period.
On MacOS, do as follows:
- Go to System Preferences > Language & Region
- Click on Advanced… > General
- For Number separators, enter a comma for Grouping (we suggest a period for Decimal)
With your computer's default settings correctly configured, you'll need to focus on settings specific to your software.
Google's browser-based spreadsheet software, Google Sheets is probably the easiest and most reliable way to generate compliant .csv files. If you use Google Sheets, the above system default settings are irrelevant, as the locale settings that are relevant are your browser's. However, no matter your locale, Google Sheets should always generate true comma-separated value data, and correctly encode special characters in UTF-8 format.
To download a Google Sheets spreadsheet in .csv format, go to File > Download > Comma separated values.
The desktop version of Excel is governed by the system defaults discussed above on both Windows and MacOS, but you'll also need to select the right option when generating .csv files (as above).
In Microsoft Excel 2016 or higher, go to File > Save as…, and in the File format dropdown menu, pick CSV UTF-8. If you do this after having set the system delimiter, you should get a correctly formatted .csv file.
If you are using an older Microsoft Office version, go to File > Save as…, and in the File format dropdown menu, pick CSV. Then, you should either 1) open the CSV file in a Text editor, like Notepad, and save it with the correct encoding (in the Save As dialog, click the Encoding drop down and select the “UTF-8” option), or 2) use SurveyCTO Desktop to re-encode your CSV file to UTF-8 as above.
|For help importing CSV data into Microsoft Excel, see this article.|
For a free desktop alternative to Excel, you could try LibreOffice. In LibreOffice's Calc application, go to File > Save as, and pick the "Text CSV" File type option.
If you are working on a dataset in Stata that will be your pre-loaded data for a form, you can use the command
export delimited to save your dataset in CSV format. This command contains options that allow you to encode your CSV file correctly (option
encoding()), or specify the delimiter (option
decimalseparator()), if needed. Type help export delimited for more details.
If you happen to use another software package not listed here, we suggest that you Google how to generate .csv files with that software. It is likely to be a common query.
Do you have thoughts on this support article? We'd love to hear them! Feel free to fill out this feedback form.