Difference between revisions of "Data Set File Formats"
Revision as of 09:53, 24 July 2014
Rave can read .txt, .csv, .xls, and .xlsx files. The file must be formatted as a "flat file database," which simply means it has the variable names in the first row and data records in all remaining rows. Columns must be separated by commas or tabs (or be separate columns in Excel), and every row of the file must have the same number of columns. Missing values should be entered as "nan", not left blank.
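The formatting rules above can be checked before loading a file into Rave. The following is a minimal sketch in Python, not part of Rave itself; the function name check_flat_file is hypothetical, and it assumes a comma-separated file whose contents have already been read into a string.

```python
import csv
import io


def check_flat_file(text, delimiter=","):
    """Return a list of formatting problems found in delimited text.

    An empty list means the text has a header row, every data row has
    the same number of columns as the header, and no cell is blank.
    (Hypothetical helper for illustration; not a Rave function.)
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return ["file is empty"]

    header = rows[0]
    problems = []
    # Data rows start on line 2 of the file, after the header row.
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(
                f"row {line_no}: expected {len(header)} columns, found {len(row)}"
            )
        for name, value in zip(header, row):
            if value.strip() == "":
                problems.append(
                    f"row {line_no}: blank value in column '{name}' (use nan)"
                )
    return problems
```

For a tab-separated .txt file, pass delimiter="\t". A blank cell is reported so that it can be replaced with "nan" before the file is loaded.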
Fast Load
When creating a data set, you can use the "fast load" option (turned on by default) if your data meets the following requirements:
- The format of each column can be inferred from the first row of data (the 2nd row of the file, after the column headers). That is, if the value in the first data row of a given column is a number, then every entry in that column is a number.
- The data set has no missing values or NaNs.
If your data does not meet these requirements, uncheck the "Fast Load" box when creating the data set. As its name implies, turning off fast load will significantly increase the amount of time needed for Rave to load your data file.
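The two fast load requirements can be tested ahead of time. The sketch below is an illustration under stated assumptions, not Rave's own logic: the function names are hypothetical, a "number" is taken to be anything Python's float() accepts, and both blank cells and "nan" count as missing. Rave's actual type inference may differ.

```python
import csv
import io


def is_missing(value):
    """Treat blank cells and 'nan' (any case) as missing values."""
    cleaned = value.strip().lower()
    return cleaned == "" or cleaned == "nan"


def is_number(value):
    """Assumption: a value is numeric if float() can parse it."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def fast_load_ok(text, delimiter=","):
    """Return True if the delimited text meets both fast load requirements."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    data = rows[1:]  # skip the header row
    if not data:
        return True  # no data rows, so nothing violates the requirements

    # Requirement 2: no missing values or NaNs anywhere in the data.
    if any(is_missing(value) for row in data for value in row):
        return False

    # Requirement 1: every row has the same numeric/text pattern
    # as the first data row.
    first_row_pattern = [is_number(value) for value in data[0]]
    return all(
        [is_number(value) for value in row] == first_row_pattern
        for row in data[1:]
    )
```

If fast_load_ok returns False for a file, that suggests unchecking the "Fast Load" box when creating the data set.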
Working with Excel Files
- If you load an Excel file that contains multiple worksheets, each (non-empty) worksheet will be loaded as a separate data set. The data sets will be named after the corresponding worksheets.
- Excel files take longer to load than plain text files, so if you're working with large data sets you might want to use .txt files.