Data Set File Formats
Rave can read .txt, .csv, .xls, and .xlsx files. The file must be formatted as a "flat file database," which simply means it has the variable names in the first row and data records in all remaining rows. Columns must be separated by commas or tabs (or occupy separate columns in Excel), and every row of the file must have the same number of columns. If a value is missing, enter it as "nan" rather than leaving the cell blank.
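The formatting rules above can be checked before loading a file. The following is a minimal Python sketch, not part of Rave; the function name check_flat_file is hypothetical, and it assumes a comma- or tab-delimited text file whose contents have already been read into a string.

```python
import csv
import io


def check_flat_file(text, delimiter=","):
    """Return a list of formatting problems found in delimited text.

    Hypothetical helper (not a Rave function). An empty list means the
    text has a header row, a consistent column count, and no blank cells.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return ["file is empty"]

    problems = []
    n_cols = len(rows[0])  # the header row defines the expected column count
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != n_cols:
            problems.append(
                f"line {line_no}: expected {n_cols} columns, found {len(row)}"
            )
        for col_no, cell in enumerate(row, start=1):
            if cell.strip() == "":
                problems.append(
                    f"line {line_no}, column {col_no}: blank cell (use 'nan')"
                )
    return problems
```

For a tab-delimited file, pass delimiter="\t". Any message in the returned list points to a line that should be corrected before the file is loaded.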
When creating a data set, you can use the "fast load" option (turned on by default) if your data meets the following requirements:
- The format of each column can be inferred from the first row of data (the 2nd row of the file, after the column headers). That is, if the value in the first data row of a given column is a number, then every entry in that column is a number.
- The data set has no missing values or NaNs.
Using fast load reduces the number of checks Rave does when loading your data. This makes the loading process significantly faster, but it may cause errors (such as only loading a portion of the data file) if your data does not meet the above requirements.
If your data does not meet these requirements, uncheck the "Fast Load" box when creating the data set. As its name implies, turning off fast load will significantly increase the amount of time needed for Rave to load your data file.
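To decide whether a text file is safe to load with fast load, the two requirements above can be tested ahead of time. The sketch below is an illustration only: the function names are hypothetical, and it assumes that a column's format is either "number" or "not a number," which may be simpler than the checks Rave itself performs.

```python
import csv
import io
import math


def is_number(cell):
    """True if the cell parses as a floating-point number."""
    try:
        float(cell)
        return True
    except ValueError:
        return False


def is_missing(cell):
    """True if the cell is blank or parses as NaN."""
    if cell.strip() == "":
        return True
    try:
        return math.isnan(float(cell))
    except ValueError:
        return False


def fast_load_ok(text, delimiter=","):
    """Hypothetical check of the two fast load requirements."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    data = rows[1:]  # skip the header row
    if not data:
        return False  # no data rows to infer column formats from
    # Requirement: no missing values or NaNs anywhere in the data.
    if any(is_missing(cell) for row in data for cell in row):
        return False
    # Requirement: every row matches the formats seen in the first data row.
    first_row_formats = [is_number(cell) for cell in data[0]]
    return all(
        [is_number(cell) for cell in row] == first_row_formats
        for row in data[1:]
    )
```

If fast_load_ok returns False for a file, that suggests unchecking the "Fast Load" box before creating the data set.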
Note: The "Fast Load" checkbox affects data files loaded by the following buttons in the Manage data GUI: "Create this Data Set", "Append Data", and "Replace Data".
Working with Excel Files
- If you load an Excel file that contains multiple worksheets, each (non-empty) worksheet will be loaded as a separate data set. The data sets will be named after the corresponding worksheets.
- Excel files take longer to load than plain text files, so if you're working with large data sets you might want to use .txt files.