Data Set File Formats
Latest revision as of 16:05, 17 October 2014
Rave can read .txt, .csv, .xls, and .xlsx files. The file must be formatted as a "flat file database," which simply means it has the variable names in the first row and data records in all remaining rows. Columns must be separated by commas or tabs (or be separate columns in Excel), and every row of the file must have the same number of columns. Missing data should be listed as "nan", not simply left blank.
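As a rough illustration of the layout described above, the following Python sketch writes a small tab-delimited flat file and fills missing entries with "nan". The helper `write_flat_file`, the variable names, and the records are made up for this example; none of them are part of Rave.

```python
import csv
import io


def write_flat_file(handle, names, records, delimiter="\t"):
    """Write variable names in the first row, then one data record per row."""
    writer = csv.writer(handle, delimiter=delimiter, lineterminator="\n")
    writer.writerow(names)
    for record in records:
        if len(record) != len(names):
            raise ValueError("every row must have the same number of columns")
        # Missing entries (None or empty string) are written as "nan",
        # never left blank.
        writer.writerow(["nan" if value in (None, "") else value for value in record])


# Hypothetical example data (assumption: not taken from any real Rave file).
names = ["Wing Area", "Span", "Material"]
records = [
    [12.5, 8.0, "aluminum"],
    [14.1, None, "composite"],
]

buffer = io.StringIO()
write_flat_file(buffer, names, records)
flat_text = buffer.getvalue()
print(flat_text)
```

To produce a .csv file instead, the same helper could be called with `delimiter=","`, provided none of the variable names contain commas.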
Allowable Variable Names
In general, your variable names can be any string of text that doesn't contain whatever delimiter is used for your data file (e.g. in a .csv file your variable names cannot contain commas, but in other file types commas are ok).
There is one additional restriction at present:
- Your variable names cannot be enclosed in angle brackets: <Variable Name>. However, angle brackets may otherwise appear in your names. For example, "this<that" is a valid variable name, as is "thisthat>", but "<thisthat>" is not.
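The naming rules above can be checked before loading a file. The sketch below is only an approximation of those rules: the function `is_valid_variable_name` is hypothetical, and it assumes "enclosed in angle brackets" means the name starts with "<" and ends with ">".

```python
def is_valid_variable_name(name, delimiter=","):
    """Return True if `name` passes the two rules described above."""
    # Rule 1: the name must not contain the file's delimiter.
    if delimiter in name:
        return False
    # Rule 2 (assumed interpretation): a name that both starts with "<"
    # and ends with ">" counts as enclosed in angle brackets.
    if name.startswith("<") and name.endswith(">"):
        return False
    return True


# The examples from the text, plus one name containing a comma.
for candidate in ["this<that", "thisthat>", "<thisthat>", "a,b"]:
    print(repr(candidate), is_valid_variable_name(candidate))
```

For a tab-delimited .txt file, pass `delimiter="\t"`, in which case a name such as "a,b" is accepted.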
Fast Load
When creating a data set, you can use the "fast load" option (turned on by default) if your data meets the following requirements:
- The format of each column can be inferred from the first row of data (the 2nd row of the file, after the column headers). That is, if the value in the first row of a given column is a number, then every entry in that column is a number.
- The data set has no missing values or NaNs.
Using fast load reduces the number of checks Rave does when loading your data. This makes the loading process significantly faster, but it may cause errors (such as only loading a portion of the data file) if your data does not meet the above requirements.
If your data does not meet these requirements, uncheck the "Fast Load" box when creating the data set. As its name implies, turning off fast load will significantly increase the amount of time needed for Rave to load your data file.
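If you are unsure whether a text file meets the fast load requirements, a pre-check along the following lines may help. This is a sketch, not Rave's own logic: the function `meets_fast_load_requirements` is hypothetical, it only distinguishes numeric from non-numeric columns, and it treats both blank cells and "nan" as missing values.

```python
import csv
import io
import math


def _parse_number(text):
    """Return the float value of `text`, or None if it is not numeric."""
    try:
        return float(text)
    except ValueError:
        return None


def meets_fast_load_requirements(handle, delimiter=","):
    """Check both fast load requirements on a delimited text file."""
    reader = csv.reader(handle, delimiter=delimiter)
    next(reader, None)  # skip the row of variable names
    first_row_numeric = None
    for row in reader:
        cells = [cell.strip() for cell in row]
        values = [_parse_number(cell) for cell in cells]
        # Requirement 2: no missing values or NaNs.
        for cell, value in zip(cells, values):
            if cell == "" or (value is not None and math.isnan(value)):
                return False
        # Requirement 1: every row has the same numeric/text pattern
        # as the first row of data.
        numeric = [value is not None for value in values]
        if first_row_numeric is None:
            first_row_numeric = numeric
        elif numeric != first_row_numeric:
            return False
    return True


# Hypothetical file contents for illustration.
good = "x,label\n1,a\n2,b\n"
mixed = "x,label\n1,a\nfoo,b\n"
missing = "x,label\n1,a\nnan,b\n"

for text in (good, mixed, missing):
    print(meets_fast_load_requirements(io.StringIO(text)))
```

If the check returns False, unchecking the "Fast Load" box is the safer choice.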
Note: The "Fast Load" check box affects data files loaded by the following buttons in the Manage data GUI: "Create this Data Set", "Append Data", and "Replace Data".
Working with Excel Files
- If you load an Excel file that contains multiple worksheets, each (non-empty) worksheet will be loaded as a separate data set. The data sets will be named after the corresponding worksheets.
- Excel files take longer to load than plain text files, so if you're working with large data sets you might want to use .txt files.