Difference between revisions of "Data Set File Formats" - Rave Documentation

Revision as of 10:57, 24 July 2014

Rave can read .txt, .csv, .xls, and .xlsx files. The file must be formatted as a "flat file database," which simply means that the first row holds the variable names and every remaining row holds one data record. Columns must be separated by a comma or a tab (or be separate columns in Excel), and every row of the file must have the same number of columns. A missing value should be entered as "nan", not left blank.
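The layout rules above can be checked before a file is handed to Rave. The following is a minimal Python sketch, not part of Rave: the function name `normalize_flat_file` is hypothetical, and it assumes a comma- or tab-delimited text file whose contents have already been read into a string. It verifies that every row has as many columns as the header row and writes "nan" into any blank cell.

```python
import csv
import io


def normalize_flat_file(text, delimiter=","):
    """Check a flat file's shape and fill blank cells with "nan".

    Hypothetical helper for illustration; it is not a Rave function.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if len(rows) < 2:
        raise ValueError("expected a header row plus at least one data row")

    # Every data row must have the same number of columns as the header.
    width = len(rows[0])
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != width:
            raise ValueError(
                f"line {line_no} has {len(row)} columns, expected {width}"
            )

    # Replace blank cells in the data rows with the literal text "nan".
    cleaned = [rows[0]] + [
        [cell if cell.strip() else "nan" for cell in row] for row in rows[1:]
    ]

    out = io.StringIO()
    csv.writer(out, delimiter=delimiter, lineterminator="\n").writerows(cleaned)
    return out.getvalue()
```

For a tab-separated file, pass `delimiter="\t"`. A file whose rows have differing column counts raises a `ValueError` naming the first offending line, so the problem can be fixed in the source file before loading.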

Fast Load

When creating a data set, you can use the "fast load" option (turned on by default) if your data meets the following requirements:

  • The format of each column can be inferred from the first row of data (the 2nd row of the file, after the column headers). For example, if the value in the first data row of a given column is a number, then every entry in that column must be a number.
  • The data set has no missing values or NaNs.

If your data does not meet these requirements, uncheck the "Fast Load" box when creating the data set. As its name implies, turning off fast load will significantly increase the amount of time needed for Rave to load your data file.

Note: The "Fast Load" check box affects data files loaded by the following buttons in the Manage data GUI: "Create this Data Set", "Append Data", and "Replace Data".
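To decide whether "Fast Load" can stay checked, the two requirements listed in this section can be tested on a text file in advance. The sketch below is an approximation under stated assumptions: how Rave itself infers column formats is not documented here, so the code simply treats a cell as numeric when Python's `float()` accepts it, and the function name `fast_load_problems` is hypothetical. It returns a list of problems; an empty list suggests the file meets both requirements.

```python
import csv
import io


def _is_number(cell):
    """Return True if the cell parses as a number (an assumed stand-in for Rave's rule)."""
    try:
        float(cell)
        return True
    except ValueError:
        return False


def fast_load_problems(text, delimiter=","):
    """List the cells that break either Fast Load requirement.

    Hypothetical helper for illustration; it is not a Rave function.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    header, data = rows[0], rows[1:]
    if not data:
        return ["no data rows"]

    # Requirement 1: the first data row fixes the format of each column.
    first_is_number = [_is_number(cell) for cell in data[0]]

    problems = []
    for line_no, row in enumerate(data, start=2):
        for name, cell, expect_number in zip(header, row, first_is_number):
            value = cell.strip()
            # Requirement 2: no missing values or NaNs anywhere.
            if value == "" or value.lower() == "nan":
                problems.append(f"line {line_no}, column {name}: missing value")
            elif _is_number(value) != expect_number:
                problems.append(
                    f"line {line_no}, column {name}: type differs from first data row"
                )
    return problems
```

If the returned list is not empty, either fix the listed cells or uncheck "Fast Load" when creating the data set.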

Working with Excel Files

  • If you load an Excel file that contains multiple worksheets, each (non-empty) worksheet will be loaded as a separate data set. The data sets will be named after the corresponding worksheets.
  • Excel files take longer to load than plain text files, so if you're working with large data sets you might want to use .txt files.