Revision as of 15:57, 22 August 2013

Introduction

A data set is a collection of variables and any corresponding data values.

Important points about data sets:

Data sets are created in one of three ways: by loading a new file, by duplicating an existing data set, or by generating a new data set from a design of experiments.
You can have any number of data sets loaded in a single Rave session.
Each data set is completely independent of the others: there can be no interactions between data sets. Each graph displays data from a single data set, each optimizer acts on a single data set, and so on.
If a data set contains data, every variable must have the same number of rows of data.
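
The equal-row-count rule can be illustrated with a minimal Python sketch. This is not Rave code: the `DataSet` class and its `add_variable` and `row_count` methods are hypothetical names used only to show the invariant.

```python
class DataSet:
    """Hypothetical container mapping variable names to columns of values."""

    def __init__(self):
        self.columns = {}

    def row_count(self):
        # All columns share one length, so the first column is representative.
        for column in self.columns.values():
            return len(column)
        return 0

    def add_variable(self, name, values):
        values = list(values)
        # Every variable must have the same number of rows of data.
        if self.columns and len(values) != self.row_count():
            raise ValueError(
                f"{name!r} has {len(values)} rows, expected {self.row_count()}"
            )
        self.columns[name] = values
```

Each `DataSet` instance holds its own columns, which mirrors the point that data sets do not interact with one another.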

Variables

A variable is the basic building block of your data sets in Rave. Variables are defined by the following attributes:

Variable Name

Each variable has a name. When you load a data set from a file, the variable names are taken from the first row of the file. Otherwise, Rave will ask you to name variables as they are created.

The following rules apply to variable names:

Variable names cannot contain a tab character
Variable names cannot contain a comma if your data file is comma-delimited. For tab-delimited or xls files, commas are allowed.
Variable names cannot contain the double quotes character: "
Spaces are allowed; any leading/trailing spaces will be removed.

If your data set uses user-supplied functions that Rave must parse (i.e. plain text functions), the following rules also apply:

Variable names must begin with a letter
Only letters, numbers, spaces, and _ are allowable characters (no other symbols are allowed)
Spaces are allowed, as Rave will internally replace these with "_" as needed.
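
The naming rules above can be sketched as a small Python check. This is an illustration only, not Rave's implementation: the function names and the `comma_delimited` and `parsed_functions` flags are hypothetical, "letters" is assumed to mean ASCII letters, and rejecting an empty name is also an assumption.

```python
import re

def clean_name(name):
    """Remove leading and trailing spaces from a variable name."""
    return name.strip(" ")

def is_valid_name(name, comma_delimited=False, parsed_functions=False):
    """Check a name against the rules listed above (hypothetical helper)."""
    name = clean_name(name)
    if not name:
        return False  # assumption: an empty name is not acceptable
    if "\t" in name or '"' in name:
        return False  # no tab characters, no double quotes
    if comma_delimited and "," in name:
        return False  # commas clash with the delimiter in comma-delimited files
    if parsed_functions:
        # Must begin with a letter; only letters, numbers, spaces, and _ after that.
        # Assumption: "letters" means ASCII letters.
        if re.fullmatch(r"[A-Za-z][A-Za-z0-9_ ]*", name) is None:
            return False
    return True

def internal_name(name):
    """Replace spaces with underscores, as done for parsed functions."""
    return clean_name(name).replace(" ", "_")
```

For example, `is_valid_name("2nd stage")` passes the general rules, but the same name fails with `parsed_functions=True` because it does not begin with a letter.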

Variable Types

Each variable may be an:

Class

Each variable has one of the following classes that describes the information it encodes. When you load a new data set, Rave will attempt to determine the class of each variable. Typically numerical variables will begin as "continuous" and string variables will begin as "string". You can change these classes from the Manage Data Sets window.

Continuous numerical variables can take any value between their min and max values.

Integer numerical variables can take only integer values between their min and max values.

Discrete numerical variables can only take values from a user-defined list of allowable values.

Logical variables can only take 0 or 1 values. You can customize how these values appear on graphs, for example "True" vs "False" or "Yes" vs "No".

Text variables are like Discrete variables, but instead of taking numerical values they take any text string. Text variables cannot be used in any operation that performs math.

Note: Currently all dependent variables must be continuous. Because the variable class is only really used for independent variables, this restriction matters little in practice: if the output of a function is discrete, it does no harm for Rave to treat it as continuous.

Allowable Values (Independent Variables Only)

Depending on the class of variable, each independent variable has a limited set of values it is allowed to take. These are enforced whenever Rave lets the user choose a value, for example, by using a slider control.

Note that these limits may not be enforced when working with optimizers from the Optimization Toolbox or the Global Optimization Toolbox.

Continuous variables have defined minimum and maximum values. The variable can take any value within this range. When you load a new data set, these values are set to be the minimum and maximum values found in the data file.

Integer variables have defined minimum and maximum values. The variable can take any integer value within this range. When you load a new data set, these values are set to be the minimum and maximum values found in the data file.

Discrete variables have a list of allowable values. They can only take values from this list.

Logical variables can be true or false. You can optionally force them to always be true or always be false.

Text variables have a list of allowable strings. They can only take values from this list.
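
As a rough sketch of how these per-class limits could be checked, the following Python function tests a candidate value against each class's rule. The function name, its parameters, and the lowercase class labels are hypothetical and chosen for illustration; they do not reflect how Rave stores or enforces limits internally.

```python
def is_allowed(value, var_class, minimum=None, maximum=None, allowed=None):
    """Return True if value is permitted for a variable of the given class."""
    if var_class == "continuous":
        # Any value within the [minimum, maximum] range.
        return minimum <= value <= maximum
    if var_class == "integer":
        # Only whole numbers within the range.
        return float(value).is_integer() and minimum <= value <= maximum
    if var_class in ("discrete", "text"):
        # Only values from the list of allowable values or strings.
        return value in allowed
    if var_class == "logical":
        # 0 or 1 by default; pass allowed=(1,) or allowed=(0,) to force one value.
        options = (0, 1) if allowed is None else allowed
        return value in options
    raise ValueError(f"unknown variable class: {var_class!r}")
```

A slider for an integer variable with limits 0 and 5 would, under this sketch, accept 3 but reject both 2.5 and 6.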

Data and Current Values

Modifying Data Sets

There are several actions you can take in Rave to modify a data set. It will always be apparent when you are performing such an action. Working with graphs, optimizers, or surrogate models will never alter your data set without asking you first.

When you modify a data set, all graphs that use that data set will be automatically updated to reflect the new data.

Certain modifications will reset row colors, selection state, or visibility state. For example if you replace the entire data set, all rows will reset to their default state (visible, unselected, default color).

Some of the most common ways to modify a data set are:

Add new columns by loading a function
Add new rows by appending the results of an optimizer
Change individual data values by editing the main table
Delete rows by right clicking the main table header
Add new rows by appending a design of experiments, or replace all rows in the data set

Working With Missing Data Values

Rave allows your data set to contain "missing values." The following rules generally apply:

Currently, Rave only considers data to be missing for numerical variables. A missing value for a string variable is simply treated as a blank string.
Missing data appears in the main data table as "NaN" colored light gray.
Rave will not let you create a data set that contains a variable (column) in which every value is missing.
- If you modify a variable within Rave so that all of its data values are NaN, Rave may behave unpredictably.
When you create a data set, any "cell" in an otherwise numerical column that is a string or is empty will be converted to a "NaN" missing value.
You can replace missing numerical values in the main table by selecting the NaN cell and typing a number.
To replace a number with NaN, you must type "NaN" in the cell exactly as shown (the entry is case sensitive).
Each visualization displays missing data values differently, but in general a visualization only displays rows of your data set in which every variable shown in that visualization has no missing value.
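
The missing-value rules can be sketched in Python as follows. This is an illustration under stated assumptions, not Rave's implementation: the helper names are hypothetical, data are held as plain lists of floats keyed by variable name, and a cell containing numeric text such as "1.5" is assumed to be parsed as a number.

```python
import math

def to_numeric_column(cells):
    """Convert raw cells of a numerical column; strings and empty cells become NaN."""
    column = []
    for cell in cells:
        try:
            column.append(float(cell))
        except (TypeError, ValueError):
            column.append(math.nan)  # non-numeric or empty cell: missing value
    return column

def all_missing(column):
    """True if every value in the column is missing (such a column is rejected)."""
    return all(math.isnan(value) for value in column)

def complete_rows(columns, displayed):
    """Indices of rows in which every displayed variable has a value."""
    n_rows = len(columns[displayed[0]])
    return [
        i for i in range(n_rows)
        if not any(math.isnan(columns[name][i]) for name in displayed)
    ]
```

Note that `complete_rows` looks only at the variables being displayed, so a row with a missing value in some other variable can still appear in a visualization.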

Functional variable
Random variable

Independent Variables are not calculated by functions. Rave can use the values of independent variables stored in the data table to draw graphs, but since Rave has no way of calculating new values of independent variables, they cannot be used in continuous graphs or in most optimizers. When you load a new data set from a file, each column in the file becomes a new independent variable. After loading a data set, you can modify the values of independent variables by editing the main table.

Dependent Variables are calculated by functions that you have loaded from the Model Tab. Dependent variables can be functions of independent variables, of other dependent variables, or of both. Rave can call the function that calculates a dependent variable's values to generate new data, which allows dependent variables to be used for optimization and continuous graphs. However, you can never directly modify the values of dependent variables; you can only edit the values of the independent variables.

Rave Documentation

Difference between revisions of "Data sets"