Data sets
Introduction
A data set is a collection of variables and any corresponding data values.
Important points about data sets:
- Data sets are created in one of three ways: by loading a new file, by duplicating an existing data set, or by creating a new data set from a design of experiments.
- You can have any number of data sets loaded in a single Rave session.
- However, each data set is completely independent of the others. There can be no interactions between data sets. Each graph displays data from a single data set, each optimizer acts on a single data set, and so on.
- If a data set contains data, every variable must have the same number of rows of data.
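The equal-row-count rule can be illustrated with a minimal Python sketch. This is not Rave's implementation; the `DataSet` class and its `add_variable` method are hypothetical names used only to show the idea.

```python
from dataclasses import dataclass, field


@dataclass
class DataSet:
    """Hypothetical container mapping variable names to columns of values."""
    columns: dict = field(default_factory=dict)

    def row_count(self):
        # All columns share one length, so the first column is representative.
        for values in self.columns.values():
            return len(values)
        return 0

    def add_variable(self, name, values):
        # Enforce the rule that every variable has the same number of rows.
        if self.columns and len(values) != self.row_count():
            raise ValueError(
                f"{name!r} has {len(values)} rows, expected {self.row_count()}"
            )
        self.columns[name] = list(values)


ds = DataSet()
ds.add_variable("span", [10.0, 12.0, 14.0])
ds.add_variable("weight", [100.0, 120.0, 150.0])
try:
    ds.add_variable("cost", [1.0, 2.0])  # wrong length, so it is rejected
except ValueError as err:
    rejected = str(err)
```

In this sketch the mismatched column is refused outright, so the data set never enters an inconsistent state.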
Variables
A variable is the basic building block of your data sets in Rave. Variables are defined by the following attributes:
Variable Name
Each variable has a name. When you load a data set from a file, the variable names are taken from the first row of the file. Otherwise, Rave will ask you to name variables as they are created. A variable name can be any string that does not contain a tab.
See also: Renaming Variables
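As a rough illustration of how names come from the first row of a file, the Python sketch below reads tab-delimited text. The tab-delimited layout is an assumption inferred from the rule that names cannot contain a tab, and `read_variables` is a hypothetical helper, not part of Rave.

```python
import csv
import io


def read_variables(text):
    """Return {variable name: list of raw string values} from tab-delimited text."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    names, data = rows[0], rows[1:]
    # Each column becomes one variable, named by the header row.
    return {name: [row[i] for row in data] for i, name in enumerate(names)}


# Names may contain spaces; only a tab would split a name in two.
sample = "Wing Span\tMaterial\n10.5\tsteel\n12.0\taluminum\n"
variables = read_variables(sample)
```

Here `variables` holds two columns, "Wing Span" and "Material", each with two values still stored as strings.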
Dependency
Each variable may be independent or dependent:
Independent Variables are not calculated by functions. Rave can use the values of independent variables stored in the data table to draw graphs, but since Rave has no way of calculating new values of independent variables, they cannot be used in continuous graphs or in most optimizers. When you load a new data set from a file, each column in the file becomes a new independent variable. After loading a data set, you can modify the values of independent variables by editing the main table.
Dependent Variables are calculated by functions that you have loaded from the Model Tab. Dependent variables can be functions of independent variables, of other dependent variables, or of both. Rave can call the function that calculates a dependent variable's values to generate new data, allowing dependent variables to be used for optimization and continuous graphs. However, you can never directly modify the values of dependent variables; you can only edit the values of the independent variables they are calculated from.
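The relationship between the two kinds of variable can be sketched in Python. This is only an illustration under stated assumptions: the `evaluate` helper, the `area` function, and the variable names are hypothetical and do not reflect how Rave calls model functions.

```python
def area(row):
    # Hypothetical model function of two independent variables.
    return row["width"] * row["height"]


def evaluate(independent, functions):
    """Compute every dependent column, row by row, from the independent columns."""
    n_rows = len(next(iter(independent.values())))
    table = {name: list(values) for name, values in independent.items()}
    for name in functions:
        table[name] = []
    for i in range(n_rows):
        row = {name: independent[name][i] for name in independent}
        # Functions run in insertion order, so a later dependent
        # variable can use an earlier one as an input.
        for name, func in functions.items():
            row[name] = func(row)
            table[name].append(row[name])
    return table


independent = {"width": [2.0, 3.0], "height": [4.0, 5.0]}
functions = {"area": area, "double_area": lambda row: 2 * row["area"]}

# Dependent values change only when an independent value is edited
# and the functions are re-evaluated.
independent["width"][0] = 10.0
table = evaluate(independent, functions)
```

After the edit, the first row of `area` is recomputed from the new `width`, and `double_area` follows from it.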
Class
Each variable has one of the following classes, which describes the information it encodes. When you load a new data set, Rave will attempt to determine the class of each variable. Typically, numerical variables will begin as "continuous" and string variables will begin as "string". You can change these classes from the Manage Data Sets window.
Continuous numerical variables can take any value between their min and max values.
Integer numerical variables can take only integer values between their min and max values.
Discrete numerical variables can only take values from a user-defined list of allowable values.
Logical variables can only take 0 or 1 values. You can customize how these values appear on graphs, for example "True" vs "False" or "Yes" vs "No".
Text variables are like Discrete variables, but instead of taking numerical values they take any text string. Operations that perform math cannot be used with text variables.
Note: Currently all dependent variables must be continuous. This matters little in practice, because the variable class is only really used by independent variables. (If the output of a function is discrete, it doesn't matter that Rave treats it as continuous.)
See also: Changing variable classes
Allowable Values (Independent Variables Only)
Depending on the class of variable, each independent variable has a limited set of values it is allowed to take. These are enforced whenever Rave lets the user choose a value, for example, by using a slider control.
Note that these limits may not be enforced when working with optimizers from the Optimization Toolbox or the Global Optimization Toolbox.
Continuous variables have defined minimum and maximum values. The variable can take any value within this range. When you load a new data set, these values are set to be the minimum and maximum values found in the data file.
Integer variables have defined minimum and maximum values. The variable can take any integer value within this range. When you load a new data set, these values are set to be the minimum and maximum values found in the data file.
Discrete variables have a list of allowable values. They can only take values from this list.
Logical variables can be true or false. You can optionally force them to always be true or always be false.
Text variables have a list of allowable strings. They can only take values from this list.
See also: Changing variables allowable values
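The allowable-value rules for each class can be summarized in a short Python sketch. The function names, the class labels used as strings, and the way limits are passed in are all assumptions made for illustration; Rave's own checks may differ.

```python
def is_allowed(value, var_class, minimum=None, maximum=None, allowed=None):
    """Check a candidate value against a variable's allowable values."""
    if var_class == "continuous":
        return minimum <= value <= maximum
    if var_class == "integer":
        return float(value).is_integer() and minimum <= value <= maximum
    if var_class in ("discrete", "text"):
        return value in allowed
    if var_class == "logical":
        # `allowed` can force the variable to a single value, e.g. {True}.
        return value in (allowed if allowed is not None else {False, True})
    raise ValueError(f"unknown class: {var_class}")


def default_range(values):
    # On load, min and max default to the extremes found in the data.
    return min(values), max(values)


lo, hi = default_range([3.2, 7.5, 4.1])
in_range = is_allowed(5.0, "continuous", lo, hi)
fractional = is_allowed(4.5, "integer", 1, 10)
known_text = is_allowed("steel", "text", allowed={"steel", "aluminum"})
forced_true = is_allowed(False, "logical", allowed={True})
```

A check like this would run wherever a value is chosen interactively, for example when a slider moves.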
Data and Current Values
Modifying Data Sets
There are several actions you can take in Rave to modify a data set. It will always be apparent when you are performing such an action. Working with graphs, optimizers, or surrogate models will never alter your data set without asking you first.
When you modify a data set, all graphs that use that data set will be automatically updated to reflect the new data.
Certain modifications will reset row colors, selection state, or visibility state. For example, if you replace the entire data set, all rows will reset to their default state (visible, unselected, default color).
Some of the most common ways to modify a data set are:
- Add new columns by loading a function
- Add new rows by appending the results of an optimizer
- Change individual data values by editing the main table
- Delete rows by right-clicking the main table header
- Add new rows by appending a design of experiments, or replace all rows in the data set