Metadata file format

Latest revision as of 20:00, 12 November 2011

Each of the eight metadata types described below is stored in the .rvm file in a section beginning with the line %#KEYWORD. The eight keywords are:

1 %#FUNCTIONS - Functions that act on this data set.
2 %#RANGES - Variable Classes and List of allowable values
3 %#CONSTRAINTS - Constraints
4 %#CURRENTPOINT - Current point
5 %#TARGETSANDPREFERENCES - Target values and optimization info
6 %#COLORS - Variable Colors
7 %#ROWSTATE - Row Colors, Visibility, and Selection
8 %#POINTSOFINTEREST - Points of Interest (coming soon)
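
As an illustration of this layout, the following minimal Python sketch groups the lines of an .rvm file by the %#KEYWORD line that precedes them. It is not Rave's own loader. The function name is hypothetical, and two behaviors are assumptions rather than documented facts: blank lines are skipped, and anything after the keyword on the %# line is ignored.

```python
def split_rvm_sections(text):
    """Group the lines of an .rvm file by the %#KEYWORD line that precedes them."""
    sections = {}
    current = None
    for line in text.splitlines():
        if line.startswith("%#"):
            # Keep only the keyword itself (assumption: trailing text is ignored).
            parts = line[2:].split()
            current = parts[0] if parts else ""
            sections[current] = []
        elif current is not None and line.strip():
            # Assumption: blank lines carry no meaning and can be dropped.
            sections[current].append(line)
    return sections


# A tiny two-section file, built inline for illustration.
example = "%#RANGES\nx\tcontinuous\t0\t10\n%#CONSTRAINTS\nx-5\n"
sections = split_rvm_sections(example)
```

Each value in the returned dictionary is the list of raw rows for that section, still to be split on the delimiter.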

%#FUNCTIONS - Functions that act on this data set.

The FUNCTIONS section contains a list of all functions that act on this data set. Loading a .rvm file with a functions section is the same as loading each of these functions using the model tab, but the rvm file completely automates the process without asking for any user input.

Each variable that is an output of a function appears on a different row in this section. This section has no “header” row: the first variable appears on the first row. The format for each row is (with a tab or comma between each entry):

1) Variable name as you want it to appear in Rave (must obey normal MATLAB naming rules, except that it may contain spaces). If this variable is considered an “objective function” (i.e., if its value is a function of preferences and targets), precede its name with an asterisk. Example: *MyObjective.

2) Full path to the function file. This is either a .m or a .txt function. If you created a function by pasting text in or typing manually into the function editor, this will point to the .m file that was created in your Rave default directory.

3) The output of the function that corresponds to the variable named in (1). I.e., if your function has 5 outputs and this variable is the third output, this value will be 3.

4-n) The names of all variables that are inputs to this function. If the file specified in (2) is a .txt file, you can omit these (the variables will be determined by parsing the text file). If the file in (2) is a .m file, these are required. IMPORTANT: RAVE only supports feed-forward function analysis, so this list can only include variables that either appear as columns in your initial data set, or appear as the first entry in a previous row in this file. Although in theory you could list every previous variable here, that leads to excessive function evaluations within RAVE, because each input that is itself a function will also be evaluated each time this function is evaluated (if all your functions are lightning fast, that might not be a big deal).

Note that if you have a single function that exports 5 variables, and you want to include all of them, you will have 5 rows where entries 1 and 3 are different for each row, and 2 and 4-n are the same.

Important: The FUNCTIONS section should always be listed first in the .rvm file so that the remaining sections can refer to variables defined in the FUNCTIONS section. If you don’t have a FUNCTIONS section, the remaining sections can only refer to variables included in your initial data set (or that have otherwise been created before loading the metadata).
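
The row layout described above can be sketched in Python as follows. This is only an illustration, not part of Rave: the helper name, the file path, and the variable names are all hypothetical. It assumes the tab delimiter, and it shows how one function with several outputs expands into one row per output, with entries 1 and 3 changing and the rest repeated.

```python
def function_rows(function_path, outputs, inputs, objectives=()):
    """Return one tab-separated FUNCTIONS row per output variable.

    outputs lists the variable names in the order the function returns them;
    names found in objectives get the leading asterisk.
    """
    rows = []
    for position, name in enumerate(outputs, start=1):
        label = "*" + name if name in objectives else name
        fields = [label, function_path, str(position)] + list(inputs)
        rows.append("\t".join(fields))
    return rows


# Hypothetical function file and variable names, for illustration only.
rows = function_rows("C:/rave/wing.m", ["Lift", "Drag"], ["Span", "Chord"],
                     objectives=["Drag"])
```

Here `rows` holds two lines: the first names Lift as output 1, the second names *Drag as output 2, and both repeat the same path and the inputs Span and Chord.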

%#RANGES - Variable Classes and List of allowable values

Section format: Each variable appears in its own row, which has the following fields (separated by the delimiter). If you omit any variables, they will be treated using the default Rave settings.

1) Variable name (exactly as it appears in original data file)

2) Variable class, choose from: constant, continuous, discrete, logical, integer, or string

3-n) Remaining columns contain numerical values that define the allowable values for this variable. Depending on the data type specified in (2), the interpretation of these values is different. If the variable named in (1) is a function output as defined by the FUNCTIONS section, the values listed here don’t define the allowable values, but define the min/max range of variability, which will be used to set initial ranges on things like profilers, contours, and colormaps.

Interpretation of columns 3-n is:

For constant variables: the first value listed defines the value of the variable. Any other values in the row have no effect.
For continuous or integer variables: the min/max of the values that appear here define the min/max allowable values for the variable. All other values in the row have no effect.
For discrete or string variables: Each unique entry in the row is an allowed value for the discrete variable. Duplicate entries have no effect.
For logical variables: If all entries in the row are 0, this variable can only be “off”; if all entries are 1, this variable can only be “on”. Otherwise, the values are ignored and the variable can be on or off. Duplicate entries have no effect.

For variables modeled by functions: the min/max values that appear in the row define the min/max “known” values for the variable. (All other values in the row have no effect.) These min/max values are used as the default limits when preparing contour plots, colormaps, prediction profilers, etc. However, if new values outside this range are found while creating those graphs (for example, by using an optimizer), the ranges will automatically be widened to encompass the newly discovered bounds.
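
The per-class rules above can be summarized in a short Python sketch. It is an illustration of the interpretation described here, not Rave's implementation. The function name and the returned dictionary layout are hypothetical, and the values are assumed to have already been parsed into numbers.

```python
def interpret_range(var_class, values):
    """Summarize columns 3-n of a RANGES row according to the variable class."""
    if var_class == "constant":
        return {"value": values[0]}                # only the first entry matters
    if var_class in ("continuous", "integer"):
        return {"min": min(values), "max": max(values)}
    if var_class in ("discrete", "string"):
        # Unique entries in first-seen order; duplicates have no effect.
        return {"allowed": list(dict.fromkeys(values))}
    if var_class == "logical":
        if values and all(v == 0 for v in values):
            return {"allowed": [0]}                # can only be "off"
        if values and all(v == 1 for v in values):
            return {"allowed": [1]}                # can only be "on"
        return {"allowed": [0, 1]}                 # otherwise: on or off
    raise ValueError("unknown variable class: " + var_class)
```

For a function output, the same min/max summary used for continuous variables would describe the initial "known" range rather than a hard limit.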

%#CONSTRAINTS - Constraints

This section contains constraints that act on the data set and/or variables modeled by functions defined in the FUNCTIONS section of this rvm file. Each constraint appears on its own row, and must be of the form (expression)<=0. Note that RAVE does not currently distinguish between <= and <. Each row consists of the following entries, separated by tabs:

(1) The constraint expression, excluding the “<=0”. I.e., just the left hand side of the constraint inequality. Only variables in the initial data set or listed in the first column of the .rvefun file can be included in this expression.

(2) (optional) The constraint color, which will be used to draw this constraint whenever it appears on a graph. Must be in MATLAB color format, including the square brackets (e.g., “[1,0,0]” for red). This entry may be omitted for any or all constraints, in which case they will be drawn using the default color.
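
Because every constraint must be written as (expression)<=0, a limit such as Weight <= 1500 has to be rearranged before it goes into this section. The Python sketch below shows one way to do that. The helper and the variable names are hypothetical, the tab delimiter is assumed, and the expression syntax Rave accepts is taken to be ordinary arithmetic on variable names.

```python
def constraint_row(lhs, rhs="0", color=None):
    """Rewrite 'lhs <= rhs' as the left-hand side of '(expression) <= 0'."""
    expression = lhs if rhs == "0" else "(" + lhs + ")-(" + rhs + ")"
    fields = [expression]
    if color is not None:
        # Optional second entry: bracketed color vector, e.g. [1,0,0] for red.
        fields.append("[" + ",".join(str(c) for c in color) + "]")
    return "\t".join(fields)


# Hypothetical variable names, for illustration only.
weight_limit = constraint_row("Weight", "1500", color=(1, 0, 0))
stress_limit = constraint_row("Stress-Limit")
```

The first call yields the expression (Weight)-(1500) followed by a red color entry; the second is already in the required form and uses the default color.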

%#CURRENTPOINT - Current point

This section contains the definition of the current point, used as the starting point for derivative profilers, contour plots, etc. This section MUST contain a value for each variable that appears in the initial data set. If values are also included for other variables (such as those that are outputs of functions defined in the FUNCTIONS section of the rvm file), those values will have no effect.

Section format: First row contains all variable names included in the initial data set. You MUST include ALL of those variables.

Second row contains the current value for each variable named in the first row. If any of the variables are “strings”, this value should NOT be the string itself, but its index when all the allowable strings (as defined in the RANGES section of the rvm file) are sorted alphabetically. For example, if a string variable can have values a, b, c, or d, and the “current point” value is “b”, it should be recorded in this file as “2”, not “b”. If there is more than one row of values, multiple analyses will be created. (See note at end of this section.)
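
The string-to-index rule can be sketched in Python as below. The helper names and the example variables are hypothetical. The index is counted from 1, following the example above where “b” among a, b, c, d is recorded as 2. Note that Python's sort is case-sensitive, and whether Rave orders mixed-case strings the same way is an assumption.

```python
def string_index(value, allowed):
    """1-based position of value among the allowed strings sorted alphabetically."""
    ordered = sorted(set(allowed))
    return ordered.index(value) + 1


def current_point_rows(point, string_ranges):
    """Build the two CURRENTPOINT rows: variable names, then their values."""
    names = list(point)
    values = []
    for name in names:
        if name in string_ranges:
            # String variables are stored as an index, never as the text itself.
            values.append(str(string_index(point[name], string_ranges[name])))
        else:
            values.append(str(point[name]))
    return ["\t".join(names), "\t".join(values)]


# Hypothetical variables, for illustration only.
rows = current_point_rows({"Span": 12.5, "Material": "b"},
                          {"Material": ["a", "b", "c", "d"]})
```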

%#TARGETSANDPREFERENCES - Target values and optimization info

This section contains information used by optimizers: the “best” value for each variable, and the “preference” for that variable. Any variables not listed in this section will be given a target of ‘-inf’ (i.e., minimize) and a preference of ‘0’.

Section format: First row contains variable names included in the initial data set or the .rvefun file. You can omit any of the variables you don’t care about.

Second row contains the target value for each variable named in the first row, either ‘-inf’ to minimize, ‘inf’ to maximize, or a numerical value for target matching.

Third row contains the preference for each variable named in the first row. These are used by some objective functions to form weighted aggregate objectives. These can be any numerical value between 0 (don’t care at all) and 1 (care a whole lot). Ideally, the values in this row should sum to 1.
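
The three-row layout, along with the suggestion that preferences sum to 1, can be sketched in Python as follows. The helper and variable names are hypothetical and the tab delimiter is assumed. Rescaling the weights is a convenience added here, not something Rave is known to require.

```python
def targets_and_preferences_rows(settings):
    """settings maps variable name -> (target, preference weight)."""
    names = list(settings)
    total = sum(weight for _, weight in settings.values())
    targets, prefs = [], []
    for name in names:
        target, weight = settings[name]
        targets.append(str(target))
        # Rescale so the preferences sum to 1; leave them alone if all are zero.
        prefs.append(str(weight / total if total else weight))
    return ["\t".join(names), "\t".join(targets), "\t".join(prefs)]


# Hypothetical variables: minimize Drag (weight 3), maximize Lift (weight 1).
rows = targets_and_preferences_rows({"Drag": ("-inf", 3), "Lift": ("inf", 1)})
```

With these inputs the third row holds 0.75 for Drag and 0.25 for Lift.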

%#COLORS - Variable Colors

Each variable in RAVE is assigned a unique color, so that each time it appears in a graph it is drawn in the same color. You can use this file to define those colors. Any variables you omit from this file will be assigned a color automatically by RAVE.

File format: First row contains variable names included in the initial data set or the FUNCTIONS section. You can omit any of the variables you don’t care about.

Second row contains a 3-element colorspec vector for each variable named in the first row. These vectors must contain the square brackets, e.g. enter [1,0,0] for red. All values must be between 0 and 1, i.e. in the typical MATLAB format.
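
If your colors start out as 8-bit RGB values (0 to 255), they must be rescaled to the 0 to 1 range before being written here. The Python sketch below shows one way to do that and build the two rows. The helper names and variable names are hypothetical. The tab delimiter is used on the assumption that commas inside the brackets could otherwise be confused with a comma delimiter.

```python
def colorspec(red, green, blue):
    """Format 8-bit RGB components as a bracketed vector with values in 0..1."""
    parts = [format(c / 255, ".3g") for c in (red, green, blue)]
    return "[" + ",".join(parts) + "]"


def colors_rows(colors):
    """Build the two COLORS rows: variable names, then their color vectors."""
    names = list(colors)
    specs = [colorspec(*colors[name]) for name in names]
    return ["\t".join(names), "\t".join(specs)]


# Hypothetical variables, for illustration only.
rows = colors_rows({"Lift": (255, 0, 0), "Drag": (0, 128, 255)})
```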

%#ROWSTATE - Row Colors, Visibility, and Selection

This section has no header row, and has the same number of rows as your initial data set. Each row contains 3 columns. The first column indicates the row color (an integer between 1 and 10), the second column indicates whether this row is visible or not (1 for visible, 0 for invisible), and the third column indicates whether this row is currently selected or not (1 for selected, 0 for unselected). Usually you will want to start with all rows visible and unselected. Note that for this file, colors are defined by a single integer instead of a 3-element vector. These integers are mapped to actual colors as defined by your raveprefs file. So all rows of the same color should have the same value, but you specify the actual color by changing your raveprefs file.
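
A typical starting state (all rows one color, visible, and unselected) can be generated with a short Python sketch such as the one below. The helper name is hypothetical, the tab delimiter is assumed, and data rows are numbered from 1 purely for this example.

```python
def rowstate_rows(n_rows, color_index=1, hidden=(), selected=()):
    """One 'color, visible, selected' row per data row; rows numbered from 1."""
    if not 1 <= color_index <= 10:
        raise ValueError("row color must be an integer between 1 and 10")
    rows = []
    for row in range(1, n_rows + 1):
        visible = 0 if row in hidden else 1
        picked = 1 if row in selected else 0
        rows.append("\t".join([str(color_index), str(visible), str(picked)]))
    return rows


# Three data rows, with the second one hidden.
rows = rowstate_rows(3, hidden=[2])
```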

%#POINTSOFINTEREST - Points of Interest (coming soon)

In addition to the data in your original data set, you can specify a set of “points of interest,” which are special data points that are particularly interesting to you. For example, the result of an optimization might be a point of interest. These do not appear in the same manner as the regular data, but can be individually viewed/hidden on any graph, and each point has its own unique marker, not affected by the formatting of the current graph. NOTE: you probably want to keep this list fairly short. The format of this section is identical to the initial data set: variable names in the first row and data values in subsequent rows. HOWEVER, each row (except the first) also has 5 additional columns, which contain (in the order listed below):

1st additional column: Marker shape. A single character from the following: s, p, h, o, v, ^, +, . (the final period is itself a legal option). These mean: square, 5-sided star, 6-sided star, circle, downward-pointing triangle, upward-pointing triangle, cross, and point.

2nd additional column: Marker size. An integer value indicating the size of the marker. Typically in the 4-20 range.

3rd additional column: Marker fill color. A 3-element color vector, including the square brackets. E.g. [1,0,0] for red. OR the word ‘none’ (without quotes) to have no fill.

4th additional column: Marker edge color. A 3-element color vector, including the square brackets. E.g. [1,0,0] for red. OR the word ‘none’ (without quotes) to have no edge.

5th additional column: Edge width. An integer value indicating how thick to draw the edge. You probably want either 1 or 2.
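
Since this section is marked as coming soon, the following Python sketch is only a guess at how a data row with its five additional columns might be assembled from the description above. The helper name, the default values, and the tab delimiter are all assumptions.

```python
MARKER_SHAPES = set("sphov^+.")


def point_of_interest_row(values, shape="o", size=8, fill="none",
                          edge=(0, 0, 0), edge_width=1):
    """Data values followed by shape, size, fill color, edge color, edge width."""
    if shape not in MARKER_SHAPES:
        raise ValueError("unsupported marker shape: " + repr(shape))

    def color_field(color):
        # Either the bare word none or a bracketed 3-element vector.
        if color == "none":
            return "none"
        return "[" + ",".join(str(c) for c in color) + "]"

    fields = [str(v) for v in values]
    fields += [shape, str(size), color_field(fill), color_field(edge),
               str(edge_width)]
    return "\t".join(fields)


# Hypothetical data values: a red-filled square with a black edge.
row = point_of_interest_row([12.5, 3], shape="s", size=10, fill=(1, 0, 0))
```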
