Difference between revisions of "Working with Random Variables"
Revision as of 15:36, 14 August 2013
Introduction
Random variables are useful for modeling uncertainty in a system. Unlike regular variables, which are defined by (deterministic) values, random variables are defined by distributions. Rave supports random variables through several sampling-based (Monte Carlo) approaches. The main ways to use random variables in Rave are:
- Declare a variable to be random, in which case it is treated as being random for all purposes in Rave.
- Sample data according to random distributions. This lets you generate a data set that involves randomness, but Rave otherwise treats the sampled data as if it were deterministic.
In either case, Rave samples the random variables according to distributions that you define in Rave. The process of defining distributions is described below.
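To make the second approach concrete, here is a minimal sketch of what sampling data from a distribution amounts to, assuming a hypothetical function variable z = f(x, y) and y drawn from N(0, 1). The names `f` and `sample_rows` are illustrative only and are not part of Rave. Each sampled row ends up holding ordinary numbers, which is why the resulting data set can be treated as deterministic afterwards.

```python
import random


def f(x, y):
    # Hypothetical function variable z = f(x, y); in Rave, functions are user-defined.
    return x + 2 * y


def sample_rows(n, x, seed=0):
    """Generate n ordinary data rows in which y is drawn from N(0, 1).

    After sampling, every row holds concrete numbers, so downstream
    calculations can treat the data set as deterministic.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        y = rng.gauss(0.0, 1.0)  # one draw of y per row
        rows.append({"x": x, "y": y, "z": f(x, y)})
    return rows


rows = sample_rows(5, x=5)
for row in rows:
    print(row)
```

By contrast, declaring y to be random (Method 1 below) keeps a whole distribution behind each row and collapses it to a single statistic only for display.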
Note: only independent variables can be treated as random. Function variables become random when one or more of their input variables are random.
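The rule in the note above can be sketched as a small dependency-propagation routine. This is a hypothetical illustration, not Rave's code: it assumes each function variable's inputs are listed in a dictionary, rejects attempts to declare a function variable random, and repeatedly marks any function variable with at least one random input as random until nothing changes.

```python
def random_variables(dependencies, declared_random):
    """Return the set of variables that end up random.

    dependencies: hypothetical mapping from each function variable to its inputs;
    independent variables do not appear as keys.
    declared_random: the independent variables the user declared random.
    """
    for name in declared_random:
        if name in dependencies:
            raise ValueError(f"{name} is a function variable and cannot be declared random")
    result = set(declared_random)
    changed = True
    while changed:  # repeat until no further function variable becomes random
        changed = False
        for func, inputs in dependencies.items():
            if func not in result and any(i in result for i in inputs):
                result.add(func)
                changed = True
    return result


# z = f(x, y) and w = g(z): declaring y random makes z, and therefore w, random too.
deps = {"z": ["x", "y"], "w": ["z"]}
print(sorted(random_variables(deps, {"y"})))  # ['w', 'y', 'z']
```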
Defining and Modifying Distributions
Working with Random Variables
Method 1: Declaring a Variable to be Random
To declare a variable to be random, right-click its column header in the main table and select Treat as a Random Variable.
To indicate that a variable is random, Rave turns its name in the main table column header purple and replaces its values in the table with the word "Random".
The reasoning behind this is that a random variable does not have a single value for each row in your data set; instead, it has a distribution of values. Any variable that is a function of one or more random variables is also defined by a distribution. So that these function variables can still show a single value, Rave automatically converts any variable that is a function of random variables into a statistic variable. You will notice that the column headers of any such functions change to reflect this.
For example, suppose you have three variables, x, y, and z, such that z = f(x,y). If you change y into a random variable, the column header for z changes to something like "mean(z)". This indicates that the values displayed in the table are no longer the values of z itself; rather, they are the mean value of z calculated over the distribution of y.
To calculate these statistics, Rave uses a sampling-based approach. Suppose again that z = f(x,y), where y is random and x is deterministic, and that Rave needs to calculate z for x = 5 and y = N(0,1) (i.e., y is normally distributed with mean 0 and standard deviation 1). Rave first draws many random values of y such that the sampled values are approximately distributed as N(0,1). If 5000 such samples are used, Rave then evaluates z 5000 times, each time using x = 5 and one of the 5000 sampled values of y in turn. This yields 5000 values of z, which Rave aggregates back into a single value using the specified statistic; for example, mean(z) returns the average of these 5000 values.
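The sampling procedure just described can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Rave's implementation: `f` is a hypothetical function variable (here z = x*y + x), `monte_carlo_statistic` is an illustrative name, and Python's standard `random.Random.gauss` stands in for whatever sampler Rave actually uses. Plain random sampling is assumed; Rave may use a different sampling scheme.

```python
import random
import statistics


def f(x, y):
    # Hypothetical function variable z = f(x, y); substitute any function of x and y.
    return x * y + x


def monte_carlo_statistic(func, x, n_samples=5000, stat=statistics.mean, seed=0):
    """Estimate stat(z) for z = func(x, y) with x fixed and y ~ N(0, 1)."""
    rng = random.Random(seed)
    ys = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]  # samples approximately N(0, 1)
    zs = [func(x, y) for y in ys]  # evaluate z once per sampled value of y
    return stat(zs)  # collapse the samples back into a single value


# With x = 5, z = 5*y + 5, whose exact mean is 5; the estimate should land close to it.
print(monte_carlo_statistic(f, x=5))
```

Swapping `stat` for another function such as `statistics.median` or `statistics.stdev` mirrors choosing a different statistic for the column; the samples stay the same and only the aggregation changes.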