Friday, March 9, 2012

MonteCarlo Simulations of Trade Data in Excel

“What can be tested, must be tested.” -Victor Niederhoffer

Summary
1.  MonteCarlo simulation allows us to find the probability of certain outcomes based on random inputs and repeat this process thousands of times.
2.  First, take your sample data and find the descriptive statistics and frequency distribution using Excel’s descriptive statistics and Histogram functions.
3.  Second, assign a random variable based on the frequency distribution of the sample data.
4.  Third, assign a Normal Random Variable to the random data by using the Mean and Standard Deviation of the random data using the =NORMINV() formula.
5.  Fourth, run the descriptive statistics and the frequency distribution on your randomized data.
6.  Fifth, compare the descriptive statistics and the frequency for the sample data and the randomized data and make conclusions.

If you have been around the financial world even a little bit, you are probably familiar with MonteCarlo simulations.  If not, let me explain.  A MonteCarlo simulation is a problem solving technique used to approximate the probability of certain outcomes by running multiple trial runs, called simulations, using random variables.  Investopedia.  Monte Carlo simulation enables us to model situations that present uncertainty and play them out thousands of times on a computer.  Use of a random variable ensures that we are not biasing the results – that is, we are leaving up to chance whether the outcome fits within our parameters.

Excel is a popular tool to generate random variables.  Use of the =Rand() formula in a cell returns a value that is equally likely to return a value between 0 and 1.  By pressing the F9 key, you can reset the value in the cell.

Back to the Coin Toss Example

Remember our fair coin where the odds of getting a head or tails are even?  For this example, let’s say we want to answer 2 questions and we can’t use a formula to find the result.

1.  What are the odds that a coin will come up with heads exactly 7 times when flipped 10 times?
2.  What are the odds that a coin will come up with at least 7 heads when flipped 10 times?

Using the =Randbetween() formula in Excel will help us answer this question.  The =Randbetween() formula returns a random result for the entries that we make.  So if we enter:

=Randbetween(0,1)

Excel will return 0 or 1 randomly.  We can assign Heads to equal 1 and Tails to equal 0.

A summary of building this spreadsheet is excellently presented here.

Here are the results for flipping exactly N number of heads:

Result (exactly)
No. of Trials
Probability
Cumulative Probability
0 Heads
11
0.09%
0.09%
1 Head
137
1.09%
1.17%
2 Heads
515
4.08%
5.25%
3 Heads
1547
12.26%
17.51%
4 Heads
2636
20.89%
38.40%
5 Heads
3074
24.36%
62.76%
6 Heads
2574
20.40%
83.16%
7 Heads
1418
11.24%
94.40%
8 Heads
568
4.50%
98.90%
9 Heads
130
1.03%
99.93%
10 Heads
9
0.07%
100.00%
TOTAL
12608
100%


For the spreadsheet I built, we assumed 12,608 trials of flipping a coin 10 times.  7 heads came up in 1,418 of those trials so, 1,418/12608 = 11.24% chance that heads will come up exactly 7 times in 10 coin flips.  The actual probability for this is 11.7% so our model does a pretty good job of approximating.  The cumulative probability column shows that there is a 94.40% chance that in 10 flips of a coin, you will have 7 or fewer heads show up.  Stated otherwise, there is only a 5.6% chance of 8 or more heads will show up when you flip a coin 10 times.

This chart is what is referred to a histogram and shows the number of occurrences of exactly N number of heads in our 12,608 iterations (where 1 iteration is 10 coin flips).  Notice the bell shape?  That is because this is a normal distribution.  More on that later.

This chart shows the individual and cumulative probability of flipping N heads in 10 coin flips.

Here is the results for the test where we are looking for at least 7 or more heads in 10 coin flips:

Results
No. Trials
Probability
at least 1 head
12608
99.91%
at least 2 heads
12471
98.83%
at least 3 heads
11956
94.75%
at least 4 heads
10409
82.49%
at least 5 heads
7773
61.60%
at least 6 heads
4699
37.24%
at least 7 heads
2125
16.84%
at least 8 heads
707
5.60%
at least 9 heads
139
1.10%
at least 10 heads
9
0.07%
greater than 10
0
0.00%

Thus, in 12,608 trials, the odds of flipping 7, 8, 9 or 10 heads in 10 coin flips is 16.84%.  The actual result is 17.2% so our model again does a fair job of predicting the actual odds.  If we had run the iteration say 100,000 times, our probability formula would have gotten closer and closer to the actual result. 

Applying Monte Carlo Simulations to Trading Data

Monte Carlo simulation is a statistical tool. The primary to use it in trading is to monitor a strategy. For example, if we have 300 trades in a market, we have a set of data. We
can then take sample from the data set over and over again to build up a trading record that didn’t actually happen (by using random variables), but could have potentially happened.  You then compare the hypothetical track record to some back tested results and to the actual results, and they should all match. If they don’t match, and the real time trading is doing significantly poorly, then we can only assume that that strategy needs to stop being traded.

Let’s say that I have a trading system that trades the E-Mini S&P 500 contract.  The trading system has completed 306 trades and has the following profit probabilities:

Points
Count
Probability
Cumulative Probability
0
22
7.31%
7.31%
1
21
6.98%
14.29%
2
49
16.28%
30.56%
3
34
11.30%
41.86%
4
42
13.95%
55.81%
5
32
10.63%
66.45%
6
19
6.31%
72.76%
7
16
5.32%
78.07%
8
16
5.32%
83.39%
9
15
4.98%
88.37%
10
35
11.63%
100.00%

You can read the table as follows:  there is a 7.31% chance that the system will make less than 0 points, there is a 6.98% probability that the system will make between 0 and 1 points, etc.  Or, there is a 14.29% probability that a given trade will make a profit of less 1 point, a 30.56% chance that a given trade will make a profit of less than 2 points, etc.

Remember, that if we are to accept that the market has a random distribution of outcomes, then using a random variable to determine a given outcome should align closely with the trade data that we have.

Step 1 – Find the descriptive statistics of the sample data

Using your raw data from your trading statistics, let Excel do the heavy lifting for you and run the descriptive statistics.  For a quick outline on how to run descriptive statistics in Excel, click here.

Raw Data
Mean
4.64951
Standard Error
0.211601
Median
3.75
Mode
0
Standard Deviation
3.701505
Sample Variance
13.70114
Kurtosis
0.359983
Skewness
0.984527
Range
15.75
Minimum
0
Maximum
15.75
Sum
1422.75
Count
306

A quick review of the descriptive statistics output above:

-  The Mean is the average profit for all of the trades.  The trading system averaged 4.65 points for every trade it took.
-  The Median is the “number in the middle” ie if you were to number all the profits from high to low, what is the number that falls in the middle of the sample.
-  The Mode is the most frequently repeated number in the data.
-  The Minimum and Maximum is the lowest and highest number in the sample.
-  The Count is the total number of trades you took.
-  The Range is the Maximum minus the Minimum.
-  The Sample Variance is a measure of the data from the Mean – it is essentially the average of the squared distance from the Mean.  A higher sample variance means more dispersion from the Mean.
-  The Standard Deviation is the square root of the variance.  It measures how close the data is to the Average or Mean.
-  The Standard Error indicates how close the Mean is to the true population and is calculated by dividing the standard deviation by the square root of the total number of observations.  Standard Error gives us confidence intervals surrounding the average:

Z
% Certainty
Lower Bound
Upper Bound
1 (.99)
68%
4.44
4.86
2 (1.96)
95%
4.23
5.07
3 (2.58)
99%
4.01
5.07

Step 2 – Assign a random variable to each profit/loss point target

We can then associate a profit target probability with a given random variable.  Recall that Excel will generate a random number between 0 and 1 by using the =Rand() formula. The table below applies a random number generated by Excel and returns a given profit objective.

Points
Count
Probability
Cumulative Probability
Random Number Assigned
0
22
7.31%
7.31%
Less than .0731
1
21
6.98%
14.29%
Greater than or equal to .0731 and less than .1429
2
49
16.28%
30.56%
Greater than or equal to 14.29 and less than .3056
3
34
11.30%
41.86%
Greater than or equal to .3056 and less than .4186
4
42
13.95%
55.81%
Greater than or equal to .4186 and less than .5581
5
32
10.63%
66.45%
Greater than or equal to .5581 and less than .6645
6
19
6.31%
72.76%
Greater than or equal to .6645 and less than .7276
7
16
5.32%
78.07%
Greater than or equal to .7276 and less than .7807
8
16
5.32%
83.39%
Greater than or equal to .7807 and less than .8339
9
15
4.98%
88.37%
Greater than or equal to .8339 and less than .8837
10
35
11.63%
100.00%
Greater than or equal to .8837

For example, a random number of “.06” will return a point value of “0”.  Running the random number generator a number of times will ensure that we get 7.31% of our outcomes to equal “0” points.

The key to this is to create two tables.  The first table is as follows:

Vlookup Table
0
0
0.07309
1
0.142857
2
0.305648
3
0.418605
4
0.55814
5
0.664452
6
0.727575
7
0.780731
8
0.833887
9
0.883721
10

This table will act as your random number generator lookup table.  Notice that it corresponds to the table above.

The second table is as follows (remember this table goes down 10,000 rows):

Trial
Rand
Result
1
0.4346237
4
2
0.4944565
4
3
0.0626749
0
4
0.6907086
6
5
0.1438746
2
6
0.5635747
5
7
0.8293976
8
8
0.9459425
10
9
0.7881337
8
10
0.8844642
10


The column labeled “Trial” is simply the trial or experiment number.  The column labeled “Rand” is where we entered the formula =Rand() to return a random number between 0 and 1.  The “Result” column is where we entered the following formula:

=VLOOKUP (Rand , VLookupTable, 2)

Let me break down this formula.  =VLOOKUP() is a formula that searches for a value in the first column of a table array and returns a value in the same row from another column in the table array.  The formula above tells Excel the following:

1.  Search for the Random number generated in the “Rand” column
2.  Search in the first column of the VlookupTable
3.  Return the value in the second column of VlookupTable as the “Result”

The cool thing about =VLOOKUP() is that it looks for a value that is equal to exactly what you are looking for and if it can’t find it, it returns the closest match so long as it is not greater than the value it is searching for.

So for Trial #1 above, Excel searches for the value .4346237 in the VLookupTable and returns the Value 4.  This is exactly what we wanted – remember 4 points is associated with a probability greater than or equal to .4186 and less than .5584.

Make sure you have enough iterations on your spreadsheet.  For this example, I made 10,000 iterations of the above test.

Step 3 – Create a Normal Random Variable for your Random Data

The next step is to create a Normal Random Variable for the Random Data.  In probability theory, the normal (or Gaussian) distribution is a continuous probability distribution that has a bell-shaped probability density function, known as the Gaussian function or informally the bell curve.  Excel will simulate a normal random variable with a Mean and Standard Deviation that you calculate by using the formula:

 =NORMINV (rand(), Mean, StandardDeviation)

First, calculate the Mean and Standard Deviation of the “Rand” column using the =AVERAGE() and =STDEV() functions in excel.  For my random distribution, I found the following:

Mean
4.61
St Dev
3.03

Second, for each random Trial, calculate a Normal Random Variable using the =NORMINV() function and the average and standard deviation you just calculated.  You will then have a table that looks like this (and goes down 10,000 rows):

Trial
Rand
Result
Normal Rv
1
0.4346237
4
4.111805545
2
0.4944565
4
4.569135141
3
0.0626749
0
-0.039411122
4
0.6907086
6
6.121969224
5
0.1438746
2
1.385592354
6
0.5635747
5
5.096909138
7
0.8293976
8
7.499332975
8
0.9459425
10
9.486621837
9
0.7881337
8
7.038646925
10
0.8844642
10
8.245218044

A little bit of explanation on the Normal Rv column – for Trial No. 1, you are telling Excel to find the 43rd percentile of a normal random variable with a with an average (mean) of 4.61 and a standard deviation of 3.03.

Step 4 – Get Descriptive Statistics and Frequency Distributions for the Random Data

Now the fun part – we are going to compare the original raw data with the random test results to see if our sample data.

First, run descriptive statistics and Frequency or Histogram on the random test data.

We get this Frequency distribution for the random trials:

Points
Frequency
Individual %
Cumulative %
0
620
6.20%
6.20%
1
517
5.17%
11.37%
2
746
7.46%
18.83%
3
1036
10.36%
29.19%
4
1283
12.83%
42.02%
5
1339
13.39%
55.41%
6
1250
12.50%
67.91%
7
1078
10.78%
78.69%
8
809
8.09%
86.78%
9
605
6.05%
92.83%
10
338
3.38%
96.21%
More
379
3.79%
100.00%

Here is the Histogram chart:

Here are the random data descriptive statistics compared with the sample data’s descriptive statistics:


Random Data
Sample Data
Mean
4.62
4.65
Standard Error
0.03
0.21
Median
4.56
3.75
Mode
#N/A
0.00
Standard Deviation
3.02
3.70
Sample Variance
9.11
13.70
Kurtosis
0.09
0.36
Skewness
0.03
0.98
Range
24.93
15.75
Minimum
-7.35
0.00
Maximum
17.57
15.75
Sum
46195.60
1422.75
Count
10000.00
306.00

Here are the two Histogram charts compared:

Step 5 – Draw conclusions from the data

The Mean or average in both cases was close 4.62 for the random data and 4.65 for the sample data.  This tells us that we will average about 4.6 points if we trade this system enough times.

The Standard Deviation was much higher for the Sample Data – which was to be expected because in the real world, our returns will vary quite a bit more.  For the Sample Data, we can conclude that about 68% of our trades will be +/- 3.06 points around the average of 4.65.

Looking at the Histogram, we can also see that the Sample Data was skewed to the left – meaning that it is more likely to get a return of less than 4 points.  In fact, we have a 55.81% chance of getting a return less than 4 points in our sample data.  In the randomized data, this was only likely 42% of the time.














No comments:

Post a Comment