There appears to be a lot of confusion in some recent threads about the meaning of statistical significance when discussing the value of simulated results vs. the results of actual play.

The software and math guys (me among them) are correctly saying that, when simulating, we sometimes need to simulate billions of rounds in order to arrive at statistically valid numerical results. At the same time, some of those trying to evaluate the results of their actual play, for the purpose of finding holes in their game or deciding whether to switch systems, are throwing up their hands and saying, "What's the point? I'll never play billions of rounds in my lifetime!"

Is there a contradiction here? If not, then how can the two worlds be reconciled? This is my attempt to clear things up.

The main concept to grasp is that every observed result is an estimate of the true value, accurate only to within some margin of error. The more samples you observe, the smaller that margin of error becomes. Obviously, below some threshold, the number of samples is too small to tell you anything useful, in both practical and mathematical terms. You may have seen this margin referred to as the Standard Error.

Sometimes concepts like this are easiest to grasp by considering the ridiculous extremes. And I do mean ridiculous! For example, it should be easy to see that after flipping a coin once, the observed result will be either 100% heads or 100% tails, and so it will differ from the known result of 50% by 50%. Now if we imagine being able to toss that coin an infinite number of times, the result will become infinitely close to 50% and the standard error will become infinitely close to zero. Notice that I didn't say the result will become 50% and the standard error will become zero, only that the result will fall within some minuscule range of 50% (infinitely close to zero) with some high probability (infinitely close to 100%).
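If you'd like to watch that convergence happen for yourself, here's a minimal Python sketch (the sample sizes and the seed are arbitrary choices of mine):

    import random

    random.seed(1)  # arbitrary seed, purely for repeatability

    # Toss a fair coin n times and report how far the observed heads
    # frequency lands from the true value of 50%.
    for n in (10, 1_000, 100_000, 10_000_000):
        heads = sum(random.random() < 0.5 for _ in range(n))
        freq = heads / n
        print(f"{n:>10,} tosses: {freq:.4%} heads (off by {abs(freq - 0.5):.4%})")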

Of course, we have no use for these extreme results. We live in the finite world. So how many samples is enough? Well, it depends on what you are observing and what you want to use the results for.

For a simple process like tossing a coin, it turns out that the standard error is 0.5/sqrt(samples), which shrinks fairly quickly. After only 10,000 tosses, the standard error is 0.005, or 0.5%, which means you have a 99.7% chance that the result you have observed is within +/- 3 standard errors, or within +/-1.5%, of the true result. That's a 3% margin of error. Good enough for you? Maybe (but I hope not). Good enough for a simulation whose goal is to determine the true result to within 2 decimal places? Absolutely not.
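Here's that arithmetic in a few lines of Python, in case you want to try other sample sizes:

    from math import sqrt

    # Standard error of the observed heads frequency for a fair coin:
    # se = 0.5 / sqrt(n); the 99.7% margin of error is +/- 3 se.
    for n in (100, 10_000, 1_000_000):
        se = 0.5 / sqrt(n)
        print(f"{n:>9,} tosses: se = {se:.3%}, 99.7% margin = +/-{3 * se:.3%}")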

For blackjack, the standard deviation of a single round's result is typically taken to be 1.1 (in units of the initial bet). So the standard error of the EV is 1.1/sqrt(rounds). After 10,000 rounds you have a 99.7% chance of having estimated the true EV to within +/-3.3%. That's a 6.6% margin of error!! After a million rounds you're down to a 0.66% margin of error. Maybe good enough for you to estimate your expected win rate to within a few dollars. Certainly not small enough to declare that system X has a 0.57% EV and system Y has a 0.64% EV and that system Y is therefore superior, and that's after 1 million rounds.
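To show just how hopeless that comparison is, here's a rough sketch of the sample size actually required (the 0.57%/0.64% EVs are the hypothetical numbers from above, and 1.1 is the per-round standard deviation):

    from math import ceil

    sd = 1.1                       # per-round standard deviation, in initial bets
    ev_x, ev_y = 0.0057, 0.0064    # the hypothetical EVs for systems X and Y

    # To separate the two EVs at the 3-standard-error level, each estimate
    # needs to be accurate to within half of the gap between them.
    half_gap = (ev_y - ev_x) / 2
    rounds = ceil((3 * sd / half_gap) ** 2)
    print(f"rounds needed per system: {rounds:,}")  # roughly 89 million

That's on the order of 89 million rounds for each system, just to resolve a 0.07% difference in EV.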

And that's the point of it. We use simulation as a method of calculating specific numbers which have true (but unknown) values, and which are too difficult to calculate directly, and we want those numbers to be within a certain level of accuracy. The more rounds we simulate, the closer our numbers will be to the true (unknown) result. We can then use those numbers for making other calculations or for comparing systems. Some simulations, like the ones done in order to compute SCORE, accumulate many different statistics, some of which are for rarer events than others, so billions of iterations are needed in order to reduce the standard error for those rare events to an acceptable size.
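As an illustration of the rare-event problem, consider estimating the frequency of an event that occurs about once in a million rounds (a probability I've made up for the example):

    from math import sqrt

    p = 1e-6  # hypothetical probability of some rare event

    # The relative standard error of the estimated frequency is roughly
    # sqrt((1 - p) / (p * n)), which is about 1 / sqrt(p * n) for small p.
    for n in (10**6, 10**8, 10**10):
        rel_se = sqrt((1 - p) / (p * n))
        print(f"{n:.0e} rounds: relative error ~ {rel_se:.1%}")

A million rounds gives you essentially no information about that event; getting its frequency to within about 1% takes ten billion rounds.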

Now, does this mean you need to play billions of live rounds in order to benefit from the knowledge obtained via the simulation? The answer is "No". Unlike the simulator, your goal is not to reproduce the precise statistical result that the game offers. Your goal is simply to extract the money at a rate close to that predicted by the simulator. If your actual results differ by a few tenths of a percent, you will still be making money. In fact, if your results are within one standard deviation of the predicted results after N0 rounds (N0 being the number of rounds at which your expected win equals one standard deviation of your results), then you will still be making money. This level of accuracy is attainable by playing a number of rounds which is certainly achievable within your playing years.
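For a rough sense of scale, here's the N0 calculation (the 1% per-round edge below is purely illustrative; substitute your own numbers):

    sd = 1.1    # per-round standard deviation, in initial bets
    ev = 0.01   # hypothetical per-round edge of 1%

    # N0 is the number of rounds at which your expected win equals one
    # standard deviation of your results: ev * n0 == sd * sqrt(n0).
    n0 = (sd / ev) ** 2
    print(f"N0 = {n0:,.0f} rounds")  # 12,100 rounds, about 120 hours at 100 rounds/hour

That's a few months of part-time play, not billions of rounds.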

In summary:
  • Multi-billion round simulations are needed in order to get the precise, statistically significant numbers you need to make informed decisions about your play. If you are making decisions based on the results of short live-play experiments of 10,000 rounds, then you are making a mistake. There is a significant chance that the inferior system could outperform the superior one over the course of the experiment, simply because the margin of error is so large (see the sketch after this list). This is especially true if you are only making tweaks to an existing system, as opposed to comparing different systems.
  • You don't need to achieve the precise results predicted by the simulator. You only need to achieve results which are somewhat close in order to get the money. This can be done within a much smaller, easily attainable number of rounds.
  • When comparing systems or making other decisions, use the simulation results to make the call. They will tell you which option has the higher potential. If the difference in potential is large enough, then you can play enough rounds to enjoy the benefit.
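To back up the first point above, here's a quick Monte Carlo sketch in which each 10,000-round session total is approximated as a single normal draw (central limit theorem), using the hypothetical EV pair from earlier:

    import random
    from math import sqrt

    random.seed(2)  # arbitrary seed, purely for repeatability

    sd = 1.1                       # per-round standard deviation, in initial bets
    ev_x, ev_y = 0.0057, 0.0064    # hypothetical EVs; Y is the superior system
    rounds, trials = 10_000, 2_000

    # Count how often the inferior system X finishes ahead of the
    # superior system Y over a 10,000-round experiment.
    x_ahead = 0
    for _ in range(trials):
        total_x = random.gauss(ev_x * rounds, sd * sqrt(rounds))
        total_y = random.gauss(ev_y * rounds, sd * sqrt(rounds))
        if total_x > total_y:
            x_ahead += 1
    print(f"inferior system finished ahead in {x_ahead / trials:.0%} of trials")

With these numbers the inferior system comes out ahead close to half the time, which is exactly why a 10,000-round live test settles nothing.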


I hope this helps!