Statistical Significance

**Gronbog** · 06-29-2016, 02:49 PM

There appears to be a lot of confusion about and misunderstanding of the meaning of statistical significance in some recent threads when discussing the value of simulated results vs the results of actual play.

The software and math guys (me among them) are correctly saying that, when simulating, we sometimes need to simulate billions of rounds in order to arrive at statistically valid numerical results. At the same time, some of those trying to evaluate the results of their actual play, for the purposes of finding holes in their game or deciding whether to switch systems, are throwing up their hands and saying, "What's the point? I'll never play billions of rounds within my lifetime!"

Is there a contradiction here? If not, then how can the two worlds be reconciled? This will be my attempt to clear things up.

The main concept to grasp is that all observed results are statistically significant to within some margin of error. You may have seen the measure behind that margin referred to as the Standard Error. The more samples you observe, the smaller that margin of error becomes. Obviously, below some threshold, the number of samples can be too small for the result to mean anything in either practical or mathematical terms.

Sometimes concepts like this are easiest to grasp when considering the ridiculous extremes. And I do mean ridiculous! For example, it should be easy to see that for flipping a coin once, the observed result will either be 100% heads or 100% tails and so it will differ from the known true value of 50% by 50 percentage points. Now if we imagine being able to toss that coin an infinite number of times, then the result will become infinitely close to 50% and the standard error will become infinitely close to zero. Notice that I didn't say that the result will become 50% and the standard error will become zero, but they will become close to within some minuscule range (infinitely close to zero) with some high probability (infinitely close to 100%).

Of course, we have no use for these extreme results. We live in the finite world. So how many samples is enough? Well, it depends on what you are observing and what you want to use the results for.

For a simple process like tossing a coin, it turns out that the standard error is 0.5/sqrt(samples), which converges fairly quickly. After only 10,000 tosses, the standard error is 0.005 or 0.5%, which means that there is a 99.7% chance that the result you have observed is within +/-3 standard errors, or within +/-1.5% of the true result. That's a 3% margin of error (a range 3 percentage points wide). Good enough for you? Maybe (but I hope not). Good enough for a simulation whose goal is to determine the true result to within 2 decimal places? Absolutely not.
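
The coin-toss numbers above can be checked with a minimal Python sketch. The function names are made up for illustration, and the 3-standard-error band relies on the usual normal approximation, which is an assumption that only holds well for larger sample counts.

```python
import math


def coin_standard_error(tosses: int, p: float = 0.5) -> float:
    """Standard error of the observed heads fraction after `tosses` tosses.

    For a fair coin (p = 0.5) this reduces to 0.5 / sqrt(tosses).
    """
    return math.sqrt(p * (1.0 - p) / tosses)


def three_sigma_half_width(tosses: int) -> float:
    """Half-width of the +/-3 standard error band (about 99.7% coverage)."""
    return 3.0 * coin_standard_error(tosses)


if __name__ == "__main__":
    for n in (1, 100, 10_000, 1_000_000):
        se = coin_standard_error(n)
        half = three_sigma_half_width(n)
        print(f"{n:>9,} tosses: SE = {se:.4%}, 3 SE = +/-{half:.4%}")
```

Each 100-fold increase in tosses only shrinks the band by a factor of 10, which is why extra decimal places get expensive so quickly.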

For blackjack, a typically used standard deviation for the result of a single round is 1.1 units. So the standard error of the observed EV is 1.1/sqrt(rounds). After 10,000 rounds you have a 99.7% chance of being able to calculate the true EV to within +/-3.3%. That's a 6.6% margin of error!! After a million rounds you're down to a 0.66% margin of error. Maybe good enough for you to estimate your expected win rate to within a few dollars. Certainly not small enough to be able to declare that system X has a 0.57% EV and system Y has a 0.64% EV and therefore system Y is superior, and that's after 1 million rounds.
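
The same arithmetic for blackjack, as a rough sketch. The 1.1 standard deviation per round comes from the post; the helper names and the choice of a +/-0.01% target (two decimal places of a percentage) are illustrative assumptions.

```python
import math

SD_PER_ROUND = 1.1  # standard deviation of one round's result, from the post


def standard_error(rounds: int, sd: float = SD_PER_ROUND) -> float:
    """Standard error of the observed EV per round after `rounds` rounds."""
    return sd / math.sqrt(rounds)


def rounds_for_half_width(half_width: float, sd: float = SD_PER_ROUND,
                          z: float = 3.0) -> int:
    """Rounds needed so that z standard errors equal `half_width`."""
    return math.ceil((z * sd / half_width) ** 2)


if __name__ == "__main__":
    for n in (10_000, 1_000_000, 1_000_000_000):
        half = 3.0 * standard_error(n)
        print(f"{n:>13,} rounds: EV known to within +/-{half:.4%}")

    # Hypothetical target: pin the EV down to +/-0.01%.
    needed = rounds_for_half_width(0.0001)
    print(f"Rounds needed for +/-0.01%: {needed:,}")
```

Under these assumptions the last figure comes out at a little over a billion rounds, which is where the "billions of rounds" requirement for simulations comes from.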

And that's the point of it. We use simulation as a method of calculating specific numbers which have true (unknown) values, but which are too difficult to calculate directly, and we want those numbers to be within a certain level of accuracy. The more rounds we simulate, the closer our numbers will be to the true (unknown) result. We can then use those numbers for making other calculations or for comparing systems. Some simulations, like the ones done in order to compute SCORE, are accumulating many different statistics, some of which are for events which are rarer than others, and so billions of iterations are needed in order to reduce the standard error for those rare events to an acceptable size.

Now, does this mean you need to play billions of live rounds in order to benefit from the knowledge obtained via the simulation? The answer is "No". Unlike the simulator, your goal is not to achieve the precise statistical result that the game offers. Your goal is simply to extract the money at a rate close to that predicted by the simulator. If your actual results are different by a few decimal points, then you will still be making money. In fact, if your results are within one standard deviation of the predicted results after N0 rounds, then you will still be making money. This level of accuracy is attainable by playing a number of rounds which is certainly achievable within your playing years.
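
To put a number on the N0 remark, here is a hedged sketch. It assumes the common definition of N0 as the number of rounds at which the expected win equals one standard deviation, i.e. N0 = (SD / EV)^2, and it uses a normal approximation. The 1% EV per round is a made-up input for illustration, not a figure from any simulation.

```python
import math


def n0(ev_per_round: float, sd_per_round: float = 1.1) -> float:
    """Rounds at which expected win equals one standard deviation (assumed definition)."""
    return (sd_per_round / ev_per_round) ** 2


def prob_ahead(rounds: float, ev_per_round: float,
               sd_per_round: float = 1.1) -> float:
    """Normal-approximation chance of a positive total result after `rounds` rounds."""
    z = ev_per_round * math.sqrt(rounds) / sd_per_round
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


if __name__ == "__main__":
    ev = 0.01  # hypothetical 1% EV per round
    rounds = n0(ev)
    print(f"N0 = {rounds:,.0f} rounds")
    print(f"Chance of being ahead after N0 rounds: {prob_ahead(rounds, ev):.1%}")
    print(f"Chance of being ahead after 4 x N0:    {prob_ahead(4 * rounds, ev):.1%}")
```

With these inputs N0 is on the order of twelve thousand rounds and the chance of being ahead at that point is about 84%, which is a playable horizon rather than a billion-round one.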

In summary:

- Multi-billion round simulations are needed in order to get the precise statistically significant numbers you need to make informed decisions about your play. If you are making decisions based on the results of short live play experiments of 10,000 rounds, then you are making a mistake. There is a significant chance that the inferior system could outperform the superior one over the course of the experiment due to the excessive margin of error. This is especially true if you are only making tweaks to an existing system, as opposed to comparing different systems.
- You don't need to achieve the precise results predicted by the simulator. You only need to achieve results which are somewhat close in order to get the money. This can be done within a much smaller number of rounds played, which is easily attainable.
- When comparing systems and other decisions, use the simulation results to make the decision. These will tell you which has the higher potential. If the difference in potential is large enough, then you can play enough rounds to enjoy the benefit.
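
The first point can be illustrated with a rough sketch that estimates how often the weaker of two systems would show the better observed result. It reuses the 0.57% and 0.64% EVs and the 1.1 standard deviation from the earlier example, and it assumes the two systems are played in independent trials of equal length with normally distributed averages. Those are simplifying assumptions, and the function names are hypothetical.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_inferior_wins(rounds: int, ev_better: float, ev_worse: float,
                       sd: float = 1.1) -> float:
    """Chance the worse system shows the higher observed EV.

    Each system is assumed to be played for `rounds` independent rounds.
    """
    sd_of_difference = sd * math.sqrt(2.0 / rounds)
    z = (ev_better - ev_worse) / sd_of_difference
    return normal_cdf(-z)


if __name__ == "__main__":
    for n in (10_000, 1_000_000, 100_000_000, 1_000_000_000):
        p = prob_inferior_wins(n, ev_better=0.0064, ev_worse=0.0057)
        print(f"{n:>13,} rounds each: P(inferior looks better) = {p:.2%}")
```

At 10,000 rounds each, the result is close to a coin flip (roughly 48%), so a live experiment of that size says almost nothing about which system is better. Only at simulation-scale round counts does the probability of a wrong ranking become negligible.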

I hope this helps!

**Bodarc** · 06-29-2016, 03:01 PM

Thanks Gronbog

**LoneWoLF** · 06-29-2016, 03:27 PM

No one was arguing any of that, at least I wasn't. I know there's a difference between knowing if something is statistically significant by simulating a billion rounds and knowing you don't need a billion rounds to reach that expected result. Of course you don't need a billion rounds to reach your expected target. My point is, these people advocating side counts are completely delusional, they think their system is some type of godly system where in fact they're just hitting positive variance just like they would with any other count. Side counts in a shoe game? There just isn't a high enough frequency sample to have a surplus or deficit of any card value to make side counting that big of a benefit in the long run. But of course they use their anecdotal data and claim they're right without proving to us with a billion round simulation because they also claim there's no simulator that can do what their super non-linear system does LOL. Instead they go side count 100's of cards and hit their EV in 10 minutes with their SCORE OF 500.

I just wish I could do a case study of just playing HiLo with full indices while occasionally counting multiple tables simultaneously when the opportunity arises vs. their super godly count only counting one table at all times. We would play only SHOE games for 1800 hours a year (36 hours a week) and see who comes out on top. Everything stays constant. We play same games, same penetration, same everything and we'll see who comes out on top. I would blow them out of the water. Of course one of the super side counters might ask why it has to be mutually exclusive, why can't the super side counter also count two tables at once? And my answer will be, I would like to see them try LOL.

**Norm** · 06-29-2016, 03:34 PM

Originally Posted by LoneWoLF

My point is, these people advocating side counts are completely delusional

You seem to be lumping a huge number of people using numerous different methodologies into one group. Then, you come to a conclusion that you think covers all of them.

**LoneWoLF** · 06-29-2016, 04:35 PM

Originally Posted by Norm

You seem to be lumping a huge number of people using numerous different methodologies into one group. Then, you come to a conclusion that you think covers all of them.

Funny you say that, cause the other side of the coin is the same way. The side counters clump all simple practitioners in the same class, but you don't say anything about them doing that.

**Norm** · 06-29-2016, 05:01 PM

Originally Posted by LoneWoLF

Funny you say that, cause the other side of the coin is the same way. The side counters clump all simple practitioners in the same class, but you don't say anything about them doing that.

Once again, you use a broad brush. The 'side counters,' as a group, do no such thing. There are a vast number of methods that use side counts. Grouping them all together and assigning some attribute to them makes no sense. Personally, I dislike using side counts. But, there are situations where they possess enormous utility. For example, certain side bets. When you decide to make such labels, you close off possible opportunities.

The point of this forum is to talk about modern AP methodologies. If you want the site to be limited to how to use HiLo, you're on the wrong site. As conditions worsen, we need to look at new opportunities. Obviously, some will not be useful. But, we don't stop opinions. We stop personal attacks, religion, and politics.

**marriedputter** · 06-29-2016, 10:53 PM

Originally Posted by Norm

As conditions worsen, we need to look at new opportunities.

This.

Once I started side counting aces, I never went back. Using a non-typical count can score points for longevity as well (though they'll get you sooner or later). I don't fault those that use "plain-old high-lo" though. Are you satisfied with your count? Yes? Then that's all that matters. It's your money and only you should decide what's worth it. If your count is only getting you $5 an hour and you're happy with that, then that's all that matters.

**LoneWoLF** · 06-29-2016, 11:22 PM

Originally Posted by marriedputter

This.

Once I started side counting aces, I never went back. Using a non-typical count can score points for longevity as well (though they'll get you sooner or later). I don't fault those that use "plain-old high-lo" though. Are you satisfied with your count? Yes? Then that's all that matters. It's your money and only you should decide what's worth it. If your count is only getting you $5 an hour and you're happy with that, then that's all that matters.

How do you know that wasn't just positive variance and any count would have gotten what you're getting? At the end of the day, in a shoe game, side counts are pretty much worthless. Are you playing primarily shoes or pitch? If it's pitch, disregard my statement

**marriedputter** · 06-29-2016, 11:31 PM

Originally Posted by LoneWoLF

How do you know that wasn't just positive variance and any count would have gotten what you're getting? At the end of the day, in a shoe game, side counts are pretty much worthless. Are you playing primarily shoes or pitch? If it's pitch, disregard my statement

It is indeed pitch. I use Zen on 6D and Hi-Opt II on 2D. I am open to side counting aces on 6D, but if I do, it would have to be something that I would implement gradually.

As to variance, I reside in the camp that philosophizes that even a partial percent gain is worthwhile in the long run. The rules of the game may continue to be diminished to the point where Hi-Lo just won't be worth it anymore for anybody. If that happens, I want to be one of the "last-men-standing" with my more powerful count.

**Three** · 06-30-2016, 07:00 AM

Originally Posted by LoneWoLF

How do you know that wasn't just positive variance and any count would have gotten what you're getting? At the end of the day, in a shoe game, side counts are pretty much worthless. Are you playing primarily shoes or pitch? If it's pitch, disregard my statement

Your count affects your ride to the long run. The long run expectations aren't that different between counts but what happens along the way to get to the long run is. Unless you are counting and not paying attention to trends and swings the difference should be apparent. The severity, frequency and length of downswings will change by the approach you use. Every approach has some short term downswings. It is part of the deal. How likely they are to stack up on each other or be quickly erased by big wins on each side, as I like to say bookending the big loss, will depend on your count. How severe they are in general depends on your count. How often they tend to happen depends on your count. If you are comfortable with your ride to the long run then stick with what you use. If not you may want to use some more advanced counting techniques. Like MP said side counting aces changed his ride for him. He noticed the difference in both heat and results.

**Three** · 06-29-2016, 04:37 PM

Great post Gron. I was talking about playing decisions in my last post. We consider 1,000,000 data points to be enough for significant results, but for playing decisions that depends heavily on the correlation of the count to the EoRs of the play. Strongly correlated plays don't need as much data to converge, and poorly correlated plays may need far more than 1,000,000 data points to even start to converge. There is no set number for significance, but the way the results behave shows you when they are becoming predictable and when they are randomly scattered. I like graphical representations like Norm's in the previous post's link. Then I can use the raw data as backup for getting specific.

**mofungoo** · 06-30-2016, 08:38 AM

Originally Posted by Norm

Originally Posted by LoneWoLF

My point is, these people advocating side counts are completely delusional

Originally Posted by Norm

You seem to be lumping a huge number of people using numerous different methodologies into one group. Then, you come to a conclusion that you think covers all of them.

On this site there are 3 or 4 people who advocate side counting cards other than aces with ace neutral counts. Not a huge number. I don't think people who advocate side counting are delusional; there is merit to that approach under certain conditions. What I do know is that some of these advocates don't seem to understand diminishing returns when side counting is applied to the wrong type of game, i.e. 6D and 8D shoe games. Flash is the one who seems to understand diminishing returns, stating that he employs a side count of 7s in 2D games only.

Once again, we need to see proof of how well these custom counts work under various conditions of decks in play and penetration. What happened in a year's worth of play is statistically irrelevant. And an improvement with a particular decision that relies on a deck composition which occurs every 200-300 hours of play is equally irrelevant, in the long run.

**mofungoo** · 06-30-2016, 11:56 AM

Of course, I stated that I was referring to:

Originally Posted by mofungoo

side counting cards other than aces with ace neutral counts

meaning side counting other cards either counted as zero in a main count, or already counted with the main count.

Non-ace-reckoned counts require side counting the aces in some manner for betting accuracy, as your sims show.