Order-of-Magnitude Probabilities


Representing Probabilities

Order-of-magnitude values are excellent for representing both low probabilities and high probabilities, and they make several common types of calculation easy to work out in your head alone.

For representing low probabilities, we can just use the value directly: a one-in-a-million chance is simply \(E^-6\). Using this scheme, we can easily represent very rare probabilities (see Wikipedia - Orders of magnitude (probability) for more examples):

Event                                               Probability
Winning the lottery with one ticket
(UK national lottery, 2009)                         \(E^-7\)
Yellowstone erupting in any given year              \(E^-6\)
Being dealt a straight flush in poker               \(E^-5\)
Being dealt four of a kind in poker                 \(E^-4\)
Sharing a birthday with any one random person       \(E^-3\)
Giving birth to twins                               \(E^-2\)
Rolling a natural 20 on a D20                       \(E^-1\)
Getting heads on a fair coin toss                   \(E0\)

How do we represent high probabilities though? At first glance they'd all be \(E0\) (\(99\% \sim E0\), \(99.9\% \sim E0\), etc.), which wouldn't be very helpful. We can deal with this by remembering that very likely events have a very small chance of not happening. So we represent high probabilities as the complements of low probabilities. If my server has 99.999% reliability, then we can say the probability of it being up at any time is \(\overline{E^-5}\) (i.e. the complement of \(E^-5\), or \(1 - E^-5\) to abuse notation a little). Using this notation, we don't have to keep reversing the event we're talking about, e.g. we don't have to talk about the probability of the server failing when we actually want to talk about the probability of it being available.
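To make the representation concrete, here's a minimal sketch in Python; the function name and the (exponent, is_complement) encoding are my own, purely illustrative:

```python
import math

def to_oom(p):
    """Convert a probability to an order-of-magnitude representation.

    Returns (exponent, is_complement): a low probability p becomes
    E^exponent, while a high probability (> 0.5) is represented by the
    exponent of its small complement 1 - p.
    """
    if p > 0.5:
        # Represent a high probability by its small complement, 1 - E^k.
        return (round(math.log10(1 - p)), True)
    return (round(math.log10(p)), False)

# A one-in-a-million chance is E^-6.
print(to_oom(1e-6))     # (-6, False)
# A server with 99.999% reliability is the complement of E^-5.
print(to_oom(0.99999))  # (-5, True)
```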

Iterating Probabilities

When given probabilities, we will commonly want to find out what happens when we iterate them. If I know going skydiving carries an \(E^-5\) risk of death, what about going every week for a year? Or if there's an \(\overline{E^-3}\) chance of rain each day in Seattle, what is the probability I'll go without sun for the next month?

Notice these are two different types of questions. The first asks for the probability of dying at any point during the year; even though we could calculate it, the probability of dying every week is not a particularly useful value. The second asks for the probability of it raining every day. So we actually have two types of iteration we might want to calculate, and naturally they lead to different answers: if I ask how likely something is to happen at every iteration, I expect that to be less than the given probability; if I ask how likely something is to happen at any iteration, I expect it to be more.

Since we're representing low probabilities and high probabilities in two separate ways, and because we have two different types of iteration we want to calculate, we have four cases to consider. Thankfully though, due to some symmetry we'll see, we only end up with two simple types of calculations.

Skip to Methods Summary to just see the rules.

To start, let's take the case of a 'low' probability event and finding out how likely it is to occur on every iteration. Let's say we're playing the lottery (with an \(E^-7\) chance of winning), but we're really greedy so we want to win every time we play.

We know from general probability that for two independent events the probability of both happening is the two probabilities multiplied together. So for two iterations, we have \(E^-7 \times E^-7 \sim E^{-7-7} \sim E^{-14}\) to win twice in a row. In general, with \(N\) iterations, we have \((E^-7)^N \sim E^{-7 \times N}\), which is just due to the laws of exponentiation, since \((10^{-7})^N = 10^{-7 \times N}\). So we probably shouldn't get our hopes up.
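A quick numeric check of the exponent rule, using the lottery figure from above with an arbitrary number of plays:

```python
import math

p_win = 1e-7  # E^-7 chance of winning per play
N = 5         # an arbitrary number of plays

# Probability of winning every single play: exponents multiply by N,
# since (10^-7)^N = 10^(-7*N).
p_every = p_win ** N
print(round(math.log10(p_every)))  # -35, i.e. E^(-7*5)
```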

Now what if we reverse both parts of the question, and ask for the probability of a 'high'-probability event happening at any iteration? Say we can build a server with \(\overline{E^-2}\) reliability (i.e. it is up 99% of the time), but we want at least an \(\overline{E^-5}\) one like mentioned before. One way to do this is to set up multiple servers so that if one fails, the next one takes its place. That way, at some random time, the service I'm running on them will be available if any of the servers are up. So how many servers do we need running concurrently in order to get \(\overline{E^-5}\) or better?

Well, we can just think about what the odds of failure would be. The odds of any individual server failing are \(E^-2\), and the odds of all of them failing simultaneously are \(E^{-2 \times N}\) for \(N\) servers, just like we saw above. So for three servers we get a total failure probability of \(E^-6\), which means our uptime probability is \(\overline{E^-6}\).
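A sketch of that redundancy calculation in code (the helper name is mine, not part of the post):

```python
import math

def servers_needed(per_server_failure_exp, target_failure_exp):
    """How many servers, each failing with probability E^per_server_failure_exp,
    must run in parallel to reach a combined failure probability of
    E^target_failure_exp or better?

    Failure exponents multiply by the server count, so we need the smallest
    N with N * per_server_failure_exp <= target_failure_exp (both negative).
    """
    return math.ceil(target_failure_exp / per_server_failure_exp)

# E^-2 failure per server (99% uptime), targeting overline{E^-5} uptime:
print(servers_needed(-2, -5))  # 3
```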

So we see that this question (high-probability, any iteration) is just a mirror image of the first case (low-probability, every iteration), and we end up doing the exact same calculation.

We have two cases left to figure out how to calculate, so we'll start with calculating a low-probability event occurring on any iteration:

Let's go back to the lottery case, but with the more reasonable stipulation that we want to win on any iteration of playing the game. There isn't a rule to immediately solve this, so to help, let's visualize the probability space, starting with one iteration:


Where \(x\) is our probability of winning the lottery (greatly exaggerated for visual clarity). And now again with two iterations:


The orange square is the probability of winning both times, and the green square is our probability of losing both times. So the value we're interested in is the total area outside of that green square. Since the area of the entire square is \(1\times 1 = 1\), the probability of winning either time is \(1-(1-x)^2\).

The pattern is the same for more iterations: We calculate the probability of losing every time and take the complement of that. So if we play the lottery \(N\) times, the probability of winning any of those times is \(1-(1-x)^N\).

But it's not immediately clear how you would calculate this when using order-of-magnitude values. The trick is that when \(x\) is small (i.e. we're representing it as a 'low' probability with order-of-magnitude values) then \(1-(1-x)^N\) can be approximated by \(x \times N\). At the end of the post I'll try to justify this approximation, but for now it makes our job very easy:

If we play on the order of 10 times (\(E1\)), then our probability of winning any of those times is \(E^-7 \times E1 \sim E^-6\). If I play on the order of \(E3\) times, then it would be \(E^-4\).
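We can sanity-check the \(x \times N\) approximation against the exact formula; the values here are just the lottery example:

```python
x = 1e-7  # E^-7 chance of winning one play

for n in (10, 1000):           # E1 and E3 plays
    exact = 1 - (1 - x) ** n   # probability of winning at least once
    approx = x * n             # the order-of-magnitude shortcut
    print(n, exact, approx)    # the two values agree closely
```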

The only thing to keep in mind is that this breaks down when the number of iterations approaches the reciprocal of the event's probability: if I go crazy and somehow play the lottery \(E8\) times, I can't exceed a probability of 1, so the order-of-magnitude value maxes out at \(E0\).

Our last case is that of a high-probability event occurring every iteration.

Let's say I'm a long-haul trucker and I want to know the odds that I'll make it to the end of my career. The odds of surviving any given day as a truck driver are \(\overline{E^-7}\),[1] so how would we calculate the odds of surviving through my career? Let's look at our probability space again, but this time our event is the larger portion of the space, \(1 - x\).


And so with two iterations, we're interested in the large square, i.e. \((1 - x)^2\):


But, again, the way we're representing high probabilities is by their small complements, so our \(E^-7\) is actually \(x\) in the diagram. So how do we get \((1-x)^2\) from \(x\)? Well, if we know the area outside the big square, then we can get \((1-x)^2\) by taking the complement. And since writing a probability as \(\overline{E^{-k}}\) is exactly taking a complement, the value we actually work with is the area outside the square, i.e. \(1 - (1 - x)^2\), just as we did above.

So again we get to use the approximation \(1 - (1 - x)^N \approx x \times N\). Back to our example: if I work for 30 years \(\sim E4\) days, then my probability of living through my career is \(\overline{E^{-7 + 4}} \sim \overline{E^-3}\), which is not terrible odds.
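Checking the trucking example numerically, with a 30-year career (\(\sim E4\) days) at an \(E^-7\) daily risk:

```python
import math

x = 1e-7         # daily risk of death, E^-7
days = 30 * 365  # a ~30-year career, on the order of E4 days

# Chance of dying at some point during the career:
p_death = 1 - (1 - x) ** days
print(round(math.log10(p_death)))  # -3, so survival is overline{E^-3}
```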

As before, if we take this to an extreme, the failure probability maxes out at \(E0\). Suppose I got life extension and really wanted to commit to the truck-driving career; then around \(E4\) years in, I'd reach an \(\overline{E0}\) chance of surviving, i.e. essentially none. It's possible to come up with rules for calculating the actual probability beyond that point, but the numbers quickly become absurd and useless for mental arithmetic, so we'll leave that alone.

Methods Summary

That's a lot of description, which can be summed up in its tersest form as:

Methods table:

Given a low probability \(E^{-k}\):
    every iteration: \(E^{-k \times N}\)
    any iteration: \(E^{-k} \times N\) (i.e. add \(\log_{10} N\) to the exponent)

Given a high probability \(\overline{E^{-k}}\):
    every iteration: \(\overline{E^{-k} \times N}\)
    any iteration: \(\overline{E^{-k \times N}}\)
Methods legend: \(P(\forall i \in [1..N]: e_i)\) is the probability of the event occurring every time, while \(P(\exists i \in [1..N]: e_i)\) is the probability of the event occurring at any time.
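As one consolidated sketch (the function and its signature are entirely my own invention, not from the post), the four cases reduce to two exponent manipulations:

```python
import math

def iterate(exp, n, high=False, every=True):
    """Iterate an order-of-magnitude probability over n independent trials.

    exp:   the (negative) exponent k of E^k; for a high probability this
           is the exponent of its small complement, overline{E^k}.
    high:  True if the probability is represented as a complement; the
           returned exponent is then also that of a complement.
    The every/any cases mirror each other across the high/low split:
    low+every and high+any multiply the exponent by n, while
    low+any and high+every multiply the raw value by n (add log10(n)).
    The result is capped at E0.
    """
    if every != high:
        result = exp * n                    # exponents multiply by n
    else:
        result = exp + round(math.log10(n)) # values multiply by n
    return min(result, 0)

assert iterate(-7, 2, high=False, every=True) == -14    # win lottery twice
assert iterate(-2, 3, high=True, every=False) == -6     # 3 servers, any up
assert iterate(-7, 10, high=False, every=False) == -6   # win in 10 plays
assert iterate(-7, 10**4, high=True, every=True) == -3  # survive a career
```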

Addendum: Geometric Argument

I claimed that \(1-(1-x)^N\) can be approximated by \(x \times N\), but I don't have a rigorous proof of that, so I'll just sketch an argument to build some intuition.

The key is again a geometric approach: we recognize that the value we want is the area of a region of probability space, then examine how that region changes as we increase the number of iterations \(N\) in order to determine how the probability relates to \(N\).

Suppose we're approximating \(1 - (1 - x)^N\), where \(x\) is 'small'. It's still useful to think of these regions as probabilities, where the overall area is 1 and each iteration adds a dimension.


The idea is to get an intuition for how much of the space (which always has a total 'volume' of 1 no matter how many dimensions) the green region takes up as a function of the number of iterations (dimensions, geometrically).


Here we see the green portion is \((1 - x)^2\), but before we start drawing conclusions, let's look at 3 iterations:


Here we can see that the volume outside the green is mostly taken up by the slabs at the side and top. There's a more concrete way to say that: when numbers are small, we can ignore their higher powers in approximation. The corner box's volume is \(x^3\), and the edge boxes are \(x^2 \times (1-x)\), so we can safely ignore those. The side slabs have volume \((1-x)^2 \times x = (1 - 2x + x^2) \times x \approx x\), so we can see more directly that they are the dominant contributors to volume outside our green region.

So what about more than three iterations? The situation is more difficult to represent visually, but behaves similarly. For \(N\) iterations, we will have \(N\) high-dimensional slabs on the sides of our hypercube, each with an approximate volume of \(x\), while the other components are negligible, so the total volume outside our survival region is \(\approx N \times x\). Of course this breaks down when \(N \gg 1/x\), at which point the volume is just \(\approx 1\).
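The three-dimensional decomposition can be verified numerically (the value of \(x\) here is arbitrary):

```python
x = 0.01  # a 'small' probability

# Volume outside the (1-x)^3 'survival' cube, broken into its pieces:
corner = x ** 3               # 1 corner box
edges = 3 * x ** 2 * (1 - x)  # 3 edge boxes
slabs = 3 * x * (1 - x) ** 2  # 3 side slabs, the dominant term

outside = 1 - (1 - x) ** 3
assert abs((corner + edges + slabs) - outside) < 1e-12
print(slabs / outside)  # very close to 1: the slabs dominate
```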

  1. 371 truck driver deaths in 2019: injuryfacts

    3,592,000 truck drivers employed in 2017: census

    \(371 / (3592000 \times 365) = 2.83E^-7 \sim E^-7\)