bayes – Ben Lowery @ STOR-i

The Monty Hall problem and its generalisations: Part 2

ben-lowery — Fri, 01 Apr 2022 13:01:44 +0000

In the previous blog post we looked at the infamous Monty Hall problem and its controversial (but correct) solution. The main problem has been the talk of countless , , and ; providing a nice introduction to probability and Bayes’ theorem. And while it is fun to rehash the same story, it might be worth looking at how to broaden the problem and see how the same core principles can be applied to a more obtuse setting. With this we can explore some of these re-formulations, starting with expanding the number of doors in our game.

Monty Hall: Live from the !

Let’s envision the following fictitious scenario: after the success of his three-door final showdown, Monty and his team have been gifted a bigger budget to make a more elaborate show, and a new, possibly infinitely big studio. Monty uses this increased budget to order his producers to purchase more doors. Now that he possesses a studio filled to the brim with disused doors, Monty again places one car behind a door and keeps track of where it is placed. While a flood of goats trundle in and hide behind the rest, he asks a contestant, now slightly more intimidated than their predecessors (see the gif below), to pick a door. The contestant hesitantly chooses, and our host then opens every door except the contestant’s and one final door.

A contestant on the reboot of Let’s Make a Deal whose confidence clearly indicates that they read the last blog post on how to win.

Now we again pose the question: stick or switch? Given what we learned from the original Monty Hall situation, it makes sense to guess that switching will be in the contestant’s best interest. We can test this again by using Bayes’ Theorem and some basic probability theory.

Like in the original incarnation, we define events and variables for the new version. Instead of 3 doors, we now possess $d$ doors. Each door is assigned the event that the prize lies behind it, and we denote these events $D_1,...,D_d$ . We also let $G$ be the event that we open all but doors 1 and $d$ to reveal goats, where door 1 is the contestant’s pick. So with this in mind, we can formulate the following Bayes equation for any door in particular, say $i$ :

$\mathbb{P}[D_i|G]=\frac{\mathbb{P}[D_i]\mathbb{P}[G|D_i]}{\mathbb{P}[G]}.$

The individual probabilities for the right hand side are calculated as:

$\mathbb{P}[D_1]=...=\mathbb{P}[D_d]=1/d \\ \mathbb{P}[G|D_1]=\frac{1}{d-1} \\ \textrm{(As, if the prize is behind door 1, any one of the other d-1 doors could be the one left closed)} \\ \mathbb{P}[G|D_d]=1 \\ \textrm{(As we are just restricted to opening every other door if the prize is here)} \\ \mathbb{P}[G|D_2]=...=\mathbb{P}[G|D_{d-1}]=0 \\ \textrm{(If the prize lies behind one of the doors we want to open, we clearly can’t open them all).}$

Since $D_1$ to $D_d$ cannot occur simultaneously, we can use some more simple statistical properties, specifically the law of total probability, to express the probability of revealing goats behind all but doors 1 and $d$ as follows in this slightly long, but hopefully intuitive derivation:

$\mathbb{P}[G]=\sum_{i=1}^d \mathbb{P}[D_i]\mathbb{P}[G|D_i]\\= \mathbb{P}[D_1]\mathbb{P}[G|D_1]+ \mathbb{P}[D_d]\mathbb{P}[G|D_d]+\sum_{i=2}^{d-1}\mathbb{P}[D_i]\mathbb{P}[G|D_i]\\ =\frac{1}{d}\cdot \frac{1}{d-1}+\frac{1}{d}\cdot 1=\frac{1}{d}\left(\frac{1}{d-1}+1\right)=\frac{1}{d}\left(\frac{d}{d-1}\right).$

Remember we opened all but doors 1 and $d$ , so every door in between has probability 0 of having the car behind it. Thus, substituting the above derivations back into the Bayes’ Theorem equations, the probabilities for doors 1 and $d$ are:

$\mathbb{P}[D_1|G]=\frac{1/d\cdot 1/(d-1)}{1/d\cdot d/(d-1)}=\frac{1}{d} \\ \mathbb{P}[D_d|G]=\frac{1/d\cdot 1}{1/d\cdot d/(d-1)}=\frac{d-1}{d}$

While tedious, this derivation is pivotal in allowing a generalisation of the problem. Generalisations are crucial in mathematics, allowing us to expand our problem from an initial set of constrained numbers (like only 3 available doors) to as many doors as we want, and all we need to do is plug that number into $d$ .

To test this – and provide a little sanity check – we can hark back to our original Monty Hall Problem and see that substituting 3 doors gives us probabilities of switching and staying as 2/3 and 1/3 respectively. So now we have a rather straightforward method to show that, no matter how many doors we have, if all but one door and the original door are opened, it is in our best interest to switch. In addition to this, although trivial to point out, we see that with more doors, our likelihood of winning when switching increases. For example, with $d=7$ doors, we should, by plugging our values in, attain the staying probability (i.e. door 1) as 1/7, and switching (to door $d$ ) as 6/7.
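The sanity check can also be done with exact fractions. The following is a minimal sketch that mirrors the derivation above. The function name `posterior_stay_switch` is my own invention for illustration, and it assumes the contestant picked door 1 and Monty left door $d$ closed.

```python
from fractions import Fraction


def posterior_stay_switch(d):
    """Return (P[D_1 | G], P[D_d | G]) for a game with d doors."""
    if d < 3:
        raise ValueError("the game needs at least 3 doors")
    prior = Fraction(1, d)           # P[D_i] for every door
    like_stay = Fraction(1, d - 1)   # P[G | D_1]: any other door could be left closed
    like_switch = Fraction(1)        # P[G | D_d]: Monty's hand is forced
    # Law of total probability; doors 2..d-1 contribute 0 to the sum.
    p_g = prior * like_stay + prior * like_switch
    return prior * like_stay / p_g, prior * like_switch / p_g


for d in (3, 7, 100):
    stay, switch = posterior_stay_switch(d)
    print(d, "doors: stay", stay, "switch", switch)
```

For 3 doors this gives 1/3 and 2/3, and for 7 doors 1/7 and 6/7, matching the values quoted above.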

To see this in practice let’s run some simulations for 3, 5, 7 and 9 doors after carrying out Monty Hall’s deal 3000 times.

We see the winning chance from switching increases as the doors increase

which is pretty good.
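For anyone wanting to try this at home, here is a rough sketch of how such a simulation could look. It is not the code used for the plot: the function name, the fixed seed, and the assumption that Monty picks the leftover door uniformly at random when the contestant's first guess is right are all my own choices for illustration.

```python
import random


def simulate_open_all_but_one(d, trials=3000, seed=0):
    """Estimate the chance of winning by switching when Monty opens d-2 goat doors."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    switch_wins = 0
    for _ in range(trials):
        car = rng.randrange(d)
        pick = rng.randrange(d)
        if pick == car:
            # Every other door hides a goat, so Monty leaves one closed at random.
            other = rng.choice([door for door in range(d) if door != pick])
        else:
            # Monty cannot open the car's door, so it is the one left closed.
            other = car
        if other == car:
            switch_wins += 1
    return switch_wins / trials


for d in (3, 5, 7, 9):
    estimate = simulate_open_all_but_one(d)
    print(f"{d} doors: switching wins {estimate:.3f}, theory {(d - 1) / d:.3f}")
```

With 3000 games the estimates should land within a couple of percentage points of $(d-1)/d$ , though the exact figures depend on the seed.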

Winning isn’t everything

Our second generalisation of Monty Hall’s problem is one in which we try to flip the odds back into the favour of the host. Given the generosity of winning when we expand the problem to $d$ doors, it is now worth seeing if limiting the number of doors Monty opens can make it more – or less – likely for the contestant to win when switching. If we think about this in a logical manner, now that we have a choice of which door to switch to, we should be less likely to get the prize by switching than in the previous scenario. But it is worth calculating how much of a detriment this new rule is to our contestant, and analysing how drastically our odds can change by opening fewer and fewer doors.

This time we will take a scheduled commercial break ~~out of laziness~~ to relieve ourselves of any more probability equations and focus solely on numerical computations. Consider an example where, given $d$ doors, we open $k$ of these. More specifically, let’s look at the case of having 10 doors and opening a subset of these, counting the number of times we win if we switch, the number of times we win if we stay, and the new third case in which we win neither by staying nor by switching. This is seen in the following graph.

As we can see in the choice between 10 doors, between opening 8 (the maximum number of doors we can open) and opening 6 (where we have a choice of which door to switch to), there is a significant dip in the probability of winning when switching, and it becomes more likely to not win the game whatever we do. This is due to the truly random choice we now face when selecting the door to switch to. Despite all these changes, switching consistently gives better odds than staying. We can see this in the generalisation of opening $k$ from a set of $d$ doors, given by the following equation:

$\mathbb{P}[\textrm{Winning when switching}]=\left(\frac{d-1}{d}\right)\cdot\left(\frac{1}{d-k-1}\right)$

For those in dire need of a Bayesian derivation (as I normally am), one can refer . As a little test, we can take $d=10$ and $k=2$ and see that the probability of winning when switching is $\approx 0.129$ , which leaves our simulation pretty damn close to what we want.
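The $k$ -door game can be sketched numerically as well. This is only an illustration under my own assumptions, not the code behind the graph: both function names are made up, Monty opens $k$ goat doors uniformly at random (never the contestant's pick or the car), and a switching contestant picks uniformly among the doors still closed.

```python
import random


def switch_win_probability(d, k):
    """Closed form from the equation above: (d-1)/d * 1/(d-k-1)."""
    return (d - 1) / (d * (d - k - 1))


def simulate_open_k(d, k, trials=20000, seed=0):
    """Return estimated (stay, switch) win rates when Monty opens k of d doors."""
    if not 0 <= k <= d - 2:
        raise ValueError("k must lie between 0 and d-2")
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    stay_wins = switch_wins = 0
    for _ in range(trials):
        car = rng.randrange(d)
        pick = rng.randrange(d)
        # Monty may only open doors that are neither the pick nor the car.
        openable = [door for door in range(d) if door != pick and door != car]
        opened = set(rng.sample(openable, k))
        # A switching contestant chooses at random among the other closed doors.
        closed = [door for door in range(d) if door != pick and door not in opened]
        new_pick = rng.choice(closed)
        stay_wins += pick == car
        switch_wins += new_pick == car
    return stay_wins / trials, switch_wins / trials


stay, switch = simulate_open_k(10, 2)
print(f"stay {stay:.3f}, switch {switch:.3f}, formula {switch_win_probability(10, 2):.3f}")
```

Each game counts a stay win and a switch win separately, so whatever is left over from the two rates is the third case where neither choice finds the car.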

And with this we can conclude our investigation into suspected goat farmer Monty Hall and his mystery doors. But was this investigation as concrete as the numbers suggest?

Statistical stage fright

In these two blogs we’ve seen how applying some mathematical rigour allows us to understand, dissect and create advantages in a game of seemingly random luck. That said, as is often the case when applying Mathematics to the real world, our logical reasoning may still not be perfect, nor reveal the true solution to the problem. Arguments could be made that randomising the choices of contestants in the simulation and using conditional probability detracts from the human element in the game, in which the host, the atmosphere and the audience play a crucial role in the dilemma posed to the contestant, perhaps leading to a bias in the options available. This is something that probability and random simulations simply cannot account for. Hence it could even be contested that in reality, based on the host’s hints and approach towards the contestant, the probability of finding the winning car can range from 1/2 to 1. A paper on this dilemma of the human element can be seen .


The Monty Hall problem and its generalisations: Part 1

ben-lowery — Wed, 30 Mar 2022 15:50:40 +0000

In this two-part series of blog posts, we will explore how a simple game of seemingly random choice, inspired by an innocuous 1960s TV show, can result in an intriguing investigation into both the simplicity and deceitful nature of probability.

Pick a door, any door

In a 1975 letter to the American Statistician, Steve Selvin posed a problem loosely based off the 1960’s American TV show Let’s Make a Deal. The game consisted of three doors, two of which had goats behind them, and a third door containing a dream car. The host of the show, Monty Hall, asks the contestant to select a door. Monty then chooses a remaining door to reveal a goat. He then asks the contestant if they would like to switch to the one remaining unrevealed door, or stay with their initial choice.

Years later in 1990, Marilyn vos Savant, who rose to fame for her supposedly , was posed a similar question in her “Ask Marilyn” column for Parade magazine (!). The responses of both Selvin and vos Savant postulated that it would be in the best interest of the contestant to switch doors.

This came as a rather counterintuitive conclusion to many, as it may initially be thought that there is no difference between staying and switching: there is still a 50% chance the car lies behind either remaining door. This idea was widely shared amongst the public, with Vos Savant. How could it be a better option to switch doors? The criticism ranged from soccer moms, to amateur Mathematicians, and even those possessing PhDs in Maths-based disciplines.

Historical re-enactment of Marilyn vos Savant’s response to critics.

However, the two savvy protagonists of this story were correct in their rationale… and more importantly, had the maths to prove it.

The host knows all

To understand why it is in our best interest to switch, let’s pose the problem in a more formal sense by having a run through. We have three doors that can be labelled 1,2,3. Our esteemed host, Monty Hall, asks the contestant to pick a door. Say they pick door 1. Monty, who knows what’s behind each door, opens door 2 and reveals a goat. He then poses the question “would you like to stick with your choice or switch to door three?”. To help quell the contestant’s dilemma, we can first think about the odds of initially picking the correct door. With three doors, the contestant has a 1/3 chance of picking correctly straight away. Then if we remove a door, does that change our odds when switching?

A highly detailed drawing of the Monty Hall problem.

One naive way to think about it is that, if each door has equal probability, surely switching makes no difference? Since removing a door will leave us with two options, we must have a 1/2 chance of winning either way? The problem with this approach is that we do not account for the host knowing what is behind each door. Given that the first door has a probability of 1/3 of containing our Car, there must be a 2/3 probability that it lies behind one of the other two doors; and behind at least one of these doors must lie a goat. Since Monty knows that, in this scenario, door 2 has a goat, this door is revealed and we are left with the option of switching to door 3. Yet we still have a 2/3 probability that the Car doesn’t lie behind the first door, so this probability carries over to represent just door 3. Hence we have probabilities of 1/3 for door 1, 0 for door 2, and 2/3 for door 3, and we conclude that switching is the correct decision in this dilemma.

Our newest contestant, Thomas Bayes

Wordy anecdotes are all well and good, but can we consolidate our understanding with mathematical rigour? Using some Bayesian statistics this is indeed possible (an excellent book on the topic for newcomers to the area can be ).

We start with the idea that, given two events ( $A$ and $B$ ), we know that if we are given information for one of these events, $A$ say, then we can calculate the probability of event $B$ happening given the information of $A$ . We denote this $\mathbb{P}[B|A]$ . A special formula, known as Bayes’ theorem, then states the following with this information:

$\mathbb{P}[B|A]=\frac{\mathbb{P}[B]\mathbb{P}[A|B]}{\mathbb{P}[A]}$

We apply this to the Monty Hall problem as follows. In this problem we have three doors 1,2,3 and we have the events that the prize lies behind each respective door, denoted $D_1, D_2, D_3$ . Let’s suppose we select door 1 initially, and our host Monty opens door 3 to show a goat. So, once this goat is revealed, the probability that the prize is behind door 3 drops to 0.

Now let $G$ be the event that Monty opens door 3 to reveal a goat. We can use Bayes’ Theorem to formulate the following probabilities for finding the Car behind doors 1 or 2, given the information of $G$ , as:

$\mathbb{P}[D_1|G]=\frac{\mathbb{P}[D_1]\mathbb{P}[G|D_1]}{\mathbb{P}[G]}, \\ \\ \mathbb{P}[D_2|G]=\frac{\mathbb{P}[D_2]\mathbb{P}[G|D_2]}{\mathbb{P}[G]}$

We can calculate the probabilities of each quantity on the right-hand side intuitively as:

$\mathbb{P}[D_1]=\mathbb{P}[D_2]=\mathbb{P}[D_3]=1/3 \\ \mathbb{P}[G|D_1]=1/2 \\ \textrm{(As, if the prize was behind door 1, Monty can choose either 2 or 3 to open)} \\ \mathbb{P}[G|D_2]=1 \\ \textrm{(As Monty is restricted to opening door 3 if the prize is behind 2)}\\ \mathbb{P}[G|D_3]=0 \\ \textrm{(If the prize lies here, Monty can’t open it)}$

Since $D_1, D_2, D_3$ cannot occur simultaneously, we can use some more simple statistical properties, specifically mutual exclusivity, to express the probability of Monty revealing a goat behind door 3 as:

$\mathbb{P}[G]=\sum_{i=1}^3 \mathbb{P}[D_i]\mathbb{P}[G|D_i]=(1/3)(1/2)+(1/3)(1)+(1/3)(0)=1/2$

Placing all these values back into our Bayes equations yields:

$\mathbb{P}[D_1|G]=\frac{\mathbb{P}[D_1]\mathbb{P}[G|D_1]}{\mathbb{P}[G]}=\frac{(1/3)(1/2)}{1/2}=1/3, \\ \\ \mathbb{P}[D_2|G]=\frac{\mathbb{P}[D_2]\mathbb{P}[G|D_2]}{\mathbb{P}[G]}=\frac{(1/3)(1)}{1/2}=2/3.$

Hence, if we switch to door 2, it gives us a 2/3 chance of winning and staying with door 1 produces a 1/3 chance.
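If you would rather let a computer do the arithmetic, the same calculation can be written out in a few lines. This is just a sketch of the steps above using exact fractions, with the contestant on door 1 and Monty opening door 3. The dictionary names are my own.

```python
from fractions import Fraction

# Prior: the car is equally likely to sit behind any of the three doors.
prior = {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)}

# Likelihood of G (Monty opens door 3) given where the car is,
# assuming the contestant picked door 1.
likelihood = {1: Fraction(1, 2), 2: Fraction(1), 3: Fraction(0)}

# Law of total probability: P[G] = sum over i of P[D_i] * P[G | D_i].
p_g = sum(prior[i] * likelihood[i] for i in prior)

# Bayes' theorem applied to each door in turn.
posterior = {i: prior[i] * likelihood[i] / p_g for i in prior}

print("P[G] =", p_g)
for door in sorted(posterior):
    print("P[D_%d | G] =" % door, posterior[door])
```

The posteriors come out as 1/3 for door 1, 2/3 for door 2 and 0 for door 3, the same numbers as the hand calculation.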

Numerical Simulation

Unfortunately I couldn’t think of a witty title for this section, but nevertheless it would still be interesting to explore a simulation and see if we can back up these theoretical results. To do this, we run the Monty Hall problem 1000 times, randomly generating where the car lies and which door is opened, and recording whether we win if we stay or switch. We should see that switching wins about 66.7% of the time, and the plot below, showing the aggregated probabilities as we run the simulation more times, supports our Bayes-based theory.

Graph showing the likelihood of winning if you stayed or switched in the Original Monty Hall Problem. With the graph showing how the probability of success from each option changes with more simulations of the scenario, leading to final probabilities of winning as 0.691 and 0.309 for switching and staying respectively.
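A simulation along these lines can be sketched as follows. This is not the code that produced the graph: the function name and the fixed seed are my own assumptions, and it assumes Monty chooses at random between the two goats when the contestant's first pick is the car.

```python
import random


def simulate_monty_hall(trials=1000, seed=0):
    """Return the running proportion of games won by switching after each game."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the run is repeatable
    doors = (1, 2, 3)
    switch_wins = 0
    running = []
    for n in range(1, trials + 1):
        car = rng.choice(doors)
        pick = rng.choice(doors)
        # Monty opens a goat door that is not the contestant's pick.
        opened = rng.choice([d for d in doors if d != pick and d != car])
        # Switching means taking the one door that is neither picked nor opened.
        switched = next(d for d in doors if d != pick and d != opened)
        switch_wins += switched == car
        running.append(switch_wins / n)
    return running


running = simulate_monty_hall()
print(f"switch: {running[-1]:.3f}, stay: {1 - running[-1]:.3f}")
```

Plotting `running` against the game number would give a curve like the one in the graph, settling near 2/3 as the games pile up. The final figure will differ from run to run unless the seed is fixed.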

And with this, we close our first show. In the next blog post we explore some extensions to this problem, and answer the burning question: what if Monty Hall had access to more doors, more goats, and more contestants?