A Working List of Probability Questions: Week 2

Market Making, Expected Value, and General Topics
June 28, 2020 | Updated July 6, 2020

Welcome to week two of my trading prep series! This week I’m going to focus most of my attention on learning and practicing market making, expected value, and other expanded topics. I’ll also post resources and links to additional problems as I go, for documentation purposes. Thanks for tuning in!

Day 8   |   Day 9  |   Day 10   |   Day 11  |   Day 12

Update: Instead of doing 5 problems every single day, I’ll be doing problems on weekdays and spending weekends reviewing problems and topics I’ve already covered.

Day Eight: June 29

A friend linked me to a blog post by Messy Matters on confidence intervals/overconfidence that you can find here. I’ve been spending some time with market-making questions, so I found the exercise and results of the “study” pretty interesting. I also found this PowerPoint on recruiting tips when I was looking for new problems to do.

Problem 1. Make me a market on the temperature in an hour. Solution. I was thinking through this problem after taking a walk this morning. At the time, I estimated the temperature to be about $60^\circ - 65^\circ$ (the actual temperature was $64^\circ$). Given it was $10\colon 35$ AM, it was still relatively early in the day and there was quite a bit of room for the temperature to increase before it plateaus in the afternoon. From my experience, the temperature on a typical summer day in San Francisco usually tops out around $75^\circ$ to $85^\circ$. This morning was relatively cool, so I’d expect today to be one of the cooler days, topping out between $75^\circ$ and $78^\circ$. Then, just assuming that the temperature increases somewhat linearly, I’d set my bid at $63^\circ$ and my offer at $68^\circ$.

After setting my initial spread, I’d adjust my interval according to how the second party/market maker chooses to buy from or sell to me. To simulate this, I sampled a price from a normal distribution with mean $65.5$ and $\sigma = 0.83$ and flipped a coin to determine whether I was being bought from or sold to at that price.

  1. Bid at $66.9^\circ$: buying from me at a value leaning this far to the right of my range indicates that I severely underestimated the temperature in an hour, as the willingness to buy high might be an attempt to execute a trade that’s unfavorable to me and then sell later at a higher price with an acceptable margin. However, I also know the temperature cannot change too much, so I’d adjust my spread to account for this new information by shifting my position to $65.5 - 71$. I widened my range a bit since I’m pretty skeptical that my original range reflected the actual temperature well and I want to make sure to cover all my bases.
  2. Offer at $68.4^\circ$: a willingness to sell indicates that, taking my decision-making into account, the second party believes that $68.4^\circ$ optimizes both margin and chance of trade execution. I wouldn’t expect the exact temperature to be $68.4^\circ$, since a buyer with information on the weather wouldn’t want to buy at the instrument’s exact value, as it’s harder to later sell for profit. I’d adjust my spread down to $66.5 - 69.5$.

Unfortunately, this simulation is flawed since I didn’t really have a second party with better information on the specific product being exchanged. I tried to adjust for that by using the sampler and randomizing, but since the real value of the temperature at $11\colon 35$ wasn’t known to me or my fictitious party, I had to operate under the assumption that my initial spread wasn’t completely wrong, which could have been the case. Next time, I’ll probably try this with a partner. The actual temperature at $11\colon 35$ was about $66^\circ$, so you can see how the spread deviated from it without this information, but the exercise still helped me practice adjusting my spread to new information.
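
For reference, here’s a minimal C++ sketch of the kind of counterparty sampler described above. The mean of $65.5$, $\sigma = 0.83$, and the five-round loop are just the assumptions from my exercise, and the names are illustrative:

#include <iostream>
#include <random>

int main() {
	std::mt19937 rng{std::random_device{}()};
	// assumed parameters from the exercise above
	std::normal_distribution<double> price(65.5, 0.83);
	std::bernoulli_distribution buySide(0.5); // fair coin: buy vs. sell

	for (int round = 0; round < 5; ++round) {
		double p = price(rng);
		if (buySide(rng))
			std::cout << "Counterparty buys from you at " << p << "\n";
		else
			std::cout << "Counterparty sells to you at " << p << "\n";
		// adjust your bid/offer by hand after each round
	}
}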

Problem 2. Smith is in jail and has $3$ dollars, and he can get out on bail if he has $8$ dollars. A guard agrees to make a series of bets with him, with Smith’s probability of winning $p = 0.4$ and probability of losing $q = 0.6$. Find the probability that he can leave on bail before losing all his money if he bets $1$ dollar every time, versus if he bets as much as possible each round to get to $8$ dollars but nothing more than that. Which is the optimal strategy? Solution. This problem is from Aldous’s STAT 150 class and is a spinoff of the gambler’s ruin problem. If Smith bets one dollar at a time, the Markov chain that describes his money at each state $i$ follows:

Let $p_i$, $i\in[0,8]$, be the probability that Smith’s money reaches $8$ before $0$ when starting from state $i$. Then we have

\begin{aligned} &p_i = 0.4p_{i+1} + 0.6p_{i-1} \text{ for } i \in [1,7]\\ &p_0 = 0\\ &p_8 = 1\\ \end{aligned}

You can grind through this somewhat tedious system of linear equations to find that $p_3 = 0.0964$. Alternatively, the transition matrix for this Markov chain is:

P = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\ 0.6 & 0 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 0.6 & 0 & 0.4 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0.6 & 0 & 0.4 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0.6 & 0 & 0.4 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0.6 & 0 & 0.4 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 0.6 & 0 & 0.4 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0 & 0.4\\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \end{pmatrix}

Taking $P^n$ with $n$ sufficiently large gives us the limiting (absorbing) behavior of this Markov chain, and the entry $P^n_{4,9}$ gives us $p_3$ as well!
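
Rather than hand-solving the system, one quick sanity check (a sketch, not part of the original solution) is to fix the boundary values and sweep the interior equations until they converge:

#include <array>
#include <cstdio>

int main() {
	std::array<double, 9> p{}; // p[0] = 0 (broke), p[8] = 1 (bail)
	p[8] = 1.0;
	for (int sweep = 0; sweep < 10000; ++sweep)
		for (int i = 1; i <= 7; ++i)
			p[i] = 0.4 * p[i + 1] + 0.6 * p[i - 1];
	std::printf("p_3 = %.4f\n", p[3]); // prints p_3 = 0.0964
}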

For the case where Smith bets as much as possible each round, but no more than needed to reach $8$ dollars, we have the Markov chain:

This in turn gives us the system of equations

\begin{aligned} &p_3 = 0.4p_6 + 0.6p_0 = 0.4p_6\\ &p_4 = 0.4p_8 + 0.6p_0 = 0.4\\ &p_6 = 0.4p_8 + 0.6p_4 = 0.4 + 0.6p_4 \end{aligned}

Solving this gives us $p_3 = 0.256$. This is considerably higher than betting one dollar each time, so Smith should choose the bold strategy when betting.

Problem 3. What is the expected number of tosses for us to toss an $HTH$? Solution. This is another Markov chain problem. Note that we start all the way over whenever we toss a tails before our first heads, so let’s call this state $0$. We can also account for all the other possible states: $1 = H$, $2 = HT$, and $3 = HTH$. Assuming a fair coin we have the following Markov chain:

Let’s denote by $\mathbb{E}[i]$ the expected number of tosses to get $HTH$ from each state $i$. Then we have the following system of linear equations:

\begin{aligned} &\mathbb{E}[0] = 1 + 0.5\mathbb{E}[1] + 0.5\mathbb{E}[0]\\ &\mathbb{E}[1] = 1 + 0.5\mathbb{E}[1] + 0.5\mathbb{E}[2]\\ &\mathbb{E}[2] = 1 + 0.5\mathbb{E}[3] + 0.5\mathbb{E}[0]\\ &\mathbb{E}[3] = 0 \end{aligned}

Solving these gives $\mathbb{E}[0] = 10$, so we expect it to take 10 tosses to flip our first $HTH$.
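
A short Monte Carlo check of this answer; the seed and trial count are arbitrary choices:

#include <cstdio>
#include <random>

int main() {
	std::mt19937 rng{12345};
	std::bernoulli_distribution heads(0.5);
	const int trials = 1000000;
	long long total = 0;
	for (int t = 0; t < trials; ++t) {
		int state = 0, tosses = 0; // states: 0, 1 = H, 2 = HT, 3 = HTH
		while (state != 3) {
			bool h = heads(rng);
			++tosses;
			if (state == 0)      state = h ? 1 : 0;
			else if (state == 1) state = h ? 1 : 2;
			else                 state = h ? 3 : 0;
		}
		total += tosses;
	}
	std::printf("average tosses: %.3f\n", (double)total / trials); // ~10
}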

Problem 4. Players $A$ and $B$ play a marble game. Each player has a red and a blue marble in a bag and draws one marble from their bag each round. If both marbles are blue, $A$ wins $\$3$, and if both are red, $A$ wins $\$1$. If the two colors are different, $B$ wins $\$2$. If you can play this game 100 times, who would you play as?

Solution. Since picking two marbles of different colors is exactly as likely as picking two of the same color, the expected value of playing as player $A$ equals that of playing as player $B$ (EV $= \$1$). Then, to minimize risk, we can calculate the variance of the payoff for each player:

\begin{aligned} \sigma^2_A &= (1-1)^2 \cdot \frac{1}{4} + (3-1)^2 \cdot \frac{1}{4} + (0-1)^2 \cdot \frac{1}{2} = \frac{3}{2}\\ \sigma^2_B &= (0-1)^2 \cdot \frac{1}{2} + (2-1)^2 \cdot \frac{1}{2} = 1 \end{aligned}

$A$’s payoff has the larger variance, so he holds more risk and it’s advantageous to play as player $B$ when risk-averse. However, since $n = 100$ is relatively large, the average payoff will concentrate near the expected payoff, which shrinks the risk of playing as player $A$.

Problem 5. You are trapped in a dark cave with three indistinguishable exits on the walls. One of the exits takes 3 hours to travel and takes you outside. Of the other two, one takes 1 hour to travel and the other takes 2 hours, but both drop you back into the original cave through the ceiling, which is unreachable from the floor of the cave. You have no way of marking which exits you have attempted. What is the expected time it takes for you to get outside? Solution. This is a simple EV problem. Since the exits are indistinguishable and we can’t mark our escape attempts, each exit is selected with identical probability, independently of previous attempts. Let $X$ be a random variable measuring the time it takes to make it out of the cave. There is a $\frac{1}{3}$ chance we make it outside in 3 hours, a $\frac{1}{3}$ chance we waste an hour and end up back in the cave, and a $\frac{1}{3}$ chance we waste two hours and end up back in the cave. Then,

\mathbb{E}[X] = \frac{1}{3} \cdot 3 + \frac{1}{3}(1 + \mathbb{E}[X]) + \frac{1}{3}(2 + \mathbb{E}[X])
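
Collecting the $\mathbb{E}[X]$ terms makes the solve explicit:

\mathbb{E}[X] = 1 + 1 + \tfrac{2}{3}\mathbb{E}[X] = 2 + \tfrac{2}{3}\mathbb{E}[X] \implies \tfrac{1}{3}\mathbb{E}[X] = 2 \implies \mathbb{E}[X] = 6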

So $\mathbb{E}[X] = 6$: we expect to spend $6$ hours attempting to escape before we’re finally outside.

Day Nine: June 30

For today’s problems I took a step back and reviewed some general problems and conditional probability.

Problem 1. A deck of cards has 3 red and 3 blue cards. At each stage, a card is selected at random. If it is red, it is removed from the deck. If it is blue, the card is not removed and we move to the next stage. Find the average number of steps until the process ends. Solution. Note that the process ends when there are no red cards left. To get started, I’ll draw the Markov chain for this problem:

Now let’s index the states: state $0 = R^3B^3$, state $1 = R^2B^3$, state $2 = RB^3$, state $3 = B^3$. Let $\mathbb{E}[i]$ denote the expected time until the process ends starting from state $i$. Then we have the following equations:

E[0]=12(1+E[0])+12(1+E[1])E[1]=25(1+E[2])+35(1+E[1])E[2]=141+34(1+E[2])\begin{aligned} \mathbb{E}[0] &= \tfrac{1}{2}(1+ \mathbb{E}[0]) + \tfrac{1}{2}(1+ \mathbb{E}[1])\\ \mathbb{E}[1] &= \tfrac{2}{5}(1 + \mathbb{E}[2]) + \tfrac{3}{5}(1 + \mathbb{E}[1])\\ \mathbb{E}[2] & = \tfrac{1}{4} \cdot 1 + \frac{3}{4}(1 + \mathbb{E}[2]) \end{aligned}

Solving this system gives $\mathbb{E}[0] = 8.5$.

Problem 2. A process moves on the integers 1, 2, 3, 4, and 5. It starts at 1 and, on each successive step, moves to an integer greater than its present position, moving with equal probability to each of the remaining larger integers. State five is an absorbing state. Find the expected number of steps to reach state five. Solution. The Markov chain for this problem is

This gives us the transition matrix

\begin{pmatrix} 0 & \frac{1}{4} & \frac{1}{4} & \frac{1}{4} & \frac{1}{4}\\ 0 & 0 & \frac{1}{3} & \frac{1}{3} & \frac{1}{3}\\ 0 & 0 & 0 & \frac{1}{2} & \frac{1}{2}\\ 0 & 0 & 0 & 0 & 1\\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}

We’re interested in the expected number of steps to reach state five from state one, so let $\mathbb{E}[i]$ denote the expected number of steps to reach state $5$ from state $i$. Then we have the equations

\begin{aligned} \mathbb{E}[1] &= 1 + \tfrac{1}{4}\mathbb{E}[2] + \tfrac{1}{4}\mathbb{E}[3] + \tfrac{1}{4}\mathbb{E}[4] + \tfrac{1}{4}\mathbb{E}[5]\\ \mathbb{E}[2] &= 1 + \tfrac{1}{3}\mathbb{E}[3] + \tfrac{1}{3}\mathbb{E}[4] + \tfrac{1}{3}\mathbb{E}[5]\\ \mathbb{E}[3] &= 1 + \tfrac{1}{2}\mathbb{E}[4] + \tfrac{1}{2}\mathbb{E}[5]\\ \mathbb{E}[4] &= 1\\ \mathbb{E}[5] &= 0 \end{aligned}

Solving this system of linear equations gives us the expected steps from each state: $\{\mathbb{E}[1], \mathbb{E}[2], \mathbb{E}[3], \mathbb{E}[4], \mathbb{E}[5]\} = \{\frac{25}{12}, \frac{11}{6}, \tfrac{3}{2}, 1, 0\}$, so we expect about $2.0833$ steps to reach $5$.
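
Since each $\mathbb{E}[i]$ depends only on later states, the system is triangular and can be solved by back-substitution; a small sketch for this specific chain:

#include <cstdio>

int main() {
	double E[6] = {0, 0, 0, 0, 1, 0}; // E[5] = 0, E[4] = 1
	for (int i = 3; i >= 1; --i) {
		E[i] = 1.0;
		for (int j = i + 1; j <= 5; ++j)
			E[i] += E[j] / (5 - i); // uniform over the 5 - i larger integers
	}
	std::printf("E[1] = %.4f\n", E[1]); // 25/12, about 2.0833
}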

Problem 3. An unfair coin has probability $p$ of tossing heads and $q = 1-p$ of tossing tails. How do you make this coin fair? Solution. Since consecutive identical faces have different probabilities ($HH$ has probability $p^2$ while $TT$ has probability $q^2$), the only way to make the coin fair is to combine the $H, T$ outcomes somehow. Note that $HT$ is tossed with probability $pq$ and $TH$ with probability $qp$, and these are equal. So we can modify the typical rules of coin flipping in the following way to make it fair: let heads be tossing an $HT$ and tails be tossing a $TH$, which have equal probability. Then, to account for the remaining two possible outcomes, we just start over whenever we get $HH$ or $TT$, which gives us an automatic reset when we toss two of the same face. Note that we can’t simply take the first occurrence of $HT$ or $TH$ within a longer run: $HHT$ would contain an $HT$, but $P(HHT) = p^2q \neq q^2p = P(TTH)$, so a fair outcome would not be guaranteed.
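
Here’s a minimal sketch of this procedure (the von Neumann trick); `biasedFlip` and the bias $p = 0.7$ are placeholder assumptions standing in for whatever biased coin you have:

#include <cstdio>
#include <random>

std::mt19937 rng{42};

// placeholder biased source; any 0 < p < 1 works
bool biasedFlip(double p = 0.7) {
	return std::bernoulli_distribution(p)(rng);
}

bool fairFlip() {
	while (true) {
		bool a = biasedFlip(), b = biasedFlip();
		if (a != b) return a; // HT counts as heads, TH as tails
		// HH or TT: discard the pair and start over
	}
}

int main() {
	int heads = 0;
	const int n = 1000000;
	for (int i = 0; i < n; ++i) heads += fairFlip();
	std::printf("fraction heads: %.4f\n", (double)heads / n); // ~0.5000
}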

Problem 4. This is a variation of the gambler’s ruin problem. You have $\$9$ and want to reach $\$10$. Your probability of success on each bet is $p = 0.4$ and of failure is $q = 0.6$. You may only place bets in multiples of $10p$, i.e. $\$4$. How would you maximize the probability of reaching $\$10$? Solution. I would bet $\$4$ each time I play. From $\$9$ we can bet either $\$4$ or $\$8$. Either bet reaches $\$10$ with probability $p = 0.4$, but with probability $q = 0.6$ we drop to $\$1$ if we bet $\$8$, or to $\$5$ if we bet $\$4$. Once we reach $\$1$ we can no longer bet, since all bets must be multiples of $\$4$. So betting $\$8$ gives us a total probability of $0.4$ of reaching our goal. If we bet $\$4$ instead, we reach $\$10$ in the first round with probability $0.4$; if we lose the first bet, we’re at $\$5$, from which we either drop to $\$1$ (probability $0.6$) and are done betting, or climb back up to $\$9$ (probability $0.4$) and play our original scenario over again. Clearly, betting $\$4$ per round is optimal: we match the probability $0.4$ of winning in the first round (the same as betting $\$8$), and there is also some additional probability $p_0 > 0$ of reaching $\$10$ in later rounds.

Problem 5. Two players, $A$ and $B$, toss a fair coin in that order. The sequence of heads and tails is recorded. If an $HT$ sequence appears, then whoever tossed the $T$ wins and the game ends. What is the probability that $A$ wins? Solution. Let $P(A)$ and $P(B)$ denote the probabilities that $A$ and $B$ win respectively. Note that $P(B) = 1 - P(A)$. This next part is pretty cool: we can condition $P(A)$ on $A$’s first coin toss, which has probability $0.5$ of being $H$ and probability $0.5$ of being $T$. Then we have $P(A) = \tfrac{1}{2}P(A \, | \, H) + \tfrac{1}{2}P(A \, | \, T)$. If $A$ tosses tails, then $B$ essentially becomes the first to toss, so $P(A \, | \, T) = P(B) = 1-P(A)$. If $A$ tosses heads, however, we can condition further on $B$’s first toss. $B$ has probability $0.5$ of tossing $T$, in which case $A$ loses, and probability $0.5$ of tossing $H$, which essentially brings us to the scenario where $B$ is the player who just tossed an $H$, so $A$ would win with probability $1-P(A \, | \, H)$. Then we have

P(A \, | \,H) = \tfrac{1}{2} \cdot 0 + \tfrac{1}{2}(1 - P(A\, | \, H)) \to P(A \, | \,H)= \tfrac{1}{3}

We can then plug this back into the original expression to find that $P(A) = \frac{4}{9}$.

Day Ten: July 1

My friend recommended I take a look at Fifty Challenging Problems in Probability, and I’m taking a couple of today’s problems from it.

Problem 1. You play a game with an urn that contains $75$ blue balls, $25$ red balls, and $1$ yellow ball. For each red ball you draw you gain a dollar, and if you draw the yellow ball you lose everything and the game ends. What is your strategy for this game, given that you can decide to stop or redraw after each turn? Solution. Note that the payoff for each blue ball drawn is $0$ and the payoff for the yellow ball is $-r$, where $r$ is the number of red balls already drawn. If we let $b$ be the number of blue balls already drawn, we can derive the marginal expected value of one more draw (not sure if this is a real term):

\begin{aligned} \mathbb{E}[X] &= \frac{75 - b}{(75 - b) + (25 - r) + 1}\cdot 0 + \frac{25 - r}{(75 - b) + (25 - r) + 1} \cdot 1 \\ &+ \frac{1}{(75 - b) + (25 - r) + 1}\cdot (-r) = \frac{25- 2r}{(75 - b) + (25 - r) + 1} \end{aligned}

Our marginal $EV$ should be greater than $0$ for us to keep drawing. Setting the numerator to $0$ and solving gives $r = \frac{25}{2}$, the break-even number of red balls already drawn. With $12$ red balls in hand the marginal $EV$ is still positive, so we keep drawing until we have a $13$th red ball, then stop and call it a day (of course, the game will sometimes be cut short if we draw the yellow ball before then).
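
A Monte Carlo sanity check of this stopping rule, under my reading of the game (bank a dollar per red, lose the bank on yellow, stop at the 13th red); the seed and trial count are arbitrary:

#include <algorithm>
#include <cstdio>
#include <random>
#include <vector>

int main() {
	std::mt19937 rng{7};
	const int trials = 200000;
	double total = 0;
	for (int t = 0; t < trials; ++t) {
		std::vector<int> urn(101, 0);                // 0 = blue
		std::fill(urn.begin(), urn.begin() + 25, 1); // 1 = red
		urn[100] = 2;                                // 2 = yellow
		std::shuffle(urn.begin(), urn.end(), rng);
		int reds = 0;
		for (int ball : urn) {
			if (ball == 2) { reds = 0; break; }      // yellow: lose everything
			if (ball == 1 && ++reds == 13) break;    // stop at 13 reds
		}
		total += reds;
	}
	std::printf("average payoff: %.3f\n", total / trials);
}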

Problem 2. A drawer contains red socks and black socks. When two socks are chosen at random, the probability that both are red is $\frac{1}{2}$. How small can the number of socks in the drawer be? Solution. Let $r$ and $b$ denote the number of red and black socks respectively. Since we are selecting socks without replacement, we have

\frac{r}{b+r} \cdot \frac{r-1}{b+r-1} = \frac{1}{2}

Then,

\begin{aligned} 2r(r-1) &= (b+r)(b+r -1)\\ 2r^2 - 2r &= b^2 + 2br + r^2 - b - r\\ r^2 - (1 + 2b)r - (b^2 - b) &= 0 \end{aligned}

Note that $b \neq 0$, since with no black socks the probability that both drawn socks are red would be $1$, not $\frac{1}{2}$. So choose $b = 1$ and plug it into the quadratic to solve for $r$: $r^2 - 3r = 0$, giving $r = 3$. Then the minimum number of socks in the drawer is $4$.

Problem 3. Mr. Brown always bets a dollar on the number $13$ at roulette against the advice of his friend. To help cure Mr. Brown of playing roulette, his friend bets him $\$20$ at even money that Mr. Brown will be behind at the end of $36$ plays. How is the cure working? (Roulette has $38$ numbers to bet on, and Mr. Brown is paid $\$35$ per dollar bet when his number comes up.) Solution. To break even over $36$ plays, Mr. Brown need only win once. So he’ll be behind only if he loses all $36$ plays, which has probability $(\frac{37}{38})^{36} \approx 0.383$; this is also the probability that he loses his bet with his friend. Note also that his EV over the $36$ roulette plays is

36(\tfrac{1}{38} \cdot 35 - \tfrac{37}{38} \cdot 1) \approx -1.89

Then Mr. Brown’s EV of betting with his friend is

20(1- 0.383) - 20(0.383) - 1.89 = 2.79

So this cure doesn’t seem to be working too well: Mr. Brown’s EV for each cycle of $36$ plays is $+\$2.79$ whenever he makes the $\$20$ bet with his friend.

Problem 4. We throw three dice one by one. What is the probability that we roll three numbers in strictly increasing order? Solution. Note that for the three numbers to be in strictly increasing order, they must be distinct. The probability that all three rolls are distinct is $\frac{6}{6} \cdot \frac{5}{6} \cdot \frac{4}{6} = \frac{5}{9}$. Conditioned on rolling three distinct numbers, the probability that they come out strictly increasing depends only on their arrangement: there is exactly one correct arrangement out of the $3! = 6$ ways to order the three numbers, so $P(\text{strictly increasing} \, | \, \text{unique}) = \frac{1}{6}$. Then we have

\begin{aligned} P(\text{strictly increasing}) &= P(\text{strictly increasing and unique})\\ &= P(\text{unique}) \cdot P(\text{strictly increasing} \, | \, \text{unique})\\ &= \frac{5}{9} \cdot \frac{1}{6} = \frac{5}{54} \end{aligned}

Then we have probability $\approx 0.093$ of rolling such an outcome.

Problem 5. The game of craps, played with two dice, is one of America’s fastest and most popular gambling games. These are the rules: we throw two dice. If we roll $7$ or $11$ on the first throw, we win immediately, and if we roll $2$, $3$, or $12$, we lose immediately. Any other throw is called a “point”. If the first toss is a point, we keep rolling until we either roll our point again and win, or roll a $7$ and lose. What is the probability of winning? Solution. I thought this was a Markov chain problem at first, but a closer look made me realize it’s just a bit of conditional probability. Here’s how I got the numbers: there are $36$ outcomes for two dice. There is one way to roll a $2$, one way to roll a $12$, and $2$ ways to roll a $3$, so the probability of losing the first round is $\frac{1 + 1 + 2}{36} = \frac{1}{9}$. Additionally, there are $6$ ways to roll a $7$ and $2$ ways to roll an $11$, so our probability of winning the first round is $\frac{2}{9}$, leaving a $\frac{2}{3}$ chance of rolling a point. Now, we just need to find the probability of winning conditioned on rolling a point, and add that to the probability of winning the first round to arrive at the total probability of winning. Note that there is probability $\frac{1}{12}$ each that the point is a $4$ or a $10$, $\frac{1}{9}$ each that it is a $5$ or a $9$, and $\frac{5}{36}$ each that it is a $6$ or an $8$.

Note that in the event we roll a point in our first round, we need only consider the rolls that hit the point (a win) or a $7$ (a loss), since rolling anything else brings us back to the same situation of waiting on the point. Then the probability of making a point of $4$ or $10$ is $\frac{3}{3+6}$, of a $5$ or $9$ is $\frac{4}{4+6}$, and of a $6$ or $8$ is $\frac{5}{5+6}$ (each calculated as the number of ways to roll the point versus the $7$). The probability of winning conditional on rolling a point the first turn is then

\begin{aligned} P(W \, | \, \text{point}) &= \tfrac{1}{12} \cdot \tfrac{3}{9} + \tfrac{1}{9} \cdot \tfrac{4}{10} + \tfrac{5}{36} \cdot \tfrac{5}{11} + \tfrac{5}{36} \cdot \tfrac{5}{11} + \tfrac{1}{9} \cdot \tfrac{4}{10} + \tfrac{1}{12} \cdot \tfrac{3}{9}\\ &\approx 0.27071 \end{aligned}

Adding this to $\frac{2}{9}$, our probability of winning round $1$, gives a total probability of winning $p \approx 0.493$, which is relatively fair for a casino game.
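
The same number can be recomputed exactly in a few lines; a small sketch using the counts above:

#include <cstdio>

int main() {
	int ways[13] = {}; // ways[s] = dice combinations summing to s
	for (int a = 1; a <= 6; ++a)
		for (int b = 1; b <= 6; ++b) ++ways[a + b];

	double win = (ways[7] + ways[11]) / 36.0; // win on the come-out roll
	const int points[] = {4, 5, 6, 8, 9, 10};
	for (int pt : points)
		win += (ways[pt] / 36.0) * ways[pt] / double(ways[pt] + ways[7]);
	std::printf("P(win) = %.5f\n", win); // 0.49293
}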

Day Eleven: July 2

Dynamic programming is also fair game at some trading firms, so I’ve included some DP questions in this and future problem sets.

Problem 1. A basketball player is taking $100$ free throws. She scores one point if the ball passes through the hoop and zero points if she misses. She has scored on her first throw and missed on her second. For each following throw, the probability that she scores is the fraction of throws she has made so far. For example, if she has scored $23$ points after the $40^{\text{th}}$ throw, the probability that she will score on the $41^{\text{st}}$ throw is $23/40$. After $100$ throws (including the first and the second), what is the probability that she scores exactly $50$ baskets? Solution. This is another problem from A Practical Guide. I approached this problem with the mindset of Markov chains at first, but then I realized that I could build it up inductively with conditional probability. Let’s define $(n,k)$ to be the event that the player has scored $k$ baskets after $n$ throws and let $p_{n,k} = P((n,k))$. Since we make the first throw and miss the second, the third throw has success probability $\frac{1}{2}$, so $p_{3,1} = \frac{1}{2} = p_{3,2}$. For $n=4$, we can condition on $(3,1)$ and $(3,2)$ having taken place. Then,

\begin{aligned} p_{4,1} &= P((4,1)\, | \, (3,1))\,p_{3,1} + P((4,1)\, | \, (3,2))\,p_{3,2} = \frac{2}{3}\cdot \frac{1}{2} + 0 \cdot \frac{1}{2} = \frac{1}{3}\\ p_{4,2} &= P((4,2)\, | \, (3,1))\,p_{3,1} + P((4,2)\, | \, (3,2))\,p_{3,2} = \frac{1}{3} \cdot \frac{1}{2} + \frac{1}{3} \cdot \frac{1}{2} = \frac{1}{3}\\ p_{4,3} &= P((4,3)\, | \, (3,1))\,p_{3,1} + P((4,3)\, | \, (3,2))\,p_{3,2} = 0 \cdot \frac{1}{2} + \frac{2}{3} \cdot \frac{1}{2} = \frac{1}{3} \end{aligned}

A pattern emerges: $p_{n,k} = \frac{1}{n-1}$ for every $k \in [1, n-1]$ (we can’t have $k = 0$ or $k = n$ since we made the first throw and missed the second), which we can prove inductively.
Base Case: $n = 3$, where $p_{3,1} = p_{3,2} = \frac{1}{2}$.
Inductive Step: Suppose $p_{n,k} = \frac{1}{n-1}$ holds for all $k \in [1, n-1]$. Then for $n+1$:

\begin{aligned} p_{n+1, k} &= P(\text{score} \, | \, (n, k -1))p_{n,k-1} + P(\text{miss} \, | \, (n, k))p_{n,k}\\ &= \frac{k-1}{n} \cdot \frac{1}{n-1} + \left( 1-\frac{k}{n} \right) \cdot \frac{1}{n-1} = \frac{1}{n} \end{aligned}

Then we have probability $\frac{1}{99}$ of making exactly $50$ baskets out of $100$ throws.
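
A simulation of the throw-by-throw process to check the $\frac{1}{n-1}$ result; the seed and trial count are arbitrary:

#include <cstdio>
#include <random>

int main() {
	std::mt19937 rng{99};
	const int trials = 500000;
	int hits = 0;
	for (int t = 0; t < trials; ++t) {
		int made = 1; // first throw made, second missed
		for (int n = 3; n <= 100; ++n)
			if (std::bernoulli_distribution((double)made / (n - 1))(rng)) ++made;
		if (made == 50) ++hits;
	}
	std::printf("P(exactly 50) ~ %.5f\n", (double)hits / trials); // ~1/99 = 0.0101
}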

Problem 2. Dynamic Programming: Given two arrays of numbers, find the length of the longest subsequence present in both arrays. For example, the length of the longest subsequence present in both $[1,2,3,4]$ and $[2,3,5,7]$ is $2$. Solution. Our goal is to build a method that returns the length of the longest common subsequence (LCS). Let’s look at the brute force solution first: we can enumerate each subsequence of one array and check whether it is present in the second array. There are $\binom{n}{k}$ subsequences of length $k$ of an array of length $n$, so we have $\binom{n}{1} + \binom{n}{2} + \cdots + \binom{n}{n} \approx 2^n$ subsequences to check, giving exponential time. Now let’s approach it with dynamic programming: assuming our first array has length $m$ and our second array has length $n$, we can build an $(m+1)\times(n+1)$ matrix in which the $(i,j)$ entry holds the LCS length of the first $i$ elements of $\text{array1}$ and the first $j$ elements of $\text{array2}$. Then we have the following solution in C++:

#include <algorithm>
#include <vector>

int largestCommonSubsequence(const std::vector<int>& array1, const std::vector<int>& array2){
	int m = array1.size();
	int n = array2.size();
	// our "matrix": (m+1) by (n+1) table of LCS lengths
	std::vector<std::vector<int>> lcs(m + 1, std::vector<int>(n + 1));

	// constructs the matrix of LCS lengths using dp
	for(int i = 0; i <= m; i++){
		for(int j = 0; j <= n; j++){
			if(i == 0 || j == 0){
				lcs[i][j] = 0; // one of the prefixes is empty
			} else if (array1[i-1] == array2[j-1]){
				lcs[i][j] = lcs[i - 1][j - 1] + 1; // extend the common subsequence
			} else{
				lcs[i][j] = std::max(lcs[i-1][j], lcs[i][j-1]);
			}
		}
	}
	return lcs[m][n];
}

Walking through the loop: when either $i$ or $j$ is $0$, at least one of the two prefixes we’re comparing is empty, so the LCS is just $0$. Otherwise, if the last element of the first prefix equals the last element of the second prefix, we simply add one to the $(i-1, j-1)$ entry, since we’ve extended the common subsequence by one element. Finally, if neither of these is the case, the LCS of the current $(i,j)$ prefix pair is the max of the $(i-1, j)$ and $(i, j-1)$ entries. This solution runs in $O(mn)$ time, where $m$ and $n$ are the lengths of the two arrays.
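
Checking the function against the example from the problem statement (this assumes the largestCommonSubsequence definition above is in scope):

#include <iostream>
#include <vector>

int main() {
	std::vector<int> a{1, 2, 3, 4}, b{2, 3, 5, 7};
	std::cout << largestCommonSubsequence(a, b) << "\n"; // prints 2
}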

Problem 3. $500$ ants are randomly put on a $1$-foot string (independent uniform distribution for each ant between $0$ and $1$). Each ant randomly moves toward one end of the string (equal probability to the left or right) at a constant speed of $1$ foot/minute until it falls off at one end of the string. Also assume that the size of an ant is infinitely small. When two ants collide head-on, they both immediately change directions and keep on moving at $1$ foot/min. What is the expected time for all ants to fall off the string? Solution. This problem is relatively simple once you start thinking about it. The most troublesome part seems to be the collisions, since each collision complicates the process and any ant can collide with another at any time. Interestingly, collisions turn out to be irrelevant to solving this question: when an ant moving right collides with an ant moving left, the two ants swap directions, but there is still one ant moving left and one ant moving right (at the same speed as well). So it’s as if collisions don’t really take place. Then, the expected time is just the expected maximum time for an ant to move across the string given a uniform distribution. This is identical to asking for the expected value of the maximum of $500$ i.i.d. random variables with uniform distribution on $[0,1]$, aka an order statistics problem. Let’s find the CDF of $X_{500}$, the maximum of the independently distributed ants.

\begin{aligned} P(X_{500} \leq x) &= P(X_i \leq x \ \forall\, i \in [1,500]) \\ &= P(X_1 \leq x)P(X_2 \leq x) \cdots P(X_{500} \leq x) = x^{500} \end{aligned}

Then, we can find the PDF by differentiating to get $p_X(x)= 500x^{499}$. The expected value of the max of the $500$ i.i.d. random variables is

\int_{-\infty}^\infty xp_X(x) \, dx = \int^1_0 x\cdot 500x^{499} \, dx = \frac{500}{501}

There is an alternate solution to this problem on the Berkeley EECS 126 site, where it’s listed as a test review problem.

Problem 4. In a common carnival game, a player tosses a penny from a distance of about five feet onto the surface of a table ruled in $1$-inch squares. If the penny (diameter $\frac{3}{4}$ inch) falls entirely inside a square, the player receives five cents but does not get his penny back; otherwise he loses his penny. If the penny lands on the table, what is the expected payoff of the game? Solution. We’re given that the penny lands on the table, so we know the penny will fall on the grid of squares. Then we only need to find the regions where our penny can land for it to be fully encapsulated by a square. I’ve drawn two diagrams to model two different approaches to this problem. We can first visualize the penny landing on any four-square unit, since any four adjacent squares are guaranteed to fully encompass a penny. Then, the part of the penny associated with each corner must be squeezed into one of the four $\frac{1}{4} \times \frac{1}{4}$ corner squares, which have a total area of $\frac{1}{4}$ out of the four-square unit’s area of $4$, giving probability $\frac{1}{16}$ of being landed on.

Regarding the second visualization: this is much more accurate to the possible outcomes of the penny toss, since we track where the center of the penny must reside. The radius of the penny is $\frac{3}{8}$ inch, so the center of the penny must lie at least $\frac{3}{8}$ inch from each side of the square, giving a smaller square of potential center points with side $1 - \frac{3}{4} = \frac{1}{4}$ and area $\frac{1}{16}$. Then, the expected value of the game is $-1 \cdot \frac{15}{16} + 4 \cdot \frac{1}{16} = -\frac{11}{16}$ cents (a win nets $5 - 1 = 4$ cents, since the penny itself isn’t returned).

Problem 5. Eight eligible men and seven women happen randomly to have purchased single seats in a $15$-seat row of a theater. On average, how many pairs of adjacent seats are ticketed for marriageable couples? Solution. The book this problem is from is a bit old, so the wording is old-fashioned; essentially, we’re looking for the average number of potential male-female couples who sit next to one another. The best case and worst case scenarios follow

\text{MWMWMWMWMWMWMWM}\\ \text{MMMMMMMMWWWWWWW}

In order to build a marriageable couple we need $\text{MW}$ or $\text{WM}$, i.e. unlike adjacent seatings. If an adjacent pair is unlike, it contributes $1$ marriageable couple, and if not, $0$. The probability of a marriageable couple in the first two seats is $2 \cdot\frac{8}{15} \cdot \frac{7}{14} = \frac{8}{15}$; note that $\frac{8}{15}$ is also the expected number of marriageable couples in the first two seats. The same calculation applies to any adjacent pair, so we can get the average number of marriageable adjacent pairs by taking the number of adjacent pairs and multiplying by the expected value for each pair: $14 \cdot \frac{8}{15} = \frac{112}{15} \approx 7.47$. Then the expected number of marriageable couples is just under half the number of people present.

Moving on to a more general case: with $b$ elements of one type and $m$ of another randomly arranged in a line, the expected number of unlike adjacent pairs is

(m+b-1) \left[2\cdot \frac{bm}{(m+b)(m+b-1)}\right] = \frac{2bm}{m+b}

which can be shown easily by going through the same process we used for our specific case.

Day Twelve: July 3


Problem 1. Two players bet on the rolls of the total of two standard six-faced dice. Player $A$ bets that a sum of $12$ will occur first, and player $B$ bets that two consecutive $7$s will be rolled before that. What is the probability that $A$ will win? Solution. This question can be solved either through conditional probability or with a Markov chain. I’ve opted for a Markov chain since it’s much more straightforward and easily illustrated. Below is the Markov diagram for this problem, where I’ve denoted by $\varnothing$ the state of rolling for the first time. Since every roll of something other than a $7$ or a $12$ essentially resets us, each state gives us a certain probability of returning to the starting state. It’s worth noting that even after the first $7$ is rolled, $A$ still has a nontrivial probability of winning, since a $12$ can still be rolled in the next round.

Then we have the following system of equations to solve:

\begin{aligned} p_\varnothing &= \tfrac{1}{36} + \tfrac{29}{36}p_\varnothing + \tfrac{1}{6}p_7\\ p_7 &= \tfrac{1}{36} + \tfrac{29}{36}p_\varnothing \end{aligned}

Noting that $p_{7,7} = 0$ and $p_{12} = 1$ (the absorbing states), solving gives us $p_\varnothing = \frac{7}{13}$. Then, player $A$ is slightly favored to win over player $B$ in their bet.
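
A quick simulation of the race between a $12$ and back-to-back $7$s, with arbitrary seed and trial count, should land near $\frac{7}{13} \approx 0.5385$:

#include <cstdio>
#include <random>

int main() {
	std::mt19937 rng{13};
	std::uniform_int_distribution<int> die(1, 6);
	const int trials = 1000000;
	int aWins = 0;
	for (int t = 0; t < trials; ++t) {
		bool lastWasSeven = false;
		while (true) {
			int sum = die(rng) + die(rng);
			if (sum == 12) { ++aWins; break; }   // A wins on any 12
			if (sum == 7 && lastWasSeven) break; // B wins on 7, 7
			lastWasSeven = (sum == 7);
		}
	}
	std::printf("P(A wins) ~ %.4f\n", (double)aWins / trials);
}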

Problem 2. There are 51 ants on a square with side length 1. If you have a glass with a radius of $\frac{1}{7}$, can you put your glass at a position on the square to guarantee that the glass encompasses at least 3 ants? Solution. Yes: we can create $25$ sub-squares and use the pigeonhole principle to conclude that one of these sub-squares must contain at least $3$ ants (since $51 > 2 \cdot 25$). In creating these $25$ sub-squares, we want them to cover the entirety of the original $1\times 1$ square, so each of the smaller squares will have side length $\frac{1}{5}$. Note that a sub-square with side length $\frac{1}{5}$ has diagonal length $\frac{\sqrt{2}}{5} \approx 0.283$. Meanwhile, the radius-$\frac{1}{7}$ glass has a diameter of $\frac{2}{7} \approx 0.286$, so any sub-square can be inscribed within the glass. Then, we simply need to choose the sub-square with the most ants and place our glass carefully to cover at least three ants.

Problem 3. The probability you’ll see a falling star in the sky over the course of one hour is $0.44$. What’s the probability you’ll see one over half an hour? Solution. This is a past Citadel interview question that I’ve seen variants of at other companies. We can assume that each half hour is independent of the other regarding falling star appearances. The probability of not seeing a falling star over the full hour is $1 - 0.44 = 0.56$. Let $p$ be the probability that we see a falling star in a thirty-minute interval. Then we have

(1-p)(1-p) = 0.56

Solving gives us $p \approx 0.25$. This question is meant to be solved mentally, so I didn’t use pen or paper for it. For taking the square root of $0.56$, I did a linear approximation based on $\sqrt{0.49}$ and $\sqrt{0.64}$.

Problem 4. You are given two ropes that each burn in one hour when lit. Which one of the following time periods CANNOT be measured with your ropes? a) 50 min b) 30 min c) 25 min d) 35 min. Solution. The only time period I can’t think of a solution for is 50 min. The key to this problem is realizing that we can light a rope wherever we want, and that a rope kept burning at $2k$ flame fronts burns out in $\frac{60}{2k}$ minutes. Finding 30 minutes is easy: burn a single rope on both ends. We can measure 35 minutes by first measuring 30 minutes with one rope, then measuring 5 additional minutes with the other: light the second rope on both ends and at 5 points in the middle, giving 12 burning fronts. Whenever one of the burning segments is fully consumed, immediately light another segment in the middle so that 12 fronts are always burning; the rope then finishes in $\frac{60}{12} = 5$ minutes. To measure 25 minutes, burn the first rope on both ends and once in the middle, maintaining 4 fronts the same way, so it finishes in $\frac{60}{4} = 15$ minutes; then light the second rope on both ends and at two points in the middle, maintaining 6 fronts, which measures $\frac{60}{6} = 10$ additional minutes.

Problem 5. You have three cards, labeled $n$, $n+1$, and $n+2$, and you don’t know $n$. All cards start facing down so you can’t see them. You flip one card. If you choose to “stay”, you get that card’s value. If you don’t stay, then you flip another card. Again, choose to stay (and keep the second card’s value) or flip the final card and keep its value. Design the optimal strategy to maximize the value of the card that you choose and find the expectation of that value. Solution. Note that your expected value will be at least $n$. Making our decision on draw $1$ means we have no information, and making our decision on draw $3$ presents us with no choice. So it makes sense to make our decision on draw $2$: take what we drew, or draw again. Referring to each draw $i$ as $d_i$, the cases are:
$d_2 - d_1 = -2$: we have the sequence $n+2, n, n+1$, so we choose to draw again for EV $n+1$.
$d_2 - d_1 = 2$: we have the sequence $n, n+2, n+1$, so we stay for EV $n+2$.
$d_2 - d_1 = 1$: drawing again either goes up $1$ ($n, n+1, n+2$) or down $2$ ($n+1, n+2, n$), so we choose to stay for EV $n + 1.5$.
$d_2 - d_1 = -1$: drawing again either goes up $2$ ($n+1, n, n+2$) or down $1$ ($n+2, n+1, n$), so we choose to draw again for EV $n+1$.
Using this strategy, we combine our total expected value:

\mathbb{E}[X] = \tfrac{3}{6}(n+1) + \tfrac{1}{6}(n+2) + \tfrac{2}{6}(n+1.5) = n+ \tfrac{4}{3}

My optimal strategy is to flip the first two cards and decide whether to draw again using the case analysis above, for an expected value of $n + \frac{4}{3}$. Putting it more simply: if the second draw is higher than the first, stay; if not, draw again.
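
A simulation of this stay-if-higher rule; $n = 10$ is an arbitrary stand-in, since the player never actually needs to know it:

#include <algorithm>
#include <array>
#include <cstdio>
#include <random>

int main() {
	std::mt19937 rng{3};
	const int trials = 1000000, n = 10; // n is unknown to the player
	double total = 0;
	for (int t = 0; t < trials; ++t) {
		std::array<int, 3> cards{n, n + 1, n + 2};
		std::shuffle(cards.begin(), cards.end(), rng);
		// stay on the second card only if it beats the first
		total += (cards[1] > cards[0]) ? cards[1] : cards[2];
	}
	std::printf("E[value] - n ~ %.4f\n", total / trials - n); // ~1.3333
}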

Week Two Review:

This week I focused on working through more of A Practical Guide and started pulling general probability questions from other resources. I’ve done a bit of market-making practice behind the scenes and am still figuring out the best way to go about that. I didn’t get to review as many topics as I had hoped, and next week I plan to work on more programming/statistics questions as recruiting season begins.

Here are this week’s list of resources: