Risk Analysis for Risk
In 2011 I was on vacation with friends, and we played a lot of Risk. Somehow we ended up having fights of hundreds of armies against each other. Since each roll of the dice can eliminate at most two armies, a lot of rounds are needed until such a battle is settled. While my friends were occupied with a battle of 200 against 150, I used my freshly acquired Python skills to write a program that does the dice rolling, riskautodice.
The program does exactly what the players do until one player runs out of armies:

The attacker rolls up to three dice, the defender up to two. The number of dice cannot exceed the number of armies. In the game a player can choose to use fewer dice; the program always uses the maximum number.

The results are ordered descending and paired up. For every pair which is unequal, the person with the lower number loses one unit.
At first, the program would only output the end result. The result turned out to be extremely unstable: a 100 vs. 100 fight could end with either side having 50 armies left. If one reran the program, the result was completely different. At the time, we just accepted this as an intrinsic quirk of the game, the one that gave it its name.
Revisiting the Problem
Now, six years later, I have learned more about statistics, so I looked
at this again. For this article, I have implemented the dice rolling
function in R because I am currently learning that language. The fight
function is not very complicated:
sorted_roll <- function (number) {
    sort(sample(1:6, number, replace = TRUE), decreasing = TRUE)
}

fight <- function (attacking, defending) {
    armies_att <- attacking
    armies_def <- defending

    # Risk has "last man standing" fights, therefore we loop until one
    # side has no armies left.
    while (armies_att > 0 && armies_def > 0) {
        # The number of dice that can be used is capped to 3 and 2.
        dice_att <- min(armies_att, 3)
        dice_def <- min(armies_def, 2, dice_att)
        dice_both <- min(dice_att, dice_def)

        # Roll the dice.
        roll_att <- sorted_roll(dice_att)
        roll_def <- sorted_roll(dice_def)

        #cat(armies_att, ' ', armies_def, '\n')
        #print(roll_att)
        #print(roll_def)
        #cat('\n')

        for (i in 1:dice_both) {
            if (roll_att[i] > roll_def[i]) {
                # The attacker has won a single battle, therefore the
                # defender loses an army.
                armies_def <- armies_def - 1
            } else if (roll_att[i] < roll_def[i]) {
                # The defender has scored.
                armies_att <- armies_att - 1
            }
        }
    }

    return (data.frame(attacker = armies_att,
                       defender = armies_def,
                       diff = armies_att - armies_def))
}
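To make the pairing rule explicit, a single round can be isolated from the random dice rolls. The following is only a sketch: `resolve_round` is a hypothetical helper that is not part of the original program. It assumes the same rule as the `fight` function above, namely that a tied pair costs neither side an army.

```r
# Hypothetical helper, not part of the original program: resolve one round
# for two given sets of dice. Assumption: a tied pair costs nobody an army,
# as in the fight() function of this article.
resolve_round <- function(roll_att, roll_def) {
    att <- sort(roll_att, decreasing = TRUE)
    def <- sort(roll_def, decreasing = TRUE)

    # Only as many pairs are compared as the side with fewer dice has rolled.
    pairs <- seq_len(min(length(att), length(def)))

    c(attacker_losses = sum(att[pairs] < def[pairs]),
      defender_losses = sum(att[pairs] > def[pairs]))
}

# The attacker rolls 2, 6, 3 and the defender rolls 3, 5.
# Sorted, the pairs are (6, 5) and (3, 3); the lowest attacking die is unused.
result <- resolve_round(c(2, 6, 3), c(3, 5))
print(result)
```

Here the first pair goes to the attacker and the second pair is a tie, so the defender loses one army and the attacker loses none.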
Analysis
I have run it 300 times for each configuration of up to 25 armies. The whole implementation is not really fast, so this already took several minutes to run. For a more sophisticated analysis one should probably reimplement this in C++ and use OpenMP to get some more speed. The amount of data generated seems sufficient to see the trend, and there is no indication that something qualitatively different will happen beyond 25 armies.
# Packages used throughout the analysis.
library(dplyr)
library(ggplot2)
library(magrittr)

data <- expand.grid(attacking = 1:25, defending = 1:25, sample = 1:300)
data$x <- mapply(fight, data$attacking, data$defending, SIMPLIFY = FALSE)
data %<>% tidyr::unnest(x)

just_10_10 <- data %>%
    filter(attacking == 10, defending == 10)
First one has to note that the combat system is extremely unpredictable. There is a lot of spread in the result. Take for instance 10 attackers and 10 defenders. The difference between the surviving attackers and the surviving defenders is distributed as follows, where negative values mean that defenders have survived.
ggplot(just_10_10) +
    geom_histogram(aes(x = diff), binwidth = 1) +
    labs(title = 'Fight of 10 vs. 10',
         x = 'Remaining attackers - remaining defenders',
         y = 'Count')
Even though 10 vs. 10 starts with the same number of armies on both sides, we can clearly see that the attacker on average has better chances of winning the battle. In most cases the attacker even gets to keep most of the armies. Keeping in mind that there are huge fluctuations in the fights, we will take a look at averages.
grouped <- data %>%
    group_by(attacking, defending) %>%
    summarize(won = mean(attacker > 0),
              attacker_survival = mean(attacker / attacking),
              diff_mean = mean(diff),
              diff_sd = sd(diff),
              diff_q01 = quantile(diff, .01),
              diff_q30 = quantile(diff, .30),
              diff_q50 = quantile(diff, .50),
              diff_q70 = quantile(diff, .70),
              diff_q99 = quantile(diff, .99))
Winning Chance
The following shows the ratio of fights that the attacker wins for various numbers of attacking and defending armies.
ggplot(grouped) +
    geom_tile(aes(x = attacking, y = defending, fill = won)) +
    scale_fill_gradient2(midpoint = 0.5) +
    geom_abline(slope = 1) +
    labs(title = 'Ratio of attacker wins',
         x = 'Attacking Armies',
         y = 'Defending Armies',
         fill = 'Attacker wins')
As we have seen before, it is heavily skewed towards the attacker. The white line, where the chances to win are equal for both sides, is much steeper than the black line which marks the same number of attackers and defenders. We see that defense is hard in this game and requires some 50% more armies just to get equal odds for both sides.
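As a rough guide to how precisely 300 runs pin down each of these win ratios, one can use the standard error of a proportion. This is a textbook estimate and not part of the original analysis. It assumes that the runs are independent, which holds for separately simulated fights. For a true win probability $p$ estimated from $n = 300$ fights:

```latex
\mathrm{SE}(\hat{p}) = \sqrt{\frac{p\,(1 - p)}{n}}
  \le \frac{1}{2\sqrt{300}} \approx 0.029
```

Each tile in the chart is therefore uncertain by about three percentage points at most, with the largest uncertainty near the white 50% line.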
One can also have a look at the surviving armies of the attacker, normalized by the armies put into the battle. This is a survival rate:
ggplot(grouped) +
    geom_tile(aes(x = attacking, y = defending, fill = attacker_survival)) +
    scale_fill_gradient() +
    geom_abline(slope = 1) +
    labs(title = 'Ratio of attacker survival',
         x = 'Attacking Armies',
         y = 'Defending Armies',
         fill = 'Attacker survival')
Interestingly, the 50% line has almost unit slope, so when both players have the same number of armies, the survival rate of the attacker's armies is still 50%. Of course, to win one only needs to have a single unit left, therefore the 50% line for the ratio of won battles has a larger slope.
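The win ratio does not have to be estimated by simulation; it can also be computed exactly. The following sketch is not part of the original analysis, and `round_outcomes` and `win_probability` are hypothetical names. It enumerates all dice outcomes of a single round and then fills a table of win probabilities from small battles up to large ones. It assumes the same rules as the `fight` function: both sides always use the maximum number of dice, and a tied pair costs nobody an army.

```r
# Exact outcome probabilities for one round with n_att attacking and n_def
# defending dice. Assumption: ties cost neither side an army, as in fight().
round_outcomes <- function(n_att, n_def) {
    # One row per equally likely combination of all dice.
    dice <- as.matrix(expand.grid(rep(list(1:6), n_att + n_def)))

    # Sort the given columns in descending order; one column per combination.
    sort_desc <- function(cols) {
        matrix(apply(dice[, cols, drop = FALSE], 1, sort, decreasing = TRUE),
               nrow = length(cols))
    }
    att <- sort_desc(seq_len(n_att))
    def <- sort_desc(n_att + seq_len(n_def))

    pairs <- seq_len(min(n_att, n_def))
    att_loss <- colSums(att[pairs, , drop = FALSE] < def[pairs, , drop = FALSE])
    def_loss <- colSums(att[pairs, , drop = FALSE] > def[pairs, , drop = FALSE])

    outcomes <- expand.grid(att_loss = 0:2, def_loss = 0:2)
    outcomes$prob <- mapply(function(a, d) mean(att_loss == a & def_loss == d),
                            outcomes$att_loss, outcomes$def_loss)
    outcomes
}

# Probability that the attacker wins a fight of `attacking` vs. `defending`.
win_probability <- function(attacking, defending) {
    # p_win[a + 1, d + 1] is the win probability with a vs. d armies left.
    p_win <- matrix(0, attacking + 1, defending + 1)
    p_win[-1, 1] <- 1  # No defenders left: the attacker has won.
    cache <- list()

    for (a in seq_len(attacking)) {
        for (d in seq_len(defending)) {
            n_att <- min(a, 3)
            n_def <- min(d, 2, n_att)
            key <- paste(n_att, n_def)
            if (is.null(cache[[key]])) {
                cache[[key]] <- round_outcomes(n_att, n_def)
            }
            out <- cache[[key]]

            # A round without losses repeats the same state, so only rounds
            # that change something are kept and their weights renormalized.
            out <- out[out$prob > 0 & out$att_loss + out$def_loss > 0, ]
            next_p <- p_win[cbind(a - out$att_loss + 1, d - out$def_loss + 1)]
            p_win[a + 1, d + 1] <- sum(out$prob * next_p) / sum(out$prob)
        }
    }
    p_win[attacking + 1, defending + 1]
}

print(win_probability(10, 10))
print(win_probability(10, 15))
```

Since only five different dice combinations occur, the enumeration is cached and the whole table is filled quickly. Values computed this way could be compared against the simulated win ratios in the chart above.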
Fluctuations
But there were those great fluctuations. If we look at the standard deviation of armies left, we see the following:
ggplot(grouped) +
    geom_tile(aes(x = attacking, y = defending, fill = diff_sd)) +
    scale_fill_gradient() +
    geom_abline(slope = 1) +
    labs(title = 'Spread of armies left',
         x = 'Attacking Armies',
         y = 'Defending Armies',
         fill = 'Standard deviation')
On the bottom and on the left, there are no fluctuations. This is no surprise because these situations correspond to a near-certain victory or defeat. The fluctuations look rather symmetric. But that does not really tell much about the risk that one is taking in each battle.
So let's have a look at the 1% quantile of the armies that are left after the battle. I have chosen 1% instead of 0% because that should be somewhat stable against outliers whereas 0% will by definition be the most extreme outlier.
ggplot(grouped) +
    geom_tile(aes(x = attacking, y = defending, fill = diff_q01)) +
    scale_fill_gradient2() +
    geom_abline(slope = 1) +
    labs(title = '1% worst outcomes for attacker',
         x = 'Attacking Armies',
         y = 'Defending Armies',
         fill = 'Difference')
Most of the chart corresponds to a defeat, zero armies left. For low numbers it is skewed towards the defender: if you want to be almost completely certain to win, you will need to have at least four armies to attack a single one. For large numbers of armies, it seems to become more even again. One could draw the line such that the chance of losing is very low. One would need more statistics here in order to get a better resolution because 300 samples are not going to give a very good 1% quantile.
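To see why 300 samples give a coarse 1% quantile, it helps to look at which samples actually determine it. The sketch below uses made-up numbers as a stand-in for simulated differences; they are not data from this article. With R's default quantile type, the 1% quantile of 300 values sits at position (300 - 1) * 0.01 + 1 = 3.99 in the sorted sample, so it is essentially the fourth-lowest value.

```r
set.seed(2017)

# Hypothetical stand-in for 300 simulated differences, not real results.
diffs <- sample(-10:10, 300, replace = TRUE)

# The four lowest values of the sample.
lowest <- sort(diffs)[1:4]

# Default 1% quantile: interpolated between the 3rd and 4th lowest value.
q01 <- quantile(diffs, 0.01, names = FALSE)

print(lowest)
print(q01)
```

A statistic that hinges on three or four samples out of 300 will jump around between reruns, which is why more runs per configuration would be needed for a smooth chart.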
In a game called "Risk", one cannot be that conservative. Therefore let's have a look at the 30% quantile:
ggplot(grouped) +
    geom_tile(aes(x = attacking, y = defending, fill = diff_q30)) +
    scale_fill_gradient2() +
    geom_abline(slope = 1) +
    labs(title = '30% worst outcomes for attacker',
         x = 'Attacking Armies',
         y = 'Defending Armies',
         fill = 'Difference')
For very low numbers, the defender has the advantage. This changes at around 5 vs. 5, where the attacker has the advantage again.
The median (50% quantile) looks pretty good for the attacker:
ggplot(grouped) +
    geom_tile(aes(x = attacking, y = defending, fill = diff_q50)) +
    scale_fill_gradient2() +
    geom_abline(slope = 1) +
    labs(title = '50% worst outcomes for attacker',
         x = 'Attacking Armies',
         y = 'Defending Armies',
         fill = 'Difference')
If you want to bet on an outcome that will turn out worse in 70% of the cases, you can have a look at the following chart of the 70% quantile:
ggplot(grouped) +
    geom_tile(aes(x = attacking, y = defending, fill = diff_q70)) +
    scale_fill_gradient2() +
    geom_abline(slope = 1) +
    labs(title = '70% worst outcomes for attacker',
         x = 'Attacking Armies',
         y = 'Defending Armies',
         fill = 'Difference')
The best cases that could happen (only 1% even better) look like this:
ggplot(grouped) +
    geom_tile(aes(x = attacking, y = defending, fill = diff_q99)) +
    scale_fill_gradient2() +
    geom_abline(slope = 1) +
    labs(title = '99% worst outcomes for attacker',
         x = 'Attacking Armies',
         y = 'Defending Armies',
         fill = 'Difference')
With three armies, one has a slim chance of winning against a defender with 17 armies. In the best-case scenario, the number of defending armies has only a weak impact on the result. This can be seen from the nearly vertical color boundaries in the plot. That should be compared to the worst case, where the slopes only go down to unity.
Fixed number of defending armies
Let us fix the number of defending armies to 10 and just vary the number of attackers. In the following plot the line denotes the median, the dark ribbon covers 30% to 70% and the gray ribbon 1% to 99% of the cases. The red line marks 10 attackers, the case of equal power on each side. The blue line marks a difference of 0 surviving armies: wins for the attacker are above that line and wins for the defender below.
ggplot(filter(grouped, defending == 10), aes(x = attacking)) +
    geom_ribbon(aes(ymin = diff_q01, ymax = diff_q99), alpha = 0.3) +
    geom_ribbon(aes(ymin = diff_q30, ymax = diff_q70), alpha = 0.3) +
    geom_line(aes(y = diff_q50)) +
    geom_vline(xintercept = 10, color = 'red') +
    geom_hline(yintercept = 0, color = 'blue') +
    labs(title = 'Remaining armies for 10 defending armies',
         x = 'Attacking armies',
         y = 'Difference')
We can nicely see that the attacker has the advantage in the median case until we get down to 6 attacking armies. But even if we aim to win 70% of our fights, we can just take around 7 or 8 armies against 10. And if we take 15 armies, the chances of losing are just around 1%, which is extremely slim.
Next we take a look at this from the other side, fixing 10 attackers and varying the number of defenders. The colors are all the same.
ggplot(filter(grouped, attacking == 10), aes(x = defending)) +
    geom_ribbon(aes(ymin = diff_q01, ymax = diff_q99), alpha = 0.3) +
    geom_ribbon(aes(ymin = diff_q30, ymax = diff_q70), alpha = 0.3) +
    geom_line(aes(y = diff_q50)) +
    geom_vline(xintercept = 10, color = 'red') +
    geom_hline(yintercept = 0, color = 'blue') +
    labs(title = 'Remaining armies for 10 attacking armies',
         x = 'Defending armies',
         y = 'Difference')
We see that most of the chart is a win for the attacker. Even 25 defending armies are not enough to ensure a defense rate of 99%; we can only get to around 70%. From the median we see that it takes around 18 armies to fend off more than 50% of the attacks.
Conclusion
It seems that the attacker usually has a great advantage in terms of numbers. The chance to win is higher. This probably means that an aggressive strategy will serve the player best.