Introduction
Let's say we want to predict the outcome of a soccer match, Team A vs. Team B. It would be convenient to have a model that gives probabilities for how much Team A and Team B will score. We could build a complicated Machine Learning (ML) model from data on previous matches, players, weather, and so on, but what if we want a strong baseline for the outcome prediction? This is where the Poisson distribution comes in. It provides an excellent baseline for predicting outcomes, and the only thing we need is a rough estimate of the average number of goals a team scores.
Poisson
The Poisson distribution provides a convenient way to model the number of goals with only one parameter. This one parameter is the average, which is called Lambda ($λ$). You can play with the slider below and get a feel for how the probabilities change. We can see that the minimum outcome is obviously 0 goals and the most likely outcome is around the average number of goals.
Formally the Poisson distribution is defined as:
$$P(X = k) = \frac{λ^k e^{-λ}}{k!}$$
An important part of this formula is the exponential decay $e^{-λ}$: as we move further from the average number of goals ($λ$), the probabilities keep getting smaller. Moreover, for large $k$ the factorial $k!$ in the denominator grows faster than $λ^k$ in the numerator, so the probabilities eventually shrink even faster than exponentially. Still, every term in the formula stays positive, so no matter how far we move from the average, the probabilities never quite reach $0$.
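As a minimal sketch (plain Python, no external libraries), the formula above translates directly into code:

```python
from math import exp, factorial

def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k goals, given an average of lam goals."""
    return lam**k * exp(-lam) / factorial(k)

# Probabilities fall off quickly away from the average,
# but they never become exactly 0.
for k in range(6):
    print(k, round(poisson_pmf(k, 1.8), 3))
```

Running this for $λ = 1.8$ shows the mass concentrating around 1-2 goals, with a rapidly shrinking (but always positive) tail.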
In real-world soccer matches there is more [skew](https://en.wikipedia.org/wiki/Skewness) in the distribution, which means that outcomes to the right occur more often than the Poisson distribution suggests. Nevertheless, the Poisson distribution is a great approximation of how many goals a team scores given the average number of goals ($λ$). Now what if we want to know the probability of a specific outcome, like 3-2, 1-1, or 4-0? This is where the outer product comes in handy.
Outer Product
Once we have a Poisson distribution for each team, getting probabilities for full outcomes is easy using the outer product, also known as the tensor product or $\otimes$. The outer product of two vectors of length $n$ has shape $n \times n$, so for large $n$ the resulting matrix gets very large! That's why, in this case, we sum the probabilities for 3+ goals to keep the vectors small. Remember that Poisson probabilities never quite reach $0$ as we go to the right, so we could compute probabilities for outrageous scores like 999 goals, but at some point it makes sense to aggregate the leftover probability. Let's look at the probabilities for 0, 1, 2, and 3+ goals.
Team $A$ ($λ=1.1$): $$ A = \begin{bmatrix} 0.333 & 0.366 & 0.201 & 0.1 \end{bmatrix} $$
Team $B$ ($λ=1.8$): $$ B = \begin{bmatrix} 0.165 & 0.298 & 0.268 & 0.269 \end{bmatrix} $$
Note that both vectors should always add up to 1 in order for them to be valid probability distributions.
Now we take the outer product to arrive at the final matrix. This is achieved by multiplying each element in $A$ by each element in $B$.
$$
\begin{bmatrix}
0.333 \\
0.366 \\
0.201 \\
0.100
\end{bmatrix} \otimes
\begin{bmatrix}
0.165 \\
0.298 \\
0.268 \\
0.269
\end{bmatrix} =
\begin{bmatrix}
0.055 & 0.099 & 0.089 & 0.090 \\
0.060 & 0.109 & 0.098 & 0.098 \\
0.033 & 0.060 & 0.054 & 0.054 \\
0.016 & 0.030 & 0.027 & 0.027
\end{bmatrix}
$$
The rows correspond to Team A and columns to Team B.
For example, the outcome 0-2 has a probability of 0.089 (8.9%). The most likely exact score is 1-1 with probability 0.109 (10.9%). The bottom-right cell, 0.027 (2.7%), is the combined probability of all outcomes where both teams score 3 or more goals (3-3, 3-4, 4-3, and so on).
A cool characteristic of this matrix is that because the input vectors are valid probability distributions, the outer product matrix is also a valid probability distribution. In other words, the sum of all elements equals 1: the total factors into the product of the two vector sums, $1 \times 1 = 1$.
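With NumPy (assuming it is available), building the full matrix and checking these properties takes only a few lines:

```python
import numpy as np

A = np.array([0.333, 0.366, 0.201, 0.100])  # Team A: 0, 1, 2, 3+ goals
B = np.array([0.165, 0.298, 0.268, 0.269])  # Team B: 0, 1, 2, 3+ goals

M = np.outer(A, B)  # rows: Team A goals, columns: Team B goals

print(np.round(M, 3))          # the outcome matrix from above
print(round(float(M[1, 1]), 3))  # probability of the 1-1 score
print(M.sum())                 # ≈ 1.0, since both input vectors sum to 1
```

The row index selects Team A's goal count and the column index Team B's, so `M[i, j]` is the probability of the score i-j.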
I hope you have learned something new from this exploration of the Poisson distribution and outer product!