1. What’s this about
This post is modified from one of my reading notes and coding practices.
The original paper is: Ladd, Daniel W., 1975: An Algorithm for the Binomial Distribution with Dependent Trials.
If the outcomes of a sequence of trials can be assumed to be independent of each other, then the number of successes \(j\) in a total of \(K\) trials, each with a probability of success \(P\), is modeled as a binomial distribution:
\[ Pr(j, K; P) = \left(\begin{array}{c}K\\ j\end{array}\right) P^j (1-P)^{K-j} \]
This article is concerned with the distribution when the independence constraint is relaxed and the trials are assumed to be dependent on each other through a “one-stage dependence”, i.e. the outcome of one trial affects the outcome of the next, and no others.
In this post I will cover:
- the formulation of the problem,
- the formulae for the probability mass function and cumulative distribution function,
- a Python implementation, with Cython acceleration for one of the functions,
- some preliminary distribution comparisons between the dependent and independent assumptions.
2. Formulation of the problem
Let \(X_i\) be a sequence of random variables, each \(X_i\) takes the value \(0\) or \(1\):
\[ P = Pr(X_i=1) \quad \text{for any } i. \]
The sequence of \(X_i\) forms a Markov chain, with the following conditional probabilities:
\[ \left\{\begin{matrix} & P_{hh} =& Pr(X_i=1 | X_{i-1} =1), \\ 1-P_{hh} =& P_{mh} =& Pr(X_i=0 | X_{i-1}=1), \\ & P_{hm} =& Pr(X_i=1 | X_{i-1}=0), \\ 1-P_{hm} =& P_{mm} =& Pr(X_i=0 | X_{i-1}=0) \\ \end{matrix}\right. \]
For a particular \(i=n\), let \(P_n = Pr(X_n=1)\); by stationarity, \(P_{n-1} = P_n = P\).
For the Markov model, one has \(P_n = P_{n-1} P_{hh} + (1 - P_{n-1})P_{hm}\).
This leads to \(P = P P_{hh} + (1-P) P_{hm}\), and:
\begin{equation} \label{org73c46b2} P_{hh} = 1 - \frac{1-P}{P} P_{hm} \end{equation}
Thus, the 4 conditional probabilities:
- \(P_{hh}\)
- \(P_{mm}\)
- \(P_{hm}\)
- \(P_{mh}\)
are determined from a specified value of \(P\), and either \(P_{hh}\) or \(P_{hm}\).
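To make this concrete, here is a small helper of my own (the function name is mine, not from the paper) that derives all four conditional probabilities from \(P\) and \(P_{hh}\), using Equation \eqref{org73c46b2} rearranged for \(P_{hm}\):

def conditional_probs(p, phh):
    '''Derive the 4 conditional probabilities from P and P_hh (illustrative helper).'''
    phm = (p - p * phh) / (1 - p)   # from P = P*P_hh + (1-P)*P_hm
    return {'phh': phh,
            'pmh': 1 - phh,
            'phm': phm,
            'pmm': 1 - phm}

# e.g. p = 0.5 with a mild positive dependence
print(conditional_probs(0.5, 0.6))   # {'phh': 0.6, 'pmh': 0.4, 'phm': 0.4, 'pmm': 0.6}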
Denote the probability of having \(j\) hits in \(K\) trials as:
\[ \Phi_{j,K} (P, P_{hh}) \]
or
\[ \Phi_{j,K} (P, P_{hm}) \]
For either case, use \(\Phi_{j,K}\) as a shorthand.
Denote the probability of having \(j\) or fewer successes:
\begin{equation} \label{org4933a54} S(K, j; P) = \sum_{i=0}^{j} \Phi_{i,K} \end{equation}
3. The solution schematic
Figure 1: Figure A from Ladd 1975. The schematic representation of the various outcomes of a series of trials.
To read the schematic:
- Each vertical line represents one trial.
- Nodes on the lines are the possible numbers of successes, from \(0\) on the top to \(K\) on the bottom.
To reach any point on the graph, e.g. \(\Phi_{2,3}\), one moves from left to right.
Using the Markov chain assumption, the probability \(\Phi_{j,K}\) depends on the probabilities of the immediately preceding trial. Formally, this is:
\begin{equation} \label{orgf56ec61} \Phi_{j,K} = \eta_{j,K} + \xi_{j,K} \end{equation}
where:
- \(\eta_{j,K} \equiv Pr(\text{j successes in K trials, the Kth one fails})\). This is graphically represented as an upward edge pointing from the left towards the point \(\Phi_{j,K}\).
- \(\xi_{j,K} \equiv Pr(\text{j successes in K trials, the Kth one succeeds})\). This is graphically represented as a downward edge pointing from the left towards the point \(\Phi_{j,K}\).
Equation \eqref{orgf56ec61} defines the transitional relationships. Here are the starting values:
\begin{equation} \left\{\begin{matrix} \eta_{0,1} =& 1-P \\ \eta_{1,1} =& 0 \\ \Phi_{0,1} =& 1-P \\ \xi_{0,1} =& 0 \\ \xi_{1,1} =& P \\ \Phi_{1,1} =& P \\ \end{matrix}\right. \end{equation}
Then, the general solution is:
\begin{equation} \label{org51d9b00} \left\{\begin{matrix} \eta_{0,K} =& \eta_{0,K-1}P_{mm} = (1-P) P_{mm}^{K-1} \\ \xi_{0,K} =& 0 \\ \eta_{j,K} =& \xi_{j,K-1}P_{mh} + \eta_{j,K-1}P_{mm} \\ \xi_{j,K} =& \eta_{j-1,K-1}P_{hm} + \xi_{j-1,K-1}P_{hh},\; (j=1,2,\cdots,K-1) \\ \eta_{K,K} =& 0 \\ \xi_{K,K} =& \xi_{K-1,K-1}P_{hh} = P P_{hh}^{K-1} \\ \end{matrix}\right. \end{equation}
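As a quick sanity check of these recursions (a worked case of my own, not taken from the paper), take \(K=2\) and \(j=1\):
\[ \Phi_{1,2} = \eta_{1,2} + \xi_{1,2} = \left(\xi_{1,1}P_{mh} + \eta_{1,1}P_{mm}\right) + \left(\eta_{0,1}P_{hm} + \xi_{0,1}P_{hh}\right) = P\,P_{mh} + (1-P)\,P_{hm}, \]
which in the independent case (\(P_{mh} = 1-P\), \(P_{hm} = P\)) reduces to \(2P(1-P) = \left(\begin{array}{c}2\\ 1\end{array}\right) P(1-P)\), the ordinary binomial value.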
4. Python implementation
4.1. Recursive version
Since Equations \eqref{org51d9b00} are written in a recursive manner, it is most natural to code them with recursive functions as well. Below is the Python implementation.
First, the pmf() function that computes the probability mass function:
import numpy as np


def pmf(j, k, p, phh=None, phm=None):
    '''Compute the probability mass function of binomial distr with dependent trials, recursive version

    Args:
        j (int): number of successes.
        k (int): number of trials.
        p (float): probability of any trial being success.
    Keyword Args:
        phh (float): probability of success given its previous trial is success.
        phm (float): probability of success given its previous trial is fail.
            <phh> and <phm> must be provided at least one. If both are given,
            <phm> is computed from <phh>.
    Returns:
        res (float): probability of j successes in k trials.
    '''
    assert j >= 0, "<j> must >= 0."
    assert k >= 0, "<k> must >= 0."
    assert p >= 0, "<p> must >= 0."

    if j == 0 and k == 1:
        return 1 - p
    if j == 1 and k == 1:
        return p

    if phh is None and phm is None:
        raise Exception("Either <phh> or <phm> must be provided.")

    if phh is None:
        phh = 1 - (1-p) / p * phm
    else:
        phm = (p - p*phh) / (1 - p)

    pmh = 1 - phh
    pmm = 1 - phm

    f_jk = compute_jkf(j, k, p, pmm, pmh)
    s_jk = compute_jks(j, k, p, phh, phm)
    res = f_jk + s_jk

    return res
The pmf() function calls 2 additional functions:
- compute_jkf(): computes the \(\eta_{j,K}\) term.
- compute_jks(): computes the \(\xi_{j,K}\) term.
These 2 are the recursive functions:
from functools import lru_cache


@lru_cache(maxsize=1024)
def compute_jkf(j, k, p, pmm, pmh):
    '''Compute the probability of j successes in k trials, with the kth one fails

    Args:
        j (int): number of successes.
        k (int): number of trials.
        p (float): probability of any trial being success.
        pmm (float): probability of fail given its previous trial is fail.
        pmh (float): probability of fail given its previous trial is success.
    Returns:
        res (float): probability of j successes in k trials, with the kth one fails.
    '''
    if j == 0 and k == 1:
        return 1 - p
    if j == 1 and k == 1:
        return 0
    if j == 0:
        return (1 - p) * pmm**(k-1)
    if j == k:
        return 0

    phh = 1 - pmh
    phm = (p - p*phh) / (1 - p)

    return compute_jks(j, k-1, p, phh, phm) * pmh + compute_jkf(j, k-1, p, pmm, pmh) * pmm


@lru_cache(maxsize=1024)
def compute_jks(j, k, p, phh, phm):
    '''Compute the probability of j successes in k trials, with the kth one succeeds

    Args:
        j (int): number of successes.
        k (int): number of trials.
        p (float): probability of any trial being success.
        phh (float): probability of success given its previous trial is success.
        phm (float): probability of success given its previous trial is fail.
    Returns:
        res (float): probability of j successes in k trials, with the kth one succeeds.
    '''
    if j == 0 and k == 1:
        return 0
    if j == 1 and k == 1:
        return p
    if j == 0:
        return 0
    if j == k:
        return p * phh**(k-1)

    pmh = 1 - phh
    pmm = 1 - phm

    return compute_jkf(j-1, k-1, p, pmm, pmh) * phm + compute_jks(j-1, k-1, p, phh, phm) * phh
Note that we are using the lru_cache decorators to cache intermediate results. I have another post that talks a bit more on caching and lazy evaluation.
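As an aside, the lru_cache wrapper also exposes cache statistics; here is a quick way of my own to check how much the memoisation helps:

# after a call such as pmf(50, 100, 0.5, phh=0.5):
print(compute_jkf.cache_info())   # CacheInfo(hits=..., misses=..., maxsize=1024, currsize=...)
print(compute_jks.cache_info())

# clear the caches if memory is a concern or the parameters change a lot
compute_jkf.cache_clear()
compute_jks.cache_clear()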
Now a simple test run: when \(P_{hh} = 0.5 = P\), i.e. a successful trial confers no bias on the next one, the model should fall back to a binomial distribution. So we compare the result with that from scipy.stats.binom:
K = 100
j = 50
p = 0.5
phh = 0.5
prob1 = pmf(j, K, p, phh)
print(prob1)

from scipy.stats import binom
prob2 = binom.pmf(j, K, p)
print(prob2)
The output:
0.0795892373871788
0.07958923738717875
The above recursive implementation works fine for relatively small problems, but will fail once \(K\) becomes large.
This is because Python's recursion depth is limited, by default to about 1000 frames. Recursive functions (Python does not optimize tail recursion) will fail once the call stack hits this limit. See the example call below:
# large scale, recursive version
K = 1000
j = 5
p = 0.5
phh = 0.5
prob = pmf(j, K, p, phh)
print(prob)
Running the above snippet will trigger the following error:
Traceback (most recent call last):
  File "dependent_binomial.py", line 318, in <module>
    prob4 = pmf(j, K, p, phh)
  File "dependent_binomial.py", line 52, in pmf
    f_jk = compute_jkf(j, k, p, pmm, pmh)
  File "dependent_binomial.py", line 84, in compute_jkf
    return compute_jks(j, k-1, p, phh, phm) * pmh + compute_jkf(j, k-1, p, pmm, pmh) * pmm
  File "dependent_binomial.py", line 112, in compute_jks
    return compute_jkf(j-1, k-1, p, pmm, pmh) * phm + compute_jks(j-1, k-1, p, phh, phm) * phh
  File "dependent_binomial.py", line 84, in compute_jkf
    return compute_jks(j, k-1, p, phh, phm) * pmh + compute_jkf(j, k-1, p, pmm, pmh) * pmm
  File "dependent_binomial.py", line 112, in compute_jks
    return compute_jkf(j-1, k-1, p, pmm, pmh) * phm + compute_jks(j-1, k-1, p, phh, phm) * phh
  File "dependent_binomial.py", line 84, in compute_jkf
    return compute_jks(j, k-1, p, phh, phm) * pmh + compute_jkf(j, k-1, p, pmm, pmh) * pmm
  File "dependent_binomial.py", line 112, in compute_jks
    return compute_jkf(j-1, k-1, p, pmm, pmh) * phm + compute_jks(j-1, k-1, p, phh, phm) * phh
  File "dependent_binomial.py", line 84, in compute_jkf
    return compute_jks(j, k-1, p, phh, phm) * pmh + compute_jkf(j, k-1, p, pmm, pmh) * pmm
  File "dependent_binomial.py", line 112, in compute_jks
    return compute_jkf(j-1, k-1, p, pmm, pmh) * phm + compute_jks(j-1, k-1, p, phh, phm) * phh
  File "dependent_binomial.py", line 84, in compute_jkf
    return compute_jks(j, k-1, p, phh, phm) * pmh + compute_jkf(j, k-1, p, pmm, pmh) * pmm
  File "dependent_binomial.py", line 84, in compute_jkf
    return compute_jks(j, k-1, p, phh, phm) * pmh + compute_jkf(j, k-1, p, pmm, pmh) * pmm
  File "dependent_binomial.py", line 84, in compute_jkf
    return compute_jks(j, k-1, p, phh, phm) * pmh + compute_jkf(j, k-1, p, pmm, pmh) * pmm
  [Previous line repeated 486 more times]
  File "dependent_binomial.py", line 112, in compute_jks
    return compute_jkf(j-1, k-1, p, pmm, pmh) * phm + compute_jks(j-1, k-1, p, phh, phm) * phh
  File "dependent_binomial.py", line 72, in compute_jkf
    if j == 0 and k == 1:
RecursionError: maximum recursion depth exceeded in comparison
It is possible to enlarge this maximum recursion depth, but I will illustrate another way: turning the recursive calls into for loops.
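For reference only, here is a minimal sketch of that first option using the standard sys module (I don't use this below, and very deep recursion can still exhaust memory):

import sys

print(sys.getrecursionlimit())   # 1000 by default in CPython
sys.setrecursionlimit(10000)     # allow deeper recursion before RecursionError is raised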
4.2. Non-recursive version
Referring back to the schematic in Figure 1, notice that to get to a certain point \(\Phi_{j,K}\) on the graph, one works from the leftmost end to the right.
We can re-draw the schematic graph in a slightly different way, as shown in Figure 2 below:
Figure 2: The solution schematic of Figure 1, re-drawn on a regular grid.
We put all the target nodes of \(\Phi_{j,K}\) on a regular grid, and draw the \(\eta_{j,K}\) transitions as blue, horizontal edges, and \(\xi_{j,K}\) transitions as red, diagonal edges.
From Equations \eqref{org51d9b00}, it is further noticed that only the computation of \(\xi_{j,K} = \eta_{j-1,K-1}P_{hm} + \xi_{j-1,K-1}P_{hh}\) takes a top-down (diagonal) step. The computation of \(\eta_{j,K} = \xi_{j,K-1}P_{mh} + \eta_{j,K-1}P_{mm}\) only takes a left-right step, from column \(K-1\) to \(K\).
So, we will be building two 2D arrays, one for the \(\eta\) terms and one for the \(\xi\) terms, both with only the upper triangle filled. \(j\) will be the row index, and \(K\) the column index.
Here is the non-recursive version:
def compute_jksf(j, k, p, pmm, pmh, phh, phm):
    '''Compute the transition probabilities leading to the prob of j successes in k trials

    Args:
        j (int): number of successes.
        k (int): number of trials.
        p (float): probability of any trial being success.
        pmm (float): probability of fail given its previous trial is fail.
        pmh (float): probability of fail given its previous trial is success.
        phh (float): probability of success given its previous trial is success.
        phm (float): probability of success given its previous trial is fail.
    Returns:
        farr (ndarray): 2d array with shape (k+1, k+1), the (j,k) element is the
            probability of j successes in k trials and the kth one is fail.
        sarr (ndarray): 2d array with shape (k+1, k+1), the (j,k) element is the
            probability of j successes in k trials and the kth one is success.
    '''
    farr = np.zeros([k+1, k+1])
    sarr = np.zeros([k+1, k+1])

    for kk in range(1, k+1):
        for jj in range(0, kk+1):
            if jj == 0:
                sjk = 0
                fjk = (1-p) * pmm**(kk-1)
            elif jj == kk:
                sjk = p * phh**(kk-1)
                fjk = 0
            else:
                sjk = farr[jj-1, kk-1] * phm + sarr[jj-1, kk-1]*phh
                fjk = sarr[jj, kk-1] * pmh + farr[jj, kk-1] * pmm

            farr[jj, kk] = fjk
            sarr[jj, kk] = sjk

    return farr, sarr
The \(\xi_{j,K}\) term is computed using:
sjk = farr[jj-1, kk-1] * phm + sarr[jj-1, kk-1]*phh
And the \(\eta_{j,K}\) term is computed using:
fjk = sarr[jj, kk-1] * pmh + farr[jj, kk-1] * pmm
This compute_jksf() function will be called by a new pmf2() function to compute the probability mass function:
def pmf2(j, k, p, phh=None, phm=None):
    '''Compute the probability mass function of binomial distr with dependent trials, non-recursive version

    Args:
        j (int): number of successes.
        k (int): number of trials.
        p (float): probability of any trial being success.
    Keyword Args:
        phh (float): probability of success given its previous trial is success.
        phm (float): probability of success given its previous trial is fail.
            <phh> and <phm> must be provided at least one. If both are given,
            <phm> is computed from <phh>.
    Returns:
        res (float): probability of j successes in k trials.
    '''
    assert j >= 0, "<j> must >= 0."
    assert k >= 0, "<k> must >= 0."
    assert p >= 0, "<p> must >= 0."

    if j == 0 and k == 1:
        return 1 - p
    if j == 1 and k == 1:
        return p

    if phh is None and phm is None:
        raise Exception("Either <phh> or <phm> must be provided.")

    if phh is None:
        phh = 1 - (1-p) / p * phm
    else:
        phm = (p - p*phh) / (1 - p)

    pmh = 1 - phh
    pmm = 1 - phm

    f_jk, s_jk = compute_jksf(j, k, p, pmm, pmh, phh, phm)
    res = f_jk[j, k] + s_jk[j, k]

    return res
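Before moving to large \(K\), a quick cross-check of my own against the recursive version, on a problem small enough for both to run:

# both versions should agree on small problems
K = 20
j = 8
p = 0.5
phh = 0.7
print(pmf(j, K, p, phh))    # recursive version
print(pmf2(j, K, p, phh))   # non-recursive version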
Now the new pmf2() function is capable of handling large scale problems:
# large scale, non-recursive version
K = 1000
j = 250
p = 0.6
phh = 0.5
prob3 = pmf2(j, K, p, phh)
print(prob3)
The output is:
7.6996966658670975e-289
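As in the earlier small-scale test, setting \(P_{hh} = P\) should recover the independent binomial, which we can now verify at this larger \(K\) (a check of my own):

from scipy.stats import binom

K = 1000
j = 250
p = 0.6
print(pmf2(j, K, p, phh=p))   # phh = p, i.e. no serial dependence
print(binom.pmf(j, K, p))     # the two values should agree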
4.3. Further acceleration by Cython
The above compute_jksf() function traverses the 2D arrays in pure Python, which will be slow. As the computation has an inherent sequential dependence (later terms depend on earlier ones), it can't be vectorized.
One easy way to speed up this computation is to write statically typed code using Cython, and import the compiled module.
Here is my Cython implementation, saved as dependent_binom/dependent_binom.pyx in a sub-folder:
cimport cython
import numpy as np
cimport numpy as np

np.import_array()


@cython.boundscheck(False)  # turn off bounds-checking for entire function
@cython.wraparound(False)   # turn off negative index wrapping for entire function
def compute_jksf(int j, int k, double p, double pmm, double pmh, double phh, double phm):
    '''Compute the transition probabilities leading to the prob of j successes in k trials

    Args:
        j (int): number of successes.
        k (int): number of trials.
        p (float): probability of any trial being success.
        pmm (float): probability of fail given its previous trial is fail.
        pmh (float): probability of fail given its previous trial is success.
        phh (float): probability of success given its previous trial is success.
        phm (float): probability of success given its previous trial is fail.
    Returns:
        farr (ndarray): 2d array with shape (k+1, k+1), the (j,k) element is the
            probability of j successes in k trials and the kth one is fail.
        sarr (ndarray): 2d array with shape (k+1, k+1), the (j,k) element is the
            probability of j successes in k trials and the kth one is success.
    '''
    cdef int jj, kk
    cdef double sjk, fjk
    cdef np.ndarray[np.double_t, ndim=2] farr = np.zeros([k+1, k+1])
    cdef np.ndarray[np.double_t, ndim=2] sarr = np.zeros([k+1, k+1])

    for kk in range(1, k+1):
        for jj in range(0, kk+1):
            if jj == 0:
                sjk = 0.
                fjk = (1-p) * pmm**(kk-1)
            elif jj == kk:
                sjk = p * phh**(kk-1)
                fjk = 0.
            else:
                sjk = farr[jj-1, kk-1] * phm + sarr[jj-1, kk-1]*phh
                fjk = sarr[jj, kk-1] * pmh + farr[jj, kk-1] * pmm

            farr[jj, kk] = fjk
            sarr[jj, kk] = sjk

    return farr, sarr
Here, we statically type all index variables (e.g. jj, kk) to int, and all probability values/arrays to double. The rest of the code is identical to the pure Python version.
Then I prepare a simple setup.py script, saved into the same dependent_binom folder as dependent_binom/setup.py:
from setuptools import setup, Extension
from Cython.Build import cythonize
import numpy

extension = Extension('dependent_binom',
                      ['./dependent_binom.pyx', ],
                      include_dirs=[numpy.get_include()],)

setup(name='dependent_binom',
      ext_modules=cythonize(extension, annotate=True),
      zip_safe=False)
To compile the code, execute the following command:
python setup.py build_ext --inplace
If everything goes well, this should create a dependent_binom.c file and something like dependent_binom.cpython-38-x86_64-linux-gnu.so in the dependent_binom folder.
To use this Cython version:
from dependent_binom import dependent_binom

dependent_binom.compute_jksf(j, k, p, pmm, pmh, phh, phm)
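To use it end to end, the compiled function can be wrapped in a pmf()-style interface. Here is a hypothetical pmf3() of mine (not part of the module) that mirrors pmf2() but delegates the inner loops to the Cython code:

from dependent_binom import dependent_binom


def pmf3(j, k, p, phh=None, phm=None):
    '''Same interface as pmf2(), with the heavy loops done by the compiled module.'''
    if phh is None and phm is None:
        raise Exception("Either <phh> or <phm> must be provided.")
    if phh is None:
        phh = 1 - (1 - p) / p * phm
    else:
        phm = (p - p * phh) / (1 - p)
    pmh = 1 - phh
    pmm = 1 - phm
    farr, sarr = dependent_binom.compute_jksf(j, k, p, pmm, pmh, phh, phm)
    return farr[j, k] + sarr[j, k]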
Here is a speed comparison:
import time

# compute_jksf, python version
K = 1000
j = 250
p = 0.6
phh = 0.5   # same dependence level as in the earlier examples
phm = (p - p*phh) / (1 - p)
pmh = 1 - phh
pmm = 1 - phm

t1 = time.time()
farr, sarr = compute_jksf(j, K, p, pmm, pmh, phh, phm)
t2 = time.time()
print('compute_jksf, Python version time:', t2-t1)

# compute_jksf, cython version
t3 = time.time()
farr2, sarr2 = dependent_binom.compute_jksf(j, K, p, pmm, pmh, phh, phm)
t4 = time.time()
print('compute_jksf, cython version time:', t4-t3)

print(np.allclose(farr, farr2))
print(np.allclose(sarr, sarr2))
print((t2-t1) / (t4-t3))
The output:
compute_jksf, Python version time: 0.5290002822875977
compute_jksf, cython version time: 0.0044057369232177734
True
True
120.0707830510309
So about 120x speed up!
4.4. Cumulative distribution function
Having got the probability mass function ready, the CDF is just a summation of our previously defined farr and sarr arrays, up to the jth row, at the kth column:
def cdf(j, k, p, phh=None, phm=None):
    '''Compute the CDF of binomial distr with dependent trials, non-recursive cython version

    Args:
        j (int): number of successes.
        k (int): number of trials.
        p (float): probability of any trial being success.
    Keyword Args:
        phh (float): probability of success given its previous trial is success.
        phm (float): probability of success given its previous trial is fail.
            <phh> and <phm> must be provided at least one. If both are given,
            <phm> is computed from <phh>.
    Returns:
        res (float): probability of j or fewer successes in k trials.
    '''
    assert j >= 0, "<j> must >= 0."
    assert k >= 0, "<k> must >= 0."
    assert p >= 0, "<p> must >= 0."

    if phh is None:
        phh = 1 - (1-p) / p * phm
    else:
        phm = (p - p*phh) / (1 - p)

    pmh = 1 - phh
    pmm = 1 - phm

    farr, sarr = dependent_binom.compute_jksf(j, k, p, pmm, pmh, phh, phm)
    res = farr[:(j+1), k].sum() + sarr[:(j+1), k].sum()

    return res
Note that the cdf() function is calling our Cython-implemented compute_jksf() function. An example:
K = 10
j = 5
p = 0.6
phh = 0.5
out = cdf(j, K, p, phh)
print('Cython version CDF:', out)
The output:
Cython version CDF: 0.3381835937500002
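Note that here \(P_{hh} = 0.5\) differs from \(P = 0.6\), so the trials are serially dependent and the value differs from the ordinary binomial CDF. As a check of my own, setting \(P_{hh} = P\) recovers the scipy value:

from scipy.stats import binom

print(cdf(5, 10, 0.6, phh=0.6))   # phh = p, i.e. independent trials
print(binom.cdf(5, 10, 0.6))      # the two should agree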
4.5. NEW: even further acceleration
After some more examination of the code, I realized that the above Cython version of the compute_jksf() function can be further optimized.
It turns out that we don't actually need to store the full k+1 columns of the farr and sarr matrices. When updating the transitional probabilities, we only reference the immediately previous trial, via the kk-1 indices (hence the 1st-order Markov model):
sjk = farr[jj-1, kk-1] * phm + sarr[jj-1, kk-1]*phh
fjk = sarr[jj, kk-1] * pmh + farr[jj, kk-1] * pmm
and the updated values are saved to the kk column:
farr[jj, kk] = fjk
sarr[jj, kk] = sjk
So, we can just use 2 columns, one holding the previous trial and the other the new updates, and at the end of each iteration rotate the 2 columns. I only show the updated Cython version below:
@cython.boundscheck(False)  # turn off bounds-checking for entire function
@cython.wraparound(False)   # turn off negative index wrapping for entire function
def compute_jksf(int j, int k, double p, double pmm, double pmh, double phh, double phm):
    '''Compute the transition probabilities leading to the prob of j successes in k trials

    Args:
        j (int): number of successes.
        k (int): number of trials.
        p (float): probability of any trial being success.
        pmm (float): probability of fail given its previous trial is fail.
        pmh (float): probability of fail given its previous trial is success.
        phh (float): probability of success given its previous trial is success.
        phm (float): probability of success given its previous trial is fail.
    Returns:
        farr (ndarray): 1d array with length (k+1), the jth element is the
            probability of j successes in k trials and the kth one is fail.
        sarr (ndarray): 1d array with shape (k+1), the jth element is the
            probability of j successes in k trials and the kth one is success.
    '''
    cdef int jj, kk
    cdef double sjk, fjk
    cdef np.ndarray[np.double_t, ndim=2] farr = np.zeros([k+1, 2])
    cdef np.ndarray[np.double_t, ndim=2] sarr = np.zeros([k+1, 2])

    for kk in range(1, k+1):
        for jj in range(0, kk+1):
            if jj == 0:
                sjk = 0.
                fjk = (1-p) * pmm**(kk-1)
            elif jj == kk:
                sjk = p * phh**(kk-1)
                fjk = 0.
            else:
                sjk = farr[jj-1, 0] * phm + sarr[jj-1, 0]*phh
                fjk = sarr[jj, 0] * pmh + farr[jj, 0] * pmm

            farr[jj, 1] = fjk
            sarr[jj, 1] = sjk

        farr[:, 0] = farr[:, 1]
        sarr[:, 0] = sarr[:, 1]

    return farr[:, 0], sarr[:, 0]
To highlight the differences:
- the farr and sarr arrays now have shape [k+1, 2].
- the previous kk-1 column indexing is replaced with 0:

  sjk = farr[jj-1, 0] * phm + sarr[jj-1, 0]*phh
  fjk = sarr[jj, 0] * pmh + farr[jj, 0] * pmm

- at the end of each kk iteration, rotate the columns:

  farr[:, 0] = farr[:, 1]
  sarr[:, 0] = sarr[:, 1]

- lastly, return only the 1st column:

  return farr[:, 0], sarr[:, 0]
A quick test:
import time

# compute_jksf, python version
K = 1000
j = 250
p = 0.6
phh = 0.5   # same dependence level as before
phm = (p - p*phh) / (1 - p)
pmh = 1 - phh
pmm = 1 - phm

t1 = time.time()
farr, sarr = compute_jksf(j, K, p, pmm, pmh, phh, phm)
t2 = time.time()
print('compute_jksf, Python version time:', t2-t1)

# compute_jksf, cython version
t3 = time.time()
farr2, sarr2 = dependent_binom.compute_jksf_old(j, K, p, pmm, pmh, phh, phm)
t4 = time.time()
print('compute_jksf, cython version time:', t4-t3)
print(np.allclose(farr, farr2[:, -1]))
print(np.allclose(sarr, sarr2[:, -1]))
print((t2-t1) / (t4-t3))

t3 = time.time()
farr3, sarr3 = dependent_binom.compute_jksf(j, K, p, pmm, pmh, phh, phm)
t4 = time.time()
print('compute_jksf, new cython version time:', t4-t3)
print(np.allclose(farr, farr3))
print(np.allclose(sarr, sarr3))
print((t2-t1) / (t4-t3))
The output:
compute_jksf, Python version time: 0.4931979179382324
compute_jksf, cython version time: 0.003989219665527344
True
True
123.63267989481234
compute_jksf, new cython version time: 0.0022072792053222656
True
True
223.44156405271116
So we managed to push the speed-up from about 123x to about 223x. Besides, this is also more memory efficient, as we reduced the two (k+1) x (k+1) matrices to (k+1) x 2.
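To put rough numbers on the memory saving (a back-of-the-envelope check of my own, assuming 8-byte floats):

import numpy as np

k = 1000
full = np.zeros([k + 1, k + 1])     # old layout: one full (k+1) x (k+1) matrix
rolling = np.zeros([k + 1, 2])      # new layout: two columns only
print(full.nbytes / 1e6, 'MB')      # ~8 MB each, and there are two such matrices
print(rolling.nbytes / 1e3, 'kB')   # ~16 kB each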
One more note: when using this updated version of compute_jksf(), the cdf() function needs a small change. Change

res = farr[:(j+1), k].sum() + sarr[:(j+1), k].sum()

to

res = farr[:(j+1)].sum() + sarr[:(j+1)].sum()

as both farr and sarr are now 1d vectors.
5. The effect of dependent trials
I did some simple tests on the effects of trial dependency, comparing the pmf curves of the dependent binomial at different \(P_{hh}\) levels against the independent binomial. In all cases:
- \(K = 100\)
- \(p = 0.5\)
Below is the Python code:
import matplotlib.pyplot as plt
from scipy.stats import binom

K = 100
p = 0.5
js = np.arange(0, K+1)
p_independent = binom.pmf(js, K, p)

figure, ax = plt.subplots()
ax.plot(js, p_independent, 'b-', label='Independent')

for phh in [0.55, 0.6, 0.7, 0.8, 0.9]:
    p_dependent = []
    for jj in js:
        p_dep = pmf2(jj, K, p, phh)
        p_dependent.append(p_dep)

    ax.plot(js, p_dependent, label='Dependent (phh=%.2f)' %phh)

ax.legend()
ax.grid(True, axis='both')
ax.set_xlabel('j')
ax.set_ylabel('Pr')
figure.show()
And here is the output figure:
Figure 3: PMF of dependent vs independent binomial distributions. In all cases, the number of trials is K=100 and the per-trial success probability is P=0.5.
So the effect of positive serial correlation (\(P_{hh} > 0.5\)) is to flatten out the probability distribution. Even with a slight preference for a hit to be followed by another (\(P_{hh} = 0.55\)), the bell curve already shows a notable drop in peak height.
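The flattening can also be quantified by comparing standard deviations. Continuing from the plotting snippet above (after the loop, p_dependent holds the last, \(P_{hh}=0.9\), curve), a quick check of my own:

p_dep = np.array(p_dependent)               # pmf for phh = 0.9, from the loop above
mean_dep = np.sum(js * p_dep)
std_dep = np.sqrt(np.sum((js - mean_dep)**2 * p_dep))
std_ind = np.sqrt(K * p * (1 - p))          # std of the independent binomial
print('independent std:', std_ind)
print('dependent std (phh=0.9):', std_dep)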
6. Summary
This post covers a bit of the binomial distribution with dependent trials, based on the paper Ladd, Daniel W., 1975: An Algorithm for the Binomial Distribution with Dependent Trials.
Note that this doesn't cover all of the paper's content. For instance, the author also discussed the determination of confidence intervals based on independent or dependent binomial models. This has potential implications for the comparison of distribution quantiles, a topic that has caught my interest recently. I might post a follow-up covering some of the remaining interesting parts of the paper.
Created: 2022-09-20 Tue 21:24