SparkPost recently released a new API feature for A/B testing transactional email and notifications. (Dig into the A/B Testing API documentation for usage details.)

A/B testing is a common practice for determining whether a variation on certain aspects of an email campaign—such as a different subject line, call to action, or set of images—will positively affect engagement rates.

Simple A/B Testing vs. Bayesian A/B Testing

As we noted in the announcement blog post, SparkPost’s implementation of A/B testing uses a statistical algorithm for picking the winning variant of the message. Specifically, we built it using a Bayesian decision model. But what does that mean, exactly?

First, it’s in contrast to the most common approach used by email marketers. Most A/B testing relies on a simplistic, “frequentist” method: compare the engagement rates of the emails sent for each variant, and declare the variant with the higher engagement rate the winner of the test.
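That naive comparison can be sketched in a few lines. This is an illustrative snippet (not SparkPost code), and it shows the method's weakness: sample size plays no role in the decision.

```javascript
// Naive frequentist comparison: whichever variant has the higher
// observed engagement rate "wins", regardless of sample size.
const frequentistWinner = (A_d, A_e, B_d, B_e) => {
  const rateA = A_e / A_d; // engagements per delivery for A
  const rateB = B_e / B_d; // engagements per delivery for B
  return rateB > rateA ? 'B' : 'A';
};

// 11 engagements out of only 100 deliveries "beats" 1,000 out of
// 10,000 here, even though B's sample is far too small to be sure.
console.log(frequentistWinner(10000, 1000, 100, 11)); // → 'B'
```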

One of the main reasons to avoid a frequentist approach is the problem known as p-hacking: repeatedly checking results and stopping as soon as a difference looks significant. Another problem is determining an appropriate window of time to run a test, because the volume of email sent has a significant impact on how meaningful the results are. On the other hand, sending too much email to a “test” variant with poor engagement is something to be avoided. This is where Bayesian A/B testing comes into play.

Our Bayesian Decision Model

Given two email campaigns A and B, let A_d and B_d be the counts of delivered emails for each, and A_e and B_e be the counts of “engagements” for each campaign. The probability that campaign B will lead to a better engagement rate than A is given by this Node.js function (using the mathfn library).
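For reference, the closed-form expression the function evaluates—with a uniform Beta(1, 1) prior on each engagement rate, so that the posterior parameters are α_A = A_e + 1, β_A = A_d − A_e + 1, and likewise for B—can be written as:

```latex
\Pr(p_B > p_A)
  = \sum_{i=0}^{\alpha_B - 1}
    \frac{B(\alpha_A + i,\; \beta_A + \beta_B)}
         {(\beta_B + i)\, B(1 + i,\; \beta_B)\, B(\alpha_A,\; \beta_A)}
```

where B(·, ·) is the Beta function; the code works with log-Beta values and exponentiates each term to keep the computation numerically stable.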

const { logBeta } = require('mathfn');

// Probability that variant B's engagement rate beats variant A's,
// assuming a Beta(1, 1) prior on each rate.
const probabilityBbeatsA = (A_d, A_e, B_d, B_e) => {
   // Posterior Beta parameters for each campaign.
   const [ alphaA, betaA ] = [ A_e + 1, A_d - A_e + 1 ];
   const [ alphaB, betaB ] = [ B_e + 1, B_d - B_e + 1 ];
   let total = 0;
   for (let i = 0; i < alphaB; ++i) {
      total += Math.exp(logBeta(alphaA + i, betaA + betaB)
                        - Math.log(betaB + i)
                        - logBeta(1 + i, betaB)
                        - logBeta(alphaA, betaA));
   }
   return total;
};

If you’d like to learn more about the logic used in this code sample, you can read a detailed mathematical explanation of the statistical model we use.
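To try the function without installing anything, here is a self-contained sketch that substitutes a Lanczos approximation of log-gamma for mathfn’s logBeta (an assumption for portability, not how our production code works) and evaluates one example:

```javascript
// Rebuild logBeta from a Lanczos log-gamma approximation so the
// example runs with no external dependencies.
const logGamma = (x) => {
  const cof = [76.18009172947146, -86.50532032941677, 24.01409824083091,
               -1.231739572450155, 0.1208650973866179e-2, -0.5395239384953e-5];
  let ser = 1.000000000190015;
  for (let j = 0; j < 6; j++) ser += cof[j] / (x + 1 + j);
  const tmp = x + 5.5 - (x + 0.5) * Math.log(x + 5.5);
  return -tmp + Math.log(2.5066282746310005 * ser / x);
};
const logBeta = (a, b) => logGamma(a) + logGamma(b) - logGamma(a + b);

const probabilityBbeatsA = (A_d, A_e, B_d, B_e) => {
  const [alphaA, betaA] = [A_e + 1, A_d - A_e + 1];
  const [alphaB, betaB] = [B_e + 1, B_d - B_e + 1];
  let total = 0;
  for (let i = 0; i < alphaB; ++i) {
    total += Math.exp(logBeta(alphaA + i, betaA + betaB)
                      - Math.log(betaB + i)
                      - logBeta(1 + i, betaB)
                      - logBeta(alphaA, betaA));
  }
  return total;
};

// A engages 10% of 10,000 deliveries; B engages 11% of only 100.
// The results table in this post reports 68.94% for this case.
const p = probabilityBbeatsA(10000, 1000, 100, 11);
console.log((100 * p).toFixed(2));

// Sanity check: p(B > A) and p(A > B) must sum to 1.
console.log(p + probabilityBbeatsA(100, 11, 10000, 1000));
```

Note how modest the probability is despite B’s higher observed rate: with only 100 deliveries, the model is far from certain B is actually better.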

Example Results

% Engaged A | # Delivered A | % Engaged B | # Delivered B | p(B > A) (%)
10          | 10,000        | 11          | 100           | 68.94
25          | 100,000       | 26          | 1,000         | 77.24
10          | 10,000        | 11          | 1,000         | 84.98
10          | 100,000       | 11          | 1,000         | 86.15
20          | 100,000       | 25          | 100           | 90.36
10          | 1,000         | 15          | 100           | 94.58
10          | 10,000        | 15          | 100           | 95.58
25          | 100,000       | 26          | 10,000        | 98.60
10          | 1,000         | 20          | 100           | 99.82
10          | 100,000       | 11          | 10,000        | 99.92
10          | 100,000       | 15          | 1,000         | 99.99
20          | 100,000       | 25          | 1,000         | 99.99


As shown in the table above, the Bayesian method of testing gives more information than a pure frequentist approach: rather than a bare winner, we get a measure of confidence that the new variant will be better in the long term. A 95% confidence threshold is the industry standard in several applications, although anywhere between 90% and 99% may be desired.

Using a Bayesian model for testing, we can also stop an A/B test as soon as we find a variant that beats the default template at the desired confidence threshold.
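That stopping rule reduces to a simple check of the model’s output against the threshold after each batch of results. The helper below is a hypothetical illustration, not part of the SparkPost API:

```javascript
// Decide whether an A/B test can stop early, given the model's output
// p = Pr(B beats A) and the desired confidence threshold (e.g. 0.95).
const decideTest = (pBbeatsA, threshold = 0.95) => {
  if (pBbeatsA >= threshold) return 'stop: B wins';
  if (1 - pBbeatsA >= threshold) return 'stop: A wins';
  return 'keep collecting data';
};

console.log(decideTest(0.9858)); // → 'stop: B wins'
console.log(decideTest(0.52));   // → 'keep collecting data'
```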

Why did we choose this particular Bayesian model? It is widely used across the industry for Bayesian A/B testing. Follow-up reading can be found in these references and more:

  • the model used by VWO
  • other published treatments of the same formula, whose results our model also matches

—Jason Sorensen
Lead Data Scientist