A/B testing is a method for experimentally measuring which of two (or more) variants of something, such as a UX element, ad copy, or an email message, is more effective at generating the hoped-for results. In the email marketing world, for example, A/B testing is commonly used to try multiple versions of an email message with a subset of recipients before sending the strongest performer to the entire subscriber list.

Why Marketing and Transactional Emails Require Different Models for A/B Testing

Testing elements such as subject lines, images, or calls-to-action (CTAs) has become a routine part of most email marketers’ approach. That’s because traditional, list-based email marketing is characterized by linear workflows and discrete campaigns that can readily accommodate a straightforward A/B testing process. A simplified process might look something like: (1) Design, (2) Test, (3) Send campaign, (4) Gather results, (5) Rinse and repeat.

But what about email notifications and similar messages that are, by definition, uniquely generated in real time? They don’t lend themselves as easily to that sort of linear approach: there is no recipient list, the messages themselves tend to be more dynamic, and the sending period is not bounded in time the way a marketing campaign is.

Teams building SaaS and other digital products need to test their notifications and other app-generated messages “in stream” and in real time; they don’t have the luxury of a linear, campaign-style design-test-send workflow. Real-time experimentation, measurement, and adjustment is best handled programmatically, ideally in the same code that sends the notifications.

That’s been a challenge for product teams, who’ve had to either make do with kludgy approaches to grafting testing onto their notifications and transactional emails, or build complex testing scaffolding and infrastructure. Neither approach is ideal for many use cases. If only there were a simpler, more direct way to test these app-generated emails…

Challenge accepted.

SparkPost’s A/B Testing API for Transactional Email and Notifications

We’ve just released a new SparkPost API feature: our A/B Testing API.

Keeping true to our focus on transactional and app-generated email, we’ve built the feature to test API-driven, single-recipient messages. As noted earlier, that’s in contrast to the list-based model typically used by email service providers focused on the email marketing use case. Rather than testing lists, we’ve built our A/B Testing feature to handle real-time, one-to-one use cases like app notifications.

You can configure SparkPost’s A/B Testing API either to simply report the results of an A/B test, or to automatically select a winner and put it live as soon as the test reaches a statistically significant result.
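As a rough sketch of what configuring such a test might involve, here is an illustrative payload. The field names below are assumptions based on the feature description in this post, not the authoritative API schema; consult the API documentation for the exact endpoint and fields.

```python
import json

# Illustrative sketch only: field names are assumptions, not the official schema.
ab_test_draft = {
    "id": "password-reset-test",           # hypothetical test identifier
    "name": "Password reset subject line",
    "metric": "count_unique_clicked",      # engagement metric to optimize
    "test_mode": "bayesian",               # let the API pick the winner
    "confidence_level": 0.95,              # default threshold; configurable
    "default_template": {"template_id": "password-reset"},
    "variants": [
        {"template_id": "password-reset-v2"},
        {"template_id": "password-reset-v3"},
    ],
}

# In practice this would be POSTed to the A/B Testing API with your API key,
# and transmissions would then reference the test rather than a single template.
print(json.dumps(ab_test_draft, indent=2))
```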

Choosing a winning message is not a simple matter of “did version 1 get more opens than version 2?” Making an accurate call requires statistical rigor to make sure the outcome isn’t simply due to chance. That’s why we built our algorithm for picking the winning variant on Bayesian statistical modeling.

Statistical Rigor in A/B Testing: SparkPost’s Bayesian Model

In this model, each variant is tested against a default. The question being answered is: how confident can we be that a particular message variant actually outperforms the default? (More precisely: with what probability does a given variant beat the default?) If we are highly confident, say 95% confident, then we can declare a winner. If none of the variants reaches the desired confidence level, then the default remains the winner.
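To make the idea concrete, here is a minimal sketch of that question, assuming a simple Beta-posterior model over open rates with Monte Carlo sampling. This illustrates the Bayesian idea only; it is not SparkPost’s production algorithm.

```python
import random

def prob_variant_beats_default(variant_opens, variant_sends,
                               default_opens, default_sends,
                               draws=100_000, seed=42):
    """Estimate P(variant open rate > default open rate).

    Each open rate gets a Beta(1 + opens, 1 + non-opens) posterior
    (uniform prior); we sample both posteriors and count how often
    the variant comes out ahead. A toy sketch of the Bayesian idea,
    not SparkPost's production model.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        p_variant = rng.betavariate(1 + variant_opens,
                                    1 + variant_sends - variant_opens)
        p_default = rng.betavariate(1 + default_opens,
                                    1 + default_sends - default_opens)
        if p_variant > p_default:
            wins += 1
    return wins / draws

# 70 opens out of 200 sends vs. 45 out of 200: is the variant really better?
confidence = prob_variant_beats_default(70, 200, 45, 200)
print(f"P(variant beats default) = {confidence:.3f}")
```

If the estimated probability clears the configured threshold (say 95%), the variant can be declared the winner; otherwise the default stands.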

Some implementation details to note:

  • If you’re testing an existing production message against new variants, then the production message is the default. If this is a brand new message, then one of the variants is chosen to be the default.
  • The confidence level defaults to 95%, but it’s configurable. That choice has some ramifications.
  • The higher the confidence level is set, the more confident you can be in the outcome. However, it also means that more messages need to be sent in the test to achieve it. Likewise, the more variants being tested, the more messages will need to be sent to reach a given confidence level.
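The trade-off in the last bullet can be seen by rerunning the same comparison at growing send volumes. The sketch below reuses the toy Beta-posterior model (an assumption for illustration, not SparkPost’s production algorithm): with identical observed open rates, confidence in the winner climbs only as more messages are sent.

```python
import random

def beats_default(v_opens, v_sends, d_opens, d_sends, draws=50_000, seed=7):
    # Toy Beta-posterior estimate of P(variant rate > default rate);
    # illustrates the sample-size trade-off, not the production model.
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        pv = rng.betavariate(1 + v_opens, 1 + v_sends - v_opens)
        pd = rng.betavariate(1 + d_opens, 1 + d_sends - d_opens)
        wins += pv > pd
    return wins / draws

# Same observed open rates (26% vs. 22%) at increasing send volumes:
for sends in (100, 1_000, 10_000):
    conf = beats_default(int(0.26 * sends), sends, int(0.22 * sends), sends)
    print(f"{sends:>6} sends per arm -> confidence {conf:.3f}")
```

At small volumes the same 4-point lift is indistinguishable from chance; a 99% threshold therefore demands noticeably more sends than a 95% one.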

The best part is that you get the benefits of this rigorous statistical model without needing a data scientist on your team. But if you want to learn more about the numbers and modeling behind our A/B testing feature, we will discuss our statistical model in more detail in a future blog post.

How to Get Started with the A/B Testing API

Enough talk. Here’s how to get started with SparkPost’s A/B Testing API.

First, check out the support documentation that describes in more detail how SparkPost’s A/B Testing feature works. Of course, the corresponding API documentation has been updated as well.

A/B Testing is a compelling feature for all sorts of real-time messages like app notifications. It’s time to stop guessing about their performance. The SparkPost A/B Testing API makes it easy to test all your messages, including your app-generated ones, to make sure you’re doing everything you can to encourage user engagement.

By the way, although A/B testing today is an API-only feature, we are working on adding it as a configurable option in the SparkPost web app in the near future.

We Want Your Feedback—and Beta Testers!

We’d love to hear your feedback. Please share your questions and ideas on SparkPost’s community Slack channel.

If our new A/B Testing API feature has whetted your appetite for more, I’d encourage you to sign up for SparkPost’s beta testing program. We’re actively looking for additional testers to help us refine a variety of features we’ll be releasing in the future.

Happy Testing!