Whether we create technical, marketing, or creative content (sometimes, it's all the above), there’s a way of writing that makes sense to us. Naturally, when working on content with people from other teams, everyone digs in and defends their version.
However, one of the best ways to know for sure if content works is to ask the people we're writing for. The evidence calls a cease-fire.
Instead of arguing the other side round to our opinion, we should make it a habit to ask what content testing data tells us about user needs (not writer needs).
This will shift the conversation from what we think to what our readers and customers think. Enter: content testing.
Content testing is a form of UX research designed to assess content quality and performance. It answers questions such as:
As a result, it helps content strategists (and marketers in general) to improve user experience and, ultimately, generate better results from content. What kinds of results?
Many factors directly impact user experience and content performance. These include language and readability, as well as accessibility and searchability. While we marketers can often pinpoint issues across those areas ourselves, that's not always the case.
So, one benefit of content testing is that it reveals problem areas we may not have noticed. Another is that we can see those problems through the eyes of our audiences, learning from their expectations and how they experience our content (as opposed to making decisions based on our best guesstimates of those things).
To understand whether or not your content works, there’s no substitute for user testing. So how do you do it?
I’ll run through four approaches—A/B testing, the Cloze Test, five-second testing, and the tap test—and share some thoughts on their effectiveness.
A/B testing is one of the most common testing methods. It involves pitting different variations of live content against one another.
This is especially useful when a piece of content is underperforming. Identify changes that could help it reach your goals, put them in order of importance, and define what ideal test results would look like: which metrics you'd like to improve, and by how much.
Starting with the change that’s likely to have the biggest positive impact, create a content variation. This could involve changing a headline or call-to-action, changing the color of a button, or adding a section of content that addresses user needs, questions or objections.
Once you publish a variation, give it some time and then measure the results. If the content is performing better, try out another improvement from your list. If it's still underperforming, go back to step one and change something else. Rinse and repeat until you’re happy with the content’s performance.
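When you measure the results, it helps to check that any difference between variations isn't just noise before acting on it. Here's a minimal sketch in Python using a standard two-proportion z-test; the visitor and conversion numbers are hypothetical, and most A/B testing tools will report an equivalent significance figure for you.

```python
from math import sqrt
from statistics import NormalDist

def ab_significance(conv_a, visitors_a, conv_b, visitors_b):
    """Two-proportion z-test: is variation B's conversion rate genuinely
    different from A's, or could the gap just be random variation?"""
    p_a, p_b = conv_a / visitors_a, conv_b / visitors_b
    pooled = (conv_a + conv_b) / (visitors_a + visitors_b)
    se = sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-tailed
    return p_a, p_b, p_value

# Hypothetical numbers: original headline (A) vs. rewritten headline (B).
p_a, p_b, p_value = ab_significance(conv_a=48, visitors_a=2000,
                                    conv_b=74, visitors_b=2000)
print(f"A: {p_a:.1%}  B: {p_b:.1%}  p-value: {p_value:.3f}")
# A p-value under ~0.05 suggests the uplift probably isn't random noise;
# a higher value means keep the test running or treat it as inconclusive.
```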
This quantitative approach allows marketers to gauge unbiased responses to content. That’s because A/B tests are done with live content and without users knowingly being tested.
However, A/B testing many variations one by one can take a long time and requires ongoing resources to create new variations. Plus, it doesn't tell you why one variation is more or less effective.
Next is the Cloze Test, which uses a written comprehension exercise.
Let's start with a simple sentence:
“Yesterday, I went to the supermarket to buy some fruit, toilet paper, and laundry detergent.”
Now, remove every sixth word and replace it with a blank (you can remove words less often if you like):
“Yesterday, I went to the _________ to buy some fruit, toilet ______, and laundry detergent.”
Give the text to a user and ask them to fill in the blanks. Let’s say the user writes:
“Yesterday, I went to the grocery store to buy some fruit, toilet paper, and laundry detergent.”
Count how many times the user inserts the “correct” word.
When you mark the answers, look for whether the user has captured the passage's intent, rather than matching your word choice exactly. For instance, I’m writing this in Australia, where we say “supermarket.” But if someone wrote “grocery store” instead, that would still be consistent with the logic of the sentence, so I’d mark it as correct.
If the user gets more than 60% right, the subject matter is reasonably clear.
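To make the mechanics concrete, here's a minimal sketch in Python that blanks every sixth word and scores a participant's answers against the 60% threshold. The function names and the synonym map are my own illustration (a stand-in for the intent-based marking described above), not part of any standard Cloze tooling.

```python
def make_cloze(text, every=6):
    """Replace every nth word with a blank; return the gapped text and the removed words."""
    words = text.split()
    removed = []
    for i in range(every - 1, len(words), every):
        removed.append(words[i].strip(",.").lower())
        words[i] = "_____" + ("," if words[i].endswith(",") else "")
    return " ".join(words), removed

def score_cloze(expected, answers, synonyms=None):
    """Count an answer as correct if it matches the original word or an accepted synonym."""
    synonyms = synonyms or {}
    correct = sum(
        1 for want, got in zip(expected, answers)
        if got.lower() == want or got.lower() in synonyms.get(want, set())
    )
    return correct / len(expected)

passage = ("Yesterday, I went to the supermarket to buy some fruit, "
           "toilet paper, and laundry detergent.")
gapped, expected = make_cloze(passage)  # blanks "supermarket" and "paper"
score = score_cloze(expected, ["grocery store", "paper"],
                    synonyms={"supermarket": {"grocery store"}})
print(gapped)
print(f"Score: {score:.0%}", "- clear enough" if score > 0.6 else "- worth revising")
```

In practice you'd mark the answers by hand using the intent-based judgment described above; the synonym map simply stands in for that judgment.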
The Cloze Test has plenty of support and a solid pedigree in teaching English as a second language. But it has some downsides for testing content.
The result looks definitive: The word the user chooses either matches our version or doesn’t. But a match doesn’t get us any closer to knowing whether the content works. Here’s why:
Let’s tease out the last point: filling in blanks versus being useful. Imagine a simple row of shapes: square, circle, square, circle, square. Can you guess the next shape in the series?
It's a circle, right? So the pattern is clear. But is this row of shapes useful? Does it help you to do anything?
That’s why this approach is attractive but artificial. It looks scientific without indicating how people use your content in the real world.
Another method of content testing takes place over just five seconds.
Choose the content you'll test. Then, define what the focal point of that content should be and/or what its purpose is.
For example, you’d most likely want your headline or unique value proposition to be the focal point of a landing page hero section. And as for UI copy, the goal may be to help users navigate with ease.
Show the piece of content you chose to your test participants. But only give them five seconds to process what they see.
Use open-ended questions to understand:
To be fair, five seconds isn't long, especially considering that in a normal setting users would be able to take their time to grasp the subject matter. This approach doesn't account for that disparity. But it can still help you understand how clear your content is and whether your approach to content design is effective.
The tap test observes user behavior as people read. It's pretty straightforward to run: ask a user to read a passage of text and tap a steady beat as they go. In theory, the parts where they slow down are the parts that are hard to read.
Does slowing down really mean the text is hard to read? I don't think so. There are two problems with the tap test.
First, it’s unnatural. When we’re testing, we want to observe users in the wild, behaving as they would normally. We want the data to reflect how they’ll interact with our content in the real world.
Asking them to tap distorts the environment. Nobody (except for drummers!) taps a steady beat while they’re doing something else. Introducing this task requires extra concentration and distracts the user from the text—the very thing we’re trying to measure.
Second, it’s misleading. Slowing down could indicate that the text is difficult to read, but that's not the only explanation. There are plenty of other times when we might slow down, such as when:
When I applied for a visa, the form instructions were pretty straightforward. But I still read everything three times because I really wanted to get it right. A tap test would say the content was problematic. My experience says otherwise.
Like the Cloze Test, the tap test is attractive because it gives us data, but it’s ultimately unhelpful. It doesn’t get us any closer to answering the question: Can people use this content?
To recap, what are some of the advantages and disadvantages of the methods we’ve covered so far?
Plenty of quantitative approaches will indicate whether content works. Just a few examples include:
Quantitative approaches can be useful in that they're easy to scale up and can capture trends across a whole site. For instance, Google Analytics shows how people behave when they interact with our content. Where do they enter the site? What do they do next?
However, while they do tell us the ‘what,’ they don’t tell us the ‘why.’ Why are people leaving our site after five seconds? What’s going on for them?
And what about first-click testing, tree testing, and such? Qualitative methods can also explain how people navigate through your content.
Yet, even these methods skim across the surface. They don't tell you what happens when your users meet the content.
So what type of testing does give you that information?
Task-based testing involves giving users some content and asking them to complete a related task. Here’s how it can be done.
Frame the task in a way that makes the content necessary. Why? User testing tasks that evaluate information architectures and interfaces are often too broad to test the content.
So, for instance, you could instruct users to “find information on bin collections.” The test would end when the user lands on the relevant page. However, a better prompt would be something like:
“You’ve noticed that your bin was not collected on time. Find out what to do about this.”
This second version draws in information from the content and would better help you assess whether the content is appropriate for your audience.
Structure and content work together. Content informs how people use the structure and vice versa. So don’t just hand them a printout of plain text in a Word document. Print out a web page. Even better, give them a high-fidelity prototype (wireframes with actual content) or the website itself.
I use an iPad because it’s shiny and saves wrangling bits of paper.
You can gather feedback using the talk-aloud or markup techniques.
Talk-aloud is the classic user testing technique, which involves asking people to verbalize their experiences as they go.
Markup, on the other hand, asks users to highlight any sections they find confusing. Some versions of this test ask people to mark up both the sections they find useful AND the sections they find confusing. I prefer to keep it simple and capture only the confusing parts, for two reasons.
If you like, you can add a reflection question at the end of the test that does capture the emotional response or brand experience.
Once you have the feedback from sessions, collate it. If there’s a passage with a lot of red, you know that the content has problems.
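As a minimal sketch of that collation step, you could tally which passages participants highlighted. The session data below is hypothetical, and the "flagged by at least half of participants" cut-off is an arbitrary example rather than a standard threshold.

```python
from collections import Counter

# Hypothetical sessions: for each participant, the paragraph numbers
# they highlighted as confusing during the markup exercise.
sessions = {
    "participant_1": [2, 5],
    "participant_2": [2],
    "participant_3": [2, 7],
    "participant_4": [5],
}

# Tally how many participants flagged each paragraph.
flags = Counter(p for highlighted in sessions.values() for p in highlighted)

# Surface paragraphs flagged by at least half of the participants.
threshold = len(sessions) / 2
for paragraph, count in flags.most_common():
    status = "needs rework" if count >= threshold else "keep an eye on it"
    print(f"Paragraph {paragraph}: flagged by {count}/{len(sessions)} ({status})")
```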
Whether or not you go with a task-based approach, though, what basic process should you follow for the best results? Here are five steps.
Peep Laja of Wynter rightly said:
Peep continued, “There are four heuristics you need to measure:
For each heuristic, do a Likert Scale (1...5) and ask qualitative, open-ended questions. Ideally, you do this section by section so you know exactly where the problems are and can fix them.”
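As a rough sketch of that section-by-section scoring, you could average the Likert ratings for each section and heuristic pair and flag anything that scores low. The responses below are hypothetical, and the heuristic labels are generic placeholders rather than the specific heuristics Peep names.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical responses: (page section, heuristic, Likert rating from 1 to 5).
responses = [
    ("Hero",    "heuristic_1", 4),
    ("Hero",    "heuristic_1", 5),
    ("Hero",    "heuristic_2", 2),
    ("Pricing", "heuristic_1", 3),
    ("Pricing", "heuristic_2", 2),
    ("Pricing", "heuristic_2", 1),
]

# Collect the ratings for each section/heuristic pair.
scores = defaultdict(list)
for section, heuristic, rating in responses:
    scores[(section, heuristic)].append(rating)

# Flag pairs that average below 3 as likely problem spots.
for (section, heuristic), ratings in sorted(scores.items()):
    avg = mean(ratings)
    flag = "  <- fix this section" if avg < 3 else ""
    print(f"{section:8} | {heuristic}: {avg:.1f}{flag}")
```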
Next up, determine what type of test is most likely to measure your content's performance accurately. The pros and cons of the five assessments above will help here, as will keeping your goals and heuristics in mind.
Once you create your test, don't immediately start gathering participants. Instead, do an internal review.
Allow time for your editor-in-chief and other key stakeholders to take the test and provide feedback. They may suggest worthy inclusions, or identify aspects of the test that are confusing, unnecessary, or likely to influence participant feedback and skew the results.
Once your test has been reviewed and approved internally, begin gathering test users. Choose them wisely.
If you're testing content that was created for your audience as a whole, get a handful of participants from each demographic or psychographic subset of your audience. And, if you're evaluating content for a specific subset of your audience, limit participants to that group. This will ensure that the feedback you get is as accurate as possible.
With the testing phase finished, look for common threads across the feedback and plan for any necessary adjustments. Run A/B tests once changes are implemented to see how they impact performance.
Additionally, to get the maximum value out of a round of content testing, think about how the feedback could apply to other content your organization has produced. Often, it can. But, of course, it would be wise to test any updates you make before finalizing them.
As marketers, let's take a step back from our way of thinking about content creation and usefulness.
Ultimately, if content isn’t easily understood by, and practical for, the people we’re producing it for, it’s of no use to us or them. This is why it’s necessary not just to run content tests but also to choose methods that yield accurate insights, helping us make informed decisions.
The methods and process we’ve discussed here will give you a good starting point. However, I also recommend reading Steven Douglas’ quick guide on how to create and test content with a user-focused approach.
Matt Fenwick is a content strategist for government and peak bodies. He enjoys big hairy content projects, like rewriting government travel information for most of the countries in the world. Matt runs True North Content and lives in Canberra, Australia, which is the one city where kangaroos really do roam the streets at night. You can connect with him on LinkedIn.