Lies, damn lies & statistics
My inbox has been filling up over the last few days with emails asking for the Thinkbox view on a recent report from Samsung Ads, which presents some interesting analysis of the 3.7m TV sets they’re able to collect viewing data from.
The headline finding was that since lockdown, streaming has overtaken linear TV in terms of time spent per person per day, with streaming or video on demand taking a 59% share of all TV set viewing.
For those just looking for our view on their numbers and how reflective we believe they are of TV viewing, please skip to the bottom of this article.
But, for those willing to indulge me, I thought this an opportune moment to spend a bit of time on how best to work out the significance of the stats we see reported in the media. After all, there are rather a lot of numbers to choose from at the moment. I’m sure many of you will have seen the Swan/Trump interview; for those who haven’t, I’ll briefly explain.
A month or so ago, Donald Trump was interviewed by Jonathan Swan from Axios news, on how America was faring relative to other countries in handling the COVID pandemic. When questioned on the rising daily death rate, Trump scuffled about a bit before producing a couple of charts to demonstrate that America was ‘lower than the world’ and ‘lower than Europe’ in their death rates.
After a brief examination of the handouts Swan said, ‘Oh you’re talking about death rates as a proportion of cases, I’m talking about death rates as a proportion of population’.
The challenge is that unless you’re a seasoned statistician, it’s difficult to assess what the correct stat is for any given situation and its significance.
In the above example, the COVID death rate as a proportion of cases is a good stat to use if you’re looking to assess the relative ability of your healthcare system to cure those infected. But if you’re looking for a measure to assess the relative performance of your government in preventing the spread of the virus, then the death rate as a proportion of the population is a much better measure.
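To make the difference between the two denominators concrete, here is a minimal sketch in Python. The two countries and all of their figures are made up purely for illustration (they are not real COVID data), and the function names are my own. The point is simply that the same death counts can rank two countries in opposite orders depending on what you divide by.

```python
# Made-up figures for two hypothetical countries (NOT real COVID data).
countries = {
    "Country A": {"population": 1_000_000, "cases": 100_000, "deaths": 2_000},
    "Country B": {"population": 1_000_000, "cases": 10_000, "deaths": 500},
}


def deaths_per_case(c):
    """Deaths as a proportion of confirmed cases."""
    return c["deaths"] / c["cases"]


def deaths_per_100k_population(c):
    """Deaths per 100,000 people in the whole population."""
    return c["deaths"] / c["population"] * 100_000


for name, c in countries.items():
    print(
        f"{name}: {deaths_per_case(c):.1%} of cases, "
        f"{deaths_per_100k_population(c):.0f} deaths per 100k people"
    )
```

In this invented example, Country A looks better as a proportion of cases but worse as a proportion of population, so which country is ‘lower’ depends entirely on the question being asked.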
In media land we love to throw numbers around, but how can we tell the difference between a reliable, meaningful stat versus one that’s leading us to an incorrect conclusion?
Well, I’ve had a think and have identified three broad tests / questions to pose to help break down what we can and can’t deduce from a statistic. Here goes.
Why has this statistic or analysis been produced? As the marketing body for TV, we’re very conscious that people will ask this question on the data we produce. But it’s right to ask it. Healthy scepticism is good! What is the underlying interest in having this number in the public domain? Understanding this will help you assess any potential underlying bias.
What is being measured? The internet brought about big data sets, which provide a much deeper granularity of measurement than survey or panel-based data. But this is machine data, not people data. Machine data struggles with measures such as reach, as people use several devices and without logged-in user-based data it’s very difficult to measure cross-device usage.
Who is the stat representative of? If you conduct a survey on who people believe is the best football team in the world and you do so in the Old Trafford car park, don’t be surprised by the answer.
Putting this in practice
Let’s put this framework into practice using Samsung’s ‘Behind the Screens’ report.
Why has this statistic or analysis been produced?
The analysis was published by Samsung Ads, a new division of Samsung Electronics set up to offer advertising solutions using their AVOD inventory and TV set data. Their main sell is to help advertisers build reach by identifying lighter linear TV viewers. Therefore, it’s in their interest to hype up the changing nature of TV viewing and the resulting need to use their services.
What is being measured?
Samsung’s data is based on devices (TV sets), not people, which has big implications for what they (and we) can and can’t deduce from their data.
Samsung Ads split their data into groups such as ‘streamers only’ whom they define as ‘viewers who only stream content’. These represent 14% of the Samsung TV set population.
But by stating that they’re looking at ‘viewers who only stream content’ they’re being misleading. This isn’t viewers who only watch streamed content, it’s TV sets that only play streamed content.
In my house we have three TVs: the main living room set which is connected to Sky and two other sets which aren’t connected to any linear platform and so are ‘streaming-only’ sets. Whilst we only watch streamed content on those sets, we watch lots of linear on the main living room set.
Just because 14% of Samsung’s TV sets are streaming-only, you can’t then interpret that to mean that 14% of viewers are streaming-only as they could be watching lots of linear TV on other household sets.
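A rough sketch of why a share of sets doesn’t translate into a share of households or viewers. The first household mirrors my own three-set home described above; the other three are invented for illustration, as are all the variable names.

```python
# Each list holds one entry per TV set in the household:
# True if that set only ever plays streamed content.
households = {
    "author": [False, True, True],  # one set on Sky, two streaming-only sets
    "household_b": [False],         # hypothetical: one set, watches linear
    "household_c": [True],          # hypothetical: one set, streaming only
    "household_d": [False, True],   # hypothetical: one of each
}

# Device-level view: what share of all sets are streaming-only?
all_sets = [s for sets in households.values() for s in sets]
streaming_only_set_share = sum(all_sets) / len(all_sets)

# Household-level view: a home is streaming-only only if every set in it is.
streaming_only_homes = [name for name, sets in households.items() if all(sets)]
streaming_only_home_share = len(streaming_only_homes) / len(households)

print(f"Streaming-only sets: {streaming_only_set_share:.0%}")
print(f"Streaming-only households: {streaming_only_home_share:.0%}")
```

With these made-up homes, more than half of the sets are streaming-only but only a quarter of the households are. A data set that sees individual sets, and can’t see the other sets in the home, has no way of telling the two apart.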
What we can deduce:
That 14% of recently bought Samsung TV sets are used only for streaming.
What we can’t deduce:
a) That 14% of people who recently bought Samsung TVs are streaming-only viewers.
b) That it’s possible for Samsung Ads to identify and help advertisers reach light linear TV viewers (which they claim to be able to do in their report).
Who are their stats representative of?
In fairness to Samsung, the report acknowledges that their data isn’t representative of the population.
‘It’s important to note that Samsung Ads Smart TV viewer data is deterministic. It is not projected to a national population, but it represents behaviour from 30M+ Smart TV’s in Europe.’
The data only represents viewing on 3.7m recently purchased Samsung TV sets in the UK (i.e. 8% of all TV sets in the UK). Without a detailed breakdown of this audience, we don’t know how this sample differs from the population as a whole. But we do know that these households have a brand spanking new telly and access to all the TV streaming services at the push of a button, so it’s no surprise that viewing to streaming services on those sets is fairly high.
What we can deduce:
That streaming services account for more viewing than linear TV on 3.7m recently bought Samsung TV sets.
What we can’t deduce:
That streaming services account for more viewing than linear TV either among viewers in those households or among viewers in the whole population.
So, what is the real picture of viewing?
This is the latest Touchpoints data on video viewing. With fieldwork taking place before and during lockdown, it provides a handy view of how our behaviour changed following lockdown.
Touchpoints estimates linear TV (live and playback) at 3hrs and 12 mins per person per day and all forms of VOD (Broadcaster VOD, Subscription VOD, YouTube and other VOD) at 56 minutes per person per day. Here linear viewing accounts for 77% of all TV set viewing.
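For anyone who wants to check the arithmetic, here is a quick sketch. It assumes the share is simply linear minutes divided by the sum of the linear and VOD minutes quoted above; the variable names are my own.

```python
# Touchpoints figures quoted above, in minutes per person per day.
linear_minutes = 3 * 60 + 12  # 3hrs 12 mins of live and playback TV
vod_minutes = 56              # all forms of VOD combined

total_minutes = linear_minutes + vod_minutes
linear_share = linear_minutes / total_minutes
vod_share = vod_minutes / total_minutes

print(f"Linear share: {linear_share:.0%}")  # Linear share: 77%
print(f"VOD share: {vod_share:.0%}")        # VOD share: 23%
```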
For anyone interested in a second data source, BARB data for June 2020 puts linear (including any broadcaster VOD viewed on the TV set within 28 days of broadcast) at 78% amongst all adults. ‘Unknown’ viewing, which covers all SVOD, YouTube, broadcaster VOD viewed more than 28 days after broadcast and DVD viewing, accounts for the other 22%. These figures exclude TV set use where a games console is the content source.
For those who wanted to cut to the chase
The Samsung Ads TV viewing data is an interesting but unrepresentative view. It reflects the viewing behaviour on 8% of all UK TV sets and is skewed towards viewing on new TVs with the latest streaming tech.
Samsung claim to be able to help advertisers reach light linear TV viewers. However, they have no sight of what type of TV is being watched on other sets in a household, so advertisers should enquire about the accuracy of this product.
The industry-ratified sources of viewing behaviour, BARB and Touchpoints, both estimate linear TV to account for roughly 77% of all TV set viewing across the population as a whole, as opposed to the 41% suggested by Samsung.
So, there you have it. Hopefully a handy guide to help you be a bit less Trump and a bit more Swan in your use of statistics.