[Image: a ruler]

How Deep are Your Metrics?

An article by Ev Williams (one of the founders of Twitter) on Medium got a lot of buzz in the business world. It is amazing what a little trash talking among tech capitalists can do. But his point is really important, and I want to apply it to human factors today.

In his own words: “Of course, I am trivializing what Instagram is to many people. It’s a beautifully executed app that enables the creation and enjoyment of art, as well as human connection, which is often a good thing. But my rant had very little to do with it (or with Twitter). My rant was the result of increasing frustration with the one-dimensionality with which those who report on, invest in, and build consumer Internet services talk about success.”

What he said is that we often choose metrics based on how easy they are to obtain, compare, and explain to others rather than whether they provide real insight into the performance of the system we are measuring. In his case, he was comparing Instagram and Twitter. The business press was reporting that Instagram had grown “bigger” than Twitter, based on the fact that Instagram had more users and more monthly unique visitors than Twitter did. Ev responded by saying that there is a difference between world leaders having conversations (on Twitter) and people looking at pretty pictures (on Instagram).

Perhaps his message was lost in part because of how he trivialized Instagram, but his point is very true. Monthly unique visitors is a very shallow metric. How much time did the users spend? Or even better, how engaged were they with the content? Or better still, how much value did they get from their use? Did they enjoy the time? Did it give them any new insights? How much did the interaction affect their lives? Depending on the purpose of the company, these deeper questions will differ in relevance. But these deeper questions are what you want to base your metrics on.

My Take

We always have to use metrics that are feasible to measure in a valid, reliable, and sensitive way. I am not sure how to measure how much Twitter has affected my own life, let alone how to do it for a sample big enough to evaluate it at the system level. But we do need to get deeper than the HEATS (hits, errors, accuracy, time, satisfaction) that dominate user testing in human factors. Otherwise, we end up with animated banner ads, which are great at attracting users’ attention and increasing memory of the ad content compared to static ads, but which lower brand image and the volume of future purchases. Success? I don’t think so. If someone spends a lot of time on a web page, is that because it was very interesting or very confusing? Hard to say with just time as our metric. Satisfaction is a good complementary metric, but satisfaction ratings are fraught with validity issues because of social desirability biases.

In many cases, we need to develop longitudinal research methods that evaluate use over time. This better reflects how users interact with the systems we are trying to evaluate than artificial lab tests that put novice users in front of an unfamiliar interface to do tasks they don’t really care about, without the distractions of their normal daily lives and without the time pressure that comes with real life.
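
To make that a little more concrete, here is a minimal sketch of one measure that looks at use over time rather than a single snapshot: week-over-week retention of a starting cohort, computed from nothing more than per-user activity timestamps. The event data, dates, and structure below are hypothetical placeholders of my own, not anything from Ev’s post or from a specific analytics tool; it is just one way to ask whether people keep coming back instead of how many showed up once.

    from collections import defaultdict
    from datetime import datetime

    # Hypothetical event log: (user_id, date) pairs pulled from whatever
    # analytics store you already have. Replace with your real data source.
    events = [
        ("alice", "2015-01-05"), ("alice", "2015-01-12"), ("alice", "2015-02-02"),
        ("bob",   "2015-01-06"),
        ("carol", "2015-01-07"), ("carol", "2015-01-20"),
    ]

    def week_index(date_str, start="2015-01-05"):
        """Number of whole weeks between the study start and the event date."""
        d0 = datetime.strptime(start, "%Y-%m-%d")
        d = datetime.strptime(date_str, "%Y-%m-%d")
        return (d - d0).days // 7

    # Which weeks was each user active in?
    active_weeks = defaultdict(set)
    for user, date_str in events:
        active_weeks[user].add(week_index(date_str))

    # Retention: of the users active in week 0, what share came back in week n?
    cohort = {u for u, weeks in active_weeks.items() if 0 in weeks}
    for n in range(4):
        returned = sum(1 for u in cohort if n in active_weeks[u])
        print(f"week {n}: {returned / len(cohort):.0%} of the week-0 cohort active")

Retention is still only one slice of the picture, but unlike monthly uniques it at least asks whether the product earned a second visit, and it can be paired with the deeper questions above (value, insight, enjoyment) gathered from the same people over the same weeks.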

Your Turn

What metrics do you use that overcome these limitations? Do you have a strategy for developing them for specific systems? Do you take each use case individually? Please share some of your ideas with the rest of us. We could all benefit from good ideas and advice.

Image credit: Ejay
