Big Data versus intuition - why not both?

I have been working through Eric Benjamin Seufert's book “Freemium Economics: Leveraging Analytics and User Segmentation to Drive Revenue (The Savvy Manager's Guide)”. The book explains how to use analytics and user segmentation to drive revenue in a business where revenue is generated by only a fraction of the user base. It sparked a wider curiosity about business analytics, which I have been satisfying through online courses on topics ranging from web analytics to databases and MapReduce.

The first contrast that struck me as curiously predominant in the literature is that of Big Data versus intuition. This seems odd to me, because I see no reason why the two should be mutually exclusive. Yet I've noticed a pattern of many corporations turning to Big Data analysis as a kind of statistical panacea that would rid them of the individual, human responsibility associated with intuition.

Companies now have so much data about their users' every move that even very simple algorithms can extract patterns, which can then be aggregated in different ways across user segments to predict behavior and optimize revenue. Many go as far as to argue that if optimization based on a simple predictive model, fed with massive and arbitrary data and judged on simple outcome variables, increases revenue, then what happens in between (inside the black box) does not really matter. The changes made could be tiny and arbitrary, and no one should care as long as they produce an increase in a desired metric, such as retention, virality or, more generally, revenue.

The very role of intuition in analytics is to wonder why things happen as they do and what the data artifacts truly represent, as well as to generate interesting questions that can be assessed using a number of data sources. If positive changes in product performance can follow from, e.g., minor layout changes to a web page, made after discovering a small effect in massive amounts of user data, how much larger might the effects be if the analysis and optimization were driven not only by data but also by intuition? My stance on the necessity of intuition, even when working with Big Data, is reinforced by an observation: people with no experience of working with numbers collected directly from humans (in psychometrics or psychophysics, for example) do not realize how much non-random noise such data contain. Many of us uncritically celebrate every signal detected, when in reality these signals might be correlated residuals caused by another phenomenon altogether. Alternative explanations should be considered thoroughly, and they are always the product of intuition. Doing so might not only lead to larger gains in product performance with less time, effort and resources spent on optimization, but would also protect the organization from efforts that might do more harm than good in the long run.
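To make the point about correlated residuals concrete, here is a small, entirely hypothetical Python simulation. A hidden trait (general engagement) drives both clicks on some page element and spending, while clicks and spending have no direct link to each other. Pooled across all users, the two metrics look strongly correlated; within each engagement segment the correlation all but disappears. The variable names, effect sizes and the 30% share of engaged users are assumptions chosen purely for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate(n=10_000, engaged_share=0.3, seed=1):
    """Hypothetical users as (engaged, clicks, spend) tuples.

    Engagement is the hidden confounder: clicks and spend each depend on it,
    but not on each other.
    """
    rng = random.Random(seed)
    users = []
    for _ in range(n):
        engaged = rng.random() < engaged_share
        clicks = rng.gauss(5.0 if engaged else 2.0, 1.0)
        spend = rng.gauss(20.0 if engaged else 5.0, 4.0)
        users.append((engaged, clicks, spend))
    return users


def segment_correlations(users):
    """Correlation of clicks and spend, pooled and within each engagement segment."""
    result = {"pooled": pearson([c for _, c, _ in users], [s for _, _, s in users])}
    for flag in (True, False):
        seg = [(c, s) for e, c, s in users if e == flag]
        result[f"engaged={flag}"] = pearson([c for c, _ in seg], [s for _, s in seg])
    return result


if __name__ == "__main__":
    for label, r in segment_correlations(simulate()).items():
        print(f"{label}: r = {r:.2f}")
```

With these made-up parameters the pooled correlation comes out at roughly 0.7, while the within-segment correlations hover near zero. A dashboard would happily report the pooled signal; only asking why it is there leads you to the confounder.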

The argument that intuition does not matter because simple algorithms suffice to discover user patterns can be toxic, and the argument that intuition and qualitative methods alone are sufficient for any product is naive and old-fashioned in a world where acquiring at least some basic quantitative user data is not only convenient but also inexpensive. The proportion in which the two should be used, however, depends on the product, the company and the characteristics of the users. For instance, if you sell a fixed-price product, every client naturally generates revenue and is therefore representative of your revenue base, so carefully executed qualitative research can be very useful. If, however, you have a freemium product and generate revenue by offering people the opportunity to purchase additional utilities, your revenue is actually generated by outliers who spend different amounts of their resources on different utilities (which, at least in some respects, would be best designed in a scalar manner, but that is a topic for another blog post). When virtually all of your revenue is generated by outliers, and you have to capitalize on the virality and scalability of your product to reach as many of them as possible, qualitative methods will not serve you well. Product design is forced into an iterative model where you carefully and differentially cater to both paying and non-paying users. Non-paying users need to be catered to because they may become paying users if their finances improve, they act as ambassadors for the product's virality, and they often directly enable and motivate the paying clients. For instance, would paying users on dating sites keep purchasing their subscriptions if there were no massive base of non-paying users to mingle with?

In a large-scale setup with iterative product development, where different segments need to be matched with different product experiences depending on their needs, intuition is much less reliable and can even lead to faulty business decisions.
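A quick way to see how lopsided freemium revenue can be is to measure what share of total revenue the top slice of users contributes. The Python sketch below does that on a simulated population; the 3% payer rate and the log-normal spend distribution are my own illustrative assumptions, not figures from Seufert's book, and the function names are hypothetical.

```python
import math
import random


def revenue_share_of_top(spend, fraction):
    """Share of total revenue contributed by the top `fraction` of users by spend."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    total = sum(spend)
    if total == 0:
        return 0.0
    ranked = sorted(spend, reverse=True)
    k = max(1, math.ceil(len(ranked) * fraction))  # number of users in the top slice
    return sum(ranked[:k]) / total


def simulate_freemium(n=100_000, payer_rate=0.03, seed=7):
    """Hypothetical freemium population: most users never pay, and payers'
    spend follows a heavy-tailed log-normal distribution (assumed parameters)."""
    rng = random.Random(seed)
    return [
        rng.lognormvariate(2.0, 1.5) if rng.random() < payer_rate else 0.0
        for _ in range(n)
    ]


if __name__ == "__main__":
    spend = simulate_freemium()
    payers = sum(1 for s in spend if s > 0)
    print(f"paying users: {payers / len(spend):.1%}")
    for fraction in (0.01, 0.05):
        share = revenue_share_of_top(spend, fraction)
        print(f"top {fraction:.0%} of users -> {share:.1%} of revenue")
```

Under these assumptions the top 1% of users account for the large majority of revenue, which is exactly the situation where interviewing a handful of 'typical' users tells you little about where the money comes from.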

Finally, pattern recognition is simply not sufficient. I often hear the frustrated complaint that companies simply want to visualize their data: as long as they have an attractive dashboard, they consider their analytics as good as done (at least as long as the curves keep going up). But in a truly data-driven organization, this is merely the starting point. After visualization, exploratory analyses step in. Once those have been scrutinized, it's time to formulate hypotheses about product optimization and its differential influence on different user segments, and to test them with an experimental approach: a step actively undertaken by surprisingly few organizations, and one where intuition is absolutely necessary. In reality, of course, these steps happen in an iterative, layered and largely automated manner rather than as the neat sequence described here. My point, however, is to highlight the declarative nature of the fashionable 'data-driven' label that many an organization attributes to itself, and the fact that so many of us rave about Big Data without realizing that it has essentially been around for a very long time; until recently, no one had the storage capacity to keep all the junk, or the technology to analyze it.
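The experimental step can be as simple as an A/B test evaluated separately for each user segment. Below is a minimal Python sketch using a two-sided two-proportion z-test on conversion counts. The segment names and counts are invented, the `Arm` and `RESULTS` names are hypothetical, and a real setup would also need to fix sample sizes in advance; the Bonferroni adjustment is included as one simple way of accounting for testing several segments at once.

```python
import math
from dataclasses import dataclass


@dataclass
class Arm:
    """Outcome counts for one arm of an experiment."""
    users: int
    conversions: int


def two_proportion_z_test(control, variant):
    """Two-sided z-test for a difference in conversion rates (pooled variance).

    Returns (lift, z, p_value), where lift is the absolute difference in rates.
    """
    p_control = control.conversions / control.users
    p_variant = variant.conversions / variant.users
    pooled = (control.conversions + variant.conversions) / (control.users + variant.users)
    se = math.sqrt(pooled * (1 - pooled) * (1 / control.users + 1 / variant.users))
    if se == 0:
        return p_variant - p_control, 0.0, 1.0  # no variation at all: nothing to test
    z = (p_variant - p_control) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return p_variant - p_control, z, p_value


# Hypothetical experiment results per segment: (control, variant).
RESULTS = {
    "new users": (Arm(5000, 400), Arm(5000, 470)),
    "returning users": (Arm(3000, 300), Arm(3000, 290)),
}

if __name__ == "__main__":
    alpha = 0.05 / len(RESULTS)  # Bonferroni-adjusted threshold across segments
    for segment, (control, variant) in RESULTS.items():
        lift, z, p = two_proportion_z_test(control, variant)
        verdict = "significant" if p < alpha else "not significant"
        print(f"{segment}: lift={lift:+.2%}, z={z:.2f}, p={p:.3f} ({verdict})")
```

In these made-up numbers the variant lifts conversion for new users but not for returning ones, the kind of differential effect a single pooled dashboard metric would blur. Deciding which segments to compare, and why the effect might differ between them, is where intuition comes back in.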