A Case for Human Data Science

"Machines are becoming smarter. But how can we make humans smarter?"

This question was the primary driver for me to start ANELEN, a data science consulting and service company. From personal finance to corporate strategy, every day is filled with decisions. How many choices in life do we make rationally? Are we learning from past actions and consequences? How is the collective decision making of humankind improving over its history?

In the last decade, machines became smarter by learning from data. However, they are not yet smart enough for us to delegate everything to them. Reuters recently reported that Amazon shut down a project that used machine learning to rate job candidates after the company found that the model was automatically downgrading female applicants. According to the report, the bias arose because "Amazon’s computer models were trained to vet applicants by observing patterns in resumes submitted to the company over a 10-year period. Most came from men, a reflection of male dominance across the tech industry."

This news was a lesson for every data scientist. For many people, downgrading a candidate based on gender is an obvious flaw in logic. However, we may easily overlook the problem when our mind is fixated on the framework of extracting winning patterns from historical observations. The case of Amazon is a warning to data scientists who think their job is complete after tuning the hyper-parameters of a machine-learning model for another 0.5% of accuracy against a validation data set.
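
To make the point concrete, here is a minimal sketch in Python. The data is invented for illustration and has nothing to do with Amazon's actual system; the group names, labels, and helper functions are all hypothetical. It shows how a model can score well on accuracy against historical labels while selecting no one from one group.

```python
from collections import defaultdict


def accuracy(pairs):
    """Share of (label, prediction) pairs where the two agree."""
    return sum(1 for label, pred in pairs if label == pred) / len(pairs)


def selection_rates(records):
    """Share of positive predictions within each group."""
    totals = defaultdict(int)
    selected = defaultdict(int)
    for group, pred in records:
        totals[group] += 1
        selected[group] += pred
    return {group: selected[group] / totals[group] for group in totals}


# Hypothetical toy data: (group, historical_label, model_prediction).
# The historical labels themselves are skewed toward one group.
rows = [
    ("men", 1, 1), ("men", 1, 1), ("men", 0, 0), ("men", 1, 1),
    ("women", 0, 0), ("women", 0, 0), ("women", 1, 0), ("women", 0, 0),
]

acc = accuracy([(label, pred) for _, label, pred in rows])
rates = selection_rates([(group, pred) for group, _, pred in rows])

print(f"accuracy: {acc:.3f}")
for group, rate in rates.items():
    print(f"selection rate for {group}: {rate:.2f}")
```

On this toy data the accuracy is 0.875, yet the selection rate is 0.75 for one group and 0.00 for the other. A single accuracy number hides the gap; someone still has to decide to look at the per-group breakdown and judge whether it is acceptable.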

Machine intelligence in its current form, whether it is a supervised or unsupervised system, only learns from the historical patterns embedded in the data. Human intelligence at its finest not only learns from the past but also knows how the future may be - and must be - different from the past.

"Helping innovative businesses make smarter decisions with data science." - Anelen's mission statement

At Anelen, we often make evidence-based recommendations to top executives at our client companies. We have the expertise to process data at volume. We have the statistical knowledge to tell signal from noise. We can operationalize machine learning models and deploy them to production. However, those are necessary but insufficient requirements for fulfilling our data science mission. Our work requires empathy for our clients' challenges. We need to understand their core values. We need to share their vision and goals. Only then do our analyses and models start to be relevant.

News of artificial intelligence astonishes people every day. People may think that machines are increasingly replacing us in cognitive tasks. I argue that we are far from being able to put anything on autopilot. If you examine what deep learning is doing today, the majority of the applications either perform low-level cognitive tasks, such as recognizing objects in images, or accelerate "mechanical" decision-making processes. I call them mechanical because those tasks are well established, based on explicit rules or historical examples with recognized patterns.

High-level cognitive tasks, on the other hand, often require "fuzzy" human interpretations of multi-dimensional data. They are fuzzy partly because the variables of the utility function in our mind are often not directly observable. Think of consumer engagement with corporate brands, for example. How deeply a customer's mind is engaged with a corporate brand is not directly observable. If your responsibility at the company were to "deepen the engagement" with customers, how would you measure such a hypothetical construct? Can it be measured by the number of posts on Instagram? Maybe. Whatever you choose to measure, the observable outcomes are not the engagement itself. They are merely proxy variables that are assumed to be the consequence of deepened engagement. Your assumption may be correct today, but it may be a different story tomorrow. Machines accept the objective function and its variables as is. They don't question them. This is where resume classification goes wrong when human eyes do not continuously examine the model.

"No man ever steps in the same river twice" - Heraclitus

Humans are capable of realizing that no man ever steps in the same river twice. Recurring patterns are not an exact replay of the past. History may seem to repeat itself, but in reality every situation is unique. We are capable of noticing the factors in the future that change the historical patterns. We can question established methods and decision-making systems. Humans have a wonderful gift called intelligence. However, that intelligence does not seem to be used as much as it should be in everyday life, at any level. Meanwhile, in the information era, the latest developments affect the business bottom line at light speed. Our cognitive capability is overwhelmed by the unprecedented amount of signal and noise blended in the vast amount of data.

Today, cutting-edge data processing tools must assist human activities for any decision-making body to thrive. A new discipline must be in place to let us make smarter decisions at a high pace. It is called data science. The new discipline owes much to traditional fields such as statistics but operates at scale, powered by computer science and engineering. It expands our decision-making capacity with machine learning, but still maintains scientific rigor when the hypothesis is generated, the data is collected, and the bias is assessed. The practitioners of data science must have empathy for the human value system and ethics. They must also master the art of conveying a message that effectively urges people to take action.

This is a case for human data science, not for blind delegation to machine learning. This is why I founded Anelen.

Data scientists used to spend a long time cleansing data and tuning the hyper-parameters of machine learning models. Recently, more and more tools and platforms have become available to automate such tasks. Once we are free from those tasks, is the role of data scientists over? Far from it, if you ask Anelen. We continue our attempt to answer the core question - Machines are becoming smarter. But how can we make humans smarter? - while helping people make smarter decisions.
