Top mistakes data scientists make

The rise of the data scientist continues and social media is filled with success stories – but what about those who fail? There are no cover articles about the failures of the many data scientists who don’t live up to the hype and don’t meet the needs of their stakeholders.

The job of the data scientist is solving problems. And some data scientists can’t solve them. They either don’t know how to, or they are obsessed with the technology part of the craft and forget what the job is all about. Some get frustrated that “those business people” are asking them to do “simple trivial data tasks” while they’re working on something “really important and complex”. There are many ways a data scientist can fail – here’s a summary of the top three mistakes that lead straight to failure.

Mistake #1 – Less communication is better

What I have seen in great data scientists is that they are communicators first and data geeks second. A very common mistake is avoiding business people at all costs: keeping interactions with them to a minimum in order to get back to the “cool geek stuff”. Now, I really like the geeky part of the work, I do. That’s why I got into the field in the first place. But we are hired to solve problems, and without communication those problems won’t be solved. Data scientists must share the progress of their analysis and collect feedback from their peers all the time, especially when they don’t find anything peculiar – maybe that’s good news? Collecting feedback is not enough, though; the analysis and its assumptions have to be adjusted based on it. This is the “science” in “data science”: the scientific method is founded on revising hypotheses in light of new data. And the only way to collect and interpret new data is by communicating with the stakeholders who defined the hypothesis in the first place!

Mistake #2 – Delaying simple data requests from business teams

This is a golden one: simple data requests drive data scientists crazy (“it’s just 30 lines of SQL code, yuck!”). And this is where they fail. The request might be very simple for a data scientist, but the data might only just have become available, and it might solve a problem that has been open for years. The data scientist, however, tends to think like an engineer (“trust me, I’m an engineer”) and tries to build scalable architectures to support long-term solutions. The business doesn’t care about architectures, scale or engineering – it only cares about insights, actionable insights. If you’re not providing them, you fail in their eyes. And, well, they do the sales, so their decisions matter. If you don’t help improve those decisions, you’re just a sunk cost, and finance theory has some pretty rough advice on how to deal with those. Don’t ignore the simple requests. First make sure the request supports a decision and that the decision will improve the business once it has the data. When it does, swallow your pride and run those trivial 30 lines of SQL code – you’ll turn into a high-ROI unit instead of a sunk cost.
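To make the point concrete, here is a minimal sketch of what such a “trivial” request can look like. Everything in it is made up for illustration: the `orders` table, its columns, the sample rows and the question being asked (order count and revenue per region) are assumptions, not anything from a real project. It uses Python’s built-in `sqlite3` module with an in-memory database so it runs anywhere.

```python
import sqlite3

# Hypothetical sample data: (region, amount). These are not real figures.
SAMPLE_ORDERS = [
    ("North", 120.0),
    ("North", 80.0),
    ("South", 200.0),
    ("West", 50.0),
    ("West", 25.0),
    ("West", 25.0),
]

# The "trivial" query: one row per region with order count and revenue.
QUERY = """
    SELECT region,
           COUNT(*)    AS orders,
           SUM(amount) AS revenue
    FROM orders
    GROUP BY region
    ORDER BY revenue DESC, region
"""


def revenue_by_region(rows):
    """Load rows into an in-memory table and return the summary rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
        conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    # Print the summary as a small plain-text table.
    for region, orders, revenue in revenue_by_region(SAMPLE_ORDERS):
        print(f"{region:<6} {orders:>3} {revenue:>8.2f}")
```

The result is a three-row, three-column table. It is unglamorous, but if it supports a decision, it is exactly what the stakeholder asked for.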

Mistake #3 – Preferring a complex solution over a simple one

A very costly mistake. A whole mantra has been built around the data scientist occupation. The depiction of data scientists as ultimate geniuses who can code, do math and statistics, and understand business better than most has done the field a big disservice. The expectation becomes a perverse one: data scientists think that they need to solve problems by applying top-of-the-line statistical and computer science methods. Ultimately you get to a situation where junior data scientists think that everything can be solved with deep learning and don’t know how to explore the data, because the industry sold them the complexity obsession. Basic data exploration and visualization are the main tools of a data scientist, and you will spend most of your time exploring data. Not building machine learning models. Not building back-end architectures that scale. Not writing a 10-page in-depth hypothesis-testing study for a simple business question. Unless you were hired for exactly that or were specifically asked to do it. Your main role is discovering actionable insights and sharing them as recommendations with your stakeholders.

Don’t over-complicate an already complex field with too many superstitions. The most typical example of this mistake is data scientists wanting to apply machine learning everywhere: every use case, every project. This not only slows down the delivery of the desired output; in many cases a machine learning model is not required at all! As I explained earlier, the core work of the data scientist is to solve problems, not to use every shiny new tool that’s out there.


So how do I succeed as a data scientist?

As with every field, there are many ways to succeed and to fail – and many mistakes need to be made to understand which are which – but the fundamental lessons can be learned without trial and error. What matters most is being passionate about the problems and building solutions for your stakeholders instead of obsessing over tools and geeky stuff. Unless your role is an engineering one where you are not required to interact with other human beings, you will have to deal with human-to-human communication and run very simple (trivial, in your mind!) code that delivers an unattractive 3×3 data table. But sometimes simple is better, and it’s all that is needed: “everything should be made as simple as possible, but not simpler”, as a saying commonly attributed to Albert Einstein goes.

11 thoughts on “Top mistakes data scientists make”

  1. correction FYI
    “Everything should be made as simple as possible, but no simpler” was a paraphrase of Einstein’s quote below:
    Original quote:
    “It can scarcely be denied that the supreme goal of all theory is to make the irreducible basic elements as simple and as few as possible without having to surrender the adequate representation of a single datum of experience.”

  2. Another thing those 30 lines of SQL code do is earn you a respected seat with the business. Sometimes your analysis may be complex and wonderful, real geeky stuff. However, you need buy-in from stakeholders to implement the solution or the actionable findings you propose.
    Data analysis on paper, with actions owned by Sales/Marketing/delivery/tech and driven through to implementation, is worth much more than data analysis on fancy slides backed by thousands of lines of code without any action.
    So the idea should be to get buy-in first and then propose mammoth changes! Baby steps.

    1. Nicely put. Stakeholder buy-in is crucial, not only for trust but also for the commitment to act on the data scientist’s recommendations. This is a must, since insights should always lead to actions and decisions.


  3. Hey Karolis, I came across your post while I was searching for how to start a career in Data Science. I am a Software Engineer and I write code most of the time. I have learned all three of those points across my different jobs in the last 5 years. How? By making those mistakes myself 🙂. Keep writing!

