
Moving Towards Data Science: Hiring Your First Data Scientist

In October 2020 I joined accuRx as the company’s first data scientist. At the time of joining, accuRx was a team of 60-odd employees who had done an incredible job relying on intuition and a stellar team of user researchers to create products that GPs needed and loved. This, combined with the increased need for good tech solutions in healthcare in 2020, resulted in our reach expanding (literally) exponentially. Suddenly, we were in almost every GP practice in the UK.

We found ourselves in an interesting position: we now had several products that were being used very widely by GPs each day, and another set of nascent product ideas that we were only just bringing to life. We knew that at this point we’d need to start relying more on insight from quantitative data to test out our hypotheses and move our product suite in the right direction.

At this point, we didn’t need advanced ML solutions or the latest big data processing tools. What we really needed was the ability to verify our assumptions at scale, to understand the needs of a very large and diverse group of users and to foster a culture of decision-making in which relying on quantitative data was second nature. This was why I was brought in, and it’s not been without its challenges. Here are a few things I’ve learnt so far: 

1. New roles create new conversations

Adding new members to teams presents a series of inevitable challenges: team dynamics change, the initial cost of onboarding is high and there’s now one more voice in the room when making decisions. The effect of this is substantially amplified when you’re adding not just a new member but a new role to a team.

Before I joined, data science had not been a core part of the product development process. Suddenly, the team were introduced to a host of new concerns, processes and technical requests that they’d not needed to consider before, and addressing these often required a sizeable shift in the entire team’s ways of working.

A few examples of this are:

  • Software engineers had to spend more time adding analytics to feature releases and making sure that the pipelines producing those analytics were reliable.
  • Sometimes, AB test results take a while to trickle in. Given that those results (hopefully) inform the direction a product will move in next, product managers, designers and engineers often found themselves facing a fair degree of ambiguity over how best — and how quickly — to iterate on features and ideas.
  • Having an additional set of information to consider often meant that it took us longer to reach a decision about which direction to move a product in. We now had to reconcile our intuitions with what the data was telling us — and also make a call as to how reliable we thought both of those were!

It’ll take a bit of trial and error, but it’s important to find a way of working that gives product managers, designers and engineers the freedom to ship and iterate quickly without sacrificing your commitment to analytical rigour. In our case, this looked like figuring out which product changes were worth testing, what level of detail was worth tracking and what kinds of analyses were most useful at different stages of the product development process.

2. Effective communication is more than half the battle

It doesn’t matter how useful you think your analyses are — if people don’t know about or understand them, they’re not likely to have much long-term impact. In addition, the way in which you communicate your findings will determine how much impact your analysis ultimately has.

Communicate widely and frequently.

Importantly, it’s not enough to relay your findings to team leads only — the whole team has invested a lot of time and effort adjusting to new ways of working that support analytics, and they expect to be able to see what impact those adjustments have had. Communicating how those changes have positively impacted decision making will go a long way to creating the kind of positive feedback loop needed to motivate your team to keep relying on the processes and techniques that you’ve introduced.

Once you’ve got your team on board, the really tough part is ensuring that the initial excitement around using data to make decisions persists. A mistake I’ve made (more than once!) is assuming that communication around analytics is a ticket you can mark as done. If you’re looking to drive a culture change, you’ll need to continually remind people why they should care about it as much as you do. As people hear more about the positive inroads teams have made off the back of insight from data, relying on data to back up product decisions should start to become expected and more automatic.

Always present data with insight.

Wherever possible, try to communicate your findings in terms of how this will affect decision-making and what people should do as a result. The less abstract you can make the results of an analysis, the better. One simple way to make your results less abstract is to clearly quantify how much impact you think the change will have.

For example, if you’ve run an AB test to determine whether a new feature increases your conversion rate, instead of saying ‘The change was statistically significant’, try ‘If we rolled out this new change to all our users, it’s likely that our conversion rate would increase from 5% to 7%, which translates to an additional 200 active users per week’.
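To make that concrete, here’s a minimal sketch in Python of turning the raw output of a two-proportion test into that kind of statement. All of the counts and the weekly traffic figure are hypothetical, chosen to line up with the example above.

```python
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical counts from the A/B test
control_conversions, control_users = 250, 5_000   # ~5% conversion
variant_conversions, variant_users = 350, 5_000   # ~7% conversion

# Two-proportion z-test for the difference in conversion rates
stat, p_value = proportions_ztest(
    count=[variant_conversions, control_conversions],
    nobs=[variant_users, control_users],
)

control_rate = control_conversions / control_users
variant_rate = variant_conversions / variant_users
uplift = variant_rate - control_rate

# Translate the uplift into business terms (hypothetical weekly traffic)
weekly_eligible_users = 10_000
extra_active_users_per_week = uplift * weekly_eligible_users

print(f"p-value: {p_value:.4f}")
print(f"Conversion rate: {control_rate:.1%} -> {variant_rate:.1%}")
print(f"Estimated extra active users per week: ~{extra_active_users_per_week:.0f}")
```

The statistical test still matters, but the numbers people remember are the change in the rate and the extra users per week, not the p-value on its own.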

Similarly, when sharing data visualisations with a team, try to be explicit about what the graph is and isn’t showing. Remember that you’ve spent a lot of time thinking about this visualisation, but someone seeing it with fresh eyes likely doesn’t have as much context as you do. Simple ways to make visualisations clear are to spell out exactly which data you’ve used to define a metric, and to offer an interpretation of the trend or finding alongside the graph. If you can, try to explain the implications of the trend you’ve visualised for your team’s goals so that they can take action off the back of the insight you’ve shared.
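As an illustration, here’s a small sketch using matplotlib (with made-up weekly figures and a hypothetical metric definition) of a chart that states how the metric is calculated in the title and carries its interpretation as an annotation, rather than leaving either to the reader.

```python
import matplotlib.pyplot as plt

# Hypothetical weekly data
weeks = [1, 2, 3, 4, 5, 6]
conversion_rate = [0.050, 0.051, 0.049, 0.062, 0.068, 0.070]

fig, ax = plt.subplots(figsize=(7, 4))
ax.plot(weeks, conversion_rate, marker="o")

# State exactly how the metric is defined, not just its name
ax.set_title(
    "Weekly conversion rate\n"
    "(unique users completing sign-up / unique users shown the prompt)"
)
ax.set_xlabel("Week")
ax.set_ylabel("Conversion rate")
ax.set_ylim(0, 0.10)
ax.set_xticks(weeks)

# Offer the interpretation alongside the graph rather than leaving it implied
ax.annotate(
    "New onboarding flow shipped in week 4;\nconversion has risen from ~5% to ~7%",
    xy=(4, 0.062),
    xytext=(1, 0.085),
    arrowprops={"arrowstyle": "->"},
)

plt.tight_layout()
plt.show()
```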

Speed is good, but accuracy is better.

There’s no surer way to limit the impact of your work than making a habit of communicating incorrect or partially correct results. If you’re the first or only data scientist in your team, you are the authority on what constitutes good or sufficient evidence and so, ironically, you have very little margin for error.

You’ll often find yourself having to trade off getting results out to teams quickly against making sure that the analyses producing those results are robust, particularly if you’re working with new, suboptimal or unfamiliar tools. In most cases, I’ve found there’s a compromise you can reach — but this requires that you’re very clear about the limitations of the data you’ve used to reach a particular conclusion. When in doubt, caveat!

People will quickly learn whether they can trust you, and once broken, trust is a tricky thing to get back. This is not to say that you won’t make mistakes — but it’s really important that when these happen they’re caught early, acknowledged widely and that robust processes are put in place to avoid similar mistakes in future.

3. Good data infrastructure is a prerequisite for good data science

When it comes to accurate and useful analyses, it goes without saying that they’re enabled by accessible and reliable data. No matter how good your infrastructure, it’s reasonable to expect to spend a significant chunk of your time cleaning data before running your analyses. As such, if your data infrastructure is not optimised for analytics, the additional time spent cleaning and wrangling data into a usable format will quickly become a major barrier. Up until this point, we hadn’t prioritised securing best-in-class analytics tools — getting this right is hard work, and it’s something we’re still working towards.

Death by a thousand cuts…

The effect of this is twofold. First, it adds enough friction to your workflow that you’re likely to forgo using information that could be valuable, because you’re having to weigh the usefulness of the information against the cost of getting it. When an organisation moves fairly quickly, the time and effort this requires is often prohibitive.

Secondly, the probability of making mistakes compounds each time you shift and transform data across different platforms. Each relocation or adjustment of your data carries some chance of introducing an error — naturally, the more of this you do, the less reliable your data is likely to be by the time you actually run your analysis. Together, these two barriers strongly disincentivise people in analytics roles from solving problems creatively, and add enough friction that your approach to analysis might become a fair bit more rigid and instrumental — and where’s the fun in that?

You become the bottleneck.

Related to this is the issue of accessibility for the wider team. If data scientists are struggling to access data reliably, you can bet your bottom dollar that everyone else is probably worse off! The result of this is that queries for simple information are outsourced to you — and as people become aware that you are able and willing to wade through that particular quagmire, you, ironically, start to become the bottleneck to data-driven decision-making.

At this point, your role starts to become a lot more reactive — you’ll spend the majority of your time attending to high-effort, marginal-value tasks and find that you’ve got a lot less time and headspace to devote to thinking about problems proactively.

To avoid these pitfalls, you’ll need to make the case for the tools you need early on, automate as much of your own workflow as possible and provide enough value that people can see they’d get a lot more from you if you were able to work more efficiently.

 
Author: Tamsyn Naylor
Source: Towards Data Science