Data science is applied in many fields, including the development of self-driving cars.
If you’re reading this post, I’m assuming that you’d like to learn how to become a data scientist. If you’ve already done some research, you’ve probably read dozens of guides that start with “learn linear algebra”, and end 5 years later with “learn Spark”. When I was learning, I tried to follow these guides, but I ended up bored, without any actual data science skills to show for my time. The guides were like a teacher at school handing me a bunch of books and telling me to read them all — a learning approach that’s never appealed to me.
The unfortunate part about all the “become a data scientist in 5 easy years” guides is that they’re written by people who are already expert data scientists. They look at themselves and ask “what would someone need to learn to do what I do every day?” They forget what it’s like to struggle to learn something on your own, and what it’s like to need motivation to push you over the next hurdle.
As I learned data science, I realized that I learn most effectively when I’m working on a problem I’m interested in. Instead of learning a checklist of skills, I decided to focus on building projects around real data. Not only did this learning method motivate me, it also closely mirrors the work you’ll do in a data scientist role.
In this post, I’ll share a few steps that will help you in your journey to becoming a data scientist. The journey won’t be easy, but it will be far more motivating than following the conventional wisdom.
Step 1: Question Everything
The appeal of data science is that you get to answer interesting questions using actual data and code. These questions can range from “can I predict whether any flight will be on time?” to “how much does the US spend per student on education?”. To be able to ask and answer these questions, you need to develop an analytical mindset.
The best way to develop this mindset is to practice on news articles. Find articles like this one on whether running makes you smarter and this one on whether sugar is actually bad for you. Think about:
- How they reach their conclusions given the data they discuss
- How you might design a study to investigate further
- What questions you might want to ask if you had access to the underlying data
Some articles, like this one on gun deaths in the US and this one on online communities supporting Donald Trump, actually have the underlying data available for download. When the data is available:
- Download the data, and open it in Excel or an equivalent tool
- See what patterns you can find in the data by eyeballing it
- Decide whether you think the data supports the conclusions of the article, and why or why not
- Think about what additional questions you could use the data to answer
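If you'd rather poke at a downloaded file with code than with a spreadsheet, here is a minimal sketch of the same "eyeball it" step using only Python's standard library. The rows in `SAMPLE_CSV` are made-up placeholder values, not data from any of the articles, and the column names (`year`, `category`, `count`) and the helpers `load_rows` and `total_by` are hypothetical names chosen just for this illustration. With a real download, you would read the file the article links to instead.

```python
import csv
import io
from collections import Counter

# Made-up placeholder rows, invented purely for illustration.
# With a real download you would read the file instead of this string.
SAMPLE_CSV = """year,category,count
2012,A,10
2012,B,30
2013,A,20
2013,B,40
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, one dict per row."""
    return list(csv.DictReader(io.StringIO(text)))


def total_by(rows, key, value):
    """Sum the numeric `value` column for each distinct value of `key`."""
    totals = Counter()
    for row in rows:
        totals[row[key]] += int(row[value])
    return dict(totals)


rows = load_rows(SAMPLE_CSV)

# First look: how many rows are there, and which columns do we have?
print(len(rows), "rows; columns:", list(rows[0].keys()))

# Simple groupings are often enough to check an article's headline claim.
print("total by year:", total_by(rows, "year", "count"))
print("total by category:", total_by(rows, "category", "count"))
```

Grouping and summing like this is roughly what a pivot table does in Excel, so either tool works at this stage. The point is to compare what you see in the data with what the article claims.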
Here are some good places to find data-driven articles:
- New York Times
- The Intercept
After you’ve read articles for a few weeks, reflect on whether you enjoyed coming up with questions and answering them. Becoming a data scientist is a long road, and you need to be very passionate about the field to make it all the way. Data scientists constantly come up with questions and answer them using mathematical models and data analysis tools.
If you don’t enjoy the process of reasoning about data and asking questions, try to find the overlap between data and things that you do enjoy. For example, maybe you don’t enjoy coming up with questions in the abstract, but you really enjoy analyzing health data or education data. I personally was very interested in stock market data, which motivated me to build a model to predict the market.
Before you move on to the next step, make sure that there’s something about the process of data science that you’re passionate about. I can’t emphasize this point enough. If your goal is to become a data scientist, but you don’t have a specific passion, you’re probably not going to put in the months of hard work that you’ll need to learn.