How I got into data science

In the third year of my PhD, I had a decision to make.

Research, especially the academic kind, just wasn't working for me. I was getting results and publications, but I knew it could never be a lifelong career. Things moved slowly. I disliked the territorialism and the feeling of being a master of abstractions, far outside the productive mainstream of the economy. Then, of course, there were the practical considerations. Many years of postdoctoral drudgery stood between me and any long-term academic job, with no guarantee of landing one, and in the meantime the salary was a pittance. If not academia, what would my career be?

Management consulting

I had heard about management consulting and decided to look into it. I liked the idea of helping businesses with difficult challenges. At around this time, companies like McKinsey and Boston Consulting Group were ramping up their recruitment of PhD graduates. I went to several info sessions and started down the path toward management consulting.

‘Case interviews’ are the main interview format at management consulting companies: you are asked to analyze a business problem and make a recommendation. They require an understanding of different kinds of businesses, mental math, and communication and presentation skills. I practiced case interviews for several weeks and steadily improved.

Deciding on data science

One day in the fall of 2015, while riding my bike to UCLA, I changed my mind. To this day, I believe I had the ability to go into management consulting, but not the drive to be excellent at it. While I found the case interviews interesting on a surface level, I couldn't help wondering if there were something else that better fit my interests. Leaving the door to management consulting open, I decided to spend my time developing my data science skills instead.

My research in computational biology was very mathematical. I spent most of my days tinkering with MATLAB code that represented systems of biochemical reactions. The crux of the research was deeply tied to something called stability theory. Dynamical systems, whether they represent biochemistry or aircraft, can have an essentially stable or unstable nature. Our research was predicated on the idea that stability was important for biological systems. While all of this is very theoretical, mathematical and interesting in its own ways, it was a long way from data science.

Still, there was enough overlap that a move to data science made sense. Two former members of my lab had made exactly that move and found jobs at prominent Bay Area companies. I also thought that the work of data science was probably a better fit for my personality.

Starting the journey

I went to events at UCLA for PhDs interested in non-academic careers, including a panel discussion on data science. I began networking with everyone I could. I asked sympathetic professors whether any former students had become data scientists, and found several meaningful connections.

When I told my wife I wanted to be a data scientist, she bought me a book called Data Science for Business, which I still think is probably the best overall entry point for people interested in data science.

From these early days, I spent nearly all my free time and a good portion of my “work” time on learning and practicing data science. I had a lot of energy and commitment to my goal, but I had no idea how much work was in front of me.

My first side project

My initial forays were more about web development than data science. I instinctively knew that for my side projects to impress potential employers, they had to look good and be on the web, a click away in any browser window.

A family member had some credits for Microsoft Azure, so I had access to a Linux VM for development and experimentation (I later found out you can get essentially the same thing for free through the Amazon Web Services free tier). I started the long process of piecing together how the internet works, what a web server is, what a web application is, and how to deploy one.

My first side project (since decommissioned) was a solar panel calculator. Every hour, it downloaded data from the National Weather Service for the projected cloud cover all over the US. Processing this data turned out to be a huge challenge and almost derailed the entire project. The data is transmitted in a format called GRIB, which is specified by the World Meteorological Organization.

I remember downloading a hex editor, looking at the GRIB data byte by byte, and matching it up to the specification. Then I discovered a C library with Python bindings that could decode the GRIB data for me. After several days of attempts, I managed to get it installed on my server. It provided a relatively convenient, if slow, way to query the cloud forecast data. This first success, however insignificant, felt huge to me at the time, and fueled my confidence to continue.
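For a flavor of what matching raw bytes against the specification involves, here is a minimal Python sketch. It is not the library I used; it only decodes the fixed-size indicator section (Section 0) at the start of each GRIB message. It assumes the standard WMO layouts: in GRIB edition 2 that section is 16 bytes (the ASCII letters "GRIB", two reserved bytes, a discipline byte, the edition number, then an 8-byte big-endian total message length), while in edition 1 it is 8 bytes, with a 3-byte length before the edition number. The function names are hypothetical, and everything past Section 0 (grid definitions, data packing) is left to a real decoder.

```python
import struct


def parse_indicator_section(data: bytes) -> dict:
    """Decode Section 0 of a GRIB message that starts at data[0]."""
    # Every GRIB message begins with the ASCII magic bytes "GRIB".
    if len(data) < 8 or data[:4] != b"GRIB":
        raise ValueError("not the start of a GRIB message")
    edition = data[7]  # octet 8 holds the edition number in both editions
    if edition == 2:
        if len(data) < 16:
            raise ValueError("truncated GRIB2 indicator section")
        # Octet 7: discipline (0 = meteorological products); octets 9-16: total length.
        (total_length,) = struct.unpack(">Q", data[8:16])
        return {"edition": 2, "discipline": data[6], "total_length": total_length}
    if edition == 1:
        # Octets 5-7: 3-byte big-endian total length of the message.
        return {"edition": 1, "total_length": int.from_bytes(data[4:7], "big")}
    raise ValueError(f"unsupported GRIB edition {edition}")


def list_messages(blob: bytes) -> list:
    """Walk a file of concatenated GRIB messages using each declared length."""
    messages, offset = [], 0
    while offset < len(blob):
        info = parse_indicator_section(blob[offset:])
        end = offset + info["total_length"]
        # Reject lengths that are too small to make progress or run past the data.
        if info["total_length"] < 8 or end > len(blob):
            raise ValueError(f"bad message length at offset {offset}")
        # Each message should close with the ASCII end marker "7777".
        if blob[end - 4:end] != b"7777":
            raise ValueError(f"missing end marker for message at offset {offset}")
        messages.append((offset, info))
        offset = end
    return messages
```

To try it on a real download, read the file in binary mode (for example `open(path, "rb").read()`, with `path` pointing at your own file) and pass the bytes to `list_messages`.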

After solving my GRIB challenge, I found another calculator, implemented in JavaScript, that calculated the sun's elevation at any time at any place on Earth, specified by latitude and longitude. I also tracked down a database of zip codes and their latitude/longitude coordinates. The final application would calculate solar intensity at any zip code in the US every hour for the next week, based on the predicted cloudiness and the calculated solar elevation. It would then plot the result using D3.js.
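The core calculation can be sketched in Python. This is not the JavaScript calculator I actually used; it is a textbook approximation, assuming Cooper's formula for solar declination and a local solar time that ignores the equation of time, so the elevations are approximate. The cloud correction is also an assumption: it uses the empirical Kasten-Czeplak form 1 − 0.75·c^3.4, where c is fractional cloud cover between 0 and 1. The result is a relative intensity between 0 and 1, not a calibrated irradiance in W/m², and the function names are hypothetical.

```python
import math
from datetime import datetime, timezone


def solar_elevation_deg(lat_deg: float, lon_deg: float, day_of_year: int, utc_hours: float) -> float:
    """Approximate solar elevation in degrees (negative means below the horizon)."""
    # Solar declination via Cooper's approximation, in degrees.
    decl = 23.45 * math.sin(math.radians(360.0 / 365.0 * (284 + day_of_year)))
    # Local solar time from UTC and longitude (east positive); ignores the equation of time.
    solar_time = utc_hours + lon_deg / 15.0
    # Hour angle: 15 degrees per hour away from solar noon.
    hour_angle = 15.0 * (solar_time - 12.0)
    lat, dec, ha = (math.radians(x) for x in (lat_deg, decl, hour_angle))
    sin_elev = math.sin(lat) * math.sin(dec) + math.cos(lat) * math.cos(dec) * math.cos(ha)
    # Clamp to guard against floating-point drift just outside [-1, 1].
    return math.degrees(math.asin(max(-1.0, min(1.0, sin_elev))))


def relative_intensity(elevation_deg: float, cloud_fraction: float) -> float:
    """Hypothetical 0-1 intensity score from sun elevation and cloud cover (0-1)."""
    if elevation_deg <= 0:
        return 0.0  # sun is below the horizon
    clear_sky = math.sin(math.radians(elevation_deg))  # crude clear-sky proxy
    # Empirical cloud attenuation (Kasten-Czeplak form); an assumption, not the original app's model.
    return clear_sky * (1.0 - 0.75 * cloud_fraction ** 3.4)


# Example: one hypothetical forecast hour for Los Angeles (about 34.07 N, 118.44 W).
when = datetime(2016, 3, 1, 20, 0, tzinfo=timezone.utc)
elev = solar_elevation_deg(34.07, -118.44, when.timetuple().tm_yday, when.hour + when.minute / 60)
print(f"elevation {elev:.1f} deg, intensity {relative_intensity(elev, cloud_fraction=0.4):.2f}")
```

Repeating this for every zip code and every forecast hour, with the cloud fraction taken from the decoded forecast grid, gives the kind of hourly table that can then be plotted.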

All told, it took several months of painstaking trial and error to finally launch it in early 2016. It was a powerful moment of self-actualization: it made me realize I could actually do it. I could actually become a data scientist. I had pushed through interminable weeks of syntax errors and clueless flailing on the Linux command line, and worked through self-doubt to build a working math-based web application.

Leveling up

While my first successful side project increased my confidence, building a solid theoretical background was also essential. I took online courses, read textbooks and practiced my Python skills. I started with Coursera's Machine Learning course. It's a technically rigorous course and, I believe, has shaped the outlook and vocabulary of the entire field. Next I took the Biostatistics courses, which are also excellent and rigorous. For all the materials I've covered, see this post.

Deep Dive

In spring 2016, a program called Deep Dive Data Science was advertised on the UCLA campus. A former UCLA PhD student was looking for students to mentor in data science. There was a series of lectures on algorithms and machine learning, which were perfect for me. The algorithms portion turned out to be important preparation for interviewing for tech jobs, including those in data science.

There were also a number of challenging homework assignments, which I kept in a git repository. As summer approached, I started working on another side project, which is still active. It required more practical data science skills like data scraping, score functions and data visualization. After deploying a visually appealing and accessible side project, and with my technical foundations in place, I finally felt ready for the job market.

Applying for jobs

The application process was painful, as most job seekers know. Despite the hype about plentiful data science jobs, it was a tough road with a lot of rejection. I started applying in June 2016 and carefully tracked every application and every other interaction with employers. I suspected my Bioengineering degree was a disadvantage relative to fields like Statistics or Computer Science.

The months dragged on, and I set my thesis defense for November. I had gotten a few phone interviews, but they weren't leading to anything yet. If I graduated without an offer, my back would really be against the wall.

Finally, in October, I got my break: a data science internship at a company that at the time was called Demand Media. I had applied through the normal channels and completed a coding challenge. After an onsite interview that went well, I got an offer. I could have felt let down that all I could manage to find after completing a PhD was an internship, but I was ecstatic. It was the ultimate validation of my efforts. After more than a year of intense preparation and study, I had my data science job. I knew the first job was the hardest; many people had told me so. And here I was, getting over that monumental hurdle.

A year and a half after I started the internship, I can say it could not have worked out better. After a few months I was hired full time. I enjoy the work and have no plans to change jobs for the time being. My work is valued, and the company's culture and prospects are good. Still, recruiters contact me frequently, and I am confident I could find another job if I wanted to.

Still learning

I continued to build out my technical skills with additional courses and textbooks. At work, I also learned a lot about how a tech stack works and how to use Python for data analysis and transformation.

I’m now broadening into other fields including information security, business, management, finance, public speaking and negotiation. It’s hard to say exactly what these tools will do for me, but they have great general utility and I suspect they will be useful in some way or other in the future. I learned to trust my instincts in my data science hunt and I will continue to do so.

Lessons learned

The number one lesson I took away from the experience is to start small and build your confidence. We all know confidence is important. But how do you build it? It’s not something that can be manufactured out of nothing. In my experience, it has to be based on real accomplishments, especially the ones you set for yourself. It doesn’t matter how big the accomplishments look to other people. Even small accomplishments boost your confidence. Set a direction and accomplish something. If the bar is low enough that you can actually clear it, you’ll have something to celebrate and the energy to keep going.

Another lesson is to give your dreams time to grow, and to pursue them intensely. In my case it took over a year to finally land a data science job, and even then it was only an internship at first. All along I was pushing myself and learning. This lesson has a caveat: not all dreams are possible. My dream of data science could have ended in failure. Deciding whether or when to change paths is difficult, and each of us must make that call based on our own situation.

Data science is not for everyone and that’s OK

This is a tricky path to navigate for many people who become interested in data science. I was gripped by a manic drive that got me to the finish line. Others who were at one time interested in data science didn’t have the same level of intensity and have moved on to other interesting and productive careers. I probably won’t be in data science for my whole career either. The first thing I want to tell people when they ask about careers in data science is to explore other options too. Data science had its moment and people got interested because of the hype, but maybe there’s something else they could find that would suit them much better.
