## What I’m reading/watching/taking

Books, videos and courses I've done, am doing or want to do, for the curious and for my own reference.

Read/watched/took:

- Machine Learning with Andrew Ng
- Biostatistics Boot Camp 1&2 with Brian Caffo - Link to 2
- Data Science For Business
- CS231N with Andrej Karpathy
- Statistical Rethinking
- Mathematical Statistics and Data Analysis
- Introduction to Statistical Learning

… Continue reading What I’m reading/watching/taking

## Linear Regression vs. Decision Trees: Handling Outliers

In regression tasks, it's often assumed that decision trees are more robust to outliers than linear regression. See this Quora question for a typical example. I believe this is also mentioned in the book "Introduction to Statistical Learning", which may be the source of the notion. Predictions from a decision tree are based on the… Continue reading Linear Regression vs. Decision Trees: Handling Outliers
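As a rough illustration of the question (not the analysis from the full post), here is a minimal pure-Python sketch that fits a least-squares line and a depth-1 regression tree (a "stump") to made-up data, then measures how far each model's prediction moves when a single outlier is added. The data, the query point, and the helper names `fit_line`, `fit_stump` and `predict_stump` are all assumptions for this example; the stump uses the common CART-style rule of predicting the mean target on each side of the split.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def fit_stump(xs, ys):
    """Depth-1 regression tree; assumes xs is sorted and has no duplicates.

    Returns (threshold, left_mean, right_mean) for the split that
    minimizes the total squared error of the two leaves.
    """
    def sse(vals):
        mean = sum(vals) / len(vals)
        return sum((v - mean) ** 2 for v in vals)

    best = None
    for k in range(1, len(xs)):
        cost = sse(ys[:k]) + sse(ys[k:])
        if best is None or cost < best[0]:
            threshold = (xs[k - 1] + xs[k]) / 2
            left_mean = sum(ys[:k]) / k
            right_mean = sum(ys[k:]) / (len(ys) - k)
            best = (cost, threshold, left_mean, right_mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x < threshold else right_mean


# Hypothetical toy data: y = 2x, then the last point replaced by an outlier.
xs = list(range(10))
clean = [2.0 * x for x in xs]
dirty = clean[:-1] + [200.0]

query = 2  # a point far from the outlier at x = 9

slope_c, icpt_c = fit_line(xs, clean)
slope_d, icpt_d = fit_line(xs, dirty)
linear_shift = (slope_d * query + icpt_d) - (slope_c * query + icpt_c)

stump_shift = (predict_stump(fit_stump(xs, dirty), query)
               - predict_stump(fit_stump(xs, clean), query))

print(f"linear prediction shift at x={query}: {linear_shift:.2f}")
print(f"stump prediction shift at x={query}: {stump_shift:.2f}")
```

On this toy data the line's prediction at x = 2 drops by about 6.6, while the stump's rises by 4: the best split moves to isolate the outlier, so the left leaf now averages over more points. A deeper tree or a different dataset could behave quite differently, so treat this as a way to poke at the claim, not as evidence for it.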

## Data vignette: men are worse drivers

This post is based on a notebook I wrote a couple years ago. I'd like to revisit and expand on it, as well as correct some errors. The original notebook is here. In this post, I analyze traffic collision data from Los Angeles County in January 2012. The analysis is sound, but the conclusion in… Continue reading Data vignette: men are worse drivers

## For presentations, focus on narrative

You've collected the data, you've run the analysis, now you have to decide how to present. You've considered it from every angle, and you're preparing a slide deck to match--detailed, lengthy and technical. Is this the right approach? Probably not. Rule of thumb: include no more than one figure per topic. When you're the technical… Continue reading For presentations, focus on narrative

## Lead Scoring with Customer Data Using glmnet in R

Lead scoring is an important task for business. It means identifying which individuals in a population may convert (purchase) if marketed to, assigning each a probability of converting, or determining how much value each may have as a customer. Properly using data to support this task can greatly benefit your… Continue reading Lead Scoring with Customer Data Using glmnet in R

## Data Science From Scratch

I visit Quora regularly and am always surprised by the number of people asking how to become a data scientist. It's a fascinating field, and one I was able to (mostly) "bootstrap" into, out of a quantitative PhD (bioengineering). This is my simple guide on how to become a data scientist. Get in touch if… Continue reading Data Science From Scratch

## Deep Neural Networks: CS231n & Transfer Learning

Deep learning (the use of deep neural networks) has become a powerful technique for dealing with very high-dimensional data, e.g. images, audio, and video. As one example, automated image classification has become highly effective. This task consists of assigning an image to one of a fixed number of classes. Look at the results of… Continue reading Deep Neural Networks: CS231n & Transfer Learning

## Collecting Neighborhood Data

Since deploying my latest web application, a Los Angeles Neighborhood Ranker, I've wanted to explain the process of gathering the data. The first step was to decide which neighborhoods to use, what they're called, and how they're defined geographically. The first stop I made was the LA Times Mapping LA project. It has a mapping feature… Continue reading Collecting Neighborhood Data

## Ranking Neighborhoods in Los Angeles

My latest web application is an interactive, personalized map-based ranking of neighborhoods in Los Angeles. I've spent the last few weeks gathering data points for each of 155 neighborhoods in the Los Angeles area. That could (and soon will) be the topic of its own post. For now, I wanted to explain the ranking… Continue reading Ranking Neighborhoods in Los Angeles

## SQL: Combining tables without JOIN

SQL is a versatile language--there are many ways to get the job done. One way to compare information from multiple tables is the JOIN command, but there are sometimes alternatives. Let's say I've got two tables, office and person: Let's say I need to know which people have printers. I could do a JOIN ON office_room = office_num.… Continue reading SQL: Combining tables without JOIN
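To make the setup concrete, here is a small sketch using Python's built-in `sqlite3` module. The table contents and the `has_printer` and `name` columns are assumptions made up for this example; only the table names and the `office_room` / `office_num` columns come from the excerpt. It runs the JOIN version next to one possible JOIN-free alternative, a subquery with `IN`, which may or may not be the alternative the full post goes on to describe.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema and rows, invented for illustration only.
conn.executescript("""
CREATE TABLE office (office_num INTEGER PRIMARY KEY, has_printer INTEGER);
CREATE TABLE person (name TEXT, office_room INTEGER);
INSERT INTO office VALUES (101, 1), (102, 0), (103, 1);
INSERT INTO person VALUES ('Ana', 101), ('Ben', 102), ('Cy', 103);
""")

# Version 1: the JOIN mentioned in the text.
join_rows = conn.execute("""
    SELECT person.name
    FROM person
    JOIN office ON office_room = office_num
    WHERE office.has_printer = 1
    ORDER BY person.name
""").fetchall()

# Version 2: no JOIN, filtering with a subquery instead.
subquery_rows = conn.execute("""
    SELECT name
    FROM person
    WHERE office_room IN (
        SELECT office_num FROM office WHERE has_printer = 1
    )
    ORDER BY name
""").fetchall()

print(join_rows)
print(subquery_rows)
conn.close()
```

Both queries return the same people here. The subquery form only works this simply because the result needs columns from `person` alone; if you also wanted columns from `office`, a JOIN would usually be the more natural fit.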