The limits of data

Beware someone who says data can solve any problem. They're naive, malicious, or won't be around to deal with the aftermath. Of course, data can accomplish a lot. As a data scientist, my job depends on it. But if we're being honest, the limitations are becoming more and more obvious. As a community, data scientists have…

5 Pitfalls & Solutions for Today’s Data Leaders

The data revolution is in full swing: data science practitioners are prospering and creating huge value for their companies. Despite this success, data science leaders across the industry are facing stress and difficult conditions. To succeed and generate value for their organizations, data leaders must avoid the following pitfalls. Pitfall #1 (Limited Experience): Because the field…

Analyzing resumes with data science

Ever wonder if there is a "secret code" for resumes, some key words that will actually make you stand out? It turns out there are indeed some very characteristic differences between experienced and novice resumes. Gathering Resume Data. Resumes from a resume search were used to analyze the differences between experienced and inexperienced resumes. For maximum…
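The excerpt doesn't say how the resumes were compared. One common way to surface "characteristic" words between two groups of documents is a smoothed log ratio of relative word frequencies. The sketch below assumes that technique, plain-text resumes, and a simple regex tokenizer; the function names and toy resumes are hypothetical, not the post's data or method.

```python
import math
import re
from collections import Counter


def word_counts(docs):
    """Count lowercase alphabetic tokens across a list of documents."""
    counts = Counter()
    for doc in docs:
        counts.update(re.findall(r"[a-z]+", doc.lower()))
    return counts


def distinctive_words(group_a, group_b, alpha=1.0, top_n=5):
    """Rank words by the smoothed log ratio of relative frequency in A vs. B.

    Positive scores lean toward group A, negative scores toward group B.
    alpha is additive (Laplace) smoothing so a word missing from one group
    never produces log(0).
    """
    counts_a, counts_b = word_counts(group_a), word_counts(group_b)
    vocab = set(counts_a) | set(counts_b)
    total_a = sum(counts_a.values()) + alpha * len(vocab)
    total_b = sum(counts_b.values()) + alpha * len(vocab)
    scores = {
        w: math.log((counts_a[w] + alpha) / total_a)
        - math.log((counts_b[w] + alpha) / total_b)
        for w in vocab  # Counter returns 0 for words absent from a group
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    # Most A-leaning words first; most B-leaning words first.
    return ranked[:top_n], ranked[-top_n:][::-1]


# Hypothetical toy resumes, for illustration only.
experienced = [
    "Led a team of engineers and managed budgets",
    "Managed vendor relationships and led strategy",
]
novice = [
    "Eager to learn, completed coursework in statistics",
    "Completed internship, eager to grow",
]
top_exp, top_nov = distinctive_words(experienced, novice, top_n=3)
print(top_exp)
print(top_nov)
```

On real resumes you would likely also want stop-word handling and a minimum count threshold, since rare words get extreme ratios from very little evidence.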

What I’m reading/watching/taking

Books, videos and courses I've done, am doing or want to do, for the curious and for my own reference. Read/watched/took: Machine Learning with Andrew Ng; Biostatistics Boot Camp 1&2 with Brian Caffo; Data Science for Business; CS231N with Andrej Karpathy; Statistical Rethinking; Mathematical Statistics and Data Analysis; An Introduction to Statistical Learning…

Linear Regression vs. Decision Trees: Handling Outliers

In regression tasks, it's often assumed that decision trees are more robust to outliers than linear regression. See this Quora question for a typical example. I believe this is also mentioned in the book "An Introduction to Statistical Learning", which may be the source of the notion. Predictions from a decision tree are based on the…
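To make the comparison concrete, here is a toy sketch, assuming a standard CART-style regression tree in which each leaf predicts the mean of the training targets that fall into it. It fits an ordinary least-squares line and a depth-1 tree (a "stump") to the same six points, with and without one hypothetical outlier in the target, so you can see how each model's predictions move. It doesn't settle the general question; the data and function names are made up for illustration.

```python
def fit_ols(xs, ys):
    """Least-squares line y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def sse(vals):
    """Sum of squared errors around the mean."""
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals)


def fit_stump(xs, ys):
    """Depth-1 regression tree: the single split minimizing total squared error.

    Each leaf predicts the mean of its training targets.
    Returns (threshold, left_mean, right_mean).
    """
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (cost, threshold, sum(left) / len(left), sum(right) / len(right))
    _, threshold, left_mean, right_mean = best
    return threshold, left_mean, right_mean


def predict_ols(model, x):
    intercept, slope = model
    return intercept + slope * x


def predict_stump(model, x):
    threshold, left_mean, right_mean = model
    return left_mean if x <= threshold else right_mean


xs = [1, 2, 3, 4, 5, 6]
clean = [1, 2, 3, 4, 5, 6]
with_outlier = [1, 2, 3, 4, 5, 60]  # hypothetical outlier in the target

for label, ys in [("clean", clean), ("outlier", with_outlier)]:
    ols, stump = fit_ols(xs, ys), fit_stump(xs, ys)
    for x in (2, 6):
        print(label, f"x={x}", round(predict_ols(ols, x), 3), predict_stump(stump, x))
```

In this toy case the stump spends its one split isolating the outlier in its own leaf, which shields predictions at small x but reproduces the outlier at x = 6. With deeper trees or a minimum leaf size, the outlier instead gets averaged into a leaf with other points, so the effect depends heavily on how the tree is grown.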

Lead Scoring with Customer Data Using glmnet in R

Lead Scoring. Lead scoring is an important task for businesses. It means identifying which individuals in a population are likely to convert (purchase) if marketed to, assigning each of them a probability of converting, or estimating how much value each individual may have as a customer. Properly using data to support this task can greatly benefit your…
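The excerpt describes lead scoring partly as assigning each individual a probability of converting. The post itself uses glmnet in R, which fits penalized (elastic-net) generalized linear models by coordinate descent. As a stand-in, the sketch below swaps in a plain L2-penalized (ridge) logistic regression fit by batch gradient descent in Python, then ranks leads by predicted probability. The features, data, and function names are all hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lam=0.1, lr=0.1, epochs=2000):
    """Ridge logistic regression via batch gradient descent.

    Minimizes mean log loss + (lam / 2) * ||w||^2; the intercept is not
    penalized. X is a list of feature rows, y a list of 0/1 conversion labels.
    Returns (intercept, weights).
    """
    n, d = len(X), len(X[0])
    b, w = 0.0, [0.0] * d
    for _ in range(epochs):
        grad_b, grad_w = 0.0, [0.0] * d
        for row, label in zip(X, y):
            p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, row)))
            err = p - label  # gradient of log loss w.r.t. the linear score
            grad_b += err
            for j in range(d):
                grad_w[j] += err * row[j]
        b -= lr * grad_b / n
        w = [wj - lr * (gj / n + lam * wj) for wj, gj in zip(w, grad_w)]
    return b, w


def score_leads(model, leads):
    """Return (lead_id, conversion probability) pairs, highest first."""
    b, w = model
    scored = [
        (lead_id, sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))))
        for lead_id, row in leads
    ]
    return sorted(scored, key=lambda t: t[1], reverse=True)


# Hypothetical features per past customer: [site visits (scaled 0-1), opened email (0/1)]
X = [[0.1, 0], [0.2, 0], [0.4, 1], [0.8, 1], [0.9, 1], [0.3, 0]]
y = [0, 0, 1, 1, 1, 0]  # 1 = converted
model = fit_logistic(X, y)
ranking = score_leads(model, [("lead_a", [0.85, 1]), ("lead_b", [0.15, 0])])
print(ranking)
```

The penalty strength `lam` plays a role loosely analogous to glmnet's `lambda`, and in practice it would be chosen by cross-validation rather than fixed.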