Deep Neural Networks: CS231n & Transfer Learning

Deep learning (machine learning with deep neural networks) has become a powerful technique for dealing with very high-dimensional data, e.g. images, audio, and video. As one example, automated image classification has become highly effective. This task consists of assigning an image to one of a fixed set of classes. Look at the results of… Continue reading Deep Neural Networks: CS231n & Transfer Learning
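The classification task described above can be sketched with a toy linear (softmax) classifier on synthetic "images" — a minimal illustration only; the data shapes, class count, and training settings here are illustrative assumptions, not the setup from CS231n or the post:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "images": 16-dimensional feature vectors, 3 well-separated classes.
# Each class has its own mean pattern plus unit Gaussian noise (an assumption
# for illustration, not a real image dataset).
n_per_class, n_features, n_classes = 100, 16, 3
means = rng.normal(0, 2, size=(n_classes, n_features))
X = np.vstack([means[c] + rng.normal(size=(n_per_class, n_features))
               for c in range(n_classes)])
y = np.repeat(np.arange(n_classes), n_per_class)

# Linear softmax classifier trained by batch gradient descent on cross-entropy.
W = np.zeros((n_features, n_classes))
b = np.zeros(n_classes)
for _ in range(200):
    logits = X @ W + b
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    onehot = np.eye(n_classes)[y]
    grad = probs - onehot                        # d(cross-entropy)/d(logits)
    W -= 0.01 * X.T @ grad / len(X)
    b -= 0.01 * grad.mean(axis=0)

# "Classifying an image" = picking the class with the highest score.
accuracy = ((X @ W + b).argmax(axis=1) == y).mean()
print(f"training accuracy: {accuracy:.2f}")
```

A deep network replaces the single linear map `X @ W + b` with a stack of learned nonlinear layers, but the output step — scores over a fixed set of classes — is the same.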


Collecting Neighborhood Data

Since deploying my latest web application, a Los Angeles Neighborhood Ranker, I've wanted to explain the process of gathering the data. The first step was to decide which neighborhoods to use, what they're called, and how they're defined geographically. The first stop I made was the LA Times Mapping LA project. It has a mapping feature… Continue reading Collecting Neighborhood Data

Information Criteria & Cross Validation

A common problem with predictive models is overfitting. This happens when a model is too flexible and picks up trends that are due to quirks of a specific sample rather than general, underlying trends in the data-generating process. The result is a model that doesn't predict very well. A model can be made less flexible by regularization--i.e.… Continue reading Information Criteria & Cross Validation
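The regularization-plus-cross-validation idea can be sketched with ridge regression: penalize large coefficients, then pick the penalty strength by k-fold cross-validation. Everything here (data, penalty grid, fold count) is an illustrative assumption, not the post's actual analysis:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: y depends on only 3 of 20 predictors; with just 60
# observations, an unregularized fit tends to overfit.
n, p = 60, 20
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.0, 0.5]
y = X @ beta_true + rng.normal(size=n)

def ridge_fit(X, y, lam):
    # Closed-form ridge solution: (X'X + lam*I)^(-1) X'y
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def cv_mse(lam, k=5):
    # k-fold cross-validation: fit on k-1 folds, score on the held-out fold.
    idx = np.arange(n)
    folds = np.array_split(idx, k)
    errs = []
    for fold in folds:
        train = np.setdiff1d(idx, fold)
        beta = ridge_fit(X[train], y[train], lam)
        errs.append(np.mean((y[fold] - X[fold] @ beta) ** 2))
    return np.mean(errs)

lambdas = [0.0, 0.1, 1.0, 10.0, 100.0]
scores = {lam: cv_mse(lam) for lam in lambdas}
best = min(scores, key=scores.get)
print(f"best lambda by 5-fold CV: {best}")
```

Held-out error is the point: a very large penalty underfits, zero penalty overfits, and cross-validation finds a middle ground without touching a separate test set.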

A brief aside: Video Productions

On a slightly different note, I'd like to showcase my award-winning video productions here. These two won a total of $2500 in a competition sponsored by the UCLA Department of Chemical and Biomolecular Engineering: These were produced for the UCLA California NanoSystems Institute. These two bike safety videos each won $200 in a contest sponsored… Continue reading A brief aside: Video Productions

Statistical Diversions: Collinear Predictors

Collinear predictors present a challenge in model construction and interpretation. This topic is covered in an intuitive and engaging style in Chapter 5 of the excellent book Statistical Rethinking, by Richard McElreath. Collinearity refers to predictor variables that correlate strongly with one another. To see why this can be a problem, consider the following example. Let's… Continue reading Statistical Diversions: Collinear Predictors
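The classic symptom of collinearity can be shown in a few lines: when two predictors are nearly copies of each other, a regression cannot pin down their individual coefficients, even though their combined effect is well estimated. This is a synthetic demonstration (simulated predictors and a bootstrap, not McElreath's example):

```python
import numpy as np

rng = np.random.default_rng(2)

# Two strongly collinear predictors: x2 is x1 plus a little noise.
n = 1000
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)
y = 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])

# Refit on many bootstrap resamples: the individual coefficients swing
# wildly from sample to sample, but their sum stays near the true total
# effect (2.0).
coefs = []
for _ in range(200):
    idx = rng.integers(0, n, size=n)
    beta, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
    coefs.append(beta[1:])
coefs = np.array(coefs)

sd_individual = coefs.std(axis=0)
sd_sum = coefs.sum(axis=1).std()
print(f"corr(x1, x2) = {np.corrcoef(x1, x2)[0, 1]:.3f}")
print(f"sd of each coefficient: {sd_individual.round(2)}")
print(f"sd of their sum:        {sd_sum:.3f}")
```

The individual coefficient estimates are an order of magnitude noisier than their sum: the data only identify what `x1` and `x2` predict jointly, which is exactly the interpretation trap the post discusses.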

Distributed Computing: Motivation & File Systems

Why Distributed Computing? Distributed computing is the practice of computing with a cluster of machines rather than a single one, and it comes with many advantages. Work With Very Big Data: work for today's data scientists very often involves data sets that are too large to feasibly handle on one local computer. For simple installations of programming… Continue reading Distributed Computing: Motivation & File Systems