My latest web application is an interactive, personalized, map-based ranking of neighborhoods in Los Angeles. I've spent the last few weeks gathering data points for each of 155 neighborhoods in the Los Angeles area. That process could (and soon will) be the topic of a post of its own. For now, I wanted to explain the ranking…
Author: Plot Reports
SQL: Combining tables without JOIN
SQL is a versatile language: there are many ways to get the job done. One way to compare information from multiple tables is the JOIN command, but there are sometimes alternatives. Let's say I've got two tables, office and person. Suppose I need to know which people have printers. I could do a JOIN ON office_room = office_num.…
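The question of "which people have printers" can be answered with or without the JOIN keyword. The sketch below is not the post's own code: it uses an in-memory SQLite database through Python's built-in `sqlite3` module, and the schema (a `has_printer` flag on `office`, a `name` column on `person`, and the sample rows) is hypothetical, since only the columns `office_room` and `office_num` are named above. It compares an explicit JOIN, an implicit join in the WHERE clause, and an IN subquery.

```python
import sqlite3

# Hypothetical schema and rows: only office_room and office_num come from the text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE office (office_num INTEGER PRIMARY KEY, has_printer INTEGER);
CREATE TABLE person (name TEXT, office_room INTEGER);
INSERT INTO office VALUES (101, 1), (102, 0), (103, 1);
INSERT INTO person VALUES ('Ana', 101), ('Ben', 102), ('Cho', 103), ('Dev', 101);
""")

# 1. Explicit JOIN ... ON
join_sql = """
SELECT p.name FROM person AS p
JOIN office AS o ON p.office_room = o.office_num
WHERE o.has_printer = 1
ORDER BY p.name
"""

# 2. Implicit join: list both tables in FROM and match rows in WHERE
implicit_sql = """
SELECT p.name FROM person AS p, office AS o
WHERE p.office_room = o.office_num AND o.has_printer = 1
ORDER BY p.name
"""

# 3. Subquery with IN: only person appears in the outer FROM
subquery_sql = """
SELECT name FROM person
WHERE office_room IN (SELECT office_num FROM office WHERE has_printer = 1)
ORDER BY name
"""

results = {
    label: [row[0] for row in conn.execute(sql)]
    for label, sql in [("join", join_sql), ("implicit", implicit_sql), ("subquery", subquery_sql)]
}
print(results)
```

All three queries return the same names here. One design difference worth noting: the IN form never duplicates a person even if several matching office rows existed, whereas the two join forms would.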
Information Criteria & Cross Validation
A common problem with predictive models is overfitting. This happens when the model is too responsive and picks up trends that are due to quirks of a specific sample rather than general, underlying trends in the data-generating process. The result is a model that predicts new data poorly. A model can be made less responsive by regularization, i.e.…
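To make overfitting concrete, here is a minimal sketch (my own illustration, not code from the post) that fits polynomials of increasing degree to simulated data and scores each fit three ways: in-sample MSE, 5-fold cross-validated MSE, and a Gaussian AIC. The data (a straight line plus noise), the degrees, and the helper names `kfold_cv_mse` and `gaussian_aic` are all assumptions for illustration. The AIC is computed only up to an additive constant, as n·log(RSS/n) + 2k.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (an assumption): a straight line plus Gaussian noise.
n = 30
x = rng.uniform(-1, 1, n)
y = 1.5 * x + rng.normal(0, 0.5, n)

def mse(coefs, x, y):
    """Mean squared error of a polynomial (np.polyfit coefficients) on (x, y)."""
    return float(np.mean((np.polyval(coefs, x) - y) ** 2))

def kfold_cv_mse(x, y, degree, k=5, seed=1):
    """Average held-out MSE over k folds: fit on k-1 folds, score on the remaining one."""
    idx = np.random.default_rng(seed).permutation(len(x))
    folds = np.array_split(idx, k)
    errors = []
    for test in folds:
        train = np.setdiff1d(idx, test)
        coefs = np.polyfit(x[train], y[train], degree)
        errors.append(mse(coefs, x[test], y[test]))
    return float(np.mean(errors))

def gaussian_aic(x, y, degree):
    """AIC up to an additive constant for a least-squares fit with Gaussian noise."""
    coefs = np.polyfit(x, y, degree)
    rss = float(np.sum((np.polyval(coefs, x) - y) ** 2))
    k = degree + 2  # polynomial coefficients plus the noise variance
    return len(x) * np.log(rss / len(x)) + 2 * k

degrees = list(range(1, 9))
train_mse = [mse(np.polyfit(x, y, d), x, y) for d in degrees]
cv_mse = [kfold_cv_mse(x, y, d) for d in degrees]
aic = [gaussian_aic(x, y, d) for d in degrees]

for d, tr, cv, a in zip(degrees, train_mse, cv_mse, aic):
    print(f"degree {d}: train MSE {tr:.3f}  5-fold CV MSE {cv:.3f}  AIC {a:.1f}")
```

In-sample MSE can only go down as the degree grows, because each model contains the previous one, so it always favors the most flexible fit. Cross-validation and AIC both estimate out-of-sample performance instead: one by actually holding data out, the other by penalizing the parameter count.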
A brief aside: Video Productions
On a slightly different note, I'd like to showcase my award-winning video productions here. These two won a total of $2500 in a competition sponsored by the UCLA Department of Chemical and Biomolecular Engineering:
https://youtu.be/f7p6amCP1yI
https://youtu.be/_Y5I557Rmx8
These were produced for the UCLA California NanoSystems Institute:
https://youtu.be/QACRvcLV2gM
https://youtu.be/M-cw6pK_IEY
These two bike safety videos each won $200 in a contest sponsored…
Jupyter Notebook Explorations
I've recently been getting to know the Jupyter Notebook environment. It's a very convenient development environment for many languages and is best known as the successor to the IPython Notebook. Jupyter works by starting a server on your own computer to execute Python (or other-language) code. The notebook is rendered in a web browser and…
Statistical Diversions: Collinear Predictors
Collinear predictors present a challenge in model construction and interpretation. This topic is covered in an intuitive and engaging style in Chapter 5 of the excellent book Statistical Rethinking, by Richard McElreath. Predictors are collinear when they correlate strongly with one another. To see why this can be a problem, consider the following example. Let's…
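The post's own example is cut off above, so the following is a separate, hypothetical simulation rather than a reconstruction of it. The outcome depends only on `x1`, and `x2` is `x1` plus a tiny amount of noise. Refitting ordinary least squares on many simulated samples shows the characteristic symptom: each coefficient on its own swings wildly from sample to sample, while their sum (the effect the data can actually identify) stays stable. The sample size, noise level, and true slope of 2.0 are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_fit(n=100, noise=0.01):
    """Fit y ~ 1 + x1 + x2 by least squares, where x2 is nearly a copy of x1."""
    x1 = rng.normal(size=n)
    x2 = x1 + rng.normal(0, noise, n)      # x2 is almost perfectly collinear with x1
    y = 2.0 * x1 + rng.normal(0, 1.0, n)   # hypothetical true model uses x1 only
    X = np.column_stack([np.ones(n), x1, x2])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1], beta[2]

# Repeat the experiment to see how the estimates vary across samples.
fits = np.array([simulate_fit() for _ in range(200)])
b1, b2 = fits[:, 0], fits[:, 1]

print("sd of b1:", b1.std(), " sd of b2:", b2.std())
print("sd of b1 + b2:", (b1 + b2).std(), " mean of b1 + b2:", (b1 + b2).mean())
print("correlation of b1 and b2:", np.corrcoef(b1, b2)[0, 1])
```

Because the model only constrains `b1 * x1 + b2 * x2`, and the two predictors are nearly identical, the data pin down `b1 + b2` but say very little about how that total is split between them. This shows up as a strong negative correlation between the two estimates.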
The py_emra package
I'm pleased to announce the py_emra package for Ensemble Modeling Robustness Analysis (EMRA). EMRA is a tool for modeling metabolic systems, which I have worked on throughout my PhD. The crux of EMRA is to use dynamic stability as a criterion for model selection and simulation in metabolic systems (e.g., bacterial metabolism). This depends on using…
Distributed Computing: Motivation & File Systems
Why Distributed Computing?
Distributed computing is the practice of computing using a cluster of computers rather than a single one. It comes with many advantages.
Work With Very Big Data
Today's data scientists very often work with data sets that are too large to handle feasibly on a single local computer. For simple installations of programming…
Some of my favorite links
This is a post which I will update periodically with some of my favorite Quora (& Stack Exchange) answers:
Gamma & Poisson Distributions
Why is Spark faster than MapReduce?
Importance of independence in statistical modeling
Importance of underlying data distributions
Biggest lessons learned in corporate
Free open datasets
Most common data science mistakes
SVM and Kernels…
Bayesian Updating, Part 2
In Part 1, I explored how Bayesian updating operates when there are two discrete possibilities. I now investigate how Bayesian updating operates with one continuous parameter. This example is from Chapter 2 of Statistical Rethinking by Richard McElreath. The premise can be paraphrased as follows: Suppose you have a globe representing the Earth. It is…
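Updating a single continuous parameter can be approximated on a grid. The sketch below assumes the parameter is p, the proportion of the globe's surface covered by water, and that each toss lands on water (W) or land (L). The toss sequence is an assumed example, not data from the post. It starts from a flat prior, updates one toss at a time (each posterior becomes the next prior), and then checks that the result matches a single batch update using the binomial likelihood. The binomial coefficient is omitted because it cancels when normalizing.

```python
import numpy as np

# Grid of candidate values for p, the proportion of the globe covered by water.
grid = np.linspace(0, 1, 101)

# Assumed toss sequence for illustration: W = water, L = land.
tosses = "WLWWWLWLW"

def update(prior, toss):
    """One Bayesian update on the grid: prior times likelihood, renormalized."""
    likelihood = grid if toss == "W" else 1 - grid
    unnormalized = prior * likelihood
    return unnormalized / unnormalized.sum()

# Sequential updating from a flat prior.
posterior = np.full_like(grid, 1 / len(grid))
for toss in tosses:
    posterior = update(posterior, toss)

# Batch updating: binomial likelihood for all tosses at once, same flat prior.
w, n = tosses.count("W"), len(tosses)
batch = grid**w * (1 - grid) ** (n - w)
batch /= batch.sum()

print("posterior mode:", grid[np.argmax(posterior)])
print("sequential matches batch:", np.allclose(posterior, batch))
```

With 6 waters in 9 tosses and a flat prior, the posterior peaks near p = 6/9. The order in which the tosses arrive does not change the final posterior, which is why the sequential and batch versions agree.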