Tracing 2020’s deep trends, in stocks
Virus apocalypse, riots, presidential election. These are the trends of the year. Looking deeper, there are other, subtler, but probably longer-lasting trends. We can track the emergence of these trends by following stock prices in different sectors. Decentralized rules: Companies with products that enable physical decentralization have prospered. Most notably, Amazon (AMZN), Netflix (NFLX)…
Why and how attention works in neural nets
What does it mean for a machine to "pay attention"? Is it possible for dead transistors to do something that seems so alive? Possibly. ML researchers have been working on neural architectures featuring so-called "attention" mechanisms. They are proving useful in different applications of ML, especially tasks with sequence-style inputs or outputs like text. Attention…
AutoML: cutting through the hype
Are the machines getting too smart? Winning at chess, Go, StarCraft, and now, maybe, winning at machine learning. I set out to find out what AutoML means and what it can actually deliver. AutoML is reportedly at the peak of its hype cycle, so what is it, and does it work? Is it amazing? Or…
Query optimization with SQLite and Redis caching
We can't be waiting around for five seconds for a database to respond. When we are young, we believe the SQL execution engine will handle everything and make sure we get our results on time. Sadly, this is not the case, at least in my experience. I'll walk through the evolution of one SQL query,…
Making sense of confusion matrices: ROC vs PR (precision-recall) and other metrics
Confusion matrices are simple in principle: four numbers that describe the performance of a binary classifier. Yet a full understanding of the behavior and meaning of a confusion matrix is far more subtle than it would appear based on the existence of just four numbers. This is a confusion matrix. Looks simple, right? Accuracy The…
Latent semantic indexing: practical example of document categorization with topic modeling
Let's talk about latent semantic indexing. We all know machine learning is magical. One of the most magical things I've seen machine learning do is document categorization. What is latent semantic indexing? Let's say you have 10,000 text documents and you'd like to know what categories or topics exist. Machine learning can define the categories…
Explainable machine learning: the next frontier
It's time for machine learning to grow up. Machine learning has proved valuable in many areas of technology and business. ML has a seat at the table, but for ML to truly mature, we need to know we can trust it. For ML to integrate more fully and contribute everything it can, people need to…
The problems with the simulation argument
We're all living in a simulation. Or so the argument goes. Lots of people, not least Elon Musk, make this argument or something like it: Technology increases over time. Eventually, the technology to create a simulation as complex as our world will exist. Therefore life-like simulations are inevitable and more numerous than base realities. Ergo,…
The limits of data
Beware someone who says data can solve any problem. They're naive, malicious, or won't be around to deal with the aftermath. Of course, data can accomplish a lot. As a data scientist, my job depends on it. But if we're being honest, the limitations are becoming more and more obvious. As a community, data scientists have…
5 Pitfalls & Solutions for Today’s Data Leaders
The data revolution is in full swing: data science practitioners are prospering and creating huge value for their companies. Despite this success, data science leaders across the industry are facing stress and difficult conditions. Data leaders must avoid these pitfalls to succeed and generate value for their organizations. Limited Experience Pitfall #1: Because the field…