Distributed Computing: Motivation & File Systems

Why Distributed Computing?

Distributed computing is the practice of computing on a cluster of computers rather than on a single machine. It comes with several advantages.

Work With Very Big Data

Today's data scientists very often work with data sets that are too large to handle feasibly on one local computer. For simple installations of programming…

Some of my favorite links

This post, which I will update periodically, collects some of my favorite Quora (and Stack Exchange) answers:

Gamma & Poisson Distributions
Why is Spark faster than MapReduce?
Importance of independence in statistical modeling
Importance of underlying data distributions
Biggest lessons learned in corporate
Free open datasets
Most common data science mistakes
SVM and Kernels…