Category: Computer Science
Published: Sunday, 24 May 2015

This is a graduate level course being offered at Indiana University as part of the foundational requirements for MS/PhD. I took this course on Spring 2015 with Prof. Predrag Radivojac. In this course we used the following book: Pattern Recognition and Machine Learning (Information Science and Statistics). In my opinion this is not a very friendly book and rather hard to read. Other alternatives are: By Tom M. Mitchell Machine Learning (McGraw-Hill International Editions Computer Science Series) (1st First Edition) [Paperback] and The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition (Springer Series in Statistics).

The following is a compilation of my solutions to assignments given in class.

- Assignment 1 - [Solution]
- Assignment 2 - [Solution]
- Assignment 3 - [Solution]
- Assignment 4 - [Solution]

Machine Learning is an amazing field of study. In a nutshell, Machine Learning is the science (and art!) of discovering patterns out of data (see also Elements of AI). This is particularly relevant today as far as Big Data goes. In my opinion, Machine Learning, as a discipline, is a generalization of well-established techniques from Statistics and Mathematics (specially Optimization and Probability) for data sets so big and complex that even today's computer technology might not be enough to give you an analytic (closed form solution). What do you do instead? Usually come up with an iterative procedure, i.e., an algorithm that approximates a solution. Amazingly, the main tool for many of the techniques used in the field is an old optimization algorithm called Newton-Raphson. The novelty lies in the way in which this algorithm is used.

But what are we trying to approximate? Very succinctly, given a data set we hypothesize a data generating mechanism i.e., the way in which the data was generated, and then try to estimate the parameters that best fit the data. This procedure gives you a model from which you can make predictions and simulate new data, among other things you can do. As an example, suppose we have a data set and we hypothesize that it was generated from a Poisson distribution. In this case we would need only to estimate the parameter lambda since given this parameter we can completely describe the distribution. There are many ways to perform this estimation some of which are: Maximum Likelihood (ML), Maximum a posteriori, etc. Once we have an estimate for lambda, we have completely described the data set by the Poisson distribution together with the estimated lambda. In other words, our model is a Poisson distribution with the estimated lambda. And there you go, you have discovered the pattern in your data! (more on this on Statistical Learning Theory).

Of course the example above is overly simplistic. There are many more variants and issues to be considered when building a model. The point, however, is that the estimation of parameters is usually so complicated, with many more parameters to estimate in highly-dimensional spaces, that analytic-closed form solutions are impossible and thus, approximate algorithms are necessary. In a way one can say that Machine Learning was already possible before computers, since the Mathematical theory was already in place, but it was not feasible as the amount of computations necessary to build a single model were beyond the capabilities of what could be reliably and consistently performed by humans. In other words, it was not economical to optimize models until computation became relatively cheap. Machine Learning is one of those fields that were ahead of its time.