KDD Cup 2015: The story of how I built hundreds of predictive models… and got so close, yet so far from 1st place!

Data Until I Die!

This year's KDD Cup challenge was to use data on student enrollment in MOOCs to predict who would drop out and who would stay.

The short story is that, using H2O and a lot of my free time, I trained several hundred GBM models in search of the one that eventually got me an AUC score of 0.88127 on the KDD Cup leaderboard and, at the time of this writing, landed me in 120th place. My score is only 2.6% away from 1st place, but there are 119 people above me!
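The leaderboard metric here is AUC, which can be computed directly from predicted scores via the rank-sum (Mann-Whitney) formulation. A minimal pure-Python sketch; the labels and scores below are made-up illustrative data, not H2O output or competition data:

```python
# AUC = probability that a randomly chosen positive outscores a
# randomly chosen negative (rank-sum formulation, ties averaged).
def auc(labels, scores):
    # Rank all scores ascending, averaging ranks within tied blocks.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    pos = [r for r, y in zip(ranks, labels) if y == 1]
    n_pos, n_neg = len(pos), len(labels) - len(pos)
    return (sum(pos) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

print(auc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]))  # perfect ranking -> 1.0
```

A model's leaderboard score is exactly this quantity computed over the held-out students, which is why AUC rewards ranking dropouts above non-dropouts rather than calibrated probabilities.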

Here are the main characters of this story:

MySQL Workbench

It started with my obsessive drive to find an analytics project to work on. I happened upon the KDD Cup 2015 competition and decided to give it a go. It had the characteristics of a project that I wanted to get…

View original post 1,843 more words

Reviews: Machine learning for music discovery; ICML 2016 workshop, New York [1/2: invited talks]

Keunwoo Choi

This year I again attended this amazing workshop, Machine Learning for Music Discovery (ML4MD), at the International Conference on Machine Learning (ICML) 2016. ICML is one of the biggest conferences in machine learning (ICML is THE summer ML conference and NIPS THE winter one, or the opposite when one is held somewhere in the southern hemisphere). The whole conference was massive! The committee expected ~3,000 attendees. The ML4MD workshop was also rather packed, though its room was not as large as the deep learning workshop's.

The program included one keynote (1 hr), five invited talks, eight accepted talks, and happy hours.

Project Magenta: Can Music Generation be Solved with Music Recommendation?

By Douglas Eck, Google Brain

Douglas Eck gave this presentation about a rather hot topic: Project Magenta by Google Brain. If you haven’t heard of it, please check out the website. The current examples are not the state-of-the-art-as-Google-does-all-the-time kind, but it is a project that has only just started…

View original post 1,359 more words

Singular Value Decomposition Part 2: Theorem, Proof, Algorithm

Math ∩ Programming

I’m just going to jump right into the definitions and rigor, so if you haven’t read the previous post motivating the singular value decomposition, go back and do that first. This post will be theorem, proof, algorithm, data. The data set we test on is a collection of a thousand CNN news stories. All of the data, code, and examples used in this post are in a GitHub repository, as usual.

We start with the best-approximating $latex k$-dimensional linear subspace.

Definition: Let $latex X = \{ x_1, \dots, x_m \}$ be a set of $latex m$ points in $latex \mathbb{R}^n$. The best approximating $latex k$-dimensional linear subspace of $latex X$ is the $latex k$-dimensional linear subspace $latex V \subset \mathbb{R}^n$ which minimizes the sum of the squared distances from the points in $latex X$ to $latex V$.
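To see the definition in action for $latex k = 1$: the best-approximating line through the origin is spanned by the top right singular vector of the data matrix. Here is a pure-Python sketch of my own (made-up points, power iteration on X^T X standing in for a full SVD routine):

```python
# Made-up 2D points lying roughly on the line y = 2x.
points = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (-1.0, -2.0)]

# M = X^T X (2x2); its top eigenvector spans the best-fit 1-d subspace.
M = [[sum(p[a] * p[b] for p in points) for b in range(2)] for a in range(2)]

# Power iteration: repeatedly apply M and renormalize to find that eigenvector.
v = [1.0, 0.0]
for _ in range(100):
    w = [M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1]]
    norm = (w[0] ** 2 + w[1] ** 2) ** 0.5
    v = [w[0] / norm, w[1] / norm]

# Sum of squared distances from each point to the subspace spanned by v:
# ||x||^2 - (x . v)^2, by the Pythagorean theorem.
residual = sum(p[0] ** 2 + p[1] ** 2 - (p[0] * v[0] + p[1] * v[1]) ** 2
               for p in points)
print(v, residual)  # direction close to (1, 2)/sqrt(5), small residual
```

The residual printed at the end is exactly the quantity the definition asks us to minimize, and the recovered direction is close to the slope-2 line the points were drawn near.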

Let me clarify what I mean by minimizing the sum of squared distances. First we’ll start with the…

View original post 5,066 more words

Dynamic Time Warping averaging of time series allows faster and more accurate classification

the morning paper

Dynamic Time Warping averaging of time series allows faster and more accurate classification – Petitjean et al. ICDM 2014

For most time series classification problems, the Nearest Neighbour algorithm (find the nearest neighbour to the query within the training set) is the technique of choice. Moreover, when measuring the distance to neighbours, we want to use Dynamic Time Warping (DTW) as the distance measure.

Despite the optimisations we looked at earlier this week to improve the efficiency of DTW, the authors argue there remain situations where DTW (or even Euclidean distance) has severe tractability issues, particularly on resource-constrained devices such as wearable computers and embedded medical devices. There is a great example of this in the evaluation section, where recent work has shown that it is possible to classify flying insects with high accuracy from the audio of their flight (buzzing bees, and so on).


View original post 1,398 more words

Singular Value Decomposition Part 1: Perspectives on Linear Algebra

Math ∩ Programming

The singular value decomposition (SVD) of a matrix is a fundamental tool in computer science, data analysis, and statistics. It’s used for all kinds of applications, from regression to prediction to finding approximate solutions to optimization problems. In this series of two posts we’ll motivate, define, compute, and use the singular value decomposition to analyze some data.

I want to spend the first post entirely on motivation and background. As part of this, I think we need a little reminder about how linear algebra equivocates between linear subspaces and matrices. I say “I think” because what I’m going to say seems rarely spelled out in detail. Indeed, I was confused myself when I first started to read about linear algebra applied to algorithms, machine learning, and data science, despite having a solid understanding of linear algebra from a mathematical perspective. The concern is the connection between matrices as transformations and matrices as a “convenient” way to organize data.
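That two-sided view can be made concrete in a few lines. A tiny sketch of my own (illustrative numbers, not from the post): the same nested list read once as a pile of data rows and once as a linear map.

```python
# The same 2x2 array, viewed two ways.
A = [[2, 0],
     [0, 3]]

# View 1: the rows of A are data -- two observations, two features each.
observations = A

# View 2: A is a linear map R^2 -> R^2, applied by matrix-vector multiplication.
def apply(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

print(apply(A, [1, 1]))  # scales the 1st coordinate by 2, the 2nd by 3 -> [2, 3]
```

Nothing in the array itself tells you which reading is intended; the SVD is interesting precisely because it is meaningful under both.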


View original post 2,673 more words


Normal Deviate


Today we have a guest post by my good friend Rob Tibshirani. Rob has a list of nine great statistics papers. (He is too modest to include his own papers.) Have a look and let us know what papers you would add to the list. And what machine learning papers would you add? Enjoy.

9 Great Statistics papers published after 1970
Rob Tibshirani

I was thinking about influential and awe-inspiring papers in Statistics and thought it would be fun to make a list. This list will show my bias in favor of practical work, and by its omissions, my ignorance of many important subfields of Statistics. I hope that others will express their own opinions.

  1. Regression models and life tables (with discussion) (Cox 1972). A beautiful and elegant solution to an extremely important practical problem. It has had an enormous impact on medical science. David Cox…

View original post 501 more words