Misleading modelling: overfitting, cross-validation, and the bias-variance trade-off

Cambridge Coding Academy


In this post you will get to grips with what is perhaps the most essential concept in machine learning: the bias-variance trade-off. The main idea here is that you want to create models that are as good at prediction as possible but that are still applicable to new data (i.e. they are generalizable). The danger is that you can easily create models that overfit to the local noise in your specific dataset, which isn’t too helpful and leads to poor generalizability since the noise is random and therefore different in each dataset. Essentially, you want to create models that capture only the useful components of a dataset. On the other hand, models that generalize very well but are too inflexible to generate good predictions are the other extreme you want to avoid (this is called underfitting).

We discuss and demonstrate these concepts using the k-nearest neighbors algorithm…
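
The trade-off described above can be sketched in plain Python. This is a minimal illustration, not the original post's code: the noisy sine data, the noise level, the values of `k`, and the helper names `knn_predict`, `mse` and `cv_mse` are all assumptions made for the example.

```python
import math
import random

def knn_predict(train, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def mse(train, test, k):
    """Mean squared error of k-NN predictions on the test points."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in test) / len(test)

def cv_mse(data, k, folds=5):
    """Cross-validation: every point is held out exactly once."""
    scores = []
    for f in range(folds):
        test = data[f::folds]
        train = [p for i, p in enumerate(data) if i % folds != f]
        scores.append(mse(train, test, k))
    return sum(scores) / folds

# Hypothetical dataset: a sine curve plus Gaussian noise.
random.seed(0)
data = []
for _ in range(100):
    x = random.uniform(0, 2 * math.pi)
    data.append((x, math.sin(x) + random.gauss(0, 0.3)))

for k in (1, 10, 80):
    print(f"k={k:2d}  training MSE={mse(data, data, k):.3f}  "
          f"cross-validated MSE={cv_mse(data, k):.3f}")
```

With k = 1 every training point is its own nearest neighbor, so the training error is exactly zero even though the cross-validated error is not: the model has memorized the noise. A very large k averages over most of the dataset and tends to underfit instead.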


Sunday Bayes: A brief history of Bayesian stats

The Etz-Files

The following discussion is essentially nontechnical; the aim is only to convey a little introductory “feel” for our outlook, purpose, and terminology, and to alert newcomers to common pitfalls of understanding.

Sometimes, in our perplexity, it has seemed to us that there are two basically different kinds of mentality in statistics; those who see the point of Bayesian inference at once, and need no explanation; and those who never see it, however much explanation is given.

–Jaynes, 1986 (pdf link)

Sunday Bayes

The format of this series is short and simple: Every week I will give a quick summary of a paper while sharing a few excerpts that I like. If you’ve read our eight easy steps paper and you’d like to follow along on this extension, I think a pace of one paper per week is a perfect way to ease yourself into the Bayesian sphere.

Bayesian Methods: General Background

The necessity of…


Learning Python For Data Science

Python Tips

For those of you who wish to begin learning Python for Data Science, here is a list of various resources that will get you up and running. Included are things like online tutorials and short interactive courses, MOOCs, newsletters, books, useful tools and more. We decided to put this together so that you can begin learning Data Science with Python right off the bat, without having to spend hours surfing the web in search of resources. Please note that while we believe the list is comprehensive, it is by no means exhaustive. We have probably missed a couple of nice resources, so feel free to mention them in the comments if you are so inclined. 🙂


Nonlocality and statistical inference

Low Dimensional Topology

It doesn’t have much to do with topology, but I’d like to share with you something Avishy Carmi and I have been thinking about quite a bit lately, namely the EPR paradox and the meaning of (non)locality. Avishy and I have a preprint about this:

A.Y. Carmi and D.M., Statistics Limits Nonlocality, arXiv:1507.07514.

It offers a statistical explanation for a physics inequality called Tsirelson’s bound (perhaps to be compared with a known explanation called Information Causality). Behind the fold I will sketch how it works.
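
For readers who have not met Tsirelson’s bound, here is a small numerical sketch of what it limits. It is not taken from the preprint and does not reproduce its statistical argument. It assumes the textbook CHSH combination of four correlations and the textbook singlet-state correlation E(a, b) = -cos(a - b); the function names and the measurement angles are choices made for this illustration. Local hidden-variable models keep the CHSH value within 2 in absolute value, while quantum mechanics allows up to 2√2, which is Tsirelson’s bound.

```python
import math

def singlet_correlation(a, b):
    """Textbook quantum prediction for spin measurements along angles a and b."""
    return -math.cos(a - b)

def chsh(a, a2, b, b2, corr=singlet_correlation):
    """CHSH combination S = E(a,b) - E(a,b') + E(a',b) + E(a',b')."""
    return corr(a, b) - corr(a, b2) + corr(a2, b) + corr(a2, b2)

tsirelson = 2 * math.sqrt(2)

# A standard choice of angles that reaches the bound.
s = chsh(0.0, math.pi / 2, math.pi / 4, 3 * math.pi / 4)

# Coarse grid search over the remaining angles (the first is fixed at 0,
# which loses nothing because only angle differences matter).
steps = 24
grid = [2 * math.pi * i / steps for i in range(steps)]
best = max(abs(chsh(0.0, a2, b, b2)) for a2 in grid for b in grid for b2 in grid)

print(f"|S| at the standard angles: {abs(s):.6f}")
print(f"largest |S| on the grid:    {best:.6f}")
print(f"Tsirelson's bound 2*sqrt(2): {tsirelson:.6f}")
```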


Inferring Causal Impact Using Bayesian Structural Time-Series Models

the morning paper

Inferring Causal Impact Using Bayesian Structural Time-Series Models – Brodersen et al. (Google) 2015

Today’s paper comes from ‘The Annals of Applied Statistics’ – not one of my usual sources (!), and a good indication that I’m likely to be well out of my depth again for parts of it. Nevertheless, it addresses a really interesting and relevant question for companies of all shapes and sizes: how do I know whether a given marketing activity ‘worked’ or not? Or more precisely, how do I accurately measure the impact that a marketing activity had, so that I can figure out whether or not it had a good ROI and hence guide future actions? This also includes things like assessing the impact of the rollout of a new feature, so you can treat the word ‘marketing’ fairly broadly in this context.

…we focus on measuring the impact of a discrete marketing event…


Spectral Clustering: A quick overview

A lot of my ideas about Machine Learning come from Quantum Mechanical Perturbation Theory. To provide some context, we need to step back and understand that the familiar techniques of Machine Learning, like Spectral Clustering, are, in fact, nearly identical to Quantum Mechanical Spectroscopy. As usual, this will take several blog posts.

Here, I give a brief tutorial on the theory of Spectral Clustering and how it is implemented in open source packages.

At some point I will rewrite some of this and add a review of this recent paper: Robust and Scalable Graph-Based Semisupervised Learning.

Spectral (or Subspace) Clustering

The goal of spectral clustering is to cluster data that is connected but not necessarily compact or clustered within convex boundaries.

The basic idea:

  1. project your data into $latex R^{n} $
  2. define an affinity matrix $latex A $, using a Gaussian kernel $latex K $ or, say, just an adjacency matrix…
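
The list above is cut off in this excerpt, so the sketch below only follows its first two steps (points in the plane, Gaussian affinity matrix). Everything after that is an assumption: it uses one common textbook recipe (normalize the affinity matrix, take its second eigenvector, split on the sign), which may differ from the steps in the original post. To stay dependency-free it finds the eigenvector by power iteration instead of a library eigensolver, and it only handles two clusters. The two-ring dataset, `sigma`, the iteration count and the function names are choices made for this illustration.

```python
import math

def gaussian_affinity(points, sigma):
    """A[i][j] = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    return [[math.exp(-math.dist(p, q) ** 2 / (2 * sigma ** 2)) for q in points]
            for p in points]

def spectral_bipartition(points, sigma=0.5, iters=500):
    """Split points into two groups by the sign of the second eigenvector."""
    n = len(points)
    A = gaussian_affinity(points, sigma)
    deg = [sum(row) for row in A]
    # Normalized affinity M = D^{-1/2} A D^{-1/2}; its eigenvalues lie in [0, 1].
    M = [[A[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # The leading eigenvector is known in closed form: D^{1/2} 1 (eigenvalue 1).
    norm = math.sqrt(sum(deg))
    v1 = [math.sqrt(d) / norm for d in deg]
    # Power iteration with v1 projected out converges to the second eigenvector.
    v = [float(i) for i in range(n)]  # arbitrary start vector
    for _ in range(iters):
        dot = sum(a * b for a, b in zip(v, v1))
        v = [a - dot * b for a, b in zip(v, v1)]
        v = [sum(m * x for m, x in zip(row, v)) for row in M]
        scale = math.sqrt(sum(x * x for x in v))
        v = [x / scale for x in v]
    return [0 if x < 0 else 1 for x in v]

# Two concentric rings: each is connected, but the outer one is not compact.
inner = [(math.cos(2 * math.pi * i / 10), math.sin(2 * math.pi * i / 10))
         for i in range(10)]
outer = [(3 * math.cos(2 * math.pi * i / 30), 3 * math.sin(2 * math.pi * i / 30))
         for i in range(30)]
points = inner + outer

labels = spectral_bipartition(points)
print("inner ring labels:", labels[:10])
print("outer ring labels:", labels[10:])
```

Each ring forms a chain of strongly connected neighbors, while the affinity between the rings is tiny, so the sign split follows the rings rather than any convex region of the plane.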
