An Engineer’s Guide to GEMM

Pete Warden's blog


I’ve spent most of the last couple of years worrying about the GEMM function because it’s the heart of deep learning calculations. The trouble is, I’m not very good at matrix math! I struggled through the courses I took in high school and college, barely getting a passing grade, confident that I’d never need anything so esoteric ever again. Right out of college I started working on 3D graphics engines where matrices were everywhere, and they’ve been an essential tool in my work ever since.

I managed to develop decent intuitions for 3D transformations and their 4×4 matrix representations, but not having a solid grounding in the theory left me very prone to mistakes when I moved on to more general calculations. I screwed up the first version of all my diagrams in a previous blog post, and most recently had to make a breaking API change to the…



Introduction To Monte Carlo Methods


I’m going to keep this tutorial light on math, because the goal is just to give a general understanding.

Monte Carlo methods originated in the Manhattan Project, as a way to simulate the distance neutrons would travel through various materials [1]. Ideas using sampling had been around for a little while, but they took off in the making of the atomic bomb, and have since appeared in lots of other fields.

The idea is this—generate some random samples for some random variable of interest, then use these samples to compute values you’re interested in.

I know, super broad. The truth is that because Monte Carlo has a ton of different applications, it’s hard to give a more precise definition. It’s used in product design, to simulate variability in parts manufacturing. It’s used in biology, to simulate the average distance of a bird from its nest, which would allow a scientist…
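To make the "draw samples, then compute a value from them" idea concrete, here is a minimal sketch built on the bird-and-nest example. Everything specific in it is an assumption made for illustration, not something from the original post: the bird is modeled as a 2D random walk with fixed-length steps in uniformly random directions, and the function name and parameters are hypothetical.

```python
import math
import random


def average_distance_from_nest(n_walks=2000, n_steps=100, step_length=1.0, seed=0):
    """Monte Carlo estimate of a bird's mean distance from its nest.

    Assumed model (illustrative only): the bird starts at the nest and takes
    n_steps moves of fixed length, each in a uniformly random direction.
    """
    rng = random.Random(seed)  # seeded so the run is reproducible
    total = 0.0
    for _ in range(n_walks):
        x = y = 0.0  # the nest sits at the origin
        for _ in range(n_steps):
            angle = rng.uniform(0.0, 2.0 * math.pi)
            x += step_length * math.cos(angle)
            y += step_length * math.sin(angle)
        total += math.hypot(x, y)  # final distance for this simulated walk
    # The sample mean over all walks is the Monte Carlo estimate.
    return total / n_walks


estimate = average_distance_from_nest()
print(f"estimated mean distance after 100 steps: {estimate:.2f}")
```

Raising `n_walks` tightens the estimate, since the error of a sample mean shrinks roughly with the square root of the number of samples.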


What No One Tells You About Real-Time Machine Learning

Full Stack ML

This year, I heard and read a lot about real-time machine learning. People usually present this appealing business scenario when discussing credit card fraud detection systems. They say that they can continuously update a credit card fraud detection model in real time (see “What is Apache Spark?”, “…real-time use cases…” and “Real time machine learning”). It sounds fantastic, but it does not look realistic to me. One important detail is missing in this scenario – a continuous flow of transactional data is not what model retraining needs. Instead, you need a continuous flow of labeled (or pre-marked as Fraud/Not-Fraud) transactional data.

[Figure: Machine learning process]

Creating labeled data is probably the slowest and most expensive step in most machine learning systems. Machine learning algorithms learn to detect fraudulent transactions from people, whose judgments reach the algorithm in the form of labeled data. Let’s see how this works in a fraud detection scenario.

1. Creating model



Entropy and rare events

What's new

Let $X$ and $Y$ be two random variables taking values in the same (discrete) range $R$, and let $E$ be some subset of $R$, which we think of as the set of “bad” outcomes for either $X$ or $Y$. If $X$ and $Y$ have the same probability distribution, then clearly

$$\mathbf{P}( X \in E ) = \mathbf{P}( Y \in E ).$$

In particular, if it is rare for $Y$ to lie in $E$, then it is also rare for $X$ to lie in $E$.

If $X$ and $Y$ do not have exactly the same probability distribution, but their probability distributions are close to each other in some sense, then we can expect to have an approximate version of the above statement. For instance, from the definition of the total variation distance…
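The excerpt breaks off before stating the definition, so the following is only a sketch under an assumption: it uses the standard total variation distance for discrete distributions, half the sum of absolute differences of the probabilities, which equals the largest gap in probability the two distributions can assign to any single event. The two distributions and the "bad" set below are made-up illustrative values, and the helper names are hypothetical.

```python
def total_variation(p, q):
    """Total variation distance between two discrete distributions.

    p and q are dicts mapping outcomes to probabilities. Uses the standard
    formula (assumed here, not quoted from the post): half the L1 distance.
    """
    outcomes = set(p) | set(q)
    return 0.5 * sum(abs(p.get(r, 0.0) - q.get(r, 0.0)) for r in outcomes)


def prob_in(dist, event):
    """Probability that a draw from dist lands in the set `event`."""
    return sum(prob for outcome, prob in dist.items() if outcome in event)


# Illustrative distributions for X and Y on the range R = {a, b, c, d}.
p_x = {"a": 0.50, "b": 0.30, "c": 0.15, "d": 0.05}
p_y = {"a": 0.48, "b": 0.30, "c": 0.20, "d": 0.02}
bad = {"d"}  # the set E of "bad" outcomes

gap = abs(prob_in(p_x, bad) - prob_in(p_y, bad))
d_tv = total_variation(p_x, p_y)

# The approximate version of the equality: the two probabilities of E
# can differ by at most the total variation distance.
assert gap <= d_tv + 1e-12
print(f"|P(X in E) - P(Y in E)| = {gap:.3f}, d_TV = {d_tv:.3f}")
```

So if $Y$ rarely lies in $E$ and the two distributions are close in total variation, $X$ rarely lies in $E$ as well.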


275A, Notes 0: Foundations of probability theory

What's new

Starting this week, I will be teaching an introductory graduate course (Math 275A) on probability theory here at UCLA. While I find myself using probabilistic methods routinely nowadays in my research (for instance, the probabilistic concept of Shannon entropy played a crucial role in my recent paper on the Chowla and Elliott conjectures, and random multiplicative functions similarly played a central role in the paper on the Erdős discrepancy problem), this will actually be the first time I will be teaching a course on probability itself (although I did give a course on random matrix theory some years ago that presumed familiarity with graduate-level probability theory). As such, I will be relying primarily on an existing textbook, in this case Durrett’s Probability: Theory and Examples. I still need to prepare lecture notes, though, and so I thought I would continue my practice of putting my notes online…
