Python on Batı Şengül

Introduction To Hamiltonian Monte Carlo

batisengul@gmail.com — Fri, 02 Jul 2021 00:00:00 +0000

One thing that has been occupying my head in the past couple of weeks has been HMC and how it can be used in a large-data/large-model context. HMC stands for Hamiltonian Monte Carlo and it’s the de facto Bayesian method for sampling due to its speed. Before getting into big datasets and big models, let me motivate this problem a little bit. If you are new to Bayesian modelling, I have a little primer on the topic so I will assume for the most part you are familiar with basic Bayesianism.

Wasserstein variational autoencoders

batisengul@gmail.com — Wed, 20 Nov 2019 00:00:00 +0000

Variational auto-encoders (VAEs) are a latent space model. The idea is that you have some latent variable $z \in \mathbb{R}^{k}$ which describes your original variables $x\in\mathbb{R}^d$, living in a higher-dimensional space, through a latent model $p(x|z)$. Let’s assume that this distribution is given by a neural network with some parameters $\theta$ so that $$ x | z, \theta \sim N(g_\theta(z), 1). $$ Of course, in reality we don’t know $(z, \theta)$, and we would like to infer these from the data.
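To make the generative side of this model concrete, here is a minimal sketch of drawing one $x$ given $z$ and $\theta$. Everything specific in it is an assumption for illustration: the decoder `g_theta` is a toy single affine layer followed by tanh rather than a real neural network, $z$ is drawn from a standard normal, and the dimensions `k` and `d` are arbitrary.

```python
import math
import random


def g_theta(z, weights, bias):
    """Toy decoder (hypothetical): one affine layer + tanh, mapping R^k to R^d."""
    return [
        math.tanh(sum(w * zj for w, zj in zip(row, z)) + b)
        for row, b in zip(weights, bias)
    ]


def sample_x(z, weights, bias, rng):
    """Draw x | z, theta ~ N(g_theta(z), 1), independently per coordinate."""
    mean = g_theta(z, weights, bias)
    return [rng.gauss(m, 1.0) for m in mean]


rng = random.Random(0)
k, d = 2, 5  # latent and observed dimensions (illustrative values only)

# theta = (weights, bias); random here, whereas in practice it would be learned
weights = [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(d)]
bias = [0.0] * d

z = [rng.gauss(0.0, 1.0) for _ in range(k)]  # assumption: standard normal z
x = sample_x(z, weights, bias, rng)
print(len(x))  # prints 5, one value per observed dimension
```

The inference problem described above runs in the opposite direction: given observed $x$, recover plausible $z$ and $\theta$.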

Introduction To Tensorflow Estimator

batisengul@gmail.com — Tue, 19 Nov 2019 00:00:00 +0000

In this post I am going to introduce the tf.estimator library. So first of all, what is this library trying to do? When writing TensorFlow code, there are a lot of repeated operations that we need to do: read the data in batches; process the data, e.g. convert images to floats; run a for loop and take a few gradient descent steps; save model weights to disk; output metrics to TensorBoard. The Keras library makes this quite a bit easier, but there are times when you might need to use plain old TensorFlow (it gets quite hacky to implement some multiple-output models and GANs in Keras).

FizzBuzz with neural networks and NALU

batisengul@gmail.com — Sat, 16 Mar 2019 00:00:00 +0000

FizzBuzz is one of the most well-known interview questions. The problem is stated as: Write the numbers from 1 to n, replacing any number divisible by 3 with Fizz, divisible by 5 with Buzz and divisible by both 3 and 5 with FizzBuzz. The example program should output 1, 2, Fizz, 4, Buzz, Fizz, 7, 8, Fizz, Buzz. A while back, there was this infamous post where the author claimed to solve this problem in an interview using TensorFlow.
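For reference, the conventional (non-neural) solution is only a few lines. This is a minimal sketch that assumes the count starts at 1, as in the example output; the function name `fizzbuzz` is just an illustrative choice.

```python
def fizzbuzz(n):
    """Return the FizzBuzz sequence for 1..n as a list of strings."""
    out = []
    for i in range(1, n + 1):
        if i % 15 == 0:  # divisible by both 3 and 5
            out.append("FizzBuzz")
        elif i % 3 == 0:
            out.append("Fizz")
        elif i % 5 == 0:
            out.append("Buzz")
        else:
            out.append(str(i))
    return out


print(", ".join(fizzbuzz(10)))
# prints: 1, 2, Fizz, 4, Buzz, Fizz, 7, 8, Fizz, Buzz
```

The divisible-by-15 check has to come first, otherwise multiples of 15 would be caught by the Fizz branch.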

From Zero To State Of The Art NLP Part II - Transformers

batisengul@gmail.com — Tue, 12 Mar 2019 00:00:00 +0000

Welcome to part two of the two-part crash course in state-of-the-art natural language processing. This part is going to go through the transformer architecture from Attention Is All You Need. If you haven’t done so already, read the first part, which introduces attention mechanisms. This post is all about transformers and assumes you know attention mechanisms.

From Zero To State Of The Art NLP Part I - Attention mechanism

batisengul@gmail.com — Wed, 06 Mar 2019 00:00:00 +0000

There have been some really amazing advances in natural language processing (NLP) in the last couple of years. Back in November 2018, Google released BERT (https://ai.googleblog.com/2018/11/open-sourcing-bert-state-of-art-pre.html), which is based on the attention mechanisms from Attention Is All You Need. In this two-part series, I will assume you know nothing about NLP but have some understanding of neural networks, and take you from start to finish in understanding how transformers work. Natural language processing is the art of using machine learning techniques in processing language.

Variational inference, the art of approximate sampling

batisengul@gmail.com — Sat, 21 Jul 2018 00:00:00 +0000

In the spirit of looking at fancy word topics, this post is about variational inference. Suppose you granted me one superpower and I chose the ability to sample from any distribution in a fast and accurate way. Now, you might think that’s a crappy superpower, but that basically enables me to fit any model I want and provide uncertainty estimates. To make the problem concrete, let’s suppose you are trying to sample from a distribution $p(x)$.