# **Active Learning Image Data**

**Yarin Gal, Riashat Islam, Zoubin Ghahramani. “Active Learning Image Data”. Bayesian Deep Learning Workshop, NIPS 2016**

**MPhil Thesis, 2016. University of Cambridge, St John’s College**

**“Active Learning Image Data with Bayesian Convolutional Neural Networks”**

**MPhil Machine Learning, Speech and Language Technology**

**Supervisors : Zoubin Ghahramani and Yarin Gal**

**Cambridge Machine Learning Group**

Recent advances in deep learning have achieved tremendous success in learning from massive amounts of data. The challenge now is to learn data-efficiently: to learn in complex domains without requiring deep learning models to be trained on large quantities of data. We present a novel framework for achieving data-efficiency in deep learning through active learning. We develop active learning algorithms for collecting the most informative data for training deep neural network models. Our work is the first to propose active learning algorithms for image data using convolutional neural networks.

Recent work showed that a Bayesian approach to CNNs can offer robustness of these models to overfitting on small datasets. By using dropout in neural networks as a Bayesian approximation, we can represent model uncertainty from CNNs for image classification tasks. Our proposed Bayesian active learning algorithms use the predictive distribution from the output of a CNN to query the most useful datapoints, achieving image classification with the least amount of training data. We present information-theoretic acquisition functions which incorporate model uncertainty, namely Dropout Bayesian Active Learning by Disagreement (Dropout BALD), along with several new acquisition functions, and demonstrate their performance on image classification tasks using MNIST as an example. Since our approach is the first to propose active learning in a deep learning framework, we compare our results with several semi-supervised learning methods, which also focus on learning data-efficiently using the fewest training samples.

Our results demonstrate that we can perform active learning in a deep learning framework, which has not previously been done for image data. This allows us to achieve data-efficiency in training. We illustrate that, compared to standard semi-supervised learning methods, we achieve a considerable improvement in classification accuracy. Using our Bayesian active learning framework with only 1000 training samples, we achieve a classification error rate of 0.57% on MNIST, while the state of the art in a purely supervised setting with significantly more training data is 0.3%.
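The Dropout BALD acquisition described above can be sketched as follows. Here `mc_probs` stands for the softmax outputs of T stochastic (dropout) forward passes through the CNN over a pool of candidate points; the name and array layout are illustrative, not the thesis code:

```python
import numpy as np

def bald_scores(mc_probs):
    """BALD acquisition scores from Monte Carlo dropout samples.

    mc_probs: array of shape (T, N, C) -- T stochastic forward passes,
    N pool points, C classes.
    Returns shape (N,): predictive entropy minus expected entropy,
    i.e. the mutual information between predictions and model parameters.
    """
    eps = 1e-12
    mean_probs = mc_probs.mean(axis=0)  # (N, C) predictive distribution
    # Entropy of the mean prediction (total uncertainty)
    pred_entropy = -np.sum(mean_probs * np.log(mean_probs + eps), axis=1)
    # Mean entropy of the individual stochastic predictions (aleatoric part)
    exp_entropy = -np.mean(
        np.sum(mc_probs * np.log(mc_probs + eps), axis=2), axis=0)
    return pred_entropy - exp_entropy

# Query the pool point with the highest BALD score:
# query_idx = np.argmax(bald_scores(mc_probs))
```

Points where the stochastic passes disagree get high scores; if every dropout sample gives the same prediction, the score is zero.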

# Improved State Estimation and Control for Resilient Spacecraft Executive

**Summer Research Project, 2015
**

**California Institute of Technology (Caltech)**

**Supervisors: Richard Murray and Catherine McGhan**

Control and Dynamical Systems Group, Caltech

**Project in collaboration with NASA Jet Propulsion Laboratory (JPL)**

Autonomous navigation systems that can operate safely without collisions in an unknown environment have significant applications in space exploration missions for scientific discovery. In this project, we further develop the initial implementation of a resilient risk-aware software architecture that can handle uncertainty in hazardous environments. We propose a model using ROS for real-time dynamic map updates using Hokuyo lidar sensors, and use the generated map for autonomous navigation of the rover. We also integrate obstacle avoidance algorithms into the RSE software architecture. Our results demonstrate dynamic map updates for navigation and planning, both in simulation and on a practical framework using the TurtleBot robot.

# Deterministic Policy Gradient Methods in Reinforcement Learning

**Undergraduate Thesis. “Improving Convergence of Deterministic Policy Gradient Methods in Reinforcement Learning”, University College London, 2015**

**University College London (UCL)
**

**Supervisors : John Shawe-Taylor and Guy Lever**

**UCL Centre for Computational Statistics and Machine Learning**

**and Gatsby Computational Neuroscience Unit**

*Additional Supervisor : David Silver, Google DeepMind*

Policy gradient methods in reinforcement learning directly optimize a parameterized control policy with respect to the long-term cumulative reward. Stochastic policy gradients for solving large problems with continuous state and action spaces have been extensively studied in reward-related learning problems. Recent work also showed the existence of deterministic policy gradients, which have a model-free form that follows the gradient of the action-value function. In this project, we implement both stochastic and deterministic policy gradient methods and evaluate their learning performance on benchmark reinforcement learning problems, namely the Toy, Grid World, Mountain Car and Inverted Pendulum MDPs. We analyze the convergence of both policy gradient methods under different optimization techniques. Furthermore, we introduce local goal distractors in the state space to compare the convergence of stochastic and deterministic policy gradient methods to local optima. We also consider off-policy methods that learn a deterministic target policy from trajectories generated by an arbitrary stochastic behavior policy to ensure adequate exploration.

Our results show that when a deterministic behavior policy is used with deterministic policy gradients, convergence to local optima is more prominent even for simple MDPs. Our work analyzes convergence to local optima under different levels of stochasticity in the policy. We demonstrate that with a sufficiently exploratory stochastic policy generating trajectories for the deterministic gradient, we can avoid convergence of deterministic policy gradient algorithms to local optima. Our results also show that with multiple local optima in the policy space, deterministic policy gradients can outperform their stochastic counterparts on standard reinforcement learning benchmark tasks.
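The deterministic policy gradient update discussed above follows the chain rule through the critic and the policy. Here is a minimal sketch on a 1-D continuous action space with a linear policy and a toy quadratic critic; all function forms and names are simplifications for illustration, not the thesis implementation:

```python
import numpy as np

np.random.seed(0)

def phi(s):
    return np.array([1.0, s])          # simple state features

def mu(theta, s):
    return theta @ phi(s)              # deterministic policy mu(s)

def dq_da(s, a):
    # Gradient of a toy critic Q(s, a) = -(a - s)**2 w.r.t. the action;
    # for this critic the optimal action is a = s.
    return -2.0 * (a - s)

def dpg_step(theta, s, alpha=0.1):
    # DPG update: theta += alpha * grad_a Q(s, mu(s)) * grad_theta mu(s),
    # with grad_theta mu(s) = phi(s) for a linear policy.
    a = mu(theta, s)
    return theta + alpha * dq_da(s, a) * phi(s)

theta = np.zeros(2)
for _ in range(200):
    s = np.random.uniform(-1.0, 1.0)
    theta = dpg_step(theta, s)
# The learned policy should approximate a = s, i.e. theta close to [0, 1].
```

Note that no action-space noise enters the update itself; exploration must come from the behavior policy that generates the visited states, which is exactly where the off-policy stochastic behavior policy above matters.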

# **TIMIT Speech Recognition with DNN-HMMs using the HTK Toolkit**

**Project as part of a coursework in Speech Recognition course**

**MPhil Course : Speech Recognition
**

**Department of Engineering, University of Cambridge**

**MPhil Machine Learning, Speech and Language Technology**

**Supervisor : Mark Gales, Phil Woodland**

**Cambridge Speech Research Group**

We consider DNN-HMM based hybrid speech recognition systems and evaluate why they surpass GMM-HMM based approaches. We show how context-dependent DNN-HMMs can achieve unprecedented gains in speech recognition tasks. Our analysis includes DNN-based acoustic modelling, evaluating how the DNN's input feature vectors can be concatenated from several consecutive speech frames, giving a relatively long context window. Compared to previous approaches based on Gaussian Mixture Models (GMMs), which were used to model the state observation probabilities and trained with the maximum likelihood criterion or discriminative training, we consider how context-dependent deep neural networks can replace the GMMs to compute the state observation probabilities for all tied states in the DNN-HMM hybrid model.
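The frame concatenation described above can be sketched as follows; the window of ±5 frames and the feature dimensionality are illustrative choices, not the project's exact configuration:

```python
import numpy as np

def splice_frames(feats, context=5):
    """Concatenate each frame with its +/- `context` neighbours.

    feats: (T, D) array of per-frame acoustic features (e.g. filterbanks).
    Returns (T, (2*context + 1) * D): the spliced input fed to the DNN,
    with edge frames padded by repeating the first/last frame.
    """
    T, D = feats.shape
    padded = np.concatenate([np.repeat(feats[:1], context, axis=0),
                             feats,
                             np.repeat(feats[-1:], context, axis=0)], axis=0)
    return np.stack([padded[t:t + 2 * context + 1].reshape(-1)
                     for t in range(T)])

# Example: 100 frames of 40-dim features -> a (100, 440) DNN input matrix
x = splice_frames(np.random.randn(100, 40), context=5)
```

This long-window input is what lets the DNN exploit acoustic context that a frame-by-frame GMM observation model cannot see.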

# Speech Recognition with GMM-HMMs using the HTK Toolkit

**Project as part of a coursework in Speech Recognition course**

**MPhil Course : Speech Recognition
**

**Department of Engineering, University of Cambridge**

**MPhil Machine Learning, Speech and Language Technology**

**Supervisor : Mark Gales, Phil Woodland**

**Cambridge Speech Research Group**

Automatic speech recognition is the process of mapping a speech signal to the corresponding sequence of words that it represents. Most general-purpose speech recognition systems are based on Hidden Markov Models (HMMs). In this work, we build a phone-level speech recognition system based on GMM-HMMs, using the Cambridge HTK toolkit. First we give an overview of the basic concepts of training and decoding, along with a brief overview of Gaussian Mixture Model (GMM) based HMMs and the language model. We then consider acoustic modelling with monophone models, followed by context-dependent triphone models. We evaluate the performance of the system when using a Bigram Language Model instead of a Unigram model. Finally, based on modelling accuracy, the number of estimated parameters and generalisation, we comment on the overall best system for phone recognition given the available training corpus.
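As a sketch of the GMM state observation model described above, the likelihood of a frame under one HMM state's diagonal-covariance mixture can be computed as follows (a minimal illustration of the maths, not HTK's implementation):

```python
import numpy as np

def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood log p(x | state) for one HMM state modelled by a
    diagonal-covariance Gaussian mixture.

    x: (D,) observation (e.g. one acoustic feature frame)
    weights: (M,) mixture weights summing to 1
    means, variances: (M, D) per-component parameters
    """
    D = x.shape[0]
    # Per-component log Gaussian densities with diagonal covariance
    log_norm = -0.5 * (D * np.log(2 * np.pi)
                       + np.sum(np.log(variances), axis=1))
    log_exp = -0.5 * np.sum((x - means) ** 2 / variances, axis=1)
    log_comp = np.log(weights) + log_norm + log_exp
    # Stable log-sum-exp over the mixture components
    m = log_comp.max()
    return m + np.log(np.sum(np.exp(log_comp - m)))
```

These per-state log-likelihoods are the observation scores combined with transition and language model scores during Viterbi decoding.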

# Unifying review of Variational Inference and Learning in Deep Directed Latent Variable Models

**Project as part of a coursework in Advanced Machine Learning course**

**MPhil Course : Advanced Machine Learning
**

**Department of Engineering, University of Cambridge**

**MPhil Machine Learning, Speech and Language Technology**

**Supervisor : Zoubin Ghahramani, Richard Turner**

**Cambridge Machine Learning Group**

Deep generative latent variable models have recently gained significant interest due to the development of efficient and scalable variational inference methods. However, until recently, directed latent variable models were difficult to train on large datasets. In this work, we provide an overview of several recent methods that have been developed for performing stochastic variational inference on large datasets. We provide an overview of theoretical and experimental results as a benchmark comparison of variational inference methods based on feedforward neural networks. All of the approaches in comparison consider gradient-based maximization of a variational lower bound, and all are applied to the MNIST dataset as a benchmark. We implemented our own version of the Auto-Encoding Variational Bayes algorithm (Kingma & Welling, 2013) and compared it with the other approaches. Our experimental results show the significance of the different variance reduction techniques for the gradient estimator of the lower bound of the log likelihood.

In our work we provide a unifying review of efficient inference and learning algorithms in directed generative models with many layers of hidden variables. Directed latent variable models are difficult to train on large datasets because exact inference in such models is intractable, and we compare the different approaches to performing inference in deep directed graphical models.
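The low-variance gradient estimator at the core of Auto-Encoding Variational Bayes can be sketched as a single-datapoint ELBO estimate with the reparameterisation trick; the function shapes are illustrative, assuming a standard normal prior and a diagonal Gaussian encoder:

```python
import numpy as np

rng = np.random.default_rng(0)

def elbo_estimate(enc_mu, enc_logvar, decode_logprob, n_samples=1):
    """Single-datapoint ELBO estimated with the reparameterisation trick.

    enc_mu, enc_logvar: (K,) encoder outputs q(z|x) = N(mu, diag(exp(logvar)))
    decode_logprob: function z -> log p(x | z)
    Assumes a standard normal prior p(z) = N(0, I).
    """
    std = np.exp(0.5 * enc_logvar)
    # Analytic KL( q(z|x) || N(0, I) ) for diagonal Gaussians
    kl = 0.5 * np.sum(np.exp(enc_logvar) + enc_mu ** 2 - 1.0 - enc_logvar)
    # Monte Carlo estimate of E_q[ log p(x|z) ] via z = mu + std * eps,
    # which keeps the sample differentiable w.r.t. the encoder parameters
    recon = 0.0
    for _ in range(n_samples):
        eps = rng.standard_normal(enc_mu.shape)
        z = enc_mu + std * eps
        recon += decode_logprob(z)
    return recon / n_samples - kl
```

Because the noise enters only through `eps`, gradients of this estimate flow through `enc_mu` and `enc_logvar`, which is what gives the estimator its lower variance compared to score-function (REINFORCE-style) estimators.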

# Keyword Spotting

**Project as part of a coursework in Speech and Language Processing course**

**MPhil Course : Speech and Language Processing Applications
**

**Department of Engineering, University of Cambridge**

**MPhil Machine Learning, Speech and Language Technology**

**Supervisor : Mark Gales, Phil Woodland**

**Cambridge Speech Research Group**

In this work, we consider the task of keyword spotting (KWS) for the Swahili language, as part of the Babel project. Keyword spotting is the task of automatically detecting query keywords or phrases in a stream of audio or in written text. We focus on KWS technology under low-resource language conditions, inspired by IARPA's Babel program. The effectiveness of KWS technology rests on a trade-off between processing resource requirements and detection accuracy.

We consider a basic KWS system for querying words and phrases from a 1-best list. We then consider improvements in detection accuracy based on multiple system combination. For each section, we provide an outline of the different tasks, include a brief description of the code, provide experimental results and include a discussion. Finally, we include a summary of the work and a discussion of overall keyword spotting system performance. Code snippets for 1-best output KWS and system combination are also given in the appendix.
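Searching a 1-best transcript for a multi-word query, as described above, can be sketched as follows; the time-mark tuple format and the 0.5 s adjacency threshold are illustrative assumptions, not the exact Babel scoring configuration:

```python
def find_keyword(hyp, query, max_gap=0.5):
    """Find occurrences of a multi-word query in a 1-best hypothesis.

    hyp: list of (word, start_time, duration) tuples, in time order
    query: list of query words
    Consecutive query words must be adjacent in the hypothesis and
    separated by at most `max_gap` seconds.
    Returns a list of (start_time, end_time) hits.
    """
    hits = []
    for i in range(len(hyp) - len(query) + 1):
        ok = True
        for j, qw in enumerate(query):
            word, start, dur = hyp[i + j]
            if word != qw:
                ok = False
                break
            if j > 0:
                _, prev_start, prev_dur = hyp[i + j - 1]
                # Reject matches whose words are not contiguous in time
                if start - (prev_start + prev_dur) > max_gap:
                    ok = False
                    break
        if ok:
            last = hyp[i + len(query) - 1]
            hits.append((hyp[i][1], last[1] + last[2]))
    return hits
```

Each hit would then be assigned a score and thresholded; system combination merges such hit lists from several recognizers before scoring.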

# Statistical Machine Translation

**Project as part of a coursework in Statistical Machine Translation course**

**MPhil Course : Statistical Machine Translation
**

**Department of Engineering, University of Cambridge**

**MPhil Machine Learning, Speech and Language Technology**

**Supervisor : Bill Byrne**

**Cambridge Statistical Machine Translation Group**

We consider statistical machine translation using Weighted Finite State Transducers (WFSTs). We consider the task of hierarchical phrase-based translation, which treats subphrases within phrases as the basic unit of translation. The task is based on a hierarchical phrase-based decoder implemented using weighted finite-state transducers (HiFST). The HiFST decoder generates translation lattices, and decoding is based on a synchronous context-free grammar. Finally, we consider a decoder built with WFSTs using the OpenFst libraries.

# Reinforcement Learning for Spoken Dialogue Systems

**Project as part of a coursework in Spoken Dialogue Systems course**

**MPhil Course : Statistical Spoken Dialogue Systems
**

**Department of Engineering, University of Cambridge**

**MPhil Machine Learning, Speech and Language Technology**

**Supervisor : Steve Young, Milica Gasic, Eddy Pei-Hao Su**

**Cambridge Spoken Dialogue Systems Group**

Reinforcement learning based spoken dialogue systems have been shown to be beneficial by allowing the dialogue policy to be optimised to plan and act under the uncertainty introduced by noisy speech recognition and semantic decoding. However, the objective of the optimisation, the reward, is usually hard to measure in SDS because the dialogue task is unknown on-line and user satisfaction ratings are unreliable. In addition, these reinforcement signals are often given only as an overall evaluation when the dialogue ends, and are thus sparse, requiring the learning system to explore more before it finds (sub-)optimal solutions. The key focus of this research is to develop a solution to these problems. Recurrent neural networks and Gaussian processes are utilised to model the sequential dialogue data, providing an adequate reward function for realistic dialogues and speeding up dialogue policy learning.

We first give a brief introduction to the two key ideas in a Partially Observable Markov Decision Process (POMDP) based dialogue system: belief state tracking and reinforcement learning. Belief state tracking in SDS updates the posterior probability of the belief state after each user input using Bayesian inference. The system maintains a belief distribution over all states so as to capture all possible dialogue paths. The state encodes three distinct types of information: the user's goal, the most recent utterance, and the dialogue history.
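The Bayesian belief update described above can be sketched for a small discrete state space as follows (a minimal illustration; real SDS trackers factorise the state rather than enumerating it):

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """One POMDP belief update:
    b'(s') is proportional to O[o, s', a] * sum_s T[s', s, a] * b(s).

    b: (S,) current belief over states
    a: index of the action taken
    o: index of the observation received
    T: (S, S, A) transition probabilities T[s', s, a] = p(s' | s, a)
    O: (Obs, S, A) observation probabilities O[o, s', a] = p(o | s', a)
    """
    predicted = T[:, :, a] @ b          # predict: p(s' | b, a)
    unnorm = O[o, :, a] * predicted     # correct: fold in the observation
    return unnorm / unnorm.sum()        # renormalise to a distribution
```

Each user turn triggers one such predict-correct step, so the belief always sums to one over the possible dialogue states.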

In a spoken dialogue system, a dialogue manager selects actions based on observations of events (what the user says) and beliefs inferred using belief state tracking. In reinforcement learning based SDS, the goal is to learn a policy that optimizes the action selection process (what the system says back to the user) so as to maximize a reward function. Rewards are associated with each state-action pair and give an objective measure of the performance of the system, both off-line using dialogue corpora and on-line through interaction with real or simulated users. Decision-making and policy optimisation take place in the summary space, a compressed space in which states and actions are simplified. The summary space is a subspace of the master space, so a dictionary of state-action pairs is maintained to map between the two. The policy can therefore be considered a function of the summary belief state and actions, instead of the original belief state and actions.

# Reinforcement Learning

**Project as part of a coursework in Reinforcement Learning course**

**MPhil Course : Reinforcement Learning
**

**Department of Engineering, University of Cambridge**

**MPhil Machine Learning, Speech and Language Technology**

**Supervisor : Zoubin Ghahramani, Matt Hoffman (Google DeepMind)**

**Cambridge Machine Learning Group**

In this work, we consider analysis of the basic reinforcement learning algorithms on three different models (MDPs). We consider value and policy iteration and discuss the proof of convergence for these algorithms. We then consider the difference in performance between SARSA and Q-learning on benchmark RL tasks such as the cliffworld model. In each section, we first include a brief explanation of the algorithm, present the code used for implementation, and then include a discussion of results.

We examined the convergence and performance of policy and value iteration algorithms, and discuss how the convergence of these algorithms to the optimal value function depends on the number of iterations used. Furthermore, we evaluated the difference between on-policy SARSA and off-policy Q-learning algorithms and showed how the performance of these algorithms depends on the exploration-exploitation tradeoff and on learning rates. Our experiments were evaluated on benchmark reinforcement learning tasks, namely smallworld, gridworld and cliffworld MDPs, to analyze the performance of our algorithms.
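The two update rules compared above differ only in the bootstrapped target, which is exactly what produces their different behaviour on cliffworld. A minimal sketch with illustrative names and hyperparameters:

```python
import numpy as np

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    """On-policy: bootstrap from the action actually taken next."""
    td_target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (td_target - Q[s, a])

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Off-policy: bootstrap from the greedy action, whatever the
    behaviour policy actually does next."""
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
```

Under an epsilon-greedy behaviour policy, SARSA's target includes the occasional exploratory step off the cliff, so it learns the safer path, while Q-learning's greedy target ignores exploration and learns the shorter path along the cliff edge.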

# Playing Blackjack (Easy 21) with Reinforcement Learning

**Project as part of a coursework in Reinforcement Learning course**

**Undergraduate Course : Advanced Topics in Machine Learning
**

**Department of Computer Science, University College London (UCL)**

**Supervisor : David Silver (Google DeepMind)**

**Course offered by the Gatsby Computational Neuroscience Unit, UCL**

# Cost-Sensitive Decision Tree of Classifiers

**Summer Research Project, 2015
**

**Department of Computer Science**

Johns Hopkins University

**Supervisor : Suchi Saria**

**JHU Machine Learning Group**

# Gaussian Processes

**Project as part of a coursework in Advanced Machine Learning course**

**MPhil Course : Advanced Topics in Machine Learning
**

**Department of Engineering, University of Cambridge**

**MPhil Machine Learning, Speech and Language Technology**

**Supervisor : Zoubin Ghahramani, Matt Hoffman (Google DeepMind)**

**Cambridge Machine Learning Group**

# Weighted Automata using OpenFST

**MPhil Course : Weighted Automata
**

**Department of Engineering, University of Cambridge**

**MPhil Machine Learning, Speech and Language Technology**

**Supervisor : Bill Byrne**

**Cambridge Machine Translation Group**

# Statistical Speech Synthesis

**MPhil Course : Speech Synthesis
**

**Department of Engineering, University of Cambridge**

**MPhil Machine Learning, Speech and Language Technology**

**Supervisor : Mark Gales**

**Cambridge Speech Research Group**

This work presents an overview of the basic techniques in statistical parametric speech synthesis based on HMM systems. We present experimental demonstrations and discuss the different approaches for accurate speech synthesis. Our experimental results demonstrate the approaches for synthesis and parameter trajectory generation, and how the global variance method can alter the form of the generated trajectories.

A typical speech synthesis system consists of training and synthesis parts, as shown by the block diagram in figure 1. In a text-to-speech synthesis system, the training part consists of extracting both spectrum and excitation parameters from a speech database and modelling them with context-dependent HMMs. During the synthesis stage, the text for a given utterance is first converted to a context-dependent label sequence, and the utterance HMM is constructed by concatenating multiple HMMs according to that label sequence. A sequence of speech parameters is then generated, from which the speech waveform is finally synthesized.