I am a machine learning scientist at Amazon. Prior to joining Amazon, I did my PhD in Statistics at the University of Oxford, supervised by Professor Yee Whye Teh. My research interests lie in causality, probabilistic inference, Gaussian processes, Bayesian optimization, and importance sampling, and their connections to the exploration-exploitation trade-off.
(Accepted by Advances in Approximate Bayesian Inference Workshop, 2017)
We study adaptive importance sampling (AIS) as an online learning problem and argue for the importance of the trade-off between exploration and exploitation in this adaptation. Borrowing ideas from the bandits literature, we propose Daisee, a partition-based AIS algorithm. We further introduce a notion of regret for AIS and show that Daisee has O(√T (log T)^{3/4}) cumulative pseudo-regret, where T is the number of iterations. We then extend Daisee to adaptively learn a hierarchical partitioning of the sample space for more efficient sampling and confirm the performance of both algorithms empirically.
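As a hedged illustration of the partition-based idea (not the published Daisee algorithm; the partition, exploration schedule, and update rule below are simplifications I chose for the sketch), consider estimating a one-dimensional integral by treating the cells of a fixed partition as bandit arms:

```python
import numpy as np

def daisee_sketch(f, n_cells=8, T=20000, seed=0):
    """Toy partition-based adaptive importance sampler for Z = integral of f over [0,1].

    Each round: pick a cell with probability mixing exploitation (estimated
    |f| mass per cell) and exploration (uniform), draw a point uniformly
    inside it, and reweight so the running estimate of Z stays unbiased.
    """
    rng = np.random.default_rng(seed)
    edges = np.linspace(0.0, 1.0, n_cells + 1)
    width = 1.0 / n_cells
    mass = np.full(n_cells, width)        # running estimate of |f| mass per cell
    counts = np.zeros(n_cells)
    total = 0.0
    for t in range(1, T + 1):
        eps = 0.5 / np.sqrt(t)            # exploration rate, decays over time
        q = (1.0 - eps) * mass / mass.sum() + eps / n_cells
        k = rng.choice(n_cells, p=q)
        x = rng.uniform(edges[k], edges[k + 1])
        fx = f(x)
        total += fx * width / q[k]        # importance weight keeps the estimate unbiased
        counts[k] += 1
        mass[k] += (abs(fx) * width - mass[k]) / counts[k]
    return total / T
```

Because the cell probabilities adapt towards the |f| mass in each cell, the sampler allocates most draws where the integrand is large while the exploration floor keeps every cell visited.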
We tackle the problem of optimizing a black-box objective function defined over a highly structured input space. This problem is ubiquitous in science and engineering. In machine learning, inferring the structure of a neural network and the Automatic Statistician (AS), where the optimal kernel combination for a Gaussian process is selected, are two important examples. We use the AS as a case study to describe our approach, which can easily be generalized to other domains. We propose a Structure-Generating Variational Autoencoder (SG-VAE) to embed the original space of kernel combinations into a low-dimensional continuous manifold where Bayesian optimization (BO) ideas are used. This is possible when structural knowledge of the problem is available, which can be given via a simulator or any other form of generating potentially good solutions. The right exploration-exploitation balance is imposed by propagating into the search the uncertainty of the latent space of the SG-VAE, which is computed using variational inference. A key aspect of our approach is that the SG-VAE can be used to bias the search towards relevant regions, making it suitable for transfer learning tasks. Several experiments in various application domains illustrate the utility and generality of the approach described in this work.
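The surrounding idea, BO with a Gaussian process surrogate over a learned latent space, can be sketched minimally as below. The `decode` argument stands in for a trained SG-VAE decoder, and the kernel, candidate set, and UCB constant are illustrative assumptions, not the paper's method:

```python
import numpy as np

def rbf(A, B, ls=0.5):
    """Squared-exponential kernel between two sets of row vectors."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls ** 2)

def bo_in_latent_space(objective, decode, T=15, seed=0):
    """GP-UCB over a 2-D latent space; decode maps latents to structures."""
    rng = np.random.default_rng(seed)
    Z = rng.uniform(-2.0, 2.0, size=(3, 2))      # initial latent designs
    y = np.array([objective(decode(z)) for z in Z])
    for _ in range(T):
        cand = rng.uniform(-2.0, 2.0, size=(256, 2))
        K = rbf(Z, Z) + 1e-5 * np.eye(len(Z))    # jitter for numerical stability
        Ks = rbf(cand, Z)
        yc = y - y.mean()                        # centre targets before fitting
        mu = Ks @ np.linalg.solve(K, yc) + y.mean()
        var = 1.0 - np.einsum('ij,ij->i', Ks @ np.linalg.inv(K), Ks)
        ucb = mu + 2.0 * np.sqrt(np.maximum(var, 1e-12))
        z_next = cand[np.argmax(ucb)]            # most promising candidate
        Z = np.vstack([Z, z_next])
        y = np.append(y, objective(decode(z_next)))
    return Z[np.argmax(y)], y.max()
```

With an identity `decode` and a smooth objective this converges towards the optimum; in the paper's setting the decoder would map latents back to kernel combinations and the surrogate's uncertainty would come from the SG-VAE's variational posterior.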
(Accepted by Artificial Intelligence and Statistics Conference (AISTATS), 2017)
Hamiltonian Monte Carlo (HMC) is a popular Markov chain Monte Carlo (MCMC) algorithm that generates proposals for a Metropolis-Hastings algorithm by simulating the dynamics of a Hamiltonian system. However, HMC is sensitive to large time discretizations and performs poorly if there is a mismatch between the spatial geometry of the target distribution and the scales of the momentum distribution. In particular, the mass matrix of HMC is hard to tune well. In order to alleviate these problems, we propose relativistic Hamiltonian Monte Carlo, a version of HMC based on relativistic dynamics that introduces a maximum velocity on particles. We also derive stochastic gradient versions of the algorithm and show that the resulting algorithms bear interesting relationships to gradient clipping, RMSprop, Adagrad and Adam, popular optimisation methods in deep learning. Based on this, we develop relativistic stochastic gradient descent by taking the zero-temperature limit of relativistic stochastic gradient Hamiltonian Monte Carlo. In experiments we show that the relativistic algorithms perform better than classical Newtonian variants and Adam.
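The connection to gradient clipping comes from the relativistic momentum-to-velocity map, whose output speed is bounded by c regardless of how large the momentum grows. The optimiser below is an illustrative toy with made-up constants and damping schedule, not the algorithm derived in the paper:

```python
import numpy as np

def relativistic_sgd(grad, x0, h=0.05, m=1.0, c=1.0, beta=0.9, T=500):
    """Toy relativistic-flavoured optimiser: momentum is accumulated from
    gradients, then converted to a velocity whose norm never exceeds c,
    so large gradients cannot produce arbitrarily large steps."""
    x = np.asarray(x0, dtype=float).copy()
    p = np.zeros_like(x)
    for _ in range(T):
        p = beta * p - h * grad(x)                      # damped momentum accumulation
        v = p / (m * np.sqrt(p @ p / (m * c) ** 2 + 1.0))
        x = x + h * v                                   # step speed is capped by c
    return x
```

For small momenta the update reduces to ordinary momentum SGD, while for large momenta the step length saturates at h times c, which is the clipping-like behaviour the abstract alludes to.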
We introduce the Tucker Gaussian Process (TGP), a model for regression that regularises a Gaussian Process (GP) towards simpler regression functions for enhanced generalisation performance. We derive it using a novel approach to scalable GP learning, and show that our model is particularly well-suited to grid-structured data and problems where the dependence on covariates is close to being separable. A prime example is collaborative filtering, for which our model provides an effective GP based method that has a low-rank matrix factorisation at its core. We show that TGP generalises classical Bayesian matrix factorisation models, and goes beyond them to give a natural and elegant method for incorporating side information.
We introduce inference trees (ITs), a new class of inference methods that build on ideas from Monte Carlo tree search to perform adaptive sampling in a manner that balances exploration with exploitation, ensures consistency, and alleviates pathologies in existing adaptive methods. ITs adaptively sample from hierarchical partitions of the parameter space, while simultaneously learning these partitions in an online manner. This enables ITs to not only identify regions of high posterior mass, but also maintain uncertainty estimates to track regions where significant posterior mass may have been missed. ITs can be based on any inference method that provides a consistent estimate of the marginal likelihood. They are particularly effective when combined with sequential Monte Carlo, where they capture long-range dependencies and yield improvements beyond proposal adaptation alone.
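The tree flavour can be sketched as follows: leaves of a binary partition are treated as bandit arms, a UCB rule allocates samples, and well-explored leaves are split so refinement concentrates where samples accumulate. This is an illustrative toy for one-dimensional integration, not the ITs algorithm:

```python
import numpy as np

def inference_tree_sketch(f, T=6000, split_at=64, seed=0):
    """Adaptive hierarchical partitioning of [0,1] to estimate integral of f.

    Each leaf tracks a running mean of f(x)*(b-a), an unbiased estimate of
    its cell's integral under uniform draws; the final estimate sums the
    leaf means.  Unvisited leaves score infinity, so every new leaf is
    explored before exploitation resumes.
    """
    rng = np.random.default_rng(seed)
    cells = [(0.0, 1.0)]
    stats = [[0, 0.0]]                     # per leaf: [count, mean contribution]
    for t in range(1, T + 1):
        scores = [m + np.sqrt(2.0 * np.log(t) / c) if c else np.inf
                  for c, m in stats]      # UCB over leaves
        k = int(np.argmax(scores))
        a, b = cells[k]
        x = rng.uniform(a, b)
        contrib = f(x) * (b - a)
        stats[k][0] += 1
        stats[k][1] += (contrib - stats[k][1]) / stats[k][0]
        if stats[k][0] >= split_at:        # refine a well-explored leaf
            mid = 0.5 * (a + b)
            cells[k], stats[k] = (a, mid), [0, 0.0]
            cells.append((mid, b))
            stats.append([0, 0.0])
    return sum(m for _, m in stats)
```

The real ITs replace the uniform within-cell sampler with a consistent marginal-likelihood estimator such as sequential Monte Carlo, and use principled splitting and uncertainty criteria rather than the fixed count threshold used here.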
• Research project on imitation learning with latent-variable modelling.
• Delivered a research paper.
Work in the Model Governance Group; current projects focus on capturing risks not covered by VaR (Value at Risk) for CDS.
• Research project in Bayesian Optimization when the input space is non-Euclidean, with an application in automated model selection. Successfully implemented the model in Python and presented the work to the group.
• Delivered a paper that has been accepted by ICML 2018. The paper was also submitted to the NIPS 2017 workshop on Bayesian optimization for science and engineering, where it is under review.
• Implemented the VAE (Variational Autoencoder) module in a deep learning framework (MXNet) and contributed to the public repository.
Lecturer in Probability and Statistics
• Collect large datasets from databases using query languages such as SQL.
• Create competitive analyses and benchmarking studies for account hijacking, and recommend strategy adjustments based on findings.
• Analyse hijacking trends within a specific set of products and develop an action plan based on trends and patterns.
• Analyse preventable abuse related issues which impact users, and identify core and common prevention focus areas across Product Quality Operation.
• Partner with engineering teams to improve our hijacking prevention, detection and recovery systems.
• Build statistical models to select relevant features and predict goodness/badness of clusters of accounts.
• Delivered excellent presentations and documentation.
• Build pricing models for Calendar Spread Options using Excel and VBA.
• Perform model calibration and validation, as well as hedging simulation for historical data.
• Delivered excellent results and received exceptional feedback from managers and colleagues.
• Build predictive models for bid-offer curves for forecasting in the European power market.
• Data sampling and manipulation using statistical and programming tools including R and Python.
• Received excellent feedback and successfully implemented the pricing model, which went into production.
• Assisted with daily business and organised files for the group.
• Provide customer service and maintain relationships with clients in a fast-paced environment.
• Improved customer satisfaction by 10%.