Preprints

  1. D. Sejdinovic, An Overview of Causal Inference using Kernel Embeddings, ArXiv e-prints:2410.22754, 2024.

    Kernel embeddings have emerged as a powerful tool for representing probability measures in a variety of statistical inference problems. By mapping probability measures into a reproducing kernel Hilbert space (RKHS), kernel embeddings enable flexible representations of complex relationships between variables. They serve as a mechanism for efficiently transferring the representation of a distribution downstream to other tasks, such as hypothesis testing or causal effect estimation. In the context of causal inference, the main challenges include identifying causal associations and estimating the average treatment effect from observational data, where confounding variables may obscure direct cause-and-effect relationships. Kernel embeddings provide a robust nonparametric framework for addressing these challenges. They allow for the representations of distributions of observational data and their seamless transformation into representations of interventional distributions to estimate relevant causal quantities. We overview recent research that leverages the expressiveness of kernel embeddings in tandem with causal inference.

    @unpublished{Sej2024,
      title = {{An Overview of Causal Inference using Kernel Embeddings}},
      author = {Sejdinovic, Dino},
      year = {2024},
      journal = {ArXiv e-prints:2410.22754},
      arxiv = {https://arxiv.org/abs/2410.22754}
    }
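
    The abstract above builds on empirical kernel mean embeddings and distances between them; a minimal numpy sketch of those ingredients (a Gaussian-kernel mean embedding and the resulting squared MMD between two samples) is given below. The data, kernel choice and bandwidth are illustrative assumptions, not taken from the paper.

    import numpy as np

    def gaussian_kernel(X, Y, bandwidth=1.0):
        # Pairwise Gaussian (RBF) kernel matrix: k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2)).
        sq_dists = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
        return np.exp(-sq_dists / (2 * bandwidth**2))

    def mmd2(X, Y, bandwidth=1.0):
        # Squared distance between the empirical kernel mean embeddings of samples X and Y
        # (biased V-statistic): E[k(x,x')] - 2 E[k(x,y)] + E[k(y,y')].
        return (gaussian_kernel(X, X, bandwidth).mean()
                - 2 * gaussian_kernel(X, Y, bandwidth).mean()
                + gaussian_kernel(Y, Y, bandwidth).mean())

    rng = np.random.default_rng(0)
    X = rng.normal(0.0, 1.0, size=(200, 2))   # one sample
    Y = rng.normal(0.5, 1.0, size=(200, 2))   # a shifted sample
    print(mmd2(X, Y))                          # larger values indicate distributional discrepancy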
    
  2. S. L. Chau, A. Schrab, A. Gretton, D. Sejdinovic, and K. Muandet, Credal Two-Sample Tests of Epistemic Ignorance, ArXiv e-prints:2410.12921, 2024.

    We introduce credal two-sample testing, a new hypothesis testing framework for comparing credal sets – convex sets of probability measures where each element captures aleatoric uncertainty and the set itself represents epistemic uncertainty that arises from the modeller’s partial ignorance. Classical two-sample tests, which rely on comparing precise distributions, fail to address epistemic uncertainty due to partial ignorance. To bridge this gap, we generalise two-sample tests to compare credal sets, enabling reasoning for equality, inclusion, intersection, and mutual exclusivity, each offering unique insights into the modeller’s epistemic beliefs. We formalise these tests as two-sample tests with nuisance parameters and introduce the first permutation-based solution for this class of problems, significantly improving upon existing methods. Our approach properly incorporates the modeller’s epistemic uncertainty into hypothesis testing, leading to more robust and credible conclusions, with kernel-based implementations for real-world applications.

    @unpublished{ChaSchGreSejMua2024,
      title = {{Credal Two-Sample Tests of Epistemic Ignorance}},
      author = {Chau, Siu Lun and Schrab, Antonin and Gretton, Arthur and Sejdinovic, Dino and Muandet, Krikamol},
      year = {2024},
      journal = {ArXiv e-prints:2410.12921},
      arxiv = {https://arxiv.org/abs/2410.12921}
    }
    
  3. J. Lenhardt, J. Quaas, D. Sejdinovic, and D. Klocke, CloudViT: classifying cloud types in global satellite data and in kilometre-resolution simulations using vision transformers, EGUsphere [preprint], 2024.

    Clouds constitute, through their interactions with incoming solar radiation and outgoing terrestrial radiation, a fundamental element of the Earth’s climate system. Different cloud types show a wide variety in cloud microphysical or optical properties, phase, vertical extent or temperature among others, and thus disparate radiative effects. Both in observational and model datasets, classifying cloud types is also of large importance since different cloud types respond differently to current and future anthropogenic climate change. Cloud types have traditionally been defined using a simplified partition of the space determined by spatially aggregated values e.g. of the cloud top pressure and the cloud optical thickness. In this study, we present a method called CloudViT (Cloud Vision Transformer) building upon spatial extracts of cloud properties from the MODIS instrument to derive cloud types, leveraging spatial features and patterns with a vision transformer model. The classification model is based on global surface observations of cloud types. The method is then evaluated through the distributions of cloud type properties and the corresponding spatial patterns of cloud type occurrences for a global cloud type dataset produced over a year-long period. Subsequently, a first application of the cloud type classification method to climate model data is presented. This application additionally provides insights into how global storm-resolving models are representing clouds as these models are increasingly being used to perform simulations. The global cloud type dataset and the method code constituting CloudViT are available from Zenodo (Lenhardt et al., 2024b).

    @unpublished{LenQuaSejKlo2024,
      author = {Lenhardt, J. and Quaas, J. and Sejdinovic, D. and Klocke, D.},
      title = {CloudViT: classifying cloud types in global satellite data and in kilometre-resolution simulations using vision transformers},
      journal = {EGUsphere [preprint]},
      year = {2024},
      url = {https://egusphere.copernicus.org/preprints/2024/egusphere-2024-2724/},
      doi = {10.5194/egusphere-2024-2724}
    }
    
  4. B. G. Doan, A. Shamsi, X.-Y. Guo, A. Mohammadi, H. Alinejad-Rokny, D. Sejdinovic, D. C. Ranasinghe, and E. Abbasnejad, Bayesian Low-Rank LeArning (Bella): A Practical Approach to Bayesian Neural Networks, ArXiv e-prints:2407.20891, 2024.

    The computational complexity of Bayesian learning is impeding its adoption in practical, large-scale tasks. Despite demonstrations of significant merits such as improved robustness and resilience to unseen or out-of-distribution inputs over their non-Bayesian counterparts, their practical use has faded to near insignificance. In this study, we introduce an innovative framework to mitigate the computational burden of Bayesian neural networks (BNNs). Our approach follows the principle of Bayesian techniques based on deep ensembles, but significantly reduces their cost via multiple low-rank perturbations of parameters arising from a pre-trained neural network. Both the vanilla version of ensembles and more sophisticated schemes such as Bayesian learning with Stein Variational Gradient Descent (SVGD), previously deemed impractical for large models, can be seamlessly implemented within the proposed framework, called Bayesian Low-Rank LeArning (Bella). In a nutshell, i) Bella achieves a dramatic reduction in the number of trainable parameters required to approximate a Bayesian posterior; and ii) it not only maintains, but in some instances, surpasses the performance of conventional Bayesian learning methods and non-Bayesian baselines. Our results with large-scale tasks such as ImageNet, CAMELYON17, DomainNet, VQA with CLIP, and LLaVA demonstrate the effectiveness and versatility of Bella in building highly scalable and practical Bayesian deep models for real-world applications.

    @unpublished{Doanetal2024,
      title = {Bayesian {{Low-Rank LeArning}} ({{Bella}}): {{A Practical Approach}} to {{Bayesian Neural Networks}}},
      author = {Doan, Bao Gia and Shamsi, Afshar and Guo, Xiao-Yu and Mohammadi, Arash and {Alinejad-Rokny}, Hamid and Sejdinovic, Dino and Ranasinghe, Damith C. and Abbasnejad, Ehsan},
      year = {2024},
      journal = {ArXiv e-prints:2407.20891},
      arxiv = {https://arxiv.org/abs/2407.20891}
    }
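
    To make the "multiple low-rank perturbations of parameters arising from a pre-trained neural network" concrete, here is a toy numpy sketch of the general idea under our own assumptions (a single linear layer, frozen pre-trained weights, ensemble members that differ only through low-rank factors); it is not the authors' implementation.

    import numpy as np

    rng = np.random.default_rng(0)
    d_out, d_in, rank, n_members = 10, 100, 4, 5

    W_pretrained = rng.normal(size=(d_out, d_in))      # frozen pre-trained weights
    # Each member adds its own low-rank perturbation U @ V.T; only U and V would be trainable,
    # i.e. rank * (d_out + d_in) parameters per member instead of d_out * d_in.
    members = [(0.01 * rng.normal(size=(d_out, rank)),
                0.01 * rng.normal(size=(d_in, rank))) for _ in range(n_members)]

    def member_forward(x, U, V):
        # Forward pass through one perturbed linear layer: (W + U V^T) x.
        return (W_pretrained + U @ V.T) @ x

    x = rng.normal(size=d_in)
    outputs = np.stack([member_forward(x, U, V) for U, V in members])
    print(outputs.mean(axis=0))   # ensemble predictive mean
    print(outputs.std(axis=0))    # spread across members as a crude uncertainty proxy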
    
  5. D. Martinez-Taboada and D. Sejdinovic, Bayesian Counterfactual Mean Embeddings and Off-Policy Evaluation, ArXiv e-prints:2211.01518, 2022.

    The counterfactual distribution models the effect of the treatment in the untreated group. While most of the work focuses on the expected values of the treatment effect, one may be interested in the whole counterfactual distribution or other quantities associated with it. Building on the framework of Bayesian conditional mean embeddings, we propose a Bayesian approach for modeling the counterfactual distribution, which leads to quantifying the epistemic uncertainty about the distribution. The framework naturally extends to the setting where one observes multiple treatment effects (e.g. an intermediate effect after an interim period, and an ultimate treatment effect which is of main interest) and allows for additionally modelling uncertainty about the relationship of these effects. To this end, we present three novel Bayesian methods to estimate the expectation of the ultimate treatment effect, when only noisy samples of the dependence between intermediate and ultimate effects are provided. These methods differ in the source of uncertainty considered and allow for combining two sources of data. Moreover, we generalize these ideas to the off-policy evaluation framework, which can be seen as an extension of the counterfactual estimation problem. We empirically explore the calibration of the algorithms in two different experimental settings which require data fusion, and illustrate the value of considering the uncertainty stemming from the two sources of data.

    @unpublished{MarSej2022b,
      title = {{Bayesian Counterfactual Mean Embeddings and Off-Policy Evaluation}},
      author = {Martinez-Taboada, Diego and Sejdinovic, Dino},
      year = {2022},
      journal = {ArXiv e-prints:2211.01518},
      arxiv = {https://arxiv.org/abs/2211.01518}
    }
    
  6. D. Martinez-Taboada and D. Sejdinovic, Sequential Decision Making on Unmatched Data using Bayesian Kernel Embeddings, ArXiv e-prints:2210.13692, 2022.

    The problem of sequentially maximizing the expectation of a function seeks to maximize the expected value of a function of interest without having direct control on its features. Instead, the distribution of such features depends on a given context and an action taken by an agent. In contrast to Bayesian optimization, the arguments of the function are not under the agent’s control, but are indirectly determined by the agent’s action based on a given context. If the information of the features is to be included in the maximization problem, the full conditional distribution of such features, rather than its expectation only, needs to be accounted for. Furthermore, the function itself is unknown, with only noisy observations of it available, and its estimation potentially requires the use of unmatched data sets. We propose a novel algorithm for the aforementioned problem which takes into consideration the uncertainty derived from the estimation of both the conditional distribution of the features and the unknown function, by modeling the former as a Bayesian conditional mean embedding and the latter as a Gaussian process. Our algorithm empirically outperforms the current state-of-the-art algorithm in the experiments conducted.

    @unpublished{MarSej2022a,
      title = {{Sequential Decision Making on Unmatched Data using Bayesian Kernel Embeddings}},
      author = {Martinez-Taboada, Diego and Sejdinovic, Dino},
      year = {2022},
      journal = {ArXiv e-prints:2210.13692},
      arxiv = {https://arxiv.org/abs/2210.13692}
    }
    
  7. M. Matabuena, J. C. Vidal, O. H. M. Padilla, and D. Sejdinovic, Kernel Biclustering Algorithm in Hilbert Spaces, ArXiv e-prints:2208.03675, 2022.

    Biclustering algorithms partition data and covariates simultaneously, providing new insights in several domains, such as analyzing gene expression to discover new biological functions. This paper develops a new model-free biclustering algorithm in abstract spaces using the notions of energy distance (ED) and the maximum mean discrepancy (MMD) – two distances between probability distributions capable of handling complex data such as curves or graphs. The proposed method can learn more general and complex cluster shapes than most existing literature approaches, which usually focus on detecting mean and variance differences. Although the biclustering configurations of our approach are constrained to create disjoint structures at the datum and covariate levels, the results are competitive. Our results are similar to state-of-the-art methods in their optimal scenarios, assuming a proper kernel choice, outperforming them when cluster differences are concentrated in higher-order moments. The model’s performance has been tested in several situations that involve simulated and real-world datasets. Finally, new theoretical consistency results are established using some tools of the theory of optimal transport.

    @unpublished{MatVidPadSej2022,
      title = {{Kernel Biclustering Algorithm in Hilbert Spaces}},
      author = {Matabuena, Marcos and Vidal, J.C. and Padilla, Oscar Hernan Madrid and Sejdinovic, Dino},
      year = {2022},
      journal = {ArXiv e-prints:2208.03675},
      arxiv = {https://arxiv.org/abs/2208.03675}
    }
    
  8. V. D. Wild, M. Kanagawa, and D. Sejdinovic, Connections and Equivalences between the Nyström Method and Sparse Variational Gaussian Processes, ArXiv e-prints:2106.01121, 2021.

    We investigate the connections between sparse approximation methods for making kernel methods and Gaussian processes (GPs) scalable to massive data, focusing on the Nyström method and the Sparse Variational Gaussian Processes (SVGP). While sparse approximation methods for GPs and kernel methods share some algebraic similarities, the literature lacks a deep understanding of how and why they are related. This is a possible obstacle for communication between the GP and kernel communities, making it difficult to transfer results from one side to the other. Our motivation is to remove this possible obstacle, by clarifying the connections between the sparse approximations for GPs and kernel methods. In this work, we study the two popular approaches, the Nyström and SVGP approximations, in the context of a regression problem, and establish various connections and equivalences between them. In particular, we provide an RKHS interpretation of the SVGP approximation, and show that the Evidence Lower Bound of the SVGP contains the objective function of the Nyström approximation, revealing the origin of the algebraic equivalence between the two approaches. We also study recently established convergence results for the SVGP and how they are related to the approximation quality of the Nyström method.

    @unpublished{WilKanSej2021,
      title = {{Connections and Equivalences between the Nystr\"om Method and Sparse Variational Gaussian Processes}},
      author = {Wild, Veit D. and Kanagawa, Motonobu and Sejdinovic, Dino},
      year = {2021},
      journal = {ArXiv e-prints:2106.01121},
      arxiv = {https://arxiv.org/abs/2106.01121}
    }
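
    A small numpy sketch of the Nyström approximation discussed above, under the standard definitions: the kernel matrix is approximated from m landmark (inducing) points as K ≈ K_nm K_mm^{-1} K_mn. The data, kernel and number of landmarks are illustrative assumptions.

    import numpy as np

    def rbf(X, Y, ell=1.0):
        # Gaussian (RBF) kernel matrix with lengthscale ell.
        d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
        return np.exp(-d2 / (2 * ell**2))

    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(500, 1))
    Z = X[rng.choice(len(X), size=30, replace=False)]     # m landmark / inducing points

    K_nm = rbf(X, Z)
    K_mm = rbf(Z, Z)
    # Nystrom approximation of the full n x n kernel matrix from the n x m and m x m blocks.
    K_nystrom = K_nm @ np.linalg.solve(K_mm + 1e-8 * np.eye(len(Z)), K_nm.T)

    K_full = rbf(X, X)
    print(np.linalg.norm(K_full - K_nystrom) / np.linalg.norm(K_full))  # relative error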
    
  9. H. Zhu, A. Howes, O. van Eer, M. Rischard, Y. Li, D. Sejdinovic, and S. Flaxman, Aggregated Gaussian Processes with Multiresolution Earth Observation Covariates, ArXiv e-prints:2105.01460, 2021.

    For many survey-based spatial modelling problems, responses are observed as spatially aggregated over survey regions due to limited resources. Covariates, from weather models and satellite imagery, can be observed at many different spatial resolutions, making the pre-processing of covariates a key challenge for any spatial modelling task. We propose a Gaussian process regression model to flexibly handle multiresolution covariates by employing an additive kernel that can efficiently aggregate features across resolutions. Compared to existing approaches that rely on resolution matching, our approach better maintains distributional information across resolutions, leading to better performance and interpretability. Our model yields stronger predictive performance and interpretability on both simulated and crop yield datasets.

    @unpublished{ZhuHowEerRisSejFla2021,
      title = {Aggregated Gaussian Processes with Multiresolution Earth Observation Covariates},
      author = {Zhu, Harrison and Howes, Adam and van Eer, Owen and Rischard, Maxime and Li, Yingzhen and Sejdinovic, Dino and Flaxman, Seth},
      year = {2021},
      journal = {ArXiv e-prints:2105.01460},
      arxiv = {https://arxiv.org/abs/2105.01460}
    }
    
  10. M. Kanagawa, P. Hennig, D. Sejdinovic, and B. K. Sriperumbudur, Gaussian Processes and Kernel Methods: A Review on Connections and Equivalences, ArXiv e-prints:1807.02582, 2018.

    This paper is an attempt to bridge the conceptual gaps between researchers working on the two widely used approaches based on positive definite kernels: Bayesian learning or inference using Gaussian processes on the one side, and frequentist kernel methods based on reproducing kernel Hilbert spaces on the other. It is widely known in machine learning that these two formalisms are closely related; for instance, the estimator of kernel ridge regression is identical to the posterior mean of Gaussian process regression. However, they have been studied and developed almost independently by two essentially separate communities, and this makes it difficult to seamlessly transfer results between them. Our aim is to overcome this potential difficulty. To this end, we review several old and new results and concepts from either side, and juxtapose algorithmic quantities from each framework to highlight close similarities. We also provide discussions on subtle philosophical and theoretical differences between the two approaches.

    @unpublished{KanHenSejSri2018,
      author = {Kanagawa, M. and Hennig, P. and Sejdinovic, D. and Sriperumbudur, B.K.},
      title = {{Gaussian Processes and Kernel Methods: A Review on Connections and Equivalences}},
      journal = {ArXiv e-prints:1807.02582},
      arxiv = {https://arxiv.org/abs/1807.02582},
      year = {2018}
    }
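
    The abstract notes that "the estimator of kernel ridge regression is identical to the posterior mean of Gaussian process regression"; the sketch below checks this numerically on toy data, with the KRR regularisation set equal to the GP noise variance (scikit-learn's KernelRidge is used only as a convenient reference implementation; data and hyperparameters are illustrative).

    import numpy as np
    from sklearn.kernel_ridge import KernelRidge

    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(40, 1))
    y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=40)
    X_test = np.linspace(-3, 3, 5)[:, None]

    ell, sigma2 = 0.5, 0.1**2           # GP lengthscale and noise variance
    gamma = 1.0 / (2 * ell**2)          # same kernel expressed in scikit-learn's parameterisation

    def rbf(A, B):
        d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
        return np.exp(-gamma * d2)

    # GP regression posterior mean: k(x*, X) (K + sigma^2 I)^{-1} y.
    gp_mean = rbf(X_test, X) @ np.linalg.solve(rbf(X, X) + sigma2 * np.eye(len(X)), y)

    # Kernel ridge regression with ridge parameter equal to the GP noise variance.
    krr = KernelRidge(alpha=sigma2, kernel="rbf", gamma=gamma).fit(X, y)
    print(np.allclose(gp_mean, krr.predict(X_test)))   # True: the two estimators coincide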
    
  11. H. Strathmann, D. Sejdinovic, and M. Girolami, Unbiased Bayes for Big Data: Paths of Partial Posteriors, ArXiv e-prints:1501.03326, 2015.

    Key quantities of interest in Bayesian inference are expectations of functions with respect to a posterior distribution. Markov Chain Monte Carlo is a fundamental tool to consistently compute these expectations via averaging samples drawn from an approximate posterior. However, its feasibility is being challenged in the era of so-called Big Data as all data needs to be processed in every iteration. Realising that such simulation is an unnecessarily hard problem if the goal is estimation, we construct a computationally scalable methodology that allows unbiased estimation of the required expectations – without explicit simulation from the full posterior. The scheme’s variance is finite by construction and straightforward to control, leading to algorithms that are provably unbiased and naturally arrive at a desired error tolerance. This is achieved at an average computational complexity that is sub-linear in the size of the dataset and its free parameters are easy to tune. We demonstrate the utility and generality of the methodology on a range of common statistical models applied to large-scale benchmark and real-world datasets.

    @unpublished{StrSejGir2015,
      author = {Strathmann, H. and Sejdinovic, D. and Girolami, M.},
      title = {{Unbiased Bayes for Big Data: Paths of Partial Posteriors}},
      journal = {ArXiv e-prints:1501.03326},
      arxiv = {http://arxiv.org/abs/1501.03326},
      year = {2015}
    }
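
    The construction above debiases a sequence of partial-posterior estimates; below is a generic numpy sketch of a randomly truncated telescoping (debiasing) estimator of this flavour, with a toy convergent sequence standing in for the partial-posterior expectations. The actual scheme in the paper differs in its details.

    import numpy as np

    rng = np.random.default_rng(0)

    def phi(t):
        # Toy stand-in for an estimate built from the t-th partial posterior;
        # it converges to the full-posterior quantity (here 1.0) as t grows.
        return 1.0 - 0.5**t

    def debiased_estimate(p=0.5, t_max=60):
        # Randomly truncated telescoping sum: unbiased for lim_t phi(t), because each
        # increment phi(t) - phi(t-1) is reweighted by 1 / P(T >= t) = 1 / (1 - p)^(t - 1).
        T = min(rng.geometric(p), t_max)   # t_max is only a numerical safety cap
        est = phi(0)
        for t in range(1, T + 1):
            est += (phi(t) - phi(t - 1)) / (1 - p)**(t - 1)
        return est

    estimates = [debiased_estimate() for _ in range(5000)]
    print(np.mean(estimates))   # close to the limit 1.0, without ever evaluating the full limit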
    

Published / In Press

  1. R. Oliveira, D. Sejdinovic, D. Howard, and E. Bonilla, Bayesian Adaptive Calibration and Optimal Design, in Advances in Neural Information Processing Systems (NeurIPS), 2024.

    The process of calibrating computer models of natural phenomena is essential for applications in the physical sciences, where plenty of domain knowledge can be embedded into simulations and then calibrated against real observations. Current machine learning approaches, however, mostly rely on rerunning simulations over a fixed set of designs available in the observed data, potentially neglecting informative correlations across the design space and requiring a large amount of simulations. Instead, we consider the calibration process from the perspective of Bayesian adaptive experimental design and propose a data-efficient algorithm to run maximally informative simulations within a batch-sequential process. At each round, the algorithm jointly estimates the parameters of the posterior distribution and optimal designs by maximising a variational lower bound of the expected information gain. The simulator is modelled as a sample from a Gaussian process, which allows us to correlate simulations and observed data with the unknown calibration parameters. We show the benefits of our method when compared to related approaches across synthetic and real-data problems.

    @inproceedings{OliSejHowBon2024,
      title = {{Bayesian Adaptive Calibration and Optimal Design}},
      author = {Oliveira, Rafael and Sejdinovic, Dino and Howard, David and Bonilla, Edwin},
      year = {2024},
      booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
      arxiv = {https://arxiv.org/abs/2405.14440}
    }
    
  2. B. Severin, D. T. Lennon, L. C. Camenzind, F. Vigneau, F. Fedele, D. Jirovec, A. Ballabio, D. Chrastina, G. Isella, M. de Kruijf, M. J. Carballido, S. Svab, A. V. Kuhlmann, S. Geyer, F. N. M. Froning, H. Moon, M. A. Osborne, D. Sejdinovic, G. Katsaros, D. M. Zumbühl, G. A. D. Briggs, and N. Ares, Cross-Architecture Tuning of Silicon and SiGe-based Quantum Devices Using Machine Learning, Scientific Reports, vol. 14, no. 1, 17281, Jul. 2024.

    The potential of Si and SiGe-based devices for the scaling of quantum circuits is tainted by device variability. Each device needs to be tuned to operation conditions and each device realisation requires a different tuning protocol. We demonstrate that it is possible to automate the tuning of a 4-gate Si FinFET, a 5-gate GeSi nanowire and a 7-gate Ge/SiGe heterostructure double quantum dot device from scratch with the same algorithm. We achieve tuning times of 30, 10, and 92 min, respectively. The algorithm also provides insight into the parameter space landscape for each of these devices, allowing for the characterization of the regions where double quantum dot regimes are found. These results show that overarching solutions for the tuning of quantum devices are enabled by machine learning.

    @article{Severinetal2024,
      title = {Cross-Architecture Tuning of Silicon and {{SiGe-based}} Quantum Devices Using Machine Learning},
      author = {Severin, B. and Lennon, D. T. and Camenzind, L. C. and Vigneau, F. and Fedele, F. and Jirovec, D. and Ballabio, A. and Chrastina, D. and Isella, G. and {de Kruijf}, M. and Carballido, M. J. and Svab, S. and Kuhlmann, A. V. and Geyer, S. and Froning, F. N. M. and Moon, H. and Osborne, M. A. and Sejdinovic, D. and Katsaros, G. and Zumb{\"u}hl, D. M. and Briggs, G. A. D. and Ares, N.},
      year = {2024},
      month = jul,
      journal = {Scientific Reports},
      volume = {14},
      number = {1},
      pages = {17281},
      publisher = {Nature Publishing Group},
      issn = {2045-2322},
      doi = {10.1038/s41598-024-67787-z},
      url = {https://doi.org/10.1038/s41598-024-67787-z}
    }
    
  3. J. Lenhardt, J. Quaas, and D. Sejdinovic, Marine cloud base height retrieval from MODIS cloud properties using machine learning, Atmospheric Measurement Techniques, vol. 17, no. 18, 5655–5677, 2024.

    Clouds are a crucial regulator in the Earth’s energy budget through their radiative properties, both at the top of the atmosphere and at the surface; hence, determining key factors like their vertical extent is of essential interest. While the cloud top height is commonly retrieved by satellites, the cloud base height is difficult to estimate from satellite remote sensing data. Here, we present a novel method called ORABase (Ordinal Regression Auto-encoding of cloud Base), leveraging spatially resolved cloud properties from the Moderate Resolution Imaging Spectroradiometer (MODIS) instrument to retrieve the cloud base height over marine areas. A machine learning model is built with two components to facilitate the cloud base height retrieval: the first component is an auto-encoder designed to learn a representation of the data cubes of cloud properties and to reduce their dimensionality. The second component is developed for predicting the cloud base using ground-based ceilometer observations from the lower-dimensional encodings generated by the aforementioned auto-encoder. The method is then evaluated based on a collection of collocated surface ceilometer observations and retrievals from the CALIOP satellite lidar. The statistical model performs similarly on both datasets and performs notably well on the test set of ceilometer cloud bases, where it exhibits accurate predictions, particularly for lower cloud bases, and a narrow distribution of the absolute error, namely 379 and 328 m for the mean absolute error and the standard deviation of the absolute error, respectively. Furthermore, cloud base height predictions are generated for an entire year over the ocean, and global mean aggregates are also presented, providing insights into global cloud base height distributions and offering a valuable dataset for extensive studies requiring global cloud base height retrievals. The global cloud base height dataset and the presented models constituting ORABase are available from Zenodo (Lenhardt et al., 2024).

    @article{LenQuaSej2024,
      author = {Lenhardt, J. and Quaas, J. and Sejdinovic, D.},
      title = {{Marine cloud base height retrieval from MODIS cloud properties using machine learning}},
      journal = {Atmospheric Measurement Techniques},
      year = {2024},
      volume = {17},
      number = {18},
      pages = {5655--5677},
      doi = {10.5194/amt-17-5655-2024},
      url = {https://doi.org/10.5194/amt-17-5655-2024}
    }
    
  4. J. Fawkes, R. Hu, R. J. Evans, and D. Sejdinovic, Doubly Robust Kernel Statistics for Testing Distributional Treatment Effects, Transactions on Machine Learning Research, 2024.

    With the widespread application of causal inference, it is increasingly important to have tools which can test for the presence of causal effects in a diverse array of circumstances. In this vein we focus on the problem of testing for distributional causal effects, where the treatment affects not just the mean, but also higher order moments of the distribution, as well as multidimensional or structured outcomes. We build upon a previously introduced framework, Counterfactual Mean Embeddings, for representing causal distributions within Reproducing Kernel Hilbert Spaces (RKHS) by proposing new, improved, estimators for the distributional embeddings. These improved estimators are inspired by doubly robust estimators of the causal mean, using a similar form within the kernel space. We analyse these estimators, proving they retain the doubly robust property and have improved convergence rates compared to the original estimators. This leads to new permutation-based tests for distributional causal effects, by constructing the test statistics based on the estimators we propose. We experimentally and theoretically demonstrate the validity of our tests.

    @article{FawHuEvaSej2024,
      title = {{Doubly Robust Kernel Statistics for Testing Distributional Treatment Effects}},
      author = {Fawkes, Jake and Hu, Robert and Evans, Robin J. and Sejdinovic, Dino},
      year = {2024},
      journal = {Transactions on Machine Learning Research},
      issn = {2835-8856},
      url = {https://openreview.net/forum?id=5g5zFVj33K},
      arxiv = {https://arxiv.org/abs/2212.04922}
    }
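
    The improved estimators above mirror the classical doubly robust (AIPW) estimator of a causal mean, applied within the kernel embedding; for reference, a plain scalar-outcome sketch of that classical AIPW estimator on simulated data is below (scikit-learn models serve as illustrative nuisance estimators; none of this is specific to the paper).

    import numpy as np
    from sklearn.linear_model import LinearRegression, LogisticRegression

    rng = np.random.default_rng(0)
    n = 2000
    X = rng.normal(size=(n, 2))                          # confounders
    propensity = 1 / (1 + np.exp(-X[:, 0]))              # true P(T = 1 | X)
    T = rng.binomial(1, propensity)
    Y = 2.0 * T + X @ np.array([1.0, -0.5]) + rng.normal(size=n)   # true effect = 2

    # Nuisance models: per-arm outcome regressions and a propensity model.
    m1 = LinearRegression().fit(X[T == 1], Y[T == 1])
    m0 = LinearRegression().fit(X[T == 0], Y[T == 0])
    e = LogisticRegression().fit(X, T).predict_proba(X)[:, 1]

    # AIPW / doubly robust estimate of E[Y(1)] - E[Y(0)].
    psi1 = m1.predict(X) + T * (Y - m1.predict(X)) / e
    psi0 = m0.predict(X) + (1 - T) * (Y - m0.predict(X)) / (1 - e)
    print(np.mean(psi1 - psi0))   # close to the true effect of 2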
    
  5. R. Hu, D. Sejdinovic, and R. J. Evans, A Kernel Test for Causal Association via Noise Contrastive Backdoor Adjustment, Journal of Machine Learning Research, vol. 25, no. 160, 1–56, 2024.

    Causal inference grows increasingly complex as the dimension of confounders increases. Given treatments X, outcomes Y, and measured confounders Z, we develop a non-parametric method to test the do-null hypothesis that, after an intervention on X, there is no marginal dependence of Y on X, against the general alternative. Building on the Hilbert-Schmidt Independence Criterion (HSIC) for marginal independence testing, we propose backdoor-HSIC (bd-HSIC), an importance weighted HSIC which combines density ratio estimation with kernel methods. Experiments on simulated data verify the correct size and that the estimator has power for both binary and continuous treatments under a large number of confounding variables. Additionally, we establish convergence properties of the estimators of covariance operators used in bd-HSIC. We investigate the advantages and disadvantages of bd-HSIC against parametric tests as well as the importance of using the do-null testing in contrast to marginal or conditional independence testing. A complete implementation can be found at https://github.com/MrHuff/kgformula.

    @article{HuSejEva2024,
      title = {{A Kernel Test for Causal Association via Noise Contrastive Backdoor Adjustment}},
      author = {Hu, Robert and Sejdinovic, Dino and Evans, Robin J.},
      year = {2024},
      journal = {Journal of Machine Learning Research},
      volume = {25},
      number = {160},
      pages = {1--56},
      url = {https://jmlr.org/papers/v25/21-1409.html},
      arxiv = {https://arxiv.org/abs/2111.13226}
    }
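
    bd-HSIC reweights the standard HSIC statistic with estimated density ratios; a minimal numpy sketch of the unweighted HSIC (biased V-statistic) it builds on is given below, with Gaussian kernels and toy data chosen purely for illustration.

    import numpy as np

    def rbf(A, B, bandwidth=1.0):
        d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
        return np.exp(-d2 / (2 * bandwidth**2))

    def hsic(X, Y, bandwidth=1.0):
        # Biased V-statistic estimate of HSIC: trace(K H L H) / n^2, with H the centring matrix.
        n = len(X)
        H = np.eye(n) - np.ones((n, n)) / n
        K, L = rbf(X, X, bandwidth), rbf(Y, Y, bandwidth)
        return np.trace(K @ H @ L @ H) / n**2

    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 1))
    Y_dep = X + 0.3 * rng.normal(size=(300, 1))     # dependent on X
    Y_ind = rng.normal(size=(300, 1))               # independent of X
    print(hsic(X, Y_dep), hsic(X, Y_ind))           # the first value is markedly larger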
    
  6. E. Shimizu, K. Fukumizu, and D. Sejdinovic, Neural-Kernel Conditional Mean Embeddings, in International Conference on Machine Learning (ICML), 2024, PMLR 235:45040–45059.

    Kernel conditional mean embeddings (CMEs) offer a powerful framework for representing conditional distributions, but they often face scalability and expressiveness challenges. In this work, we propose a new method that effectively combines the strengths of deep learning with CMEs in order to address these challenges. Specifically, our approach leverages the end-to-end neural network (NN) optimization framework using a kernel-based objective. This design circumvents the computationally expensive Gram matrix inversion required by current CME methods. To further enhance performance, we provide efficient strategies to optimize the remaining kernel hyperparameters. In conditional density estimation tasks, our NN-CME hybrid achieves competitive performance and often surpasses existing deep learning-based methods. Lastly, we showcase its remarkable versatility by seamlessly integrating it into reinforcement learning (RL) contexts. Building on Q-learning, our approach naturally leads to a new variant of distributional RL methods, which demonstrates consistent effectiveness across different environments.

    @inproceedings{ShiFukSej2024,
      title = {{Neural-Kernel Conditional Mean Embeddings}},
      author = {Shimizu, Eiki and Fukumizu, Kenji and Sejdinovic, Dino},
      year = {2024},
      booktitle = {International Conference on Machine Learning (ICML)},
      url = {https://proceedings.mlr.press/v235/shimizu24a.html},
      pages = {PMLR 235:45040--45059},
      arxiv = {https://arxiv.org/abs/2403.10859}
    }
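
    For context on the "computationally expensive Gram matrix inversion required by current CME methods": a minimal numpy sketch of the standard regularised CME estimator, where conditional expectations are computed through kernel-ridge-style weights over the training sample (toy data and hyperparameters are illustrative assumptions).

    import numpy as np

    def rbf(A, B, bandwidth=0.5):
        d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
        return np.exp(-d2 / (2 * bandwidth**2))

    rng = np.random.default_rng(0)
    n, lam = 500, 1e-3
    X = rng.uniform(-3, 3, size=(n, 1))
    Y = np.sin(X) + 0.1 * rng.normal(size=(n, 1))      # so E[Y | X = x] is approximately sin(x)

    # Standard CME estimator: weights w(x) = (K_X + n*lam*I)^{-1} k_X(X, x).
    # The solve against the n x n Gram matrix is the O(n^3) step referred to above.
    K_X = rbf(X, X)
    x_query = np.array([[1.0]])
    w = np.linalg.solve(K_X + n * lam * np.eye(n), rbf(X, x_query))

    # Any conditional expectation E[f(Y) | X = x] is then estimated as sum_i w_i f(y_i); here f(y) = y.
    print(float(w[:, 0] @ Y[:, 0]), np.sin(1.0))       # estimate vs. the true conditional mean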
    
  7. S. Bouabid, D. Watson-Parris, S. Stefanovic, A. Nenes, and D. Sejdinovic, Aerosol optical depth disaggregation: toward global aerosol vertical profiles, Environmental Data Science, vol. 3, e16, 2024.

    Aerosol-cloud interactions constitute the largest source of uncertainty in assessments of the anthropogenic climate change. This uncertainty arises in part from the difficulty in measuring the vertical distributions of aerosols, and only sporadic vertically resolved observations are available. We often have to settle for less informative vertically aggregated proxies such as aerosol optical depth (AOD). In this work, we develop a framework for the vertical disaggregation of AOD into extinction profiles, i.e. the measure of light extinction throughout an atmospheric column, using readily available vertically resolved meteorological predictors such as temperature, pressure or relative humidity. Using Bayesian nonparametric modelling, we devise a simple Gaussian process prior over aerosol vertical profiles and update it with AOD observations to infer a distribution over vertical extinction profiles. To validate our approach, we use ECHAM-HAM aerosol-climate model data which offers self-consistent simulations of meteorological covariates, AOD and extinction profiles. Our results show that, while very simple, our model is able to reconstruct realistic extinction profiles with well-calibrated uncertainty, outperforming by an order of magnitude the idealized baseline which is typically used in satellite AOD retrieval algorithms. In particular, the model demonstrates a faithful reconstruction of extinction patterns arising from aerosol water uptake in the boundary layer. Observations however suggest that other extinction patterns, due to aerosol mass concentration, particle size and radiative properties, might be more challenging to capture and require additional vertically resolved predictors.

    @article{BouWatSteNenSej2024,
      title = {{Aerosol optical depth disaggregation: toward global aerosol vertical profiles}},
      author = {Bouabid, Shahine and Watson-Parris, Duncan and Stefanovic, Sofija and Nenes, Athanasios and Sejdinovic, Dino},
      year = {2024},
      volume = {3},
      doi = {10.1017/eds.2024.15},
      url = {https://doi.org/10.1017/eds.2024.15},
      journal = {Environmental Data Science},
      pages = {e16},
      arxiv = {https://arxiv.org/abs/2205.04296}
    }
    
  8. S. Bouabid, D. Sejdinovic, and D. Watson-Parris, FaIRGP: A Bayesian Energy Balance Model for Surface Temperatures Emulation, Journal of Advances in Modeling Earth Systems, vol. 16, no. 6, 2024.

    Emulators, or reduced complexity climate models, are surrogate Earth system models that produce projections of key climate quantities with minimal computational resources. Using time-series modeling or more advanced machine learning techniques, data-driven emulators have emerged as a promising avenue of research, producing spatially resolved climate responses that are visually indistinguishable from state-of-the-art Earth system models. Yet, their lack of physical interpretability limits their wider adoption. In this work, we introduce FaIRGP, a data-driven emulator that satisfies the physical temperature response equations of an energy balance model. The result is an emulator that (i) enjoys the flexibility of statistical machine learning models and can learn from observations, and (ii) has a robust physical grounding with interpretable parameters that can be used to make inference about the climate system. Further, our Bayesian approach allows a principled and mathematically tractable uncertainty quantification. Our model demonstrates skillful emulation of global mean surface temperature and spatial surface temperatures across realistic future scenarios. Its ability to learn from data allows it to outperform energy balance models, while its robust physical foundation safeguards against the pitfalls of purely data-driven models. We also illustrate how FaIRGP can be used to obtain estimates of top-of-atmosphere radiative forcing and discuss the benefits of its mathematical tractability for applications such as detection and attribution or precipitation emulation. We hope that this work will contribute to widening the adoption of data-driven methods in climate emulation.

    @article{BouSejWat2024,
      title = {{FaIRGP: A Bayesian Energy Balance Model for Surface Temperatures Emulation}},
      author = {Bouabid, Shahine and Sejdinovic, Dino and Watson-Parris, Duncan},
      year = {2024},
      journal = {Journal of Advances in Modeling Earth Systems},
      volume = {16},
      number = {6},
      url = {https://doi.org/10.1029/2023MS003926},
      doi = {10.1029/2023MS003926},
      arxiv = {https://arxiv.org/abs/2307.10052}
    }
    
  9. R. Tsuchida, C. S. Ong, and D. Sejdinovic, Exact, Fast and Expressive Poisson Point Processes via Squared Neural Families, in Proceedings of the AAAI Conference on Artificial Intelligence, 2024, vol. 38, no. 18, 20559–20566.

    We introduce squared neural Poisson point processes (SNEPPPs) by parameterising the intensity function by the squared norm of a two layer neural network. When the hidden layer is fixed and the second layer has a single neuron, our approach resembles previous uses of squared Gaussian process or kernel methods, but allowing the hidden layer to be learnt allows for additional flexibility. In many cases of interest, the integrated intensity function admits a closed form in quadratic time in the number of hidden neurons. We enumerate a far more extensive number of such cases than has previously been discussed. Our approach is more memory and time efficient than naive implementations of squared or exponentiated kernel methods or Gaussian processes. Maximum likelihood and maximum a posteriori estimates in a reparameterisation of the final layer of the intensity function can be obtained by solving a (strongly) convex optimisation problem using projected gradient descent. We demonstrate SNEPPPs on real and synthetic benchmarks, and provide a software implementation.

    @inproceedings{TsuOngSej2024,
      author = {Tsuchida, Russell and Ong, Cheng Soon and Sejdinovic, Dino},
      title = {{Exact, Fast and Expressive Poisson Point Processes via Squared Neural Families}},
      booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence},
      volume = {38},
      number = {18},
      pages = {20559--20566},
      arxiv = {https://arxiv.org/abs/2402.09608},
      url = {https://doi.org/10.1609/aaai.v38i18.30041},
      doi = {10.1609/aaai.v38i18.30041},
      year = {2024}
    }
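
    A toy numpy sketch of the construction named above (an intensity function given by the squared norm of a two-layer network), written under our own assumptions: a cosine first layer and plain numerical quadrature for the integrated intensity, rather than the closed forms derived in the paper.

    import numpy as np

    rng = np.random.default_rng(0)
    d, hidden, out = 1, 16, 3

    W1 = rng.normal(size=(hidden, d))
    b1 = rng.normal(size=hidden)
    W2 = rng.normal(size=(out, hidden)) / np.sqrt(hidden)

    def intensity(x):
        # lambda(x) = || W2 * activation(W1 x + b1) ||^2, the squared norm of a two-layer network.
        h = np.cos(x @ W1.T + b1)
        return np.sum((h @ W2.T)**2, axis=1)

    # Integrated intensity over the window [0, 1] via quadrature (the paper derives closed forms).
    grid = np.linspace(0.0, 1.0, 2001)[:, None]
    integrated = np.trapz(intensity(grid), grid[:, 0])
    print(integrated)      # expected number of points of the Poisson process on [0, 1]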
    
  10. D. L. Craig, H. Moon, F. Fedele, D. T. Lennon, B. V. Straaten, F. Vigneau, L. C. Camenzind, D. M. Zumbühl, G. A. D. Briggs, M. A. Osborne, D. Sejdinovic, and N. Ares, Bridging the Reality Gap in Quantum Devices with Physics-Aware Machine Learning, Physical Review X, vol. 14, no. 1, 011001, 2024.

    The discrepancies between reality and simulation impede the optimisation and scalability of solid-state quantum devices. Disorder induced by the unpredictable distribution of material defects is one of the major contributions to the reality gap. We bridge this gap using physics-aware machine learning, in particular, using an approach combining a physical model, deep learning, Gaussian random field, and Bayesian inference. This approach has enabled us to infer the disorder potential of a nanoscale electronic device from electron transport data. This inference is validated by verifying the algorithm’s predictions about the gate voltage values required for a laterally-defined quantum dot device in AlGaAs/GaAs to produce current features corresponding to a double quantum dot regime.

    @article{Craigetal2023,
      title = {{Bridging the Reality Gap in Quantum Devices with Physics-Aware Machine Learning}},
      author = {Craig, D.L. and Moon, H. and Fedele, F. and Lennon, D.T. and Straaten, B. Van and Vigneau, F. and Camenzind, L.C. and Zumb{\"u}hl, D.M. and Briggs, G.A.D. and Osborne, M.A. and Sejdinovic, D. and Ares, N.},
      year = {2024},
      journal = {Physical Review X},
      volume = {14},
      issue = {1},
      pages = {011001},
      doi = {10.1103/PhysRevX.14.011001},
      url = {https://link.aps.org/doi/10.1103/PhysRevX.14.011001},
      arxiv = {https://arxiv.org/abs/2111.11285}
    }
    
  11. S. L. Chau, K. Muandet, and D. Sejdinovic, Explaining the Uncertain: Stochastic Shapley Values for Gaussian Process Models, in Advances in Neural Information Processing Systems (NeurIPS), 2023.

    We present a novel approach for explaining Gaussian processes (GPs) that can utilize the full analytical covariance structure present in GPs. Our method is based on the popular solution concept of Shapley values extended to stochastic cooperative games, resulting in explanations that are random variables. The GP explanations generated using our approach satisfy similar favorable axioms to standard Shapley values and possess a tractable covariance function across features and data observations. This covariance allows for quantifying explanation uncertainties and studying the statistical dependencies between explanations. We further extend our framework to the problem of predictive explanation, and propose a Shapley prior over the explanation function to predict Shapley values for new data based on previously computed ones. Our extensive illustrations demonstrate the effectiveness of the proposed approach.

    @inproceedings{ChaMuaSej2023,
      title = {{Explaining the Uncertain: Stochastic Shapley Values for Gaussian Process Models}},
      author = {Chau, Siu Lun and Muandet, Krikamol and Sejdinovic, Dino},
      year = {2023},
      booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
      url = {https://proceedings.neurips.cc/paper_files/paper/2023/hash/9f0b1220028dfa2ee82ca0a0e0fc52d1-Abstract-Conference.html},
      arxiv = {https://arxiv.org/abs/2305.15167}
    }
    
  12. V. D. Wild, S. Ghalebikesabi, D. Sejdinovic, and J. Knoblauch, A Rigorous Link between Deep Ensembles and (Variational) Bayesian Methods, in Advances in Neural Information Processing Systems (NeurIPS), 2023.

    We establish the first mathematically rigorous link between Bayesian, variational Bayesian, and ensemble methods. A key step towards this is to reformulate the non-convex optimisation problem typically encountered in deep learning as a convex optimisation in the space of probability measures. On a technical level, our contribution amounts to studying generalised variational inference through the lens of Wasserstein gradient flows. The result is a unified theory of various seemingly disconnected approaches that are commonly used for uncertainty quantification in deep learning – including deep ensembles and (variational) Bayesian methods. This offers a fresh perspective on the reasons behind the success of deep ensembles over procedures based on parameterised variational inference, and allows the derivation of new ensembling schemes with convergence guarantees. We showcase this by proposing a family of interacting deep ensembles with direct parallels to the interactions of particle systems in thermodynamics, and use our theory to prove the convergence of these algorithms to a well-defined global minimiser on the space of probability measures.

    @inproceedings{WilGhaSejKno2023,
      title = {{A Rigorous Link between Deep Ensembles and (Variational) Bayesian Methods}},
      author = {Wild, Veit David and Ghalebikesabi, Sahra and Sejdinovic, Dino and Knoblauch, Jeremias},
      year = {2023},
      url = {https://proceedings.neurips.cc/paper_files/paper/2023/hash/7d25b1db211d99d5750ec45d65fd6e4e-Abstract-Conference.html},
      booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
      arxiv = {https://arxiv.org/abs/2305.15027}
    }
    
  13. R. Tsuchida, C. S. Ong, and D. Sejdinovic, Squared Neural Families: A New Class of Tractable Density Models, in Advances in Neural Information Processing Systems (NeurIPS), 2023.

    Flexible models for probability distributions are an essential ingredient in many machine learning tasks. We develop and investigate a new class of probability distributions, which we call a Squared Neural Family (SNEFY), formed by squaring the 2-norm of a neural network and normalising it with respect to a base measure. Following reasoning similar to the well established connections between infinitely wide neural networks and Gaussian processes, we show that SNEFYs admit closed-form normalising constants in many cases of interest, thereby resulting in flexible yet fully tractable density models. SNEFYs strictly generalise classical exponential families, are closed under conditioning, and have tractable marginal distributions. Their utility is illustrated on a variety of density estimation and conditional density estimation tasks.

    @inproceedings{TsuOngSej2023,
      title = {{Squared Neural Families: A New Class of Tractable Density Models}},
      author = {Tsuchida, Russell and Ong, Cheng Soon and Sejdinovic, Dino},
      year = {2023},
      booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
      url = {https://proceedings.neurips.cc/paper_files/paper/2023/hash/ea13534ee239bb3977795b8cc855bacc-Abstract-Conference.html},
      arxiv = {https://arxiv.org/abs/2305.13552},
      code = {https://github.com/RussellTsuchida/snefy}
    }
    
  14. A. Perez-Suay, P. Gordaliza, J.-M. Loubes, D. Sejdinovic, and G. Camps-Valls, Fair Kernel Regression through Cross-Covariance Operators, Transactions on Machine Learning Research, 2023.

    Ensuring fairness in machine learning models is a difficult problem from both a formulation and implementation perspective. One sensible criterion for achieving fairness is Equalised Odds, which requires that subjects in protected and unprotected groups have equal true and false positive rates. However, practical implementation is challenging. This work proposes two ways to address this issue through the conditional independence operator. First, given the output values, it is used as a fairness measure of independence between model predictions and sensitive variables. Second, it is used as a regularisation term in the problem formulation, which seeks optimal models that balance performance and fairness concerning the sensitive variables. To illustrate the potential of our approach, we consider different scenarios. First, we use the Gaussian model to provide new insights into the problem formulation and numerical results on its convergence. Second, we present the formulation using the conditional cross-covariance operator. We anticipate that a closed-form solution is possible in the general problem formulation, including in the case of a kernel formulation setting. Third, we introduce a normalised criterion of the conditional independence operator. All formulations are posed under the risk minimisation principle, which leads to theoretical results on the performance. Additionally, insights are provided into using these operators under a Gaussian Process setting. Our methods are compared to state-of-the-art methods in terms of performance and fairness metrics on a representative set of real problems. The results obtained with our proposed methodology show promising performance-fairness curves. Furthermore, we discuss the usefulness of linear weights in the fair model to describe the behaviour of the features when enforcing fairness over a particular set of input features.

    @article{perez-suay2023fair,
      title = {{Fair Kernel Regression through Cross-Covariance Operators}},
      author = {Perez-Suay, Adrian and Gordaliza, Paula and Loubes, Jean-Michel and Sejdinovic, Dino and Camps-Valls, Gustau},
      journal = {Transactions on Machine Learning Research},
      issn = {2835-8856},
      year = {2023},
      url = {https://openreview.net/forum?id=MyQ1e1VQQ3}
    }
    
  15. S. Bouabid, J. Fawkes, and D. Sejdinovic, Returning The Favour: When Regression Benefits From Probabilistic Causal Knowledge, in International Conference on Machine Learning (ICML), 2023, PMLR 202:2885–2913.

    A directed acyclic graph (DAG) provides valuable prior knowledge that is often discarded in regression tasks in machine learning. We show that the independences arising from the presence of collider structures in DAGs provide meaningful inductive biases, which constrain the regression hypothesis space and improve predictive performance. We introduce collider regression, a framework to incorporate probabilistic causal knowledge from a collider in a regression problem. When the hypothesis space is a reproducing kernel Hilbert space, we prove a strictly positive generalisation benefit under mild assumptions and provide closed-form estimators of the empirical risk minimiser. Experiments on synthetic and climate model data demonstrate performance gains of the proposed methodology.

    @inproceedings{BouFawSej2023,
      title = {{Returning The Favour: When Regression Benefits From Probabilistic Causal Knowledge}},
      author = {Bouabid, Shahine and Fawkes, Jake and Sejdinovic, Dino},
      year = {2023},
      url = {https://proceedings.mlr.press/v202/bouabid23a.html},
      booktitle = {International Conference on Machine Learning (ICML)},
      pages = {PMLR 202:2885--2913},
      arxiv = {https://arxiv.org/abs/2301.11214}
    }
    
  16. J. Schuff, D. T. Lennon, S. Geyer, D. L. Craig, F. Fedele, F. Vigneau, L. C. Camenzind, A. V. Kuhlmann, G. A. D. Briggs, D. M. Zumbühl, D. Sejdinovic, and N. Ares, Identifying Pauli Spin Blockade using Deep Learning, Quantum, vol. 7, 1077, 2023.

    Pauli spin blockade (PSB) can be employed as a great resource for spin qubit initialisation and readout even at elevated temperatures but it can be difficult to identify. We present a machine learning algorithm capable of automatically identifying PSB using charge transport measurements. The scarcity of PSB data is circumvented by training the algorithm with simulated data and by using cross-device validation. We demonstrate our approach on a silicon field-effect transistor device and report an accuracy of 96% on different test devices, giving evidence that the approach is robust to device variability. Our algorithm, an essential step for realising fully automatic qubit tuning, is expected to be employable across all types of quantum dot devices.

    @article{Schuff2023,
      title = {{Identifying Pauli Spin Blockade using Deep Learning}},
      author = {Schuff, Jonas and Lennon, Dominic T. and Geyer, Simon and Craig, David L. and Fedele, Federico and Vigneau, Florian and Camenzind, Leon C. and Kuhlmann, Andreas V. and Briggs, G. Andrew D. and Zumbühl, Dominik M. and Sejdinovic, Dino and Ares, Natalia},
      year = {2023},
      journal = {Quantum},
      issn = {2521-327X},
      volume = {7},
      pages = {1077},
      doi = {10.22331/q-2023-08-08-1077},
      url = {https://doi.org/10.22331/q-2023-08-08-1077},
      arxiv = {https://arxiv.org/abs/2202.00574}
    }
    
  17. R. Hu and D. Sejdinovic, Towards Deep Interpretable Features, Journal of Computational Mathematics and Data Science, vol. 6, 100067, 2023.

    The problem of interpretability for binary image classification is considered through the lens of kernel two-sample tests and generative modeling. A feature extraction framework coined Deep Interpretable Features is developed, which is used in combination with IntroVAE, a generative model capable of high-resolution image synthesis. Experimental results on a variety of datasets, including COVID-19 chest x-rays, demonstrate the benefits of combining deep generative models with the ideas from kernel-based hypothesis testing in moving towards more robust interpretable deep generative models.

    @article{HuSej2023dif,
      title = {{Towards Deep Interpretable Features}},
      author = {Hu, Robert and Sejdinovic, Dino},
      year = {2023},
      volume = {6},
      pages = {100067},
      url = {https://doi.org/10.1016/j.jcmds.2022.100067},
      doi = {10.1016/j.jcmds.2022.100067},
      journal = {Journal of Computational Mathematics and Data Science}
    }
    
  18. Z. Li, W. Su, and D. Sejdinovic, Benign Overfitting and Noisy Features, Journal of the American Statistical Association, vol. 118, no. 544, 2876–2888, 2023.

    Modern machine learning models often exhibit the benign overfitting phenomenon, which has recently been characterized using the double descent curves. In addition to the classical U-shaped learning curve, the learning risk undergoes another descent as we increase the number of parameters beyond a certain threshold. In this paper, we examine the conditions under which benign overfitting occurs in the random feature (RF) models, i.e., in a two-layer neural network with fixed first layer weights. Adopting a novel view of random features, we show that benign overfitting emerges because of the noise residing in such features. The noise may already exist in the data and propagates to the features, or it may be added by the user to the features directly. Such noise plays an implicit yet crucial regularization role in the phenomenon. In addition, we derive the explicit trade-off between the number of parameters and the prediction accuracy, and for the first time demonstrate that overparameterized models can achieve the optimal learning rate in the minimax sense. Finally, our results indicate that the learning risk for overparameterized models exhibits multiple, rather than double, descent behavior, which is empirically verified in recent works.

    @article{LiSuSej2023,
      author = {Li, Zhu and Su, Weijie and Sejdinovic, Dino},
      title = {{Benign Overfitting and Noisy Features}},
      journal = {Journal of the American Statistical Association},
      arxiv = {https://arxiv.org/abs/2008.02901},
      volume = {118},
      number = {544},
      pages = {2876--2888},
      year = {2023},
      url = {https://doi.org/10.1080/01621459.2022.2093206},
      doi = {10.1080/01621459.2022.2093206}
    }
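
    A small numpy sketch (a toy illustration, not the paper's setting) of random feature regression in the regimes the double/multiple descent analysis concerns: a fixed random first layer with the second layer fit by the minimum-norm least-squares solution, which interpolates the noisy training labels once the number of features exceeds the number of samples.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 50, 1
    X = rng.uniform(-3, 3, size=(n, d))
    y = np.sin(X[:, 0]) + 0.3 * rng.normal(size=n)          # noisy labels
    X_test = np.linspace(-3, 3, 200)[:, None]
    y_test = np.sin(X_test[:, 0])

    def rff(X, W, b):
        # Random Fourier features: fixed random first layer with a cosine nonlinearity.
        return np.sqrt(2.0 / W.shape[1]) * np.cos(X @ W + b)

    for n_features in [10, 50, 2000]:                       # under-, critically-, over-parameterised
        W = rng.normal(size=(d, n_features))
        b = rng.uniform(0, 2 * np.pi, size=n_features)
        Phi = rff(X, W, b)
        # Minimum-norm least-squares fit of the second layer (interpolates when n_features >= n).
        coef = np.linalg.pinv(Phi) @ y
        test_err = np.mean((rff(X_test, W, b) @ coef - y_test)**2)
        print(n_features, round(test_err, 3))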
    
  19. T. Fernandez, A. Gretton, D. Rindt, and D. Sejdinovic, A Kernel Log-Rank Test of Independence for Right-Censored Data, Journal of the American Statistical Association, vol. 118, no. 542, 925–936, 2023.

    With the incorporation of new data gathering methods in clinical research, it becomes fundamental for survival analysis techniques to deal with high-dimensional and/or non-standard covariates. In this paper we introduce a general non-parametric independence test between right-censored survival times and covariates taking values on a general (not necessarily Euclidean) space X. We show that our test statistic has a dual interpretation, first in terms of the supremum of a potentially infinite collection of weight-indexed log-rank tests, with weight functions belonging to a reproducing kernel Hilbert space (RKHS) of functions; and second, as the norm of the difference of embeddings of certain finite measures into the RKHS, similar to the Hilbert-Schmidt Independence Criterion (HSIC) test-statistic. We study the asymptotic properties of the test, finding sufficient conditions to ensure that our test is omnibus. The test statistic can be computed straightforwardly, and the rejection threshold is obtained via an asymptotically consistent Wild-Bootstrap procedure. We perform extensive simulations demonstrating that our testing procedure generally performs better than competing approaches in detecting complex nonlinear dependence.

    @article{FerGreRinSej2021,
      author = {Fernandez, Tamara and Gretton, Arthur and Rindt, David and Sejdinovic, Dino},
      title = {{A Kernel Log-Rank Test of Independence for Right-Censored Data}},
      journal = {Journal of the American Statistical Association},
      arxiv = {https://arxiv.org/abs/1912.03784},
      volume = {118},
      number = {542},
      pages = {925--936},
      year = {2023},
      url = {https://doi.org/10.1080/01621459.2021.1961784},
      doi = {10.1080/01621459.2021.1961784}
    }
    
  20. R. Hu, S. L. Chau, J. F. Huertas, and D. Sejdinovic, Explaining Preferences with Shapley Values, in Advances in Neural Information Processing Systems (NeurIPS), 2022.

    While preference modelling is becoming one of the pillars of machine learning, the problem of preference explanation remains challenging and underexplored. In this paper, we propose Pref-SHAP, a Shapley value-based model explanation framework for pairwise comparison data. We derive the appropriate value functions for preference models and further extend the framework to model and explain context specific information, such as the surface type in a tennis game. To demonstrate the utility of Pref-SHAP, we apply our method to a variety of synthetic and real-world datasets and show that richer and more insightful explanations can be obtained over the baseline.

    @inproceedings{HuChaHueSej2022,
      title = {{Explaining Preferences with Shapley Values}},
      author = {Hu, Robert and Chau, Siu Lun and Huertas, Jaime Ferrando and Sejdinovic, Dino},
      year = {2022},
      booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
      arxiv = {https://arxiv.org/abs/2205.13662},
      url = {https://proceedings.neurips.cc/paper_files/paper/2022/hash/b1656d20067ca7c84a33785c4083a75e-Abstract-Conference.html}
    }
    
  21. V. D. Wild, R. Hu, and D. Sejdinovic, Generalized Variational Inference in Function Spaces: Gaussian Measures meet Bayesian Deep Learning, in Advances in Neural Information Processing Systems (NeurIPS), 2022.

    We develop a framework for generalized variational inference in infinite-dimensional function spaces and use it to construct a method termed Gaussian Wasserstein inference (GWI). GWI leverages the Wasserstein distance between Gaussian measures on the Hilbert space of square-integrable functions in order to determine a variational posterior using a tractable optimisation criterion and avoids pathologies arising in standard variational function space inference. An exciting application of GWI is the ability to use deep neural networks in its variational parametrisation, combining their superior predictive performance with the principled uncertainty quantification analogous to that of Gaussian processes. The proposed method obtains state-of-the-art performance on several benchmark datasets.

    @inproceedings{WilHuSej2022,
      title = {{Generalized Variational Inference in Function Spaces: Gaussian Measures meet Bayesian Deep Learning}},
      author = {Wild, Veit D. and Hu, Robert and Sejdinovic, Dino},
      year = {2022},
      booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
      arxiv = {https://arxiv.org/abs/2205.06342},
      url = {https://proceedings.neurips.cc/paper_files/paper/2022/hash/18210aa6209b9adfc97b8c17c3741d95-Abstract-Conference.html}
    }
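
    Since GWI's optimisation criterion is built around the Wasserstein distance between Gaussian measures, the following minimal sketch of the squared 2-Wasserstein distance between finite-dimensional Gaussians may help fix ideas. The multivariate-Gaussian setting and the function name are illustrative assumptions; this is not the paper's function-space implementation.

    # Minimal sketch (assumption: finite-dimensional Gaussians rather than Gaussian
    # measures on a function space): squared 2-Wasserstein distance between
    # N(m1, C1) and N(m2, C2), the quantity GWI's criterion is built around.
    import numpy as np
    from scipy.linalg import sqrtm

    def gaussian_w2_squared(m1, C1, m2, C2):
        """W2^2 = ||m1 - m2||^2 + Tr(C1 + C2 - 2 (C1^{1/2} C2 C1^{1/2})^{1/2})."""
        sqrt_C1 = sqrtm(C1)
        cross = sqrtm(sqrt_C1 @ C2 @ sqrt_C1)
        # sqrtm may return tiny imaginary parts for symmetric PSD inputs; keep the real part.
        return float(np.sum((m1 - m2) ** 2) + np.trace(C1 + C2 - 2 * np.real(cross)))

    # Toy usage with isotropic covariances.
    m1, m2 = np.zeros(3), np.ones(3)
    C1, C2 = np.eye(3), 2 * np.eye(3)
    print(gaussian_w2_squared(m1, C1, m2, C2))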
    
  22. R. Hu, S. L. Chau, D. Sejdinovic, and J. A. Glaunes, Giga-scale Kernel Matrix Vector Multiplication on GPU, in Advances in Neural Information Processing Systems (NeurIPS), 2022.

    Kernel matrix-vector multiplication (KMVM) is a foundational operation in machine learning and scientific computing. However, as KMVM tends to scale quadratically in both memory and time, applications are often limited by these computational constraints. In this paper, we propose a novel approximation procedure coined Faster-Fast and Free Memory Method (F3M) to address these scaling issues of KMVM for tall (10⁸∼10⁹) and skinny (D≤7) data. Extensive experiments demonstrate that F3M has empirical linear time and memory complexity with a relative error of order 10⁻³ and can compute a full KMVM for a billion points in under a minute on a high-end GPU, leading to a significant speed-up in comparison to existing CPU methods. We demonstrate the utility of our procedure by applying it as a drop-in for the state-of-the-art GPU-based linear solver FALKON, improving speed 1.5–5.5 times at the cost of <1% drop in accuracy. We further demonstrate competitive results on Gaussian Process regression coupled with significant speedups on a variety of real-world datasets.

    @inproceedings{HuSejGla2022,
      title = {{Giga-scale Kernel Matrix Vector Multiplication on GPU}},
      author = {Hu, Robert and Chau, Siu Lun and Sejdinovic, Dino and Glaunes, Joan Alexis},
      year = {2022},
      booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
      arxiv = {https://arxiv.org/abs/2202.01085},
      url = {https://proceedings.neurips.cc/paper_files/paper/2022/hash/3b1f32693e9fe15c949a0742bf226803-Abstract-Conference.html}
    }
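
    To make the scaling issue concrete, the following is a minimal sketch of the exact kernel matrix-vector product that F3M approximates; the Gaussian kernel, the blocking scheme, and all names are illustrative assumptions, and this is not the F3M algorithm itself.

    # Minimal sketch (not F3M): the exact kernel matrix-vector product, which costs
    # O(N*M) time even when blocked so the full kernel matrix is never stored.
    import numpy as np

    def kmvm_exact(X, Y, v, lengthscale=1.0, block=1024):
        """Compute K(X, Y) @ v for a Gaussian kernel, blocking over rows of X."""
        out = np.empty(X.shape[0])
        for start in range(0, X.shape[0], block):
            Xb = X[start:start + block]
            sq = ((Xb[:, None, :] - Y[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
            out[start:start + block] = np.exp(-sq / (2 * lengthscale ** 2)) @ v
        return out

    rng = np.random.default_rng(0)
    X = rng.normal(size=(5000, 3))   # "tall and skinny" data, far smaller than the paper's 10^8-10^9 points
    v = rng.normal(size=5000)
    print(kmvm_exact(X, X, v)[:5])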
    
  23. S. L. Chau, R. Hu, J. Gonzalez, and D. Sejdinovic, RKHS-SHAP: Shapley Values for Kernel Methods, in Advances in Neural Information Processing Systems (NeurIPS), 2022.

    Feature attribution for kernel methods is often heuristic and not individualised for each prediction. To address this, we turn to the concept of Shapley values (SV), a coalition game theoretical framework that has previously been applied to different machine learning model interpretation tasks, such as linear models, tree ensembles and deep networks. By analysing SVs from a functional perspective, we propose RKHS-SHAP, an attribution method for kernel machines that can efficiently compute both Interventional and Observational Shapley values using kernel mean embeddings of distributions. We show theoretically that our method is robust with respect to local perturbations - a key yet often overlooked desideratum for consistent model interpretation. Further, we propose a Shapley regulariser, applicable to a general empirical risk minimisation framework, allowing learning while controlling the level of specific features' contributions to the model. We demonstrate that the Shapley regulariser enables learning which is robust to covariate shift of a given feature and fair learning which controls the SVs of sensitive features.

    @inproceedings{ChaHuGonSej2022,
      title = {{RKHS-SHAP: Shapley Values for Kernel Methods}},
      author = {Chau, Siu Lun and Hu, Robert and Gonzalez, Javier and Sejdinovic, Dino},
      year = {2022},
      booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
      arxiv = {https://arxiv.org/abs/2110.09167},
      url = {https://proceedings.neurips.cc/paper_files/paper/2022/hash/54bb63eaec676b87a2278a22b1bd02a2-Abstract-Conference.html}
    }
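
    As background for the attribution framework, the following minimal sketch computes exact Shapley values for a toy cooperative game. The additive value function is an illustrative assumption; it does not implement RKHS-SHAP's kernel-mean-embedding value functions, which make this kind of computation tractable for kernel machines.

    # Minimal sketch: exact Shapley values for a toy game,
    # phi_i = sum_S |S|! (n - |S| - 1)! / n! * (v(S u {i}) - v(S)).
    from itertools import combinations
    from math import factorial

    def shapley_values(players, value):
        n = len(players)
        phi = {}
        for i in players:
            others = [p for p in players if p != i]
            total = 0.0
            for k in range(n):
                for S in combinations(others, k):
                    w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                    total += w * (value(set(S) | {i}) - value(set(S)))
            phi[i] = total
        return phi

    # Toy game: the value of a coalition is the weighted sum of its members,
    # so each player's Shapley value recovers its own weight.
    weights = {"x1": 2.0, "x2": 1.0, "x3": 0.5}
    print(shapley_values(list(weights), lambda S: sum(weights[p] for p in S)))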
    
  24. Z. Li, A. Perez-Suay, G. Camps-Valls, and D. Sejdinovic, Kernel Dependence Regularizers and Gaussian Processes with Applications to Algorithmic Fairness, Pattern Recognition, vol. 132, 108922, 2022.

    Current adoption of machine learning in industrial, societal and economic activities has raised concerns about the fairness, equity and ethics of automated decisions. Predictive models are often developed using biased datasets and thus retain or even exacerbate biases in their decisions and recommendations. Removing the sensitive covariates, such as gender or race, is insufficient to remedy this issue since the biases may be retained due to other related covariates. We present a regularization approach to this problem that trades off predictive accuracy of the learned models (with respect to biased labels) for the fairness in terms of statistical parity, i.e. independence of the decisions from the sensitive covariates. In particular, we consider a general framework of regularized empirical risk minimization over reproducing kernel Hilbert spaces and impose an additional regularizer of dependence between predictors and sensitive covariates using kernel-based measures of dependence, namely the Hilbert-Schmidt Independence Criterion (HSIC) and its normalized version. This approach leads to a closed-form solution in the case of squared loss, i.e. ridge regression. Moreover, we show that the dependence regularizer has an interpretation as modifying the corresponding Gaussian process (GP) prior. As a consequence, a GP model with a prior that encourages fairness to sensitive variables can be derived, allowing principled hyperparameter selection and the study of the relative relevance of covariates under fairness constraints. Experimental results in synthetic examples and in real problems of income and crime prediction illustrate the potential of the approach to improve fairness of automated decisions.

    @article{LiPerCamSej2022,
      author = {Li, Zhu and Perez-Suay, Adrian and Camps-Valls, Gustau and Sejdinovic, Dino},
      title = {{{Kernel Dependence Regularizers and Gaussian Processes with Applications to Algorithmic Fairness}}},
      journal = {Pattern Recognition},
      arxiv = {https://arxiv.org/abs/1911.04322},
      year = {2022},
      volume = {132},
      pages = {108922},
      url = {https://doi.org/10.1016/j.patcog.2022.108922},
      doi = {10.1016/j.patcog.2022.108922}
    }
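
    The following minimal sketch computes the (biased) empirical HSIC that such a dependence regularizer penalises, assuming Gaussian kernels; it illustrates the dependence measure only and is not the paper's regularized ridge-regression or GP formulation.

    # Minimal sketch (Gaussian kernels assumed): biased empirical HSIC,
    # HSIC_n = trace(K H L H) / n^2, with H the centering matrix.
    import numpy as np

    def gaussian_gram(Z, lengthscale=1.0):
        sq = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
        return np.exp(-sq / (2 * lengthscale ** 2))

    def hsic_biased(A, B, lengthscale=1.0):
        n = A.shape[0]
        H = np.eye(n) - np.ones((n, n)) / n          # centering matrix
        K = gaussian_gram(A, lengthscale)            # kernel on predictions / covariates
        L = gaussian_gram(B, lengthscale)            # kernel on sensitive covariates
        return np.trace(K @ H @ L @ H) / n ** 2

    rng = np.random.default_rng(0)
    S = rng.normal(size=(200, 1))
    dependent = S + 0.1 * rng.normal(size=(200, 1))
    independent = rng.normal(size=(200, 1))
    print(hsic_biased(dependent, S), hsic_biased(independent, S))  # the dependent pair scores higher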
    
  25. S. L. Chau, M. Cucuringu, and D. Sejdinovic, Spectral Ranking with Covariates, in European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD), 2022.

    We consider spectral approaches to the problem of ranking n players given their incomplete and noisy pairwise comparisons, but revisit this classical problem in light of player covariate information. We propose three spectral ranking methods that incorporate player covariates and are based on seriation, low-rank structure assumption and canonical correlation, respectively. Extensive numerical simulations on both synthetic and real-world data sets demonstrate that our proposed methods compare favorably to existing state-of-the-art covariate-based ranking algorithms.

    @inproceedings{ChaCucSej2022,
      title = {{{Spectral Ranking with Covariates}}},
      author = {Chau, Siu Lun and Cucuringu, Mihai and Sejdinovic, Dino},
      booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD)},
      arxiv = {https://arxiv.org/abs/2005.04035},
      url = {https://doi.org/10.1007/978-3-031-26419-1_5},
      doi = {10.1007/978-3-031-26419-1_5},
      year = {2022}
    }
    
  26. J. Cortés-Andrés, G. Camps-Valls, S. Sippel, E. M. Székely, D. Sejdinovic, E. Díaz, A. Pérez-Suay, Z. Li, M. D. Mahecha, and M. Reichstein, Physics-Aware Nonparametric Regression Models for Earth Data Analysis, Environmental Research Letters, vol. 17, no. 5, 054034, 2022.

    Process understanding and modeling is at the core of scientific reasoning. Principled parametric and mechanistic modeling dominated science and engineering until the recent emergence of machine learning. Despite great success in many areas, machine learning algorithms in the Earth and climate sciences, and more broadly in physical sciences, are not explicitly designed to be physically-consistent and may, therefore, violate the most basic laws of physics. In this work, motivated by the field of algorithmic fairness, we reconcile data-driven machine learning with physics modeling by illustrating a nonparametric and nonlinear physics-aware regression method. By incorporating a dependence-based regularizer, the method leads to models that are consistent with domain knowledge, as reflected by either simulations from physical models or ancillary data. The idea can conversely encourage independence of model predictions with other variables that are known to be uncertain either in their representation or magnitude. The method is computationally efficient and comes with a closed-form analytic solution. Through a consistency-vs-accuracy path diagram, one can assess the consistency between data-driven models and physical models. We demonstrate in three examples on simulations and measurement data in Earth and climate studies that the proposed machine learning framework allows us to trade-off physical consistency and accuracy.

    @article{Cortesetal2022,
      author = {Cortés-Andrés, Jordi and Camps-Valls, Gustau and Sippel, Sebastian and Székely, Eniko Melinda and Sejdinovic, Dino and Díaz, Emiliano and Pérez-Suay, Adrián and Li, Zhu and Mahecha, Miguel D. and Reichstein, Markus},
      title = {Physics-Aware Nonparametric Regression Models for Earth Data Analysis},
      journal = {Environmental Research Letters},
      url = {http://iopscience.iop.org/article/10.1088/1748-9326/ac6762},
      doi = {10.1088/1748-9326/ac6762},
      year = {2022},
      volume = {17},
      number = {5},
      pages = {054034}
    }
    
  27. Q. Zhang, V. Wild, S. Filippi, S. Flaxman, and D. Sejdinovic, Bayesian Kernel Two-Sample Testing, Journal of Computational and Graphical Statistics, vol. 31, no. 4, 1164–1176, 2022.

    In modern data analysis, nonparametric measures of discrepancies between random variables are particularly important. The subject is well-studied in the frequentist literature, while development in the Bayesian setting is limited, with applications often restricted to univariate cases. Here, we propose a Bayesian kernel two-sample testing procedure based on modelling the difference between kernel mean embeddings in the reproducing kernel Hilbert space, utilising the framework established by Flaxman et al. (2016). The use of kernel methods enables its application to random variables in generic domains beyond multivariate Euclidean spaces. The proposed procedure results in a posterior inference scheme that allows an automatic selection of the kernel parameters relevant to the problem at hand. In a series of synthetic experiments and two real data experiments (i.e. testing network heterogeneity from high-dimensional data and six-membered monocyclic ring conformation comparison), we illustrate the advantages of our approach.

    @article{ZhaFilFlaSej2020,
      title = {{{Bayesian Kernel Two-Sample Testing}}},
      author = {Zhang, Q. and Wild, V. and Filippi, S. and Flaxman, S. and Sejdinovic, D.},
      journal = {Journal of Computational and Graphical Statistics},
      arxiv = {https://arxiv.org/abs/2002.05550},
      volume = {31},
      number = {4},
      pages = {1164--1176},
      year = {2022},
      url = {https://doi.org/10.1080/10618600.2022.2067547},
      doi = {10.1080/10618600.2022.2067547}
    }
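
    For intuition, the following minimal sketch computes the unbiased estimate of the squared MMD between the kernel mean embeddings of two samples, with a Gaussian kernel assumed; the Bayesian posterior over the embedding difference constructed in the paper is not implemented here.

    # Minimal sketch: unbiased estimate of MMD^2 between two samples
    # (the discrepancy whose embedding-difference representation the paper models).
    import numpy as np

    def gaussian_kernel(A, B, lengthscale=1.0):
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-sq / (2 * lengthscale ** 2))

    def mmd2_unbiased(X, Y, lengthscale=1.0):
        n, m = len(X), len(Y)
        Kxx, Kyy, Kxy = (gaussian_kernel(A, B, lengthscale) for A, B in ((X, X), (Y, Y), (X, Y)))
        term_x = (Kxx.sum() - np.trace(Kxx)) / (n * (n - 1))
        term_y = (Kyy.sum() - np.trace(Kyy)) / (m * (m - 1))
        return term_x + term_y - 2 * Kxy.mean()

    rng = np.random.default_rng(0)
    X1, X2 = rng.normal(0.0, 1.0, size=(300, 2)), rng.normal(0.0, 1.0, size=(300, 2))
    Y = rng.normal(0.5, 1.0, size=(300, 2))
    print(mmd2_unbiased(X1, X2), mmd2_unbiased(X1, Y))  # near zero vs clearly positive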
    
  28. A. Schrab, W. Jitkrittum, Z. Szabo, D. Sejdinovic, and A. Gretton, Discussion of ‘Multiscale Fisher’s Independence Test for Multivariate Dependence,’ Biometrika, vol. 109, no. 3, 597–603, 2022.

    We discuss how MultiFIT, the Multiscale Fisher’s Independence Test for Multivariate Dependence proposed by Gorsky and Ma (2022), compares to existing linear-time kernel tests based on the Hilbert-Schmidt independence criterion (HSIC). We highlight the fact that the levels of the kernel tests at any finite sample size can be controlled exactly, as it is the case with the level of MultiFIT. In our experiments, we observe some of the performance limitations of MultiFIT in terms of test power.

    @article{SchJitSzaSejGre2022,
      author = {Schrab, Antonin and Jitkrittum, Wittawat and Szabo, Zoltan and Sejdinovic, Dino and Gretton, Arthur},
      title = {{Discussion of `Multiscale Fisher’s Independence Test for Multivariate Dependence'}},
      journal = {Biometrika},
      volume = {109},
      number = {3},
      pages = {597--603},
      url = {https://doi.org/10.1093/biomet/asac028},
      doi = {10.1093/biomet/asac028},
      arxiv = {https://arxiv.org/abs/2206.11142},
      year = {2022}
    }
    
  29. D. Rindt, R. Hu, D. Steinsaltz, and D. Sejdinovic, Survival Regression with Proper Scoring Rules and Monotonic Neural Networks, in International Conference on Artificial Intelligence and Statistics (AISTATS), 2022, PMLR 151:1190–1205.

    We consider frequently used scoring rules for right-censored survival regression models such as time-dependent concordance, survival-CRPS, integrated Brier score and integrated binomial log-likelihood, and prove that none of them is a proper scoring rule. This means that the true survival distribution may be scored worse than incorrect distributions, leading to inaccurate estimation. We prove, in contrast to these scores, that the right-censored log-likelihood is a proper scoring rule, i.e. the highest expected score is achieved by the true distribution. Despite this, modern feed-forward neural-network-based survival regression models are unable to train and validate directly on right-censored log-likelihood, due to its intractability, and resort to the aforementioned alternatives, i.e. non-proper scoring rules. We therefore propose a simple novel survival regression method capable of directly optimizing log-likelihood using a monotonic restriction on the time-dependent weights, coined SurvivalMonotonic-net (SuMo-net). SuMo-net achieves state-of-the-art log-likelihood scores across several datasets with 20–100x computational speedup on inference over existing state-of-the-art neural methods and is readily applicable to datasets with several million observations.

    @inproceedings{RinHuSteSej2022,
      title = {Survival Regression with Proper Scoring Rules and Monotonic Neural Networks},
      author = {Rindt, David and Hu, Robert and Steinsaltz, David and Sejdinovic, Dino},
      year = {2022},
      booktitle = {International Conference on Artificial Intelligence and Statistics (AISTATS)},
      arxiv = {https://arxiv.org/abs/2103.14755},
      url = {https://proceedings.mlr.press/v151/rindt22a.html},
      pages = {PMLR 151:1190--1205}
    }
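
    The following is a minimal sketch of the right-censored log-likelihood score discussed above, in which uncensored observations contribute log f(t|x) and censored observations contribute log S(t|x); an exponential survival model is assumed purely for illustration (the paper instead parameterises the survival function with a monotonic neural network).

    # Minimal sketch (exponential model assumed): right-censored log-likelihood,
    # the score the paper identifies as proper for survival regression.
    import numpy as np

    def right_censored_loglik(times, events, rate):
        """times: observed times; events: 1 if the event was observed, 0 if right-censored.
        For an exponential model, log f(t) = log(rate) - rate*t and log S(t) = -rate*t."""
        times, events = np.asarray(times, float), np.asarray(events, int)
        log_f = np.log(rate) - rate * times
        log_S = -rate * times
        return float(np.sum(events * log_f + (1 - events) * log_S))

    times = [2.0, 5.0, 3.5, 7.0]
    events = [1, 0, 1, 0]            # the second and fourth observations are censored
    for rate in (0.1, 0.2, 0.5):
        print(rate, right_censored_loglik(times, events, rate))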
    
  30. S. L. Chau, J. Gonzalez, and D. Sejdinovic, Learning Inconsistent Preferences with Gaussian Processes, in International Conference on Artificial Intelligence and Statistics (AISTATS), 2022, PMLR 151:2266–2281.

    We revisit widely used preferential Gaussian processes (PGP) by Chu and Ghahramani [2005] and challenge their modelling assumption that imposes rankability of data items via latent utility function values. We propose a generalisation of PGP which can capture more expressive latent preferential structures in the data and thus be used to model inconsistent preferences, i.e. where transitivity is violated, or to discover clusters of comparable items via spectral decomposition of the learned preference functions. We also consider the properties of the associated covariance kernel functions and their reproducing kernel Hilbert space (RKHS), giving a simple construction that satisfies universality in the space of preference functions. Finally, we provide an extensive set of numerical experiments on simulated and real-world datasets showcasing the competitiveness of our proposed method with the state of the art. Our experimental findings support the conjecture that violations of rankability are ubiquitous in real-world preferential data.

    @inproceedings{ChaGonSej2022,
      author = {Chau, Siu Lun and Gonzalez, Javier and Sejdinovic, Dino},
      title = {{{Learning Inconsistent Preferences with Gaussian Processes}}},
      booktitle = {International Conference on Artificial Intelligence and Statistics (AISTATS)},
      arxiv = {https://arxiv.org/abs/2006.03847},
      year = {2022},
      url = {https://proceedings.mlr.press/v151/lun-chau22a.html},
      pages = {PMLR 151:2266--2281}
    }
    
  31. J. Fawkes, R. Evans, and D. Sejdinovic, Selection, Ignorability and Challenges with Causal Fairness, in Conference on Causal Learning and Reasoning (CLeaR), 2022, PMLR 177:275–289.

    In this paper we look at popular fairness methods that use causal counterfactuals. These methods capture the intuitive notion that a prediction is fair if it coincides with the prediction that would have been made if someone’s race, gender or religion were counterfactually different. In order to achieve this, we must have causal models that are able to capture what someone would be like if we were to counterfactually change these traits. However, we argue that any model that can do this must lie outside the particularly well behaved class that is commonly considered in the fairness literature. This is because in fairness settings, models in this class entail a particularly strong causal assumption, normally only seen in a randomised controlled trial. We argue that in general this is unlikely to hold. Furthermore, we show in many cases it can be explicitly rejected due to the fact that samples are selected from a wider population. We show this creates difficulties for counterfactual fairness as well as for the application of more general causal fairness methods.

    @inproceedings{FawEvaSej2022,
      title = {Selection, Ignorability and Challenges with Causal Fairness},
      author = {Fawkes, Jake and Evans, Robin and Sejdinovic, Dino},
      booktitle = {Conference on Causal Learning and Reasoning (CLeaR)},
      year = {2022},
      arxiv = {https://arxiv.org/abs/2202.13774},
      url = {https://proceedings.mlr.press/v177/fawkes22a.html},
      pages = {PMLR 177:275--289}
    }
    
  32. V. C. Bradley, S. Kuriwaki, M. Isakov, D. Sejdinovic, X.-L. Meng, and S. Flaxman, Unrepresentative Big Surveys Significantly Overestimated US Vaccine Uptake, Nature, vol. 600, 695–700, 2021.

    Surveys are a crucial tool for understanding public opinion and behaviour, and their accuracy depends on maintaining statistical representativeness of their target populations by minimizing biases from all sources. Increasing data size shrinks confidence intervals but magnifies the effect of survey bias: an instance of the Big Data Paradox. Here we demonstrate this paradox in estimates of first-dose COVID-19 vaccine uptake in US adults from 9 January to 19 May 2021 from two large surveys: Delphi–Facebook (about 250,000 responses per week) and Census Household Pulse (about 75,000 every two weeks). In May 2021, Delphi–Facebook overestimated uptake by 17 percentage points (14–20 percentage points with 5% benchmark imprecision) and Census Household Pulse by 14 (11–17 percentage points with 5% benchmark imprecision), compared to a retroactively updated benchmark the Centers for Disease Control and Prevention published on 26 May 2021. Moreover, their large sample sizes led to minuscule margins of error on the incorrect estimates. By contrast, an Axios–Ipsos online panel with about 1,000 responses per week following survey research best practices provided reliable estimates and uncertainty quantification. We decompose observed error using a recent analytic framework to explain the inaccuracy in the three surveys. We then analyse the implications for vaccine hesitancy and willingness. We show how a survey of 250,000 respondents can produce an estimate of the population mean that is no more accurate than an estimate from a simple random sample of size 10. Our central message is that data quality matters more than data quantity, and that compensating the former with the latter is a mathematically provable losing proposition.

    @article{BraKurIsaSejMenFla2021,
      title = {{Unrepresentative Big Surveys Significantly Overestimated US Vaccine Uptake}},
      author = {Bradley, Valerie C. and Kuriwaki, Shiro and Isakov, Michael and Sejdinovic, Dino and Meng, Xiao-Li and Flaxman, Seth},
      year = {2021},
      journal = {Nature},
      volume = {600},
      pages = {695--700},
      doi = {10.1038/s41586-021-04198-4},
      url = {https://doi.org/10.1038/s41586-021-04198-4},
      arxiv = {https://arxiv.org/abs/2106.05818}
    }
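
    As context for the error decomposition mentioned above, the identity popularised by Meng (2018), on which this style of analysis builds, can be sketched as follows (stated here from that reference rather than from the paper itself): for a sample of size n drawn from a population of size N,

    \bar{Y}_n - \bar{Y}_N \;=\; \hat{\rho}_{R,Y} \, \sqrt{\frac{N-n}{n}} \, \sigma_Y ,

    where \hat{\rho}_{R,Y} is the data defect correlation between the recording indicator R and the outcome Y, and \sigma_Y is the population standard deviation. Even a tiny defect correlation is amplified by the \sqrt{(N-n)/n} factor when N is large, which is why a huge but unrepresentative sample can be no more informative than a small random one.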
    
  33. S. L. Chau, S. Bouabid, and D. Sejdinovic, Deconditional Downscaling with Gaussian Processes, in Advances in Neural Information Processing Systems (NeurIPS), 2021, vol. 34, 17813–17825.

    Refining low-resolution (LR) spatial fields with high-resolution (HR) information is challenging as the diversity of spatial datasets often prevents direct matching of observations. Yet, when LR samples are modeled as aggregate conditional means of HR samples with respect to a mediating variable that is globally observed, the recovery of the underlying fine-grained field can be framed as taking an "inverse" of the conditional expectation, namely a deconditioning problem. In this work, we introduce conditional mean processes (CMP), a new class of Gaussian Processes describing conditional means. By treating CMPs as inter-domain features of the underlying field, a posterior for the latent field can be established as a solution to the deconditioning problem. Furthermore, we show that this solution can be viewed as a two-staged vector-valued kernel ridge regressor and show that it has a minimax optimal convergence rate under mild assumptions. Lastly, we demonstrate its proficiency in a synthetic and a real-world atmospheric field downscaling problem, showing substantial improvements over existing methods.

    @inproceedings{ChaBouSej2021,
      title = {{Deconditional Downscaling with Gaussian Processes}},
      author = {Chau, Siu Lun and Bouabid, Shahine and Sejdinovic, Dino},
      year = {2021},
      volume = {34},
      pages = {17813--17825},
      booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
      arxiv = {https://arxiv.org/abs/2105.12909},
      url = {https://papers.nips.cc/paper/2021/hash/94aef38441efa3380a3bed3faf1f9d5d-Abstract.html}
    }
    
  34. S. L. Chau, J.-F. Ton, J. Gonzalez, Y. W. Teh, and D. Sejdinovic, BayesIMP: Uncertainty Quantification for Causal Data Fusion, in Advances in Neural Information Processing Systems (NeurIPS), 2021, vol. 34, 3466–3477.

    While causal models are becoming one of the mainstays of machine learning, the problem of uncertainty quantification in causal inference remains challenging. In this paper, we study the causal data fusion problem, where datasets pertaining to multiple causal graphs are combined to estimate the average treatment effect of a target variable. As data arises from multiple sources and can vary in quality and quantity, principled uncertainty quantification becomes essential. To that end, we introduce Bayesian Interventional Mean Processes, a framework which combines ideas from probabilistic integration and kernel mean embeddings to represent interventional distributions in the reproducing kernel Hilbert space, while taking into account the uncertainty within each causal graph. To demonstrate the utility of our uncertainty estimation, we apply our method to the Causal Bayesian Optimisation task and show improvements over state-of-the-art methods.

    @inproceedings{ChaTonGonTehSej2021,
      title = {{BayesIMP: Uncertainty Quantification for Causal Data Fusion}},
      author = {Chau, Siu Lun and Ton, Jean-Francois and Gonzalez, Javier and Teh, Yee Whye and Sejdinovic, Dino},
      year = {2021},
      booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
      arxiv = {https://arxiv.org/abs/2106.03477},
      volume = {34},
      pages = {3466--3477},
      url = {https://papers.nips.cc/paper/2021/hash/1ca5c750a30312d1919ae6a4d636dcc4-Abstract.html}
    }
    
  35. R. Hu, G. K. Nicholls, and D. Sejdinovic, Large Scale Tensor Regression using Kernels and Variational Inference, Machine Learning, vol. 111, 2663–2713, 2021.

    We outline an inherent flaw of tensor factorization models when latent factors are expressed as a function of side information and propose a novel method to mitigate this. We coin our methodology kernel fried tensor (KFT) and present it as a large-scale prediction and forecasting tool for high-dimensional data. Our results show superior performance relative to LightGBM and field-aware factorization machines (FFM), two algorithms with proven track records, widely used in large-scale prediction. We also develop a variational inference framework for KFT which enables associating the predictions and forecasts with calibrated uncertainty estimates on several datasets.

    @article{HuNicSej2021,
      title = {{{Large Scale Tensor Regression using Kernels and Variational Inference}}},
      author = {Hu, R. and Nicholls, G. K. and Sejdinovic, D.},
      journal = {Machine Learning},
      arxiv = {https://arxiv.org/abs/2002.04704},
      year = {2021},
      volume = {111},
      pages = {2663--2713},
      url = {https://doi.org/10.1007/s10994-021-06067-7},
      doi = {10.1007/s10994-021-06067-7}
    }
    
  36. V. Nguyen, S. B. Orbell, D. T. Lennon, H. Moon, F. Vigneau, L. C. Camenzind, L. Yu, D. M. Zumbühl, G. A. D. Briggs, M. A. Osborne, D. Sejdinovic, and N. Ares, Deep Reinforcement Learning for Efficient Measurement of Quantum Devices, npj Quantum Information, vol. 7, no. 100, 2021.

    Deep reinforcement learning is an emerging machine-learning approach that can teach a computer to learn from its actions and rewards, similar to the way humans learn from experience. It offers many advantages in automating decision processes to navigate large parameter spaces. This paper proposes an approach to the efficient measurement of quantum devices based on deep reinforcement learning. We focus on double quantum dot devices, demonstrating the fully automatic identification of specific transport features called bias triangles. Measurements targeting these features are difficult to automate, since bias triangles are found in otherwise featureless regions of the parameter space. Our algorithm identifies bias triangles in a mean time of <30 min, and sometimes as little as 1 min. This approach, based on dueling deep Q-networks, can be adapted to a broad range of devices and target transport features. This is a crucial demonstration of the utility of deep reinforcement learning for decision making in the measurement and operation of quantum devices.

    @article{Nguyen2021,
      title = {Deep Reinforcement Learning for Efficient Measurement of Quantum Devices},
      author = {Nguyen, V. and Orbell, S. B. and Lennon, D. T. and Moon, H. and Vigneau, F. and Camenzind, L. C. and Yu, L. and Zumbühl, D. M. and Briggs, G. A. D. and Osborne, M. A. and Sejdinovic, D. and Ares, N.},
      year = {2021},
      volume = {7},
      number = {100},
      journal = {{npj Quantum Information}},
      arxiv = {https://arxiv.org/abs/2009.14825},
      url = {https://doi.org/10.1038/s41534-021-00434-x},
      doi = {10.1038/s41534-021-00434-x}
    }
    
  37. A. Caterini, R. Cornish, D. Sejdinovic, and A. Doucet, Variational Inference with Continuously-Indexed Normalizing Flows, in Uncertainty in Artificial Intelligence (UAI), 2021, PMLR 161:44–53.

    Continuously-indexed flows (CIFs) have recently achieved improvements over baseline normalizing flows on a variety of density estimation tasks. CIFs do not possess a closed-form marginal density, and so, unlike standard flows, cannot be plugged in directly to a variational inference (VI) scheme in order to produce a more expressive family of approximate posteriors. However, we show here how CIFs can be used as part of an auxiliary VI scheme to formulate and train expressive posterior approximations in a natural way. We exploit the conditional independence structure of multi-layer CIFs to build the required auxiliary inference models, which we show empirically yield low-variance estimators of the model evidence. We then demonstrate the advantages of CIFs over baseline flows in VI problems when the posterior distribution of interest possesses a complicated topology, obtaining improved results in both the Bayesian inference and surrogate maximum likelihood settings.

    @inproceedings{CatCorSejDou2021,
      author = {Caterini, Anthony and Cornish, Rob and Sejdinovic, Dino and Doucet, Arnaud},
      title = {{{Variational Inference with Continuously-Indexed Normalizing Flows}}},
      booktitle = {Uncertainty in Artificial Intelligence (UAI)},
      year = {2021},
      arxiv = {https://arxiv.org/abs/2007.05426},
      url = {https://proceedings.mlr.press/v161/caterini21a.html},
      pages = {PMLR 161:44--53}
    }
    
  38. Z. Li, J.-F. Ton, D. Oglic, and D. Sejdinovic, Towards A Unified Analysis of Random Fourier Features, Journal of Machine Learning Research, vol. 22, no. 108, 1–51, 2021.

    Random Fourier features is a widely used, simple, and effective technique for scaling up kernel methods. The existing theoretical analysis of the approach, however, remains focused on specific learning tasks and typically gives pessimistic bounds which are at odds with the empirical results. We tackle these problems and provide the first unified risk analysis of learning with random Fourier features using the squared error and Lipschitz continuous loss functions. In our bounds, the trade-off between the computational cost and the learning risk convergence rate is problem specific and expressed in terms of the regularization parameter and the number of effective degrees of freedom. We study both the standard random Fourier features method for which we improve the existing bounds on the number of features required to guarantee the corresponding minimax risk convergence rate of kernel ridge regression, as well as a data-dependent modification which samples features proportional to ridge leverage scores and further reduces the required number of features. As ridge leverage scores are expensive to compute, we devise a simple approximation scheme which provably reduces the computational cost without loss of statistical efficiency. Our empirical results illustrate the effectiveness of the proposed scheme relative to the standard random Fourier features method.

    @article{LiTonOglSej2021,
      author = {Li, Zhu and Ton, Jean-Francois and Oglic, Dino and Sejdinovic, Dino},
      title = {{{Towards A Unified Analysis of Random Fourier Features}}},
      journal = {Journal of Machine Learning Research},
      volume = {22},
      number = {108},
      year = {2021},
      arxiv = {https://arxiv.org/abs/1806.09178},
      url = {https://jmlr.org/papers/v22/20-1369.html},
      pages = {1--51}
    }
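
    The following is a minimal sketch of the standard random Fourier features construction analysed in the paper, for a Gaussian kernel; the leverage-score sampling variant is not implemented, and the lengthscale and feature counts are illustrative assumptions.

    # Minimal sketch: random Fourier features z(x) with z(x) @ z(y) approximating
    # the Gaussian kernel exp(-||x - y||^2 / (2 * lengthscale^2)).
    import numpy as np

    def rff_features(X, num_features, lengthscale=1.0, seed=0):
        rng = np.random.default_rng(seed)
        d = X.shape[1]
        W = rng.normal(scale=1.0 / lengthscale, size=(d, num_features))  # spectral samples
        b = rng.uniform(0, 2 * np.pi, size=num_features)                 # random phases
        return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)

    rng = np.random.default_rng(1)
    X = rng.normal(size=(500, 5))
    Z = rff_features(X, num_features=2000)
    K_approx = Z @ Z.T
    K_exact = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1) / 2.0)
    print(np.abs(K_approx - K_exact).max())  # error shrinks as num_features grows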
    
  39. X. Pu, S. L. Chau, X. Dong, and D. Sejdinovic, Kernel-based Graph Learning from Smooth Signals: A Functional Viewpoint, IEEE Transactions on Signal and Information Processing over Networks, vol. 7, 192–207, 2021.

    The problem of graph learning concerns the construction of an explicit topological structure revealing the relationship between nodes representing data entities, which plays an increasingly important role in the success of many graph-based representations and algorithms in the field of machine learning and graph signal processing. In this paper, we propose a novel graph learning framework that incorporates the node-side and observation-side information, and in particular the covariates that help to explain the dependency structures in graph signals. To this end, we consider graph signals as functions in the reproducing kernel Hilbert space associated with a Kronecker product kernel, and integrate functional learning with smoothness-promoting graph learning to learn a graph representing the relationship between nodes. The functional learning increases the robustness of graph learning against missing and incomplete information in the graph signals. In addition, we develop a novel graph-based regularisation method which, when combined with the Kronecker product kernel, enables our model to capture both the dependency explained by the graph and the dependency due to graph signals observed under different but related circumstances, e.g. different points in time. The latter means the graph signals are free from the i.i.d. assumptions required by the classical graph learning models. Experiments on both synthetic and real-world data show that our methods outperform the state-of-the-art models in learning a meaningful graph topology from graph signals, in particular under heavy noise, missing values, and multiple dependencies.

    @article{PuChaDonSej2021,
      author = {Pu, Xingyue and Chau, Siu Lun and Dong, Xiaowen and Sejdinovic, Dino},
      title = {{{Kernel-based Graph Learning from Smooth Signals: A Functional Viewpoint}}},
      journal = {IEEE Transactions on Signal and Information Processing over Networks},
      arxiv = {https://arxiv.org/abs/2008.10065},
      url = {https://ieeexplore.ieee.org/document/9356326},
      doi = {10.1109/TSIPN.2021.3059995},
      volume = {7},
      pages = {192--207},
      year = {2021}
    }
    
  40. D. Rindt, D. Sejdinovic, and D. Steinsaltz, Consistency of Permutation Tests of Independence using Distance Covariance, HSIC and dHSIC, Stat, vol. 10, no. 1, e364, 2021.

    The Hilbert–Schmidt independence criterion (HSIC) and its d-variable extension dHSIC are measures of (joint) dependence between random variables. While combining these statistics with a permutation test has become a popular method of testing the null hypothesis of (joint) independence, it had thus far not been proved that this results in a consistent test. In this work, we provide a simple proof that the permutation test with the test statistic HSIC or dHSIC is indeed consistent when using characteristic kernels. That is, we prove that under each alternative hypothesis, the power of these permutation tests indeed converges to 1 as the sample size converges to infinity. Since the test is consistent for each number of permutations, we further give a brief discussion of how the number of permutations relates to the power of the test and how the number of permutations may be selected in practice.

    @article{RinSejSte2021-stat,
      author = {Rindt, David and Sejdinovic, Dino and Steinsaltz, David},
      title = {{{Consistency of Permutation Tests of Independence using Distance Covariance, HSIC and dHSIC}}},
      journal = {Stat},
      arxiv = {https://arxiv.org/abs/2005.06573},
      url = {https://doi.org/10.1002/sta4.364},
      doi = {10.1002/sta4.364},
      volume = {10},
      number = {1},
      pages = {e364},
      year = {2021}
    }
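
    The following is a minimal sketch of the permutation test whose consistency is established above, with HSIC as the test statistic; Gaussian kernels, the simple biased HSIC estimator, and 500 permutations are illustrative assumptions.

    # Minimal sketch: permutation test of independence with HSIC as the statistic.
    import numpy as np

    def gaussian_gram(Z, lengthscale=1.0):
        sq = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
        return np.exp(-sq / (2 * lengthscale ** 2))

    def hsic(K, L):
        n = K.shape[0]
        H = np.eye(n) - np.ones((n, n)) / n
        return np.trace(K @ H @ L @ H) / n ** 2

    def hsic_permutation_test(X, Y, num_permutations=500, seed=0):
        rng = np.random.default_rng(seed)
        K, L = gaussian_gram(X), gaussian_gram(Y)
        observed = hsic(K, L)
        # Permuting one sample breaks any dependence while preserving the marginals.
        null_stats = [hsic(K, L[np.ix_(p, p)])
                      for p in (rng.permutation(len(X)) for _ in range(num_permutations))]
        return (1 + sum(s >= observed for s in null_stats)) / (1 + num_permutations)

    rng = np.random.default_rng(2)
    X = rng.normal(size=(150, 1))
    Y_dep = np.sin(3 * X) + 0.1 * rng.normal(size=(150, 1))
    Y_indep = rng.normal(size=(150, 1))
    print(hsic_permutation_test(X, Y_dep), hsic_permutation_test(X, Y_indep))  # small vs large p-value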
    
  41. J.-F. Ton, L. Chan, Y. W. Teh, and D. Sejdinovic, Noise Contrastive Meta Learning for Conditional Density Estimation using Kernel Mean Embeddings, in International Conference on Artificial Intelligence and Statistics (AISTATS), 2021, PMLR 130:1099–1107.

    Current meta-learning approaches focus on learning functional representations of relationships between variables, i.e. estimating conditional expectations in regression. In many applications, however, the conditional distributions cannot be meaningfully summarized solely by expectation (due to e.g. multimodality). We introduce a novel technique for meta-learning conditional densities, which combines neural representation and noise contrastive estimation together with well-established literature in conditional mean embeddings into reproducing kernel Hilbert spaces. The method shows significant improvements over standard density estimation methods on synthetic and real-world data, by leveraging shared representations across multiple conditional density estimation tasks.

    @inproceedings{TonChaTehSej2021,
      author = {Ton, Jean-Francois and Chan, Leung and Teh, Yee Whye and Sejdinovic, Dino},
      title = {{{Noise Contrastive Meta Learning for Conditional Density Estimation using Kernel Mean Embeddings}}},
      arxiv = {https://arxiv.org/abs/1906.02236},
      url = {http://proceedings.mlr.press/v130/ton21a.html},
      pages = {PMLR 130:1099--1107},
      video = {https://slideslive.com/38953033},
      year = {2021},
      booktitle = {International Conference on Artificial Intelligence and Statistics (AISTATS)}
    }
    
  42. J.-F. Ton, D. Sejdinovic, and K. Fukumizu, Meta Learning for Causal Direction, in Proceedings of the AAAI Conference on Artificial Intelligence, 2021, vol. 35, no. 11, 9897–9905.

    The inaccessibility of controlled randomized trials due to inherent constraints in many fields of science has been a fundamental issue in causal inference. In this paper, we focus on distinguishing the cause from effect in the bivariate setting under limited observational data. Based on recent developments in meta learning as well as in causal inference, we introduce a novel generative model that allows distinguishing cause and effect in the small data setting. Using a learnt task variable that contains distributional information of each dataset, we propose an end-to-end algorithm that makes use of similar training datasets at test time. We demonstrate our method on various synthetic as well as real-world data and show that it is able to maintain high accuracy in detecting directions across varying dataset sizes.

    @inproceedings{TonSejFuk2021,
      author = {Ton, Jean-Francois and Sejdinovic, Dino and Fukumizu, Kenji},
      title = {{{Meta Learning for Causal Direction}}},
      booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence},
      volume = {35},
      number = {11},
      pages = {9897--9905},
      arxiv = {https://arxiv.org/abs/2007.02809},
      year = {2021},
      url = {https://doi.org/10.1609/aaai.v35i11.17189},
      doi = {10.1609/aaai.v35i11.17189}
    }
    
  43. R. Hu and D. Sejdinovic, Robust Deep Interpretable Features for Binary Image Classification, in Proceedings of the Northern Lights Deep Learning Workshop, 2021, vol. 2.

    The problem of interpretability for binary image classification is considered through the lens of kernel two-sample tests and generative modeling. A feature extraction framework coined Deep Interpretable Features is developed, which is used in combination with IntroVAE, a generative model capable of high-resolution image synthesis. Experimental results on a variety of datasets, including COVID-19 chest X-rays, demonstrate the benefits of combining deep generative models with the ideas from kernel-based hypothesis testing in moving towards more robust interpretable deep generative models.

    @inproceedings{HuSej2021,
      author = {Hu, Robert and Sejdinovic, Dino},
      title = {{{Robust Deep Interpretable Features for Binary Image Classification}}},
      booktitle = {Proceedings of the Northern Lights Deep Learning Workshop},
      year = {2021},
      volume = {2},
      url = {https://septentrio.uit.no/index.php/nldl/article/view/5708},
      doi = {10.7557/18.5708}
    }
    
  44. G. S. Blair, R. Bassett, L. Bastin, L. Beevers, M. I. Borrajo, M. Brown, S. L. Dance, A. Dionescu, L. Edwards, M. A. Ferrario, R. Fraser, H. Fraser, S. Gardner, P. Henrys, T. Hey, S. Homann, C. Huijbers, J. Hutchison, P. Jonathan, R. Lamb, S. Laurie, A. Leeson, D. Leslie, M. McMillan, V. Nundloll, O. Oyebamiji, J. Phillipson, V. Pope, R. Prudden, S. Reis, M. Salama, F. Samreen, D. Sejdinovic, W. Simm, R. Street, L. Thornton, R. Towe, J. V. Hey, M. Vieno, J. Waller, and J. Watkins, The Role of Digital Technologies in Responding to the Grand Challenges of the Natural Environment: The Windermere Accord, Patterns, vol. 2, no. 1, 100156, 2021.

    Digital technology is having a major impact on many areas of society, and there is equal opportunity for impact on science. This is particularly true in the environmental sciences as we seek to understand the complexities of the natural environment under climate change. This perspective presents the outcomes of a summit in this area, a unique cross-disciplinary gathering bringing together environmental scientists, data scientists, computer scientists, social scientists, and representatives of the creative arts. The key output of this workshop is an agreed vision in the form of a framework and associated roadmap, captured in the Windermere Accord. This accord envisions a new kind of environmental science underpinned by unprecedented amounts of data, with technological advances leading to breakthroughs in taming uncertainty and complexity, and also supporting openness, transparency, and reproducibility in science. The perspective also includes a call to build an international community working in this important area.

    @article{Blairetal2021,
      title = {The Role of Digital Technologies in Responding to the Grand Challenges of the Natural Environment: The {W}indermere Accord},
      journal = {Patterns},
      volume = {2},
      number = {1},
      pages = {100156},
      year = {2021},
      issn = {2666-3899},
      doi = {10.1016/j.patter.2020.100156},
      url = {http://www.sciencedirect.com/science/article/pii/S266638992030204X},
      author = {Blair, Gordon S. and Bassett, Richard and Bastin, Lucy and Beevers, Lindsay and Borrajo, Maribel Isabel and Brown, Mike and Dance, Sarah L. and Dionescu, Ada and Edwards, Liz and Ferrario, Maria Angela and Fraser, Rob and Fraser, Harriet and Gardner, Simon and Henrys, Peter and Hey, Tony and Homann, Stuart and Huijbers, Chantal and Hutchison, James and Jonathan, Phil and Lamb, Rob and Laurie, Sophie and Leeson, Amber and Leslie, David and McMillan, Malcolm and Nundloll, Vatsala and Oyebamiji, Oluwole and Phillipson, Jordan and Pope, Vicky and Prudden, Rachel and Reis, Stefan and Salama, Maria and Samreen, Faiza and Sejdinovic, Dino and Simm, Will and Street, Roger and Thornton, Lauren and Towe, Ross and Hey, Joshua Vande and Vieno, Massimo and Waller, Joanne and Watkins, John},
      keywords = {digital technologies, digital environment, data science, environmental science}
    }
    
  45. D. Rindt, D. Sejdinovic, and D. Steinsaltz, A kernel and optimal transport based test of independence between covariates and right-censored lifetimes, International Journal of Biostatistics, vol. 17, no. 2, 331–348, 2021.

    We propose a nonparametric test of independence, termed optHSIC, between a covariate and a right-censored lifetime. Because the presence of censoring creates a challenge in applying the standard permutation-based testing approaches, we use optimal transport to transform the censored dataset into an uncensored one, while preserving the relevant dependencies. We then apply a permutation test using the kernel-based dependence measure as a statistic to the transformed dataset. Control of the type I error is proven in the case where censoring is independent of the covariate. Experiments indicate that optHSIC has power against a much wider class of alternatives than Cox proportional hazards regression and that it has correct type I error control even in the challenging cases where censoring strongly depends on the covariate.

    @article{RinSejSte2021-ijb,
      author = {Rindt, David and Sejdinovic, Dino and Steinsaltz, David},
      title = {A kernel and optimal transport based test of independence between covariates and right-censored lifetimes},
      journal = {International Journal of Biostatistics},
      arxiv = {https://arxiv.org/abs/1906.03866},
      volume = {17},
      number = {2},
      year = {2021},
      pages = {331--348},
      url = {https://doi.org/10.1515/ijb-2020-0022},
      doi = {10.1515/ijb-2020-0022}
    }
    
  46. N. M. van Esbroeck, D. T. Lennon, H. Moon, V. Nguyen, F. Vigneau, L. C. Camenzind, L. Yu, D. Zumbuehl, G. A. D. Briggs, D. Sejdinovic, and N. Ares, Quantum device fine-tuning using unsupervised embedding learning, New Journal of Physics, vol. 22, no. 9, 095003, 2020.

    Quantum devices with a large number of gate electrodes allow for precise control of device parameters. This capability is hard to fully exploit due to the complex dependence of these parameters on applied gate voltages. We experimentally demonstrate an algorithm capable of fine-tuning several device parameters at once. The algorithm acquires a measurement and assigns it a score using a variational auto-encoder. Gate voltage settings are set to optimise this score in real-time in an unsupervised fashion. We report fine-tuning times of approximately 40 min for a double quantum dot device.

    @article{Esbroecketal2020,
      author = {van Esbroeck, Nina M. and Lennon, Dominic T. and Moon, Hyungil and Nguyen, Vu and Vigneau, Florian and Camenzind, Leon C. and Yu, Liuqi and Zumbuehl, Dominik and Briggs, G. Andrew D. and Sejdinovic, Dino and Ares, Natalia},
      title = {Quantum device fine-tuning using unsupervised embedding learning},
      journal = {New Journal of Physics},
      url = {https://iopscience.iop.org/article/10.1088/1367-2630/abb64c},
      year = {2020},
      arxiv = {https://arxiv.org/abs/2001.04409},
      doi = {10.1088/1367-2630/abb64c},
      volume = {22},
      number = {9},
      pages = {095003}
    }
    
  47. H. Moon, D. T. Lennon, J. Kirkpatrick, N. M. van Esbroeck, L. C. Camenzind, L. Yu, F. Vigneau, D. M. Zumbühl, G. A. D. Briggs, M. A. Osborne, D. Sejdinovic, E. A. Laird, and N. Ares, Machine learning enables completely automatic tuning of a quantum device faster than human experts, Nature Communications, vol. 11, no. 4161, 2020.

    Variability is a problem for the scalability of semiconductor quantum devices. The parameter space is large, and the operating range is small. Our statistical tuning algorithm searches for specific electron transport features in gate-defined quantum dot devices with a gate voltage space of up to eight dimensions. Starting from the full range of each gate voltage, our machine learning algorithm can tune each device to optimal performance in a median time of under 70 minutes. This performance surpassed our best human benchmark (although both human and machine performance can be improved). The algorithm is approximately 180 times faster than an automated random search of the parameter space, and is suitable for different material systems and device architectures. Our results yield a quantitative measurement of device variability, from one device to another and after thermal cycling. Our machine learning algorithm can be extended to higher dimensions and other technologies.

    @article{Moonetal2020,
      title = {Machine learning enables completely automatic tuning of a quantum device faster than human experts},
      author = {Moon, H. and Lennon, D. T. and Kirkpatrick, J. and van Esbroeck, N. M. and Camenzind, L. C. and Yu, Liuqi and Vigneau, F. and Zumb\"uhl, D. M. and Briggs, G. A. D. and Osborne, M. A and Sejdinovic, D. and Laird, E. A. and Ares, N.},
      journal = {Nature Communications},
      volume = {11},
      number = {4161},
      arxiv = {https://arxiv.org/abs/2001.02589},
      year = {2020},
      doi = {10.1038/s41467-020-17835-9},
      url = {https://doi.org/10.1038/s41467-020-17835-9},
      code = {https://doi.org/10.5281/zenodo.3966318}
    }
    
  48. T. G. J. Rudner, D. Sejdinovic, and Y. Gal, Inter-domain Deep Gaussian Processes, in International Conference on Machine Learning (ICML), 2020, PMLR 119:8286–8294.

    Inter-domain Gaussian processes (GPs) allow for high flexibility and low computational cost when performing approximate inference in GP models. They are particularly suitable for modeling data exhibiting global function behavior but are limited to stationary covariance functions and thus fail to model non-stationary data effectively. We propose Inter-domain Deep Gaussian Processes with RKHS Fourier Features, an extension of shallow inter-domain GPs that combines the advantages of inter-domain and deep Gaussian processes (DGPs), and demonstrate how to leverage existing approximate inference approaches to perform simple and scalable approximate inference on Inter-domain Deep Gaussian Processes. We assess the performance of our method on a wide range of prediction problems and demonstrate that it outperforms inter-domain GPs and DGPs on challenging large-scale and high-dimensional real-world datasets exhibiting both global behavior and a high degree of non-stationarity.

    @inproceedings{RudSejGal2020,
      author = {Rudner, T.G.J. and Sejdinovic, D. and Gal, Y.},
      title = {{{Inter-domain Deep Gaussian Processes}}},
      booktitle = {International Conference on Machine Learning (ICML)},
      pages = {PMLR 119:8286--8294},
      year = {2020},
      arxiv = {https://arxiv.org/abs/2011.00415},
      video = {https://slideslive.com/38928448},
      url = {http://proceedings.mlr.press/v119/rudner20a.html}
    }
    
  49. D. Sejdinovic, Discussion of ‘Functional models for time-varying random objects’ by Dubey and Müller, Journal of the Royal Statistical Society: Series B, vol. 82, no. 2, 312–313, 2020.

    The discussion focuses on metric covariance, a new association measure between paired random objects in a metric space, developed by Dubey and Müller, and on its relationship with other similar concepts which have previously appeared in the literature, including distance covariance by Székely et al, as well as its generalisations which rely on the formalism of reproducing kernel Hilbert spaces (RKHS).

    @article{Sej2020-discussion,
      author = {Sejdinovic, Dino},
      title = {{{Discussion of `Functional models for time-varying random objects' by Dubey and M\"uller}}},
      journal = {Journal of the Royal Statistical Society: Series B},
      volume = {82},
      number = {2},
      pages = {312--313},
      arxiv = {https://arxiv.org/abs/2001.03267},
      year = {2020}
    }
    
  50. J. Runge, P. Nowack, M. Kretschmer, S. Flaxman, and D. Sejdinovic, Detecting and Quantifying Causal Associations in Large Nonlinear Time Series Datasets, Science Advances, vol. 5, no. 11, 2019.

    Identifying causal relationships and quantifying their strength from observational time series data are key problems in disciplines dealing with complex dynamical systems such as the Earth system or the human body. Data-driven causal inference in such systems is challenging since datasets are often high dimensional and nonlinear with limited sample sizes. Here, we introduce a novel method that flexibly combines linear or nonlinear conditional independence tests with a causal discovery algorithm to estimate causal networks from large-scale time series datasets. We validate the method on time series of well-understood physical mechanisms in the climate system and the human heart and using large-scale synthetic datasets mimicking the typical properties of real-world data. The experiments demonstrate that our method outperforms state-of-the-art techniques in detection power, which opens up entirely new possibilities to discover and quantify causal networks from time series across a range of research fields.

    @article{RunNowKreFlaSej2019,
      author = {Runge, Jakob and Nowack, Peer and Kretschmer, Marlene and Flaxman, Seth and Sejdinovic, Dino},
      title = {{{Detecting and Quantifying Causal Associations in Large Nonlinear Time Series Datasets}}},
      journal = {Science Advances},
      volume = {5},
      number = {11},
      doi = {10.1126/sciadv.aau4996},
      url = {https://advances.sciencemag.org/content/5/11/eaau4996},
      arxiv = {https://arxiv.org/abs/1702.07007},
      code = {https://github.com/jakobrunge/tigramite},
      year = {2019}
    }
    
  51. H. C. L. Law, P. Zhao, L. Chan, J. Huang, and D. Sejdinovic, Hyperparameter Learning via Distributional Transfer, in Advances in Neural Information Processing Systems (NeurIPS), 2019, vol. 32, 6804–6815.

    Bayesian optimisation is a popular technique for hyperparameter learning but typically requires initial exploration even in cases where similar prior tasks have been solved. We propose to transfer information across tasks using learnt representations of training datasets used in those tasks. This results in a joint Gaussian process model on hyperparameters and data representations. Representations make use of the framework of distribution embeddings into reproducing kernel Hilbert spaces. The developed method has a faster convergence compared to existing baselines, in some cases requiring only a few evaluations of the target objective.

    @inproceedings{LawZhaChaHuaSej2019,
      author = {Law, Ho Chung Leon and Zhao, Peilin and Chan, Leung and Huang, Junzhou and Sejdinovic, Dino},
      title = {{{Hyperparameter Learning via Distributional Transfer}}},
      arxiv = {https://arxiv.org/abs/1810.06305},
      year = {2019},
      url = {https://papers.nips.cc/paper/8905-hyperparameter-learning-via-distributional-transfer},
      booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
      volume = {32},
      pages = {6804--6815}
    }
    
  52. A. Raj, H. C. L. Law, D. Sejdinovic, and M. Park, A Differentially Private Kernel Two-Sample Test, in European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD), 2019, vol. 11906, 697–724.

    Kernel two-sample testing is a useful statistical tool in determining whether data samples arise from different distributions without imposing any parametric assumptions on those distributions. However, raw data samples can expose sensitive information about individuals who participate in scientific studies, which makes the current tests vulnerable to privacy breaches. Hence, we design a new framework for kernel two-sample testing conforming to differential privacy constraints, in order to guarantee the privacy of subjects in the data. Unlike existing differentially private parametric tests that simply add noise to data, kernel-based testing imposes a challenge due to a complex dependence of test statistics on the raw data, as these statistics correspond to estimators of distances between representations of probability measures in Hilbert spaces. Our approach considers finite dimensional approximations to those representations. As a result, a simple chi-squared test is obtained, where a test statistic depends on a mean and covariance of empirical differences between the samples, which we perturb for a privacy guarantee. We investigate the utility of our framework in two realistic settings and conclude that our method requires only a relatively modest increase in sample size to achieve a similar level of power to the non-private tests in both settings.

    @inproceedings{RajLawSejPar2019,
      author = {Raj, Anant and Law, Ho Chung Leon and Sejdinovic, Dino and Park, Mijung},
      title = {{{A Differentially Private Kernel Two-Sample Test}}},
      booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD)},
      arxiv = {https://arxiv.org/abs/1808.00380},
      url = {https://doi.org/10.1007/978-3-030-46150-8_41},
      doi = {10.1007/978-3-030-46150-8_41},
      series = {Lecture Notes in Computer Science},
      volume = {11906},
      pages = {697--724},
      year = {2019}
    }
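
    As a rough illustration of the noise-addition step, the following minimal sketch applies the Gaussian mechanism to the mean difference of finite-dimensional feature maps of the two samples; the random Fourier feature map, the sensitivity bound, and the privacy parameters are illustrative assumptions, and the paper's actual chi-squared test (which also perturbs the covariance of the empirical differences) is not implemented here.

    # Minimal sketch: Gaussian-mechanism perturbation of a finite-dimensional
    # mean-embedding difference (illustration only, not the paper's full test).
    import numpy as np

    def rff(X, W, b):
        return np.sqrt(2.0 / W.shape[1]) * np.cos(X @ W + b)   # each row has norm <= sqrt(2)

    def private_mean_difference(X, Y, W, b, epsilon=1.0, delta=1e-5, seed=0):
        diff = rff(X, W, b).mean(axis=0) - rff(Y, W, b).mean(axis=0)
        # Changing one record moves the mean difference by at most 2*sqrt(2)/n in L2 norm.
        n = min(len(X), len(Y))
        sensitivity = 2 * np.sqrt(2) / n
        sigma = sensitivity * np.sqrt(2 * np.log(1.25 / delta)) / epsilon  # Gaussian mechanism
        return diff + np.random.default_rng(seed).normal(scale=sigma, size=diff.shape)

    rng = np.random.default_rng(3)
    X, Y = rng.normal(0.0, 1.0, size=(500, 2)), rng.normal(0.3, 1.0, size=(500, 2))
    W, b = rng.normal(size=(2, 100)), rng.uniform(0, 2 * np.pi, size=100)
    print(np.linalg.norm(private_mean_difference(X, Y, W, b)))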
    
  53. Z. Li, J.-F. Ton, D. Oglic, and D. Sejdinovic, Towards A Unified Analysis of Random Fourier Features, in International Conference on Machine Learning (ICML), 2019, PMLR 97:3905–3914.

    Random Fourier features is a widely used, simple, and effective technique for scaling up kernel methods. The existing theoretical analysis of the approach, however, remains focused on specific learning tasks and typically gives pessimistic bounds which are at odds with the empirical results. We tackle these problems and provide the first unified risk analysis of learning with random Fourier features using the squared error and Lipschitz continuous loss functions. In our bounds, the trade-off between the computational cost and the expected risk convergence rate is problem specific and expressed in terms of the regularization parameter and the number of effective degrees of freedom. We study both the standard random Fourier features method for which we improve the existing bounds on the number of features required to guarantee the corresponding minimax risk convergence rate of kernel ridge regression, as well as a data-dependent modification which samples features proportional to ridge leverage scores and further reduces the required number of features. As ridge leverage scores are expensive to compute, we devise a simple approximation scheme which provably reduces the computational cost without loss of statistical efficiency.

    @inproceedings{LiTonOglSej2019,
      author = {Li, Z. and Ton, J.-F. and Oglic, D. and Sejdinovic, D.},
      title = {{{Towards A Unified Analysis of Random Fourier Features}}},
      booktitle = {International Conference on Machine Learning (ICML)},
      pages = {PMLR 97:3905--3914},
      arxiv = {https://arxiv.org/abs/1806.09178},
      url = {http://proceedings.mlr.press/v97/li19k.html},
      year = {2019}
    }
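
    Since the random Fourier feature construction itself is short, a minimal NumPy sketch may help fix ideas: features approximating the Gaussian kernel are plugged into ridge regression, reducing the cost from O(n^3) to roughly O(n m^2) for m features. Names, the lengthscale and the regulariser are illustrative; the leverage-score sampling variant analysed in the paper is not shown.

    import numpy as np

    def rff_features(X, num_features, lengthscale, rng):
        """Random Fourier features approximating the Gaussian kernel
        k(x, y) = exp(-||x - y||^2 / (2 * lengthscale^2))."""
        d = X.shape[1]
        W = rng.normal(scale=1.0 / lengthscale, size=(d, num_features))
        b = rng.uniform(0.0, 2 * np.pi, size=num_features)
        return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)

    # Ridge regression on the approximate feature map instead of the full kernel matrix.
    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(500, 1))
    y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=500)
    Z = rff_features(X, num_features=200, lengthscale=1.0, rng=rng)
    lam = 1e-2
    alpha = np.linalg.solve(Z.T @ Z + lam * np.eye(Z.shape[1]), Z.T @ y)
    print("training MSE:", np.mean((y - Z @ alpha) ** 2))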
    
  54. G. Camps-Valls, D. Sejdinovic, J. Runge, and M. Reichstein, A Perspective on Gaussian Processes for Earth Observation, National Science Review, vol. 6, no. 4, 616–618, 2019.

    Earth observation (EO) by airborne and satellite remote sensing and in-situ observations play a fundamental role in monitoring our planet. In the last decade, machine learning, and Gaussian processes (GPs) in particular, have attained outstanding results in the estimation of bio-geo-physical variables from the acquired images at local and global scales in a time-resolved manner. GPs provide not only accurate estimates but also principled uncertainty estimates for the predictions, can easily accommodate multimodal data coming from different sensors and from multitemporal acquisitions, allow the introduction of physical knowledge, and a formal treatment of uncertainty quantification and error propagation. Despite great advances in forward and inverse modelling, GP models still have to face important challenges that are reviewed in this perspective paper. GP models should evolve towards data-driven physics-aware models that respect signal characteristics, be consistent with elementary laws of physics, and move from pure regression to observational causal inference.

    @article{CamSejRunRei2019,
      author = {Camps-Valls, G. and Sejdinovic, D. and Runge, J. and Reichstein, M.},
      title = {{{A Perspective on Gaussian Processes for Earth Observation}}},
      journal = {National Science Review},
      arxiv = {https://arxiv.org/abs/2007.01238},
      volume = {6},
      number = {4},
      pages = {616--618},
      doi = {10.1093/nsr/nwz028},
      year = {2019},
      url = {https://doi.org/10.1093/nsr/nwz028}
    }
    
  55. F.-X. Briol, C. J. Oates, M. Girolami, M. A. Osborne, and D. Sejdinovic, Probabilistic Integration: A Role in Statistical Computation? (with Discussion and Rejoinder), Statistical Science, vol. 34, no. 1, 1–22; rejoinder: 38–42, 2019.

    A research frontier has emerged in scientific computation, wherein discretisation error is regarded as a source of epistemic uncertainty that can be modelled. This raises several statistical challenges, including the design of statistical methods that enable the coherent propagation of probabilities through a (possibly deterministic) computational work-flow, in order to assess the impact of discretisation error on the computer output. This paper examines the case for probabilistic numerical methods in routine statistical computation. Our focus is on numerical integration, where a probabilistic integrator is equipped with a full distribution over its output that reflects the fact that the integrand has been discretised. Our main technical contribution is to establish, for the first time, rates of posterior contraction for one such method. Several substantial applications are provided for illustration and critical evaluation, including examples from statistical modelling, computer graphics and a computer model for an oil reservoir.

    @article{BriOatGirOsbSej2019,
      author = {Briol, F.-X. and Oates, C.J. and Girolami, M. and Osborne, M.A. and Sejdinovic, D.},
      title = {{{Probabilistic Integration: A Role in Statistical Computation? (with Discussion and Rejoinder)}}},
      journal = {Statistical Science},
      arxiv = {http://arxiv.org/abs/1512.00933},
      volume = {34},
      number = {1},
      pages = {1--22; rejoinder: 38--42},
      year = {2019},
      url = {https://projecteuclid.org/euclid.ss/1555056025},
      doi = {10.1214/18-STS660}
    }
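
    For orientation, the basic Bayesian quadrature construction underlying probabilistic integration can be written in one display. Assuming a zero-mean Gaussian process prior with kernel k on the integrand and an integration measure \Pi (generic notation, not taken from the paper), the posterior over the integral given the evaluations \mathbf{f} = (f(x_1), \dots, f(x_n))^\top is

    \[
    \int f \,\mathrm{d}\Pi \;\Big|\; \mathbf{f} \;\sim\; \mathcal{N}\!\left( \mathbf{z}^\top K^{-1} \mathbf{f}, \;\; \iint k(x, x')\,\mathrm{d}\Pi(x)\,\mathrm{d}\Pi(x') \;-\; \mathbf{z}^\top K^{-1} \mathbf{z} \right),
    \]

    where K_{ij} = k(x_i, x_j) and z_i = \int k(x, x_i)\,\mathrm{d}\Pi(x) is the kernel mean evaluated at x_i. The posterior contraction rates established in the paper describe how quickly this posterior concentrates around the true integral as n grows.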
    
  56. A. L. Caterini, A. Doucet, and D. Sejdinovic, Hamiltonian Variational Auto-Encoder, in Advances in Neural Information Processing Systems (NeurIPS), 2018, vol. 31, 8167–8177.

    Variational Auto-Encoders (VAEs) have become very popular techniques to perform inference and learning in latent variable models as they allow us to leverage the rich representational power of neural networks to obtain flexible approximations of the posterior of latent variables as well as tight evidence lower bounds (ELBOs). Combined with stochastic variational inference, this provides a methodology scaling to large datasets. However, for this methodology to be practically efficient, it is necessary to obtain low-variance unbiased estimators of the ELBO and its gradients with respect to the parameters of interest. While the use of Markov chain Monte Carlo (MCMC) techniques such as Hamiltonian Monte Carlo (HMC) has been previously suggested to achieve this, the proposed methods require specifying reverse kernels which have a large impact on performance. Additionally, the resulting unbiased estimator of the ELBO for most MCMC kernels is typically not amenable to the reparameterization trick. We show here how to optimally select reverse kernels in this setting and, by building upon Hamiltonian Importance Sampling (HIS), we obtain a scheme that provides low-variance unbiased estimators of the ELBO and its gradients using the reparameterization trick. This allows us to develop a Hamiltonian Variational Auto-Encoder (HVAE). This method can be reinterpreted as a target-informed normalizing flow which, within our context, only requires a few evaluations of the gradient of the sampled likelihood and trivial Jacobian calculations at each iteration.

    @inproceedings{CatDouSej2018,
      author = {Caterini, A.L. and Doucet, A. and Sejdinovic, D.},
      title = {{{Hamiltonian Variational Auto-Encoder}}},
      booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
      url = {https://papers.nips.cc/paper/8039-hamiltonian-variational-auto-encoder},
      pages = {8167--8177},
      arxiv = {https://arxiv.org/abs/1805.11328},
      video = {https://www.youtube.com/watch?v=MD1CFKTu9U4},
      year = {2018},
      volume = {31}
    }
    
  57. H. C. L. Law, D. Sejdinovic, E. Cameron, T. C. D. Lucas, S. Flaxman, K. Battle, and K. Fukumizu, Variational Learning on Aggregate Outputs with Gaussian Processes, in Advances in Neural Information Processing Systems (NeurIPS), 2018, vol. 31, 6081–6091.

    While a typical supervised learning framework assumes that the inputs and the outputs are measured at the same levels of granularity, many applications, including global mapping of disease, only have access to outputs at a much coarser level than that of the inputs. Aggregation of outputs makes generalization to new inputs much more difficult. We consider an approach to this problem based on variational learning with a model of output aggregation and Gaussian processes, where aggregation leads to intractability of the standard evidence lower bounds. We propose new bounds and tractable approximations, leading to improved prediction accuracy and scalability to large datasets, while explicitly taking uncertainty into account. We develop a framework which extends to several types of likelihoods, including the Poisson model for aggregated count data. We apply our framework to a challenging and important problem, the fine-scale spatial modelling of malaria incidence, with over 1 million observations.

    @inproceedings{LawSejCamLucFlaBatFuk2018,
      author = {Law, Ho Chung Leon and Sejdinovic, D. and Cameron, E. and Lucas, T. C. D. and Flaxman, S. and Battle, K. and Fukumizu, K.},
      title = {{{Variational Learning on Aggregate Outputs with Gaussian Processes}}},
      booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
      url = {https://papers.nips.cc/paper/7847-variational-learning-on-aggregate-outputs-with-gaussian-processes},
      arxiv = {https://arxiv.org/abs/1805.08463},
      video = {https://www.youtube.com/watch?v=UDSSGbokG1M},
      year = {2018},
      volume = {31},
      pages = {6081--6091},
      code = {https://github.com/hcllaw/VBAgg}
    }
    
  58. J. Mitrovic, D. Sejdinovic, and Y. W. Teh, Causal Inference via Kernel Deviance Measures, in Advances in Neural Information Processing Systems (NeurIPS), 2018, vol. 31, 6986–6994.

    Discovering the causal structure among a set of variables is a fundamental problem in many areas of science. In this paper, we propose Kernel Conditional Deviance for Causal Inference (KCDC), a fully nonparametric causal discovery method based on purely observational data. From a novel interpretation of the notion of asymmetry between cause and effect, we derive a corresponding asymmetry measure using the framework of reproducing kernel Hilbert spaces. Based on this, we propose three decision rules for causal discovery. We demonstrate the wide applicability of our method across a range of diverse synthetic datasets. Furthermore, we test our method on real-world time series data and the real-world benchmark dataset Tübingen Cause-Effect Pairs where we outperform existing state-of-the-art methods.

    @inproceedings{MitSejTeh2018,
      author = {Mitrovic, Jovana and Sejdinovic, Dino and Teh, Yee Whye},
      title = {{{Causal Inference via Kernel Deviance Measures}}},
      booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
      url = {https://papers.nips.cc/paper/7930-causal-inference-via-kernel-deviance-measures},
      arxiv = {https://arxiv.org/abs/1804.04622},
      year = {2018},
      volume = {31},
      pages = {6986--6994}
    }
    
  59. J.-F. Ton, S. Flaxman, D. Sejdinovic, and S. Bhatt, Spatial Mapping with Gaussian Processes and Nonstationary Fourier Features, Spatial Statistics, vol. 28, 59–78, 2018.

    The use of covariance kernels is ubiquitous in the field of spatial statistics. Kernels allow data to be mapped into high-dimensional feature spaces and can thus extend simple linear additive methods to nonlinear methods with higher order interactions. However, until recently, there has been a strong reliance on a limited class of stationary kernels such as the Matern or squared exponential, limiting the expressiveness of these modelling approaches. Recent machine learning research has focused on spectral representations to model arbitrary stationary kernels and introduced more general representations that include classes of nonstationary kernels. In this paper, we exploit the connections between Fourier feature representations, Gaussian processes and neural networks to generalise previous approaches and develop a simple and efficient framework to learn arbitrarily complex nonstationary kernel functions directly from the data, while taking care to avoid overfitting using state-of-the-art methods from deep learning. We highlight the very broad array of kernel classes that could be created within this framework. We apply this to a time series dataset and a remote sensing problem involving land surface temperature in Eastern Africa. We show that without increasing the computational or storage complexity, nonstationary kernels can be used to improve generalisation performance and provide more interpretable results.

    @article{TonFlaSejBha2018,
      author = {Ton, J.-F. and Flaxman, S. and Sejdinovic, D. and Bhatt, S.},
      title = {{{Spatial Mapping with Gaussian Processes and Nonstationary Fourier Features}}},
      journal = {Spatial Statistics},
      volume = {28},
      arxiv = {https://arxiv.org/abs/1711.05615},
      doi = {10.1016/j.spasta.2018.02.002},
      year = {2018},
      url = {https://doi.org/10.1016/j.spasta.2018.02.002},
      pages = {59--78}
    }
    
  60. H. C. L. Law, D. J. Sutherland, D. Sejdinovic, and S. Flaxman, Bayesian Approaches to Distribution Regression, in International Conference on Artificial Intelligence and Statistics (AISTATS), 2018, PMLR 84:1167–1176.

    Distribution regression has recently attracted much interest as a generic solution to the problem of supervised learning where labels are available at the group level, rather than at the individual level. Current approaches, however, do not propagate the uncertainty in observations due to sampling variability in the groups. This effectively assumes that small and large groups are estimated equally well, and should have equal weight in the final regression. We account for this uncertainty with a Bayesian distribution regression formalism, improving the robustness and performance of the model when group sizes vary. We frame our models in a neural network style, allowing for simple MAP inference using backpropagation to learn the parameters, as well as MCMC-based inference which can fully propagate uncertainty. We demonstrate our approach on illustrative toy datasets, as well as on a challenging problem of predicting age from images.

    @inproceedings{LawSutSejFla2018,
      author = {Law, Ho Chung Leon and Sutherland, Danica J. and Sejdinovic, Dino and Flaxman, Seth},
      title = {{{Bayesian Approaches to Distribution Regression}}},
      booktitle = {International Conference on Artificial Intelligence and Statistics (AISTATS)},
      arxiv = {https://arxiv.org/abs/1705.04293},
      url = {http://proceedings.mlr.press/v84/law18a.html},
      pages = {PMLR 84:1167--1176},
      year = {2018}
    }
    
  61. Q. Zhang, S. Filippi, A. Gretton, and D. Sejdinovic, Large-Scale Kernel Methods for Independence Testing, Statistics and Computing, vol. 28, no. 1, 113–130, Jan. 2018.

    Representations of probability measures in reproducing kernel Hilbert spaces provide a flexible framework for fully nonparametric hypothesis tests of independence, which can capture any type of departure from independence, including nonlinear associations and multivariate interactions. However, these approaches come with an at least quadratic computational cost in the number of observations, which can be prohibitive in many applications. Arguably, it is exactly in such large-scale datasets that capturing any type of dependence is of interest, so striking a favourable tradeoff between computational efficiency and test performance for kernel independence tests would have a direct impact on their applicability in practice. In this contribution, we provide an extensive study of the use of large-scale kernel approximations in the context of independence testing, contrasting block-based, Nystrom and random Fourier feature approaches. Through a variety of synthetic data experiments, it is demonstrated that our novel large scale methods give comparable performance with existing methods whilst using significantly less computation time and memory.

    @article{ZhaFilGreSej2018,
      author = {Zhang, Q. and Filippi, S. and Gretton, A. and Sejdinovic, D.},
      title = {{{Large-Scale Kernel Methods for Independence Testing}}},
      journal = {Statistics and Computing},
      arxiv = {http://arxiv.org/abs/1606.07892},
      url = {http://link.springer.com/article/10.1007\%2Fs11222-016-9721-7},
      doi = {10.1007/s11222-016-9721-7},
      year = {2018},
      month = jan,
      volume = {28},
      number = {1},
      pages = {113--130},
      code = {https://github.com/oxmlcs/kerpy}
    }
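
    As a rough illustration of the random Fourier feature route to large-scale independence testing, the statistic below is the squared Frobenius norm of the empirical cross-covariance between feature maps of the two variables, calibrated by permutation. This is a generic O(nm) sketch with assumed names and bandwidths, not the precise block-based, Nystrom or random Fourier feature estimators compared in the paper.

    import numpy as np

    def rff(X, W, b):
        return np.sqrt(2.0 / W.shape[1]) * np.cos(X @ W + b)

    def rff_hsic(X, Y, num_features=50, seed=0):
        """Approximate HSIC: squared Frobenius norm of the empirical cross-covariance
        between random Fourier feature maps of X and Y."""
        rng = np.random.default_rng(seed)
        Wx = rng.normal(size=(X.shape[1], num_features))
        Wy = rng.normal(size=(Y.shape[1], num_features))
        bx = rng.uniform(0, 2 * np.pi, num_features)
        by = rng.uniform(0, 2 * np.pi, num_features)
        Zx, Zy = rff(X, Wx, bx), rff(Y, Wy, by)
        Zx, Zy = Zx - Zx.mean(axis=0), Zy - Zy.mean(axis=0)
        C = Zx.T @ Zy / X.shape[0]
        return np.sum(C ** 2)

    def permutation_pvalue(X, Y, num_perms=200, seed=0):
        rng = np.random.default_rng(seed)
        stat = rff_hsic(X, Y)
        null = [rff_hsic(X, Y[rng.permutation(len(Y))]) for _ in range(num_perms)]
        return float(np.mean([s >= stat for s in null]))

    rng = np.random.default_rng(2)
    X = rng.normal(size=(1000, 1))
    Y = np.sin(3 * X) + 0.3 * rng.normal(size=(1000, 1))   # nonlinear dependence
    print("p-value:", permutation_pvalue(X, Y))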
    
  62. S. Flaxman, Y. W. Teh, and D. Sejdinovic, Poisson Intensity Estimation with Reproducing Kernels, Electronic Journal of Statistics, vol. 11, no. 2, 5081–5104, 2017.

    Despite the fundamental nature of the inhomogeneous Poisson process in the theory and application of stochastic processes, and its attractive generalizations (e.g. Cox process), few tractable nonparametric modeling approaches of intensity functions exist, especially when observed points lie in a high-dimensional space. In this paper we develop a new, computationally tractable Reproducing Kernel Hilbert Space (RKHS) formulation for the inhomogeneous Poisson process. We model the square root of the intensity as an RKHS function. Whereas RKHS models used in supervised learning rely on the so-called representer theorem, the form of the inhomogeneous Poisson process likelihood means that the representer theorem does not apply. However, we prove that the representer theorem does hold in an appropriately transformed RKHS, guaranteeing that the optimization of the penalized likelihood can be cast as a tractable finite-dimensional problem. The resulting approach is simple to implement, and readily scales to high dimensions and large-scale datasets.

    @article{FlaTehSej2017ejs,
      author = {Flaxman, Seth and Teh, Yee Whye and Sejdinovic, Dino},
      title = {{{Poisson Intensity Estimation with Reproducing Kernels}}},
      journal = {Electronic Journal of Statistics},
      year = {2017},
      volume = {11},
      number = {2},
      pages = {5081--5104},
      doi = {10.1214/17-EJS1339SI},
      url = {https://projecteuclid.org/euclid.ejs/1513306868}
    }
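
    In symbols, the modelling idea is to write the intensity as the square of an RKHS function and to maximise a penalised likelihood. The display below is a schematic of that objective in generic notation (observed points x_1, ..., x_N on a domain \mathcal{X}, regularisation parameter \gamma), not a verbatim equation from the paper:

    \[
    \lambda(x) = f(x)^2, \quad f \in \mathcal{H}_k, \qquad
    \hat{f} = \operatorname*{arg\,max}_{f \in \mathcal{H}_k} \; \sum_{i=1}^{N} \log f(x_i)^2 \;-\; \int_{\mathcal{X}} f(x)^2 \,\mathrm{d}x \;-\; \gamma\,\| f \|_{\mathcal{H}_k}^2 .
    \]

    The point of the paper is that, although the standard representer theorem does not apply to this likelihood, an analogous result holds in an appropriately transformed RKHS, so the optimisation reduces to a finite-dimensional problem.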
    
  63. H. C. L. Law, C. Yau, and D. Sejdinovic, Testing and Learning on Distributions with Symmetric Noise Invariance, in Advances in Neural Information Processing Systems (NeurIPS), 2017, vol. 30, 1343–1353.

    Kernel embeddings of distributions and the Maximum Mean Discrepancy (MMD), the resulting distance between distributions, are useful tools for fully nonparametric two-sample testing and learning on distributions. However, it is rarely the case that all possible differences between samples are of interest – discovered differences can be due to different types of measurement noise, data collection artefacts or other irrelevant sources of variability. We propose distances between distributions which encode invariance to additive symmetric noise, aimed at testing whether the assumed true underlying processes differ. Moreover, we construct invariant features of distributions, leading to learning algorithms robust to the impairment of the input distributions with symmetric additive noise. Such features lend themselves to a straightforward neural network implementation and can thus also be learned given a supervised signal.

    @inproceedings{LawYauSej2017,
      author = {Law, H. C. L. and Yau, C. and Sejdinovic, D.},
      title = {{{Testing and Learning on Distributions with Symmetric Noise Invariance}}},
      booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
      arxiv = {https://arxiv.org/abs/1703.07596},
      code = {https://github.com/hcllaw/Fourier-Phase-Neural-Network},
      year = {2017},
      video = {https://www.youtube.com/watch?v=kKpbHXZ1qmU},
      url = {https://papers.nips.cc/paper/6733-testing-and-learning-on-distributions-with-symmetric-noise-invariance},
      volume = {30},
      pages = {1343--1353}
    }
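
    A small sketch of the invariant-feature idea: evaluate the empirical characteristic function of each sample at a fixed set of frequencies and normalise away its modulus, so that additive independent noise with a real, positive characteristic function (e.g. Gaussian) cancels out. Frequencies and names below are illustrative placeholders.

    import numpy as np

    def phase_features(X, freqs):
        """Empirical characteristic function at the given frequencies, normalised to
        unit modulus ('phase features'); robust to additive symmetric noise whose
        characteristic function is real and positive."""
        cf = np.exp(1j * X @ freqs.T).mean(axis=0)           # empirical CF at each frequency
        phase = cf / (np.abs(cf) + 1e-12)
        return np.concatenate([phase.real, phase.imag])       # real-valued feature vector

    rng = np.random.default_rng(0)
    freqs = rng.normal(size=(20, 1))
    clean = rng.exponential(size=(2000, 1))
    noisy = clean + rng.normal(scale=0.5, size=(2000, 1))     # same process plus Gaussian noise
    print(np.linalg.norm(phase_features(clean, freqs) - phase_features(noisy, freqs)))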
    
  64. I. Schuster, H. Strathmann, B. Paige, and D. Sejdinovic, Kernel Sequential Monte Carlo, in European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD), 2017, vol. 10534, 390–409.

    Bayesian posterior inference with Monte Carlo methods has a fundamental role in statistics and probabilistic machine learning. Target posterior distributions arising in increasingly complex models often exhibit high degrees of nonlinearity and multimodality and pose substantial challenges to traditional samplers. We propose the Kernel Sequential Monte Carlo (KSMC) framework for building emulator models of the current particle system in a Reproducing Kernel Hilbert Space and use the emulator’s geometry to inform local proposals. KSMC is applicable when gradients are unknown or prohibitively expensive and inherits the superior performance of SMC on multi-modal targets and its ability to estimate model evidence. Strengths of the proposed methodology are demonstrated on a series of challenging synthetic and real-world examples.

    @inproceedings{SchStrPaiSej2017,
      author = {Schuster, I. and Strathmann, H. and Paige, B. and Sejdinovic, D.},
      title = {{{Kernel Sequential Monte Carlo}}},
      arxiv = {http://arxiv.org/abs/1510.03105},
      url = {https://doi.org/10.1007/978-3-319-71249-9_24},
      doi = {10.1007/978-3-319-71249-9_24},
      booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD)},
      series = {Lecture Notes in Computer Science},
      volume = {10534},
      pages = {390--409},
      year = {2017}
    }
    
  65. Q. Zhang, S. Filippi, S. Flaxman, and D. Sejdinovic, Feature-to-Feature Regression for a Two-Step Conditional Independence Test, in Uncertainty in Artificial Intelligence (UAI), 2017.

    The algorithms for causal discovery and more broadly for learning the structure of graphical models require well calibrated and consistent conditional independence (CI) tests. We revisit the CI tests which are based on two-step procedures and involve regression with subsequent (unconditional) independence test (RESIT) on regression residuals and investigate the assumptions under which these tests operate. In particular, we demonstrate that when going beyond simple functional relationships with additive noise, such tests can lead to an inflated number of false discoveries. We study the relationship of these tests with those based on dependence measures using reproducing kernel Hilbert spaces (RKHS) and propose an extension of RESIT which uses RKHS-valued regression. The resulting test inherits the simple two-step testing procedure of RESIT, while giving correct Type I control and competitive power. When used as a component of the PC algorithm, the proposed test is more robust to the case where hidden variables induce a switching behaviour in the associations present in the data.

    @inproceedings{ZhaFilFlaSej2017,
      author = {Zhang, Q. and Filippi, S. and Flaxman, S. and Sejdinovic, D.},
      title = {{{Feature-to-Feature Regression for a Two-Step Conditional Independence Test}}},
      booktitle = {Uncertainty in Artificial Intelligence (UAI)},
      url = {http://auai.org/uai2017/proceedings/papers/250.pdf},
      supplements = {http://auai.org/uai2017/proceedings/supplements/250.pdf},
      code = {https://github.com/oxmlcs/kerpy},
      year = {2017}
    }
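
    The two-step procedure discussed above is easy to sketch in its simplest instantiation: regress each variable on the conditioning variable, then test the residuals for (unconditional) independence. The snippet below uses linear least squares and a biased HSIC statistic with Gaussian kernels; it illustrates the baseline RESIT recipe that the paper analyses, not the RKHS-valued regression extension it proposes, and all names are placeholders.

    import numpy as np

    def hsic(x, y):
        """Biased HSIC statistic with Gaussian kernels (median-heuristic bandwidths)."""
        def gram(v):
            d2 = (v[:, None] - v[None, :]) ** 2
            return np.exp(-d2 / (np.median(d2[d2 > 0]) + 1e-12))
        n = len(x)
        H = np.eye(n) - np.ones((n, n)) / n
        return np.trace(H @ gram(x) @ H @ gram(y)) / n ** 2

    def resit_statistic(x, y, z):
        """Two-step CI statistic for X independent of Y given Z: regress X on Z and Y on Z
        (here plain linear least squares), then measure residual dependence with HSIC."""
        Z1 = np.column_stack([z, np.ones_like(z)])
        rx = x - Z1 @ np.linalg.lstsq(Z1, x, rcond=None)[0]
        ry = y - Z1 @ np.linalg.lstsq(Z1, y, rcond=None)[0]
        return hsic(rx, ry)

    rng = np.random.default_rng(0)
    z = rng.normal(size=400)
    x = z + 0.3 * rng.normal(size=400)
    y = z + 0.3 * rng.normal(size=400)        # conditional independence holds here
    print(resit_statistic(x, y, z))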
    
  66. J. Mitrovic, D. Sejdinovic, and Y. W. Teh, Deep Kernel Machines via the Kernel Reparametrization Trick, in International Conference on Learning Representations (ICLR) - Workshop Track, 2017.

    While deep neural networks have achieved state-of-the-art performance on many tasks across varied domains, they still remain black boxes whose inner workings are hard to interpret and understand. In this paper, we develop a novel method for efficiently capturing the behaviour of deep neural networks using kernels. In particular, we construct a hierarchy of increasingly complex kernels that encode individual hidden layers of the network. Furthermore, we discuss how our framework motivates a novel supervised weight initialization method that discovers highly discriminative features already at initialization.

    @inproceedings{MitSejTeh2017,
      author = {Mitrovic, Jovana and Sejdinovic, Dino and Teh, Yee Whye},
      title = {{{Deep Kernel Machines via the Kernel Reparametrization Trick}}},
      openreview = {https://openreview.net/forum?id=Bkiqt3Ntg&noteId=Bkiqt3Ntg},
      booktitle = {International Conference on Learning Representations (ICLR) - Workshop Track},
      year = {2017}
    }
    
  67. S. Flaxman, Y. W. Teh, and D. Sejdinovic, Poisson Intensity Estimation with Reproducing Kernels, in International Conference on Artificial Intelligence and Statistics (AISTATS), 2017, PMLR 54:270–279.

    Despite the fundamental nature of the inhomogeneous Poisson process in the theory and application of stochastic processes, and its attractive generalizations (e.g. Cox process), few tractable nonparametric modeling approaches of intensity functions exist, especially in high dimensional settings. In this paper we develop a new, computationally tractable Reproducing Kernel Hilbert Space (RKHS) formulation for the inhomogeneous Poisson process. We model the square root of the intensity as an RKHS function. The modeling challenge is that the usual representer theorem arguments no longer apply due to the form of the inhomogeneous Poisson process likelihood. However, we prove that the representer theorem does hold in an appropriately transformed RKHS, guaranteeing that the optimization of the penalized likelihood can be cast as a tractable finite-dimensional problem. The resulting approach is simple to implement, and readily scales to high dimensions and large-scale datasets.

    @inproceedings{FlaTehSej2017,
      author = {Flaxman, Seth and Teh, Yee Whye and Sejdinovic, Dino},
      title = {{{Poisson Intensity Estimation with Reproducing Kernels}}},
      booktitle = {International Conference on Artificial Intelligence and Statistics (AISTATS)},
      year = {2017},
      arxiv = {https://arxiv.org/abs/1610.08623},
      url = {http://proceedings.mlr.press/v54/flaxman17a.html},
      code = {https://bitbucket.org/flaxter/kernel-poisson},
      pages = {PMLR 54:270--279}
    }
    
  68. D. Vukobratovic, D. Jakovetic, V. Skachek, D. Bajovic, D. Sejdinovic, G. Karabulut Kurt, C. Hollanti, and I. Fischer, CONDENSE: A Reconfigurable Knowledge Acquisition Architecture for Future 5G IoT, IEEE Access, vol. 4, 3360–3378, 2016.

    In forthcoming years, the Internet of Things (IoT) will connect billions of smart devices generating and uploading a deluge of data to the cloud. If successfully extracted, the knowledge buried in the data can significantly improve the quality of life and foster economic growth. However, a critical bottleneck for realising the efficient IoT is the pressure it puts on the existing communication infrastructures, requiring transfer of enormous data volumes. Aiming at addressing this problem, we propose a novel architecture dubbed Condense (reconfigurable knowledge acquisition systems), which integrates the IoT-communication infrastructure into data analysis. This is achieved via the generic concept of network function computation: Instead of merely transferring data from the IoT sources to the cloud, the communication infrastructure should actively participate in the data analysis by carefully designed en-route processing. We define the Condense architecture, its basic layers, and the interactions among its constituent modules. Further, from the implementation side, we describe how Condense can be integrated into the 3rd Generation Partnership Project (3GPP) Machine Type Communications (MTC) architecture, as well as the prospects of making it a practically viable technology in a short time frame, relying on Network Function Virtualization (NFV) and Software Defined Networking (SDN). Finally, from the theoretical side, we survey the relevant literature on computing “atomic” functions in both analog and digital domains, as well as on function decomposition over networks, highlighting challenges, insights, and future directions for exploiting these techniques within practical 3GPP MTC architecture.

    @article{VukJaketal2016a,
      author = {Vukobratovic, D. and Jakovetic, D. and Skachek, V. and Bajovic, D. and Sejdinovic, D. and Karabulut Kurt, G. and Hollanti, C. and Fischer, I.},
      title = {{{CONDENSE: A Reconfigurable Knowledge Acquisition Architecture for Future 5G IoT}}},
      year = {2016},
      journal = {{IEEE Access}},
      volume = {4},
      doi = {10.1109/ACCESS.2016.2585468},
      url = {http://dx.doi.org/10.1109/ACCESS.2016.2585468},
      pages = {3360--3378}
    }
    
  69. D. Vukobratovic, D. Jakovetic, V. Skachek, D. Bajovic, and D. Sejdinovic, Network Function Computation as a Service in Future 5G Machine Type Communications, in International Symposium on Turbo Codes & Iterative Information Processing (ISTC), 2016, 365–369.

    The 3GPP machine type communications (MTC) service is expected to contribute a dominant share of the IoT traffic via the upcoming fifth generation (5G) mobile cellular systems. MTC has the ambition to connect billions of devices that communicate their data to MTC applications for further processing and data analysis. However, for the majority of applications, collecting all the MTC-generated data is inefficient as the data is typically fed into application-dependent functions whose outputs determine the application actions. In this paper, we present a novel MTC architecture that, instead of collecting raw large-volume MTC data, offers network function computation (NFC) as a service. For a given application demand (function to be computed), different modules (atomic nodes) of the communication infrastructure are orchestrated into a (reconfigurable) directed network topology, and each module is assigned an appropriately defined (reconfigurable) atomic function over the input data, such that the desired global network function is evaluated over the MTC data and a requested MTC-NFC service is delivered. We detail the practical viability of incorporating MTC-NFC within the existing 3GPP architecture, relying on emerging concepts of Network Function Virtualization and Software Defined Networking. Finally, throughout the paper, we point to the theoretical foundations that inspired the presented architecture, highlighting challenges and future directions for designing the 3GPP MTC-NFC service.

    @inproceedings{VukJaketal2016b,
      author = {Vukobratovic, D. and Jakovetic, D. and Skachek, V. and Bajovic, D. and Sejdinovic, D.},
      title = {{{Network Function Computation as a Service in Future 5G Machine Type Communications}}},
      booktitle = {International Symposium on Turbo Codes \& Iterative Information Processing (ISTC)},
      year = {2016},
      pages = {365--369},
      doi = {10.1109/ISTC.2016.7593138},
      url = {http://dx.doi.org/10.1109/ISTC.2016.7593138}
    }
    
  70. J. Mitrovic, D. Sejdinovic, and Y. W. Teh, DR-ABC: Approximate Bayesian Computation with Kernel-Based Distribution Regression, in International Conference on Machine Learning (ICML), 2016, PMLR 48:1482–1491.

    Performing exact posterior inference in complex generative models is often difficult or impossible due to an expensive-to-evaluate or intractable likelihood function. Approximate Bayesian computation (ABC) is an inference framework that constructs an approximation to the true likelihood based on the similarity between the observed and simulated data as measured by a predefined set of summary statistics. Although the choice of appropriate problem-specific summary statistics crucially influences the quality of the likelihood approximation and hence also the quality of the posterior sample in ABC, there are only a few principled general-purpose approaches to the selection or construction of such summary statistics. In this paper, we develop a novel framework for this task using kernel-based distribution regression. We model the functional relationship between data distributions and the optimal choice (with respect to a loss function) of summary statistics using kernel-based distribution regression. We show that our approach can be implemented in a computationally and statistically efficient way using the random Fourier features framework for large-scale kernel learning. In addition, our framework shows superior performance when compared to related methods on toy and real-world problems.

    @inproceedings{MitSejTeh2016,
      author = {Mitrovic, Jovana and Sejdinovic, Dino and Teh, Yee Whye},
      title = {{{DR-ABC: Approximate Bayesian Computation with Kernel-Based Distribution Regression}}},
      booktitle = {International Conference on Machine Learning (ICML)},
      arxiv = {http://arxiv.org/abs/1602.04805},
      url = {http://proceedings.mlr.press/v48/mitrovic16.html},
      year = {2016},
      pages = {PMLR 48:1482--1491}
    }
    
  71. G. Franchi, J. Angulo, and D. Sejdinovic, Hyperspectral Image Classification with Support Vector Machines on Kernel Distribution Embeddings, in IEEE International Conference on Image Processing (ICIP), 2016, 1898–1902.

    We propose a novel approach for pixel classification in hyperspectral images, leveraging both the spatial and spectral information in the data. The introduced method relies on a recently proposed framework for learning on distributions – by representing them with mean elements in reproducing kernel Hilbert spaces (RKHS) and formulating a classification algorithm therein. In particular, we associate each pixel to an empirical distribution of its neighbouring pixels, a judicious representation of which in an RKHS, in conjunction with the spectral information contained in the pixel itself, gives a new explicit set of features that can be fed into a suite of standard classification techniques – we opt for a well established framework of support vector machines (SVM). Furthermore, the computational complexity is reduced via the random Fourier features formalism. We study the consistency and the convergence rates of the proposed method and the experiments demonstrate strong performance on hyperspectral data with gains in comparison to the state-of-the-art results.

    @inproceedings{FraAngSej2016,
      author = {Franchi, G. and Angulo, J. and Sejdinovic, D.},
      title = {{{Hyperspectral Image Classification with Support Vector Machines on Kernel Distribution Embeddings}}},
      booktitle = {IEEE International Conference on Image Processing (ICIP)},
      year = {2016},
      arxiv = {http://arxiv.org/abs/1605.09136},
      url = {http://dx.doi.org/10.1109/ICIP.2016.7532688},
      doi = {10.1109/ICIP.2016.7532688},
      pages = {1898--1902}
    }
    
  72. B. Paige, D. Sejdinovic, and F. Wood, Super-Sampling with a Reservoir, in Uncertainty in Artificial Intelligence (UAI), 2016, 567–576.

    We introduce an alternative to reservoir sampling, a classic and popular algorithm for drawing a fixed-size subsample from streaming data in a single pass. Rather than draw a random sample, our approach performs an online optimization which aims to select the subset which provides the best overall approximation to the full data set, as judged using a kernel two-sample test. This produces subsets which minimize the worst-case relative error when computing expectations of functions in a specified function class, using just the samples from the subset. Kernel functions are approximated using random Fourier features, and the subset of samples itself is stored in a random projection tree, allowing for an algorithm which runs in a single pass through the whole data set, with only a logarithmic time complexity in the size of the subset at each iteration. These “super-samples” subsampled from the full data provide a concise summary, as demonstrated empirically on mixture models and the MNIST dataset.

    @inproceedings{PaiSejWoo2016,
      author = {Paige, B. and Sejdinovic, D. and Wood, F.},
      title = {{{Super-Sampling with a Reservoir}}},
      booktitle = {Uncertainty in Artificial Intelligence (UAI)},
      year = {2016},
      pages = {567--576},
      url = {http://www.auai.org/uai2016/proceedings/papers/293.pdf}
    }
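
    A toy version of the underlying idea: keep a fixed-size subset whose mean random-feature vector tracks the running mean embedding of the whole stream, swapping in a new point only when it reduces that approximate MMD. The actual algorithm organises the subset in a random projection tree for logarithmic updates; what follows is just a greedy O(m)-per-item sketch with assumed names.

    import numpy as np

    rng = np.random.default_rng(0)
    D, m = 100, 50                                          # number of features, reservoir size
    W = rng.normal(size=(1, D))
    b = rng.uniform(0, 2 * np.pi, D)
    feat = lambda x: np.sqrt(2.0 / D) * np.cos(x @ W + b)   # random Fourier features

    reservoir, res_feat = [], []
    stream_mean, count = np.zeros(D), 0

    for x in rng.normal(size=(5000, 1)):                    # single pass over streaming data
        phi = feat(x[None, :])[0]
        count += 1
        stream_mean += (phi - stream_mean) / count          # running mean embedding of the stream
        if len(reservoir) < m:
            reservoir.append(x); res_feat.append(phi)
            continue
        F = np.array(res_feat)
        base = F.mean(axis=0)
        best_err, best_j = np.linalg.norm(base - stream_mean), None
        for j in range(m):                                  # try swapping the new point in
            err = np.linalg.norm(base + (phi - F[j]) / m - stream_mean)
            if err < best_err:
                best_err, best_j = err, j
        if best_j is not None:
            reservoir[best_j], res_feat[best_j] = x, phi

    print("super-sample mean:", float(np.mean(reservoir)))  # should be close to the stream mean (0)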
    
  73. S. Flaxman, D. Sejdinovic, J. P. Cunningham, and S. Filippi, Bayesian Learning of Kernel Embeddings, in Uncertainty in Artificial Intelligence (UAI), 2016, 182–191.

    Kernel methods are one of the mainstays of machine learning, but the problem of kernel learning remains challenging, with only a few heuristics and very little theory. This is of particular importance in methods based on estimation of kernel mean embeddings of probability measures. For characteristic kernels, which include most commonly used ones, the kernel mean embedding uniquely determines its probability measure, so it can be used to design a powerful statistical testing framework, which includes nonparametric two-sample and independence tests. In practice, however, the performance of these tests can be very sensitive to the choice of kernel and its lengthscale parameters. To address this central issue, we propose a new probabilistic model for kernel mean embeddings, the Bayesian Kernel Embedding model, combining a Gaussian process prior over the Reproducing Kernel Hilbert Space containing the mean embedding with a conjugate likelihood function, thus yielding a closed form posterior over the mean embedding. The posterior mean of our model is closely related to recently proposed shrinkage estimators for kernel mean embeddings, while the posterior uncertainty is a new, interesting feature with various possible applications. Critically for the purposes of kernel learning, our model gives a simple, closed form marginal pseudolikelihood of the observed data given the kernel hyperparameters. This marginal pseudolikelihood can either be optimized to inform the hyperparameter choice or fully Bayesian inference can be used.

    @inproceedings{FlaSejCunFil2016,
      author = {Flaxman, S. and Sejdinovic, D. and Cunningham, J.P. and Filippi, S.},
      title = {{{Bayesian Learning of Kernel Embeddings}}},
      booktitle = {Uncertainty in Artificial Intelligence (UAI)},
      arxiv = {http://arxiv.org/abs/1603.02160},
      year = {2016},
      pages = {182--191},
      url = {http://www.auai.org/uai2016/proceedings/papers/145.pdf},
      supplements = {http://www.auai.org/uai2016/proceedings/supp/145_supp.pdf}
    }
    
  74. M. Park, W. Jitkrittum, and D. Sejdinovic, K2-ABC: Approximate Bayesian Computation with Kernel Embeddings, in International Conference on Artificial Intelligence and Statistics (AISTATS), 2016, PMLR 51:398–407.

    Complicated generative models often result in a situation where computing the likelihood of observed data is intractable, while simulating from the conditional density given a parameter value is relatively easy. Approximate Bayesian Computation (ABC) is a paradigm that enables simulation-based posterior inference in such cases by measuring the similarity between simulated and observed data in terms of a chosen set of summary statistics. However, there is no general rule to construct sufficient summary statistics for complex models. Insufficient summary statistics will “leak” information, which leads to ABC algorithms yielding samples from an incorrect (partial) posterior. In this paper, we propose a fully nonparametric ABC paradigm which circumvents the need for manually selecting summary statistics. Our approach, K2-ABC, uses maximum mean discrepancy (MMD) to construct a dissimilarity measure between the observed and simulated data. The embedding of an empirical distribution of the data into a reproducing kernel Hilbert space plays the role of the summary statistic and is sufficient whenever the corresponding kernels are characteristic. Experiments on a simulated scenario and a real-world biological problem illustrate the effectiveness of the proposed algorithm.

    @inproceedings{ParJitSej2016,
      author = {Park, M. and Jitkrittum, W. and Sejdinovic, D.},
      title = {{{K2-ABC: Approximate Bayesian Computation with Kernel Embeddings}}},
      booktitle = {International Conference on Artificial Intelligence and Statistics (AISTATS)},
      arxiv = {http://arxiv.org/abs/1502.02558},
      url = {http://proceedings.mlr.press/v51/park16.html},
      year = {2016},
      pages = {PMLR 51:398--407},
      code = {https://github.com/wittawatj/k2abc}
    }
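
    The soft-rejection scheme described above fits in a few lines: simulate data for each parameter drawn from the prior, compute an MMD estimate to the observed sample, and weight the parameter by exp(-MMD^2 / epsilon). Kernel, bandwidth and epsilon below are illustrative choices, and the names are placeholders.

    import numpy as np

    def mmd2(X, Y, bw=1.0):
        """Biased squared-MMD estimate with a Gaussian kernel."""
        def k(A, B):
            d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
            return np.exp(-d2 / (2 * bw ** 2))
        return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

    def k2_abc(y_obs, simulate, prior_sample, num_sims=500, eps=0.05, seed=0):
        rng = np.random.default_rng(seed)
        thetas = np.array([prior_sample(rng) for _ in range(num_sims)])
        weights = np.array([np.exp(-mmd2(y_obs, simulate(t, rng)) / eps) for t in thetas])
        return thetas, weights / weights.sum()

    # Toy example: infer the mean of a Gaussian from an observed sample.
    rng = np.random.default_rng(1)
    y_obs = rng.normal(loc=1.5, size=(200, 1))
    simulate = lambda theta, rng: rng.normal(loc=theta, size=(200, 1))
    prior_sample = lambda rng: rng.uniform(-5, 5)
    thetas, w = k2_abc(y_obs, simulate, prior_sample)
    print("posterior mean estimate:", float(np.sum(w * thetas)))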
    
  75. H. Strathmann, D. Sejdinovic, S. Livingstone, Z. Szabo, and A. Gretton, Gradient-free Hamiltonian Monte Carlo with Efficient Kernel Exponential Families, in Advances in Neural Information Processing Systems (NeurIPS), vol. 28, 2015, 955–963.

    We propose Kamiltonian Monte Carlo (KMC), a gradient-free adaptive MCMC algorithm based on Hamiltonian Monte Carlo (HMC). On target densities where HMC is unavailable due to intractable gradients, KMC adaptively learns the target’s gradient structure by fitting an exponential family model in a Reproducing Kernel Hilbert Space. Computational costs are reduced by two novel efficient approximations to this gradient. While being asymptotically exact, KMC mimics HMC in terms of sampling efficiency and offers substantial mixing improvements over state-of-the-art gradient-free samplers. We support our claims with experimental studies on both toy and real-world applications, including Approximate Bayesian Computation and exact-approximate MCMC.

    @incollection{StrSejLivSzaGre2015,
      author = {Strathmann, H. and Sejdinovic, D. and Livingstone, S. and Szabo, Z. and Gretton, A.},
      title = {{{Gradient-free Hamiltonian Monte Carlo with Efficient Kernel Exponential Families}}},
      booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
      volume = {28},
      year = {2015},
      pages = {955--963},
      url = {http://papers.nips.cc/paper/5890-gradient-free-hamiltonian-monte-carlo-with-efficient-kernel-exponential-families},
      arxiv = {http://arxiv.org/abs/1506.02564}
    }
    
  76. K. Chwialkowski, A. Ramdas, D. Sejdinovic, and A. Gretton, Fast Two-Sample Testing with Analytic Representations of Probability Measures, in Advances in Neural Information Processing Systems (NeurIPS), vol. 28, 2015, 1981–1989.

    We propose a class of nonparametric two-sample tests with a cost linear in the sample size. Two tests are given, both based on an ensemble of distances between analytic functions representing each of the distributions. The first test uses smoothed empirical characteristic functions to represent the distributions, the second uses distribution embeddings in a reproducing kernel Hilbert space. Analyticity implies that differences in the distributions may be detected almost surely at a finite number of randomly chosen locations/frequencies. The new tests are consistent against a larger class of alternatives than the previous linear-time tests based on the (non-smoothed) empirical characteristic functions, while being much faster than the current state-of-the-art quadratic-time kernel-based or energy distance-based tests. Experiments on artificial benchmarks and on challenging real-world testing problems demonstrate that our tests give a better power/time tradeoff than competing approaches, and in some cases, better outright power than even the most expensive quadratic-time tests. This performance advantage is retained even in high dimensions, and in cases where the difference in distributions is not observable with low order statistics.

    @incollection{ChwRamSejGre2015,
      author = {Chwialkowski, K. and Ramdas, A. and Sejdinovic, D. and Gretton, A.},
      title = {{{Fast Two-Sample Testing with Analytic Representations of Probability Measures}}},
      booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
      volume = {28},
      year = {2015},
      pages = {1981--1989},
      url = {http://papers.nips.cc/paper/5685-fast-two-sample-testing-with-analytic-representations-of-probability-measures},
      arxiv = {http://arxiv.org/abs/1506.04725}
    }
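
    The mean-embedding variant of these linear-time tests admits a compact sketch: evaluate a kernel at J random test locations, form per-pair difference vectors between the two samples, and compare a Hotelling-type statistic against a chi-squared(J) threshold. Locations, kernel and bandwidth below are placeholders; the smoothed characteristic-function variant from the paper is not shown.

    import numpy as np
    from scipy import stats

    def me_test(X, Y, J=5, bw=1.0, alpha=0.05, seed=0):
        """Linear-time mean-embedding two-sample test at J random locations."""
        rng = np.random.default_rng(seed)
        n, d = X.shape
        T = rng.normal(size=(J, d))                          # random test locations
        k = lambda A: np.exp(-((A[:, None, :] - T[None, :, :]) ** 2).sum(-1) / (2 * bw ** 2))
        Z = k(X) - k(Y)                                      # n x J paired differences
        zbar = Z.mean(axis=0)
        S = np.cov(Z, rowvar=False) + 1e-8 * np.eye(J)
        stat = n * zbar @ np.linalg.solve(S, zbar)
        return stat, bool(stat > stats.chi2.ppf(1 - alpha, df=J))

    rng = np.random.default_rng(3)
    X = rng.normal(0.0, 1.0, size=(2000, 5))
    Y = rng.normal(0.0, 1.3, size=(2000, 5))                 # distributions differ in scale only
    print(me_test(X, Y))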
    
  77. D. Vukobratovic, D. Sejdinovic, and A. Pizurica, Compressed Sensing Using Sparse Binary Measurements: A Rateless Coding Perspective, in IEEE International Workshop on Signal Processing Advances in Wireless Communications (SPAWC), 2015.

    Compressed Sensing (CS) methods using sparse binary measurement matrices and iterative message-passing recovery procedures have been recently investigated due to their low computational complexity and excellent performance. Drawing much of inspiration from sparse-graph codes such as Low-Density Parity-Check (LDPC) codes, these studies use analytical tools from modern coding theory to analyze CS solutions. In this paper, we consider and systematically analyze the CS setup inspired by a class of efficient, popular and flexible sparse-graph codes called rateless codes. The proposed rateless CS setup is asymptotically analyzed using tools such as Density Evolution and EXIT charts and fine-tuned using degree distribution optimization techniques.

    @inproceedings{VukSejPiz2015,
      author = {Vukobratovic, Dejan and Sejdinovic, Dino and Pizurica, Aleksandra},
      title = {{{Compressed Sensing Using Sparse Binary Measurements: A Rateless Coding Perspective}}},
      booktitle = {IEEE International Workshop on Signal Processing Advances in Wireless Communications (SPAWC)},
      year = {2015},
      doi = {10.1109/SPAWC.2015.7227005},
      url = {http://dx.doi.org/10.1109/SPAWC.2015.7227005}
    }
    
  78. Z. Kurth-Nelson, G. Barnes, D. Sejdinovic, R. Dolan, and P. Dayan, Temporal Structure in Associative Retrieval, eLife, vol. 4, no. e04919, 2015.

    Electrophysiological data disclose rich dynamics in patterns of neural activity evoked by sensory objects. Retrieving objects from memory reinstates components of this activity. In humans, the temporal structure of this retrieved activity remains largely unexplored, and here we address this gap using the spatiotemporal precision of magnetoencephalography (MEG). In a sensory preconditioning paradigm, 'indirect' objects were paired with 'direct' objects to form associative links, and the latter were then paired with rewards. Using multivariate analysis methods we examined the short-time evolution of neural representations of indirect objects retrieved during reward-learning about direct objects. We found two components of the evoked representation of the indirect stimulus, 200 ms apart. The strength of retrieval of one, but not the other, representational component correlated with generalization of reward learning from direct to indirect stimuli. We suggest the temporal structure within retrieved neural representations may be key to their function.

    @article{KNeBarSejDolDay2015,
      author = {Kurth-Nelson, Zeb and Barnes, Gareth and Sejdinovic, Dino and Dolan, Ray and Dayan, Peter},
      title = {{{Temporal Structure in Associative Retrieval}}},
      volume = {4},
      number = {e04919},
      year = {2015},
      doi = {10.7554/eLife.04919},
      publisher = {eLife Sciences Publications Limited},
      journal = {{eL}ife},
      url = {http://dx.doi.org/10.7554/eLife.04919}
    }
    
  79. W. Jitkrittum, A. Gretton, N. Heess, S. M. A. Eslami, B. Lakshminarayanan, D. Sejdinovic, and Z. Szabó, Kernel-Based Just-In-Time Learning for Passing Expectation Propagation Messages, in Uncertainty in Artificial Intelligence (UAI), 2015.

    We propose an efficient nonparametric strategy for learning a message operator in expectation propagation (EP), which takes as input the set of incoming messages to a factor node, and produces an outgoing message as output. This learned operator replaces the multivariate integral required in classical EP, which may not have an analytic expression. We use kernel-based regression, which is trained on a set of probability distributions representing the incoming messages, and the associated outgoing messages. The kernel approach has two main advantages: first, it is fast, as it is implemented using a novel two-layer random feature representation of the input message distributions; second, it has principled uncertainty estimates, and can be cheaply updated online, meaning it can request and incorporate new training data when it encounters inputs on which it is uncertain. In experiments, our approach is able to solve learning problems where a single message operator is required for multiple, substantially different data sets (logistic regression for a variety of classification problems), where the ability to accurately assess uncertainty and to efficiently and robustly update the message operator are essential.

    @inproceedings{JitGreHeeEslLakSejSza2015,
      author = {Jitkrittum, Wittawat and Gretton, Arthur and Heess, Nicolas and Eslami, S. M. Ali and Lakshminarayanan, Balaji and Sejdinovic, Dino and Szab\'{o}, Zolt\'{a}n},
      booktitle = {Uncertainty in Artificial Intelligence (UAI)},
      title = {{{Kernel-Based Just-In-Time Learning for Passing Expectation Propagation Messages}}},
      arxiv = {http://arxiv.org/abs/1503.02551},
      year = {2015},
      code = {https://github.com/wittawatj/kernel-ep},
      url = {http://auai.org/uai2015/proceedings/papers/235.pdf},
      supplements = {http://auai.org/uai2015/proceedings/supp/239_supp.pdf}
    }
    
  80. K. Chwialkowski, D. Sejdinovic, and A. Gretton, A Wild Bootstrap for Degenerate Kernel Tests, in Advances in Neural Information Processing Systems (NeurIPS), vol. 27, 2014, 3608–3616.

    A wild bootstrap method for nonparametric hypothesis tests based on kernel distribution embeddings is proposed. This bootstrap method is used to construct provably consistent tests that apply to random processes, for which the naive permutation-based bootstrap fails. It applies to a large group of kernel tests based on V-statistics, which are degenerate under the null hypothesis, and non-degenerate elsewhere. To illustrate this approach, we construct a two-sample test, an instantaneous independence test and a multiple lag independence test for time series. In experiments, the wild bootstrap gives strong performance on synthetic examples, on audio data, and in performance benchmarking for the Gibbs sampler.

    @incollection{ChwSejGre2014,
      title = {{{A Wild Bootstrap for Degenerate Kernel Tests}}},
      author = {Chwialkowski, Kacper and Sejdinovic, Dino and Gretton, Arthur},
      booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
      volume = {27},
      pages = {3608--3616},
      year = {2014},
      url = {http://papers.nips.cc/paper/5452-a-wild-bootstrap-for-degenerate-kernel-tests.pdf},
      code = {https://github.com/kacperChwialkowski/wildBootstrap},
      video = {http://research.microsoft.com/apps/video/?id=240378}
    }
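
    To picture the procedure: write the degenerate V-statistic as a quadratic form in a centred kernel matrix, then replace it by the same form weighted with an autocorrelated multiplier process W_t, recomputing many times to approximate the null distribution under temporal dependence. The snippet below does this for an HSIC-type independence statistic on two time series; the AR(1) multiplier, bandwidths and names are illustrative assumptions rather than the paper's exact recipe.

    import numpy as np

    def centred_gram(v, bw=1.0):
        d2 = (v[:, None] - v[None, :]) ** 2
        K = np.exp(-d2 / (2 * bw ** 2))
        H = np.eye(len(v)) - np.ones((len(v), len(v))) / len(v)
        return H @ K @ H

    def wild_bootstrap_hsic(x, y, num_boot=300, block_length=20, seed=0):
        """HSIC V-statistic for two time series with a wild-bootstrap null."""
        rng = np.random.default_rng(seed)
        n = len(x)
        M = centred_gram(x) * centred_gram(y)        # element-wise product of centred Grams
        stat = M.sum() / n ** 2
        rho = np.exp(-1.0 / block_length)
        null = np.empty(num_boot)
        for i in range(num_boot):
            W = np.empty(n)
            W[0] = rng.normal()
            for t in range(1, n):                    # stationary AR(1) multiplier process
                W[t] = rho * W[t - 1] + np.sqrt(1 - rho ** 2) * rng.normal()
            null[i] = W @ M @ W / n ** 2
        return stat, float(np.mean(null >= stat))    # statistic and bootstrap p-value

    rng = np.random.default_rng(4)
    x = np.convolve(rng.normal(size=600), np.ones(5) / 5, mode="same")   # autocorrelated series
    y = np.convolve(rng.normal(size=600), np.ones(5) / 5, mode="same")
    print(wild_bootstrap_hsic(x, y))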
    
  81. D. Sejdinovic, H. Strathmann, M. G. Lomeli, C. Andrieu, and A. Gretton, Kernel Adaptive Metropolis-Hastings, in International Conference on Machine Learning (ICML), 2014, 1665–1673.

    A Kernel Adaptive Metropolis-Hastings algorithm is introduced, for the purpose of sampling from a target distribution with strongly nonlinear support. The algorithm embeds the trajectory of the Markov chain into a reproducing kernel Hilbert space (RKHS), such that the feature space covariance of the samples informs the choice of proposal. The procedure is computationally efficient and straightforward to implement, since the RKHS moves can be integrated out analytically: our proposal distribution in the original space is a normal distribution whose mean and covariance depend on where the current sample lies in the support of the target distribution, and adapts to its local covariance structure. Furthermore, the procedure requires neither gradients nor any other higher order information about the target, making it particularly attractive for contexts such as Pseudo-Marginal MCMC. Kernel Adaptive Metropolis-Hastings outperforms competing fixed and adaptive samplers on multivariate, highly nonlinear target distributions, arising in both real-world and synthetic examples.

    @inproceedings{SejStrGarAndGre14,
      title = {{{Kernel Adaptive Metropolis-Hastings}}},
      author = {Sejdinovic, D. and Strathmann, H. and Lomeli, M.G. and Andrieu, C. and Gretton, A.},
      booktitle = {International Conference on Machine Learning (ICML)},
      year = {2014},
      pages = {1665--1673},
      code = {https://github.com/karlnapf/kameleon-mcmc},
      arxiv = {http://arxiv.org/abs/1307.5302},
      url = {http://jmlr.org/proceedings/papers/v32/sejdinovic14.pdf},
      supplements = {http://jmlr.org/proceedings/papers/v32/sejdinovic14-supp.zip}
    }
    
  82. O. Johnson, D. Sejdinovic, J. Cruise, R. Piechocki, and A. Ganesh, Non-Parametric Change-Point Estimation using String Matching Algorithms, Methodology and Computing in Applied Probability, vol. 16, no. 4, 987–1008, 2014.

    Given the output of a data source taking values in a finite alphabet, we wish to estimate change-points, that is times when the statistical properties of the source change. Motivated by ideas of match lengths in information theory, we introduce a novel non-parametric estimator which we call CRECHE (CRossings Enumeration CHange Estimator). We present simulation evidence that this estimator performs well, both for simulated sources and for real data formed by concatenating text sources. For example, we show that we can accurately estimate the point at which a source changes from a Markov chain to an IID source with the same stationary distribution. Our estimator requires no assumptions about the form of the source distribution, and avoids the need to estimate its probabilities. Further, we support the estimator theoretically by establishing a fluid limit and using martingale arguments.

    @article{JohSejCruPieGan2014,
      title = {{{Non-Parametric Change-Point Estimation using String Matching Algorithms}}},
      year = {2014},
      issn = {1387-5841},
      journal = {Methodology and Computing in Applied Probability},
      volume = {16},
      number = {4},
      doi = {10.1007/s11009-013-9359-2},
      url = {http://dx.doi.org/10.1007/s11009-013-9359-2},
      publisher = {Springer US},
      author = {Johnson, Oliver and Sejdinovic, Dino and Cruise, James and Piechocki, Robert and Ganesh, Ayalvadi},
      pages = {987--1008}
    }
    
  83. D. Sejdinovic, A. Gretton, and W. Bergsma, A Kernel Test for Three-Variable Interactions, in Advances in Neural Information Processing Systems (NeurIPS), vol. 26, 2013, 1124–1132.

    We introduce kernel nonparametric tests for Lancaster three-variable interaction and for total independence, using embeddings of signed measures into a reproducing kernel Hilbert space. The resulting test statistics are straightforward to compute, and are used in powerful three-variable interaction tests, which are consistent against all alternatives for a large family of reproducing kernels. We show the Lancaster test to be sensitive to cases where two independent causes individually have weak influence on a third dependent variable, but their combined effect has a strong influence. This makes the Lancaster test especially suited to finding structure in directed graphical models, where it outperforms competing nonparametric tests in detecting such V-structures.

    @incollection{SejGreBer2013,
      title = {{{A Kernel Test for Three-Variable Interactions}}},
      author = {Sejdinovic, Dino and Gretton, Arthur and Bergsma, Wicher},
      booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
      volume = {26},
      pages = {1124--1132},
      year = {2013},
      url = {http://papers.nips.cc/paper/4893-a-kernel-test-for-three-variable-interactions.pdf},
      supplements = {http://papers.nips.cc/paper/4893-a-kernel-test-for-three-variable-interactions-supplemental.zip},
      code = {http://www.gatsby.ucl.ac.uk/%7Egretton/interact/threeWayInteract.htm},
      arxiv = {http://arxiv.org/abs/1306.2281},
      video = {http://research.microsoft.com/apps/video/default.aspx?id=206943}
    }
    
  84. D. Sejdinovic, B. Sriperumbudur, A. Gretton, and K. Fukumizu, Equivalence of distance-based and RKHS-based statistics in hypothesis testing, Annals of Statistics, vol. 41, no. 5, 2263–2291, Oct. 2013.

    We provide a unifying framework linking two classes of statistics used in two-sample and independence testing: on the one hand, the energy distances and distance covariances from the statistics literature; on the other, maximum mean discrepancies (MMD), that is, distances between embeddings of distributions to reproducing kernel Hilbert spaces (RKHS), as established in machine learning. In the case where the energy distance is computed with a semimetric of negative type, a positive definite kernel, termed distance kernel, may be defined such that the MMD corresponds exactly to the energy distance. Conversely, for any positive definite kernel, we can interpret the MMD as energy distance with respect to some negative-type semimetric. This equivalence readily extends to distance covariance using kernels on the product space. We determine the class of probability distributions for which the test statistics are consistent against all alternatives. Finally, we investigate the performance of the family of distance kernels in two-sample and independence tests: we show in particular that the energy distance most commonly employed in statistics is just one member of a parametric family of kernels, and that other choices from this family can yield more powerful tests.

    @article{SejSriGreFuk2013,
      author = {Sejdinovic, Dino and Sriperumbudur, Bharath and Gretton, Arthur and Fukumizu, Kenji},
      doi = {10.1214/13-AOS1140},
      journal = {Annals of Statistics},
      month = oct,
      volume = {41},
      number = {5},
      pages = {2263--2291},
      title = {{{Equivalence of distance-based and RKHS-based statistics in hypothesis testing}}},
      url = {http://dx.doi.org/10.1214/13-AOS1140},
      year = {2013},
      arxiv = {http://arxiv.org/abs/1207.6076}
    }
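
    As a concrete illustration of the equivalence described above, the sketch below (hypothetical helper code, not taken from the paper) computes the biased empirical energy distance and the biased empirical MMD with the distance-induced kernel k(x, y) = (d(x, z0) + d(y, z0) - d(x, y)) / 2 for an arbitrary reference point z0; the two quantities coincide up to the factor of two implied by the definitions.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(0.0, 1.0, size=(100, 3))   # sample from P
    y = rng.normal(0.5, 1.0, size=(120, 3))   # sample from Q
    z0 = np.zeros(3)                          # arbitrary reference point for the distance kernel

    def dist(a, b):
        # pairwise Euclidean distances between rows of a and rows of b
        return np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(-1))

    def energy_distance(x, y):
        return 2 * dist(x, y).mean() - dist(x, x).mean() - dist(y, y).mean()

    def distance_kernel(a, b):
        # k(x, y) = (d(x, z0) + d(y, z0) - d(x, y)) / 2
        return 0.5 * (dist(a, z0[None, :]) + dist(b, z0[None, :]).T - dist(a, b))

    def mmd2(x, y, k):
        # biased (V-statistic) estimate of the squared MMD
        return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

    # The two biased empirical statistics agree: energy distance = 2 * MMD^2.
    print(energy_distance(x, y), 2 * mmd2(x, y, distance_kernel))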
    
  85. A. Gretton, D. Sejdinovic, H. Strathmann, S. Balakrishnan, M. Pontil, K. Fukumizu, and B. K. Sriperumbudur, Optimal Kernel Choice for Large-Scale Two-Sample Tests, in Advances in Neural Information Processing Systems (NeurIPS), vol. 25, 2012, 1205–1213.

    Given samples from distributions p and q, a two-sample test determines whether to reject the null hypothesis that p=q, based on the value of a test statistic measuring the distance between the samples. One choice of test statistic is the maximum mean discrepancy (MMD), which is a distance between embeddings of the probability distributions in a reproducing kernel Hilbert space. The kernel used in obtaining these embeddings is thus critical in ensuring the test has high power, and correctly distinguishes unlike distributions with high probability. A means of parameter selection for the two-sample test based on the MMD is proposed. For a given test level (an upper bound on the probability of making a Type I error), the kernel is chosen so as to maximize the test power, and minimize the probability of making a Type II error. The test statistic, test threshold, and optimization over the kernel parameters are obtained with cost linear in the sample size. These properties make the kernel selection and test procedures suited to data streams, where the observations cannot all be stored in memory. In experiments, the new kernel selection approach yields a more powerful test than earlier kernel selection heuristics.

    @incollection{GreSriSejStrBalPonFuk2012,
      title = {{{Optimal Kernel Choice for Large-Scale Two-Sample Tests}}},
      author = {Gretton, Arthur and Sejdinovic, Dino and Strathmann, Heiko and Balakrishnan, Sivaraman and Pontil, Massimiliano and Fukumizu, Kenji and Sriperumbudur, Bharath K.},
      booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
      volume = {25},
      pages = {1205--1213},
      year = {2012},
      url = {http://papers.nips.cc/paper/4727-optimal-kernel-choice-for-large-scale-two-sample-tests.pdf},
      code = {http://www.gatsby.ucl.ac.uk/%7Egretton/adaptMMD/adaptMMD.htm}
    }
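
    A minimal sketch in the spirit of the procedure described above, under assumed details: a streaming (linear-time) MMD estimate is formed from disjoint pairs of observations, and the kernel bandwidth is chosen to maximise the ratio of the statistic to its estimated standard deviation, a proxy for test power. The pairing scheme, the candidate grid, and the regulariser lam are illustrative choices, not the paper's exact recipe.

    import numpy as np

    def gauss_kernel(a, b, sigma):
        return np.exp(-((a - b) ** 2).sum(-1) / (2 * sigma ** 2))

    def linear_time_mmd(x, y, sigma):
        # Form disjoint pairs (x_{2i-1}, x_{2i}), (y_{2i-1}, y_{2i}); the resulting
        # h-values are i.i.d., so the statistic and its standard error cost O(n).
        n = (min(len(x), len(y)) // 2) * 2
        x1, x2, y1, y2 = x[0:n:2], x[1:n:2], y[0:n:2], y[1:n:2]
        h = (gauss_kernel(x1, x2, sigma) + gauss_kernel(y1, y2, sigma)
             - gauss_kernel(x1, y2, sigma) - gauss_kernel(x2, y1, sigma))
        return h.mean(), h.std(ddof=1) / np.sqrt(len(h))

    def select_bandwidth(x, y, candidates, lam=1e-8):
        # Maximise the statistic-to-deviation ratio over candidate bandwidths.
        scores = []
        for sigma in candidates:
            stat, se = linear_time_mmd(x, y, sigma)
            scores.append(stat / (se + lam))
        return candidates[int(np.argmax(scores))]

    rng = np.random.default_rng(1)
    x = rng.normal(0.0, 1.0, size=(2000, 5))
    y = rng.normal(0.2, 1.0, size=(2000, 5))
    print(select_bandwidth(x, y, [0.1, 0.5, 1.0, 2.0, 5.0]))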
    
  86. D. Sejdinovic, A. Gretton, B. K. Sriperumbudur, and K. Fukumizu, Hypothesis Testing Using Pairwise Distances and Associated Kernels, in International Conference on Machine Learning (ICML), 2012, 1111–1118.

    We provide a unifying framework linking two classes of statistics used in two-sample and independence testing: on the one hand, the energy distances and distance covariances from the statistics literature; on the other, distances between embeddings of distributions to reproducing kernel Hilbert spaces (RKHS), as established in machine learning. The equivalence holds when energy distances are computed with semimetrics of negative type, in which case a kernel may be defined such that the RKHS distance between distributions corresponds exactly to the energy distance. We determine the class of probability distributions for which kernels induced by semimetrics are characteristic (that is, for which embeddings of the distributions to an RKHS are injective). Finally, we investigate the performance of this family of kernels in two-sample and independence tests: we show in particular that the energy distance most commonly employed in statistics is just one member of a parametric family of kernels, and that other choices from this family can yield more powerful tests.

    @inproceedings{SejGreSriFuk12,
      title = {{{Hypothesis Testing Using Pairwise Distances and Associated Kernels}}},
      author = {Sejdinovic, D. and Gretton, Arthur and Sriperumbudur, Bharath K. and Fukumizu, Kenji},
      booktitle = {International Conference on Machine Learning (ICML)},
      year = {2012},
      pages = {1111--1118},
      url = {http://machinelearning.wustl.edu/mlpapers/paper_files/ICML2012Sejdinovic_575.pdf},
      arxiv = {http://arxiv.org/abs/1205.0411},
      video = {http://techtalks.tv/talks/57320/}
    }
    
  87. R. Piechocki and D. Sejdinovic, Combinatorial Channel Signature Modulation for Wireless ad-hoc Networks, in IEEE International Conference on Communications (ICC), 2012.

    In this paper we introduce a novel modulation and multiplexing method which facilitates highly efficient and simultaneous communication between multiple terminals in wireless ad-hoc networks. We term this method Combinatorial Channel Signature Modulation (CCSM). The CCSM method is particularly efficient in situations where communicating nodes operate in highly time dispersive environments. This is all achieved with a minimal MAC layer overhead, since all users are allowed to transmit and receive at the same time/frequency (full simultaneous duplex). The CCSM method has its roots in sparse modelling and the receiver is based on compressive sampling techniques. Towards this end, we develop a new low complexity algorithm termed Group Subspace Pursuit. Our analysis suggests that CCSM at least doubles the throughput when compared to the state of the art.

    @inproceedings{PieSej2012,
      title = {{Combinatorial Channel Signature Modulation for Wireless ad-hoc Networks}},
      author = {Piechocki, R. and Sejdinovic, D.},
      booktitle = {IEEE International Conference on Communications (ICC)},
      year = {2012},
      doi = {10.1109/ICC.2012.6363956},
      url = {http://dx.doi.org/10.1109/ICC.2012.6363956},
      arxiv = {http://arxiv.org/abs/1201.5608},
      file = {pdf/2012ICC.pdf}
    }
    
  88. A. Muller, D. Sejdinovic, and R. Piechocki, Approximate Message Passing under Finite Alphabet Constraints, in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2012.

    In this paper we consider Basis Pursuit De-Noising (BPDN) problems in which the sparse original signal is drawn from a finite alphabet. To solve this problem we propose an iterative message passing algorithm, which capitalises not only on the sparsity but also, by means of a prior distribution, on the discrete nature of the original signal. In our numerical experiments we test this algorithm in combination with a Rademacher measurement matrix and a measurement matrix derived from the random demodulator, which enables compressive sampling of analogue signals. Our results show, in both cases, significant performance gains over a linear programming based approach to the considered BPDN problem. We also compare the proposed algorithm to a similar message passing based algorithm without prior knowledge and observe an even larger performance improvement.

    @inproceedings{MulSejPie2012,
      title = {{{Approximate Message Passing under Finite Alphabet Constraints}}},
      author = {Muller, A. and Sejdinovic, D. and Piechocki, R.},
      booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
      year = {2012},
      doi = {10.1109/ICASSP.2012.6288590},
      url = {http://dx.doi.org/10.1109/ICASSP.2012.6288590},
      arxiv = {http://arxiv.org/abs/1201.4949},
      file = {pdf/2012ICASSP.pdf}
    }
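
    The ingredient that a finite-alphabet prior contributes to a message-passing reconstruction is, in essence, a posterior-mean denoiser over the alphabet. The sketch below is a generic illustration under an assumed Gaussian pseudo-observation model, not the paper's exact update rule; the {-1, 0, +1} alphabet and the prior weights are hypothetical.

    import numpy as np

    def finite_alphabet_denoiser(r, tau, alphabet, prior):
        # Posterior mean E[x | r] when r = x + N(0, tau) and x takes values in
        # `alphabet` with probabilities `prior`; such a denoiser can be plugged
        # into an AMP-style iteration to exploit the discrete nature of the signal.
        r = np.asarray(r, dtype=float)[..., None]        # shape (..., 1)
        lik = np.exp(-(r - alphabet) ** 2 / (2 * tau))   # shape (..., |alphabet|)
        w = lik * prior
        w /= w.sum(axis=-1, keepdims=True)
        return (w * alphabet).sum(axis=-1)

    # Example: a sparse {-1, 0, +1} signal observed through Gaussian noise.
    alphabet = np.array([-1.0, 0.0, 1.0])
    prior = np.array([0.05, 0.90, 0.05])
    print(finite_alphabet_denoiser([0.1, 0.9, -1.2], tau=0.05, alphabet=alphabet, prior=prior))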
    
  89. W. Dai, D. Sejdinovic, and O. Milenkovic, Gaussian Dynamic Compressive Sensing, in International Conference on Sampling Theory and Applications (SampTA), 2011.

    We consider the problem of estimating a discrete-time sequence of sparse signals with Gaussian innovations. Instances of such problems arise in networking and imaging, in particular, dynamic and interventional MRI. Our approach combines Kalman filtering and compressive sensing (CS) techniques by introducing a sparse MAP estimator for Gaussian signals, and then developing a CS-type algorithm for solving the sparse MAP problem. Despite the underlying assumption that the sequence of sparse signals is Gaussian, our approach also allows for efficient tracking of sparse non-Gaussian signals obtained via non-linear mappings, using only one sample/observation per time instance.

    @inproceedings{DaiSejMil2011,
      title = {{{Gaussian Dynamic Compressive Sensing}}},
      author = {Dai, W. and Sejdinovic, D. and Milenkovic, O.},
      booktitle = {International Conference on Sampling Theory and Applications (SampTA)},
      year = {2011},
      url = {http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.640.6971&rep=rep1&type=pdf}
    }
    
  90. D. Sejdinovic and O. Johnson, Note on Noisy Group Testing: Asymptotic Bounds and Belief Propagation Reconstruction, in 48th Annual Allerton Conference on Communication, Control, and Computing, 2010, 998–1003.

    An information theoretic perspective on group testing problems has recently been proposed by Atia and Saligrama, in order to characterise the optimal number of tests. Their results hold in the noiseless case, in the case where only false positives occur, and in the case where only false negatives occur. We extend their results to a model containing both false positives and false negatives, developing simple information theoretic bounds on the number of tests required. Based on these bounds, we obtain an improved order of convergence in the case of false negatives only. Since these results are based on (computationally infeasible) joint typicality decoding, we propose a belief propagation algorithm for the detection of defective items and compare its actual performance to the theoretical bounds.

    @inproceedings{SejJoh2010,
      title = {{{Note on Noisy Group Testing: Asymptotic Bounds and Belief Propagation Reconstruction}}},
      author = {Sejdinovic, D. and Johnson, O.},
      booktitle = {48th Annual Allerton Conference on Communication, Control, and Computing},
      pages = {998--1003},
      year = {2010},
      doi = {10.1109/ALLERTON.2010.5707018},
      url = {http://dx.doi.org/10.1109/ALLERTON.2010.5707018},
      arxiv = {http://arxiv.org/abs/1010.2441}
    }
    
  91. D. Sejdinovic, R. Piechocki, A. Doufexi, and M. Ismail, Decentralised Distributed Fountain Coding: Asymptotic Analysis and Design, IEEE Communications Letters, vol. 14, no. 1, 42–44, 2010.

    A class of generic decentralised distributed fountain coding schemes is introduced and the tools of analysis of the performance of such schemes are presented. It is demonstrated that the developed approach can be used to formulate a robust code design methodology in a number of instances. We show that two non-standard applications of fountain codes, fountain codes for distributed source coding and fountain codes for unequal error protection, lie within this decentralised distributed fountain coding framework.

    @article{SejPieDouIsm2010,
      title = {{{Decentralised Distributed Fountain Coding: Asymptotic Analysis and Design}}},
      author = {Sejdinovic, D. and Piechocki, R. and Doufexi, A. and Ismail, M.},
      journal = {IEEE Communications Letters},
      volume = {14},
      number = {1},
      pages = {42--44},
      year = {2010},
      file = {pdf/2010CommLetter.pdf},
      doi = {10.1109/LCOMM.2010.01.091541},
      url = {http://dx.doi.org/10.1109/LCOMM.2010.01.091541}
    }
    
  92. D. Sejdinovic, C. Andrieu, and R. Piechocki, Bayesian Sequential Compressed Sensing in Sparse Dynamical Systems, in 48th Annual Allerton Conference on Communication, Control, and Computing, 2010, 1730–1736.

    While the theory of compressed sensing provides means to reliably and efficiently acquire a sparse high-dimensional signal from a small number of its linear projections, sensing of dynamically changing sparse signals is still not well understood. We pursue a Bayesian approach to the problem of sequential compressed sensing and develop methods to recursively estimate the full posterior distribution of the signal.

    @inproceedings{SejAndPie2010,
      title = {{{Bayesian Sequential Compressed Sensing in Sparse Dynamical Systems}}},
      author = {Sejdinovic, D. and Andrieu, C. and Piechocki, R.},
      booktitle = {48th Annual Allerton Conference on Communication, Control, and Computing},
      pages = {1730--1736},
      year = {2010},
      doi = {10.1109/ALLERTON.2010.5707125},
      url = {http://dx.doi.org/10.1109/ALLERTON.2010.5707125}
    }
    
  93. D. Sejdinovic, D. Vukobratovic, A. Doufexi, V. Senk, and R. Piechocki, Expanding Window Fountain Codes for Unequal Error Protection, IEEE Transactions on Communications, vol. 57, no. 9, 2510–2516, 2009.

    A novel approach to provide unequal error protection (UEP) using rateless codes over erasure channels, named Expanding Window Fountain (EWF) codes, is developed and discussed. EWF codes use a windowing technique rather than a weighted (non-uniform) selection of input symbols to achieve UEP property. The windowing approach introduces additional parameters in the UEP rateless code design, making it more general and flexible than the weighted approach. Furthermore, the windowing approach provides better performance of UEP scheme, which is confirmed both theoretically and experimentally.

    @article{SejVukDouSenPie2009,
      title = {{{Expanding Window Fountain Codes for Unequal Error Protection}}},
      author = {Sejdinovic, D. and Vukobratovic, D. and Doufexi, A. and Senk, V. and Piechocki, R.},
      journal = {IEEE Transactions on Communications},
      volume = {57},
      number = {9},
      pages = {2510--2516},
      year = {2009},
      doi = {10.1109/TCOMM.2009.09.070616},
      url = {http://dx.doi.org/10.1109/TCOMM.2009.09.070616},
      file = {pdf/2009TransComm.pdf}
    }
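
    To make the windowing idea concrete, here is a toy sketch of how a single EWF-style output symbol could be generated: an expanding window (a prefix of the importance-ordered input) is drawn from a window selection distribution, and standard LT-style encoding is then applied within that window. The window sizes, window probabilities, and degree distributions below are illustrative assumptions, not the optimised designs from the paper.

    import numpy as np

    def ewf_encode_symbol(data, windows, window_probs, degree_dists, rng):
        # Pick an expanding window, draw an LT degree from that window's degree
        # distribution, and XOR uniformly chosen input symbols from the window.
        w = rng.choice(len(windows), p=window_probs)
        k_w = windows[w]                                  # prefix length of window w
        degrees, probs = degree_dists[w]
        d = rng.choice(degrees, p=probs)
        idx = rng.choice(k_w, size=min(d, k_w), replace=False)
        payload = np.bitwise_xor.reduce(data[idx], axis=0)
        return payload, idx                               # encoded symbol and its neighbours

    rng = np.random.default_rng(0)
    data = rng.integers(0, 256, size=(1000, 32), dtype=np.uint8)  # importance-ordered input symbols
    windows = [100, 1000]                        # most-important prefix, then all symbols
    window_probs = [0.3, 0.7]                    # assumed window selection distribution
    degree_dists = [((1, 2, 3), (0.2, 0.5, 0.3))] * 2             # toy degree distributions
    payload, neighbours = ewf_encode_symbol(data, windows, window_probs, degree_dists, rng)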
    
  94. D. Vukobratovic, V. Stankovic, D. Sejdinovic, L. Stankovic, and Z. Xiong, Scalable Video Multicast Using Expanding Window Fountain Codes, IEEE Transactions on Multimedia, vol. 11, no. 6, 1094–1104, 2009.

    Fountain codes were introduced as an efficient and universal forward error correction (FEC) solution for data multicast over lossy packet networks. They have recently been proposed for large scale multimedia content delivery in practical multimedia distribution systems. However, standard fountain codes, such as LT or Raptor codes, are not designed to meet unequal error protection (UEP) requirements typical in real-time scalable video multicast applications. In this paper, we propose recently introduced UEP expanding window fountain (EWF) codes as a flexible and efficient solution for real-time scalable video multicast. We demonstrate that the design flexibility and UEP performance make EWF codes ideally suited for this scenario, i.e., EWF codes offer a number of design parameters to be tuned at the server side to meet the different reception criteria of heterogeneous receivers. The performance analysis using both analytical results and simulation experiments of H.264 scalable video coding (SVC) multicast to heterogeneous receiver classes confirms the flexibility and efficiency of the proposed EWF-based FEC solution.

    @article{VukStaSejStaXio2009,
      title = {{{Scalable Video Multicast Using Expanding Window Fountain Codes}}},
      author = {Vukobratovic, D. and Stankovic, V. and Sejdinovic, D. and Stankovic, L. and Xiong, Z.},
      journal = {IEEE Transactions on Multimedia},
      volume = {11},
      number = {6},
      pages = {1094--1104},
      year = {2009},
      doi = {10.1109/TMM.2009.2026087},
      url = {http://dx.doi.org/10.1109/TMM.2009.2026087},
      file = {pdf/2009TransMultimedia.pdf}
    }
    
  95. D. Sejdinovic, R. Piechocki, A. Doufexi, and M. Ismail, Fountain Code Design for Data Multicast with Side Information, IEEE Transactions on Wireless Communications, vol. 8, no. 10, 5155–5165, 2009.

    Fountain codes are a robust solution for data multicasting to a large number of receivers which experience variable channel conditions and different packet loss rates. However, the standard fountain code design becomes inefficient if all receivers have access to some side information correlated with the source information. We focus our attention on the cases where the correlation of the source and side information can be modelled by a binary erasure channel (BEC) or by a binary input additive white Gaussian noise channel (BIAWGNC). We analyse the performance of fountain codes in data multicasting with side information for these cases, derive bounds on their performance and provide a fast and robust linear programming optimization framework for code parameters. We demonstrate that systematic Raptor code design can be employed as a possible solution to the problem at the cost of higher encoding/decoding complexity, as it reduces the side information scenario to a channel coding problem. However, our results also indicate that a simpler solution, non-systematic LT and Raptor codes, can be designed to perform close to the information theoretic bounds.

    @article{SejPieDouIsm2009,
      title = {{{Fountain Code Design for Data Multicast with Side Information}}},
      author = {Sejdinovic, D. and Piechocki, R. and Doufexi, A. and Ismail, M.},
      journal = {IEEE Transactions on Wireless Communications},
      volume = {8},
      number = {10},
      pages = {5155--5165},
      year = {2009},
      publisher = {IEEE},
      doi = {10.1109/TWC.2009.081076},
      url = {http://dx.doi.org/10.1109/TWC.2009.081076},
      file = {pdf/2009TransWirelessComm.pdf}
    }
    
  96. D. Sejdinovic, R. J. Piechocki, and A. Doufexi, AND-OR Tree Analysis of Distributed LT Codes, in IEEE Information Theory Workshop (ITW), 2009, 261–265.

    In this contribution, we consider the design of distributed LT codes, i.e., independent rateless encodings of multiple sources which communicate to a common relay, where the relay is able to combine incoming packets from the sources and forward them to the receivers. We provide density evolution formulae for distributed LT codes, which allow us to formulate the distributed LT code design problem and prove the equivalence of performance of distributed LT codes and LT codes with related parameters in the asymptotic regime. Furthermore, we demonstrate that allowing LT coding apparatus at both the sources and the relay may prove advantageous compared to coding only at the sources or only at the relay.

    @inproceedings{SejPieDou2009ITW,
      title = {{{AND-OR Tree Analysis of Distributed LT Codes}}},
      author = {Sejdinovic, D. and Piechocki, R.J. and Doufexi, A.},
      booktitle = {IEEE Information Theory Workshop (ITW)},
      pages = {261--265},
      year = {2009},
      doi = {10.1109/ITWNIT.2009.5158583},
      url = {http://dx.doi.org/10.1109/ITWNIT.2009.5158583},
      file = {pdf/2009ITW.pdf}
    }
    
  97. D. Vukobratovic, V. Stankovic, L. Stankovic, and D. Sejdinovic, Precoded EWF Codes for Unequal Error Protection of Scalable Video, in International ICST Mobile Multimedia Communications Conference (MOBIMEDIA), 2009.

    Rateless codes are forward error correcting (FEC) codes of linear encoding-decoding complexity and asymptotically capacity-approaching performance over erasure channels with any erasure statistics. They have been recently recognized as a simple and efficient solution for packetized video transmission over networks with packet erasures. However, to adapt the error correcting capabilities of rateless codes to the unequal importance of scalable video, unequal error protection (UEP) rateless codes are proposed as an alternative to standard rateless codes. In this paper, we extend our recent work on UEP rateless codes called Expanding Window Fountain (EWF) codes in order to improve their UEP performance. We investigate the design of precoded EWF codes, where precoding is done using high-rate Low-Density Parity-Check (LDPC) codes, following reasoning similar to that applied in the design of Raptor codes. The obtained results are presented in the context of the UEP error correcting performance of EWF codes and applied to scalable video coded (SVC) transmission over erasure networks.

    @inproceedings{VukStaStaSej2009,
      title = {{{Precoded EWF Codes for Unequal Error Protection of Scalable Video}}},
      author = {Vukobratovic, D. and Stankovic, V. and Stankovic, L. and Sejdinovic, D.},
      booktitle = {International ICST Mobile Multimedia Communications Conference (MOBIMEDIA)},
      year = {2009},
      url = {http://portal.acm.org/citation.cfm?id=1653559},
      doi = {10.4108/ICST.MOBIMEDIA2009.7407},
      file = {pdf/2009MobimediaB.pdf}
    }
    
  98. D. Sejdinovic, R. J. Piechocki, and A. Doufexi, Rateless Distributed Source Code Design, in International ICST Mobile Multimedia Communications Conference (MOBIMEDIA), 2009.

    Over the past decade, rateless codes, i.e., digital fountain codes, have emerged as an efficient and robust solution for reliable data transmission over packet erasure networks and a particularly suitable one for multicasting and broadcasting applications where users may experience variable channel conditions and packet loss rates, such as mobile environments. Luby Transform (LT) and Raptor codes are practical fountain codes with capacity-approaching performance and low computational cost. In addition to their channel coding applications, the use of fountain codes for various kinds of distributed source compression and distributed joint source-channel coding has been extensively studied lately, and with promising results. However, a systematic treatise of the code design and optimization considerations for such non-standard applications of fountain codes is still absent. In this contribution, we overview the main results concerned with rateless codes for distributed source coding and outline several examples of data dissemination protocols where carefully designed fountain codes can provide strikingly simple, yet robust solutions yielding both distributed source coding and channel coding gains.

    @inproceedings{SejPieDou2009,
      title = {{{Rateless Distributed Source Code Design}}},
      author = {Sejdinovic, D. and Piechocki, R.J. and Doufexi, A.},
      booktitle = {International ICST Mobile Multimedia Communications Conference (MOBIMEDIA)},
      year = {2009},
      url = {http://portal.acm.org/citation.cfm?id=1653578},
      doi = {10.4108/ICST.MOBIMEDIA2009.7455},
      file = {pdf/2009MobimediaA.pdf}
    }
    
  99. D. Vukobratovic, V. Stankovic, D. Sejdinovic, L. Stankovic, and Z. Xiong, Expanding Window Fountain Codes for Scalable Video Multicast, in IEEE International Conference on Multimedia and Expo (ICME), 2008, 77–80.

    Digital Fountain (DF) codes have recently been suggested as an efficient forward error correction (FEC) solution for video multicast to heterogeneous receiver classes over lossy packet networks. However, to adapt DF codes to low-delay constraints and varying importance of scalable multimedia content, unequal error protection (UEP) DF schemes are needed. Thus, in this paper, Expanding Window Fountain (EWF) codes are proposed as an FEC solution for scalable video multicast. We demonstrate that the design flexibility and UEP performance make EWF codes ideally suited for this scenario, i.e., EWF codes offer a number of design parameters to be “tuned” at the server side to meet the different reception conditions of heterogeneous receivers. Performance analysis of H.264 Scalable Video Coding (SVC) multicast to heterogeneous receiver classes confirms the flexibility and efficiency of the proposed EWF-based FEC solution.

    @inproceedings{VukStaSejStaXio2008,
      title = {{{Expanding Window Fountain Codes for Scalable Video Multicast}}},
      author = {Vukobratovic, D. and Stankovic, V. and Sejdinovic, D. and Stankovic, L. and Xiong, Z.},
      booktitle = {IEEE International Conference on Multimedia and Expo (ICME)},
      pages = {77--80},
      year = {2008},
      doi = {10.1109/ICME.2008.4607375},
      url = {http://dx.doi.org/10.1109/ICME.2008.4607375}
    }
    
  100. D. Sejdinovic, R. J. Piechocki, A. Doufexi, and M. Ismail, Fountain Coding with Decoder Side Information, in IEEE International Conference on Communications (ICC), 2008, 4477–4482.

    In this contribution, we consider the application of digital fountain (DF) codes to the problem of data transmission when side information is available at the decoder. The side information is modelled as a "virtual" channel output when the original information sequence is the input. For two cases of the system model, which model both the virtual and the actual transmission channel either as a binary erasure channel or as a binary input additive white Gaussian noise (BIAWGN) channel, we propose methods of enhancing the design of standard non-systematic DF codes by optimizing their output degree distribution based on the side information assumption. In addition, a systematic Raptor design has been employed as a possible solution to the problem.

    @inproceedings{SejPieDouIsm2008ICC,
      title = {{{Fountain Coding with Decoder Side Information}}},
      author = {Sejdinovic, D. and Piechocki, R.J. and Doufexi, A. and Ismail, M.},
      booktitle = {IEEE International Conference on Communications (ICC)},
      pages = {4477--4482},
      year = {2008},
      doi = {10.1109/ICC.2008.840},
      url = {http://dx.doi.org/10.1109/ICC.2008.840}
    }
    
  101. D. Sejdinovic, V. Ponnampalam, R. J. Piechocki, and A. Doufexi, The Throughput Analysis of Different IR-HARQ Schemes based on Fountain Codes, in IEEE Wireless Communications and Networking Conference (WCNC), 2008, 267–272.

    In this contribution, we construct two novel IR-HARQ (incremental redundancy hybrid automatic repeat request) schemes based on fountain codes, which combine the punctured and rateless IR-HARQ schemes in order to attain the advantageous properties of both: the nearly optimal performance of the former in the high signal-to-noise ratio (SNR) region and the ratelessness of the latter. The preliminary simulation results indicate that these schemes are particularly suitable for scenarios where the transmission is originally assumed to occur in the very high SNR region, but resilience to severe deterioration of channel conditions is required.

    @inproceedings{SejPonPieDou2008,
      title = {{{The Throughput Analysis of Different IR-HARQ Schemes based on Fountain Codes}}},
      author = {Sejdinovic, D. and Ponnampalam, V. and Piechocki, R.J. and Doufexi, A.},
      booktitle = {IEEE Wireless Communications and Networking Conference (WCNC)},
      pages = {267--272},
      year = {2008},
      doi = {10.1109/WCNC.2008.52},
      url = {http://dx.doi.org/10.1109/WCNC.2008.52},
      file = {pdf/2008WCNC.pdf}
    }
    
  102. D. Sejdinovic, R. J. Piechocki, A. Doufexi, and M. Ismail, Rate Adaptive Binary Erasure Quantization with Dual Fountain Codes, in IEEE Global Telecommunications Conference (GLOBECOM), 2008.

    In this contribution, duals of fountain codes are introduced and their use for lossy source compression is investigated. It is shown both theoretically and experimentally that the source coding dual of the binary erasure channel coding problem, binary erasure quantization, is solved at a nearly optimal rate with application of duals of LT and Raptor codes by a belief propagation-like algorithm which amounts to a graph pruning procedure. Furthermore, this quantizing scheme is rate adaptive, i.e., its rate can be modified on-the-fly in order to adapt to the source distribution, very much like LT and Raptor codes are able to adapt their rate to the erasure probability of a channel.

    @inproceedings{SejPieDouIsm2008,
      title = {{{Rate Adaptive Binary Erasure Quantization with Dual Fountain Codes}}},
      author = {Sejdinovic, D. and Piechocki, R.J. and Doufexi, A. and Ismail, M.},
      booktitle = {IEEE Global Telecommunications Conference (GLOBECOM)},
      year = {2008},
      doi = {10.1109/GLOCOM.2008.ECP.238},
      url = {http://dx.doi.org/10.1109/GLOCOM.2008.ECP.238},
      file = {pdf/2008Globecom.pdf}
    }
    
  103. D. Vukobratovic, V. Stankovic, D. Sejdinovic, L. Stankovic, and Z. Xiong, Scalable Data Multicast Using Expanding Window Fountain Codes, in 45th Annual Allerton Conference on Communication, Control, and Computing, 2007.

    Digital Fountain (DF) codes were introduced as an efficient and universal Forward Error Correction (FEC) solution for data multicast over lossy packet networks. However, in real-time applications, the DF encoder cannot make use of the “rateless” property as it was proposed in the DF framework, due to its delay constraints. In this scenario, many receivers might not be able to collect enough encoded symbols (packets) to perform successful decoding of the source data block (e.g., they are connected as low bit-rate receivers to a high bit-rate source stream, or they are affected by severe channel conditions). This paper proposes an application of recently introduced Expanding Window Fountain (EWF) codes as a scalable and efficient solution for real-time multicast data transmission. We show that, by carefully optimizing EWF code design parameters, it is possible to design a flexible DF solution that is capable of satisfying multicast data receivers over a wide range of data rates and/or erasure channel conditions.

    @inproceedings{VukStaSejStaXio2007,
      title = {{{Scalable Data Multicast Using Expanding Window Fountain Codes}}},
      author = {Vukobratovic, D. and Stankovic, V. and Sejdinovic, D. and Stankovic, L. and Xiong, Z.},
      booktitle = {45th Annual Allerton Conference on Communication, Control, and Computing},
      year = {2007},
      file = {pdf/2007Allerton.pdf}
    }
    
  104. D. Sejdinovic, D. Vukobratovic, A. Doufexi, V. Senk, and R. Piechocki, Expanding Window Fountain Codes for Unequal Error Protection, in Asilomar Conference on Signals, Systems and Computers, 2007, 1020–1024.

    A novel approach to provide unequal error protection (UEP) using rateless codes over erasure channels, named Expanding Window Fountain (EWF) codes, is developed and discussed. EWF codes use a windowing technique rather than a weighted (non-uniform) selection of input symbols to achieve UEP property. The windowing approach introduces additional parameters in the UEP rateless code design, making it more general and flexible than the weighted approach. Furthermore, the windowing approach provides better performance of UEP scheme, which is confirmed both theoretically and experimentally.

    @inproceedings{SejVukDouSenPie2007,
      title = {{{Expanding Window Fountain Codes for Unequal Error Protection}}},
      author = {Sejdinovic, D. and Vukobratovic, D. and Doufexi, A. and Senk, V. and Piechocki, R.},
      booktitle = {Asilomar Conference on Signals, Systems and Computers},
      doi = {10.1109/ACSSC.2007.4487375},
      url = {http://dx.doi.org/10.1109/ACSSC.2007.4487375},
      year = {2007},
      pages = {1020--1024},
      file = {pdf/2007Asilomar.pdf}
    }
    

Patents

  1. R. J. Piechocki and D. Sejdinovic, Communication System, Method and Apparatus. US Patent 8,879,664, Nov. 2014.
    @misc{PieSej_patent,
      author = {Piechocki, R.J. and Sejdinovic, D.},
      title = {Communication System, Method and Apparatus},
      year = {2014},
      month = nov,
      location = {US},
      filing_num = {10684114},
      yearfiled = {2012},
      monthfiled = {08},
      howpublished = {US Patent 8,879,664}
    }
    

Theses

  1. D. Sejdinovic, Topics in Fountain Coding, PhD dissertation, University of Bristol, 2009.

    The invention of the sparse graph codes, error correction codes with low complexity and rates close to capacity, has had an unrivaled impact on digital communication systems. A recent advance in the sparse graph codes, fountain coding, due to its natural rate adaptivity, is becoming an error correction coding scheme of choice for many multicasting and broadcasting systems. This thesis studies the use of fountain codes for several non-standard coding problems commonly occurring in communications. Generic decentralised distributed fountain coding schemes for networked communications are developed, discussed and analysed, where many non-cooperating source nodes communicate possibly correlated data to a large number of receivers. Several results concerning the generalised asymptotic analysis of the fountain decoder in this decentralised and distributed coding setting are presented. The problem of fountain codes with the unequal error protection property is explored, where a novel class of fountain codes, Expanding Window Fountain (EWF) codes, is proposed, analysed and shown to offer competitive performance applicable to scalable video multicasting. Further, asymptotic analysis, code design and optimisation are derived for both symmetric and asymmetric Slepian-Wolf coding with fountain codes. It is shown how one can obtain both channel coding and distributed source coding gains with the same fountain coding scheme, by a judicious choice of the code parameters. The developed methods of asymptotic analysis are extended to the problem of independent fountain encodings at multiple source nodes which communicate to a common relay. It is shown that the re-encoding of the multiple fountain encoded bitstreams at the relay node with another fountain code may reduce the number of required transmissions, and the overall code optimisation methods of such schemes are derived. Finally, dual fountain codes are introduced and equipped with a low complexity quantisation algorithm for a lossy source coding problem dual to binary erasure channel coding.

    @thesis{Sej2009,
      title = {{{Topics in Fountain Coding}}},
      author = {Sejdinovic, Dino},
      year = {2009},
      school = {University of Bristol},
      file = {pdf/PhD_TopicsInFountainCoding.pdf},
      type = {PhD dissertation},
      keywords = {thesis}
    }
    
  2. D. Sejdinovic, Neka svojstva Zeta funkcija (Some Properties of Zeta Functions), DiplMath-Inf dissertation, University of Sarajevo, 2006.
    @thesis{Sej2006thesis,
      title = {{{Neka svojstva Zeta funkcija}}},
      author = {Sejdinovic, Dino},
      year = {2006},
      school = {University of Sarajevo},
      type = {DiplMath-Inf dissertation},
      keywords = {thesis}
    }
    

Selected Non-Archival Papers and Abstracts

  1. S. Bouabid, D. Sejdinovic, and D. Watson-Parris, Probabilistic climate emulation with physics-constrained Gaussian processes, in EGU General Assembly, 2023, EGU23–15660.

    Emulators, or reduced complexity climate models, are surrogate Earth system models that produce projections of key climate quantities with minimal computational resources. Using time-series modeling or more advanced machine learning techniques, statistically-driven emulators have emerged as a promising avenue, producing spatially resolved climate responses that are visually indistinguishable from state-of-the-art Earth system models. Yet, their lack of physical interpretability limits their wider adoption. In fact, the exploration of future emission scenarios still relies on one-dimensional energy balance models which can lack the flexibility to capture certain climate responses. Here, we propose a statistically-driven emulator that hinges on an energy balance model. Using Gaussian processes, we formulate an emulator of temperature response that exactly satisfies the physical thermal response equations of an energy balance model. The result is an emulator that (1) enjoys the flexibility of statistical machine learning models and can be fitted against observations to produce spatially-resolved predictions, and (2) has physically-interpretable parameters that can be used to make inference about the climate system. This model shows skilful prediction of the annual mean global distribution of temperature, even over scenarios outside the range of observed emissions of greenhouse gases and aerosols during training, with an improvement in RMSE of 27% against the energy balance model and of 60% against a physics-free machine learning model. In addition, the Bayesian nature of our formulation provides a principled way to perform uncertainty quantification on the predictions. We outline how the probabilistic nature of this emulator makes it a natural candidate for detection and attribution studies and discuss an extension to a precipitation emulator also incorporating physical constraints. We hope that by combining the ability of machine learning techniques to capture complex patterns with the interpretability of energy balance models, our work will produce a reliable tool for efficiently and accurately exploring future climates.

    @inproceedings{BouSejWat2023egu,
      author = {Bouabid, S. and Sejdinovic, D. and Watson-Parris, D.},
      title = {{{Probabilistic climate emulation with physics-constrained Gaussian processes}}},
      booktitle = {EGU General Assembly},
      year = {2023},
      pages = {EGU23--15660},
      keywords = {nonarch},
      url = {https://meetingorganizer.copernicus.org/EGU23/EGU23-15660.html}
    }
    
  2. J. Lenhardt, J. Quaas, and D. Sejdinovic, From MODIS cloud properties to cloud types using semi-supervised learning, in EGU General Assembly, 2023, EGU23–13250.

    Clouds are classified into types, classes, or regimes. The World Meteorological Organization distinguishes stratus and cumulus clouds and three altitude layers. Cloud types exhibit very different radiative properties and interact in numerous ways with aerosol particles in the atmosphere. However, it has proven difficult to define cloud regimes objectively and from remote sensing data, hindering the understanding we have of the processes and adjustments involved. Building on the method we previously developed, we combine synoptic observations and passive satellite remote-sensing retrievals to constitute a database of cloud types and cloud properties to eventually train a cloud classification algorithm. The cloud type labels come from the global marine meteorological observations dataset (UK Met Office, 2006), which comprises near-global synoptic observations. This data record reports information about cloud type and other meteorological quantities at the surface. The cloud classification model is built on different cloud-top and cloud optical properties (Level 2 products MOD06/MYD06 from the MODIS sensor) extracted temporally close to the observation time and on a 128km x 128km grid around the synoptic observation location. To make full use of the large quantity of remote sensing data available and to investigate the variety in cloud settings, a convolutional variational auto-encoder (VAE) is applied as a dimensionality reduction tool in a first step. Furthermore, such a model architecture makes it possible to account for spatial relationships while describing non-linear patterns in the input data. The cloud classification task is subsequently performed by drawing on the constructed latent representation of the VAE. Associating information from underneath and above the cloud enables building a robust model to classify cloud types. For training, we specify a study domain in the Atlantic Ocean around the equator and evaluate the method globally. Further experiments and evaluation are done on simulation data produced by the ICON model.

    @inproceedings{LenQuaSej2023egu,
      author = {Lenhardt, J. and Quaas, J. and Sejdinovic, D.},
      title = {{{From MODIS cloud properties to cloud types using semi-supervised learning}}},
      booktitle = {EGU General Assembly},
      year = {2023},
      pages = {EGU23--13250},
      keywords = {nonarch},
      url = {https://meetingorganizer.copernicus.org/EGU23/EGU23-13250.html}
    }
    
  3. S. Bouabid, D. Watson-Parris, and D. Sejdinovic, Bayesian Inference for Aerosol Vertical Profiles, in NeurIPS 2022 Workshop on Tackling Climate Change with Machine Learning, 2022.

    Aerosol-cloud interactions constitute the largest source of uncertainty in assessments of the anthropogenic climate change. This uncertainty arises in part from the difficulty in measuring the vertical distributions of aerosols. We often have to settle for less informative vertically aggregated proxies such as aerosol optical depth (AOD). In this work, we develop a framework to infer vertical aerosol profiles using AOD and readily available vertically resolved meteorological predictors such as temperature or relative humidity. We devise a simple Gaussian process prior over aerosol vertical profiles and update it with AOD observations. We validate our approach using ECHAM-HAM aerosol-climate model data. Our results show that, while simple, our model is able to reconstruct realistic extinction profiles with well-calibrated uncertainty. In particular, the model demonstrates a faithful reconstruction of extinction patterns arising from aerosol water uptake in the boundary layer.

    @inproceedings{BouWatSej2022w,
      title = {{Bayesian Inference for Aerosol Vertical Profiles}},
      author = {Bouabid, Shahine and Watson-Parris, Duncan and Sejdinovic, Dino},
      booktitle = {NeurIPS 2022 Workshop on Tackling Climate Change with Machine Learning},
      url = {https://www.climatechange.ai/papers/neurips2022/4},
      keywords = {nonarch},
      year = {2022}
    }
    
  4. J. Lenhardt, J. Quaas, and D. Sejdinovic, Combining cloud properties and synoptic observations to predict cloud base height using machine learning, in EGU General Assembly, 2022, EGU22–7355.

    Cloud base height (CBH) is an important geometric parameter of a cloud and shapes its radiative properties. The CBH is also of practical interest in the aviation community regarding pilot visibility and aircraft icing hazards. While the cloud-top height has been successfully derived from passive imaging radiometers on satellites during recent years, the derivation of the CBH remains a more difficult challenge with these same retrievals. In our study we combine surface observations and passive satellite remote-sensing retrievals to create a database of CBH labels and cloud properties to ultimately train a machine learning model predicting CBH. The labels come from the global marine meteorological observations dataset (UK Met Office, 2006), which consists of near-global synoptic observations made at sea. This data set provides information about CBH, cloud type, cloud cover and other meteorological surface quantities, with CBH being the main interest here. The features on which the machine learning model is trained consist of different cloud-top and cloud optical properties (Level 2 products MOD06/MYD06 from the MODIS sensor) extracted on a 127km x 127km grid around the synoptic observation point. To study the large diversity in cloud scenes, an auto-encoder architecture is chosen. The regression task is then carried out in the modelled latent space which is output by the encoder part of the model. To account for the spatial relationships in our input data, the model architecture is based on Convolutional Neural Networks. We define a study domain in the Atlantic Ocean, around the equator. The combination of information from below and above the cloud could allow us to build a robust model to predict CBH and then extend predictions to regions where surface measurements are not available.

    @inproceedings{LenQuaSej2022egu,
      author = {Lenhardt, J. and Quaas, J. and Sejdinovic, D.},
      title = {{{Combining cloud properties and synoptic observations to predict cloud base height using machine learning}}},
      booktitle = {EGU General Assembly},
      year = {2022},
      pages = {EGU22--7355},
      keywords = {nonarch},
      doi = {10.5194/egusphere-egu22-7355},
      url = {https://doi.org/10.5194/egusphere-egu22-7355}
    }
    
  5. S. Stefanovic, S. Bouabid, P. Stier, A. Nenes, and D. Sejdinovic, Reconstructing Aerosols Vertical Profiles with Aggregate Output Learning, in ICML 2021 Workshop on Tackling Climate Change with Machine Learning, 2021.

    Aerosol-cloud interactions constitute the largest source of uncertainty in assessments of anthropogenic climate change. This uncertainty arises in part from the inability to observe aerosol amounts at the cloud formation levels, and, more broadly, the vertical distribution of aerosols. Hence, we often have to settle for less informative two-dimensional proxies, i.e. vertically aggregated data. In this work, we formulate the problem of disaggregation of vertical profiles of aerosols. We propose some initial solutions for such an aggregate output regression problem and demonstrate their potential on climate model data.

    @inproceedings{SteBouStiNenSej2021w,
      author = {Stefanovic, Sofija and Bouabid, Shahine and Stier, Philip and Nenes, Athanasios and Sejdinovic, Dino},
      title = {{{Reconstructing Aerosols Vertical Profiles with Aggregate Output Learning}}},
      booktitle = {ICML 2021 Workshop on Tackling Climate Change with Machine Learning},
      year = {2021},
      arxiv = {https://doi.org/10.31223/X5QW5S},
      keywords = {nonarch}
    }
    
  6. A. Caterini, R. Cornish, D. Sejdinovic, and A. Doucet, Variational Inference with Continuously-Indexed Normalizing Flows, in ICML 2020 Workshop on Invertible Neural Networks, Normalizing Flows, and Explicit Likelihood Models, 2020.

    Continuously-indexed flows (CIFs) have recently achieved improvements over baseline normalizing flows in a variety of density estimation tasks. In this paper, we adapt CIFs to the task of variational inference (VI) through the framework of auxiliary VI, and demonstrate that the advantages of CIFs over baseline flows can also translate to the VI setting for both sampling from posteriors with complicated topology and performing maximum likelihood estimation in latent-variable models.

    @inproceedings{CatCorSejDou2020w,
      author = {Caterini, Anthony and Cornish, Rob and Sejdinovic, Dino and Doucet, Arnaud},
      title = {{{Variational Inference with Continuously-Indexed Normalizing Flows}}},
      booktitle = {ICML 2020 Workshop on Invertible Neural Networks, Normalizing Flows, and Explicit Likelihood Models},
      year = {2020},
      arxiv = {https://arxiv.org/abs/2007.05426},
      keywords = {nonarch}
    }
    
  7. D. Watson-Parris, S. Sutherland, M. Christensen, A. Caterini, D. Sejdinovic, and P. Stier, A Large-Scale Analysis of Pockets of Open Cells Enabled by Deep Learning, in American Geophysical Union Fall Meeting Abstracts, 2019, A11L–2769.

    Pockets of Open Cells (POCs) have been a source of interest since they were first described 15 years ago (Bretherton et al., 2004) due to their complex nature (Glassmeier and Feingold 2017) and the importance of stratocumulus decks on the global climate. Indeed, it has been proposed that, by suppressing precipitation, anthropogenic aerosol could significantly reduce the occurrence of POCs and, through the increased cloud fraction, provide a large cooling effect on the Earth (Rosenfeld et al., 2006). To date however, no large-scale analysis of their properties or spatial and temporal prevalence has been performed. Machine learning is transforming many areas of science by providing new tools to analyse and understand the huge volumes of data that observations and models can provide. Climate science, with its wealth of data, is at the cusp of a similar transformation. One particular area where machine learning has made rapid progress is object detection. Robust techniques are now available which can quickly identify objects in images without any need for the thresholding or edge detection which have been used in the past and often struggle with inhomogeneous features. Using a deep convolutional neural network trained on a small hand-logged dataset, we have created a unique and comprehensive dataset of POCs from across the Californian, Peruvian and Namibian stratocumulus decks, spanning the whole lifetime of the MODIS mission. We have detected and analysed 8,491 POCs, quantifying their spatial and temporal distributions, as well as investigating their microphysical properties. POCs show a large, and remarkably consistent, difference in droplet effective radius compared to the surrounding clouds, but negligible difference in liquid water path. Further, we find that the global radiative effect of POCs, and hence the maximum forcing through any aerosol perturbation, is approximately 20 mW m-2. Therefore, the potential for strong anthropogenic perturbations appears small.

    @inproceedings{Watetal2019b,
      author = {Watson-Parris, Duncan and Sutherland, Sam and Christensen, Matthew and Caterini, Anthony and Sejdinovic, Dino and Stier, Philip},
      title = {{{A Large-Scale Analysis of Pockets of Open Cells Enabled by Deep Learning}}},
      booktitle = {American Geophysical Union Fall Meeting Abstracts},
      year = {2019},
      pages = {A11L-2769},
      keywords = {nonarch}
    }
    
  8. D. Watson-Parris, S. Sutherland, M. Christensen, A. Caterini, D. Sejdinovic, and P. Stier, Detecting Anthropogenic Cloud Perturbations with Deep Learning, in ICML 2019 Workshop on Climate Change: How Can AI Help?, 2019.

    One of the most pressing questions in climate science is that of the effect of anthropogenic aerosol on the Earth’s energy balance. Aerosols provide the ‘seeds’ on which cloud droplets form, and changes in the amount of aerosol available to a cloud can change its brightness and other physical properties such as optical thickness and spatial extent. Clouds play a critical role in moderating global temperatures and small perturbations can lead to significant amounts of cooling or warming. Uncertainty in this effect is so large it is not currently known if it is negligible, or provides a large enough cooling to largely negate present-day warming by CO2. This work uses deep convolutional neural networks to look for two particular perturbations in clouds due to anthropogenic aerosol and assess their properties and prevalence, providing valuable insights into their climatic effects.

    @inproceedings{Watetal2019a,
      author = {Watson-Parris, Duncan and Sutherland, Sam and Christensen, Matthew and Caterini, Anthony and Sejdinovic, Dino and Stier, Philip},
      title = {{{Detecting Anthropogenic Cloud Perturbations with Deep Learning}}},
      booktitle = {ICML 2019 Workshop on Climate Change: How Can AI Help?},
      arxiv = {https://arxiv.org/abs/1911.13061},
      url = {https://www.climatechange.ai/papers/icml2019/21},
      video = {https://slideslive.com/38917855/detecting-anthropogenic-cloud-perturbations-with-deep-learning},
      year = {2019},
      keywords = {nonarch}
    }
    
  9. S. Cohen and D. Sejdinovic, On the Gromov-Wasserstein Distance and Coupled Deep Generative Models, in NeurIPS 2019 Workshop on Optimal Transport & Machine Learning, 2019.

    Recent advances in optimal transport have made it possible to reformulate the problem as an adversarial optimization, which enables the computation of Wasserstein distances and the training of coupled deep generative models. However, designing a sensible cost across potentially incomparable spaces is extremely challenging. The Gromov-Wasserstein approach instead considers the intra-relational geometries of the compared measures, which alleviates the incomparability issue. In most previous works, the Gromov cost function is a Euclidean metric measuring the discrepancy between pairwise distances in the different spaces, which is highly sensitive to the scale of the different metric spaces. We thus propose the m-Gromov-Wasserstein distance, which enables the introduction of the Hilbert-Schmidt independence criterion (HSIC) as a cost function. We show that this formulation is trivially extendable to the task of learning multiple couplings, relies on dependence instead of distance, and has Gromov-Wasserstein as a special case. We then devise a scalable algorithm for computing this distance based on coupled Wasserstein GANs for general choices of cost function and apply it to learning couplings of multiple deep generative models across spaces that are incomparable both dimensionally and intrinsically. We then show that the classic Gromov-Wasserstein approaches may suffer from symmetries within individual metric spaces, and we devise a semi-supervised algorithm to break the symmetries.

    @inproceedings{CohSej2019w,
      author = {Cohen, Samuel and Sejdinovic, Dino},
      title = {{{On the Gromov-Wasserstein Distance and Coupled Deep Generative Models}}},
      booktitle = {NeurIPS 2019 Workshop on Optimal Transport \& Machine Learning},
      year = {2019},
      keywords = {nonarch}
    }
    
  10. V. Nguyen, D. T. Lennon, H. Moon, N. M. van Esbroeck, D. Sejdinovic, M. A. Osborne, G. A. D. Briggs, and N. Ares, Controlling Quantum Dot Devices using Deep Reinforcement Learning, in NeurIPS 2019 Workshop on Deep Reinforcement Learning, 2019.

    A robust qubit implementation will form the building block of a quantum computer. Such an implementation is typically realised in a physical quantum device by controlling the electrostatic potential of quantum dots. However, controlling these quantum dot devices can be challenging due to unavoidable device variability. In this paper, we develop an elegant application of deep reinforcement learning for controlling quantum dot devices. Specifically, we present a computer-automated algorithm that controls and sets voltages to the gate electrodes of a gate-defined semiconductor double quantum dot. Our approach requires no human intervention and reduces the number of measurements needed. This work alleviates the user effort required to control multiple quantum dot devices, each with multiple gate electrodes.

    @inproceedings{Nguyenetal2019w,
      author = {Nguyen, Vu and Lennon, Dominic T. and Moon, Hyungil and van Esbroeck, Nina M. and Sejdinovic, Dino and Osborne, Michael A. and Briggs, G. Andrew D. and Ares, Natalia},
      title = {{{Controlling Quantum Dot Devices using Deep Reinforcement Learning}}},
      booktitle = {NeurIPS 2019 Workshop on Deep Reinforcement Learning},
      year = {2019},
      keywords = {nonarch}
    }
    
  11. J.-F. Ton, L. Chan, Y. W. Teh, and D. Sejdinovic, Noise Contrastive Meta-Learning for Conditional Density Estimation using Kernel Mean Embeddings, in NeurIPS 2019 Workshop on Meta Learning, 2019.

    Current meta-learning approaches focus on learning functional representations of relationships between variables, i.e. estimating conditional expectations in regression. In many applications, however, the conditional distributions cannot be meaningfully summarized solely by expectation (due to e.g. multimodality). We introduce a novel technique for meta-learning conditional densities, which combines neural representation and noise contrastive estimation together with established literature in conditional mean embeddings into reproducing kernel Hilbert spaces. The method is validated on synthetic and real-world data, demonstrating the utility of sharing learned representations across multiple conditional density estimation tasks.

    @inproceedings{Tonetal2019w,
      author = {Ton, Jean-Francois and Chan, Leung and Teh, Yee Whye and Sejdinovic, Dino},
      title = {{{Noise Contrastive Meta-Learning for Conditional Density Estimation using Kernel Mean Embeddings}}},
      booktitle = {NeurIPS 2019 Workshop on Meta Learning},
      arxiv = {https://arxiv.org/abs/1906.02236},
      year = {2019},
      keywords = {nonarch}
    }
    
  12. H. C. L. Law, P. Zhao, J. Huang, and D. Sejdinovic, Hyperparameter Learning via Distributional Transfer, in NeurIPS 2018 Workshop on Meta Learning, 2018.

    Bayesian optimisation is a popular technique for hyperparameter learning but typically requires initial ‘exploration’ even in cases where potentially similar prior tasks have been solved. We propose to transfer information across tasks using kernel embeddings of distributions of training datasets used in those tasks. The resulting method has a faster convergence compared to existing baselines, in some cases requiring only a few evaluations of the target objective.

    @inproceedings{LawZhaHuaSej2018w,
      author = {Law, Ho Chung Leon and Zhao, Peilin and Huang, Junzhou and Sejdinovic, Dino},
      title = {{{Hyperparameter Learning via Distributional Transfer}}},
      booktitle = {NeurIPS 2018 Workshop on Meta Learning},
      arxiv = {https://arxiv.org/abs/1810.06305},
      year = {2018},
      keywords = {nonarch}
    }
    
  13. J. Mitrovic, P. Wirnsberger, C. Blundell, D. Sejdinovic, and Y. W. Teh, Infinitely Deep Infinite-Width Networks, in NeurIPS 2018 Workshop on Bayesian Deep Learning, 2018.

    Infinite-width neural networks have been extensively used to study the theoretical properties underlying the extraordinary empirical success of standard, finite-width neural networks. Nevertheless, until now, infinite-width networks have been limited to at most two hidden layers. To address this shortcoming, we study the initialisation requirements of these networks and show that the main challenge for constructing them is defining the appropriate sampling distributions for the weights. Based on these observations, we propose a principled approach to weight initialisation that correctly accounts for the functional nature of the hidden layer activations and facilitates the construction of arbitrarily many infinite-width layers, thus enabling the construction of arbitrarily deep infinite-width networks. The main idea of our approach is to iteratively reparametrise the hidden-layer activations into appropriately defined reproducing kernel Hilbert spaces and use the canonical way of constructing probability distributions over these spaces for specifying the required weight distributions in a principled way. Furthermore, we examine the practical implications of this construction for standard, finite-width networks. In particular, we derive a novel weight initialisation scheme for standard, finite-width networks that takes into account the structure of the data and information about the task at hand. We demonstrate the effectiveness of this weight initialisation approach on the MNIST, CIFAR-10 and Year Prediction MSD datasets.

    @inproceedings{Mitetal2018w,
      author = {Mitrovic, Jovana and Wirnsberger, Peter and Blundell, Charles and Sejdinovic, Dino and Teh, Yee Whye},
      title = {{{Infinitely Deep Infinite-Width Networks}}},
      booktitle = {NeurIPS 2018 Workshop on Bayesian Deep Learning},
      year = {2018},
      keywords = {nonarch}
    }
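
    For orientation only: the sketch below reproduces the standard infinite-width Gaussian-process kernel recursion for fully connected ReLU networks (the arc-cosine recursion), i.e. the background object that constructions of deep infinite-width networks build on. It is not the RKHS-based weight-initialisation procedure proposed above, and the depth and variance parameters are illustrative assumptions.

      import numpy as np

      def nngp_relu_kernel(X, depth=3, sigma_w2=2.0, sigma_b2=0.0):
          # Kernel of the Gaussian process obtained from a fully connected ReLU
          # network when every hidden layer's width goes to infinity.
          n, d = X.shape
          K = sigma_b2 + sigma_w2 * (X @ X.T) / d   # input-layer covariance
          for _ in range(depth):
              diag = np.sqrt(np.clip(np.diag(K), 1e-12, None))
              corr = np.clip(K / np.outer(diag, diag), -1.0, 1.0)
              theta = np.arccos(corr)
              # E[relu(u) relu(v)] for (u, v) jointly Gaussian with covariance K
              E = (np.outer(diag, diag) / (2 * np.pi)) * (
                  np.sin(theta) + (np.pi - theta) * np.cos(theta))
              K = sigma_b2 + sigma_w2 * E
          return K

      X = np.random.default_rng(1).normal(size=(5, 10))
      print(np.round(nngp_relu_kernel(X, depth=10), 3))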
    
  14. T. G. J. Rudner and D. Sejdinovic, Inter-Domain Deep Gaussian Processes, in NeurIPS 2017 Workshop on Bayesian Deep Learning, 2017.

    We propose a novel variational inference method for deep Gaussian processes (GPs), which combines doubly stochastic variational inference with variational Fourier features, an inter-domain approach that replaces inducing points-based inference with a framework that harnesses RKHS Fourier features. First experiments have shown that inter-domain deep Gaussian processes are able to achieve levels of predictive performance superior to shallow GPs and alternative deep GP models.

    @inproceedings{RudSej2017w,
      author = {Rudner, Tim G. J. and Sejdinovic, Dino},
      title = {{{Inter-Domain Deep Gaussian Processes}}},
      booktitle = {NeurIPS 2017 Workshop on Bayesian Deep Learning},
      year = {2017},
      keywords = {nonarch},
      url = {http://bayesiandeeplearning.org/2017/papers/68.pdf}
    }
    
  15. H. C. L. Law, D. J. Sutherland, D. Sejdinovic, and S. Flaxman, Bayesian Approaches to Distribution Regression, in NeurIPS 2017 Workshop: Learning on Distributions, Functions, Graphs and Groups, 2017.

    Distribution regression has recently attracted much interest as a generic solution to the problem of supervised learning where labels are available at the group level, rather than at the individual level. Current approaches, however, do not propagate the uncertainty in observations due to sampling variability in the groups. This effectively assumes that small and large groups are estimated equally well, and should have equal weight in the final regression. We construct a Bayesian distribution regression formalism that accounts for this uncertainty, improving the robustness and performance of the model when group sizes vary. We can obtain MAP estimates for some models with backpropagation, while the full propagation of uncertainty requires MCMC-based inference. We demonstrate our approach on an illustrative toy dataset as well as a challenging age prediction problem.

    @inproceedings{LawSutSejFla2017w,
      author = {Law, Ho Chung Leon and Sutherland, Danica J. and Sejdinovic, Dino and Flaxman, Seth},
      title = {{{Bayesian Approaches to Distribution Regression}}},
      booktitle = {NeurIPS 2017 Workshop: Learning on Distributions, Functions, Graphs and Groups},
      year = {2017},
      keywords = {nonarch}
    }
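
    The baseline that the Bayesian treatment above builds on maps each group (bag of observations) to an approximate kernel mean embedding and regresses the group-level label on that embedding. The sketch below illustrates this baseline with random Fourier features and a plain ridge fit; it deliberately treats small and large groups identically, which is exactly the limitation the Bayesian formulation addresses. The feature dimension, bandwidths and toy task are assumptions made for this example.

      import numpy as np

      def rff(X, W, b):
          # Random Fourier features approximating an RBF kernel.
          return np.sqrt(2.0 / W.shape[1]) * np.cos(X @ W + b)

      def embed_groups(groups, W, b):
          # Approximate kernel mean embedding of each group: the average of the
          # random-feature maps of its members.
          return np.stack([rff(G, W, b).mean(axis=0) for G in groups])

      rng = np.random.default_rng(0)
      d, D = 2, 300                        # input dimension, number of features
      W = rng.normal(size=(d, D))
      b = rng.uniform(0, 2 * np.pi, size=D)

      # Toy task: each group is a sample from N(mu, I) of random size; the
      # group label is the norm of mu.
      mus = rng.normal(size=(50, d))
      groups = [rng.normal(mu, 1.0, size=(rng.integers(20, 200), d)) for mu in mus]
      y = np.linalg.norm(mus, axis=1)

      Phi = embed_groups(groups, W, b)
      lam = 1e-3
      beta = np.linalg.solve(Phi.T @ Phi + lam * np.eye(D), Phi.T @ y)  # ridge fit
      print(np.round(np.abs(Phi @ beta - y).mean(), 3))                 # train error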
    
  16. J. Mitrovic, D. Sejdinovic, and Y. W. Teh, Causal Inference via Kernel Deviance Measures, in NeurIPS 2017 Workshop on Causal Inference and Machine Learning for Intelligent Decision Making: From ’What If?’ To ’What Next?,’ 2017.

    Identifying causal relationships among a set of variables is a fundamental problem in many areas of science. In this paper, we present a novel general-purpose causal inference method, Kernel Conditional Deviance for Causal Inference (KCDC), for inferring causal relationships from observational data. In particular, we propose a novel interpretation of the well-established notion of asymmetry between cause and effect. Based on this, we derive an asymmetry measure using the framework of representing conditional distributions in reproducing kernel Hilbert spaces thus providing the basis for causal discovery. We demonstrate the versatility and robustness of our method across several synthetic datasets. Furthermore, we test our method on the real-world benchmark dataset Tübingen Cause-Effect Pairs where it outperforms existing state-of-the-art methods.

    @inproceedings{MitSejTeh2017-kcdcworkshop,
      author = {Mitrovic, Jovana and Sejdinovic, Dino and Teh, Yee Whye},
      title = {{{Causal Inference via Kernel Deviance Measures}}},
      booktitle = {NeurIPS 2017 Workshop on Causal Inference and Machine Learning for Intelligent Decision Making: From 'What If?' To 'What Next?'},
      year = {2017},
      keywords = {nonarch}
    }
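
    The abstract above does not spell out an estimator, so the following is only one plausible minimal reading of a kernel conditional deviance score: fit conditional mean embeddings in each direction by kernel ridge regression and compare how much their RKHS norms vary across conditioning points, reporting the direction with the smaller variability as causal. The bandwidths, the ridge estimator and the use of a standard deviation are assumptions made for this sketch.

      import numpy as np

      def rbf(A, B, bw):
          sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
          return np.exp(-sq / (2 * bw**2))

      def deviance(X, Y, bw=1.0, lam=1e-3):
          # Variability, across conditioning points x, of the RKHS norm of the
          # estimated conditional mean embedding of Y given X = x.
          n = len(X)
          Kx, Ky = rbf(X, X, bw), rbf(Y, Y, bw)
          B = np.linalg.solve(Kx + n * lam * np.eye(n), Kx)  # column i: weights at x_i
          norms2 = np.einsum('ij,jk,ki->i', B.T, Ky, B)
          return np.sqrt(np.clip(norms2, 0, None)).std()

      def infer_direction(X, Y):
          # Report the direction with the smaller conditional deviance.
          return 'X -> Y' if deviance(X, Y) < deviance(Y, X) else 'Y -> X'

      rng = np.random.default_rng(0)
      X = rng.uniform(-2, 2, size=(300, 1))
      Y = np.tanh(2 * X) + 0.1 * rng.normal(size=(300, 1))   # data generated as X -> Y
      print(infer_direction(X, Y))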
    
  17. J. Runge, D. Sejdinovic, and S. Flaxman, Overcoming Autocorrelation Biases for Causal Inference in Large Nonlinear Geoscientific Time Series Datasets, in Geophysical Research Abstracts, 2017, vol. 19, EGU2017–11366.

    Causal discovery methods for geoscientific time series datasets aim at detecting potentially causal statistical associations that cannot be explained by other variables in the dataset. A large-scale complex system like the Earth presents major challenges for methods such as Granger causality. In particular, its high dimensionality and strong autocorrelations lead to low detection power, distorting biases, and unreliable hypothesis tests. Here we introduce a reliable method that outperforms current approaches in detection power and overcomes detection biases, making it suitable to detect even weak causal signals in large-scale geoscientific datasets. We illustrate the method’s capabilities on the global surface-pressure system where we unravel spurious associations and find several potentially causal links that are difficult to detect with standard methods, focusing in particular on drivers of the NAO.

    @inproceedings{RunSejFla2017,
      author = {Runge, J. and Sejdinovic, D. and Flaxman, S.},
      title = {{{Overcoming Autocorrelation Biases for Causal Inference in Large Nonlinear Geoscientific Time Series Datasets}}},
      booktitle = {Geophysical Research Abstracts},
      year = {2017},
      volume = {19},
      pages = {EGU2017--11366},
      keywords = {nonarch}
    }
    
  18. D. Sejdinovic, Kernel Embeddings and Bayesian Quadrature, in Dagstuhl Reports: New Directions for Learning with Kernels and Gaussian Processes (Dagstuhl Seminar 16481), 2017, vol. 6, no. 11, 157.
    @inproceedings{Sej2017a,
      author = {Sejdinovic, Dino},
      title = {{{Kernel Embeddings and Bayesian Quadrature}}},
      booktitle = {Dagstuhl Reports: New Directions for Learning with Kernels and Gaussian Processes (Dagstuhl Seminar 16481)},
      editor = {Gretton, Arthur and Hennig, Philipp and Rasmussen, Carl Edward and Sch{\"o}lkopf, Bernhard},
      pages = {157},
      volume = {6},
      number = {11},
      year = {2017},
      keywords = {nonarch}
    }
    
  19. D. Sejdinovic, Connections and Differences between Kernels and GPs, in Dagstuhl Reports: New Directions for Learning with Kernels and Gaussian Processes (Dagstuhl Seminar 16481), 2017, vol. 6, no. 11, 166.
    @inproceedings{Sej2017b,
      author = {Sejdinovic, Dino},
      title = {{{Connections and Differences between Kernels and GPs}}},
      booktitle = {Dagstuhl Reports: New Directions for Learning with Kernels and Gaussian Processes (Dagstuhl Seminar 16481)},
      editor = {Gretton, Arthur and Hennig, Philipp and Rasmussen, Carl Edward and Sch{\"o}lkopf, Bernhard},
      pages = {166},
      volume = {6},
      number = {11},
      year = {2017},
      keywords = {nonarch}
    }
    
  20. D. Sejdinovic, Kernel Hypothesis Tests on Dependent Data, in Dagstuhl Reports: Machine Learning with Interdependent and Non-identically Distributed Data (Dagstuhl Seminar 15152), 2015, vol. 5, no. 4, 50–51.
    @inproceedings{Sej2015,
      author = {Sejdinovic, Dino},
      title = {{{Kernel Hypothesis Tests on Dependent Data}}},
      booktitle = {Dagstuhl Reports: Machine Learning with Interdependent and Non-identically Distributed Data (Dagstuhl Seminar 15152)},
      editor = {Darrell, Trevor and Kloft, Marius and Pontil, Massimiliano and R{\"a}tsch, Gunnar and Rodner, Erik},
      pages = {50-51},
      volume = {5},
      number = {4},
      year = {2015},
      keywords = {nonarch}
    }
    
  21. W. Jitkrittum, A. Gretton, N. Heess, S. M. A. Eslami, B. Lakshminarayanan, D. Sejdinovic, and Z. Szabó, Just-In-Time Kernel Regression for Expectation Propagation, in ICML 2015 Workshop on Large-Scale Kernel Learning: Challenges and New Opportunities, 2015.

    We propose an efficient nonparametric strategy for learning a message operator in expectation propagation (EP), which takes as input the set of incoming messages to a factor node, and produces an outgoing message as output. This learned operator replaces the multivariate integral required in classical EP, which may not have an analytic expression. We use kernel-based regression, which is trained on a set of probability distributions representing the incoming messages, and the associated outgoing messages. The kernel approach has two main advantages: first, it is fast, as it is implemented using a novel two-layer random feature representation of the input message distributions; second, it has principled uncertainty estimates, and can be cheaply updated online, meaning it can request and incorporate new training data when it encounters inputs on which it is uncertain. In experiments, our approach is able to solve learning problems where a single message operator is required for multiple, substantially different data sets (logistic regression for a variety of classification problems), where it is essential to accurately assess uncertainty and to efficiently and robustly update the message operator.

    @inproceedings{Jitetal2015,
      author = {Jitkrittum, Wittawat and Gretton, Arthur and Heess, Nicolas and Eslami, S. M. Ali and Lakshminarayanan, Balaji and Sejdinovic, Dino and Szab\'{o}, Zolt\'{a}n},
      title = {{{Just-In-Time Kernel Regression for Expectation Propagation}}},
      booktitle = {ICML 2015 Workshop on Large-Scale Kernel Learning: Challenges and New Opportunities},
      url = {https://sites.google.com/site/largescalekernelwsicml15/files/Jitkrittum.pdf?attredirects=0},
      year = {2015},
      keywords = {nonarch}
    }
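
    The "two-layer random feature representation" mentioned above can be pictured as follows: the first layer averages random Fourier features over samples drawn from an incoming message, giving an approximate mean embedding of that message; the second layer applies random Fourier features to those embeddings, yielding an explicit finite-dimensional representation of a kernel on distributions on which fast regression can be run. The sketch below shows only this representation together with a plain ridge fit on a toy target; the uncertainty estimates and online updates described above are not shown, and all names and hyperparameters are assumptions.

      import numpy as np

      def rff(X, W, b):
          # Random Fourier features approximating an RBF kernel.
          return np.sqrt(2.0 / W.shape[1]) * np.cos(X @ W + b)

      def two_layer_features(message_samples, W1, b1, W2, b2):
          # Layer 1: average the feature maps of samples from each message
          # (an approximate kernel mean embedding of that message).
          emb = np.stack([rff(S, W1, b1).mean(axis=0) for S in message_samples])
          # Layer 2: random features of the embeddings, approximating a kernel
          # on the space of message distributions.
          return rff(emb, W2, b2)

      rng = np.random.default_rng(0)
      d, D1, D2 = 1, 200, 200
      W1, b1 = rng.normal(size=(d, D1)), rng.uniform(0, 2 * np.pi, D1)
      W2, b2 = rng.normal(size=(D1, D2)), rng.uniform(0, 2 * np.pi, D2)

      # Toy messages: Gaussians with different means; the regression target is
      # the mean of each message.
      means = rng.normal(size=100)
      messages = [rng.normal(m, 1.0, size=(500, d)) for m in means]
      Phi = two_layer_features(messages, W1, b1, W2, b2)
      w = np.linalg.solve(Phi.T @ Phi + 1e-3 * np.eye(D2), Phi.T @ means)  # ridge fit
      print(np.round(np.abs(Phi @ w - means).mean(), 3))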
    
  22. K. Chwialkowski, A. Ramdas, D. Sejdinovic, and A. Gretton, Fast Two Sample Tests using Smooth Random Features, in ICML 2015 Workshop on Large-Scale Kernel Learning: Challenges and New Opportunities, 2015.

    We propose a nonparametric two-sample test with cost linear in the number of samples. Our test statistic uses differences in smoothed characteristic functions: these are able to distinguish a larger class of alternatives than the non-smoothed characteristic functions used in previous linear-time tests, while being much faster than the current state-of-the-art tests based on kernels or distances, which are quadratic in the sample size. Experiments on artificial benchmarks and on challenging real life testing problems demonstrate that our test gives a better time/power tradeoff than competing approaches, including sub-quadratic-time variants of the kernel tests. This performance advantage is retained even in high dimensions, and in cases where the difference in distributions is not observable in low order statistics.

    @inproceedings{Chwetal2015,
      author = {Chwialkowski, Kacper and Ramdas, Aaditya and Sejdinovic, Dino and Gretton, Arthur},
      title = {{{Fast Two Sample Tests using Smooth Random Features}}},
      booktitle = {ICML 2015 Workshop on Large-Scale Kernel Learning: Challenges and New Opportunities},
      url = {https://sites.google.com/site/largescalekernelwsicml15/files/Chwialkowski.pdf?attredirects=0},
      year = {2015},
      keywords = {nonarch}
    }
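
    A minimal sketch of the kind of linear-time statistic described above: evaluate empirical smoothed characteristic functions of the two samples at a handful of random frequencies, take per-sample differences, and form a Hotelling-style statistic whose null distribution is approximately chi-squared. The Gaussian smoothing weight, the number of frequencies, the regularisation of the covariance and the absence of any data standardisation are assumptions made for this example rather than the exact test from the paper.

      import numpy as np
      from scipy.stats import chi2

      def scf_test(X, Y, J=5, alpha=0.05, seed=0):
          # Linear-time two-sample test from differences of empirical smoothed
          # characteristic functions at J random frequencies (assumes len(X) == len(Y)).
          rng = np.random.default_rng(seed)
          n, d = X.shape
          T = rng.normal(size=(J, d))                  # random frequencies
          gx = np.exp(-0.5 * np.sum(X**2, axis=1))     # Gaussian smoothing weights
          gy = np.exp(-0.5 * np.sum(Y**2, axis=1))
          feats = []
          for t in T:
              zx = np.exp(1j * X @ t) * gx             # per-sample smoothed CF terms
              zy = np.exp(1j * Y @ t) * gy
              diff = zx - zy                           # paired difference for sample i
              feats.append(np.column_stack([diff.real, diff.imag]))
          Z = np.concatenate(feats, axis=1)            # n x 2J matrix of differences
          zbar = Z.mean(axis=0)
          S = np.cov(Z, rowvar=False) + 1e-8 * np.eye(2 * J)
          stat = n * zbar @ np.linalg.solve(S, zbar)   # Hotelling-style statistic
          return stat, bool(stat > chi2.ppf(1 - alpha, df=2 * J))

      rng = np.random.default_rng(1)
      X = rng.normal(0.0, 1.0, size=(2000, 3))
      Y = rng.normal(0.3, 1.0, size=(2000, 3))          # shifted mean, so H0 is false
      print(scf_test(X, Y))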
    

Miscellanea

  1. D. Sejdinovic, Dijkstrin algoritam za nalaženje najkraćeg puta [Dijkstra's algorithm for finding the shortest path], Tangenta (Serbia), no. 59(3), 15–22, 2010.
    @article{Sej2010_Tangenta,
      title = {Dijkstrin algoritam za nala\v{z}enje najkra\'{c}eg puta},
      author = {Sejdinovic, D.},
      journal = {Tangenta (Serbia)},
      number = {59(3)},
      pages = {15--22},
      keywords = {misc},
      year = {2010}
    }
    
  2. D. Sejdinovic and I. Tanovic, O harmonijskom redu i njegovim dijelovima [On the harmonic series and its parts], Osječki matematički list (Croatia), vol. 9, 31–39, 2009.
    @article{SejTan2009_OML,
      title = {O harmonijskom redu i njegovim dijelovima},
      author = {Sejdinovic, D. and Tanovic, I.},
      journal = {Osje\v{c}ki matemati\v{c}ki list (Croatia)},
      volume = {9},
      pages = {31--39},
      keywords = {misc},
      url = {https://hrcak.srce.hr/42992},
      year = {2009}
    }
    
  3. D. Sejdinovic, Algoritam i algoritamska rešlivost [Algorithm and algorithmic solvability], Sigma (FYROM), no. 79, 8–19, 2008.
    @article{Sej2008_Sigma,
      title = {Algoritam i algoritamska re\v{s}livost},
      author = {Sejdinovic, D.},
      journal = {Sigma (FYROM)},
      number = {79},
      pages = {8--19},
      publisher = {Sojuz na dru\v{s}tvata na matemati\v{c}arite i informati\v{c}arite na Republika Makedonija},
      keywords = {misc},
      year = {2008}
    }
    
  4. D. Sejdinovic, Quine: samoreproducirajući kod [Quine: self-reproducing code], Matematičko-fizički list (Croatia), vol. 58, no. 1, 24–26, 2008.
    @article{Sej2008_MFL,
      title = {Quine: samoreproduciraju\'{c}i kod},
      author = {Sejdinovic, D.},
      journal = {Matemati\v{c}ko-fizi\v{c}ki list (Croatia)},
      volume = {58},
      number = {1},
      pages = {24--26},
      keywords = {misc},
      year = {2008}
    }
    
  5. D. Sejdinovic, Mathematics of the Human-Vampire Conflict, Math Horizons, vol. 16, no. 2, 14–15, 2008.
    @article{Sej2008_MH,
      title = {{Mathematics of the Human-Vampire Conflict}},
      author = {Sejdinovic, D.},
      journal = {Math Horizons},
      volume = {16},
      number = {2},
      pages = {14--15},
      year = {2008},
      keywords = {misc},
      publisher = {Mathematical Association of America}
    }
    
  6. D. Sejdinovic, Quine: samoreprodukujući kod [Quine: self-reproducing code], Tangenta (Serbia), no. 46(2), 17–18, 2006.
    @article{Sej2006_Tangenta,
      title = {Quine: samoreprodukuju\'{c}i kod},
      author = {Sejdinovic, D.},
      journal = {Tangenta (Serbia)},
      number = {46(2)},
      pages = {17--18},
      keywords = {misc},
      year = {2006}
    }
    
  7. D. Sejdinovic and A. Kopic, O Oberwolfach problemu [On the Oberwolfach problem], Hrvatski matematički elektronski časopis math.e (Croatia), vol. 7, 2006.
    @article{SejKop2006,
      title = {O {O}berwolfach problemu},
      author = {Sejdinovic, D. and Kopic, A.},
      journal = {Hrvatski matemati\v{c}ki elektronski \v{c}asopis math.e (Croatia)},
      volume = {7},
      keywords = {misc},
      url = {http://e.math.hr/old/oberwolfach/index.html},
      year = {2006}
    }
    
  8. D. Sejdinovic, Eliptičke krivulje u kriptografiji [Elliptic curves in cryptography], Osječki matematički list (Croatia), vol. 6, 85–97, 2006.
    @article{Sej2006_OML,
      title = {Elipti\v{c}ke krivulje u kriptografiji},
      author = {Sejdinovic, D.},
      journal = {Osje\v{c}ki matemati\v{c}ki list (Croatia)},
      volume = {6},
      pages = {85--97},
      keywords = {misc},
      url = {https://hrcak.srce.hr/9563},
      year = {2006}
    }
    

Built with jekyll-scholar.