Webinars 2023
Previous webinars
Prof. Soufiane Hayou
Assistant Professor
National University of Singapore
Singapore
21
Dr. Soundouss Messoudi
Assistant Lecturer and Researcher
Heudiasyc Lab
France
19
At the end of the 1600s, Leibniz started thinking about building a machine that would answer legal questions. However, the first systems applying Artificial Intelligence (AI) to the law did not appear until the 1970s. One famous example of these systems is TAXMAN, which built a formal model of US tax law. The first legal chatbot, DoNotPay, was launched in 2015; it was first created to help appeal parking tickets. Since then, various approaches have been proposed to automate some legal tasks. In this talk, we will overview how recent advances in AI and Natural Language Processing (NLP) allow the analysis of large numbers of legal documents:
Much work has been done recently to make neural networks more interpretable, and one approach is to arrange for the network to use only a subset of the available features. In linear models, Lasso (or L1-regularized) regression assigns zero weights to the most irrelevant or redundant features, and is widely used in data science. However, the Lasso only applies to linear models. In this webinar, Mr. Lemhadri will introduce LassoNet, a neural network framework with global feature selection. The proposed approach enforces a hierarchy: specifically, a feature can participate in a hidden unit only if its linear representative is active. Unlike other approaches to feature selection for neural nets, the method uses a modified objective function with constraints, and so integrates feature selection directly with the parameter learning. As a result, it delivers an entire regularization path of solutions with a range of feature sparsity. In systematic experiments, LassoNet significantly outperforms state-of-the-art methods for feature selection and regression. The LassoNet method uses projected proximal gradient descent, and generalizes directly to deep networks.
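To make the Lasso's zero-weight behaviour concrete, here is a minimal sketch of plain linear Lasso solved by proximal gradient descent (ISTA) in NumPy. It is not LassoNet itself: the synthetic data, the penalty value `lam` and the helper names are assumptions chosen for illustration.

```python
import numpy as np

def soft_threshold(w, t):
    """Proximal operator of t * ||w||_1: shrink each weight towards zero."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    """Minimize (1 / 2n) * ||y - Xw||^2 + lam * ||w||_1 by proximal gradient."""
    n, d = X.shape
    # Step size = 1 / Lipschitz constant of the smooth part's gradient.
    step = n / (np.linalg.norm(X, 2) ** 2)
    w = np.zeros(d)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n
        w = soft_threshold(w - step * grad, step * lam)
    return w

# Synthetic data (assumed): only the first 3 of 10 features matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
true_w = np.zeros(10)
true_w[:3] = [2.0, -3.0, 1.5]
y = X @ true_w + 0.1 * rng.normal(size=200)

w_hat = lasso_ista(X, y, lam=0.2)
selected = np.flatnonzero(w_hat)  # indices of features with non-zero weight
print("selected features:", selected)
```

The soft-thresholding step is what sets weights exactly to zero; sweeping `lam` from large to small would trace out a regularization path of increasingly dense solutions.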
Accurate image segmentation is crucial for medical imaging applications, which typically rely on high-quality manual annotations that are tedious and time-consuming for clinical experts to produce. In this talk, Dr. Gridach will present his recent work on the Densely Oriented Pooling Network (DOPNet) for capturing variation in feature size and preserving spatial interconnection in medical image segmentation. Dr. Gridach will start with a brief computer vision review before moving on to his proposed work.
The presence of pollutants in the air has a direct impact on our health and causes detrimental changes to our environment. Air quality monitoring is therefore of paramount importance. The high cost of the acquisition and maintenance of accurate air quality stations implies that only a small number of these stations can be deployed in a country. This presentation is about a low-cost approach to monitor air quality in urban areas. By combining Artificial Intelligence (AI) and Internet of Things (IoT), we can improve the spatial resolution of the air monitoring process, and successfully predict air quality based on readily available data.
Attention mechanisms have improved the performance of NLP tasks while allowing models to remain explainable. Self-attention is currently widely used; however, interpretability is difficult due to the numerous attention distributions. Recent work has shown that model representations can benefit from label-specific information while facilitating interpretation of predictions. We introduce the Label Attention Layer, a new form of self-attention where attention heads represent labels. We test our novel approach by running constituency and dependency parsing experiments on the Penn Treebank (PTB) and the Chinese Treebank (CTB) datasets. The new proposed model achieves state-of-the-art results for both tasks. Moreover, our model requires fewer self-attention layers compared to existing models. Finally, we share our findings that Label Attention heads can learn relations between syntactic categories and show pathways to analyze errors.
Recently, by relying only on self-attention blocks, the transformer mechanism has taken many AI fields by storm. For example, in NLP, several transformer-based architectures such as BERT, GPT-2 and GPT-3 were proposed, outperforming classical NLP approaches such as RNNs and LSTMs. In biology, AlphaFold 2 was proposed as a transformer-based model that better predicts the structures of proteins from their genetic sequences. More recently, many researchers have tried to apply the same transformer recipe to tackle computer vision tasks such as classification, semantic segmentation and object detection.
The aim of this webinar is to give an overview of the Vision Transformer (ViT) and how it has changed the computer vision landscape by replacing the well-known convolution operator with self-attention alone. The webinar will discuss how the proposed architecture succeeded in outperforming CNN-based architectures like ResNet simply by stacking transformer layers and treating input images as sequences of patch tokens. Considerations such as scalability, complexity, interpretability and other ViT variants will be discussed as well.
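The idea of treating an image as patch tokens can be sketched in a few lines of NumPy. The 224x224 image size and 16x16 patch size below are illustrative assumptions, and the learned linear projection and position embeddings that a real ViT applies afterwards are omitted.

```python
import numpy as np

def image_to_patch_tokens(image, patch_size):
    """Split an (H, W, C) image into flattened, non-overlapping patches."""
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0
    gh, gw = h // patch_size, w // patch_size
    # (gh, p, gw, p, c) -> (gh, gw, p, p, c) -> (gh * gw, p * p * c)
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(gh * gw, patch_size * patch_size * c)

# Dummy image (assumed size); each row of `tokens` is one patch.
image = np.arange(224 * 224 * 3, dtype=np.float32).reshape(224, 224, 3)
tokens = image_to_patch_tokens(image, patch_size=16)
print(tokens.shape)
```

Each row then plays the role that a word embedding plays in NLP: the transformer layers attend over the sequence of patches rather than sliding a convolution kernel over pixels.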
High-resource languages have plenty of frameworks to choose from when developing language technology. For low-resource languages, either no frameworks exist, as is the case for the Amazigh language, or very few components are integrated into well-known, large frameworks. We present a comparative study of frameworks in order to clarify which ones can handle Arabic suitably and report on best practices to be applied to low-resource languages.
In this project, a Reinforcement Learning (RL) agent is trained to obtain a safe policy, yielding a risk-averse agent that prioritizes avoiding worst-case scenarios.
The model leverages distributional RL (e.g. Deep Quantile Regression) and optimizes the Conditional Value at Risk (CVaR), thus providing the user with an adjustable level of risk aversion.
Many applications can benefit from this approach, especially in fields where worst-case scenarios are inadmissible, such as security or medicine.
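As a rough illustration of why CVaR gives an adjustable level of risk aversion, the sketch below estimates it from sampled returns as the mean of the worst `alpha` fraction. The two toy return distributions and the function name are assumptions; in a distributional RL agent, the samples would instead be the learned return quantiles for each action.

```python
import numpy as np

def cvar(returns, alpha):
    """Mean of the worst alpha-fraction of sampled returns (lower tail)."""
    sorted_returns = np.sort(np.asarray(returns, dtype=float))
    k = max(1, int(np.ceil(alpha * len(sorted_returns))))
    return sorted_returns[:k].mean()

# Two hypothetical actions: one steady, one better on average but with
# a rare catastrophic outcome.
safe = np.full(10, 1.0)
risky = np.array([3.0] * 9 + [-10.0])

for alpha in (1.0, 0.2):
    best = "risky" if cvar(risky, alpha) > cvar(safe, alpha) else "safe"
    print(f"alpha={alpha}: prefers the {best} action")
```

With `alpha = 1.0` the criterion reduces to the ordinary expected return, while smaller values of `alpha` weigh the bad tail more heavily and flip the preference towards the safe action.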
Plastic is one of the most widespread types of debris contributing to marine pollution; it threatens food safety and quality, human health, and coastal tourism, and contributes to climate change. Remote sensing has shown great effectiveness in locating this type of debris. By leveraging AI/ML and hydrodynamic ocean models, we will demonstrate how to detect, quantify and track plastic debris in the marine environment.
The problem of human motion (face and body) prediction and generation is at the core of many applications in computer vision and robotics, such as human-robot interaction, autonomous driving and computer graphics. In this talk, I will present some of our recent achievements addressing these specific aspects: 1) generating videos of facial expressions given a neutral face image, 2) dynamic 3D expression generation from an expression label, and 3) human motion prediction and generation of 3D skeletons. We model the temporal evolution of 3D human motion and facial expression as a trajectory, which allows us to map human motions to single points on a sphere manifold. We propose a manifold-aware Wasserstein generative adversarial model that captures the temporal and spatial dependencies of facial expression and human motion through different losses. Our solutions score best on diverse benchmarks.
How do we compare between hypotheses that are entirely consistent with observations? The marginal likelihood (aka Bayesian evidence), which represents the probability of generating our observations from a prior, provides a distinctive approach to this foundational question, automatically encoding Occam’s razor. Although it has been observed that the marginal likelihood can overfit and is sensitive to prior assumptions, its limitations for hyperparameter learning and discrete model comparison have not been thoroughly investigated. We revisit the appealing properties of the marginal likelihood for learning constraints and hypothesis testing. Then, we highlight the conceptual and practical issues in using the marginal likelihood as a proxy for generalization. Namely, we show how marginal likelihood can be negatively correlated with generalization, with implications for neural architecture search, and can lead to both underfitting and overfitting in hyperparameter learning. We provide a partial remedy through a conditional marginal likelihood, which we show is more aligned with generalization, and practically valuable for large-scale hyperparameter learning, such as in deep kernel learning.
Deep neural networks are capable of integrating millions of examples, a key asset when it comes to dealing with incredibly complex systems like the immune system. AI offers a unique possibility to accelerate discoveries in biology by augmenting the capabilities of researchers like never before. In this talk, I will present various examples of such synergies, introducing work on single-cell representation, zoonotic transition, and prediction of markers of infection.
How can you show what a Machine Learning model does once it’s trained?
In this talk, you will learn how to create Machine Learning apps and demos using Streamlit and Gradio, two Python libraries built for this purpose. Additionally, you’ll see how to share them with the rest of the Open Source ecosystem. Learning to create graphical interfaces for models is extremely useful for sharing them with other interested people.
What is required to follow it?
👉 Basic knowledge of Python
👉 Conceptual knowledge of ML
👉 A Google Colab account
👉 A Hugging Face Hub account
Although machine learning and artificial intelligence have had many successes in the past years, many state-of-the-art methods still fail drastically in some real-life applications. One of the main reasons for that is the noisy and corrupted nature of real-life data. This failure phenomenon is what is typically known as overfitting.
In this talk, we study precisely what the sources of overfitting are in machine learning problems. The goal is to define formally what robustness properties are desired in machine learning algorithms to overcome this phenomenon and lead to stronger performance. We identify three overfitting sources naturally present in any dataset: (i) statistical error, as a result of working with finite sample data, (ii) noise, which occurs when the data points are measured only with finite precision, and finally (iii) data misspecification, in which a small fraction of all data may be wholly corrupted. We show that existing machine learning formulations, such as LASSO and Ridge, are typically robust against one of these sources in isolation but do not provide protection against all overfitting sources simultaneously.
We design a novel machine learning formulation that guarantees protection against all these sources of overfitting. We further show that this formulation provides “optimal” robustness. Finally, we apply our formulation to neural networks and show in experiments that the resulting robust neural networks considerably outperform state-of-the-art robust deep learning approaches.
Groundbreaking language-vision architectures like CLIP and DALL-E proved the utility of training on large amounts of noisy image-text data, without relying on expensive accurate labels used in standard vision uni-modal supervised learning.
The resulting models showed capabilities of strong text-guided image generation and transfer to downstream tasks, while performing remarkably at zero-shot classification with noteworthy out-of-distribution robustness.
Studying the training and capabilities of such models requires datasets containing billions of image-text pairs. Until now, no datasets of this size have been made openly available for the broader research community.
In this talk, I will present LAION-5B, a dataset consisting of 5.85 billion CLIP-filtered image-text pairs, and share our recent findings on training contrastive language-image models at a large scale, and how they perform on downstream tasks.
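Two operations mentioned above, filtering image-text pairs with CLIP and zero-shot classification, both reduce to cosine similarity between embeddings. The sketch below uses tiny hand-made vectors in place of real CLIP embeddings; the threshold, temperature and function names are assumptions for illustration, not the values used to build LAION-5B.

```python
import numpy as np

def normalize(v):
    """Scale vectors to unit length along the last axis."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def clip_filter(image_embs, text_embs, threshold):
    """Keep pairs whose image and caption embeddings are similar enough."""
    sims = np.sum(normalize(image_embs) * normalize(text_embs), axis=1)
    return sims >= threshold

def zero_shot_probs(image_emb, prompt_embs, temperature=0.01):
    """Softmax over cosine similarities between one image and class prompts."""
    logits = normalize(prompt_embs) @ normalize(image_emb) / temperature
    logits = logits - logits.max()  # for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

# Toy 2-D stand-ins for embeddings (assumed); row i of each array is a pair.
image_embs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
text_embs = np.array([[1.0, 0.1], [1.0, 0.0], [1.0, 1.0]])
keep = clip_filter(image_embs, text_embs, threshold=0.3)

# One embedding per class prompt, e.g. "a photo of a cat" / "a photo of a dog".
prompt_embs = np.array([[1.0, 0.0], [0.0, 1.0]])
probs = zero_shot_probs(np.array([0.9, 0.2]), prompt_embs)
print(keep, probs.argmax())
```

No labels are needed in either step: the mismatched second pair is dropped purely because its caption embedding points away from its image embedding.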
Language models trained on code have demonstrated remarkable code completion and synthesis abilities from natural language descriptions. However, the best-performing models are not publicly available, and critical information about their datasets, licensing, and evaluation is missing. The BigCode project aims to build code models in an open and responsible way with support from the ML community. We are working on addressing challenges related to dataset composition, model architecture, inference techniques, and evaluation. To this end, we released The Stack – the largest permissively licensed open-source dataset, with over 350 programming languages and an opt-out mechanism. We also developed SantaCoder – a 1.1B multilingual code model that outperforms larger open-source models in left-to-right code generation and infilling.
Uncertainty quantification is not an easy task. Its difficulty depends on various factors related to the available data, the application domain, and also the learned task. Having multiple outputs to predict simultaneously can be even more demanding, principally when these outputs are correlated.
This presentation focuses on producing confidence regions for such complex problems, mainly multi-target regression, by using conformal prediction: a theoretically proven method that can be added to any Machine Learning model to generate set predictions whose size and statistical guarantee depend on a user-defined error rate.
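A minimal sketch of split conformal prediction for two correlated targets follows. It uses one simple choice of nonconformity score (the largest absolute residual across targets), which yields a square confidence region; this is an assumption made for illustration and not necessarily the construction presented in the talk. The synthetic data and the least-squares model are also assumed.

```python
import numpy as np

def conformal_radius(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    return np.inf if k > n else np.sort(scores)[k - 1]

rng = np.random.default_rng(0)
W_true = np.array([[1.0, -0.5], [0.3, 2.0], [-1.2, 0.7]])
noise_mix = np.array([[0.5, 0.4], [0.0, 0.3]])  # correlates the two outputs

def make_data(n):
    x = rng.normal(size=(n, 3))
    y = x @ W_true + rng.normal(size=(n, 2)) @ noise_mix
    return x, y

x_train, y_train = make_data(500)
x_cal, y_cal = make_data(500)
x_test, y_test = make_data(2000)

# Any model can be plugged in here; ordinary least squares keeps it short.
W_hat = np.linalg.lstsq(x_train, y_train, rcond=None)[0]

alpha = 0.1  # user-defined error rate
cal_scores = np.max(np.abs(y_cal - x_cal @ W_hat), axis=1)
radius = conformal_radius(cal_scores, alpha)

# Region for each test point: prediction +/- radius on both targets at once.
inside = np.all(np.abs(y_test - x_test @ W_hat) <= radius, axis=1)
coverage = inside.mean()
print(f"radius={radius:.3f}, empirical joint coverage={coverage:.3f}")
```

The empirical joint coverage should land close to `1 - alpha`. A square region ignores the correlation between the outputs, so it is typically larger than necessary, which is the kind of inefficiency that more refined multi-target methods try to reduce.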
Explainable AI has gained a lot of attention from both legislators and the scientific community. There are many advantages to being able to explain the reasoning behind the decisions a model makes; chief among them are fairness, accountability, and causality. More and more, explainability is used to improve both human and machine decision-making in a mutually reinforcing loop. This session specifically focuses on post hoc local explainability for transformer-based NLP through a practical industrial project: the oncall assistant. It first details the general and scientific methods for explaining BERT and then explores challenges for a real-world implementation.
Neural networks have achieved impressive performance in many applications such as image recognition and generation, and speech recognition. State-of-the-art performance is usually achieved via a series of engineered modifications to existing neural architectures and their training procedures. However, a common feature of these systems is their large-scale nature. Indeed, modern neural networks usually contain millions – if not billions – of trainable parameters, and empirical evaluations (generally) support the claim that increasing the scale of neural networks (e.g. width and depth) boosts the model performance. However, given a neural network model, it is not straightforward to address the crucial question “how do we scale the network?”. In this talk, I will discuss certain properties of large-scale residual neural networks and show how we can leverage different mathematical results to build robust residual networks with empirically confirmed benefits.
High Mountain Asia supplies freshwater to over one billion people via Asia’s largest rivers. In this area, rain and snowfall are the main drivers of river flow. However, the spatiotemporal distribution of precipitation is still poorly understood due to limited direct measurements from weather stations. Existing tools to fill in missing data or improve the resolution of coarser precipitation products produce biased results. In this talk, I will propose a method to generate more accurate high-resolution precipitation predictions over areas with sparse in situ data, called Multi-Fidelity Gaussian Processes (MFGPs). MFGP can combine multiple precipitation sources to increase the accuracy of precipitation estimates while providing principled uncertainties. This method can also make predictions in ungauged locations, away from the high-fidelity training distribution. Finally, MFGPs are simpler to implement and more applicable to small datasets than state-of-the-art machine learning models.
How can we acquire world models that veridically represent the outside world both in terms of what is there and in terms of how our actions affect it?
Can we state mathematical desiderata for their relationship with a posited reality existing outside our brains?
As machine learning is moving towards representations containing not just observational but also interventional knowledge, we study these problems using tools from representation learning and group theory.
We propose methods enabling an agent acting upon the world to learn internal representations of sensory information that are consistent with actions that modify it.
We present the Homomorphism AutoEncoder, an autoencoder equipped with a learnable group representation acting on its latent space, trained using an equivariance-derived loss to enforce a suitable “homomorphism” property on the group representation.
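The two properties involved can be illustrated with a toy example in which the group is 2-D rotations and the representation is fixed rather than learned. Everything below (the rotation group, the hand-written encoders, the function names) is an assumption chosen to show what a homomorphism and an equivariance-style loss look like; it is not the Homomorphism AutoEncoder itself.

```python
import numpy as np

def rho(theta):
    """Representation of a rotation by angle theta as a 2x2 matrix."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def equivariance_loss(encode, act, x, theta):
    """|| encode(act(theta, x)) - rho(theta) @ encode(x) ||^2."""
    diff = encode(act(theta, x)) - rho(theta) @ encode(x)
    return float(np.sum(diff ** 2))

# In this toy setup, the action on observations is itself a rotation.
act = lambda theta, x: rho(theta) @ x

good_encoder = lambda x: 2.0 * x                   # commutes with rotations
bad_encoder = lambda x: x * np.array([1.0, 3.0])   # anisotropic, does not

x, theta = np.array([1.0, 2.0]), 0.7
print("homomorphism holds:", np.allclose(rho(0.3 + 0.4), rho(0.3) @ rho(0.4)))
print("good encoder loss:", equivariance_loss(good_encoder, act, x, theta))
print("bad encoder loss:", equivariance_loss(bad_encoder, act, x, theta))
```

A loss of this form is zero only when acting in observation space and then encoding gives the same result as encoding and then acting in latent space, which is the consistency between representations and actions described above.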
The success of modern-day machine learning models can be attributed to algorithms and systems that effectively leverage the large amounts of available distributed data. In fact, very large models can no longer be trained or stored on a single machine, and it is hard to collect the required data centrally. This explains the recent popularity of distributed optimization algorithms and paradigms like Federated Learning. Unfortunately, the gains achieved using distributed methods come with fundamental trade-offs with important desiderata: robustness and privacy. Specifically, we overview the algorithmic and theoretical advances in making distributed machine learning methods robust to adversarial participants and poisoned data, and in ensuring that they preserve the privacy of the participants’ data, as well as the resulting fundamental trade-offs.
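To see why robustness to adversarial participants matters, the sketch below compares plain averaging with a coordinate-wise median when two out of ten workers send arbitrary updates. The coordinate-wise median is one standard robust aggregation rule, used here as an example; the numbers and variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
true_grad = np.array([1.0, -2.0, 0.5])

# Eight honest workers send noisy copies of the true gradient;
# two adversarial workers send arbitrary large values.
honest = true_grad + 0.1 * rng.normal(size=(8, 3))
adversarial = np.full((2, 3), 1000.0)
updates = np.vstack([honest, adversarial])

mean_agg = updates.mean(axis=0)          # standard federated averaging step
median_agg = np.median(updates, axis=0)  # coordinate-wise median

mean_error = np.linalg.norm(mean_agg - true_grad)
median_error = np.linalg.norm(median_agg - true_grad)
print(f"mean error={mean_error:.2f}, median error={median_error:.2f}")
```

A minority of bad updates is enough to move the mean arbitrarily far, while the median stays within the range of the honest values. Robust rules like this interact with privacy mechanisms such as added noise, which is one source of the trade-offs mentioned above.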
In this talk, we will delve into the asymptotic study of simple linear generative models when both the sample size and data dimension grow to infinity. In this high-dimensional regime, random matrix theory (RMT) appears to be a natural tool to assess the model’s performance by examining its asymptotic learned conditional probabilities, its associated fluctuations, and the model’s generalization error. This analytical approach not only enhances our comprehension of generative language models but might also offer novel insights into their refinement through the lens of high-dimensional statistics and RMT.
In this article, we propose a new quantization technique called Half-Quadratic Quantization (HQQ). Our approach, requiring no calibration data, significantly speeds up the quantization of large models, while offering compression quality competitive with that of calibration-based methods. For instance, HQQ takes less than 5 minutes to process the colossal Llama-2-70B, which is over 50x faster than the widely adopted GPTQ. Our Llama-2-70B quantized to 2 bits outperforms the full-precision Llama-2-13B by a large margin at comparable memory usage.
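For readers new to weight quantization, the following sketch shows the simplest calibration-free baseline: affine round-to-nearest quantization of one group of weights. This is not HQQ, which goes further by optimizing the quantization parameters; the group size, random weights and function names are assumptions.

```python
import numpy as np

def quantize_rtn(w, n_bits):
    """Affine round-to-nearest quantization of a 1-D group of weights.

    Assumes the group is not constant (otherwise the scale would be zero).
    """
    q_max = 2 ** n_bits - 1
    w_min, w_max = float(w.min()), float(w.max())
    scale = (w_max - w_min) / q_max
    q = np.clip(np.round((w - w_min) / scale), 0, q_max).astype(np.uint8)
    return q, scale, w_min

def dequantize(q, scale, w_min):
    """Map integer codes back to approximate floating-point weights."""
    return q.astype(np.float32) * scale + w_min

rng = np.random.default_rng(0)
weights = rng.normal(size=64).astype(np.float32)  # one hypothetical group

errors = {}
for bits in (2, 4, 8):
    q, scale, w_min = quantize_rtn(weights, bits)
    recon = dequantize(q, scale, w_min)
    errors[bits] = float(np.mean((weights - recon) ** 2))
    print(f"{bits}-bit: {2 ** bits} levels, MSE={errors[bits]:.5f}")
```

At 2 bits each weight can only take one of four values, so the reconstruction error of this naive scheme is large; methods such as HQQ aim to reduce that error without needing calibration data.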