At the end of the 1600s, Leibniz started thinking about building a machine that would answer legal questions. However, the first systems applying Artificial Intelligence (AI) to the law did not appear until the 1970s. One famous example is TAXMAN, which built a formal model of US tax law. Much later, in 2015, the first legal chatbot, DoNotPay, was launched; it was originally created to help users appeal parking tickets. Since then, various approaches have been proposed to automate some legal tasks. In this talk, we will give an overview of how recent advances in AI and Natural Language Processing (NLP) allow the analysis of large numbers of legal documents:

Extraction of relevant entities and data from judgments to build networks of lawyers and judgments. The lawyer network is then used to compute metrics that rank lawyers by experience, win/loss ratio, and importance.
Automatic compliance checking of privacy policies against the GDPR.

ABSTRACT

Bone age determination from medical images is a challenging yet very common task. In this talk, we will first showcase a deep learning network developed to determine the Risser stage (a bone age classification method) from pelvic radiographs, achieving accuracy similar to that of expert readers. We will then take a critical look and broaden the conversation to the challenges of AI in medicine and how best to address them.

ABSTRACT

Much work has been done recently to make neural networks more interpretable, and one approach is to have the network use only a subset of the available features. In linear models, Lasso (L1-regularized) regression assigns zero weights to the most irrelevant or redundant features and is widely used in data science. However, the Lasso applies only to linear models. In this webinar, Mr. Lemhadri will introduce LassoNet, a neural network framework with global feature selection. The proposed approach enforces a hierarchy: specifically, a feature can participate in a hidden unit only if its linear representative is active. Unlike other approaches to feature selection for neural nets, the method uses a modified objective function with constraints, and so integrates feature selection directly with parameter learning. As a result, it delivers an entire regularization path of solutions with a range of feature sparsity. In systematic experiments, LassoNet significantly outperforms state-of-the-art methods for feature selection and regression. The LassoNet method uses projected proximal gradient descent and generalizes directly to deep networks.
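The L1 mechanism that LassoNet builds on can be illustrated with plain linear Lasso. The sketch below is a minimal pure-Python assumption, not LassoNet itself: it runs proximal gradient descent (ISTA), i.e. a gradient step on the least-squares loss followed by soft-thresholding, which is what sets irrelevant weights exactly to zero. LassoNet's hierarchical proximal operator is not reproduced; the function names, step size, and toy data are illustrative.

```python
def soft_threshold(x: float, t: float) -> float:
    """Proximal operator of t * |x|: shrink x toward zero by t, clipping to 0."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def lasso_ista(X, y, lam, step, n_iter=500):
    """Minimize (1 / 2n) * ||y - Xw||^2 + lam * ||w||_1 by proximal gradient (ISTA).

    X: list of n rows with p features each; y: list of n targets.
    step should be at most 1 / L, where L is the largest eigenvalue of X^T X / n.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        # Residuals r = Xw - y of the least-squares term
        r = [sum(X[i][j] * w[j] for j in range(p)) - y[i] for i in range(n)]
        # Gradient of the smooth part: (1 / n) * X^T r
        grad = [sum(X[i][j] * r[i] for i in range(n)) / n for j in range(p)]
        # Gradient step, then the L1 proximal step (soft-thresholding by step * lam)
        w = [soft_threshold(w[j] - step * grad[j], step * lam) for j in range(p)]
    return w


# Toy data: y depends only on the first feature; the second is irrelevant.
X = [[1.0, 0.1], [2.0, -0.1], [3.0, 0.1], [4.0, -0.1]]
y = [2.0, 4.0, 6.0, 8.0]
w = lasso_ista(X, y, lam=0.5, step=0.1)
print(w)  # w[1] is exactly 0.0; w[0] is shrunk below the unpenalized value 2.0
```

Here the step 0.1 is below 1/L (about 0.133 for this X), so the iteration converges; the penalty both zeroes the irrelevant feature and shrinks the relevant weight, the usual Lasso trade-off.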

ABSTRACT

Accurate image segmentation is crucial for medical imaging applications, which typically rely on high-quality manual annotations that are tedious and time-consuming for clinical experts to produce. In this talk, Dr. Gridach will present his recent work on the Densely Oriented Pooling Network (DOPNet), which captures variation in feature size and preserves spatial interconnection in medical image segmentation. Dr. Gridach will start with a brief review of computer vision before presenting his proposed work.

ABSTRACT

The presence of pollutants in the air has a direct impact on our health and causes detrimental changes to our environment. Air quality monitoring is therefore of paramount importance. Because accurate air quality stations are expensive to acquire and maintain, only a small number of them can be deployed in a country. This presentation describes a low-cost approach to monitoring air quality in urban areas. By combining Artificial Intelligence (AI) and the Internet of Things (IoT), we can improve the spatial resolution of air monitoring and successfully predict air quality from readily available data.

ABSTRACT

Attention mechanisms have improved performance on NLP tasks while allowing models to remain explainable. Self-attention is currently widely used; however, its interpretability is limited by the sheer number of attention distributions. Recent work has shown that model representations can benefit from label-specific information, which also facilitates the interpretation of predictions. We introduce the Label Attention Layer, a new form of self-attention in which attention heads represent labels. We test our approach with constituency and dependency parsing experiments on the Penn Treebank (PTB) and Chinese Treebank (CTB) datasets. The proposed model achieves state-of-the-art results on both tasks. Moreover, it requires fewer self-attention layers than existing models. Finally, we share our findings that Label Attention heads can learn relations between syntactic categories, and show pathways to analyze errors.
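To make "attention heads represent labels" concrete, here is a minimal pure-Python sketch of a single head whose query is one learned vector per label rather than a projection of each token, so the head yields exactly one attention distribution over the sentence that can be inspected directly. This is a simplified assumption about the layer's design: the paper's projections, multiple heads, and the way head outputs are combined are omitted, and `label_query` is a hypothetical parameter.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def label_attention_head(label_query, keys, values):
    """Attention head whose query is one learned vector standing in for a label.

    label_query: d floats (hypothetical learned parameter for one label).
    keys: n token key vectors of length d; values: n token value vectors.
    Returns (weights, context): one attention distribution over the n tokens
    for this label, and the weighted sum of the value vectors.
    """
    d = len(label_query)
    # Scaled dot-product scores between the single label query and every token key
    scores = [sum(q * k for q, k in zip(label_query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    context = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
    return weights, context


# Three tokens; the label query aligns with the first token's key.
weights, context = label_attention_head(
    label_query=[1.0, 0.0],
    keys=[[2.0, 0.0], [0.0, 2.0], [-2.0, 0.0]],
    values=[[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]],
)
print(weights)  # a single distribution for this label, readable token by token
```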

ABSTRACT

Recently, relying only on self-attention blocks, the transformer architecture has taken many AI fields by storm. For example, in NLP, transformer-based architectures such as BERT, GPT-2, and GPT-3 have outperformed classical approaches such as RNNs and LSTMs. In biology, AlphaFold 2 was proposed as a transformer-based model that better predicts the structures of proteins from their amino acid sequences. More recently, many researchers have tried to apply the same transformer recipe to computer vision tasks such as classification, semantic segmentation, and object detection.

The aim of this webinar is to give an overview of the Vision Transformer (ViT) and how it has changed the computer vision landscape by replacing the ubiquitous convolution operator with self-attention alone. The webinar will discuss how the proposed architecture managed to outperform CNN-based architectures such as ResNet simply by stacking transformer layers and treating input images as sequences of patch tokens. Scalability, complexity, interpretability, and other ViT variants will be discussed as well.
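The "patch tokens" idea can be sketched in a few lines of pure Python. This assumes a single-channel image and non-overlapping P x P patches; in a real ViT each flattened patch is then linearly projected, combined with a position embedding, and preceded by a class token, none of which is shown here. With the common 224 x 224 input and 16 x 16 patches, this gives 196 tokens.

```python
def image_to_patch_tokens(image, patch_size):
    """Split an H x W single-channel image into flattened, non-overlapping patches.

    image: list of H rows of W numbers; H and W must be divisible by patch_size.
    Returns (H / P) * (W / P) tokens of P * P values each, in row-major patch order.
    """
    h, w, p = len(image), len(image[0]), patch_size
    if h % p or w % p:
        raise ValueError("image height and width must be divisible by patch_size")
    tokens = []
    for top in range(0, h, p):
        for left in range(0, w, p):
            # Flatten one P x P patch row by row into a single token vector
            tokens.append([image[top + r][left + c] for r in range(p) for c in range(p)])
    return tokens


image = [[4 * r + c for c in range(4)] for r in range(4)]  # 4 x 4 toy image
tokens = image_to_patch_tokens(image, patch_size=2)
print(tokens)  # [[0, 1, 4, 5], [2, 3, 6, 7], [8, 9, 12, 13], [10, 11, 14, 15]]
```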

ABSTRACT

High-resource languages have plenty of frameworks to choose from when developing language technology. For low-resource languages, either no frameworks exist, as is the case for Amazigh, or very few components are integrated into well-known, large frameworks. We present a comparative study of frameworks to clarify which ones handle Arabic suitably, and we report on best practices to apply to low-resource languages.

ABSTRACT

In this project, a Reinforcement Learning (RL) agent is trained to learn a safe policy, yielding a risk-averse agent that prioritizes avoiding worst-case scenarios.

The model leverages distributional RL (e.g., Deep Quantile Regression) and optimizes the Conditional Value at Risk (CVaR), providing the user with an adjustable level of risk aversion.
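A minimal sketch of how quantile estimates can drive risk-averse action selection, assuming a QR-DQN-style critic that outputs N return quantiles per action at the midpoint levels (2i - 1) / (2N). CVaR at level alpha is approximated by averaging the quantiles in the lower alpha tail, so alpha acts as the adjustable risk-aversion knob. The pinball loss is the plain quantile-regression objective (QR-DQN itself uses a Huber-smoothed variant); the network and training loop are omitted, and all names and numbers are illustrative.

```python
def pinball_loss(target, prediction, tau):
    """Quantile (pinball) loss for quantile level tau in (0, 1)."""
    diff = target - prediction
    return max(tau * diff, (tau - 1) * diff)


def cvar_from_quantiles(quantile_values, alpha):
    """Approximate CVaR_alpha: the mean return over the worst alpha-fraction of outcomes.

    quantile_values: N estimated return quantiles at levels tau_i = (2i - 1) / (2N),
    i = 1..N (the QR-DQN convention). Averages the quantiles with tau_i <= alpha.
    """
    n = len(quantile_values)
    ordered = sorted(quantile_values)  # guard against crossed quantile estimates
    tail = [q for i, q in enumerate(ordered) if (2 * i + 1) / (2 * n) <= alpha]
    if not tail:
        return ordered[0]  # alpha below the first level: use the worst quantile
    return sum(tail) / len(tail)


def risk_averse_action(quantiles_per_action, alpha):
    """Greedy action under CVaR_alpha; smaller alpha means more risk aversion."""
    return max(quantiles_per_action,
               key=lambda a: cvar_from_quantiles(quantiles_per_action[a], alpha))


# Both actions have mean return 5, but "risky" has a bad lower tail.
q = {"safe": [4.0, 5.0, 5.0, 6.0], "risky": [-10.0, 8.0, 9.0, 13.0]}
print(risk_averse_action(q, alpha=0.25))  # safe
```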

Many applications can benefit from this approach, especially in fields where worst-case scenarios are inadmissible, such as security or medicine.

ABSTRACT

Plastic is one of the most widespread types of debris contributing to marine pollution; it threatens food safety and quality, human health, and coastal tourism, and contributes to climate change. Remote sensing has proven highly effective at locating this type of debris. By leveraging AI/ML and hydrodynamic ocean models, we will demonstrate how to detect, quantify, and track plastic debris in the marine environment.

ABSTRACT

The problem of human motion (face and body) prediction and generation is at the core of many applications in computer vision and robotics, such as human-robot interaction, autonomous driving, and computer graphics. In this talk, I will present some of our recent achievements addressing these specific aspects: 1) generating videos of facial expressions from a neutral face image, 2) generating dynamic 3D expressions from an expression label, and 3) predicting and generating human motion as 3D skeletons. We model the temporal evolution of 3D human motion and facial expressions as trajectories, which allows us to map human motions to single points on a sphere manifold. We propose a manifold-aware Wasserstein generative adversarial model that captures the temporal and spatial dependencies of facial expressions and human motion through different losses. Our solutions achieve the best scores on diverse benchmarks.

ABSTRACT

How do we choose between hypotheses that are entirely consistent with observations? The marginal likelihood (aka Bayesian evidence), which represents the probability of generating our observations from a prior, provides a distinctive approach to this foundational question, automatically encoding Occam’s razor. Although it has been observed that the marginal likelihood can overfit and is sensitive to prior assumptions, its limitations for hyperparameter learning and discrete model comparison have not been thoroughly investigated. We revisit the appealing properties of the marginal likelihood for learning constraints and hypothesis testing. Then, we highlight the conceptual and practical issues in using the marginal likelihood as a proxy for generalization. Namely, we show how the marginal likelihood can be negatively correlated with generalization, with implications for neural architecture search, and can lead to both underfitting and overfitting in hyperparameter learning. We provide a partial remedy through a conditional marginal likelihood, which we show is more aligned with generalization and practically valuable for large-scale hyperparameter learning, such as in deep kernel learning.
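For reference, the quantities contrasted above can be written as follows, for data D = (d_1, ..., d_n) and model M (standard definitions; the notation is ours). The chain-rule form reads the log evidence as a sum of sequential predictive log-probabilities, so the early terms, predicted from little or no data and hence dominated by the prior, weigh on the total; a conditional version drops the first m terms. The split point m is a free choice, and that this matches the talk's exact definition is an assumption.

```latex
\begin{aligned}
p(\mathcal{D} \mid \mathcal{M}) &= \int p(\mathcal{D} \mid \theta, \mathcal{M})\, p(\theta \mid \mathcal{M})\, d\theta, \\
\log p(\mathcal{D} \mid \mathcal{M}) &= \sum_{i=1}^{n} \log p(d_i \mid d_{<i}, \mathcal{M}), \\
\log p(\mathcal{D}_{m+1:n} \mid \mathcal{D}_{1:m}, \mathcal{M}) &= \sum_{i=m+1}^{n} \log p(d_i \mid d_{<i}, \mathcal{M}).
\end{aligned}
```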

ABSTRACT

Deep neural networks are capable of integrating millions of examples, a key asset when dealing with incredibly complex systems such as the immune system. AI offers a unique opportunity to accelerate discoveries in biology by augmenting the capabilities of researchers like never before. In this presentation, I will show various examples of such synergies, introducing work on single-cell representation, zoonotic transition, and the prediction of markers of infection.

ABSTRACT

Driven by the goal of eradicating language barriers on a global scale, machine translation has solidified itself as a key focus of artificial intelligence research today. However, such efforts have coalesced around a small subset of languages, leaving behind the vast majority of mostly low-resource languages.

We created datasets and models aimed at narrowing the performance gap between low and high-resource languages. More specifically, we developed a conditional compute model based on Sparsely Gated Mixture of Experts that is trained on data obtained with novel and effective data mining techniques tailored for low-resource languages.
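Sparsely gated Mixture-of-Experts layers run only a few experts per input. The pure-Python sketch below shows the basic top-k gating idea: a gate scores every expert, only the k best are executed, and their outputs are mixed with softmax weights renormalized over those k. It omits the gating noise, capacity limits, and load-balancing losses that practical MoE layers use, and it is not the authors' model; the gate weights and toy experts are hypothetical.

```python
import math


def top_k_gating(logits, k):
    """Sparse gate: softmax over the k largest logits; every other expert gets weight 0.

    Returns {expert_index: gate_weight} for the selected experts only.
    """
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)
    exps = {i: math.exp(logits[i] - m) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


def moe_forward(x, gate_weights, experts, k=2):
    """Route x to the top-k experts and mix their outputs by gate weight.

    gate_weights: one weight vector per expert; expert e's logit is gate_weights[e] . x.
    experts: callables mapping x to an output vector.
    """
    logits = [sum(w * xi for w, xi in zip(we, x)) for we in gate_weights]
    gates = top_k_gating(logits, k)
    out = None
    for e, g in gates.items():
        y = experts[e](x)  # only the k selected experts are ever evaluated
        out = [g * yi for yi in y] if out is None else [o + g * yi for o, yi in zip(out, y)]
    return out, gates


def never_called(v):
    raise AssertionError("unselected expert was evaluated")


out, gates = moe_forward(
    x=[1.0, 0.0],
    gate_weights=[[2.0, 0.0], [1.0, 0.0], [-1.0, 0.0]],  # logits 2, 1, -1
    experts=[lambda v: [10.0, 0.0], lambda v: [0.0, 10.0], never_called],
    k=2,
)
print(gates)  # experts 0 and 1 selected; expert 2 never runs
```

Skipping unselected experts entirely is what makes the compute conditional: cost grows with k, not with the total number of experts.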

We propose multiple architectural and training improvements to counteract overfitting while training on thousands of tasks. Our model achieves an improvement of 44% BLEU relative to the previous state-of-the-art, laying important groundwork towards realizing a universal translation system.

ABSTRACT

How can you show what a Machine Learning model does once it’s trained?

In this talk, you will learn how to create Machine Learning apps and demos using Streamlit and Gradio, two Python libraries built for this purpose. You will also see how to share them with the rest of the Open Source ecosystem. Knowing how to build graphical interfaces for models is extremely useful for sharing them with other interested people.

What is required to follow it?
👉 Basic knowledge of Python
👉 Conceptual knowledge of ML
👉 A Google Colab account
👉 A Hugging Face Hub account

ABSTRACT

Although machine learning and artificial intelligence have had many successes in the past years, many state-of-the-art methods still fail drastically in some real-life applications. One of the main reasons for that is the noisy and corrupted nature of real-life data. This failure phenomenon is what is typically known as overfitting.

In this talk, we study precisely what the sources of overfitting in machine learning problems are. The goal is to define formally which robustness properties machine learning algorithms need in order to overcome this phenomenon and achieve stronger performance. We identify three sources of overfitting naturally present in any dataset: (i) statistical error, resulting from working with finite sample data; (ii) noise, which occurs when data points are measured only with finite precision; and (iii) data misspecification, in which a small fraction of the data may be wholly corrupted. We show that existing machine learning formulations, such as LASSO and Ridge, are typically robust against one of these sources in isolation but do not protect against all sources of overfitting simultaneously.
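As background for the claim that LASSO and Ridge each guard against one source in isolation, both penalties have a known robust-optimization reading when the residual norm is not squared: the penalized problem equals a worst case over bounded perturbations Δ of the feature matrix X, i.e. measurement noise, but not over wholly corrupted points. Here δ_j is the j-th column of Δ and the second uncertainty set uses the spectral norm. This is standard background, not the talk's new formulation.

```latex
\begin{aligned}
\min_{w}\ \max_{\Delta \in \mathcal{U}_1} \lVert y - (X + \Delta) w \rVert_2
  &= \min_{w}\ \lVert y - Xw \rVert_2 + \lambda \lVert w \rVert_1,
  & \mathcal{U}_1 &= \{\Delta : \lVert \delta_j \rVert_2 \le \lambda \text{ for every column } \delta_j\}, \\
\min_{w}\ \max_{\Delta \in \mathcal{U}_2} \lVert y - (X + \Delta) w \rVert_2
  &= \min_{w}\ \lVert y - Xw \rVert_2 + \lambda \lVert w \rVert_2,
  & \mathcal{U}_2 &= \{\Delta : \lVert \Delta \rVert_2 \le \lambda\}.
\end{aligned}
```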

We design a novel machine learning formulation that guarantees protection against all these sources of overfitting. We further show that this formulation provides “optimal” robustness. Finally, we apply our formulation to neural networks and show experimentally that the resulting robust neural networks considerably outperform state-of-the-art robust deep learning approaches.

ABSTRACT

Groundbreaking language-vision architectures such as CLIP and DALL-E proved the utility of training on large amounts of noisy image-text data, without relying on the expensive, accurate labels used in standard unimodal supervised vision learning.
The resulting models showed strong text-guided image generation and transfer to downstream tasks, while performing remarkably well at zero-shot classification with noteworthy out-of-distribution robustness.
Studying the training and capabilities of such models requires datasets containing billions of image-text pairs. Until now, no datasets of this size have been made openly available for the broader research community.
In this talk, I will present LAION-5B, a dataset consisting of 5.85 billion CLIP-filtered image-text pairs, and share our recent findings on training contrastive language-image models at a large scale, and how they perform on downstream tasks.
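"CLIP-filtered" means each candidate pair is embedded with a CLIP image encoder and text encoder, and pairs whose image and caption embeddings are too dissimilar are dropped. A minimal pure-Python sketch of that filtering step follows, assuming the embeddings are already computed; the threshold value and toy vectors are illustrative, not the ones used to build LAION-5B.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two non-zero embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def clip_filter(pairs, threshold):
    """Keep (image_embedding, text_embedding, caption) triples whose embeddings agree.

    The embeddings are assumed to come from a CLIP image encoder and text encoder;
    threshold is a free parameter chosen by the dataset builder.
    """
    return [(img, txt, cap) for img, txt, cap in pairs
            if cosine_similarity(img, txt) >= threshold]


pairs = [
    ([1.0, 0.0], [1.0, 0.0], "matching caption"),    # similarity 1.0
    ([1.0, 0.0], [0.0, 1.0], "unrelated alt-text"),  # similarity 0.0
    ([1.0, 1.0], [1.0, 0.0], "partially related"),   # similarity ~0.707
]
kept = clip_filter(pairs, threshold=0.3)
print([cap for _, _, cap in kept])  # ['matching caption', 'partially related']
```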

ABSTRACT

Language models trained on code have demonstrated remarkable code completion and synthesis abilities from natural language descriptions. However, the best-performing models are not publicly available, and critical information about their datasets, licensing, and evaluation is missing. The BigCode project aims to build code models in an open and responsible way with support from the ML community. We are working on challenges related to dataset composition, model architecture, inference techniques, and evaluation. To this end, we released The Stack, the largest permissively licensed open-source code dataset, covering over 350 programming languages and offering an opt-out mechanism. We also developed SantaCoder, a 1.1B-parameter multilingual code model that outperforms larger open-source models in left-to-right code generation and infilling.

ABSTRACT

Uncertainty quantification is not an easy task. Its difficulty depends on various factors related to the available data, the application domain, and the learned task. Having multiple outputs to predict simultaneously can be even more demanding, particularly when these outputs are correlated.
This presentation focuses on producing confidence regions for such complex problems, mainly multi-target regression, using conformal prediction: a method with proven statistical guarantees that can be added on top of any Machine Learning model to produce set predictions whose size and coverage guarantee depend on a user-defined error rate.
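For the single-target case, split conformal prediction can be sketched in a few lines of pure Python, assuming absolute residuals as the nonconformity score and a calibration set held out from training. The talk's multi-target setting needs joint regions that account for correlated outputs, which this sketch does not attempt; function and variable names are illustrative.

```python
import math


def split_conformal_interval(calib_preds, calib_targets, test_pred, alpha):
    """Split conformal prediction interval for a single-target regressor.

    calib_preds / calib_targets: predictions and true values on a held-out
    calibration set that was not used for training. alpha: user-defined error rate.
    Assuming calibration and test points are exchangeable, the returned
    (lower, upper) interval contains the true value with probability >= 1 - alpha.
    """
    n = len(calib_preds)
    # Nonconformity scores: absolute residuals on the calibration set
    scores = sorted(abs(y - p) for p, y in zip(calib_preds, calib_targets))
    k = math.ceil((n + 1) * (1 - alpha))  # 1-indexed rank of the conformal quantile
    if k > n:
        return (-math.inf, math.inf)  # too few calibration points for this alpha
    q = scores[k - 1]
    return (test_pred - q, test_pred + q)


# Toy calibration set: the model always predicts 0, residuals are 1..9.
calib_preds = [0.0] * 9
calib_targets = [float(v) for v in range(1, 10)]
print(split_conformal_interval(calib_preds, calib_targets, test_pred=10.0, alpha=0.5))
# (5.0, 15.0)
```

Lowering alpha widens the interval, and with too few calibration points the only valid answer is the whole real line, which is the size/guarantee trade-off the abstract refers to.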

ABSTRACT

Explainable AI has gained a lot of attention from both legislators and the scientific community. Being able to explain the reasoning behind a model’s decisions has many advantages, chief among them fairness, accountability, and causality. Increasingly, explainability is used to improve both human and machine decision-making in a mutually reinforcing loop. This session focuses specifically on post hoc local explainability for transformer-based NLP through a practical industrial project: the on-call assistant. It first details general and scientific methods for explaining BERT and then explores the challenges of a real-world implementation.
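One of the simplest post hoc local methods is occlusion (leave-one-out): mask each token in turn and measure how much the model's score drops. The pure-Python sketch below assumes a generic `score_fn`; the toy scorer is a hypothetical stand-in for a fine-tuned BERT classifier, and the `[MASK]` replacement mirrors BERT's mask token. Whether the talk uses this method, rather than gradient- or attention-based explanations, is not stated in the abstract.

```python
def occlusion_attributions(tokens, score_fn, mask_token="[MASK]"):
    """Leave-one-out (occlusion) importance of each token for a model's score.

    score_fn maps a list of tokens to a scalar, e.g. the predicted probability of
    the class being explained. Importance of token i is the drop in score when
    token i is replaced by mask_token.
    """
    base = score_fn(tokens)
    return [base - score_fn(tokens[:i] + [mask_token] + tokens[i + 1:])
            for i in range(len(tokens))]


def toy_score(tokens):
    """Hypothetical stand-in for a fine-tuned BERT classifier's score."""
    return 3 * ("outage" in tokens) + 1 * ("database" in tokens)


tokens = ["database", "outage", "in", "prod"]
print(occlusion_attributions(tokens, toy_score))  # [1, 3, 0, 0]
```

The method needs one extra forward pass per token, which matters for long inputs to a model the size of BERT.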

ABSTRACT

Neural networks have achieved impressive performance in many applications, such as image recognition and generation, and speech recognition. State-of-the-art performance is usually achieved through a series of engineered modifications to existing neural architectures and their training procedures. A common feature of these systems, however, is their large scale. Indeed, modern neural networks usually contain millions, if not billions, of trainable parameters, and empirical evaluations generally support the claim that increasing the scale of neural networks (e.g., width and depth) boosts model performance. However, given a neural network model, it is not straightforward to answer the crucial question ‘how do we scale the network?’. In this talk, I will discuss certain properties of large-scale residual neural networks and show how we can leverage different mathematical results to build robust residual networks with empirically confirmed benefits.

ABSTRACT

High Mountain Asia supplies freshwater to over one billion people via Asia’s largest rivers. In this area, rain and snowfall are the main drivers of river flow. However, the spatiotemporal distribution of precipitation is still poorly understood due to limited direct measurements from weather stations. Existing tools to fill in missing data or improve the resolution of coarser precipitation products produce biased results. In this talk, I will propose a method called Multi-Fidelity Gaussian Processes (MFGPs) to generate more accurate high-resolution precipitation predictions over areas with sparse in situ data. MFGPs can combine multiple precipitation sources to increase the accuracy of precipitation estimates while providing principled uncertainties. This method can also make predictions at ungauged locations, away from the high-fidelity training distribution. Finally, MFGPs are simpler to implement and better suited to small datasets than state-of-the-art machine learning models.

ABSTRACT

How can we acquire world models that veridically represent the outside world both in terms of what is there and in terms of how our actions affect it?
Can we state mathematical desiderata for their relationship with a posited reality existing outside our brains?
As machine learning is moving towards representations containing not just observational but also interventional knowledge, we study these problems using tools from representation learning and group theory.
We propose methods enabling an agent acting upon the world to learn internal representations of sensory information that are consistent with actions that modify it.
We present the Homomorphism AutoEncoder, an autoencoder equipped with a learnable group representation acting on its latent space, trained using an equivariance-derived loss to enforce a suitable “homomorphism” property on the group representation.

ABSTRACT

The success of modern-day machine learning models can be attributed to algorithms and systems that effectively leverage the large amounts of available distributed data. In fact, very large models can neither be trained nor stored on a single machine anymore, and it is hard to collect the required data centrally. This explains the recent popularity of distributed optimization algorithms and paradigms such as Federated Learning. Unfortunately, the gains achieved with distributed methods come with fundamental trade-offs against two important desiderata: robustness and privacy. Specifically, we review the algorithmic and theoretical advances in making distributed machine learning methods robust to adversarial participants and poisoned data, and in ensuring that they preserve the privacy of participants’ data, as well as the resulting fundamental trade-offs.
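To illustrate why robustness to adversarial participants needs more than plain averaging, here is a minimal pure-Python sketch comparing the mean with coordinate-wise median aggregation, one standard Byzantine-robust rule. It is an illustrative choice, not necessarily a method covered in the talk, and it does not address privacy; the toy updates are made up for the example.

```python
import statistics


def mean_aggregate(updates):
    """Unweighted average of participant updates, coordinate by coordinate."""
    return [sum(col) / len(col) for col in zip(*updates)]


def coordinate_wise_median(updates):
    """Robust aggregation: median of each coordinate across participants."""
    return [statistics.median(col) for col in zip(*updates)]


honest = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1]]
byzantine = [[100.0, -100.0]]  # one adversarial participant
updates = honest + byzantine
print(mean_aggregate(updates))          # dragged far from the honest updates
print(coordinate_wise_median(updates))  # stays close to [1.0, 1.0]
```

A single malicious update can move the mean arbitrarily far, whereas the median stays within the range of the honest values as long as honest participants are the majority.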

ABSTRACT

In this talk, we will delve into the asymptotic study of simple linear generative models when both the sample size and data dimension grow to infinity. In this high-dimensional regime, random matrix theory (RMT) appears to be a natural tool to assess the model’s performance by examining its asymptotic learned conditional probabilities, its associated fluctuations, and the model’s generalization error. This analytical approach not only enhances our comprehension of generative language models but might also offer novel insights into their refinement through the lens of high-dimensional statistics and RMT.

ABSTRACT

Large Language Models (LLMs) have revolutionized various subfields of machine learning like natural language processing, speech recognition and computer vision, enabling machines to understand and generate outputs with unprecedented accuracy and fluency. However, one of the most critical challenges in deploying LLMs is their expensive memory requirements, for both training and inference. Quantization methods such as bitsandbytes, GPTQ and AWQ have made it possible to use large models such as the popular Llama-2 with significantly less memory, enabling the machine learning community to conduct remarkable research using a single consumer-grade GPU.

In this article, we propose a new quantization technique called Half-Quadratic Quantization (HQQ). Our approach, which requires no calibration data, significantly speeds up the quantization of large models while offering compression quality competitive with that of calibration-based methods. For instance, HQQ takes less than 5 minutes to process the colossal Llama-2-70B, over 50x faster than the widely adopted GPTQ. Our Llama-2-70B quantized to 2 bits outperforms the full-precision Llama-2-13B by a large margin at comparable memory usage.
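For context, the simplest calibration-free baseline is round-to-nearest affine quantization, with a per-group scale and zero-point read off the minimum and maximum weights. The pure-Python sketch below shows that baseline and the dequantization w_hat = (code - zero_point) * scale; it is not HQQ, which optimizes the quantization parameters rather than taking them from the min and max. Function names and the toy weights are illustrative.

```python
def quantize_affine(weights, n_bits):
    """Round-to-nearest asymmetric quantization of one group of weights.

    Maps floats to integer codes in [0, 2**n_bits - 1] using a scale and zero-point
    taken from the group's min and max. No calibration data is used.
    Returns (codes, scale, zero_point).
    """
    qmax = 2 ** n_bits - 1
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / qmax if w_max > w_min else 1.0
    zero_point = -w_min / scale
    # Round to the nearest code and clamp to the valid integer range
    codes = [min(qmax, max(0, round(w / scale + zero_point))) for w in weights]
    return codes, scale, zero_point


def dequantize_affine(codes, scale, zero_point):
    """Reconstruct approximate weights: w_hat = (code - zero_point) * scale."""
    return [(q - zero_point) * scale for q in codes]


weights = [-1.0, -0.5, 0.0, 0.5, 1.0]
codes, scale, zero_point = quantize_affine(weights, n_bits=2)
restored = dequantize_affine(codes, scale, zero_point)
print(codes, max(abs(w - r) for w, r in zip(weights, restored)))
```

With 2 bits only four levels exist per group, so the reconstruction error can reach half a quantization step; choosing better scales and zero-points than min/max is where methods such as HQQ and GPTQ differ.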

ABSTRACT

The webinar will focus on a theoretical approach to Graph Neural Networks’ adversarial robustness to derive provably robust defense methodologies.

Specifically, in this talk, we present an upper bound on the expected adversarial robustness of Graph Convolutional Networks (GCNs) subject to both structural and node-feature-based adversarial attacks. Building on these findings, we connect the expected robustness of GNNs to the orthogonality of their weight matrices and consequently propose an attack-independent, more robust variant of the GCN, called Graph Convolutional Orthogonal Robust Networks (GCORNs). We further introduce a probabilistic method to estimate the expected robustness, which allows us to evaluate the effectiveness of GCORN on different datasets. Experimental results show that GCORN outperforms available adversarial defense approaches on benchmark datasets.
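For readers new to GCNs, the sketch below shows the standard propagation rule H' = ReLU(D^{-1/2}(A + I)D^{-1/2} H W) in pure Python, plus a check that an orthogonal weight matrix preserves vector norms. That norm preservation is the intuition behind linking robustness to weight orthogonality: such a W cannot amplify a perturbation of the node features. It does not reproduce how GCORN enforces orthogonality; sizes and values are illustrative.

```python
import math


def normalized_adjacency(adj):
    """Symmetric GCN normalization D^{-1/2} (A + I) D^{-1/2} of a dense adjacency matrix."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # degrees including the added self-loop
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Dense matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(adj, features, weight):
    """One graph convolution: ReLU(A_norm @ H @ W)."""
    h = matmul(matmul(normalized_adjacency(adj), features), weight)
    return [[max(0.0, v) for v in row] for row in h]


# Two connected nodes with one-hot features and an identity weight matrix.
out = gcn_layer(adj=[[0.0, 1.0], [1.0, 0.0]],
                features=[[1.0, 0.0], [0.0, 1.0]],
                weight=[[1.0, 0.0], [0.0, 1.0]])
print(out)  # [[0.5, 0.5], [0.5, 0.5]]

# An orthogonal W (here a 90-degree rotation) leaves vector norms unchanged,
# so it cannot amplify a perturbation of the node features.
rotation = [[0.0, 1.0], [-1.0, 0.0]]
print(matmul([[3.0, 4.0]], rotation))  # [[-4.0, 3.0]], same norm 5 as [3, 4]
```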

The presented work will be mainly based on the recent paper “Bounding the Expected Robustness of Graph Neural Networks Subject to Node Feature Attacks” which was accepted at the Twelfth International Conference on Learning Representations (ICLR 2024).
