MoroccoAI Annual Conference

Accepted Papers

Deep conformal prediction for robust models

Soundouss Messoudi, Sylvain Rousseau, Sebastien Destercke

Deep networks, like some other learning models, can assign high confidence to unreliable predictions. Making these models robust and reliable is therefore essential, especially for critical decisions. This paper shows that density-based conformal prediction offers a convincing solution to this challenge. Conformal prediction consists of predicting a set of classes that covers the true class with a user-defined frequency. For atypical examples, it predicts the empty set. Experiments show that the conformal approach behaves well, especially when handling noisy and outlier examples.
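
The set-valued prediction described above can be illustrated with a minimal sketch. It assumes that each class has a held-out calibration list of conformity scores (for instance class-conditional density estimates) and that `alpha` is the tolerated error rate. The function names and toy numbers are hypothetical and are not taken from the paper.

```python
def conformal_p_value(score, cal_scores):
    """Share of calibration scores that are no larger than the new score
    (with the usual +1 correction for the new example itself)."""
    n_le = sum(1 for s in cal_scores if s <= score)
    return (n_le + 1) / (len(cal_scores) + 1)


def prediction_set(scores_by_class, cal_by_class, alpha):
    """Keep every class whose p-value exceeds alpha; the set may be empty."""
    return {
        label
        for label, score in scores_by_class.items()
        if conformal_p_value(score, cal_by_class[label]) > alpha
    }


# Toy calibration scores (hypothetical): nine per class.
cal = {
    "cat": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
    "dog": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
}

typical = prediction_set({"cat": 0.85, "dog": 0.05}, cal, alpha=0.2)    # {'cat'}
atypical = prediction_set({"cat": 0.05, "dog": 0.05}, cal, alpha=0.2)   # set()
ambiguous = prediction_set({"cat": 0.5, "dog": 0.5}, cal, alpha=0.2)    # both classes
```

An example that scores low under every class, such as an outlier, receives the empty set, while an ambiguous one receives several classes.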

Exploring new Deep Learning architectures for buildings footprints extraction

Bouaaddi Ayoub, Artibi Yasser, Hajji Hicham, Mharzi Alaoui Hicham

Deep learning architectures have proven to be effective in geospatial analysis, especially in image segmentation. However, conventional building extraction still relies only on traditional machine learning methods. In this paper, we explore newer models, including U-Net, Attention U-Net, and TransUNet, for building detection and assess their performance in extracting building footprints.

Automated aircraft maintenance inspection: Detecting aircraft dents using U-Net, Attention U-NET and TransU-NET

Amzoug Zohra, Essqalli Chaimae, Hajji Hicham, Bouarfa Soufiane

Computer vision and deep learning are frequently used in automated detection and inspection. In aeronautics and aircraft inspection, they save considerable time and improve the accuracy of dent detection, which reduces the risk of aircraft accidents. The novelty of this work is the use of three deep learning architectures ranked among the most powerful in segmentation: U-Net, Attention U-Net, which adds attention, and TransU-Net, which incorporates transformers. Despite our small dataset, the results are encouraging and demonstrate the power of these architectures in aircraft inspection and dent detection, especially TransU-Net, which achieved a precision of 83.18%, an IoU of 87.27%, and a recall of 50.27%. Given these results, we conclude that transformers proved their strength in our case study.
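
As a reminder of what the three reported metrics measure, here is a minimal sketch of pixel-wise precision, recall, and IoU for one binary mask. It assumes masks are nested lists of 0/1 values with 1 marking a dent. The function name and the toy masks are hypothetical, and the abstract does not state how the authors aggregated their scores.

```python
def segmentation_metrics(pred, truth):
    """Pixel-wise precision, recall and IoU for binary masks (nested lists of 0/1)."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1      # dent pixel correctly predicted
            elif p:
                fp += 1      # predicted dent, actually background
            elif t:
                fn += 1      # missed dent pixel
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return precision, recall, iou


pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]
precision, recall, iou = segmentation_metrics(pred, truth)  # 2/3, 2/3, 0.5
```

Under these single-class definitions IoU cannot exceed precision or recall, so the reported IoU was presumably aggregated differently, for example averaged over classes including the background.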

Moroccan Dialect -Darija- Open Dataset

Aissam Outchakoucht and Hamza Es-samaali

Darija Open Dataset (DODa) is an open-source project for the Moroccan dialect. With more than 13,000 entries, DODa is arguably the largest open-source collaborative project for Darija-English translation built for Natural Language Processing purposes. Besides semantic categorization, DODa also adopts a syntactic one, presents words under different spellings, offers verb-to-noun and masculine-to-feminine correspondences, and contains the conjugation of hundreds of verbs in different tenses, as well as more than 2,500 translated sentences. This data paper presents a description of DODa. This collaborative project is hosted on GitHub and aims to be a standard resource for researchers, students, and anyone who is interested in the Moroccan dialect.

LSTM based models stability in the context of Sentiment Analysis for social media

Bousselham El Haddaoui, Raddouane Chiheb, Rdouan Faizi, and Abdellatif El Afia

Deep learning techniques have proven their effectiveness for Sentiment Analysis (SA) related tasks. Recurrent neural networks (RNN), especially Long Short-Term Memory (LSTM) and Bidirectional LSTM, have become a reference for building accurate predictive models. However, the models' complexity and the number of hyperparameters to configure raise several questions about their stability. In this paper, we present various LSTM models and their key parameters, and we perform experiments to test the stability of these models in the context of Sentiment Analysis.

Classification of Malware Programs Using Opcodes and Non-Negative Matrix Factorization

Ouboti Djaneye-Boundjou

x86 opcodes extracted from disassembly files, which are provided for each malware program in the imbalanced, labeled subset of the BIG 2015 dataset, are used to classify these malware programs. More specifically, Non-Negative Matrix Factorization (NMF) is used to model documents of opcodes representing the malware programs as weighted mixtures of the generated NMF topics. A k-Nearest Neighbors model, a Random Forest model, an XGBoost model, and an ensemble of these models are each used to classify the malware programs based on NMF topic weight features. The proposed approach is promising: on an adequately sampled, held-out test dataset, it yields at least 98.49% classification accuracy and a 97.38% macro F1 score.
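
The idea of turning opcode documents into NMF topic weights can be sketched in a few lines. The sketch assumes a tiny hypothetical corpus and uses the classic multiplicative-update rule for NMF written in plain Python. The paper does not say which solver, topic count, or preprocessing the author used, so every name and value here is illustrative.

```python
import random
from collections import Counter


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, k, iters=200, seed=0, eps=1e-9):
    """Factor V (docs x opcodes) into W (docs x k) and H (k x opcodes), all >= 0."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return w, h


# Hypothetical opcode "documents", one per program.
docs = ["mov push call mov", "mov mov push call", "xor jmp xor jmp", "jmp xor xor jmp"]
vocab = sorted({op for d in docs for op in d.split()})
v = [[Counter(d.split())[op] for op in vocab] for d in docs]

w, h = nmf(v, k=2)
# Each row of W holds one program's topic weights; normalised, these rows
# are the kind of feature vector a downstream classifier could consume.
features = [[x / (sum(row) + 1e-12) for x in row] for row in w]
```

The rows of `features` would then be passed to a classifier such as k-Nearest Neighbors or a Random Forest, which is outside the scope of this sketch.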

Image hiding by dilated inception network

Ismail KICH, El Bachir AMEUR, Youssef TAOUIL

In this paper, a steganographic model that hides a color image in another color image of the same size is presented. A deep auto-encoder network architecture and a loss function focusing on the quality of the stego image are investigated. Experiments on different image databases demonstrate the ability of the proposed architecture to hide one color image within another regardless of their sources and sizes.

Violent Content Dataset for Moroccan Arabic Dialect

Randa Zarnoufi, Walid Bachri, Hamid Jaafar, Mounia Abik

The Moroccan Arabic (MA) dialect is a low-resource language. To perform any NLP task, we have to develop the necessary resources from scratch. This paper presents our work on the first MA dataset for detecting violent content in user-generated text. The dataset will serve to build predictive models of the violent content widely present in social media and thus help to ensure online safety.

Arabic sentiment analysis based on deep reinforcement learning

Mohamed Zouidine, Mohammed Khalil

This paper presents a new deep reinforcement learning based method for Arabic Sentiment Analysis using a policy gradient algorithm. To show the effectiveness of deep reinforcement learning techniques, an RNN-based model was trained with a combination of binary cross-entropy and policy gradient losses. Experiments on the Large-Scale Arabic Book Reviews (LABR) dataset show that our method helps to improve the performance of the trained model for Arabic Sentiment Analysis.

De-Identification of French Medical Reports of Fetal Echography

Salma El Anigri, Abdelhak Mahmoudi, Saad Slimani, Salaheddine Hounka, Taha Rehah, Youssef Bouyakhf, Mustapha Akiki, El Houssine Bouyakhf

Medical reports record both the patient's personal data and their medical consultations and clinical, biological, and radiological examinations. Thus, before these medical reports are processed or publicly shared for scientific research purposes, sensitive data must be removed for legal and ethical reasons. In this article, we present our ongoing rule-based de-identification process for French medical reports of fetal echography.
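
A rule-based de-identification pass of this kind can be sketched with a few regular expressions. The patterns, placeholder tokens, and sample sentence below are hypothetical illustrations for French text and are not the authors' actual rules.

```python
import re

# Each rule pairs a pattern with its replacement (all hypothetical).
RULES = [
    # Dates such as 12/03/1990
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"), "[DATE]"),
    # Ten-digit phone numbers such as 06 12 34 56 78
    (re.compile(r"\b0\d(?:[ .]?\d{2}){4}\b"), "[TEL]"),
    # Capitalised names following a title (Mme, Mlle, M., Dr)
    (re.compile(r"\b(Mme|Mlle|M\.|Dr)\s+[A-ZÀ-Ý][\w'-]+(?:\s+[A-ZÀ-Ý][\w'-]+)*"), r"\1 [NOM]"),
]


def deidentify(text):
    """Apply every rule in order and return the masked text."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text


report = "Mme Dupont Marie, née le 12/03/1990, tél 06 12 34 56 78. Échographie réalisée par Dr Martin."
masked = deidentify(report)
# masked == "Mme [NOM], née le [DATE], tél [TEL]. Échographie réalisée par Dr [NOM]."
```

Rules like these are easy to audit, but they only catch the formats they anticipate, so a real pipeline would need a much larger and carefully validated rule set.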

Developing predictive models from social media analysis as a source of Innovative Loyalty Marketing

Wissal MARHIT

In this paper, a computational predictive model for innovative marketing is presented. Text Mining, NLP, and Machine Learning are used to target consumers in a sustainable, personalized way, giving rise to the “LCF One to One Marketing Strategy” innovation in Loyalty Marketing Strategies.

Arabic light POS tagger

Khalid TNAJI, Karim BOUZOUBAA, Lhoussain Aouragh

Part Of Speech (POS) tagging is the task of computationally determining which POS of a word is activated by its use in a particular context. A POS tagger is a useful preprocessing tool in many natural language processing (NLP) applications. In this paper, we present a new Arabic POS tagger based on the combination of two main modules, one using a first-order Markov model and the other a decision tree model. The tagger uses an elementary tag set of 4 tags {noun, verb, particle, punctuation}, which is sufficient for some NLP applications and allows much greater accuracy and speed.
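
To illustrate the first-order Markov component only, here is a minimal Viterbi decoder over the four-tag set. The probabilities and transliterated toy words are hypothetical, unknown words are given a small constant emission probability as an assumption, and the decision tree module of the paper is not sketched.

```python
import math

TAGS = ["noun", "verb", "particle", "punctuation"]

# Hypothetical toy parameters; a real tagger would estimate them from a corpus.
START_P = {"noun": 0.4, "verb": 0.4, "particle": 0.15, "punctuation": 0.05}
TRANS_P = {
    "noun": {"noun": 0.3, "verb": 0.2, "particle": 0.2, "punctuation": 0.3},
    "verb": {"noun": 0.6, "verb": 0.05, "particle": 0.3, "punctuation": 0.05},
    "particle": {"noun": 0.7, "verb": 0.2, "particle": 0.05, "punctuation": 0.05},
    "punctuation": {"noun": 0.4, "verb": 0.4, "particle": 0.15, "punctuation": 0.05},
}
EMIT_P = {
    "noun": {"kitab": 0.5, "walad": 0.5},
    "verb": {"kataba": 1.0},
    "particle": {"fi": 1.0},
    "punctuation": {".": 1.0},
}


def viterbi(words, start_p=START_P, trans_p=TRANS_P, emit_p=EMIT_P, unk=1e-6):
    """Most likely tag sequence under a first-order (bigram) Markov model."""
    # best[i][tag] = (log-probability of the best path ending in tag, previous tag)
    best = [{
        t: (math.log(start_p[t]) + math.log(emit_p[t].get(words[0], unk)), None)
        for t in TAGS
    }]
    for word in words[1:]:
        column = {}
        for t in TAGS:
            prev, score = max(
                ((p, best[-1][p][0] + math.log(trans_p[p][t])) for p in TAGS),
                key=lambda item: item[1],
            )
            column[t] = (score + math.log(emit_p[t].get(word, unk)), prev)
        best.append(column)
    # Backtrack from the best final tag.
    tag = max(TAGS, key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]


known = viterbi(["kataba", "walad", "fi", "kitab", "."])
# ['verb', 'noun', 'particle', 'noun', 'punctuation']
unknown = viterbi(["kataba", "zayd", "."])
# ['verb', 'noun', 'punctuation']: context decides the tag of the unseen word
```

With only four tags, each decoding step compares four predecessors per tag, which is one reason a small tag set keeps tagging fast.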

Broken plural extraction using machine learning techniques

Mariame Ouamer, Karim Bouzoubaa, and Rachida Tajmout

This paper describes our own Arabic broken plural list and addresses the broken plural problem by developing a system that extracts each plural and its singular form using several machine learning classifiers. The results show that the Random Forest classifier outperforms the other statistical classifiers with an accuracy of approximately 98%.

Who is the system? On the opacity of algorithms ‘for’ cancer care in Morocco and the ethics of care

Amina Alaoui Soulimani

Digitising Morocco’s health infrastructure has entailed the adoption of foreign algorithms across hospitals and microbiology laboratories. While cancer diagnosis through artificially intelligent machines is at the heart of contemporary discourses on technology advancement ‘for’ health, the ethics of its deployment and cancer patients’ consent to the use of their data are not echoed enough, especially as public hospitals are posited within an imaginary that can be devoid of hope for non-middle class populations due to dilapidated infrastructures.