Data Scraping of Africa Calls for Tenders
Fintech (financial technology) refers to the use of technology to augment, streamline, digitize, and disrupt traditional financial services. Fintech use cases include risk assessment, fraud detection, credit scoring, loan default prediction, chatbot assistance, and contract management. Data drives Fintech, and the most significant share of it is found in unstructured formats such as documents, websites, and forums. Financial professionals spend a substantial amount of time reading analyst reports and the financial press. Natural Language Processing (NLP) coupled with Machine Learning (ML) can significantly reduce the amount of routine manual work.
Morocco has been engaged for several years in a dematerialization process that aims to enable free access to public data, ensure equal treatment of competitors, and guarantee competitors' rights and transparency. Consequently, Morocco has emerged as the third largest Fintech hub in the Arab World and one of the key markets in Africa [1, 2].
MoroccoAI is organizing the second edition of its Data Challenge on smart data compilation for African Calls for Tenders. Tenders are procedures used to solicit offers from companies competing for work and service contracts in the framework of Public Procurement Contracts (PPCs). PPCs play a significant role in contract management use cases, which affect many sectors of every country's economy (construction, public works, energy, telecommunications, etc.).
Main Objectives
The main objectives of this challenge are:
- Encourage data scientists to explore realistic scenarios from their everyday work. Indeed, data gathering, preparation, and preprocessing are usually underestimated tasks.
- Understand the regulations behind scraping and gathering data.
- Leverage the outstanding advances in NLP and ML.
- Raise awareness of open data and its usefulness.
- Explore the opportunities offered by contract management and by the Fintech industry more broadly.
Challenge
Candidates are asked to collect and structure textual data on public procurement contracts from the current year (Start_date is in 2022). The data can be gathered from official African public portals such as those of Morocco [3], Kenya [4], and Tunisia [5]. The scraped data should be structured according to the following schema: (Tender_id, Language, Country, City, Portal_url, Reference_id, Category, Subject, Detail, Keywords, Start_date, End_date, End_time, Award_date, Public_buyer, Supplier, Value, Currency). For more details, see the examples in the challenge's full description.
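As an illustration only, the schema above could be represented in Python roughly as follows. This is a minimal sketch, not the official template: the class name `Tender`, the helper names, the choice to store every field as text, and the assumption that Start_date is an ISO string such as "2022-03-15" are all hypothetical. The sample record is made up.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class Tender:
    """One row of the challenge schema; every value is kept as text."""
    Tender_id: str = ""
    Language: str = ""
    Country: str = ""
    City: str = ""
    Portal_url: str = ""
    Reference_id: str = ""
    Category: str = ""
    Subject: str = ""
    Detail: str = ""
    Keywords: str = ""
    Start_date: str = ""
    End_date: str = ""
    End_time: str = ""
    Award_date: str = ""
    Public_buyer: str = ""
    Supplier: str = ""
    Value: str = ""
    Currency: str = ""


# Column order follows the schema listed in the challenge text.
COLUMNS = [f.name for f in fields(Tender)]


def starts_in_2022(tender):
    # Assumption: Start_date is an ISO date string (YYYY-MM-DD).
    return tender.Start_date.startswith("2022-")


def write_csv(tenders, handle):
    """Write only the tenders whose Start_date falls in 2022."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS)
    writer.writeheader()
    for tender in tenders:
        if starts_in_2022(tender):
            writer.writerow(asdict(tender))


# Made-up record, written to an in-memory buffer instead of a file.
buffer = io.StringIO()
write_csv([Tender(Tender_id="1", Country="Morocco", Start_date="2022-03-15")], buffer)
print(buffer.getvalue())
```

Keeping all values as text avoids guessing at date and currency formats, which differ between portals; any normalization would be a separate step.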
Rules of Participation
- Multiple individuals or entities may collaborate as a team. You may not participate in more than one team. Each team member must be a single individual;
- Team membership may not exceed 3 members;
- Freely & publicly available external data is allowed;
- Participants must be aware of the terms of service, rules, and licenses (for example CC 4.0 [6]) of the portals and must respect scraping best practices;
- Use of open-source code is allowed, but it should be acknowledged and its license respected;
- All members of a participating team agree to share their winning solution's code with the challenge organizers so that results can be reproduced and evaluated. The organizers do not claim ownership of your code and data;
- MoroccoAI is not responsible for any abuse of the rules, use of illegal scraping methods or use of unauthorized sources;
- Registered participants will be invited to an information session that explains the challenge in more detail;
- Don't cheat and have fun
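One commonly cited scraping best practice is to honor a portal's robots.txt file and to pause between requests. The sketch below shows one way to check this with Python's standard `urllib.robotparser`. It is an assumption about what "best practices" covers, not a rule stated by the organizers. The user agent name, the sample robots.txt content, and the `portal.example` URLs are hypothetical. Each portal's terms of service still apply regardless of what robots.txt allows.

```python
from urllib.robotparser import RobotFileParser

USER_AGENT = "tender-challenge-bot"  # hypothetical user agent name

# Hypothetical robots.txt content; a real one would be downloaded from the portal.
SAMPLE_ROBOTS_TXT = """User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def build_parser(robots_txt):
    """Parse robots.txt text that has already been fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser, url, user_agent=USER_AGENT):
    """Return True if robots.txt permits this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


def pause_seconds(parser, user_agent=USER_AGENT, default=1.0):
    """Use the declared Crawl-delay if there is one, else a default pause."""
    delay = parser.crawl_delay(user_agent)
    return delay if delay is not None else default


parser = build_parser(SAMPLE_ROBOTS_TXT)
print(may_fetch(parser, "https://portal.example/tenders/2022"))
print(may_fetch(parser, "https://portal.example/private/report"))
print(pause_seconds(parser))
```

A scraper would call `may_fetch` before each request and sleep for `pause_seconds` between requests.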
Evaluation
The jury of the Data Challenge will evaluate participants according to the following criteria:
- Quality and amount of scraped data
- The adopted scraping approach (innovative, reuse of existing code, etc.)
- Presentation of the work
- Bonus points for providing an additional notebook with exploratory data analysis
Submission
This is a data collection competition. Participants are required to submit a Google Drive link that contains:
- A submission CSV file named submission.csv (maximum size 10 MB). The CSV file must follow the provided template
- A zip file named code.zip that contains the scraping code. It is recommended to include a Jupyter notebook main.ipynb (Python 3) that can be used to test the scraping.
- A presentation.pdf file that contains a presentation of the solution of the challenge.
- Any other document that is considered relevant to the evaluation process
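Before uploading, a quick local check of submission.csv can catch obvious problems. The sketch below is not an official validator. It assumes that the template's header matches the schema listed in the Challenge section, that the file is UTF-8 encoded, and that the size limit means 10 × 1024 × 1024 bytes. The function name `check_submission` is hypothetical.

```python
import csv
import os

# Assumption: the template header is the schema from the Challenge section.
EXPECTED_COLUMNS = [
    "Tender_id", "Language", "Country", "City", "Portal_url", "Reference_id",
    "Category", "Subject", "Detail", "Keywords", "Start_date", "End_date",
    "End_time", "Award_date", "Public_buyer", "Supplier", "Value", "Currency",
]
MAX_BYTES = 10 * 1024 * 1024  # assumed reading of the 10 MB limit


def check_submission(path):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if os.path.getsize(path) > MAX_BYTES:
        problems.append("file is larger than the assumed 10 MB limit")
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, None)
        if header != EXPECTED_COLUMNS:
            problems.append("header does not match the expected columns")
        # Records are counted from 2 because the header is record 1.
        for record_no, row in enumerate(reader, start=2):
            if len(row) != len(EXPECTED_COLUMNS):
                problems.append(f"record {record_no} has {len(row)} fields")
    return problems
```

Calling `check_submission("submission.csv")` and printing the result gives a short list of issues to fix before sharing the link.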
Prizes
Special prizes will be announced very soon. In addition, NVIDIA is offering credits for training courses and certifications from the NVIDIA Deep Learning Institute (DLI).
- The prizes will be awarded on the basis of the Jury evaluation.
- Due to logistical considerations, participants must be residents of Morocco to be eligible for the prizes.
Acknowledgements
We thank the conference sponsors for supporting this challenge and NVIDIA for offering DLI credits.