Automatic search, analysis and classification of highly harmful web content (APAKT)

Artificial intelligence aids in moderating online content by quickly and effectively classifying harmful materials and supporting moderators in their daily work.


Challenge

The everyday job of the moderators at Dyżurnet.pl, a unit operating within the structure of NASK PIB, is to verify illegal content reported by users or algorithms, including materials containing child sexual abuse (CSAM). This task is both crucial and challenging. On the one hand, the goal is to maximize effectiveness and provide broad protection to potential recipients from such content. On the other hand, it is essential to protect the moderators, who are exposed to these materials for many hours each day.

In the APAKT project, together with the Warsaw University of Technology, we are developing a system to assist moderators through automatic detection and preliminary classification of suspicious materials. The system will also propose the order of reports to prioritize those requiring immediate intervention (as they may be potentially more harmful).
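The report-ordering idea described above can be sketched as a priority queue keyed on a model-assigned harm score. The code below is a minimal illustration only, not the APAKT implementation; the `Report` class, `ReportQueue`, and the harm scores are hypothetical names invented for this example.

```python
import heapq
from dataclasses import dataclass, field
from typing import List

@dataclass(order=True)
class Report:
    # Reports with a higher harm score should be reviewed first, so the
    # score is negated to use Python's min-heap as a max-priority queue.
    priority: float
    report_id: str = field(compare=False)

class ReportQueue:
    """Toy priority queue ordering reports by a model-assigned harm score."""

    def __init__(self) -> None:
        self._heap: List[Report] = []

    def add(self, report_id: str, harm_score: float) -> None:
        heapq.heappush(self._heap, Report(-harm_score, report_id))

    def next_report(self) -> str:
        """Return the ID of the most urgent pending report."""
        return heapq.heappop(self._heap).report_id

queue = ReportQueue()
queue.add("r-101", harm_score=0.35)
queue.add("r-102", harm_score=0.92)
queue.add("r-103", harm_score=0.61)

order = [queue.next_report() for _ in range(3)]
print(order)  # → ['r-102', 'r-103', 'r-101'], most harmful first
```

In a real deployment the score would come from the classification models, and ties or near-ties could additionally be broken by report age so that older reports do not starve.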

Potential clients for the APAKT system include internet service providers, large portals, the police, forensic experts, and foreign institutions involved in removing pedophilic content from the internet. APAKT can detect such content in videos, images, and text.
Currently it supports only Polish, but the underlying models, such as RoBERTa and StyloMetrix vectors, are also available for English and Ukrainian.
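To give a flavor of what a stylometric vector is, the sketch below computes a few toy surface features from a text. This is a drastic, hypothetical simplification for illustration only; the real StyloMetrix library produces much richer, linguistically grounded feature vectors, and the function name and features here are invented for this example.

```python
import re
from typing import Dict

def simple_stylometric_vector(text: str) -> Dict[str, float]:
    """Compute a few toy stylometric features of a text.

    Hypothetical simplification: real stylometric tools such as
    StyloMetrix use morphosyntactic analysis, not just surface counts.
    """
    words = re.findall(r"\w+", text)          # Unicode-aware word tokens
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = len(words) or 1                 # guard against empty input
    return {
        "avg_word_length": sum(len(w) for w in words) / n_words,
        "avg_sentence_length": n_words / (len(sentences) or 1),
        "type_token_ratio": len({w.lower() for w in words}) / n_words,
    }

vec = simple_stylometric_vector("Krótki tekst. Bardzo krótki tekst!")
print(vec)  # {'avg_word_length': 5.6, 'avg_sentence_length': 2.5, 'type_token_ratio': 0.6}
```

Vectors like these can then be fed to an ordinary classifier; because the features describe style rather than vocabulary, the same approach transfers to other languages for which the feature extractor exists.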

The project is funded by a grant awarded by the National Centre for Research and Development.

Project leader
Prof. Andrzej Pacut

What we did

To date, we have developed a detailed project concept, including business requirements, diagrams, and schematics reflecting the project objectives:

  • We built a research environment with a data repository.
  • We developed the legal framework required by the sensitivity of the content to be analyzed.
  • We developed class definitions and annotations for CSAM-related materials.
  • We conducted a psychological workshop for all project team members.
  • We collected and classified neutral materials.
  • We collected materials depicting sexual abuse of minors (CSAM) and compiled data obtained from the National Prosecutor’s Office.
  • We completed scientific tasks in the fields of biometrics and machine intelligence, as well as machine learning for text analysis.