Objectives

General objectives:

Development of a software toolkit for processing and linguistic analysis of the Romanian language.
Creation of annotated datasets for processing and linguistic analysis of the Romanian language.

Specific objectives:

Development of a software service for detecting incomplete sentences in Romanian texts.
Development of software services for detecting satire, sarcasm, and irony in Romanian texts.
Development of a software service for identifying the author of a Romanian text, applicable to journalistic and blog texts of at least 300 words.
Development of a software service for geolocating toponyms extracted from Romanian texts through Named Entity Recognition (NER).
Development of a software service for classifying numerals written exclusively in words (literal transcriptions) in Romanian texts into temporal formats, numbers, and codes.
Development of a multi-domain paraphrasing software service for the Romanian language.
Development of a software service for readability analysis of Romanian texts.
Development of a software service for lip reading in Romanian, usable either live (with acceptable latency) or on recorded audio-video files.
Integration of all services into a demonstrator software platform.
Creation, use, and release of an annotated audio-video dataset for lip reading.
Creation of annotated text datasets for satire, sarcasm, and irony in Romanian.
Creation of annotated speech audio datasets covering regionalisms and the corresponding accents of Muntenia, Moldova, Transylvania, Oltenia, Criș, Maramureș, Bucovina, and Banat, as well as business and legal jargon, rare neologisms, and loanwords.
Creation of an annotated text dataset for correcting the geolocation of toponyms extracted through NER from Romanian texts.
Creation of an annotated Romanian text dataset for author recognition, covering two styles: journalistic and blog.