SemEval2025 shared task on "Multilingual Characterization and Extraction of Narratives from Online News"

The internet has opened vast possibilities to easily create direct communication channels between information producers and consumers, potentially leaving the latter exposed to deceptive content and attempts at manipulation. Huge audiences can be affected online, and major crisis events are continuously subjected to the spread of harmful disinformation and propaganda.

In order to foster research and development of novel analytical functionalities to support end-users in analysing the news ecosystem and characterizing manipulation attempts, we launch in the frame of SemEval 2025 campaign a Task on Multilingual Characterization and Extraction of Narratives from Online News.

In particular, the task focuses on the automatic identification of narratives, their classification and identifying the roles of the relevant entities involved. These specific analytical dimensions are of paramount importance for facilitating the work of analysts studying target-specific disinformation phenomena.

We offer three subtasks on news articles: Entity Framing, Narrative Classification and Narrative Extraction, in five languages: Bulgarian, English, Hindi, (European) Portuguese, and Russian. The participants may take part in any number of subtask-language pairs (even just one), and may train their systems using the data for all languages (in a multilingual setup).

The task builts on top of the prior tasks on media analysis that focused on persuasion techniques, framing dimensions and news genre organized as part of SemEval 2023, SemEval 2020, and CheckThat! 2024 Lab campaigns, and the task on detecting entity roles in memes organized as part of the CheckThat! 2024 Lab .

Technical Description

The task covers news articles from two domains, namely, Ukraine-Russia War and Climate Change, and is subdivided into three subtasks

Subtask 1: Entity Framing

Definition: given a news article and a list of mentions of named entities (NEs) in the article, assign for each such mention one or more roles using a predefined taxonomy of fine-grained roles covering three main type of roles: protagonists, antagonists, and innocent. This is a multi-label multi-class text-span classification task.

Subtask 2: Narrative Classification

Definition: given a news article and a two-level taxonomy of narrative labels (where each narrative is subdivided into subnarratives) from a particular domain, assign to the article all the appropriate subnarrative labels. This is a multi-label multi-class document classification task.

Subtask 3: Narrative Extraction

Definition: given a news article and a dominant narrative of the text of this article, generate a free-text explanation (up to max. of 80 words) supporting the choice of this dominant narrative. The to-be-generated explanation should be grounded in the text fragments that provide evidence of the claims of the dominant narrative. This is a text-to-text generation task.

Example

Below is an example of a news article focusing on climate change topics. The key entities and main claim-related text fragments are marked and underlined respectivelly.

Met Office Should Put 2.5°C ‘Uncertainties’ Warning on All Future Temperature Claims

It is “abundantly clear” that the Met Office cannot scientifically claim to know the cur- rent average temperature of the U.K. to a hundredth of a degree centigrade, given that it is using data that has a margin of error of up to 2.5°C, notes the climate journalist Paul Homewood. His comments follow recent disclosures in the Daily Sceptic that nearly eight out of ten of the Met’s 380 measuring stations come with official ‘uncertainties’ of between 2-5°C. In addition, given the poor siting of the stations now and possibly in the past, the Met Office has no means of knowing whether it is comparing like with like when it publishes temperature trends going back to 1884.

There are five classes of measuring stations identified by the World Meteorological Office (WMO). Classes 4 and 5 come with uncertainties of 2°C and 5°C respectively and account for an astonishing 77% of the Met Office station total. Class 3 has an uncertainty rating of 1°C and accounts for another 8.4% of the total. The Class ratings identify potential corruptions in recordings caused by both human and natural involvement. Homewood calculates that the average uncertainty across the entire database is 2.5°C. In the graph below, he then calculates the range of annual U.K. temperatures going back to 2010 incorporating the margins of error.

The blue blocks show the annual temperature announced by the Met Office, while the red bars take account of the WMO uncertainties. It is highly unlikely that the red bars show the more accurate temperature, and there is much evidence to suggest temperatures are nearer the blue trend. But the point of the exercise is to note that the Met Office, in the interests of scientific exactitude, should disclose what could be large measurement inac- curacies. This is particularly important when it is making highly politicised statements using rising temperatures to promote the Net Zero fantasy. As Homewood observes, the Met Office “cannot say with any degree of scientific certainty that the last two years were the warmest on record, nor quantify how much, if any, the climate has warmed since 1884”.

The U.K. figures are of course an important component of the Met Office’s global temper- ature dataset known as HadCRUT. As we noted recently, there is ongoing concern about the accuracy of HadCRUT with large retrospective adjustments of warming in recent times and cooling further back in the record. In fact, this concern has been ongoing for some time. The late Christopher Booker was a great champion of climate scepticism and in February 2015 he suggested that the “fiddling” with temperature data “is the biggest science scandal ever”. Writing in the Telegraph, he noted: “When future generations look back on the global warming scare of the past 30 years, nothing will shock them more than the extent to which official temperatures records – on which the entire panic rested – were systematically ‘adjusted’ to show the Earth as having warmed more than the actual data justified.”

The corresponding system response (in a simplified form here) for all three subtasks, i.e., entity roles, narrative classification and extraction, are provided below.

Entity Roles
Met Office: Antagonist-[Deceiver]
Paul Homewood: Protagonist-[Guardian]
Daily Sceptic: Protagonist-[Guardian]
Christopher Booker: Protagonist-[Guardian,Virtuous]

Narrative Classification: Questioning the measurements and science (narrative) - Methodologies/metrics used are unreliable/fault (subnarrative)

Narrative Extraction: Paul Homewood claims that the Met Office is misleading the public about current UK temperatures by not disclosing a margin of error of up to 2.5º C. The Daily Sceptic reports that most of the Met Office’s 380 stations provide inaccurate measurements. Additionally, Christopher Booker argues that official reports have been repeatedly falsified to indicate climate warming. Thus, the Met Office cannot conclude with scientific certainty that the climate is becoming warm

We provide detailed guidelines for the annotations.

Data Description

We will provide a training set to build your systems locally. We will further provide a development set (without annotations) and an online submission website to score your systems. A public leaderboard will show the progress on the task of the researchers involved in the task.

The data is unique in its kind, it is multilingual, focuses on two highly-debated global topics, and covers various complementary dimensions relevant for the analysis of manipulative attempts in online news media. We use fine-grained entity role and narrative taxonomies.

Input Articles

The input for all tasks will be news and web articles in plain text format in UTF-8. After registrations, participants will be able to download from their team page the corpus. Specifically, articles are provided in the folders train-articles-subtask-x. Further, we will provide a set of dev-articles-subtask-x for which annotations are not provided.

Each article appears in one .txt file. The title (if it exists) is on the first row, followed by an empty row. The content of the article starts from the third row.

Articles in five languages (Bulgarian, English, Hindi, (European) Portuguese, and Russian) are collected from 2022 to mid 2024, they revolve around two topics, namely, Ukraine-Russia war and Climate Change. Our media/article selection covers mainly alternative news and web portals, large fraction of which were identified by fact-checkers and media credibility experts as potentially spreading mis-/disinformation. For the collection of the articles we exploit various news aggregation engines, like for instance Europe Media Monitor (EMM), a large-scale multi-lingual near real-time news aggregation and analysis engine, whereas fraction of the articles were collected and filtered manually. Articles whenever possible were retrieved with the Trafilatura library or other similar web-scraping tools, and otherwise were retrieved manually.

Gold Labels and Submission Format

Subtask 1 - Entity Framing

The format of a tab-separated line of the gold label and the submission files for subtask 1 is:

 article_id     entity_mention     start_offset     end_offset     main_role(*)     fine-grained_roles(*)

where article_id is the file name of the input article file, entity_mention is the string representing the entity mention, start_offset (end_offset) provides the start/end position of the mention, main_role is a string representing the main entity role, and fine-grained_roles is a tab-separated list of strings representing the fine-grained role(s). This is an example of a section of the gold file for three articles:

						
                               EN_10001.txt   Martin Luther King Jr.  10   32    Protagonist    Martyr
                               EN_10002.txt   Mahatma Gandhi          12   27    Protagonist    Martyr      Rebel
                               EN_10003.txt   ISIS                    4    8     Antagonist     Terrorist   Deceiver

Partial view of a gold label file for Subtask 1

IMPORTANT: For creating the submission file a list of all entity mentions and the corresponding offets for all the articles will be provided. There is no need to do any Named-Entity recognition on participant side.

(*) The leaderboard will evaluate predictions for both main_role and fine_grained_roles, however, the official evaluation metric is for the fine_grained_roles.

Note that main_role should take only one of three values from the 1st level of the taxonomy, while fine_grained_roles should take one or more values from the 2nd level of the taxonomy.

If you chose not to train a model to predict main_role, you still need to provide a proper value under main_role to pass the format checker code in the scorer.

Subtask 2 - Narrative Characterization

The format of a tab-separated line of the gold label and the submission files for subtask 2 is:

 article_id     narrative_1;...;narrative_N     subnarrative_1;...;subnarrative_N

where article_id is the file name of the input article file, narrative_x is a label (of the provided hierarchy) representing a narrative present in the article, and subnarrative_x is one of the labels (of the hierarchy) representing a subnarrative (corresponding to subnarrative_x) present in the article. In case neither specific narrative nor subnarrative label can be assigned the “Other” pseudo-label is used. In case the article is labeled with a specific narrative label but no specific subnarrative can be assigned, the convention “[Narrative] : Other” will be used to represent a label that although belongs to a narrative does not fit any of the sub-narratives. This is an example of a section of the gold file for three articles:

                              EN_10001.txt     URW: Blaming Others                           URW: Ukraine is the aggressor
                              EN_10002.txt     URW: Blaming Others;URW: Praise of Russia     URW: Blaming Others: Other;URW: Praising Russia’s military might
                              EN_10003.txt     Other                               Other

Partial view of a gold label file for Subtask 2.

IMPORTANT: For creating the submission file two lists one with the narratives and one with the sub-narratives need to be provided. No consistency requirement is enforced (e.g. sub-narrative X:Y need not have narrative X) as they are evaluated independently.

(*) The leaderboard will evaluate predictions for both narrative ('coarse-level') and sub-narrative('fine-level'), however, the official evaluation metric is for the sub-narrative.

Note that narrative should take one or more of the values from the 1st level of the taxonomies (either CC or URW), and sub-narrative should take one or more values from the 2nd level of the taxonomies.
Participants are free to deploy any strategy to deduce or infer seperately the narrative. If one choses not to train a model to predict narrative, one still needs to provide a proper value under narrative to pass the format checker code in the scorer (e.g. by just putting everything as 'Other').

Subtask 3 - Narrative Extraction

The format of a tab-separated line of the ground truth files for subtask 3 is:

 article_id   dominant_narrative   dominant_subnarrative   explanation

where article_id is the file name of the input article, dominant_narrative is the string representing the dominant narrative of the article, dominant_subnarrative is the string representing the corresponding dominant subnarrative of the article, and explanation is the string representing free text explanation of the dominant narrative. The format of the submission files is similar (see below), but does not include the dominant-narrative elements.

 article_id   explanation

This is an example of a section of the gold file for two articles:

                                EN_10001.txt     URW: Blaming Others     URW: Ukraine is the aggressor            Ukraine war started...
                                EN_10002.txt     URW: Praise of Russia   URW: Praising Russia’s military might    Russia is...

Partial view of a gold label file for Subtask 3

For the training set we provide one gold file per article as well as a single gold file with all gold labels for all files. The participants are expected to upload one file for all the articles per task.

Evaluation

Upon registration, participants will have access to their team page, where they can also download scripts for scoring the different tasks. Here is a brief description of the evaluation measures the scorers compute.

Subtask 1

Subtask 1 is a multiclass multi-label classification problem. The official evaluation measure will be Exact Match Ratio which measures the proportion of samples where all labels are correctly predicted.

Subtask 2

Subtask 2 is a multi-label multi-class classification problem. The official evaluation measure will be averaged (over test documents) samples F₁ computed for entire narrative_x:subnarrative_x labels. That is, we will first compute an F1 score per test document by comparing the predicted to the gold narrative_x:subnarrative_x labels of the document, and we will then average over the test documents. Both the narrative_x and the subnarrative_x part of each predicted narrative_x:subnarrative_x label will have to be correct for the predicted label to be considered correct. We will also report averaged samples F1 computed for narratives only, by ignoring the subnarrative_x parts of the narrative_x:subnarrative_x predicted and gold labels.

Subtask 3

Subtask 3 is a text generation task. The official evaluation measure is the average of similarities between gold and the corresponding predicted explanations using the F1 metric computed by BertScore (see also BertScore at huggingFace).

For all subtasks additional secondary evaluation scores will be computed as well and displayed on the leaderboard.

Related datasets and material

Apart from the specific training datasets for our task there are other related datasets that could be exploited in one way or another for assembling and elaborating models for the various tasks.

Subtask 1

[Sharma et al., 2023] provides a dataset for identifying heros, villains, and victims in memes, with similar coarse-level classes. The dataset for target sentiment analysis presented in [Orbach et al., 2021] could be used to build a model to identify target entities of certain sentiments.

Subtask 2

A dataset consisting of news articles and other types of texts for classification of Climate Change denial claims, which uses to some extent a similar taxonomy of narratives is presented in [Coan et al., 2021], while a cleaned version thereof is reported in [Piskorski et al., 2022]. A thorough analysis of the narratives in the context of Ukraine-Russia war is presented in [Amanatullah, et al., 2023]. A dataset for fine-grained narrative classification, albeit in a different domain than the ones in our task, namely, COVID-19 is presented in [Kotseva et al., 2023].

Subtask 3

Detecting persuasion techniques in texts might help to spot relevant text fragments with manipulative content. A wide range of relevant datasets for detection of such techniques in online news exists: [Da San Martino et al., 2020] [Piskorski et al., 2023] [Piskorski et al., 2024].

How to Participate

Ask to participate on the registration page, once your account is checked you'll be able to access the data and have the possibility to submit your predictions.
After we manually verify your account, you will get an email with your team passcode. In case you do not receive the email, after checking your SPAM folder, then send us an email. We recommed you write down the passcode (and bookmark your team page).
We will use your email only to send you updates on the corpus or to let you know if we organise any event on the topic, we promise.
Use the passcode on the top-right box to enter your team page. There you can download the data and submit your runs.
Phase 1. Submit your predictions on the development set to check your performance evolution. You will get an immediate feedback for each submission and you can check other participants' performances.
Avoid submitting an abnormal number of submissions with the purpose of guessing the gold labels.
Manual predictions are forbidden; the whole process should be automatic.
Phase 2. Once the test set is available, you will be able to submit your predictions on it, but you won't get any feedback until the end of the evaluation phase.
You can make as many submissions as you like, but we will evaluate only the latest one.
The dataset may include content which is protected by copyright of third parties. It may only be used in the context of this shared task, and only for scientific research purposes. The dataset may not be redistributed or shared in part or full with any third party. You may not share you passcode with others or give access to the dataset to unauthorised users. Any other use is explicitly prohibited. In particular, publicly exposing online models trained using this dataset and releasing related public APIs to appply these models on arbitrary texts is prohibited.
In order to disseminate the results, we give the chance to the users to share a link to a paper or a website describing their systems.

Dates

15 July 2024	Task description available
9 September 2024	Registration opens
12 September 2024	First chunk of training data released
16 October 2024	Second chunk of training data released
11 November 2024	Development dataset released and Leaderboard opened
7 December 2024	Complete training data released
10 January 2025	Release of the gold labels of the development set
19 January 2025	Release of the test set for Subtask 1 and 2
~~26 January 2025~~ 30 January 2025 at 23:59 (Anywhere on Earth)	Test submission site closes for Subtask 1 and 2
~~27 January 2025~~ 31 January 2025	Release of the test set for Subtask 3
~~31 January 2025~~ 4 February 2025 at 23:59 (Anywhere on Earth)	Test submission site closes for Subtask 3
~~28 February 2025~~	System Paper Submission Deadline
~~31 March 2025~~	Notification to authors
~~21 April 2025~~	Camera ready papers due
July 31 - Aug. 1 2025	SemEval workshop (co-located with ACL 2025)

Contact

We have created a google group for the task. Join it to ask any question and to interact with other participants.

If you need to contact the organisers only, send us an email.

Credits

The following people are making this task possible:

Jakub Piskorski, Institute of Computer Science, Polish Academy of Sciences, Poland
Giovanni Da San Martino, University of Padova, Italy
Elisa Sartori, University of Padova, Italy
Tarek Mahmoud, Mohamed bin Zayed University of Artificial Intelligence, UAE
Preslav Nakov, Mohamed bin Zayed University of Artificial Intelligence, UAE
Zhuohan Xie, Mohamed bin Zayed University of Artificial Intelligence, UAE
Tanmoy Chakraborty, Indian Institute of Technology, Delhi
Shivam Sharma, Indian Institute of Technology, Delhi
Roman Yangarber, University of Helsinki, Finland
Ion Androutsopoulos, Athens University of Economics and Business, and Archimedes/Athena RC, Greece
John Pavlopoulos, Athens University of Economics and Business, and Archimedes/Athena RC, Greece
Nikolaos Nikolaidis, Athens University of Economics and Business, Greece
Ricardo Campos, University of Beira Interior, Portugal
Alípio Jorge, University of Porto and INESC TEC, Portugal
Purificação Silvano, University of Porto, Portugal
Nuno Ricardo Guimarães, University of Porto and INESC TEC, Portugal
Dimitar Dimitrov, Sofia University, Bulgaria
Ivan Koychev, Sofia University, Bulgaria
Nicolas Stefanovitch, European Commission Joint Research Centre, Italy

Template by pFind Goodies

SemEval 2025 Task 10 "Multilingual Characterization and Extraction of Narratives from Online News"

Technical Description

Subtask 1: Entity Framing

Subtask 2: Narrative Classification

Subtask 3: Narrative Extraction

Example

Data Description

Input Articles

Gold Labels and Submission Format

Subtask 1 - Entity Framing

Subtask 2 - Narrative Characterization

Subtask 3 - Narrative Extraction

Evaluation

Subtask 1

Subtask 2

Subtask 3

Related datasets and material

Subtask 1

Subtask 2

Subtask 3

How to Participate

Dates

Contact

Credits

SemEval 2025 Task 10
"Multilingual Characterization and Extraction of Narratives from Online News"