The internet has opened vast possibilities to easily create direct communication channels between information producers and consumers, potentially leaving the latter exposed to deceptive content and attempts at manipulation. Huge audiences can be affected online, and major crisis events are continuously subjected to the spread of harmful disinformation and propaganda.
In order to foster research and development of novel analytical functionalities that support end-users in analysing the news ecosystem and characterizing manipulation attempts, we launch, as part of the SemEval 2025 campaign, a Task on Multilingual Characterization and Extraction of Narratives from Online News.
In particular, the task focuses on the automatic identification and classification of narratives, and on identifying the roles of the relevant entities involved. These analytical dimensions are of paramount importance for facilitating the work of analysts studying target-specific disinformation phenomena.
We offer three subtasks on news articles: Entity Framing, Narrative Classification and Narrative Extraction, in five languages: Bulgarian, English, Hindi, (European) Portuguese, and Russian. The participants may take part in any number of subtask-language pairs (even just one), and may train their systems using the data for all languages (in a multilingual setup).
The task builds on prior tasks on media analysis that focused on persuasion techniques, framing dimensions and news genre, organized as part of the SemEval 2023, SemEval 2020, and CheckThat! 2024 Lab campaigns, and on the task on detecting entity roles in memes organized as part of the CheckThat! 2024 Lab.
The task covers news articles from two domains, namely, the Ukraine-Russia War and Climate Change, and is subdivided into three subtasks:
Subtask 1 (Entity Framing). Definition: given a news article and a list of mentions of named entities (NEs) in the article, assign to each such mention one or more roles using a predefined taxonomy of fine-grained roles covering three main types of roles: protagonists, antagonists, and innocent. This is a multi-label multi-class text-span classification task.
Subtask 2 (Narrative Classification). Definition: given a news article and a two-level taxonomy of narrative labels (where each narrative is subdivided into subnarratives) from a particular domain, assign to the article all the appropriate subnarrative labels. This is a multi-label multi-class document classification task.
Subtask 3 (Narrative Extraction). Definition: given a news article and the dominant narrative of the text of this article, generate a free-text explanation (up to a maximum of 80 words) supporting the choice of this dominant narrative. The generated explanation should be grounded in the text fragments that provide evidence for the claims of the dominant narrative. This is a text-to-text generation task.
Below is an example of a news article focusing on climate change topics. The key entities and the main claim-related text fragments are marked and underlined, respectively.
Met Office Should Put 2.5°C ‘Uncertainties’ Warning on All Future Temperature Claims
It is “abundantly clear” that the Met Office cannot scientifically claim to know the current average temperature of the U.K. to a hundredth of a degree centigrade, given that it is using data that has a margin of error of up to 2.5°C, notes the climate journalist Paul Homewood. His comments follow recent disclosures in the Daily Sceptic that nearly eight out of ten of the Met’s 380 measuring stations come with official ‘uncertainties’ of between 2-5°C. In addition, given the poor siting of the stations now and possibly in the past, the Met Office has no means of knowing whether it is comparing like with like when it publishes temperature trends going back to 1884.
There are five classes of measuring stations identified by the World Meteorological Office (WMO). Classes 4 and 5 come with uncertainties of 2°C and 5°C respectively and account for an astonishing 77% of the Met Office station total. Class 3 has an uncertainty rating of 1°C and accounts for another 8.4% of the total. The Class ratings identify potential corruptions in recordings caused by both human and natural involvement. Homewood calculates that the average uncertainty across the entire database is 2.5°C. In the graph below, he then calculates the range of annual U.K. temperatures going back to 2010 incorporating the margins of error.
The blue blocks show the annual temperature announced by the Met Office, while the red bars take account of the WMO uncertainties. It is highly unlikely that the red bars show the more accurate temperature, and there is much evidence to suggest temperatures are nearer the blue trend. But the point of the exercise is to note that the Met Office, in the interests of scientific exactitude, should disclose what could be large measurement inaccuracies. This is particularly important when it is making highly politicised statements using rising temperatures to promote the Net Zero fantasy. As Homewood observes, the Met Office “cannot say with any degree of scientific certainty that the last two years were the warmest on record, nor quantify how much, if any, the climate has warmed since 1884”.
The U.K. figures are of course an important component of the Met Office’s global temperature dataset known as HadCRUT. As we noted recently, there is ongoing concern about the accuracy of HadCRUT with large retrospective adjustments of warming in recent times and cooling further back in the record. In fact, this concern has been ongoing for some time. The late Christopher Booker was a great champion of climate scepticism and in February 2015 he suggested that the “fiddling” with temperature data “is the biggest science scandal ever”. Writing in the Telegraph, he noted: “When future generations look back on the global warming scare of the past 30 years, nothing will shock them more than the extent to which official temperatures records – on which the entire panic rested – were systematically ‘adjusted’ to show the Earth as having warmed more than the actual data justified.”
The corresponding system responses (in simplified form here) for all three subtasks, i.e., entity roles, narrative classification and narrative extraction, are provided below.
Entity Roles
Met Office: Antagonist-[Deceiver]
Paul Homewood: Protagonist-[Guardian]
Daily Sceptic: Protagonist-[Guardian]
Christopher Booker: Protagonist-[Guardian,Virtuous]
Narrative Classification: Questioning the measurements and science (narrative) - Methodologies/metrics used are unreliable/faulty (subnarrative)
Narrative Extraction: Paul Homewood claims that the Met Office is misleading the public about current UK temperatures by not disclosing a margin of error of up to 2.5°C. The Daily Sceptic reports that most of the Met Office’s 380 stations provide inaccurate measurements. Additionally, Christopher Booker argues that official reports have been repeatedly falsified to indicate climate warming. Thus, the Met Office cannot conclude with scientific certainty that the climate is becoming warmer.
We will provide a training set to build your systems locally. We will further provide a development set (without annotations) and an online submission website to score your systems. A public leaderboard will show the progress of the participating teams on the task.
The data is unique of its kind: it is multilingual, focuses on two highly debated global topics, and covers various complementary dimensions relevant to the analysis of manipulation attempts in online news media. We use fine-grained entity role and narrative taxonomies.
The input for all subtasks will be news and web articles in plain text format in UTF-8. After registration, participants will be able to download the corpus from their team page. Specifically, articles are provided in the folders train-articles-subtask-x. Further, we will provide a set of dev-articles-subtask-x for which annotations are not provided.
Each article appears in one .txt file. The title (if it exists) is on the first row, followed by an empty row. The content of the article starts from the third row.
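As a rough illustration of this layout (not part of the official tooling), the following Python sketch reads the title and body of one article; the file path used here is only a placeholder derived from the folder naming above.

from pathlib import Path

def read_article(path: str) -> tuple[str, str]:
    lines = Path(path).read_text(encoding="utf-8").split("\n")
    title = lines[0].strip()      # first row: title (may be empty if absent)
    body = "\n".join(lines[2:])   # article content starts from the third row
    return title, body

title, body = read_article("train-articles-subtask-2/article123456.txt")  # placeholder path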
Articles in five languages (Bulgarian, English, Hindi, (European) Portuguese, and Russian) were collected from 2022 to mid-2024 and revolve around two topics, namely, the Ukraine-Russia war and Climate Change. Our media/article selection covers mainly alternative news and web portals, a large fraction of which were identified by fact-checkers and media credibility experts as potentially spreading mis-/disinformation. For the collection of the articles we exploited various news aggregation engines, for instance Europe Media Monitor (EMM), a large-scale multilingual near real-time news aggregation and analysis engine, whereas a fraction of the articles were collected and filtered manually. Whenever possible, articles were retrieved with the Trafilatura library or other similar web-scraping tools, and otherwise were retrieved manually.
The format of a tab-separated line of the gold label and the submission files for subtask 1 is:
article_id entity_mention start_offset end_offset main_role(*) fine-grained_roles(*)
where article_id is the numeric id in the name of the input article file (e.g. the id of file article123456.txt is 123456), entity_mention is the string representing the entity mention, start_offset (end_offset) provides the start/end position of the mention, main_role is a string representing the main entity role, and fine-grained_roles is a tab-separated list of strings representing the fine-grained role(s). This is an example of a section of the gold file for the articles with ids 10001 - 10003:
10001    Martin Luther King Jr.    10    32    Protagonist    Martyr
10002    Mahatma Gandhi    12    27    Protagonist    Martyr    Rebel
10003    ISIS    4    8    Antagonist    Terrorist    Deceiver
IMPORTANT: For creating the submission file, a list of all entity mentions and the corresponding offsets for all the articles will be provided. There is no need to do any Named-Entity Recognition on the participants' side.
(*) The leaderboard will evaluate predictions for both main_role and fine_grained_roles; however, the official evaluation metric is computed on the fine_grained_roles.
Note that main_role should take only one of three values from the 1st level of the taxonomy, while fine_grained_roles should take one or more values from the 2nd level of the taxonomy.
If you choose not to train a model to predict main_role, you still need to provide a proper value under main_role to pass the format checker code in the scorer.
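For illustration only, the following Python sketch writes predictions in the tab-separated format above; the variable names, the predictions dictionary layout and the output file name are placeholders, not the official submission tooling.

def write_subtask1_submission(predictions, out_path):
    # predictions maps (article_id, mention, start, end) -> (main_role, fine_grained_roles)
    with open(out_path, "w", encoding="utf-8") as f:
        for (article_id, mention, start, end), (main_role, fine_roles) in predictions.items():
            fields = [str(article_id), mention, str(start), str(end), main_role, *fine_roles]
            f.write("\t".join(fields) + "\n")

write_subtask1_submission(
    {(10001, "Martin Luther King Jr.", 10, 32): ("Protagonist", ["Martyr"])},
    "subtask1_submission.txt",  # placeholder file name
)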
The format of a tab-separated line of the gold label and the submission files for subtask 2 is:
article_id narrative_1;...;narrative_N subnarrative_1;...;subnarrative_N
where article_id is the numeric id in the name of the input article file (e.g. the id of file article123456.txt is 123456), narrative_x is a label (of the provided hierarchy) representing a narrative present in the article, and subnarrative_x is one of the labels (of the hierarchy) representing a subnarrative (corresponding to narrative_x) present in the article. In case neither a specific narrative nor a subnarrative label can be assigned, the “Other” pseudo-label is used. In case the article is labeled with a specific narrative label but no specific subnarrative can be assigned, the convention “[Narrative]: Other” is used to represent a label that belongs to a narrative but does not fit any of its subnarratives. This is an example of a section of the gold file for the articles with ids 10001 - 10003:
10001    Blaming Others    Ukraine is the aggressor
10002    Blaming Others;Praise of Russia    Blaming Others: Other;Praising Russia’s military might
10003    Other    Other
IMPORTANT: For creating the submission file, two lists, one with the narratives and one with the subnarratives, need to be provided. No consistency requirement is enforced (e.g. subnarrative X:Y need not be accompanied by narrative X), as they are evaluated independently.
(*) The leaderboard will evaluate predictions for both narratives ('coarse-level') and subnarratives ('fine-level'); however, the official evaluation metric is computed on the subnarratives.
Note that narrative should take one or more of the values from the 1st level of the taxonomies (either CC or URW), and sub-narrative should take one or more values from the 2nd level of the taxonomies.
Participants are free to deploy any strategy to deduce or infer the narrative separately.
If one chooses not to train a model to predict the narrative, one still needs to provide a proper value under narrative to pass the format checker code in the scorer (e.g. by just putting everything as 'Other').
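As a non-authoritative illustration of the format above, the following Python sketch assembles one submission line with ';'-joined labels, falling back to the "Other" pseudo-label when nothing is predicted; all names are placeholders.

def subtask2_line(article_id, narratives, subnarratives):
    # One tab-separated line: article_id, ';'-joined narratives, ';'-joined subnarratives
    narr = ";".join(narratives) if narratives else "Other"
    subnarr = ";".join(subnarratives) if subnarratives else "Other"
    return f"{article_id}\t{narr}\t{subnarr}"

print(subtask2_line(10002,
                    ["Blaming Others", "Praise of Russia"],
                    ["Blaming Others: Other", "Praising Russia's military might"]))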
The format of a tab-separated line of the ground truth files for subtask 3 is:
article_id dominant_narrative dominant_subnarrative explanation
where article_id is the identifier of the article, dominant_narrative is the string representing the dominant narrative of the article, dominant_subnarrative is the string representing the corresponding dominant subnarrative of the article, and explanation is the string representing free text explanation of the dominant narrative. The format of the submission files is similar (see below), but does not include the dominant-narrative elements.
article_id explanation
This is an example of a section of the gold file for the articles with ids 10001 - 10002:
10001    Blaming Others    Ukraine is the aggressor    Ukraine war started...
10002    Praise of Russia    Praising Russia’s military might    Russia is...
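As a rough sketch (not the official tooling), a subtask 3 submission line could be assembled as follows, with a simple whitespace-based check of the 80-word limit mentioned above; the function and variable names are placeholders.

def subtask3_line(article_id, explanation, max_words=80):
    # Truncate the free-text explanation to the assumed 80-word limit and
    # return one tab-separated submission line: article_id, explanation.
    words = explanation.split()
    if len(words) > max_words:
        explanation = " ".join(words[:max_words])
    return f"{article_id}\t{explanation}"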
For the training set we provide one gold file per article as well as a single gold file with the gold labels for all files. Participants are expected to upload a single file covering all the articles per subtask.
Upon registration, participants will have access to their team page, where they can also download scripts for scoring the different tasks. Here is a brief description of the evaluation measures the scorers compute.
Subtask 1 is a multiclass multi-label classification problem. The official evaluation measure will be Exact Match Ratio which measures the proportion of samples where all labels are correctly predicted.
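For orientation only, a minimal sketch of this measure (not the official scorer) is shown below; it assumes gold and pred map each entity mention to its set of fine-grained role labels.

def exact_match_ratio(gold: dict, pred: dict) -> float:
    # Proportion of mentions whose predicted label set equals the gold label set exactly
    hits = sum(1 for key, gold_labels in gold.items()
               if set(pred.get(key, [])) == set(gold_labels))
    return hits / len(gold) if gold else 0.0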
Subtask 2 is a multi-label multi-class classification problem. The official evaluation measure will be averaged (over test documents) samples F1 computed for entire narrative_x:subnarrative_x labels. That is, we will first compute an F1 score per test document by comparing the predicted to the gold narrative_x:subnarrative_x labels of the document, and we will then average over the test documents. Both the narrative_x and the subnarrative_x part of each predicted narrative_x:subnarrative_x label will have to be correct for the predicted label to be considered correct. We will also report averaged samples F1 computed for narratives only, by ignoring the subnarrative_x parts of the narrative_x:subnarrative_x predicted and gold labels.
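For orientation only, a minimal sketch of this per-document samples F1 (not the official scorer) is given below; it assumes gold and pred map each article id to its set of full narrative_x:subnarrative_x labels.

def samples_f1(gold: dict, pred: dict) -> float:
    # Per-document F1 over full "narrative:subnarrative" labels, averaged over documents
    scores = []
    for doc_id, gold_labels in gold.items():
        g, p = set(gold_labels), set(pred.get(doc_id, []))
        tp = len(g & p)
        prec = tp / len(p) if p else 0.0
        rec = tp / len(g) if g else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / len(scores) if scores else 0.0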
Subtask 3 is a text generation task. The official evaluation measure is the average similarity between the gold and the corresponding predicted explanations, using the F1 metric computed by BERTScore (see also BERTScore on Hugging Face).
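As an illustration, explanations can be scored locally with the bert-score package roughly as follows; the exact model and settings used by the official scorer may differ, and the example strings are placeholders.

from bert_score import score  # pip install bert-score

candidates = ["Paul Homewood claims that the Met Office is misleading the public ..."]  # predicted explanations
references = ["Gold explanation for the same article ..."]                              # gold explanations

# score() returns per-example precision, recall and F1 tensors
P, R, F1 = score(candidates, references, lang="en")
print(float(F1.mean()))  # average F1 over the evaluated explanations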
For all subtasks additional secondary evaluation scores will be computed as well and displayed on the leaderboard.
Apart from the training datasets specific to our task, there are other related datasets that could be exploited for building models for the various subtasks.
[Sharma et al., 2023] provides a dataset for identifying heroes, villains, and victims in memes, with similar coarse-level classes. The dataset for target sentiment analysis presented in [Orbach et al., 2021] could be used to build a model that identifies the target entities of certain sentiments.
A dataset consisting of news articles and other types of texts for the classification of Climate Change denial claims, which uses a taxonomy of narratives that is to some extent similar to ours, is presented in [Coan et al., 2021], while a cleaned version thereof is reported in [Piskorski et al., 2022]. A thorough analysis of the narratives in the context of the Ukraine-Russia war is presented in [Amanatullah et al., 2023]. A dataset for fine-grained narrative classification, albeit in a different domain (COVID-19) than the ones in our task, is presented in [Kotseva et al., 2023].
Detecting persuasion techniques in texts might help to spot relevant text fragments with manipulative content. A wide range of relevant datasets for detection of such techniques in online news exists: [Da San Martino et al., 2020] [Piskorski et al., 2023] [Piskorski et al., 2024].
15 July 2024 | Task description available
9 September 2024 | Registration opens |
12 September 2024 | First chunk of training data released |
16 October 2024 | Second chunk of training data released |
October/November 2024 | Release of further training and development data |
20 January 2025 | Release of the gold labels of the development set |
27 January 2025 | Release of the test set |
31 January 2025 at 23:59 (Anywhere on Earth) | Test submission site closes |
28 February 2025 | System Paper Submission Deadline |
31 March 2025 | Notification to authors |
21 April 2025 | Camera ready papers due |
Summer 2025 | SemEval workshop (co-located with a major NLP conference) |
We have created a Google group for the task. Join it to ask questions and to interact with other participants.
Follow us on Twitter to get the latest updates on the data and the competition!
If you need to contact only the organisers, send us an email.
The following people are making this task possible: