Registration to the website is open. After registering, you can get access to the data and submit your predictions.
Memes are one of the most popular types of content used in online disinformation campaigns.
They are most effective on social media platforms, where they can easily reach a large number of users.
Memes in a disinformation campaign achieve their goal of influencing users through a number of rhetorical and psychological techniques, such as causal oversimplification, name calling, and smears.
The goal of the shared task is to build models for identifying such techniques in the textual content of a meme only (one subtask) and in a multimodal setting in which both the textual and the visual content are to be analysed together (two subtasks).
Technical Description
We refer to propaganda whenever information is purposefully shaped to foster a predetermined agenda.
Propaganda uses psychological and rhetorical techniques to reach its purpose.
Such techniques include the use of logical fallacies and appealing to the emotions of the audience.
Logical fallacies are usually hard to spot since the argumentation, at first sight,
might seem correct and objective.
However, a careful analysis shows that the conclusion cannot be drawn from
the premise without the misuse of logical rules.
Another set of techniques makes use of emotional language to induce the
audience to agree with the speaker only on the basis of the emotional bond
that is being created, provoking the suspension of any rational analysis of the argumentation.
Memes consist of an image superimposed with text.
The role of the image in a deceptive meme is either to reinforce or complement a technique in the text, or to convey one or more persuasion techniques by itself.
We defined the following subtasks:
- Subtask 1 -
Given only the “textual content” of a meme, identify which of the 20 persuasion techniques, organized in a hierarchy, it uses. If the ancestor node of a technique is selected, only a partial reward is given. This is a hierarchical multilabel classification problem.
You can find a view of the hierarchy in the figure below (note that there are 22 techniques in the image, but in subtask 1 "Transfer" and "Appeal to Strong emotion" are not present, so just picture the hierarchy without them). Full details on it are available here.
If you need additional annotated data to solve this task, you can check the PTC corpus as well as the SemEval 2023 task 3 data.
- Subtask 2a -
Given a meme, identify which of the 22 persuasion techniques, organized in a hierarchy, are used both in the textual and in the visual content of the meme (multimodal task).
If the ancestor node of a technique is selected, only partial reward will be given. This is a hierarchical multilabel classification problem.
You can find info on the hierarchy here.
- Subtask 2b -
Given a meme (both the textual and the visual content), identify whether it contains a persuasion technique (at least one of the 22 techniques we considered in this task), or no technique.
This is a binary classification problem. Note that this is a simplified version of subtask 2a in which the hierarchy is cut at the first two children of the root node.
Note that for all subtasks, there will be three surprise test datasets in different languages (a fourth one in English will be released as well), which will be revealed only at the final stages of the shared task, i.e. together with the release of the test data. The goal is to test zero-shot approaches.
The hierarchy is a directed acyclic graph (DAG) that groups subsets of the techniques sharing similar characteristics into a hierarchical structure.
Hierarchy of the techniques for Subtask 2a (in Subtask 1 "Transfer" and "Appeal to Strong emotion" are not present).
The hierarchy is also inspired by this document.
Data Description
The corpus is available in your team page after registering for the shared task github page. Beware that the content of some memes might be considered offensive or too strong by some viewers. Subscribe to the task mailing list and the Twitter accounts (see the bottom of the page) to get updates on the task (the mailing list will be the official channel of communication).
Note that, for all subtasks, you are free to use the annotations of the PTC corpus (more than 20,000 sentences).
The domain of that corpus is news articles, but the annotations are made using the same guidelines, although fewer techniques were considered.
A similar, albeit multilingual, corpus is also available. In this case too, the domain of the corpus is news articles, in 9 languages. Beware that the number of techniques and the annotation guidelines are slightly different.
We provide a training set to build your systems locally.
We will provide a development set and a public leaderboard to share your results in real time with the other participants involved in the task.
We will further provide a test set (without annotations) and an online submission website to score your systems.
Input and Submission File Format
The input data for subtask 1 is the text extracted from the meme.
The training, the development and the test sets for all subtasks are distributed as json files, one single file per subtask.
The input data for subtasks 2a and 2b is, in addition to the text extracted from the meme, the image of the meme itself. The images are distributed together with the subtask json in a zip file, which is available, upon registration, from the personal page of your team.
Here is an example of a meme:
Subtask 1
The entry for that example in the json file for subtask 1 is
{
  "id": "125",
  "text": "I HATE TRUMP\n\nMOST TERRORIST DO",
  "labels": [
    "Loaded Language",
    "Name calling/Labeling"
  ],
  "link": "https://..."
},
where
- id is the unique identifier of the example across all three subtasks.
- text is the textual content of the meme, as a single UTF-8 string.
  While the text is first extracted automatically from the meme, we manually post-process it to remove errors and to format it in such a way that each sentence is on a single row and blocks of text in different areas of the image are separated by a blank row.
  Note that task 1 is an NLP task since the image is not provided as an input.
- labels is a list of valid technique names (the full list is available in your team page after registration) used in the text.
  Since these are the gold labels, they will be provided for the training set only.
  In this case two techniques were spotted: Loaded Language and Name calling/Labeling.
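The layout of the text field described above (one sentence per row, blank row between blocks) can be unpacked with a few lines of Python. This is only an illustrative sketch: the function name `split_meme_text` is hypothetical and not part of any official tooling.

```python
def split_meme_text(text: str) -> list[list[str]]:
    """Split meme text into blocks (one per image area), each a list of sentences."""
    blocks = []
    for block in text.split("\n\n"):  # a blank row separates blocks
        lines = [line.strip() for line in block.split("\n") if line.strip()]
        if lines:
            blocks.append(lines)
    return blocks


example = "I HATE TRUMP\n\nMOST TERRORIST DO"
print(split_meme_text(example))
# [['I HATE TRUMP'], ['MOST TERRORIST DO']]
```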
A submission for task 1 is a single json file with the same format as the input file, but where only the fields id and labels are required.
Note that if your algorithm detects no technique in a meme, then the field "labels" should be an empty list.
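A submission with only the required fields could be assembled as in the sketch below. It assumes that the top level of the json file is a list of entries (suggested by the trailing comma in the example entry, but not stated explicitly); the helper name `build_submission` and the prediction for id "126" are made up for illustration.

```python
import json


def build_submission(predictions: dict[str, list[str]]) -> list[dict]:
    """Turn {meme id: predicted technique names} into submission entries."""
    return [
        {"id": meme_id, "labels": sorted(set(labels))}
        for meme_id, labels in predictions.items()
    ]


# Hypothetical predictions; no technique was detected for "126",
# so its labels field is an empty list.
predictions = {
    "125": ["Loaded Language", "Name calling/Labeling"],
    "126": [],
}
submission = build_submission(predictions)
serialized = json.dumps(submission, indent=2, ensure_ascii=False)
print(serialized)
```

The resulting string can then be written to a file and checked with the scorers from the team page before submitting.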
Subtask 2a
The input for subtask 2a is a json and a folder with the images of the memes.
The entry in the json file for the meme above is
{
  "id": "125",
  "text": "I HATE TRUMP\n\nMOST TERRORIST DO",
  "labels": [
    "Reductio ad hitlerum",
    "Smears",
    "Loaded Language",
    "Name calling/Labeling"
  ],
  "image": "125_image.png",
  "link": "https://..."
},
where image is the name of the file with the image of the meme in the folder.
The meaning of id, text and labels is the same as for task 1. However, the list of technique names is different (the full list is available in your team page after registration).
Note that the field labels will be provided for the training set only, since it corresponds to the gold labels.
Notice, however, that now we are able to see the image of the meme, hence we might be able to spot more techniques. In this example Smears and Reductio ad hitlerum become evident only after we are able to understand who the two sentences are attributed to.
There are other cases in which a technique is conveyed by the image only (see example with id 189 in the training set).
A submission for subtask 2a consists of a single json file with the same format as the input file, but where only the fields id and labels are required for each example.
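Reading the subtask 2a input amounts to loading the json and resolving each image field against the image folder. The sketch below assumes the json is a list of entries; the function name `load_entries` and any paths passed to it are hypothetical.

```python
import json
from pathlib import Path


def load_entries(json_path, image_dir):
    """Return (entry, image_path) pairs; image_path is None if the file is missing."""
    with open(json_path, encoding="utf-8") as f:
        entries = json.load(f)  # assumed to be a list of dicts
    image_dir = Path(image_dir)
    pairs = []
    for entry in entries:
        image_path = image_dir / entry["image"]
        pairs.append((entry, image_path if image_path.is_file() else None))
    return pairs
```

Returning None for a missing image, rather than raising, makes it easy to report all unresolved files at once before training.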
Subtask 2b
Subtask 2b is the same as subtask 2a. However, it is going to be evaluated as a binary task: either at least one technique is present in the meme or no technique is present ("propagandistic" and "non_propagandistic", respectively). Note that these two labels correspond to the children of the root node of the hierarchy.
The entry for that example in the json file for subtask 2b is
{
  "id": "125",
  "text": "I HATE TRUMP\n\nMOST TERRORIST DO",
  "label": "propagandistic"
},
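Since the binary label only records whether at least one technique is present, a list of subtask 2a techniques can be collapsed into a subtask 2b label in one line. This is a sketch; the function name `to_binary_label` is hypothetical.

```python
def to_binary_label(techniques: list[str]) -> str:
    """Map a list of technique names to the binary subtask 2b label."""
    return "propagandistic" if techniques else "non_propagandistic"


print(to_binary_label(["Smears", "Loaded Language"]))  # propagandistic
print(to_binary_label([]))                             # non_propagandistic
```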
Evaluation
Upon registration, participants will have access to their team page, where they can also download the scripts we use for computing the results on the leaderboard. You can use the scorers to test your models locally.
Subtasks 1 and 2a depend on a hierarchy. Taking the figure above as reference, the gold label is always a leaf node of the DAG. However, any node of the DAG can be a predicted label:
- If the prediction is a leaf node and it is the correct label, then a full reward is given. For example, Red Herring is predicted and it is the gold label as well.
- If the prediction is NOT a leaf node and it is an ancestor of the correct gold label, then a partial reward is given (the reward depends on the distance between the two nodes). For example, the gold label is Red Herring and the predicted label is Distraction or Appeal to Logic.
- If the prediction is not an ancestor node of the correct label, then a null reward is given. For example, the gold label is Red Herring and the predicted label is Black and White Fallacy or Appeal to Emotions.
A graphical example is given here.
Note, however, that the hierarchy can be ignored by restricting the predictions to technique names only. This way, the task would be identical to SemEval 2023 task 3.
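The three reward cases can be illustrated with a small Python sketch. It uses a toy fragment of the hierarchy that encodes only the ancestor relations named in the examples; the exact edges are an assumption, and the real DAG may contain intermediate nodes. The sketch classifies a prediction as full, partial, or null and reports the distance to the gold label, but it does not compute the numeric reward, which is defined by the official scorers.

```python
from collections import deque
from typing import Optional

# Toy fragment: child -> list of parents. The edges are assumed for illustration;
# consult the official hierarchy for the real structure.
PARENTS = {
    "Red Herring": ["Distraction"],
    "Distraction": ["Appeal to Logic"],
    "Appeal to Logic": [],
    "Black and White Fallacy": [],  # parents omitted in this toy fragment
    "Appeal to Emotions": [],
}


def ancestor_distances(node: str) -> dict[str, int]:
    """Breadth-first search upwards: shortest distance from node to each ancestor."""
    distances: dict[str, int] = {}
    queue = deque([(node, 0)])
    while queue:
        current, dist = queue.popleft()
        for parent in PARENTS.get(current, []):
            if parent not in distances:
                distances[parent] = dist + 1
                queue.append((parent, dist + 1))
    return distances


def reward_kind(predicted: str, gold: str) -> tuple[str, Optional[int]]:
    """Classify a prediction as 'full', 'partial' (with distance), or 'null'."""
    if predicted == gold:
        return ("full", 0)
    distances = ancestor_distances(gold)
    if predicted in distances:
        return ("partial", distances[predicted])
    return ("null", None)


for prediction in ["Red Herring", "Distraction", "Appeal to Logic", "Black and White Fallacy"]:
    print(prediction, reward_kind(prediction, "Red Herring"))
```

Because the hierarchy is a DAG, a node can have several parents; the breadth-first search keeps the shortest distance to each ancestor.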
Here is a brief description of the evaluation measures the scorers compute.
Subtask 1
Subtask 1 is a hierarchical multilabel classification problem. Taking the figure above with the hierarchy as an example, any node of the DAG can be a predicted label. The gold label is always a leaf node of the DAG. If the prediction is the correct label, a full reward is given.
We use hierarchical-F1 (see section 6) as the official evaluation measure.
A graphical example of the evaluation function is available here.
Subtask 2a
Subtask 2a is a hierarchical multilabel classification problem.
We use hierarchical-F1 (see section 6) as the official evaluation measure.
A graphical example of the evaluation function is available here.
Subtask 2b
Subtask 2b is a binary classification problem. The two labels indicate whether there is at least one persuasion technique in the meme or none.
We use macro-F1 as the official evaluation measure.
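For local sanity checks, macro-F1 over the two labels can be computed as the unweighted mean of the per-label F1 scores. The sketch below is a plain-Python illustration, not the official scorer; it assumes that a label with no true positives gets an F1 of 0, and the example gold and predicted lists are made up.

```python
LABELS = ("propagandistic", "non_propagandistic")


def f1_for_label(gold, pred, label):
    """F1 of one label, treating it as the positive class."""
    tp = sum(g == label and p == label for g, p in zip(gold, pred))
    fp = sum(g != label and p == label for g, p in zip(gold, pred))
    fn = sum(g == label and p != label for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0  # assumed convention when precision or recall is undefined or zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold, pred):
    """Unweighted mean of the per-label F1 scores."""
    return sum(f1_for_label(gold, pred, label) for label in LABELS) / len(LABELS)


gold = ["propagandistic", "propagandistic", "non_propagandistic", "non_propagandistic"]
pred = ["propagandistic", "non_propagandistic", "non_propagandistic", "non_propagandistic"]
print(round(macro_f1(gold, pred), 4))  # 0.7333
```

In this example the per-label F1 scores are 2/3 for "propagandistic" and 0.8 for "non_propagandistic", so both classes weigh equally regardless of how many memes carry each label.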
The final version of the hierarchy will also be inspired by this document.