Dataset of Propaganda Techniques of the State-Sponsored Information Operation of the People's Republic of China

AI-generated keywords: Computational Propaganda, Mandarin Chinese, Multi-Label Classification, State-Backed Propaganda, BERT Model

AI-generated Key Points

  • The rise of digital media has led to computational propaganda, which allows for unlimited dissemination of propaganda.
  • State-backed propaganda aims to influence people's thinking in favor of a particular political party or authority and is used in information warfare.
  • Most studies on detecting propaganda rely on machine learning, quantitative, and qualitative methods, but there is limited research on Mandarin Chinese content.
  • Researchers have presented a multi-labeled propaganda techniques dataset in Mandarin based on a state-linked information operations dataset released by Twitter in July 2019.
  • The selected propaganda techniques include presenting irrelevant data, misrepresentation of someone's position (straw man), whataboutism, oversimplification, obfuscation, appeal to authority, black-and-white thinking, name-calling, loaded language, flag waving, doubt mongering, slogans, appeal to fear or prejudice, thought-terminating cliché, bandwagon, reductio ad Hitlerum, repetition, neutral political, non-political, meme humor, and symbol.
  • This new dataset could help future research detect state-backed propaganda online, especially in a cross-lingual context and for cross-platform identity consolidation.
  • Mandarin Chinese is the dominant language, accounting for 97.5% of the tweets.
  • Presenting irrelevant data, obfuscation, and appeal to authority are the most frequent propaganda techniques used in Mandarin Chinese tweets.

Authors: Rong-Ching Chang, Chun-Ming Lai, Kai-Lai Chang, Chu-Hsing Lin

License: CC BY-NC-SA 4.0

Abstract: Digital media, identified as computational propaganda, provides a pathway for propaganda to expand its reach without limit. State-backed propaganda aims to shape the audience's cognition toward entities in favor of a certain political party or authority. Furthermore, it has become part of modern information warfare, used to gain an advantage over opponents. Most current studies focus on using machine learning, quantitative, and qualitative methods to determine whether a certain piece of information on social media is propaganda. They are mainly conducted on English content, and very little research addresses Mandarin Chinese content. Beyond propaganda detection, we want to go one step further and provide more fine-grained information on the propaganda techniques that are applied. In this research, we aim to bridge the information gap by providing a multi-labeled propaganda techniques dataset in Mandarin based on a state-backed information operation dataset provided by Twitter. In addition to presenting the dataset, we apply multi-label text classification using fine-tuned BERT. Potentially, this could help future research in detecting state-backed propaganda online, especially in a cross-lingual context and for cross-platform identity consolidation.
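
In the multi-label setting mentioned in the abstract, a single tweet can carry several techniques at once, so the classifier makes an independent yes/no decision per technique rather than picking one class. A minimal sketch of that formulation follows, assuming one sigmoid output per label and a binary cross-entropy loss. The label subset, the logits, and the 0.5 threshold are illustrative assumptions, not values from the paper; the logits stand in for the outputs of a fine-tuned BERT classification head.

```python
import math

# Hypothetical subset of technique labels, for illustration only.
LABELS = ["loaded language", "whataboutism", "appeal to authority"]

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def predict(logits, threshold=0.5):
    """Return every label whose sigmoid probability reaches the threshold."""
    return [label for label, z in zip(LABELS, logits) if sigmoid(z) >= threshold]

def bce_loss(logits, targets):
    """Mean binary cross-entropy over labels; targets is a multi-hot 0/1 list."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for z, y in zip(logits, targets):
        p = sigmoid(z)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(logits)

# Assumed logits for one tweet; a real model would produce these.
logits = [2.0, -1.5, 0.3]
print(predict(logits))
print(bce_loss(logits, [1, 0, 1]))
```

Because each label is thresholded separately, the prediction can contain zero, one, or several techniques, which is what distinguishes this setup from a softmax over mutually exclusive classes.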

Submitted to arXiv on 14 Jun. 2021

Results of the summarizing process for the arXiv paper: 2106.07544v1

The rise of digital media has given way to a new form of propaganda known as computational propaganda, which allows propaganda to be disseminated without limits. State-backed propaganda aims to shape the audience's cognition toward entities in favor of a particular political party or authority, and it has become an integral part of modern information warfare used to gain an advantage over opponents.

Most recent work in this field has focused on deciding whether a piece of information on social media is propaganda, using qualitative analysis, quantitative analysis, and machine learning; very little of it addresses Mandarin Chinese content. The main features for this detection task can be divided into two groups: content-driven and network-driven. However, research based on text features is limited by the lack of annotated datasets.

To address this gap, the researchers present a multi-labeled propaganda techniques dataset in Mandarin based on a state-linked information operations dataset released by Twitter in July 2019. The dataset consists of multi-label propaganda technique annotations applied to sampled tweets. The researchers also fine-tuned a BERT model for the multi-label classification task. The selected techniques were drawn from various studies and include presenting irrelevant data, misrepresentation of someone's position (straw man), whataboutism, oversimplification, obfuscation, appeal to authority, black-and-white thinking, name-calling, loaded language, flag waving, doubt mongering, slogans, appeal to fear or prejudice, thought-terminating cliché, bandwagon, reductio ad Hitlerum, repetition, neutral political, non-political, meme humor, and symbol.

This new dataset could help future research in detecting state-backed propaganda online, especially in a cross-lingual context and for cross-platform identity consolidation. Figure 1 of the paper plots the language distribution of the dataset, which shows that Mandarin Chinese is dominant, accounting for 97.5% of the tweets. The dataset label statistics show that presenting irrelevant data, obfuscation, and appeal to authority are the most frequent propaganda techniques used in the Mandarin Chinese tweets.
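
The label statistics described above amount to counting how often each technique appears across the annotated tweets. A minimal sketch follows, assuming each tweet's annotation is stored as a list of technique names. The three sample annotations are made up for illustration and do not come from the dataset, and the helper names are hypothetical.

```python
from collections import Counter

# Made-up annotations: one list of technique labels per tweet.
annotations = [
    ["presenting irrelevant data", "obfuscation"],
    ["appeal to authority"],
    ["presenting irrelevant data", "loaded language"],
]

def label_counts(rows):
    """Count tweets carrying each label (a label counts once per tweet)."""
    counts = Counter()
    for labels in rows:
        counts.update(set(labels))
    return counts

def multi_hot(labels, vocabulary):
    """Encode one tweet's labels as a 0/1 vector over a fixed label order."""
    present = set(labels)
    return [1 if name in present else 0 for name in vocabulary]

counts = label_counts(annotations)
vocabulary = sorted(counts)  # fixed, reproducible label order

for name, n in counts.most_common():
    print(f"{name}: {n}")
print(multi_hot(annotations[0], vocabulary))
```

The multi-hot vector is the target format a multi-label classifier would typically be trained against, with one position per technique.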
Created on 23 May. 2023


