Dataset of Propaganda Techniques of the State-Sponsored Information Operation of the People's Republic of China

AI-generated keywords: Computational Propaganda, Mandarin Chinese, Multi-Label Classification, State-Backed Propaganda, BERT Model

AI-generated Key Points

- The rise of digital media has enabled computational propaganda, which lets propaganda spread with virtually no limit on reach.
- State-backed propaganda aims to shape audiences' views in favor of a particular political party or authority and is used in information warfare.
- Most studies on detecting propaganda use machine learning, quantitative, or qualitative methods and focus on English content; little research addresses Mandarin Chinese content.
- Researchers have presented a multi-label propaganda techniques dataset in Mandarin, built on a state-linked information operations dataset released by Twitter in July 2019.
- The selected propaganda techniques include presenting irrelevant data, misrepresentation of someone's position (straw man), whataboutism, oversimplification, obfuscation, appeal to authority, black-and-white thinking, name-calling, loaded language, flag waving, doubt mongering, slogans, appeal to fear or prejudice, thought-terminating cliché, bandwagon, reductio ad Hitlerum, repetition, neutral political, non-political, meme humor, and symbol.
- The new dataset could help future research detect state-backed propaganda online, especially in cross-lingual contexts and for cross-platform identity consolidation.
- Mandarin Chinese is the dominant language in the dataset, accounting for 97.5% of the tweets.
- Presenting irrelevant data, obfuscation, and appeal to authority are the most frequent propaganda techniques in the Mandarin Chinese tweets.


Authors: Rong-Ching Chang, Chun-Ming Lai, Kai-Lai Chang, Chu-Hsing Lin

arXiv: 2106.07544v1 (cs.SI)

License: CC BY-NC-SA 4.0

Abstract: The digital media, identified as computational propaganda provides a pathway for propaganda to expand its reach without limit. State-backed propaganda aims to shape the audiences' cognition toward entities in favor of a certain political party or authority. Furthermore, it has become part of modern information warfare used in order to gain an advantage over opponents. Most of the current studies focus on using machine learning, quantitative, and qualitative methods to distinguish if a certain piece of information on social media is propaganda. Mainly conducted on English content, but very little research addresses Chinese Mandarin content. From propaganda detection, we want to go one step further to provide more fine-grained information on propaganda techniques that are applied. In this research, we aim to bridge the information gap by providing a multi-labeled propaganda techniques dataset in Mandarin based on a state-backed information operation dataset provided by Twitter. In addition to presenting the dataset, we apply a multi-label text classification using fine-tuned BERT. Potentially this could help future research in detecting state-backed propaganda online especially in a cross-lingual context and cross platforms identity consolidation.

Submitted to arXiv on 14 Jun. 2021


Results of the summarizing process for the arXiv paper: 2106.07544v1

Comprehensive Summary

The rise of digital media has given way to computational propaganda, which allows propaganda to be disseminated with virtually no limit on reach. State-backed propaganda aims to shape the audience's cognition in favor of a particular political party or authority and has become an integral part of modern information warfare, used to gain an advantage over opponents. Most recent work in this field focuses on deciding whether a piece of social media content is propaganda, using qualitative analysis, quantitative analysis, and machine learning; very little of it addresses Mandarin Chinese content. The features used for this detection task fall into two groups: content-driven and network-driven. Research based on text features remains limited because annotated datasets are scarce.

To address this gap, the researchers present a multi-label propaganda techniques dataset in Mandarin, built on a state-linked information operations dataset released by Twitter in July 2019. The dataset consists of sampled tweets annotated with multi-label propaganda techniques. In addition, the researchers fine-tuned a BERT model for the multi-label classification task. The selected techniques were drawn from various studies and include presenting irrelevant data, misrepresentation of someone's position (straw man), whataboutism, oversimplification, obfuscation, appeal to authority, black-and-white thinking, name-calling, loaded language, flag waving, doubt mongering, slogans, appeal to fear or prejudice, thought-terminating cliché, bandwagon, reductio ad Hitlerum, repetition, neutral political, non-political, meme humor, and symbol.

The new dataset could help future research on detecting state-backed propaganda online, especially in cross-lingual contexts and for cross-platform identity consolidation. The language distribution of the dataset (Figure 1 of the paper) shows that Mandarin Chinese dominates, accounting for 97.5% of the tweets. The label statistics show that presenting irrelevant data, obfuscation, and appeal to authority are the most frequent propaganda techniques in the Mandarin Chinese tweets.

Layman's Summary

1. Digital media has made it easier to spread propaganda, which is information used to influence other people's opinions.
2. Some governments use propaganda to make people think a certain way about politics or authority.
3. Researchers are looking for ways to detect propaganda using machine learning and other methods.
4. The researchers built a dataset that labels the techniques used in Mandarin Chinese propaganda, such as presenting irrelevant information or using loaded language.
5. This dataset can help others detect state-backed propaganda online, especially across different languages and platforms.

Definitions

- Digital media: technology that allows people to share information and communicate through the internet
- Computational propaganda: the use of technology to spread false or misleading information
- Propaganda: information that is used to influence people's opinions or beliefs
- Machine learning: a type of artificial intelligence where computers learn from data instead of being explicitly programmed by humans
- Mandarin: a language spoken in China and other parts of Asia

Understanding Computational Propaganda and its Impact on Chinese Mandarin Content

Exploring Current Research in Computational Propaganda Detection

Most recent work in this field has focused on deciding whether a piece of information is propaganda, using methods such as qualitative analysis, quantitative analysis, and machine learning. The features used for this detection task fall into two groups: content-driven and network-driven. However, research based on text features is limited because annotated datasets are scarce.

Introducing a New Dataset for Detecting State-Backed Propaganda Online

To address this gap, the researchers present a multi-label propaganda techniques dataset in Mandarin, built on a state-linked information operations dataset released by Twitter in July 2019. The dataset consists of sampled tweets annotated with multi-label propaganda techniques. In addition, the researchers fine-tuned a BERT model for the multi-label classification task. The selected techniques were drawn from various studies and include presenting irrelevant data, misrepresentation of someone's position (straw man), whataboutism, oversimplification, obfuscation, appeal to authority, black-and-white thinking, name-calling, loaded language, flag waving, doubt mongering, slogans, appeal to fear or prejudice, thought-terminating cliché, bandwagon, reductio ad Hitlerum, repetition, neutral political, non-political, meme humor, and symbol. The new dataset could help future research on detecting state-backed propaganda online, especially in cross-lingual contexts and for cross-platform identity consolidation.
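To make the multi-label setup concrete, here is a minimal sketch in plain Python of how technique labels are commonly encoded and decoded in such a task. It is an illustration under stated assumptions, not the authors' implementation: the four-label subset, the function names, the 0.5 threshold, and the use of independent sigmoids with binary cross-entropy are all assumptions (the summary only says that a fine-tuned BERT model was used), and the logits are hand-written stand-ins for what a fine-tuned encoder might output.

```python
import math

# Hypothetical subset of the technique labels named in the summary.
LABELS = [
    "presenting irrelevant data",
    "obfuscation",
    "appeal to authority",
    "loaded language",
]


def to_multi_hot(techniques, labels=LABELS):
    """Encode a collection of technique names as a 0/1 vector, one slot per label."""
    present = set(techniques)
    unknown = present - set(labels)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if name in present else 0 for name in labels]


def sigmoid(x):
    """Map a real-valued logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def decode(logits, labels=LABELS, threshold=0.5):
    """Pick every label whose probability clears the threshold.

    Each label is decided independently, so a tweet can receive
    zero, one, or several techniques (unlike softmax, which picks one).
    """
    return [name for name, z in zip(labels, logits) if sigmoid(z) >= threshold]


def binary_cross_entropy(logits, targets):
    """Mean per-label binary cross-entropy between logits and 0/1 targets."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for z, t in zip(logits, targets):
        p = sigmoid(z)
        total += -(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps))
    return total / len(targets)


# Assumed example: one tweet annotated with two techniques.
target = to_multi_hot(["obfuscation", "loaded language"])
# Stand-in logits; a real model would produce one value per label.
logits = [2.0, -1.0, 0.3, -3.0]
predicted = decode(logits)
loss = binary_cross_entropy(logits, target)
```

The point of the sketch is the contrast with single-label classification: because each label gets its own yes/no decision, the number of predicted techniques per tweet is not fixed in advance.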

Language Distribution & Label Statistics

The language distribution of the dataset (Figure 1 of the paper) shows that Mandarin Chinese dominates, accounting for 97.5% of the tweets. The label statistics show that presenting irrelevant data, obfuscation, and appeal to authority are the most frequent propaganda techniques in the Mandarin Chinese tweets.
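Statistics of this kind can be computed with simple counting. The sketch below shows one way to derive a language share and per-label frequencies from annotated tweets. The rows, the field names `lang` and `labels`, and the function names are hypothetical and chosen only for illustration; they do not reflect the actual schema or values of the dataset.

```python
from collections import Counter

# Hypothetical toy rows; the real dataset's fields and values are not given here.
tweets = [
    {"lang": "zh", "labels": ["presenting irrelevant data", "obfuscation"]},
    {"lang": "zh", "labels": ["appeal to authority"]},
    {"lang": "zh", "labels": ["presenting irrelevant data", "loaded language"]},
    {"lang": "en", "labels": ["non-political"]},
]


def language_share(rows):
    """Return the fraction of rows per language code."""
    counts = Counter(row["lang"] for row in rows)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {lang: n / total for lang, n in counts.items()}


def label_frequencies(rows):
    """Count how many rows carry each label.

    A row with several labels contributes to several counts, so the
    counts can sum to more than the number of rows.
    """
    counts = Counter()
    for row in rows:
        counts.update(set(row["labels"]))  # set() avoids double-counting within a row
    return counts


shares = language_share(tweets)
freqs = label_frequencies(tweets)
print(shares)
print(freqs.most_common(3))
```

Because the task is multi-label, the label counts describe how often each technique appears rather than a partition of the tweets, which is worth keeping in mind when reading "most frequent technique" claims.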

Conclusion

This article explored computational propaganda detection for Mandarin Chinese content, focusing on a new multi-label propaganda techniques dataset built on data released by Twitter. Understanding how such datasets can be used can help future research detect state-backed propaganda online, especially in cross-lingual contexts.

Created on 23 May. 2023
