CValues: Measuring the Values of Chinese Large Language Models from Safety to Responsibility

AI-generated keywords: Large Language Models, Human Values, Safety, Responsibility, Evaluation

AI-generated Key Points

The license of the paper does not allow us to build upon its content, so the key points are generated from the paper metadata rather than the full article.

  • Rapid advancement of large language models (LLMs) raises concerns about potential risks and negative social impacts
  • Evaluating the alignment of LLMs with human values is increasingly important
  • Previous research has primarily focused on knowledge and reasoning abilities, neglecting alignment with human values in the Chinese context
  • CValues is introduced as the first Chinese human values evaluation benchmark for LLMs
  • CValues measures alignment ability in terms of safety and responsibility criteria
  • Adversarial safety prompts were manually collected across 10 scenarios; responsibility prompts were induced from 8 domains with the help of professional experts
  • Human evaluation and multi-choice prompts are used for comprehensive evaluation of Chinese LLMs' values alignment
  • Most Chinese LLMs perform well in terms of safety but need improvement regarding responsibility
  • Automatic and human evaluations are both important in assessing alignment between LLMs and human values
  • The CValues benchmark and code are available on ModelScope and GitHub

Authors: Guohai Xu, Jiayi Liu, Ming Yan, Haotian Xu, Jinghui Si, Zhuoran Zhou, Peng Yi, Xing Gao, Jitao Sang, Rong Zhang, Ji Zhang, Chao Peng, Fei Huang, Jingren Zhou

Work in Progress

Abstract: With the rapid evolution of large language models (LLMs), there is growing concern that they may pose risks or have negative social impacts. Therefore, evaluating their alignment with human values is becoming increasingly important. Previous work mainly focuses on assessing the performance of LLMs on certain knowledge and reasoning abilities, while neglecting alignment with human values, especially in a Chinese context. In this paper, we present CValues, the first Chinese human values evaluation benchmark to measure the alignment ability of LLMs in terms of both safety and responsibility criteria. To this end, we have manually collected adversarial safety prompts across 10 scenarios and induced responsibility prompts from 8 domains with the help of professional experts. To provide a comprehensive values evaluation of Chinese LLMs, we not only conduct human evaluation for reliable comparison, but also construct multi-choice prompts for automatic evaluation. Our findings suggest that while most Chinese LLMs perform well in terms of safety, there is considerable room for improvement in terms of responsibility. Moreover, both automatic and human evaluation are important for assessing different aspects of human values alignment. The benchmark and code are available on ModelScope and GitHub.

Submitted to arXiv on 19 Jul. 2023

Results of the summarizing process for the arXiv paper: 2307.09705v1

This paper's license does not allow us to build upon its content, so this summary was generated from the paper's metadata rather than the full article.

The rapid advancement of large language models (LLMs) has raised concerns about potential risks and negative social impacts. As a result, evaluating the alignment of LLMs with human values has become increasingly important. However, previous research has primarily focused on assessing LLMs' performance in terms of knowledge and reasoning abilities, neglecting their alignment with human values, particularly in the Chinese context.

To address this gap, this paper introduces CValues, the first Chinese human values evaluation benchmark. CValues aims to measure the alignment ability of Chinese LLMs in terms of both safety and responsibility criteria. The researchers manually collected adversarial safety prompts across 10 scenarios and induced responsibility prompts from 8 domains with the help of professional experts. To provide a comprehensive evaluation of Chinese LLMs' values alignment, the study conducts human evaluation for reliable comparison and constructs multi-choice prompts for automatic evaluation.

The findings indicate that while most Chinese LLMs perform well in terms of safety, there is still significant room for improvement regarding responsibility. The study highlights the importance of both automatic and human evaluations in assessing the alignment between LLMs and human values across different aspects. The CValues benchmark and code are available on ModelScope and GitHub.

In summary, this research contributes CValues as a benchmark for evaluating the values alignment of Chinese LLMs. By focusing on safety and responsibility criteria, it identifies where these models can be improved to better align with human values.

Created on 21 Jul. 2023
