CValues: Measuring the Values of Chinese Large Language Models from Safety to Responsibility

AI-generated keywords: Large Language Models Human Values Safety Responsibility Evaluation

AI-generated Key Points

⚠The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.

Rapid advancement of large language models (LLMs) raises concerns about potential risks and negative social impacts
Evaluating the alignment of LLMs with human values is increasingly important
Previous research has primarily focused on knowledge and reasoning abilities, neglecting alignment with human values in the Chinese context
CValues is introduced as the first Chinese human values evaluation benchmark for LLMs
CValues measures alignment ability in terms of safety and responsibility criteria
Adversarial safety prompts were manually collected across 10 scenarios, responsibility prompts were induced from 8 domains with professional experts' help
Human evaluation and multi-choice prompts are used for comprehensive evaluation of Chinese LLMs' values alignment
Most Chinese LLMs perform well in terms of safety but need improvement regarding responsibility
Automatic and human evaluations are both important in assessing alignment between LLMs and human values
CValues benchmark and code are available on ModelScope and Github platforms.

Also access our AI generated: Comprehensive summary, Lay summary, Blog-like article; or ask questions about this paper to our AI assistant.

Authors: Guohai Xu, Jiayi Liu, Ming Yan, Haotian Xu, Jinghui Si, Zhuoran Zhou, Peng Yi, Xing Gao, Jitao Sang, Rong Zhang, Ji Zhang, Chao Peng, Fei Huang, Jingren Zhou

arXiv: 2307.09705v1 - DOI (cs.CL)

Working in Process

License: NONEXCLUSIVE-DISTRIB 1.0

Abstract: With the rapid evolution of large language models (LLMs), there is a growing concern that they may pose risks or have negative social impacts. Therefore, evaluation of human values alignment is becoming increasingly important. Previous work mainly focuses on assessing the performance of LLMs on certain knowledge and reasoning abilities, while neglecting the alignment to human values, especially in a Chinese context. In this paper, we present CValues, the first Chinese human values evaluation benchmark to measure the alignment ability of LLMs in terms of both safety and responsibility criteria. As a result, we have manually collected adversarial safety prompts across 10 scenarios and induced responsibility prompts from 8 domains by professional experts. To provide a comprehensive values evaluation of Chinese LLMs, we not only conduct human evaluation for reliable comparison, but also construct multi-choice prompts for automatic evaluation. Our findings suggest that while most Chinese LLMs perform well in terms of safety, there is considerable room for improvement in terms of responsibility. Moreover, both the automatic and human evaluation are important for assessing the human values alignment in different aspects. The benchmark and code is available on ModelScope and Github.

Submitted to arXiv on 19 Jul. 2023

Ask questions about this paper to our AI assistant

You can also chat with multiple papers at once here.

⚠The license of the paper does not allow us to build upon its content and the AI assistant only knows about the paper metadata rather than the full article.

AI assistant instructions?

Results of the summarizing process for the arXiv paper: 2307.09705v1

⚠This paper's license doesn't allow us to build upon its content and the summarizing process is here made with the paper's metadata rather than the article.

Comprehensive Summary
Key points
Layman's Summary
Blog article

The rapid advancement of large language models (LLMs) has raised concerns about potential risks and negative social impacts. As a result, evaluating the alignment of LLMs with human values has become increasingly important. However, previous research has primarily focused on assessing LLMs' performance in terms of knowledge and reasoning abilities, neglecting their alignment with human values, particularly in the Chinese context. To address this gap, this paper introduces CValues, the first Chinese human values evaluation benchmark. CValues aims to measure the alignment ability of Chinese LLMs in terms of both safety and responsibility criteria. The researchers manually collected adversarial safety prompts across 10 scenarios and induced responsibility prompts from 8 domains with the help of professional experts. To provide a comprehensive evaluation of Chinese LLMs' values alignment, the study employs both human evaluation for reliable comparison and constructs multi-choice prompts for automatic evaluation. The findings indicate that while most Chinese LLMs perform well in terms of safety, there is still significant room for improvement regarding responsibility. The study highlights the importance of both automatic and human evaluations in assessing the alignment between LLMs and human values across different aspects. The CValues benchmark and code are available on ModelScope and Github platforms. In summary, this research presents an essential contribution to evaluating the values alignment of Chinese LLMs by introducing CValues as a benchmark. By focusing on safety and responsibility criteria, it sheds light on areas where improvements can be made to enhance their alignment with human values.

- Rapid advancement of large language models (LLMs) raises concerns about potential risks and negative social impacts
- Evaluating the alignment of LLMs with human values is increasingly important
- Previous research has primarily focused on knowledge and reasoning abilities, neglecting alignment with human values in the Chinese context
- CValues is introduced as the first Chinese human values evaluation benchmark for LLMs
- CValues measures alignment ability in terms of safety and responsibility criteria
- Adversarial safety prompts were manually collected across 10 scenarios, responsibility prompts were induced from 8 domains with professional experts' help
- Human evaluation and multi-choice prompts are used for comprehensive evaluation of Chinese LLMs' values alignment
- Most Chinese LLMs perform well in terms of safety but need improvement regarding responsibility
- Automatic and human evaluations are both important in assessing alignment between LLMs and human values
- CValues benchmark and code are available on ModelScope and Github platforms.

Key Points1. Language models are getting better at understanding and using language, but this can have some problems. 2. It's important to check if these models understand and follow human values. 3. Previous research on this topic has focused more on knowledge and reasoning, not values in the Chinese context. 4. CValues is a way to test how well Chinese language models align with human values. 5. CValues looks at safety and responsibility to see if the models are doing a good job. Definitions- Language Models: Programs that can understand and use language like humans do. - Alignment: Making sure something matches or fits well with something else. - Human Values: The things that people believe are important or right, like being kind or fair. - Benchmark: A standard or test used to compare things and see how well they perform. - Criteria: The standards or rules used to judge or evaluate something. - Evaluation: Checking how good or effective something is by looking at different aspects of it.

Evaluating the Alignment of Chinese Language Models with Human Values

Introducing CValues

CValues is an evaluation benchmark that was manually created by researchers to assess the alignment between Chinese LLMs and human values across different aspects. The benchmark consists of adversarial safety prompts across 10 scenarios and responsibility prompts from 8 domains which were induced with help from professional experts. In order to provide a comprehensive evaluation of Chinese LLMs' values alignment, both automatic evaluations using multi-choice prompts and manual evaluations were employed for reliable comparison.

Findings

The findings indicate that while most Chinese LLMs perform well in terms of safety, there is still significant room for improvement regarding responsibility. This highlights the importance of both automatic and human evaluations when assessing how well these models align with human values across different aspects. The CValues benchmark code is available on ModelScope and Github platforms for further use by other researchers or developers interested in measuring their model's value alignment capabilities against those used in this study.

Conclusion

In summary, this research presents an essential contribution to evaluating the values alignment of Chinese LLMs by introducing CValues as a benchmark which focuses on safety and responsibility criteria to shed light on areas where improvements can be made to enhance their alignment with human values.

Created on 21 Jul. 2023

Assess the quality of the AI-generated content by voting

Score: 0

The previous summary was created more than a year ago and can be re-run (if necessary) by clicking on the Run button below.

⚠The license of this specific paper does not allow us to build upon its content and the summarizing tools will be run using the paper metadata rather than the full article. However, it still does a good job, and you can also try our tools on papers with more open licenses.

Similar papers summarized with our AI tools

73.7%

Safety Assessment of Chinese Large Language Models

cs.CL

64.2%

From Query Tools to Causal Architects: Harnessing Large Language Models for A…

cs.AI

64.0%

Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Larg…

cs.SE

62.2%

Benchmarking Large Language Models for News Summarization

cs.CL

61.8%

Large language models effectively leverage document-level context for literar…

cs.CL

61.4%

Rethinking the Evaluation for Conversational Recommendation in the Era of Lar…

cs.CL

61.3%

Language Models (Mostly) Know What They Know

cs.CL

Navigate through even more similar papers through a

tree representation

Look for similar papers (in beta version)

By clicking on the button above, our algorithm will scan all papers in our database to find the closest based on the contents of the full papers and not just on metadata. Please note that it only works for papers that we have generated summaries for and you can rerun it from time to time to get a more accurate result while our database grows.

Disclaimer: The AI-based summarization tool and virtual assistant provided on this website may not always provide accurate and complete summaries or responses. We encourage you to carefully review and evaluate the generated content to ensure its quality and relevance to your needs.