Academically intelligent LLMs are not necessarily socially intelligent

AI-generated keywords: Social Intelligence, Large Language Models, Situational Evaluation of Social Intelligence (SESI), Academic Intelligence, Real-World Scenarios

AI-generated Key Points

  • Developed standardized social intelligence test called Situational Evaluation of Social Intelligence (SESI) for assessing large language models (LLMs)
  • LLMs show significant progress in academic intelligence but still have room for improvement in social intelligence
  • Errors in social intelligence mainly due to superficial friendliness
  • Low correlation between social and academic intelligence in LLMs, indicating distinct abilities
  • LLMs' social intelligence is influenced by various social factors, as human social intelligence is
  • SESI benchmark features long, complex, diverse social contexts with an average length of 44.2 words, involving three or more active characters in 50% of situations
  • Wide range of social relationship types included in benchmark, making it challenging
  • SESI assesses various dimensions of social intelligence beyond understanding contexts to achieving characters' social goals
  • Encourages detailed answers with average length of 25.8 words, focusing on substance rather than length
  • Evaluation covered mainstream LLMs such as OpenAI's GPT series, Vicuna, LLaMA 2-Chat, and Mixtral, alongside academic baseline benchmarks that measure knowledge and capabilities
  • Emphasizes need for further development in LLMs' social intelligence and importance of considering both academic and social factors in evaluation

Authors: Ruoxi Xu, Hongyu Lin, Xianpei Han, Le Sun, Yingfei Sun

License: CC BY 4.0

Abstract: The academic intelligence of large language models (LLMs) has made remarkable progress in recent times, but their social intelligence performance remains unclear. Inspired by established human social intelligence frameworks, particularly Daniel Goleman's social intelligence theory, we have developed a standardized social intelligence test based on real-world social scenarios to comprehensively assess the social intelligence of LLMs, termed the Situational Evaluation of Social Intelligence (SESI). We conducted an extensive evaluation with 13 recent popular and state-of-the-art LLM agents on SESI. The results indicate that the social intelligence of LLMs still has significant room for improvement, with superficial friendliness as a primary reason for errors. Moreover, there exists a relatively low correlation between the social intelligence and academic intelligence exhibited by LLMs, suggesting that social intelligence is distinct from academic intelligence for LLMs. Additionally, while it is observed that LLMs can't "understand" what social intelligence is, their social intelligence, similar to that of humans, is influenced by social factors.
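
The "relatively low correlation" claim can be made concrete with a small sketch. The abstract does not say which correlation measure the authors used, so Pearson's r is an assumption here, and the per-model scores below are hypothetical placeholders rather than results from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-model accuracies (one entry per model), NOT from the paper.
academic_scores = [0.40, 0.55, 0.62, 0.70, 0.85]  # e.g. an academic benchmark
social_scores = [0.52, 0.48, 0.61, 0.50, 0.58]    # e.g. a social benchmark

r = pearson(academic_scores, social_scores)
print(f"Pearson r between academic and social scores: {r:.2f}")
```

A value of r near 0 would mean that a model's academic score says little about its social score, which is the pattern the abstract describes.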

Submitted to arXiv on 11 Mar. 2024

Results of the summarizing process for the arXiv paper: 2403.06591v1

In this study, we developed a standardized social intelligence test called the Situational Evaluation of Social Intelligence (SESI) to assess the social intelligence of large language models (LLMs) based on real-world social scenarios. Our evaluation of 13 popular LLM agents on SESI revealed that while these models have made significant progress in academic intelligence, their social intelligence still has room for improvement. Errors in social intelligence were primarily due to superficial friendliness. Additionally, there was a low correlation between social and academic intelligence in LLMs, indicating that they are distinct abilities. Although LLMs do not fully understand what social intelligence is, their social intelligence is influenced by various social factors, as it is in humans.

Further analysis of the SESI benchmark revealed that it features long, complex, and diverse social contexts, with an average length of 44.2 words and three or more active characters in 50% of situations. The benchmark also encompasses a wide range of social relationship types, which adds to its difficulty. SESI assesses several dimensions of social intelligence, going beyond understanding contexts to achieving characters' social goals. Moreover, it encourages detailed and specific answers, with an average length of 25.8 words, longer than in other common-sense reasoning benchmarks. The distribution of correct and incorrect answer lengths suggests that the benchmark rewards substance rather than length.

The evaluation covered mainstream LLMs such as OpenAI's GPT series, Vicuna, LLaMA 2-Chat, and Mixtral, alongside baseline benchmarks such as Natural Questions and Massive Multitask Language Understanding (MMLU), which measure the models' knowledge and capabilities.

Overall, this study highlights the need for further development in the social intelligence of LLMs and emphasizes the importance of considering both academic and social factors in evaluating these models' performance.
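
The benchmark statistics quoted in the summary (average context length in words, share of situations with three or more active characters) are simple to compute once items are in a structured form. The sketch below is only an illustration: the `Item` fields and the toy items are hypothetical and do not reflect the actual SESI data format, and whitespace splitting is an assumed way of counting words.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Item:
    """Hypothetical benchmark item; not the real SESI schema."""
    context: str            # the social situation described in text
    characters: List[str]   # active characters in the situation
    correct_answer: str     # reference answer


def mean_word_count(texts: Iterable[str]) -> float:
    """Average number of whitespace-separated words per text."""
    texts = list(texts)
    if not texts:
        raise ValueError("no texts given")
    return sum(len(t.split()) for t in texts) / len(texts)


def share_with_min_characters(items: List[Item], k: int = 3) -> float:
    """Fraction of items that involve at least k active characters."""
    if not items:
        raise ValueError("no items given")
    return sum(1 for it in items if len(it.characters) >= k) / len(items)


# Toy items invented for illustration only.
toy_items = [
    Item("Anna asks Ben to cover her shift while Carla listens",
         ["Anna", "Ben", "Carla"], "Ben should agree and tell Carla"),
    Item("Dev thanks Eli", ["Dev", "Eli"], "Eli should accept the thanks"),
]

print("mean context length:", mean_word_count(it.context for it in toy_items))
print("mean answer length:", mean_word_count(it.correct_answer for it in toy_items))
print("share with 3+ characters:", share_with_min_characters(toy_items))
```

Running the same functions over correct and incorrect answers separately would give the length comparison that the summary uses to argue the benchmark rewards substance rather than length.
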
Created on 09 Nov. 2024
