Intent Mining from past conversations for Conversational Agent
AI-generated Key Points
- Framework for intent mining from past conversations to improve conversational systems
- Chatbots used for round-the-clock support and customer engagement
- Challenges in building and training intent models due to lack of diverse training data
- Four-step intent discovery framework proposed by the authors
- Extract textual utterances using a pre-trained Dialog Act Classifier
- Automatically cluster similar user utterances using the ITER-DBSCAN algorithm
- Manual annotation of clusters with intent labels by subject matter experts
- Propagation of intent labels to unmapped utterances
- Introduces an improved sentence representation method based on the Universal Sentence Encoder
- Effectiveness demonstrated using Microsoft application IT support data and publicly available datasets
- User studies indicate the framework improves intent coverage and accuracy and saves time compared with manual annotation
- Potential applications beyond conversational systems in short text clustering and labeling tasks
- Future work includes scalability for larger datasets and exploration in general-purpose clustering tasks
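
The clustering step in the key points above can be illustrated with a small sketch. This is not the authors' ITER-DBSCAN implementation: it only assumes the general idea of re-running a density-based clustering on the utterances left as noise, with a looser distance threshold and a smaller minimum cluster size each round, so that rare intents in unbalanced data still form clusters. The function names (`embed`, `dbscan`, `iterative_dbscan`), the parameter schedule, and the toy bag-of-words vectors (a stand-in for a real sentence encoder such as the Universal Sentence Encoder) are all hypothetical.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words vector; an assumed stand-in for a sentence encoder."""
    return Counter(text.lower().split())

def cosine_distance(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return 1.0 - dot / norm if norm else 1.0

def dbscan(vectors, eps, min_samples):
    """Plain DBSCAN; returns one label per vector, with -1 meaning noise."""
    n = len(vectors)
    neighbors = [[j for j in range(n)
                  if cosine_distance(vectors[i], vectors[j]) <= eps]
                 for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # j is a core point: keep expanding
    return labels

def iterative_dbscan(texts, eps=0.3, min_samples=3, eps_step=0.1, max_rounds=3):
    """Re-cluster leftover noise with relaxed parameters (assumed schedule)."""
    vectors = [embed(t) for t in texts]
    labels = [-1] * len(texts)
    remaining = list(range(len(texts)))
    next_id = 0
    for _ in range(max_rounds):
        if not remaining or min_samples < 2:
            break
        round_labels = dbscan([vectors[i] for i in remaining], eps, min_samples)
        for idx, lab in zip(remaining, round_labels):
            if lab != -1:
                labels[idx] = next_id + lab
        next_id += max(round_labels) + 1
        remaining = [i for i in remaining if labels[i] == -1]
        eps += eps_step       # loosen the distance threshold
        min_samples -= 1      # accept smaller clusters next round
    return labels

utterances = [
    "reset my password", "please reset my password",
    "reset my password now", "reset password",
    "vpn not connecting", "vpn is not connecting",
    "order pizza tonight",
]
print(iterative_dbscan(utterances))  # [0, 0, 0, 0, 1, 1, -1]
```

In this toy run the frequent password intent is found in the first round, the rarer VPN intent only after the parameters are relaxed, and the unrelated utterance stays unclustered (`-1`) for a later labeling or propagation step.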
Authors: Ajay Chatterjee, Shubhashis Sengupta
Abstract: Conversational systems are of primary interest in the AI community. Chatbots are increasingly being deployed to provide round-the-clock support and to increase customer engagement. Many commercial bot-building frameworks follow a standard approach that requires one to build and train an intent model to recognize user input. Intent models are trained in a supervised setting with a collection of textual utterance and intent label pairs. Gathering a substantial volume of training data with wide coverage of different intents is a bottleneck in the bot-building process. Moreover, labeling hundreds to thousands of conversations with intents is time-consuming and laborious. In this paper, we present an intent discovery framework that generates intent training data from raw conversations in four primary steps: extraction of textual utterances from a conversation using a pre-trained, domain-agnostic Dialog Act Classifier (Data Extraction); automatic clustering of similar user utterances (Clustering); manual annotation of clusters with an intent label (Labeling); and propagation of intent labels to the remaining utterances that were not mapped to any cluster (Label Propagation). We introduce a novel density-based clustering algorithm, ITER-DBSCAN, for clustering unbalanced data. Subject matter experts (annotators with domain expertise) manually review the clustered user utterances and provide an intent label for each discovered cluster. We conducted user studies to validate the effectiveness of the resulting trained intent model in terms of intent coverage, accuracy, and time saved relative to manual annotation. Although the system was developed for building intent models for conversational systems, the framework can also be used for short text clustering or as a labeling framework.
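
The Label Propagation step described in the abstract can be sketched as follows. The abstract does not say how labels are propagated, so this example swaps in a simple nearest-neighbour rule as an assumption: each unclustered utterance takes the intent of its most similar labeled utterance, provided the similarity clears a threshold. The names `propagate_labels` and `min_similarity`, the threshold value, and the bag-of-words vectors are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words vector; an assumed stand-in for a sentence encoder."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def propagate_labels(labeled, unlabeled, min_similarity=0.5):
    """Give each unlabeled utterance the intent of its nearest labeled one.

    Utterances whose best match falls below min_similarity stay unlabeled
    (None), so they can be reviewed by an annotator instead.
    """
    labeled_vectors = [(embed(text), intent) for text, intent in labeled]
    result = {}
    for text in unlabeled:
        vec = embed(text)
        best_sim, best_intent = max(
            ((cosine_similarity(vec, lv), intent)
             for lv, intent in labeled_vectors),
            default=(0.0, None),
        )
        result[text] = best_intent if best_sim >= min_similarity else None
    return result

# Intent labels as an annotator might have assigned them to clusters.
labeled = [
    ("reset my password", "password_reset"),
    ("please reset my password", "password_reset"),
    ("vpn not connecting", "vpn_issue"),
]
unlabeled = ["password reset needed", "vpn connecting problem", "order pizza tonight"]
propagated = propagate_labels(labeled, unlabeled)
```

Keeping a threshold means an utterance with no close match is left unlabeled rather than forced into an existing intent, which matters when the leftover utterances may belong to intents that were never clustered.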