From Scattered Sources to Comprehensive Technology Landscape: A Recommendation-based Retrieval Approach

AI-generated keywords: Technology monitoring

AI-generated Key Points

  • End-to-end framework for extracting and classifying technology mentions from company corpora
  • Technology classifier based on a fine-tuned DistilBERT model for better accuracy
  • Recommendation-based retrieval model that returns more relevant results
  • Comparison with a tf-idf based retrieval baseline, which the approach significantly outperforms in both company-company and technology-company retrieval tasks
  • Promising step toward automating technology and company retrieval from raw web data by combining language models with recommendation-based techniques

Authors: Chi Thang Duong, Dimitri Percia David, Ljiljana Dolamic, Alain Mermoud, Vincent Lenders, Karl Aberer

License: CC BY 4.0

Abstract: Mapping the technology landscape is crucial for market actors to make informed investment decisions. However, given the large amount of data on the Web and the resulting information overload, manually retrieving information is an ineffective and incomplete approach. In this work, we propose an end-to-end recommendation-based retrieval approach to support automatic retrieval of technologies and their associated companies from raw Web data. This is a two-task setup involving (i) technology classification of entities extracted from a company corpus, and (ii) technology and company retrieval based on the classified technologies. Our proposed framework approaches the first task by leveraging DistilBERT, a state-of-the-art language model. For the retrieval task, we introduce a recommendation-based retrieval technique that simultaneously supports retrieving related companies, technologies related to a specific company, and companies relevant to a technology. To evaluate these tasks, we also construct a data set that includes company documents and entities extracted from these documents, together with company categories and technology labels. Experiments show that our approach is able to return 4 times more relevant companies while outperforming a traditional retrieval baseline in retrieving technologies.
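
The three retrieval directions named in the abstract (related companies, technologies for a company, companies for a technology) can be illustrated with a small neighbourhood-style recommender over a company-technology table. This is only a sketch under stated assumptions, not the authors' model: the company names, technology labels, and the functions `related_companies`, `recommend_technologies`, and `companies_for_technology` are all hypothetical, and plain cosine similarity over binary technology sets stands in for whatever scoring the paper actually uses.

```python
from math import sqrt

# Hypothetical toy data: company -> technologies assigned by some classifier.
COMPANY_TECH = {
    "AcmeAI": {"deep learning", "computer vision"},
    "VisionWorks": {"computer vision", "lidar"},
    "LedgerCo": {"blockchain", "smart contracts"},
    "ChainLabs": {"blockchain", "deep learning"},
}

def cosine(a, b):
    """Cosine similarity between two binary vectors represented as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))

def related_companies(company, k=3):
    """Rank other companies by the overlap of their technology sets."""
    base = COMPANY_TECH[company]
    scored = [(other, cosine(base, techs))
              for other, techs in COMPANY_TECH.items() if other != company]
    scored = [(name, score) for name, score in scored if score > 0]
    return sorted(scored, key=lambda item: (-item[1], item[0]))[:k]

def recommend_technologies(company, k=3):
    """Score technologies the company lacks by the similarity of neighbours that have them."""
    base = COMPANY_TECH[company]
    scores = {}
    for other, sim in related_companies(company, k=len(COMPANY_TECH)):
        for tech in COMPANY_TECH[other] - base:
            scores[tech] = scores.get(tech, 0.0) + sim
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:k]

def companies_for_technology(tech, k=3):
    """Companies labelled with the technology first, then companies similar to them."""
    holders = {name for name, techs in COMPANY_TECH.items() if tech in techs}
    scores = {name: 1.0 for name in holders}
    for name, techs in COMPANY_TECH.items():
        if name in holders:
            continue
        best = max((cosine(techs, COMPANY_TECH[h]) for h in holders), default=0.0)
        if best > 0:
            scores[name] = best
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:k]

if __name__ == "__main__":
    print(related_companies("AcmeAI"))
    print(recommend_technologies("AcmeAI"))
    print(companies_for_technology("blockchain"))
```

The point of the sketch is that a single company-technology table can serve all three queries at once, which is the property the abstract attributes to the recommendation-based formulation.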

Submitted to arXiv on 09 Dec. 2021

Results of the summarizing process for the arXiv paper: 2112.04810v1

The paper presents an end-to-end framework for extracting and classifying technology mentions from company corpora, and for retrieving related technologies and companies. The technology classifier is based on a DistilBERT model that is fine-tuned and refined to achieve better accuracy, and the recommendation-based retrieval model returns more relevant results.

The authors also discuss existing work in technology monitoring and forecasting, specifically in the context of technology landscape monitoring. They describe keyword-based and entity-based approaches to technology retrieval, citing examples such as automated frameworks for detecting new technologies in texts, measures of proximity between patents and a company's technological footprint, and technology forecasting models based on text mining.

Furthermore, the authors compare their approach with a baseline that uses tf-idf features for both technology classification and retrieval. Their approach significantly outperforms this baseline in both company-company and technology-company retrieval tasks. They attribute the improvement to a better technology classifier built on a language model and to a recommendation-based retrieval model that considers the relationships between companies and technologies, and their similarities, simultaneously.

Overall, the proposed framework shows promise in automating the retrieval of technologies and associated companies from raw web data by leveraging advanced language models and recommendation-based techniques. Future work may include allowing users to reformulate queries to better capture their intentions, or combining technology classification with entity extraction for more accurate results.
Created on 02 Nov. 2024
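
For readers unfamiliar with the kind of tf-idf baseline the summary refers to, the following is a minimal sketch of tf-idf retrieval with cosine similarity. It is not the authors' baseline: the company descriptions, the whitespace tokenizer, the smoothed idf variant, and the function names `build_idf`, `vectorize`, and `rank_companies` are assumptions chosen for illustration.

```python
import math
from collections import Counter

# Hypothetical company descriptions standing in for a company corpus.
DOCS = {
    "AcmeAI": "deep learning for computer vision",
    "VisionWorks": "lidar sensors and vision",
    "LedgerCo": "blockchain smart contracts platform",
}

def build_idf(docs):
    """Return a function computing a smoothed idf for any term (unseen terms have df = 0)."""
    n = len(docs)
    df = Counter(term for text in docs.values() for term in set(text.lower().split()))
    return lambda term: math.log((1 + n) / (1 + df[term])) + 1

def vectorize(text, idf):
    """Map a text to a sparse {term: tf * idf} vector."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    return {term: (count / len(tokens)) * idf(term) for term, count in counts.items()}

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank_companies(query, docs, k=2):
    """Rank companies by tf-idf cosine similarity to the query, dropping zero scores."""
    idf = build_idf(docs)
    query_vec = vectorize(query, idf)
    scored = [(name, cosine(query_vec, vectorize(text, idf))) for name, text in docs.items()]
    scored = [(name, score) for name, score in scored if score > 0]
    return sorted(scored, key=lambda item: (-item[1], item[0]))[:k]

if __name__ == "__main__":
    for name, score in rank_companies("computer vision", DOCS):
        print(f"{name}: {score:.3f}")
```

A baseline of this kind matches only on shared surface terms, so a company described with different wording for the same technology scores zero; the summary credits the language-model classifier and the recommendation-based retrieval with closing that kind of gap.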

