Cross-modal Memory Networks for Radiology Report Generation

AI-generated keywords: Medical imaging, Radiology report generation, Cross-modal Memory Networks (CMN), Clinical automation, Artificial intelligence

AI-generated Key Points

The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.

  • Medical imaging is crucial for medical diagnosis in clinical practice
  • Text reports of images are vital for understanding and guiding treatments
  • Automating report generation helps radiologists and advances clinical automation
  • Explicitly exploiting cross-modal mappings can improve radiology report generation
  • The Cross-modal Memory Networks (CMN) framework enhances the encoder-decoder model
  • A shared memory in CMN records the alignment between images and texts
  • CMN achieves state-of-the-art performance on two benchmark datasets, IU X-Ray and MIMIC-CXR
  • Better image-text alignment helps CMN generate reports that are more accurate in terms of clinical indicators

Authors: Zhihong Chen, Yaling Shen, Yan Song, Xiang Wan

Natural Language Processing. 11 pages, 6 figures. ACL-IJCNLP 2021

Abstract: Medical imaging plays a significant role in the clinical practice of medical diagnosis, where the text reports of the images are essential for understanding them and facilitating later treatments. Generating these reports automatically can help lighten the burden on radiologists and significantly promote clinical automation, which already attracts much attention in applying artificial intelligence to the medical domain. Previous studies mainly follow the encoder-decoder paradigm and focus on the aspect of text generation, with few studies considering the importance of cross-modal mappings and explicitly exploiting such mappings to facilitate radiology report generation. In this paper, we propose cross-modal memory networks (CMN) to enhance the encoder-decoder framework for radiology report generation, where a shared memory is designed to record the alignment between images and texts so as to facilitate the interaction and generation across modalities. Experimental results illustrate the effectiveness of our proposed model, where state-of-the-art performance is achieved on two widely used benchmark datasets, i.e., IU X-Ray and MIMIC-CXR. Further analyses also prove that our model is able to better align information from radiology images and texts so as to help generate more accurate reports in terms of clinical indicators.

Submitted to arXiv on 28 Apr. 2022
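The abstract describes a shared memory that both modalities read from. The Python sketch below illustrates one way such a memory could work: image-patch features and report-token embeddings each query the same matrix of memory slots, the top-k most similar slots are selected by scaled dot-product scoring, and their softmax-weighted values form the response. Everything here is an illustrative assumption rather than the authors' implementation: the class name `SharedMemory`, random initialisation in place of training, square projection matrices, top-k selection, and the parameter values. How the responses are fed back into the encoder and decoder is not shown.

```python
import math
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def matvec(w, x):
    """Multiply matrix w (list of rows) by vector x."""
    return [dot(row, x) for row in w]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class SharedMemory:
    """Hypothetical cross-modal memory: one slot matrix queried by both modalities.

    Assumptions (not taken from the paper): random initialisation instead of
    learned parameters, square projections, scaled dot-product scoring and
    top-k selection of memory slots.
    """

    def __init__(self, num_slots, dim, top_k, seed=0):
        rng = random.Random(seed)

        def rand_matrix(rows, cols):
            return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

        self.dim = dim
        self.top_k = top_k
        self.memory = rand_matrix(num_slots, dim)  # memory slots (would be learned)
        self.w_q = rand_matrix(dim, dim)  # projects a feature to a query
        self.w_k = rand_matrix(dim, dim)  # projects a memory slot to a key
        self.w_v = rand_matrix(dim, dim)  # projects a memory slot to a value

    def respond(self, feature):
        """Return a memory response for one image-patch or text-token feature."""
        q = matvec(self.w_q, feature)
        scale = math.sqrt(self.dim)
        scores = [dot(q, matvec(self.w_k, slot)) / scale for slot in self.memory]
        # keep only the top_k most similar slots
        top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[: self.top_k]
        weights = softmax([scores[i] for i in top])
        values = [matvec(self.w_v, self.memory[i]) for i in top]
        # softmax-weighted sum of the selected value vectors
        return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(self.dim)]


memory = SharedMemory(num_slots=16, dim=4, top_k=3)
image_patches = [[0.2, -0.1, 0.5, 0.3], [0.9, 0.0, -0.4, 0.1]]  # toy visual features
report_tokens = [[0.1, 0.4, -0.2, 0.7]]  # toy token embeddings
image_responses = [memory.respond(f) for f in image_patches]
text_responses = [memory.respond(t) for t in report_tokens]
```

Because both lists of responses are read from the same `memory.memory` slots, a slot that scores highly for an image region and for a report phrase acts as a shared anchor between the two modalities, which is the intuition behind recording image-text alignment in one memory.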

Results of the summarizing process for the arXiv paper: 2204.13258v1

This paper's license does not allow us to build upon its content, so the summary below is based on the paper's metadata rather than the full article.

Medical imaging is a crucial component of medical diagnosis in clinical practice, and the text reports that accompany images are vital for interpreting them and guiding subsequent treatment. Automating the generation of these reports can lighten the workload of radiologists and advance clinical automation, an area attracting growing attention in the application of artificial intelligence to healthcare.

Previous studies have mainly followed the encoder-decoder paradigm and concentrated on the text-generation side, while few have explicitly exploited cross-modal mappings between images and reports. To address this gap, the paper introduces Cross-modal Memory Networks (CMN), which extend the encoder-decoder framework with a shared memory that records the alignment between images and texts. This shared memory supports interaction and generation across the two modalities, with the aim of producing more accurate reports.

Experiments show that CMN achieves state-of-the-art performance on two widely used benchmark datasets, IU X-Ray and MIMIC-CXR. Further analysis indicates that the model aligns information from radiology images and texts more effectively, which helps it generate reports that are more accurate with respect to clinical indicators. The work, by Zhihong Chen, Yaling Shen, Yan Song, and Xiang Wan, contributes to radiology report generation by showing how a cross-modal memory can improve report quality.
Created on 06 Mar. 2026
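The summary mentions accuracy "with respect to clinical indicators" without defining how it is measured. In radiology report generation, such indicators are commonly scored as clinical efficacy metrics: observation labels (for example, the 14 CheXpert observations) are extracted from the generated and reference reports and compared with precision, recall and F1. The sketch below assumes that setup with micro-averaging; the exact labeler and averaging used in the paper are not stated on this page, and the function name `clinical_efficacy` and the toy labels are hypothetical.

```python
def clinical_efficacy(predicted, reference):
    """Micro-averaged precision, recall and F1 over positive observation labels.

    predicted / reference: one set of positive observation names per report.
    Extracting these labels from free text (e.g. with a rule-based labeler)
    is assumed to happen upstream and is not shown here.
    """
    tp = fp = fn = 0
    for pred, ref in zip(predicted, reference):
        tp += len(pred & ref)  # observations present in both reports
        fp += len(pred - ref)  # observations only in the generated report
        fn += len(ref - pred)  # observations the generated report missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


generated = [{"Cardiomegaly", "Edema"}, {"Pleural Effusion"}]
ground_truth = [{"Cardiomegaly"}, {"Pleural Effusion", "Atelectasis"}]
print(clinical_efficacy(generated, ground_truth))
```

In this toy case two of the three predicted observations are correct and two of the three reference observations are recovered, so precision and recall are both 2/3, and so is F1.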
