What do Asian Religions Have in Common? An Unsupervised Text Analytics Exploration

AI-generated keywords: Sacred texts, Text mining, Similarity measures, Supervised learning algorithms, Bag of Words

AI-generated Key Points

  • The paper explores similarities between various sacred texts using text mining techniques
  • Sacred texts can vary based on factors such as geographical location or the time of the birth of a particular religion
  • Despite differences, there may be similarities in the lessons taught by these texts
  • The study uses Asian texts (Tao Te Ching, Buddhism, Yogasutra, Upanishad) and non-Asian texts (four Bible texts) as the corpus
  • Similarity is measured with Euclidean, Manhattan, Jaccard, and Cosine distances applied to the raw Document Term Matrix (DTM) and the normalized DTM
  • Supervised learning algorithms such as K-Nearest Neighbor (KNN), Support Vector Machine (SVM), and Random Forest are used to predict the correct sacred text for any given chapter in the corpus
  • K-means clustering visualizations on Euclidean distances of raw DTM reveal patterns of similarity among the sacred texts
  • Upanishads and Tao Te Ching are found to be the most similar texts in the corpus
  • The research aims to find similarities between various sacred texts in terms of what they teach and how they teach religious lessons
  • Text mining using machine learning and feature extraction is employed to identify patterns in document collections
  • The same four measures (Euclidean, Manhattan, Jaccard, and Cosine) are used to analyze word frequency matrices and to compute distance matrices on a Document Term Matrix formed by LDA (Latent Dirichlet Allocation)
  • Supervised learning algorithms including KNN, SVM, and Random Forest are trained on a labeled corpus to predict the origin of fragments of spiritual literature, and their effectiveness is measured by prediction accuracy

Authors: Preeti Sah, Ernest Fokoué

18 pages, 22 figures
License: CC BY 4.0

Abstract: The main source of a religion's teachings is its sacred texts, which vary from religion to religion with factors such as the geographical location or the time of the religion's birth. Despite these differences, there could be similarities between the sacred texts in the lessons they teach their followers. This paper attempts to find such similarity using text mining techniques. A corpus of Asian (Tao Te Ching, Buddhism, Yogasutra, Upanishad) and non-Asian (four Bible texts) sacred texts is used to explore similarity measures such as Euclidean, Manhattan, Jaccard, and Cosine on the raw Document Term Matrix [DTM] and the normalized DTM, which reveal similarity based on word usage. The performance of supervised learning algorithms such as K-Nearest Neighbor [KNN], Support Vector Machine [SVM], and Random Forest is measured by their accuracy in predicting the correct sacred text for any given chapter in the corpus. K-means clustering visualizations on Euclidean distances of the raw DTM reveal a pattern of similarity among these sacred texts, with the Upanishads and the Tao Te Ching being the most similar texts in the corpus.
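
The four similarity measures named in the abstract can be illustrated with a minimal sketch in Python. This is not the authors' code: the whitespace tokenizer, the helper names (`build_dtm`, `normalize`, and the distance functions), and the three toy "chapters" are all assumptions made for illustration, and the paper's own preprocessing may differ.

```python
import math
from collections import Counter


def build_dtm(docs):
    """Return (vocabulary, rows) where rows[i][j] counts term j in document i."""
    tokenized = [doc.lower().split() for doc in docs]  # assumed: naive tokenizer
    vocab = sorted({tok for toks in tokenized for tok in toks})
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        rows.append([counts[term] for term in vocab])
    return vocab, rows


def normalize(row):
    """Scale raw counts to relative frequencies (one possible normalization)."""
    total = sum(row)
    return [c / total for c in row] if total else list(row)


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def jaccard_distance(a, b):
    """1 - |shared terms| / |terms in either document|, using presence only."""
    in_a = {i for i, x in enumerate(a) if x > 0}
    in_b = {i for i, x in enumerate(b) if x > 0}
    union = in_a | in_b
    return 1.0 - len(in_a & in_b) / len(union) if union else 0.0


def cosine_distance(a, b):
    """1 - cosine similarity; defined as 1.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


# Toy "chapters" invented for illustration; not taken from the paper's corpus.
docs = [
    "the way that can be named is not the eternal way",
    "the self is the eternal way of the wise",
    "blessed are the meek for they shall inherit the earth",
]
vocab, dtm = build_dtm(docs)
measures = [euclidean, manhattan, jaccard_distance, cosine_distance]
for fn in measures:
    print(fn.__name__, fn(dtm[0], dtm[1]), fn(dtm[0], dtm[2]))
```

On this toy input the first two chapters come out closer to each other than the first and third under all four measures. Passing `normalize(row)` instead of the raw rows gives the normalized-DTM variant, which reduces the influence of chapter length on Euclidean and Manhattan distances.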

Submitted to arXiv on 20 Dec. 2019

Results of the summarizing process for the arXiv paper: 1912.10847v1

This paper explores the similarities between various sacred texts using text mining techniques. The main source of religious teachings is their sacred texts, which can vary with factors such as geographical location or the time of the birth of a particular religion. Despite these differences, there may be similarities in the lessons these texts teach. The corpus consists of Asian texts (Tao Te Ching, Buddhism, Yogasutra, Upanishad) and non-Asian texts (four Bible texts). To measure similarity, Euclidean, Manhattan, Jaccard, and Cosine distances are applied to the raw Document Term Matrix (DTM) and the normalized DTM; these measures reveal similarity based on word usage. Supervised learning algorithms such as K-Nearest Neighbor (KNN), Support Vector Machine (SVM), and Random Forest are then used to predict the correct sacred text for any given chapter in the corpus. The study also includes K-means clustering visualizations on Euclidean distances of the raw DTM, which reveal a pattern of similarity among the sacred texts, with the Upanishads and the Tao Te Ching being the most similar texts in the corpus. The research asks whether the sacred texts are similar in what they teach and in how they teach religious lessons. Text mining with machine learning and feature extraction is used to identify patterns in the document collection. Besides the word frequency matrices, the same distance measures are computed on a Document Term Matrix formed by LDA (Latent Dirichlet Allocation), so that similarity between texts can also be assessed through a probabilistic topic model.
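
The K-means step mentioned above can be sketched as follows. This is a hedged illustration rather than the paper's procedure: it runs plain Lloyd's algorithm directly on raw DTM rows with Euclidean distance, the fixed starting rows (`init_indices`) are an assumption chosen to make the run deterministic, and the count matrix is hypothetical data, not the paper's corpus.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(rows, init_indices, max_iter=100):
    """Lloyd's algorithm; centroids start at the rows named by init_indices."""
    centroids = [list(rows[i]) for i in init_indices]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each document joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda c: euclidean(row, centroids[c]))
            for row in rows
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical raw term counts: 5 chapters (rows) over 4 terms (columns).
dtm = [
    [4, 1, 0, 0],
    [5, 0, 1, 0],
    [3, 2, 0, 0],
    [0, 0, 4, 3],
    [0, 1, 5, 2],
]
labels, centroids = kmeans(dtm, init_indices=[0, 3])
print(labels)
```

Chapters that share a label use similar vocabulary in similar proportions; in a study like this one, seeing chapters from two different books land in the same cluster is the kind of pattern that would suggest closeness between those books.
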
The paper also evaluates supervised learning algorithms, including K-Nearest Neighbor (KNN), Support Vector Machine (SVM), and Random Forest, which are trained on a labeled corpus to predict which sacred text a given chapter comes from; their effectiveness is measured by prediction accuracy. Overall, the research builds a corpus in which the document is the smallest unit of data. After data cleaning, a Bag of Words DTM is generated and analyzed through the similarity measures and the predictive models, in order to confirm or discover closeness among the sacred texts.
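
As a rough illustration of the prediction task, the sketch below classifies a chapter by K-Nearest Neighbor on DTM rows and estimates accuracy with leave-one-out evaluation. Several things here are assumptions: the summary does not say how the paper splits training and test data, so leave-one-out is a stand-in; the labels `"text_a"` and `"text_b"` and the count matrix are hypothetical; and SVM and Random Forest are not shown.

```python
import math
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_rows, train_labels, query, k=3):
    """Majority vote among the k training rows closest to the query row."""
    order = sorted(
        range(len(train_rows)), key=lambda i: euclidean(train_rows[i], query)
    )
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(rows, labels, k=3):
    """Hold out each chapter in turn, predict it from the rest, and score."""
    correct = 0
    for i in range(len(rows)):
        train_rows = rows[:i] + rows[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        if knn_predict(train_rows, train_labels, rows[i], k) == labels[i]:
            correct += 1
    return correct / len(rows)


# Hypothetical raw term counts and book labels, one row per chapter.
dtm = [
    [4, 1, 0, 0],
    [5, 0, 1, 0],
    [3, 2, 0, 0],
    [0, 0, 4, 3],
    [0, 1, 5, 2],
    [1, 0, 4, 4],
]
books = ["text_a", "text_a", "text_a", "text_b", "text_b", "text_b"]

print(leave_one_out_accuracy(dtm, books, k=3))
print(knn_predict(dtm, books, [4, 0, 0, 1], k=3))
```

The toy classes are cleanly separated, so accuracy here says nothing about the real corpus. On real chapters, the misclassified ones are often the more interesting result, since they point to books whose vocabulary overlaps.
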
Created on 01 Oct. 2023
