What do Asian Religions Have in Common? An Unsupervised Text Analytics Exploration

AI-generated keywords: Sacred texts, Text mining, Similarity measures, Supervised learning algorithms, Bag of Words

AI-generated Key Points

- The paper explores similarities between various sacred texts using text mining techniques
- Sacred texts can vary based on factors such as geographical location or the time of birth of a particular religion
- Despite differences, there may be similarities in the lessons taught by these texts
- The study uses Asian texts (Tao Te Ching, Buddhism, Yogasutra, Upanishad) and non-Asian texts (four Bible texts) as the corpus
- Similarity is measured with the Euclidean, Manhattan, Jaccard, and Cosine measures applied to the raw Document Term Matrix (DTM) and the normalized DTM
- Supervised learning algorithms such as K-Nearest Neighbor (KNN), Support Vector Machine (SVM), and Random Forest are used to predict the correct sacred text for any given chapter in the corpus
- K-means clustering visualizations on Euclidean distances of the raw DTM reveal patterns of similarity among the sacred texts
- Upanishads and Tao Te Ching are found to be the most similar texts in the corpus
- The research aims to find similarities between various sacred texts in terms of what they teach and how they teach religious lessons
- Text mining using machine learning and feature extraction is employed to identify patterns in document collections
- The same similarity measures are used to analyze word frequency matrices and to calculate distance matrices on a Document Term Matrix formed by LDA (Latent Dirichlet Allocation)
- The supervised learning algorithms are trained on a labeled corpus to predict the origin of fragments of spiritual literature, with accuracy as the measure of effectiveness


Authors: Preeti Sah, Ernest Fokoué

arXiv: 1912.10847v1 (cs.CL)

18 pages, 22 figures

License: CC BY 4.0

Abstract: The main source of religious teachings is sacred texts, which vary from religion to religion based on factors such as the geographical location or the time of birth of a particular religion. Despite these differences, there could be similarities between sacred texts in the lessons they teach their followers. This paper attempts to find such similarity using text mining techniques. A corpus consisting of Asian (Tao Te Ching, Buddhism, Yogasutra, Upanishad) and non-Asian (four Bible texts) sources is used to explore similarity measures such as Euclidean, Manhattan, Jaccard and Cosine on the raw Document Term Matrix [DTM] and the normalized DTM, which reveal similarity based on word usage. The performance of supervised learning algorithms such as K-Nearest Neighbor [KNN], Support Vector Machine [SVM] and Random Forest is measured by their accuracy in predicting the correct sacred text for any given chapter in the corpus. K-means clustering visualizations on Euclidean distances of the raw DTM reveal a pattern of similarity among these sacred texts, with Upanishads and Tao Te Ching being the most similar texts in the corpus.

Submitted to arXiv on 20 Dec. 2019

Results of the summarizing process for the arXiv paper: 1912.10847v1

This paper explores the similarities between various sacred texts using text mining techniques. Sacred texts are the main source of religious teachings, and they vary with factors such as geographical location or the time of birth of a particular religion. Despite these differences, the lessons they teach may be similar. The corpus consists of Asian texts (Tao Te Ching, Buddhism, Yogasutra, Upanishad) and non-Asian texts (four Bible texts). To measure similarity based on word usage, the Euclidean, Manhattan, Jaccard, and Cosine measures are applied to the raw Document Term Matrix (DTM) and the normalized DTM. Distance matrices are also calculated on a Document Term Matrix formed by LDA (Latent Dirichlet Allocation), which compares texts through a probabilistic model. K-means clustering visualizations on Euclidean distances of the raw DTM reveal a pattern of similarity among the sacred texts, with Upanishads and Tao Te Ching being the most similar texts in the corpus. The research aims to find whether sacred texts are similar in what they teach and how they teach religious lessons; text mining, using machine learning and feature extraction, is employed to identify such patterns in document collections.
The paper also explores supervised learning algorithms, including K-Nearest Neighbor (KNN), Support Vector Machine (SVM), and Random Forest, which are trained on a labeled corpus to predict which sacred text a given chapter comes from, with accuracy as the measure of effectiveness. Overall, the research builds a corpus in which a document is the smallest unit of data, cleans it, generates a Bag of Words DTM, and then applies the similarity measures and predictive models to confirm or discover closeness among sacred texts.
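The K-means step mentioned in the summary above can be illustrated with a minimal sketch. This is not the authors' code: the toy count vectors are invented, and the choice to start centroids at the first k rows is a simplifying assumption made so that the example is deterministic. It shows plain Lloyd's algorithm with Euclidean distance on DTM-like rows.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm. Centroids start at the first k points (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: euclidean(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = mean(members)
    return labels, centroids

# Hypothetical term-count rows, one per document (not from the paper's corpus).
points = [[5, 0, 1], [4, 1, 0], [0, 6, 2], [1, 5, 3]]
labels, centroids = kmeans(points, k=2)
print(labels)
```

With real data the result depends on initialization, so a random or k-means++ start with several restarts would be the more usual choice.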

This paper is about finding similarities between different sacred texts using a computer program. Sacred texts are important religious books. The researchers used Asian and non-Asian texts to study. They measured the similarity using different methods and used computer programs to predict which text a chapter belongs to. They found that Upanishads and Tao Te Ching are very similar. The goal of the research is to understand what these texts teach and how they teach it. They used special techniques to analyze the words in the texts and trained computers to make predictions.

Definitions:
- Sacred texts: Important religious books
- Corpus: A collection of texts used for study
- Similarity: How much two things are alike
- Measures: Different ways of measuring something
- Algorithms: Step-by-step instructions for a computer program
- Clustering visualizations: Showing patterns in groups of things

Exploring Similarities Between Sacred Texts Using Text Mining Techniques

Religious teachings are often found in sacred texts, which can vary based on geographical location or the time of the birth of a particular religion. Despite these differences, there may be similarities in the lessons taught by these texts. This paper explores how text mining techniques can be used to identify patterns and measure similarity between various sacred texts.

Data Collection and Preprocessing

The corpus used for this study consists of Asian texts (Tao Te Ching, Buddhism, Yogasutra, Upanishad) and non-Asian texts (four Bible texts). The data was preprocessed to remove punctuation marks and stopwords and then converted into a Bag of Words Document Term Matrix (DTM). Latent Dirichlet Allocation (LDA) was also applied to form a second matrix for analysis.
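As a rough illustration of the preprocessing described above, the sketch below builds a Bag of Words DTM from raw strings. Everything specific in it is an assumption made for the example: the short stopword list, the regex tokenizer, the two toy sentences, and row-sum normalization as the meaning of "normalized DTM". The paper's actual tooling and settings are not stated on this page.

```python
import re
from collections import Counter

# Hypothetical stopword list for illustration only.
STOPWORDS = {"the", "a", "an", "of", "and", "is", "to", "in", "that", "it"}

def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOPWORDS]

def build_dtm(documents):
    """Return (vocab, matrix); matrix[i][j] is the count of vocab[j] in document i."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix

def normalize_rows(matrix):
    """Divide each row by its total so rows hold relative term frequencies."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([v / total if total else 0.0 for v in row])
    return normalized

# Toy documents invented for this example (not quoted from the corpus).
docs = [
    "The way that can be named is not the eternal way.",
    "The self is the eternal witness.",
]
vocab, dtm = build_dtm(docs)
norm_dtm = normalize_rows(dtm)
print(vocab)
print(dtm)
```

Each row of `dtm` corresponds to one document (here, one sentence; in the study, one chapter), and each column to one vocabulary term.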

Similarity Measures

The Euclidean, Manhattan, Jaccard, and Cosine measures were applied to the raw DTM and the normalized DTM to measure similarity between documents based on word usage. K-means clustering visualizations generated from Euclidean distances of the raw DTM revealed a pattern of similarity among the sacred texts, with Upanishads and Tao Te Ching being the most similar texts in the corpus.
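The four measures named above can be written out in a few lines each. The sketch below is one plausible reading, not the paper's implementation: in particular, Jaccard is computed here on term presence or absence, and the handling of all-zero rows is a convention chosen for the example. The toy rows are hypothetical.

```python
import math

def euclidean(u, v):
    """Straight-line distance between two count vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def manhattan(u, v):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(u, v))

def cosine_distance(u, v):
    """One minus the cosine of the angle between the vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # convention chosen here for empty documents
    return 1.0 - dot / (norm_u * norm_v)

def jaccard_distance(u, v):
    """One minus (shared terms / all terms), using presence or absence only."""
    a = {i for i, x in enumerate(u) if x > 0}
    b = {i for i, x in enumerate(v) if x > 0}
    union = a | b
    if not union:
        return 0.0  # two empty documents are treated as identical
    return 1.0 - len(a & b) / len(union)

def distance_matrix(rows, metric):
    """Pairwise distances between all rows of a DTM."""
    return [[metric(r, s) for s in rows] for r in rows]

# Hypothetical term-count rows, one per document.
rows = [[2, 0, 1], [1, 1, 0], [0, 3, 1]]
for name, metric in [("euclidean", euclidean), ("manhattan", manhattan),
                     ("cosine", cosine_distance), ("jaccard", jaccard_distance)]:
    print(name, distance_matrix(rows, metric))
```

Euclidean and Manhattan distances are sensitive to document length, while cosine distance depends only on direction, which is one reason results on the raw and normalized DTM can differ.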

Supervised Learning Algorithms

In addition to measuring similarity between documents using distance matrices from LDA models, supervised learning algorithms such as K-Nearest Neighbor (KNN), Support Vector Machine (SVM), and Random Forest were used to predict the correct origin for any given chapter in the corpus. Prediction accuracy was used to measure each algorithm's effectiveness.
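To make the prediction setup concrete, here is a minimal K-Nearest Neighbor sketch with an accuracy score. It is illustrative only: the feature rows, the labels `text_A` and `text_B`, the train/test split, and the choice of Euclidean distance with k = 3 are all assumptions for the example, and SVM and Random Forest are not shown.

```python
import math
from collections import Counter

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training rows."""
    nearest = sorted(range(len(train_X)),
                     key=lambda i: euclidean(train_X[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

def accuracy(train_X, train_y, test_X, test_y, k=3):
    """Fraction of test rows whose predicted label matches the true label."""
    correct = sum(
        knn_predict(train_X, train_y, x, k) == y
        for x, y in zip(test_X, test_y)
    )
    return correct / len(test_y)

# Hypothetical DTM rows for chapters of two made-up texts.
train_X = [[5, 0, 1], [4, 1, 0], [6, 1, 1], [0, 6, 2], [1, 5, 3], [0, 4, 2]]
train_y = ["text_A", "text_A", "text_A", "text_B", "text_B", "text_B"]
test_X = [[5, 1, 0], [1, 6, 2]]
test_y = ["text_A", "text_B"]

predictions = [knn_predict(train_X, train_y, x) for x in test_X]
score = accuracy(train_X, train_y, test_X, test_y)
print(predictions, score)
```

In the study each row would be a chapter's word-count vector and each label the sacred text it belongs to; the toy data here is cleanly separated, so it says nothing about how hard the real task is.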

Conclusion

This research created a corpus in which a document is the smallest unit of data, cleaned it, and generated a Bag of Words DTM for analysis with various similarity measures and predictive models, so as to confirm or discover closeness among sacred texts. The results suggest that similarities between religious teachings across cultures can be detected when the texts are analyzed with text mining techniques: feature extraction combined with machine learning algorithms such as KNN, SVM, and Random Forest, along with distance metrics such as Euclidean, Manhattan, Jaccard, and Cosine.

Created on 01 Oct. 2023
