Testing for differential abundance in compositional counts data, with application to microbiome studies
AI-generated Key Points
⚠ The license of the paper does not allow us to build upon its content, so the key points are generated from the paper metadata rather than the full article.
- The study aims to identify which taxa differ in abundance across groups in a microbiome community
- Relative frequencies of taxa are measured for each sampled unit by sequencing PCR amplicons
- Statistical inference is challenging because taxa far outnumber sampled units, some taxa have low prevalence, and taxa are strongly correlated
- The data are compositional and sparse, with zero counts arising from biological variance or from limited sequencing depth (technical zeros)
- The authors propose a differential abundance test built on a set of non-differentially abundant reference taxa, together with a data-adaptive method for identifying that set
- Existing methods, including those designed for compositionality, do not control the rate of false positive discoveries when the change in microbial load is large, and methods that use microbial load measurements do not give valid inference because the measured load cannot adjust for technical zeros
- The proposed approach targets these limitations for compositional count data with a non-negligible amount of zeros
Authors: Barak Brill, Amnon Amir, Ruth Heller
Abstract: To identify which taxa differ in the microbiome community across groups, the relative frequencies of the taxa are measured for each unit in the group by sequencing PCR amplicons. Statistical inference in this setting is challenging due to the high number of taxa compared to sampled units, the low prevalence of some taxa, and strong correlations between the different taxa. Moreover, the total number of sequenced reads per sample is limited by the sequencing procedure. Thus, the data is compositional: a change in one taxon's abundance in the community induces a change in sequenced counts across all taxa. The data is also sparse, with zero counts present either due to biological variance or due to limited sequencing depth (a technical zero). For low-abundance taxa, the chance of a technical zero is non-negligible and varies between sample groups. Compositional counts data poses a problem for standard normalization techniques, since technical zeros cannot be normalized in a way that ensures equality of taxon distributions across sample groups. This problem is aggravated in settings where the condition studied severely affects the microbial load of the host. We introduce a novel approach for differential abundance testing of compositional data with a non-negligible amount of zeros. Our approach uses a set of reference taxa, which are non-differentially abundant. We suggest a data-adaptive approach for identifying a set of reference taxa from the data. We demonstrate that existing methods for differential abundance testing, including methods designed to address compositionality, do not provide control over the rate of false positive discoveries when the change in microbial load is vast. We demonstrate that methods using microbial load measurements do not provide valid inference, since the microbial load measured cannot adjust for technical zeros.
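The compositionality and technical-zero effects described in the abstract can be illustrated with a small numerical sketch. This is not the authors' testing procedure, which the metadata does not specify. The taxon names, abundances, sequencing depth, and helper functions below are all hypothetical, and the zero-count probability assumes simple multinomial sampling of reads. The sketch shows that when only taxon A changes, the relative frequency of every other taxon shifts, the chance of a zero count for the rare taxon D differs between groups, and a ratio taken against unchanged reference taxa stays stable.

```python
# Illustrative sketch only: hypothetical taxa and abundances, not the paper's method.

def relative_frequencies(abundances):
    """Convert absolute abundances to relative frequencies (they sum to 1)."""
    total = sum(abundances.values())
    return {taxon: value / total for taxon, value in abundances.items()}

def zero_count_probability(rel_freq, depth):
    """Chance of observing zero reads for a taxon with relative frequency
    `rel_freq` among `depth` reads, ASSUMING multinomial sampling of reads."""
    return (1.0 - rel_freq) ** depth

def ratio_to_reference(abundances, taxon, reference):
    """Abundance of `taxon` divided by the total abundance of the reference set."""
    reference_total = sum(abundances[name] for name in reference)
    return abundances[taxon] / reference_total

# Hypothetical absolute abundances in two groups.
# Only taxon A differs (a tenfold increase); B, C and D are unchanged.
group_1 = {"A": 1000.0, "B": 500.0, "C": 500.0, "D": 5.0}
group_2 = dict(group_1, A=10000.0)

freq_1 = relative_frequencies(group_1)
freq_2 = relative_frequencies(group_2)

# Compositionality: B's relative frequency drops although B itself did not change.
b_shift = (freq_1["B"], freq_2["B"])

# Technical zeros: at a fixed depth (assumed 1000 reads), the rare taxon D is
# more likely to be missed in group 2, where A takes up most of the reads.
depth = 1000
d_zero_1 = zero_count_probability(freq_1["D"], depth)
d_zero_2 = zero_count_probability(freq_2["D"], depth)

# Reference-based view: ratios against unchanged taxa are unaffected by the
# change in total load, so only the taxon that truly changed stands out.
b_ratio_1 = ratio_to_reference(group_1, "B", ["C"])
b_ratio_2 = ratio_to_reference(group_2, "B", ["C"])
a_ratio_1 = ratio_to_reference(group_1, "A", ["B", "C"])
a_ratio_2 = ratio_to_reference(group_2, "A", ["B", "C"])

print("B relative frequency per group:", b_shift)
print("P(zero count for D) per group:", d_zero_1, d_zero_2)
print("B / reference per group:", b_ratio_1, b_ratio_2)
print("A / reference per group:", a_ratio_1, a_ratio_2)
```

The sketch works on expected frequencies rather than sampled counts, and it takes the reference set as known. In practice the reference taxa have to be chosen from the data, which is the data-adaptive step the abstract refers to.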