Testing for differential abundance in compositional counts data, with application to microbiome studies
Authors: Barak Brill, Amnon Amir, Ruth Heller
Abstract: In order to identify which taxa differ in the microbiome community across groups, the relative frequencies of the taxa are measured for each unit in the group by sequencing PCR amplicons. Statistical inference in this setting is challenging due to the high number of taxa compared to sampled units, low prevalence of some taxa, and strong correlations between the different taxa. Moreover, the total number of sequenced reads per sample is limited by the sequencing procedure. Thus, the data is compositional: a change of a taxon's abundance in the community induces a change in sequenced counts across all taxa. The data is sparse, with zero counts present due either to biological variance or to limited sequencing depth (a technical zero). For low abundance taxa, the chance of technical zeros is non-negligible and varies between sample groups. Compositional counts data poses a problem for standard normalization techniques, since technical zeros cannot be normalized in a way that ensures equality of taxon distributions across sample groups. This problem is aggravated in settings where the condition studied severely affects the microbial load of the host. We introduce a novel approach for differential abundance testing of compositional data with a non-negligible amount of "zeros". Our approach uses a set of reference taxa, which are non-differentially abundant. We suggest a data-adaptive approach for identifying a set of reference taxa from the data. We demonstrate that existing methods for differential abundance testing, including methods designed to address compositionality, do not provide control over the rate of false positive discoveries when the change in microbial load is vast. We demonstrate that methods using microbial load measurements do not provide valid inference, since the microbial load measured cannot adjust for technical zeros.
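The compositionality and technical-zero effects described in the abstract can be illustrated with a small numerical sketch. This is not the authors' test procedure; it is a toy calculation under assumed values. The abundances, the sequencing depth, the choice of taxa 2 and 3 as reference taxa, and the helper names `expected_counts` and `prob_zero` are all hypothetical, and reads are assumed to be drawn multinomially in proportion to abundance.

```python
# Toy illustration (assumed numbers, not data from the paper).
# Absolute abundances of five taxa in two groups; only taxon 0 truly changes.
group_a = [100.0, 200.0, 300.0, 400.0, 0.5]
group_b = [1000.0, 200.0, 300.0, 400.0, 0.5]
depth = 1000  # assumed fixed number of sequenced reads per sample


def expected_counts(abundances, depth):
    """Expected read counts when reads are allocated in proportion to abundance."""
    total = sum(abundances)
    return [depth * a / total for a in abundances]


def prob_zero(abundances, index, depth):
    """Chance that a taxon receives zero reads under multinomial sampling."""
    p = abundances[index] / sum(abundances)
    return (1.0 - p) ** depth


counts_a = expected_counts(group_a, depth)
counts_b = expected_counts(group_b, depth)

# Taxon 1 is unchanged in absolute terms, yet its expected count drops in
# group B because taxon 0 takes a larger share of the fixed read budget.
taxon1_drops = counts_b[1] < counts_a[1]

# Relative to the assumed reference taxa (2 and 3), taxon 1 is unchanged.
ratio_a = counts_a[1] / (counts_a[2] + counts_a[3])
ratio_b = counts_b[1] / (counts_b[2] + counts_b[3])

# The rare taxon 4 is more likely to be a technical zero in group B,
# although its absolute abundance is identical in both groups.
zero_a = prob_zero(group_a, 4, depth)
zero_b = prob_zero(group_b, 4, depth)

print(taxon1_drops, ratio_a, ratio_b, zero_a, zero_b)
```

In this sketch the raw counts of an unchanged taxon differ between groups, while its ratio to the reference taxa does not, which is the intuition behind testing against a non-differentially abundant reference set. The differing zero probabilities for the rare taxon show why technical zeros cannot be removed by rescaling alone.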