Perspectives Paper
A critical perspective on interpreting amplicon sequencing data in soil ecological research

https://doi.org/10.1016/j.soilbio.2021.108357
Under a Creative Commons license
open access

Highlights

  • Soil complexity necessitates careful interpretation of sequencing data.

  • Studies often do not account for data compositionality, leading to misinterpretation.

  • Functions should not be inferred from phylogeny as they are rarely conserved.

  • We discuss complementary approaches that help to improve ecological insights.

  • We call for journals and authors to improve study reproducibility and data availability.

Abstract

Microbial community analysis via marker gene amplicon sequencing has become a routine method in the field of soil research. In this perspective, we discuss technical challenges and limitations of amplicon sequencing and present statistical and experimental approaches that can help address the spatio-temporal complexity of soil and the high diversity of organisms therein. We illustrate the impact of compositionality on the interpretation of relative abundance data and discuss effects of sample replication on the statistical power in soil community analysis. Additionally, we argue for the need for increased study reproducibility and data availability, as well as complementary techniques for generating deeper ecological insights into microbial roles and our understanding thereof in soil ecosystems. At this stage, we call upon researchers and specialized soil journals to consider the current state of data analysis, interpretation, and availability to improve the rigor of future studies.

Keywords

Amplicon sequencing
Soil metabarcoding
Soil microorganisms
Soil microbial diversity
Soil complexity
Compositional data

1. Introduction

Soil is one of the most biologically diverse and heterogeneous ecosystems, presenting unique challenges to scientists in the fields of soil and microbial ecology (Bickel and Or, 2020). The critical role of microorganisms as drivers of biogeochemical processes is well-documented, and a major goal of soil ecology remains to decipher the link between the diversity of soil microbial communities, and their function in the environment (Hinsinger et al., 2009; Manzoni et al., 2012). Historically, studies of microbial communities revealed a rather narrow perspective of the diversity by targeting mainly cultivable bacteria, taxa of high abundance, or microorganisms grouped according to morphological or physiological properties (Staley and Konopka, 1985; Steen et al., 2019; Frostegård et al., 2011). The introduction of next-generation sequencing technologies such as amplicon sequencing has revolutionized our ability to characterize microbial diversity by enabling the investigation of community composition at a much greater phylogenetic resolution than ever before.

Amplicon sequencing (also termed metabarcoding) is based on PCR-amplification of variable regions of DNA within conserved phylogenetic or functional marker genes (Gołębiewski and Tretyn, 2019; Semenov, 2021); see also Supplementary Table S1 for examples. The accessibility of established assays, the affordability, as well as the availability of free analysis software packages have facilitated the broad use of amplicon sequencing for characterization of the microbial diversity in environmental samples (Caporaso et al., 2012). In the field of soil science, its application has accelerated in the last decade as evidenced by the growing number of studies published in specialized soil journals (Fig. 1). The majority of these manuscripts report the analysis of soil community composition and diversity based on phylogenetic marker genes such as the 16S rRNA gene for bacteria and archaea as well as internal transcribed spacer (ITS) regions for fungi. In addition, functional genes can be targeted to obtain information on the organisms that may contribute to a specific environmental process (Angel et al., 2018; Séneca et al., 2020; Aigle et al., 2020).


Fig. 1. Increase in the number of articles using amplicon sequencing in soil microbiome research published in soil science journals (as defined in Web of Science, www.webofknowledge.com). Bars represent the total number of articles using amplicon sequencing, whereas the blue points and line represent their percentage of the total number of articles published in those journals per year from 1990 to 2020. The pie chart represents the number of articles in the top ten contributing soil science journals in 2020 (as total number of articles). Numbers inside the chart represent the number of articles using amplicon sequencing (only reported for the top three journals), while the numbers outside the chart represent the percentage of the total number of articles for each journal. See Supplementary file for a more detailed description of methods and the complete list of journals (Table S2). (For interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article.)

Such work has enabled researchers to successfully investigate the composition and dynamics of soil microbial communities. Our understanding of microbial diversity has increased dramatically and the activity of microbial communities has now been widely recognized as central in the field of soil science where research questions were historically often tackled from the perspective of individual disciplines such as chemistry, physics, and biology (Baveye et al., 2018). As evidenced by the high number of studies being published in recent years, it is safe to say that microbial community analysis via marker-gene sequencing has become a standard tool in soil research. At this stage, it is necessary to discuss the potential, challenges, and pitfalls of the technique as applied by soil scientists.

In this perspective, we aim to describe the unique challenges of studying microbial communities in soil ecosystems, and to address common misconceptions in the analysis and interpretation of amplicon sequencing data. Patterns often arise in community data, but the interpretation of these patterns in a soil context remains challenging and limited due to the poor link between the sequenced marker gene regions and microbial functions, as well as the compositional nature of the data itself. We provide suggestions for designing sequencing experiments and analyzing data to gain improved insights into microbial community structure and dynamics within the context of the complex soil environment. Amplicon sequencing, when used as part of a well-designed experiment, represents an informative approach for investigating microbial community structure and correlations between taxa and environmental parameters, as well as for developing new hypotheses and testing old and new hypotheses regarding microbial community dynamics.

2. Technical considerations in a heterogeneous and diverse habitat

The diversity of microorganisms in soil has been well-documented as a major challenge in studying soil microbial communities (Gans, 2005; Fierer and Jackson, 2006). A single gram of soil is estimated to contain 10⁸ to 10⁹ cells (Bloem et al., 1995; Nunan et al., 2001) and tens of thousands of microbial taxa (Roesch et al., 2007). Additionally, compared to host-associated microbiomes (e.g., gut, skin, or plant root microbiome), free-living bacteria exhibit higher levels of diversity. In a recent comparison of alpha-, beta-, and gamma-diversity from samples collected as part of the Earth Microbiome Project (EMP), soils were determined to have the highest alpha-diversity across all environments (Walters and Martiny, 2020). In terms of beta- and gamma-diversity, soil came in second only to sediment samples. Fewer studies have investigated the diversity and global distribution of fungi (Tedersoo et al., 2014; Větrovský et al., 2019). These studies indicate that more heterogeneous environments, such as soils and sediments, may contain more diverse fungal communities than more homogeneous habitats (e.g., marine, freshwater, air, biofilms) (Fierer and Lennon, 2011; Walters and Martiny, 2020; Torsvik, 2002).

In this perspective, we discuss amplicon sequencing as it represents the most cost- and time-effective sequencing method for studying soil microbial communities including uncultivable organisms. While other sequencing approaches such as metagenomics and metatranscriptomics yield more detailed information on the functional potential and the transcriptional profile of soil microbial communities, they are limited by higher sequencing costs and challenges related to data analysis. These constraints pose limitations particularly with regard to proper replication and the resulting workload (Gołębiewski and Tretyn, 2020; Vestergaard et al., 2017). Researchers interested in studying the microbial composition of soils via amplicon sequencing are confronted with technical challenges throughout the sample processing workflow. The general workflow of amplicon sequencing includes: 1) planning and implementation of the experimental design, 2) nucleic acid extraction (including quality control), 3) primer choice, PCR amplification, sequencing, 4) processing and analysis of sequence data, and 5) data interpretation (Fig. 2). At each of these steps, a subset of the sample is selected and information can be lost as a result of the techniques applied (i.e., nucleic acid extraction method, primer selection, statistical approaches), with consequences for data interpretation in the context of ecological questions (Morton et al., 2019; McLaren et al., 2019). As with any scientific experiment, the specific hypotheses to be addressed should determine the experimental design. Besides this, in experiments involving amplicon sequencing, one must consider the appropriate spatial scale (i.e., aggregate/microscale, centimetre scale, meter scale) and the frequency of sampling in order to address specific questions regarding community dynamics. 
While the sample that is sequenced represents the specific moment in time when it was frozen or extracted, the presence of exogenous or relic DNA in soil samples has the potential to influence community composition and downstream data interpretation (Lennon et al., 2018; Carini et al., 2016; discussed in section 5). Additionally, sample replication remains a critical concern in soil studies, particularly when it comes to statistical inference and/or construction of co-occurrence networks (discussed in sections 5 and 6).


Fig. 2. The main steps of an amplicon sequencing analysis workflow. Items listed in gray boxes represent critical selection points at each step of the amplicon sequencing process that may strongly influence the robustness and direction of the results.

The physicochemical properties of soils make nucleic acid extraction from this matrix particularly challenging. Numerous extraction protocols and kits have been developed to circumvent challenges with DNA extraction from soil; however, each method introduces a distinct bias on the subset of the microbial community retrieved (Terrat et al., 2011; Zielińska et al., 2017; Dopheide et al., 2018). Using a spike-in before nucleic acid extraction can help to characterize this bias (see section 4). The presence of inhibitors, such as humic substances, is common in soil and can reduce the quality and purity of nucleic acids in the extracted samples and decrease the efficiency of reverse transcription and/or PCR reactions (Schrader et al., 2012). In addition to the nucleic acid extraction method of choice (chemical or physical lysis, DNA and/or RNA extraction), primer selection dictates the organisms or functions targeted by the approach (phylogenetic or functional marker; see Table S1). Finally, due to the diversity and heterogeneity of soil samples, the resulting data are often sparse, containing numerous taxa with low abundance and prevalence, which may be dealt with through filtering thresholds or statistical approaches (see section 3). The loss of information at each step of the process, from sampling to analysis, must be carefully considered when interpreting amplicon sequencing data. Keeping all these factors in mind, the application of sequencing technologies to soil has provided invaluable information on the structure of soil microbial communities and has underscored how critical it is to understand them.

3. Challenges in amplicon sequence data analysis

3.1. Primer selection dictates phylogenetic coverage

As choice of primers can influence the taxa observed in an amplicon sequencing dataset, it is of utmost importance to take care with regard to primer selection and the interpretation of resulting data as to the community changes/impacts of treatments. Given the high diversity of soil communities (of which the understanding is constantly growing due to massive sequencing efforts), no primer pair will cover the complete phylogenetic breadth, especially on a high rank such as domain (e.g., bacteria, archaea, fungi). As a consequence, part of the communities will always be missing. This is inevitable but it is of particular concern when studies attribute soil functions to taxa found to be “rare” in their amplicon sequencing data due to low coverage of that group (Chen et al., 2020). Nevertheless, evaluated and recommended primer pairs are available including the updated versions of 515F-806R primers for surveys of archaea and bacteria (e.g., https://earthmicrobiome.org/). Care must also be taken with regard to amplification of non-target organisms, particularly in samples with a high proportion of plant material (e.g. root samples). In this case, PNA clamps can be used to block amplification of plant chloroplasts and mitochondria and to “enrich” for microbial target sequences (Lucaciu et al., 2019). Primer selection is even more challenging for studies of eukaryotes, owing to hypervariable sequence lengths and multiple gene copy numbers (due to multiple operons and/or polykaryosis) that contribute to biased amplification of some phylogenetic groups during PCR. This bias may for example lead to the under-estimation of some fungal groups, having a downstream effect on diversity estimates (Baldrian et al., 2021). Arbuscular mycorrhizal fungi for instance are largely overlooked by commonly used ITS primers which could lead investigators to infer that arbuscular mycorrhizal fungi are rare (George et al., 2019). 
A promising alternative to ITS-targeted short-read sequencing is the use of long-read sequencing (e.g., PacBio) which enables the investigation of most fungi (including Glomeromycota) and other soil eukaryotes through covering both the full ITS region and part of the small subunit of the rRNA gene (Tedersoo and Anslan, 2019; Tedersoo et al., 2020). We refer readers to in-depth reviews that further discuss challenges regarding amplicon sequencing of fungi specifically, including discussion of primer selection and coverage (Nilsson et al., 2019; Baldrian et al., 2021).

The choice of primers has substantial impacts on estimates of diversity in community studies. As a consequence, we urge researchers to use tools such as TestPrime (https://www.arb-silva.de/search/testprime/) to evaluate the current status of the coverage of their target microbial groups of interest before sequencing and to discuss this aspect in their publications. We also recommend that reviewers critically assess the coverage of the target group of organisms used in a study to improve future evaluation of sequencing-based research in soil ecology.
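The idea behind an in silico coverage check can be sketched in a few lines. The example below is a minimal illustration only, and it is not the algorithm used by TestPrime: it counts perfect matches of a single degenerate primer against a handful of reference sequences, without mismatch tolerance or a paired reverse primer. The primer and the reference sequences are made-up toy data.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def primer_to_regex(primer):
    """Translate a (possibly degenerate) primer into a compiled regex."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))

def coverage(primer, sequences):
    """Fraction of reference sequences containing a perfect primer match."""
    pattern = primer_to_regex(primer)
    hits = sum(1 for seq in sequences if pattern.search(seq.upper()))
    return hits / len(sequences)

# Hypothetical toy data: an invented degenerate primer and three invented targets.
toy_primer = "GTGYCAGCM"
toy_references = {
    "taxon_1": "AAGTGCCAGCAGG",  # matches (Y=C, M=A)
    "taxon_2": "AAGTGTCAGCCGG",  # matches (Y=T, M=C)
    "taxon_3": "AAGTGACAGCAGG",  # mismatch at the Y position
}
toy_coverage = coverage(toy_primer, toy_references.values())
```

Here two of the three toy targets are matched, so a taxon like `taxon_3` would be absent from the resulting dataset regardless of its abundance in the soil.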

3.2. Compositionality necessitates careful data processing

One of the first steps in the analysis of amplicon sequencing data is the removal of potential sequencing errors. Doing so eliminates sequencing artefacts that may falsely boost diversity levels (Edgar et al., 2011; Haas et al., 2011). The use of amplicon sequence variants (ASVs) instead of operational taxonomic units (OTUs) helps to overcome this issue by assigning a greater probability of a true biological sequence being more abundant than an error-containing sequence (Callahan et al., 2017). To that end, bioinformatic tools such as DADA2 (Callahan et al., 2016) and Deblur (Amir et al., 2017) attempt to use sequencing error profiles to resolve amplicon sequencing data into ASVs. An ASV is more likely to have an intrinsic biological meaning (i.e., being a true DNA sequence), as opposed to an OTU, which can either be a representation of the most abundant biological sequence or a consensus sequence (of which the latter may not exist in reality). In addition, ASVs facilitate the merging of datasets, particularly when the same sequencing primer pairs are used.

Another relevant step when analyzing sequencing data is to account for the different sequencing efforts across samples (i.e., sequencing depth) that can result in a substantially different number of recovered reads even among replicates. Ways to tackle this include total library size normalization and rarefaction, with both remaining debated to date (McMurdie and Holmes, 2014; Weiss et al., 2017)⁠. Both approaches attempt to address differences in library size across sequenced samples. However, the admissibility of rarefaction has recently come into question as it may remove data particularly of rare community members from downstream analyses. The alternative presented by library size normalization attempts to address this using a “waste not, want not” approach. In this case, low-abundance taxa are retained that may actually play an important role in the environment and could otherwise be discarded when rarefaction is used.
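The difference between the two approaches can be illustrated with a small sketch. The counts below are invented for illustration, and the helper names `rarefy` and `total_sum_scale` are hypothetical, not taken from any particular software package.

```python
import random

def rarefy(counts, depth, seed=0):
    """Subsample `depth` reads without replacement from one sample."""
    pool = [taxon for taxon, n in counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("depth exceeds library size")
    rng = random.Random(seed)
    picked = rng.sample(pool, depth)
    return {taxon: picked.count(taxon) for taxon in counts}

def total_sum_scale(counts):
    """Convert counts to proportions of the library size."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}

# Hypothetical libraries of unequal depth (values invented for illustration).
deep = {"taxon_A": 9000, "taxon_B": 990, "taxon_rare": 10}
shallow = {"taxon_A": 450, "taxon_B": 50, "taxon_rare": 0}

# Rarefy the deep library down to the size of the shallow one (500 reads).
depth = min(sum(deep.values()), sum(shallow.values()))
deep_rarefied = rarefy(deep, depth)
deep_scaled = total_sum_scale(deep)
```

Rarefying the deep library to 500 reads leaves an expected 0.5 reads for `taxon_rare`, so it is frequently lost, whereas scaling by library size retains it at a proportion of 0.001.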

As a result of variation in diversity across samples and the high frequency of low-abundance taxa, resulting OTU/ASV matrices are often zero-inflated. Bioinformatic tools such as DESeq2 and edgeR provide ways to normalize count tables (Love et al., 2014; Robinson and Oshlack, 2010). Both methods are applied to raw or low-abundance-filtered count tables, have performed well on both real and simulated datasets, and outperform rarefaction-based approaches (McMurdie and Holmes, 2014). Other alternatives that account for the compositional aspect of sequencing data include centered log-ratio (CLR), isometric log-ratio (ILR) or additive log-ratio (ALR) transformations on a count data matrix with adequate replacement of zeros (Aitchison, 1984; Egozcue, 2003).
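As a rough sketch of what two of these transformations do, the example below applies CLR and ALR to a single invented sample. The additive pseudocount used for zero replacement is an assumption chosen for simplicity; it is only one of several possible strategies and may not be adequate for real, highly sparse soil datasets.

```python
import math

def replace_zeros(counts, pseudocount=0.5):
    """Simple additive zero replacement (one of several possible strategies)."""
    return [c + pseudocount for c in counts]

def clr(counts):
    """Centered log-ratio: log of each part minus the mean of the logs."""
    logs = [math.log(c) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]

def alr(counts, reference=-1):
    """Additive log-ratio: log of each part relative to one reference part."""
    ref = counts[reference]
    idx = reference % len(counts)
    return [math.log(c / ref) for i, c in enumerate(counts) if i != idx]

# Hypothetical read counts for four taxa in one sample, including a zero.
sample = [120, 30, 0, 850]
filled = replace_zeros(sample)
clr_values = clr(filled)   # four values that sum to zero
alr_values = alr(filled)   # three values, last taxon used as reference
```

Because both transformations work on ratios between parts, multiplying all (zero-replaced) counts by a constant, as a deeper sequencing run would, leaves the transformed values unchanged.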

Following data normalization, traditional workflows include the generation of distance matrices for ordination, clustering, and variance partitioning analyses (for useful software packages see also Table S3). Commonly used distance metrics include Bray-Curtis, Jaccard and UniFrac (weighted and unweighted). These metrics are often used although they do not take into account the compositional nature of sequencing data. The Aitchison distance, defined as the Euclidean distance computed on a centered log-ratio transformed count matrix, is a viable compositional alternative (Aitchison, 1984) that allows performing ordinations (e.g., PCA biplots). Additionally, the “PhILR” transformation has been introduced as a compositional alternative to the weighted UniFrac that carries phylogenetic information (Silverman et al., 2017). Most of the above-mentioned compositional options are implemented in R packages and include publicly available tutorials. In light of the challenges related to normalization and analysis of compositional data, we recommend a critical evaluation of available data analysis tools by reviewing benchmarking papers (e.g., Starke et al., 2021; Prodan et al., 2020; Zemb et al., 2020) to best address the nature of each experimental setup (see also section 6).
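The definition of the Aitchison distance given above translates directly into code. The sketch below uses invented, zero-free counts and computes Bray-Curtis on the raw counts purely to illustrate sensitivity to sequencing depth; in practice Bray-Curtis is usually applied after some form of normalization.

```python
import math

def clr(parts):
    """Centered log-ratio transform of one zero-free sample."""
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]

def aitchison_distance(x, y):
    """Euclidean distance between CLR-transformed samples (no zeros allowed)."""
    return math.dist(clr(x), clr(y))

def bray_curtis(x, y):
    """Bray-Curtis dissimilarity on the values as given."""
    return sum(abs(a - b) for a, b in zip(x, y)) / (sum(x) + sum(y))

# Hypothetical zero-free counts; sample_b is sample_a sequenced ten times deeper.
sample_a = [10, 40, 50]
sample_b = [100, 400, 500]
sample_c = [50, 40, 10]  # same taxa, different composition

d_ab = aitchison_distance(sample_a, sample_b)  # same composition
d_ac = aitchison_distance(sample_a, sample_c)  # different composition
bc_ab = bray_curtis(sample_a, sample_b)        # raw counts, depth-sensitive
```

In this toy case the Aitchison distance between `sample_a` and `sample_b` is zero (up to floating-point error) because their compositions are identical, while Bray-Curtis on the unnormalized counts reports a large dissimilarity driven only by library size.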

Another aspect that prevents data analyses from being fully quantitative is the potential presence of multiple copies of marker genes per organism, which may also vary across taxa. For example, the number of 16S rRNA genes can vary from 1 to 18 across strains of the same species (Stoddard et al., 2014; Větrovský and Baldrian, 2013), and thus markers such as 16S rRNA genes can lead to inaccurate estimates of the relative abundance and diversity of microbial communities. Several computational tools can correct amplicon datasets for the abundance of the 16S rRNA genes based on existing genome information (e.g., PICRUSt2 (Douglas et al., 2020) and CopyRighter (Angly et al., 2014)). However, correcting for 16S rRNA gene abundance in sequencing surveys remains challenging, particularly for soil, as the gene abundances are only known for a subset of the soil microbes (Louca et al., 2018; Nunan et al., 2020). This challenge becomes even more problematic for marker genes of fungi and other eukaryotes, such as protists, as gene abundance per nucleus or per unit of biomass here can vary drastically between taxa and with organism age (Gong et al., 2013; Gong and Marchetti, 2019). Other housekeeping genes, which occur only once in a genome, have been proposed as universal phylogenetic marker genes (such as recA (Eisen, 1995)), but their use remains limited due to lower phylogenetic resolution and limited availability in databases.
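The arithmetic behind a copy number correction is simple, as the following sketch shows. It is not the procedure implemented in PICRUSt2 or CopyRighter, which predict copy numbers from reference genomes; here the copy numbers are simply assumed to be known, and all values are invented.

```python
def correct_for_copy_number(read_counts, copies_per_genome):
    """Divide reads by gene copies per genome, then renormalize to proportions."""
    genome_equivalents = {
        taxon: reads / copies_per_genome[taxon]
        for taxon, reads in read_counts.items()
    }
    total = sum(genome_equivalents.values())
    return {taxon: g / total for taxon, g in genome_equivalents.items()}

# Hypothetical values: equal read counts, unequal 16S rRNA gene copy numbers.
reads = {"taxon_A": 500, "taxon_B": 500}
copies = {"taxon_A": 1, "taxon_B": 10}
corrected = correct_for_copy_number(reads, copies)
```

Two taxa that appear equally abundant in the read counts end up at roughly 91% and 9% after correction, which illustrates how strongly the outcome depends on copy number estimates that are unavailable for many soil microbes.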

3.3. Insufficient data availability contributes to a lack of reproducibility

Reproducibility and reusability of research results are predicated on sharing data and analysis scripts, a topic of growing relevance in light of increasing amounts of sequencing data obtained from soils around the globe and with the increasing complexity of analyses. Proper data sharing practices allow researchers to re-analyze specific aspects of published datasets, and/or investigate patterns in soil communities across datasets in the form of meta-analyses. A prerequisite to ensure data storage and availability in a useable format is that authors are required to do so by respective journals. In order to assess the current state of data deposition in the field, we searched the author guidelines of the 10 specialized soil journals contributing most sequencing studies (see Fig. 1 for reference). Out of the 10 journals, many “encourage their authors to make data available” while only 2 journals specifically require sequencing data to be deposited in public repositories such as GenBank before a manuscript is accepted for publication. Even if authors feel encouraged to comply, storage of their data in a repository does not always facilitate reproducibility of the reported research. Deposited datasets often contain only raw results from whole sequencing runs, and provide little meaningful information on the individual amplicons and on the corresponding metadata. As a consequence, it may be difficult to reconstitute the exact datasets used for the reported statistics and illustrations from such data. This requires that the applied quality filters and processing steps (see section 3.2), as well as the versions of applied software packages, be precisely reported.

Consequently, we call on all specialized soil journals that accept and publish sequencing data to (i) provide community standards for reproducible data analysis in their data policy statements and (ii) require the submission of sequencing data, ASV/OTU tables, together with sample metadata, to open repositories (such as GenBank, Dryad, or FigShare) and (iii) require that analysis scripts be made available on open hosting services (such as GitHub) or accompany the publication as a supplement. These steps will greatly facilitate reproducibility, open science, and meta-analyses.

4. Addressing and interpreting compositional sequencing data

4.1. Interpreting relative abundance data

The compositionality of amplicon sequencing data presents challenges to the interpretation of changes in microbial community structure. The amount of sequence data obtained through high-throughput sequencing is a fixed value, resulting in a random sampling of sequences from a sample that cannot be directly linked to absolute abundance based on sequences alone (Gloor et al., 2017). Numerous studies have revealed shifts in microbial community composition across treatments including gradients of temperature, pH, and salinity, as well as seasonal or temporal parameters. This practice may be suitable on a community level when community shifts are of interest (e.g., phylum level), and has been reported to yield comparable ecological conclusions as data generated with quantitative approaches (Piwosz et al., 2020). However, at higher taxonomic resolution (e.g., genus level), quantitative inferences from relative abundance sequencing data become more challenging. Due to the nature of sequencing, a change in the relative abundance of one species is always reflected in a corresponding change in one or more other species. We depict such challenges in interpretation in the following thought experiment.

Amplicon sequencing data obtained from the same soil sample at two different time points (t1, t2) consists of two species (A, B). The relative abundance observed for species A and B is 0.55 and 0.45 at time point 1 (t1), and 0.8 and 0.2 at time point 2 (t2), respectively (Fig. 3). From t1 to t2, species B decreases in relative abundance coupled to an increase in the relative abundance of species A. The bars below (t2a-t2e) illustrate five examples of changes in absolute abundance in t2 that could underlie the patterns observed in relative abundance data. The initial time point (t1) is also shown for comparison.


Fig. 3. Relationship between the relative abundance of species as observed via amplicon sequencing and their absolute abundances. The upper panel shows the relative abundance of two species A (shades of blue) and B (shades of pink) at two time points in a thought experiment. From t1 to t2, a decrease in relative abundance of species B is observed, coupled to an increase in relative abundance of species A. The relative abundance pattern observed at t2 could have been caused by five changes in the biomass and absolute abundance of the microbial community as shown in the lower panel. Time point t1 is shown for comparison. (For interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article.)

The first case represents a situation where the absolute abundance matches the relative abundance observations. There are no changes in total biomass from t1 to t2 and species A increases, whereas species B decreases (Fig. 3, t2a). The second case depicts an increase in overall biomass between t1 and t2 caused by an absolute increase in species A and no absolute changes in species B (Fig. 3, t2b). The third case represents an opposite scenario where the decrease in total biomass from t1 to t2 is caused by a decrease in species B and no changes in species A (Fig. 3, t2c). The fourth case represents a situation where there is a general increase in biomass from t1 to t2 prompted by increases in absolute abundances of both species A and B (Fig. 3, t2d), while the fifth case represents an opposite scenario (Fig. 3, t2e). For some of these examples, observed changes in relative abundance may accurately reflect true biological changes (t2a, t2d and t2e), whereas interpretation of the community shifts that underlie observed patterns remains more difficult for the other scenarios (t2b and t2c). Without information on absolute abundances, there is still room for ambiguous interpretations solely based on relative abundance plots (see section 4.2). This theoretical exercise shows that even for a community of only two member species, there are five potential scenarios of changes in the absolute abundance that could cause the observed shift in relative abundance. Given that soil communities usually harbour thousands of species, the degree of complexity increases dramatically.
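The thought experiment can be reproduced numerically. In the sketch below, the absolute abundances (in arbitrary units) are illustrative assumptions chosen so that each scenario follows the verbal description above; only the relative abundances of 0.55/0.45 at t1 and 0.8/0.2 at t2 come from the text.

```python
def relative_abundance(absolute):
    """Convert absolute abundances into proportions of the total."""
    total = sum(absolute.values())
    return {species: n / total for species, n in absolute.items()}

# Absolute abundances in arbitrary units; all numbers are illustrative
# assumptions chosen to match the relative abundances in the text.
t1 = {"A": 55.0, "B": 45.0}
scenarios = {
    "t2a": {"A": 80.0, "B": 20.0},    # total unchanged, A up, B down
    "t2b": {"A": 180.0, "B": 45.0},   # total up, A up, B unchanged
    "t2c": {"A": 55.0, "B": 13.75},   # total down, A unchanged, B down
    "t2d": {"A": 200.0, "B": 50.0},   # total up, both species up
    "t2e": {"A": 40.0, "B": 10.0},    # total down, both species down
}
profiles = {name: relative_abundance(v) for name, v in scenarios.items()}
```

All five entries in `profiles` show species A at 0.8 and species B at 0.2, so they are indistinguishable in a relative abundance plot even though total abundance ranges from 50 to 250 units across the scenarios.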

4.2. Experimental approaches to address compositionality

The challenge of interpreting relative abundance data as illustrated in Fig. 3 indicates the advantages of adding quantitative information to current amplicon sequencing approaches. Knowledge of absolute values (e.g., total microbial biomass) can help to make more robust inferences about the nature of observed shifts in microbial community structure (Fig. 3, t2d and t2e; Barlow et al., 2020; Wang et al., 2021). In the following, we discuss some approaches ranging from molecular techniques to classic soil microbiology that could help improve our interpretation of amplicon sequencing data.

4.2.1. Quantitative PCR approaches

One relatively affordable and well-established quantitative method is quantitative real-time PCR (qPCR). qPCR makes it possible to assess the abundance of a marker gene, which can then be multiplied by the relative abundance data obtained from the same sample by amplicon sequencing. This approach benefits strongly from using the same primers in both qPCR and sequencing to reduce bias stemming from PCR (see section 2) and from correcting for multiple occurrences of said marker gene in the genome of target organisms.
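The multiplication described above can be sketched as follows. The gene copy numbers per gram of soil are invented for illustration, and the function name `absolute_from_qpcr` is hypothetical. The sketch assumes that qPCR and sequencing used the same primers and ignores copy number variation between taxa.

```python
def absolute_from_qpcr(relative, total_copies_per_g):
    """Scale relative abundances by a qPCR-derived total gene copy number."""
    return {taxon: frac * total_copies_per_g for taxon, frac in relative.items()}

# Hypothetical inputs: identical relative profiles, different qPCR totals.
relative_profile = {"taxon_A": 0.8, "taxon_B": 0.2}
copies_t1 = 1.0e9   # assumed marker gene copies per g soil at one time point
copies_t2 = 2.5e8   # assumed copies per g soil at a later time point

abs_t1 = absolute_from_qpcr(relative_profile, copies_t1)
abs_t2 = absolute_from_qpcr(relative_profile, copies_t2)
```

Although the relative profile is the same at both time points, the scaled values show a fourfold decline of both taxa, a change that sequencing data alone would not reveal.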

A relatively novel alternative to traditional qPCR is digital PCR (dPCR), which requires no external standard for quantification, offers higher precision, and is relatively unaffected by the presence of PCR inhibitors. This represents a tremendous advantage when working with nucleic acid extracts from soil (Dong et al., 2015). However, like standard qPCR, the efficiency of this method is affected by the degeneracy of the primers, which means particular care must be taken during primer choice (see section 3.1). In addition, both dPCR and qPCR are limited in terms of absolute quantification of the fungal ITS region because the target is hypervariable and differs in length between taxa (Nilsson et al., 2019).

A major advantage of both quantitative PCR approaches is the possibility of using the same DNA extracts as for the community profiling without additional sample processing that would be required for other methods (see sections 4.2.2 - 4.2.4). Consequently, quantitative PCR approaches have been used successfully to address the compositionality of sequencing data and can aid in the interpretation of microbial community data in soil (e.g., Tkacz et al., 2018; Zemb et al., 2020; Vandeputte et al., 2017; Kleyer et al., 2017).

4.2.2. Spike-ins

Introducing an internal standard (also called a spike-in) can be a useful tool toward achieving more quantitative amplicon data analyses. Spike-ins can be introduced in the form of microbial cells (Stämmler et al., 2016) or as selected DNA sequences (Tkacz et al., 2018; Hardwick et al., 2018; Wang et al., 2021). The spike should be uniquely detectable as a non-member of the existing microbial community, and should not be introduced in concentrations that would shift the sequencing effort towards it. Additionally, the timing of the addition will determine the type of information retrieved. While adding the spike after extraction can provide good estimates of amplification and/or sequencing biases, it does not take extraction efficiency into account (Hardwick et al., 2018; Stämmler et al., 2016)⁠. A recent amplicon sequencing study applied a synthetic DNA spike of known concentration to faecal samples prior to extraction. They combined this with qPCR quantification to calculate the number of gene copies after accounting for the extraction yield. The ratio of each OTU against the initial concentration of 16S rRNA genes was used to calculate more accurate abundance levels of each OTU after taking extraction efficiency into account (Zemb et al., 2020)⁠. If performed in a comparable manner, spike-ins represent an essential tool to determine abundances of taxa more quantitatively via sequencing in future soil studies.
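The basic scaling logic of a spike-in can be sketched as below. This is a simplified illustration and not the exact procedure of Zemb et al. (2020): it assumes that the spike and the native templates are extracted, amplified, and sequenced with equal efficiency, and all read counts and the number of spike copies are invented.

```python
def copies_from_spike(read_counts, spike_id, spike_copies_added):
    """Estimate gene copies per taxon from the read ratio to a known spike."""
    spike_reads = read_counts[spike_id]
    if spike_reads == 0:
        raise ValueError("spike-in not detected; cannot scale")
    copies_per_read = spike_copies_added / spike_reads
    return {
        taxon: reads * copies_per_read
        for taxon, reads in read_counts.items()
        if taxon != spike_id
    }

# Hypothetical sample: 1e6 spike copies added before extraction.
reads = {"taxon_A": 8000, "taxon_B": 1500, "spike": 500}
estimated = copies_from_spike(reads, "spike", 1.0e6)
```

With 500 spike reads representing one million added copies, each read corresponds to 2000 copies, giving estimates of 1.6e7 copies for `taxon_A` and 3.0e6 for `taxon_B`. The spike makes up 5% of the reads in this toy sample, in line with the caution above against shifting too much sequencing effort towards it.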

4.2.3. Direct cell counts

Another approach towards absolute abundance data from soil communities is direct cell counting through fluorescence microscopy (Bloem et al., 1995) or fluorescence-activated cell counting (Khalili et al., 2019; Frossard et al., 2016) of cells liberated from soil particle surfaces (Riis et al., 1998; Lentendu et al., 2013). Total counts help to assess the absolute abundance of microbial cells that fall within a certain range of parameters such as cell size and morphology. Cell counting approaches remain more straightforward for single-celled archaea and bacteria than for filamentous bacteria, fungi or other soil eukaryotes. The success of cell counting can be negatively affected by soil autofluorescence (low signal-to-noise ratio), incomplete separation of microbial cells from soil particles, or the masking of cells by overlying soil particles. Nevertheless, assessing the number of cells in samples also subjected to sequencing may help to estimate changes in absolute abundance and to better interpret sequencing data (Fig. 3).

In addition, the observation and enumeration of target species of interest through marker-based approaches (e.g., FISH: fluorescence in situ hybridization) enables the quantification of absolute abundances of those species identified through sequencing. This practice not only allows soil ecologists to verify whether a change observed in relative abundance indeed translates to shifts in the community by counting taxa of interest on filters (Piwosz et al., 2020), but also expands the interpretation of sequencing data to localize and visualize species of interest in situ (e.g., on roots (Martin et al., 2020)) and to elucidate ecological implications behind changing abundances of target species in soil samples. Applications of FISH in conjunction with amplicon sequencing to soil samples are surprisingly rare, although such targeted localization and enumeration is a powerful tool to understand the dynamics of certain phylogenetic groups in soil on a quantitative basis. A major reason may be that microscopy in undisturbed soil samples is limited by the opacity of soil and the auto-fluorescence of soil particles. Nevertheless, the preparation of thin soil layers or of cells extracted from soil on filters allows targeting individual microorganisms of interest (Eichorst et al., 2015; Eickhorst and Tippkötter, 2008). In addition, newer methods to increase signal-to-noise ratios have been developed to overcome challenges related to soil auto-fluorescence (Lukumbuzya et al., 2019; Stoecker et al., 2010).

4.2.4. Classical soil biogeochemical methods

Traditional soil biogeochemical approaches enable the quantification of total microbial biomass in soil, including methods such as chloroform fumigation extraction (CFE) (Brookes et al., 1985), phospholipid fatty acid (PLFA) profiling (Frostegård et al., 1991, 2011; Buyer and Sasser, 2012) and ergosterol measurements (Joergensen and Wichern, 2008; Montgomery et al., 2000). In contrast to PCR-based methods, they assess the concentration of chemical microbial biomarkers in soil directly, thereby avoiding biases introduced by amplification of the target molecules. For example, such quantitative information regarding an increase or decrease in total microbial biomass between treatments would complement corresponding shifts in relative abundance data as observed via amplicon sequencing (Fig. 3). In addition to assessing total microbial biomass, PLFA measurements can also generate abundance information for microorganisms at a coarse phylogenetic resolution. The ability to obtain abundance profiles for bacteria and fungi, and to distinguish between gram-positive bacteria, gram-negative bacteria, and Actinobacteria, could be used as a “benchmark” for interpreting relative abundance data for more specific subsets of an amplicon dataset (e.g., Drigo et al., 2010). A combined interpretation of datasets from biochemical and molecular methods with fundamentally different measurement principles, however, may not always be as straightforward as the combination of amplicon sequencing data with quantitative PCR (see section 4.2.1).

Overall, we suggest that adding any quantitative measurement of microbial abundance such as quantitative PCR, cell counting, CFE, or PLFA will benefit and guide the interpretation of amplicon sequencing data. The use of more quantitative tools will provide a more robust foundation to reduce misinterpretation of compositional sequencing data by providing a link between total microbial biomass and changes in the relative abundance of microbial groups.
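To illustrate how a total abundance measurement can change the reading of relative abundance data, the following Python sketch scales relative abundances by a total gene copy number per gram of soil (for example from quantitative PCR). All values and names are hypothetical, and the sketch assumes that the total measurement and the sequencing data target the same gene with comparable biases.

```python
def to_absolute(relative_abundance, total_copies_per_g):
    """Scale relative abundances (fractions summing to 1) by a total copy number."""
    return {
        taxon: fraction * total_copies_per_g
        for taxon, fraction in relative_abundance.items()
    }


def fold_change(before, after):
    """Per-taxon ratio of 'after' to 'before' values."""
    return {taxon: after[taxon] / before[taxon] for taxon in before}


# Hypothetical relative abundances from amplicon sequencing.
control_rel = {"taxon_A": 0.20, "taxon_B": 0.80}
treated_rel = {"taxon_A": 0.40, "taxon_B": 0.60}

# Hypothetical total gene copies per gram of soil: the treatment
# reduced the total community size to 40% of the control.
control_abs = to_absolute(control_rel, 1.0e9)
treated_abs = to_absolute(treated_rel, 4.0e8)

relative_fc = fold_change(control_rel, treated_rel)
absolute_fc = fold_change(control_abs, treated_abs)
```

In this constructed case the relative abundance of `taxon_A` doubles while its absolute abundance declines, because the total community shrank more strongly than `taxon_A` did.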

5. Linking sequences to ecological context

5.1. Soil spatial complexity occurs on micro- and macro-scales

Investigating microbial community composition in soils presents unique challenges. Compared to well-mixed ecosystems, microbial life (i.e., growth, activity, dormancy, and turnover) in soil is strongly limited by the complex network of pores, as well as gas transport and diffusion in the aqueous phase (Bickel and Or, 2020; Young, 2004; Vos et al., 2013). Soil microarchitecture is a key factor that influences the potential for microorganisms to interact with each other (Wilpiszeski et al., 2019). In practice, however, the analysis of soil microbial communities through amplicon sequencing does not account for soil microarchitecture. Researchers commonly use bulk homogenization approaches to extract nucleic acids from 250 to 500 mg of fresh soil, which naturally obscures spatial arrangements of microbial cells in this soil sample. Additionally, sampling at such a relatively large volume of soil may conceal environmental characteristics that vary at smaller scales (e.g., micrometer), including oxygen gradients and the physical availability of substrates, which may influence ecological interpretation of sequencing data. From the microbial perspective, nucleic acid extraction represents a macroscopic measurement of the “whole” microbial community. This practice does not negatively affect soil microbiome analyses unless interactions among microbial taxa are inferred (e.g., via network analysis, see section 5.4).

The spatial heterogeneity of soil and the microbial communities therein does not only persist on the microscale, but certainly also on a centimeter, meter, field, or ecosystem scale (Becker et al., 2006; Wolfe et al., 2006; Franklin and Mills, 2003). Sampling “the same soil” a few meters apart or at different depths in the soil profile might result in individual samples with varying biogeochemical properties such as pH, water saturation, soil texture, and also plant root distribution (Zhang and Hartemink, 2021). Choosing a sufficient number of biological replicates to assess sample or plot variability while balancing the cost-to-gain ratio is certainly an important measure to address heterogeneity of soil samples and microbial communities therein. Thus, it is critical to carefully evaluate the representativeness of biological replicates and to avoid pseudo-replication by performing repeated analysis of the same nucleic acid extract from a single sample (see section 6). A recent study showed distinct and consistent differences in bacterial and fungal communities between individual replicate soil samples throughout a season even though 10–15 cores were randomly sampled in individual subplots and pooled (Carini et al., 2020). Another study showed that chemical soil properties, as well as microbial biomass and communities, exhibited high levels of spatial variation across 49 samples in a 6 × 6 m forest plot (Štursová et al., 2016). The pooling of samples, individual extractions of DNA/RNA and/or amplification reactions made from a single DNA template can certainly dampen confounding effects of community heterogeneity. Nevertheless, existing intraplot variability and representativeness of samples, as well as the appropriateness of sampling strategies to correctly address them, must be critically assessed in any study on soil microbiomes. Otherwise, drawing of generalized macro-ecological conclusions from soil samples taken and pooled across large distances may yield speculative information at best (Zhang et al., 2020; Dini-Andreote et al., 2020).

5.2. Temporal scales to consider when analyzing microbial dynamics

When designing an experiment, one must consider not only the spatial scales at which microorganisms live and interact but also the temporal scale, i.e., the frequency at which sampling should occur to capture temporal dynamics. Amplicon sequencing represents a snapshot of microbial prevalence at a given moment. Given that microbial community turnover among different soils may range from weeks to years (e.g., Spohn et al., 2016), it is difficult to assess the best temporal sampling strategy a priori. If, for example, effects of root exudation on soil microbial community dynamics are of interest, it is important to consider the different temporal scales of the processes to be correlated. Root exudation varies with plant development stage and shows diurnal patterns (Oburger et al., 2014), whereas community changes on a DNA level may not be detectable on such a short temporal scale (in contrast to RNA, see below). Any pattern of a single sampling time point would rather represent a legacy community that established around plant roots instead of the current state of a community that can be linked to root exudation (composition, rate) measured at the same time point.

Another soil parameter that might mask the detection of community shifts is intrinsically linked with microbial turnover: relic or environmental/exogenous DNA. Relic DNA is extracellular DNA from nonviable cells that has leaked into the environment and that is thought to persist in soils for months to years (Levy-Booth et al., 2007; Carini et al., 2016). Relic DNA has been estimated to comprise approximately 40% of the amplifiable soil DNA pool and has been successfully removed from soil samples via the application of DNases or propidium monoazide (Lennon et al., 2018; Carini et al., 2020). The latter study found greater differences in soil communities across several time points where relic DNA was removed as compared to samples where relic DNA was still present. Consequently, the presence of relic DNA may complicate the interpretation of sequencing data by over- or under-estimating microbial diversity, which may be of particular concern when temporal dynamics are key to the scientific question.

One possibility to address short temporal dynamics while eliminating bias of relic DNA is ribosomal RNA (rRNA) amplicon sequencing via complementary DNA (cDNA) synthesis. The lifetime of rRNA in soils is relatively short and has been estimated to range from days to a few weeks depending on biogeochemical parameters such as temperature, pH, and water saturation (Schostag et al., 2020; Blazewicz et al., 2013). Thus, rRNA-targeted amplicon sequencing may increase the chances of capturing dynamics within soil microbial communities over time and may be used to carefully assess the “active” fraction thereof (Vieira et al., 2019; Blazewicz et al., 2013) (see Table S2). Caution should still be taken when sequencing nucleic acids at higher frequencies, even if relic DNA has been removed or RNA is used. If community dynamics are to be investigated in short time intervals (e.g., minutes to hours), we suggest combining amplicon sequencing with methods for targeting the metabolically active cell fraction (as discussed in section 7).

5.3. Inferring function from phylogeny

Although some links exist between the soil environment and the community composition therein, amplicon sequencing cannot be used to predict microbial function and roles in ecological processes (Fierer et al., 2007; Fierer, 2017), with the exception of microbial groups with experimentally validated congruence between phylogeny and function (e.g., diazotrophs or ammonia-oxidizing archaea and bacteria). Nevertheless, it can serve as a useful tool to survey microbial communities through detection of a section of a single gene or gene region (Fig. 4). The consequence of targeting a subsection of microbial genomes is that ecological insights that can be extracted from these data remain limited. Function of taxa identified via amplicon sequencing cannot simply be inferred from the phylogeny of these organisms, as complex evolutionary processes (e.g., horizontal gene transfer) play a key role in functional trait distribution across the genomes of microorganisms (Menna and Hungria, 2011). Function may not necessarily be conserved across phylogenetic levels, and therefore processes cannot be reliably predicted and assigned to taxa using amplicon sequencing targeting phylogenetic markers such as 16S rRNA genes (Li et al., 2019; Nunan et al., 2020). Consequently, we suggest avoiding the inference of life strategies of taxa from their classification into a phylum (e.g., equating Proteobacteria with fast-growing r-strategists) and the use of such assumptions to explain processes in soils for surveys based on general markers such as 16S rRNA genes (Jeewani et al., 2020) and ITS regions (Zhou et al., 2021).

Fig. 4. Schematic representation of the main spatio-temporal scales of soil ecosystems. Climate and seasonal patterns are depicted aboveground. The three main scales at which researchers investigate soil microbial communities are depicted as the macroscale, mesoscale, and microscale. Circle insets show the resolution at which microbial communities can be studied at each scale, emphasizing that careful experimental planning must be undertaken to capture community dynamics of interest. A partial single region of a selected marker gene that is captured by amplicon sequencing is depicted in the lower yellow box. (For interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article.)

Recent studies apply functional predictions using packages such as PICRUSt2 (Douglas et al., 2020) or Tax4Fun (Aßhauer et al., 2015), which suggest that metagenomes (and therefore functional potential of organisms) can be extrapolated from the sequenced amplicon using phylogenetic markers. In the case of fungi, FUNGuild and FungalTraits have been developed, which parse OTUs/ASVs into functional guilds based on similarity to existing reference sequences (Nguyen et al., 2016; Põlme et al., 2020). The main limitation of these approaches is that they depend on a single gene and on the completeness of reference sequence databases, many of which remain incomplete due to bias in the types of organisms for which we have references (section 3; Choi et al., 2016). However, these prediction-based software packages can be used to generate valuable hypotheses for further investigation or an additional line of evidence to support a finding. In such cases, we recommend following up with FISH-counting of the identified species, functional gene-targeted sequencing, or SIP experiments to learn more about the species or community that is hypothesized to be responsible for or involved in an ecosystem process (further discussed in section 7).

5.4. Interpreting co-occurrence data and networks

Challenges associated with amplicon sequencing analysis and interpretation also complicate the use of co-occurrence network analysis from soil samples. Generally, co-occurrence analysis generates networks with biological species as nodes and edges representing associations between them. Such networks are built from correlations between taxa and have been used to investigate properties of microbial communities including organismal co-existence (e.g., Barberán et al., 2011), identification of keystone species (e.g., Banerjee et al., 2018; Zheng et al., 2021) and the stability of community structure (e.g., de Vries et al., 2018; Shi et al., 2016).

Network construction is based on the detection of significant associations between taxa across a sufficiently large number of soil samples (see also section 6). The most popular construction strategies rely on pairwise correlations, and thus only on the information of the spatial distribution of microbes in the soil at the resolution of soil sample size. This microbial distribution, however, results from the interplay of numerous factors such as environmental filtering, microbial interactions, or stochastic processes such as dispersion (Faust, 2021). While there are many potential ecological drivers that may have contributed to observed co-occurrences, resulting networks are most often interpreted solely as interactions. Such an interpretation is even more speculative for soil than for other environments due to large sample volumes relative to a small interaction radius of soil microorganisms.

One way to address these problems is by increasing the potential signal of microbial interactions by drastically decreasing the soil sample size down to the microscale or aggregate scale. In this way, co-occurring taxa could theoretically interact with each other within the habitable pore space (see Fig. 4), although even at small scales one must acknowledge possible environmental confounding factors. It is also important to keep in mind that the data contained in each environmental sample is only a snapshot of complex spatio-temporal dynamics (see sections 5.1 and 5.2) and thus large numbers of samples are recommended. From a theoretical point of view, advances in network construction methods are necessary since correlations are unreliable for inferring ecological relationships that exist in natural systems (Berry and Widder, 2014; Blanchet et al., 2020). This is even more problematic due to the compositionality of amplicon sequencing data, since pairwise correlations between the relative abundances of two taxa can lead to spurious associations. With respect to the nature of these data, methods such as SparCC (based on the variance of log ratios, VLR) and SpiecEasi (based on the centered log-ratio transformation, CLR) were recently implemented, which rely on the use of log ratios (see section 4) to address compositionality in the process of network construction (Kurtz et al., 2015; Friedman and Alm, 2012). Another option to address compositionality is to convert relative abundances into absolute values by using the total gene abundance obtained from quantitative PCR (see section 4). To improve the analysis and interpretation, we suggest a careful comparison of data with null models to help interpret the results and eliminate some spurious associations between species (Connor et al., 2017).
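The two log-ratio quantities mentioned above can be written down compactly. The Python sketch below computes a centered log-ratio (CLR) transformation of one sample and the variance of the log ratio (VLR) of two taxa across samples. It only illustrates these quantities and does not implement SparCC or SpiecEasi. The function names and the pseudocount used to handle zero counts are assumptions of this sketch, and the choice of pseudocount can affect results for sparse data.

```python
import math
from statistics import variance


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample (one value per taxon).

    The pseudocount is a hypothetical choice to avoid log(0); set it to 0.0
    only if the sample contains no zero counts.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [value - mean_log for value in logs]


def log_ratio_variance(counts_a, counts_b, pseudocount=0.5):
    """Variance across samples of log(a/b) for two taxa.

    Values near zero indicate that the two taxa keep a constant ratio,
    regardless of sequencing depth or changes in other taxa.
    """
    ratios = [
        math.log((a + pseudocount) / (b + pseudocount))
        for a, b in zip(counts_a, counts_b)
    ]
    return variance(ratios)


# The CLR values of a sample do not depend on sequencing depth:
shallow = clr([10, 20, 70], pseudocount=0.0)
deep = clr([100, 200, 700], pseudocount=0.0)

# Two taxa with a constant 1:2 ratio across four samples of different depth.
vlr_proportional = log_ratio_variance([10, 20, 40, 80], [20, 40, 80, 160], pseudocount=0.0)
vlr_unrelated = log_ratio_variance([10, 20, 40, 80], [90, 15, 60, 5], pseudocount=0.0)
```

Because both quantities depend only on ratios between taxa within a sample, they are unaffected by the arbitrary total read count, which is the property that makes log ratios suitable for compositional data.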

At the moment there are no established standards for network construction. This necessitates that one becomes familiar with the technical details of the inference process in order to properly interpret the constructed models while considering all of their limitations (Röttjers and Faust, 2018; Faust, 2021). Since the majority of constructed networks are based on correlations, we suggest that soil researchers refrain from using associations of microbial abundances as a proxy for microbial interactions, and explore additional ecological factors that may have led to the observed co-occurrence pattern. Complementing amplicon sequencing data with additional measurements can significantly improve the ecological insights from network analysis (see sections 4 and 7; Goberna et al., 2019; Lima-Mendez et al., 2015). In general, the constructed networks should serve mainly as a hypothesis generation tool. We recommend performing follow-up experiments to further investigate potential interactions and to test hypotheses formulated by the network analysis (e.g., stability of a community or influence of environmental factors). Finally, the field of network inference is rapidly evolving and alternatives are emerging to address currently standing issues. Nevertheless, a definite framework is still missing that would allow for straightforward interpretation of generated co-occurrence networks.

6. Addressing the overinterpretation of sequencing data

Amplicon sequencing data are well-suited for exploratory analysis and hypothesis generation in soil research, but can also be applied for targeted hypothesis testing if appropriate complementary and statistical methods are selected (Gloor et al., 2017; see sections 3 and 4). As amplicon datasets from soil are characterized by compositionality, heterogeneity and sparsity, the use of standard statistical methods (including Pearson correlations or t-tests on proportions) can lead to very high false-positive discovery rates (up to 100%; Mandal et al., 2015; Morton et al., 2017). Almost any soil microbiome data set will show significant correlations as the data consist of thousands of individual variables. The possibility to obtain significant results, therefore, may also lead to an abuse of statistical significance (also referred to as “p-hacking”). These effects are further compounded by spatio-temporal dynamics that contribute to challenges in statistical inference from amplicon sequencing in soils (see section 5). Consequently, we ask researchers to apply caution when inferring effects or associations solely based on statistical significance. The recent discussion surrounding the abuse of p-values has resulted in alternatives and suggestions for the use of more stringent p-value thresholds to reduce the false-positive discovery rate (Nuzzo, 2014; Amrhein et al., 2019; Wasserstein et al., 2019; Benjamin et al., 2017). This would require an estimated dramatic increase in sample size (up to 70%), which would be costly, but could also save resources in the long run that would have been spent on unsubstantiated research.
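The statement that almost any data set with thousands of variables will show significant results can be made concrete with a short calculation. The Python sketch below assumes, purely for illustration, that no taxon is truly affected and that the tests are independent, neither of which holds strictly for compositional data. The function names, the number of taxa, and the stricter threshold of 0.005 are example choices, not values taken from the analyses in this paper.

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of significant tests when every null hypothesis is true."""
    return n_tests * alpha


def prob_at_least_one_false_positive(n_tests, alpha):
    """Chance of one or more false positives, assuming independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


n_taxa = 2000  # hypothetical number of OTUs/ASVs tested one by one

for threshold in (0.05, 0.005):
    expected = expected_false_positives(n_taxa, threshold)
    chance = prob_at_least_one_false_positive(n_taxa, threshold)
    print(f"alpha={threshold}: expected false positives={expected:.0f}, "
          f"P(at least one)={chance:.3f}")
```

Under these assumptions, a stricter threshold reduces the expected number of false positives tenfold, yet some false positives remain very likely, which is why significance alone is a weak basis for inferring effects.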

We explored the impact of sample replication on statistical power in soil microbiome analysis using a published dataset on bacterial and fungal communities that features a range of soils representative of the heterogeneity and biological diversity of soils (Zheng et al., 2019), following the approach described previously (Kelly et al., 2015). We simulated OTU/ASV tables and computed the dependency of statistical power of permutational multivariate analysis of variance (PERMANOVA) on the effect size by bootstrapping the simulated matrices with varying replicate numbers (4, 5, 8 and 10 replicates; Fig. 5). The procedure is described in the supplementary information; we refer the reader to Kelly et al. (2015) for further details and for how to implement the analysis with the package ‘micropower’ available for the R programming language.

Fig. 5. Impact of sample replication on statistical power in soil microbiome analysis: a) the calculated PERMANOVA power for a range of simulated effects (quantified by the adjusted coefficient of determination omega-squared (ω²) and divided by number of replicates per treatment); b) the average PERMANOVA power of panel ‘a’, grouped by number of replicates per treatment and into three effect size ranges: Low (0.001–0.04), Medium (0.04–0.08) and High (0.08–0.12). PERMANOVA power was calculated as the proportion of bootstrap distance matrices for which PERMANOVA p-values are less than the pre-specified threshold for type I error (0.05).

Fig. 5a shows the statistical power to detect significant differences with increasing effect size for multiple groups (representing different sample sizes). This clearly shows that even a small increase in the sample size increases the power to detect small differences. These results are similar to the findings described earlier (Kelly et al., 2015) using the Human Microbiome Project (HMP) dataset with 16S rRNA marker gene data sampled at multiple body sites. To better visualize these differences, we further calculated the average statistical power for a range of effect sizes (ω²) defined as ‘Low’ (0.001–0.04), ‘Medium’ (0.04–0.08) and ‘High’ (0.08–0.12). Our analysis showed that the number of replicates hardly affects the statistical power if there was a strong effect of treatment/site (Fig. 5b, “High”). However, if the simulated treatment/site effect was lower, we found that an increase of the replicate number from 4 to 5 was sufficient to almost double the statistical power for small effect sizes (“Low”) and to achieve the recommended power above 0.8 for medium effect sizes (Fig. 5b, “Low” and “Medium”). As expected, these effects were more pronounced when the number of replicates was doubled (from 4 to 8; Fig. 5b). Identical effects were observed for the fungal data set (Fig. S1bc).

In practice, obtaining knowledge about the level of differences in soil microbial communities a priori is a complicated undertaking. If preliminary sequencing data are available, we encourage researchers to perform such power analyses before experimental planning. Such considerations should also include the number of replicates that will be pooled to alleviate the spatial heterogeneity of soils (see section 5). We refer to further literature on experimental planning and robust statistical analyses (e.g., Coenen et al., 2020; Kelly et al., 2015; Johnson et al., 2014).
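The general idea of a simulation-based power analysis for PERMANOVA can be sketched as follows. This Python example is a simplified stand-in and not the ‘micropower’ procedure used for Fig. 5: it simulates two groups of hypothetical communities with lognormal noise, computes Bray-Curtis distances, and reports the share of simulations in which a one-way PERMANOVA is significant. All function names, the community model, and the parameter values are assumptions made for illustration.

```python
import random


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    den = sum(a + b for a, b in zip(x, y))
    return sum(abs(a - b) for a, b in zip(x, y)) / den if den else 0.0


def distance_matrix(samples):
    n = len(samples)
    return [[bray_curtis(samples[i], samples[j]) for j in range(n)] for i in range(n)]


def pseudo_f(dist, labels):
    """One-way PERMANOVA pseudo-F from a distance matrix and group labels."""
    n = len(labels)
    groups = sorted(set(labels))
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    ss_within = 0.0
    for g in groups:
        idx = [i for i, lab in enumerate(labels) if lab == g]
        pairs = sum(dist[i][j] ** 2 for k, i in enumerate(idx) for j in idx[k + 1:])
        ss_within += pairs / len(idx)
    a = len(groups)
    return ((ss_total - ss_within) / (a - 1)) / (ss_within / (n - a))


def permanova_p(dist, labels, n_perm, rng):
    """Permutation p-value: share of label shufflings with F >= observed F."""
    observed = pseudo_f(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def simulate_power(n_reps, effect, n_taxa=20, noise=0.2, n_sim=40,
                   n_perm=99, alpha=0.05, seed=1):
    """Fraction of simulated experiments with a significant PERMANOVA.

    Hypothetical community model: every second taxon is multiplied by
    (1 + effect) in the treated group; replicates vary by lognormal noise.
    """
    rng = random.Random(seed)
    base = [100.0 / (k + 1) for k in range(n_taxa)]
    shifted = [m * (1 + effect) if k % 2 == 0 else m for k, m in enumerate(base)]
    labels = ["control"] * n_reps + ["treated"] * n_reps
    significant = 0
    for _ in range(n_sim):
        samples = [[m * rng.lognormvariate(0.0, noise) for m in means]
                   for means in [base] * n_reps + [shifted] * n_reps]
        if permanova_p(distance_matrix(samples), labels, n_perm, rng) < alpha:
            significant += 1
    return significant / n_sim
```

Calling, for example, `simulate_power(n_reps=5, effect=0.1)` and `simulate_power(n_reps=10, effect=0.1)` allows a comparison of replicate numbers under this toy model. With very few replicates per group, the small number of distinct label permutations limits the smallest attainable p-value, so the estimates should be read with care.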

7. Complementary approaches to amplicon sequencing that improve ecological insights

As a consequence of the relative nature of amplicon sequencing data, the majority of such studies are descriptive. Marker gene-based surveys have certainly contributed to generating valuable knowledge regarding microbial diversity and community structure, underpinning the critical roles of microorganisms in the environment. However, the limitation of using DNA sequence information to infer in situ activity, or even potential metabolic functions, has been looming over the field of environmental microbiology from its early days. This limitation results from two facts: two organisms with closely related 16S rRNA gene sequences might possess different metabolic capacities (Li et al., 2019), and even if the function of the organism is known, the presence of DNA or even RNA does not necessarily indicate that the cells are active (Blazewicz et al., 2013). Consequently, the application of rRNA/rDNA ratios to assess cell-level metabolic activity and to classify taxa as “dormant” may be of limited use, especially when investigating communities as complex as those found in soil (Steven et al., 2017; Jia et al., 2019).

Nevertheless, amplicon sequencing data can be combined with other types of data to improve investigations of ecological patterns. Using stable isotopes as an indicator of activity is one of the more popular and robust ways to bridge the gap between microorganisms and their function in ecological processes. In environmental microbiology, DNA or RNA stable isotope probing (SIP) is applied by incubating a sample with an isotopically-labelled substrate (including heavy and rare stable isotopes of C, N, H or O) that can be incorporated into the biomass of metabolically active cells (Angel, 2019; Dumont and Murrell, 2005). Unfortunately, this approach cannot be applied to phosphorus, as 31P is its only stable isotope. The identity/community profile of the labelled organisms may then be determined by separating the nucleic acids according to their buoyant density and subsequently sequencing the different density fractions, which allows drawing causal ecological interpretations of the microorganisms active in the uptake and/or assimilation of the substrate. Organisms labelled through SIP may further be detected and identified on a single-cell level using other methods, such as Raman microspectroscopy or NanoSIMS in combination with FISH (Musat et al., 2016; Wang et al., 2016).

Other recent advances in linking microorganisms to functions include so-called ‘next-generation physiology’ approaches (Hatzenpichler et al., 2020). Similar to SIP, these methods require the introduction of an isotopically labelled or non-canonical molecule into the sample for the detection of metabolically active organisms. Heavy-water labelling has recently become a popular approach for universal targeting of all active organisms using either 18O–H2O (Aanderud and Lennon, 2011; Schwartz, 2007; Angel and Conrad, 2013) or deuterium oxide (D2O) (Li et al., 2019; Eichorst et al., 2015). The assimilation of 18O–H2O into DNA can be used to deduce microbial growth rates (Hungate et al., 2015), whereas D2O can be detected in the newly synthesized lipids or proteins of active cells (Li et al., 2019). Combined with the identification of taxa of interest through amplicon sequencing, next-generation physiology approaches represent powerful tools to bring us to the next step in soil ecological research.

Amplicon sequencing may also be combined with BioOrthogonal Non-Canonical Amino acid Tagging (BONCAT) to target only the fraction of cells within a soil sample that is translationally active in situ (Couradeau et al., 2019; Reichart et al., 2020). The use of modified indicator molecules opens new avenues for detecting metabolically active cells in the context of environmental samples; however, the application to soil remains limited to very few studies so far (Couradeau et al., 2019; Reichart et al., 2020). Coupling these labelling approaches to cell sorting via fluorescence-activated cell sorting (FACS) (Couradeau et al., 2019) or Raman-activated cell sorting (RACS) (Lee et al., 2019) provides a non-destructive alternative to NanoSIMS for identifying the metabolically active organisms, thus allowing the labelled fraction of cells to be targeted for downstream sequencing. Additionally, combining these labelling approaches with cell sorting and sequencing may further circumvent challenges associated with exogenous (relic) DNA.

In addition, amplicon sequencing can certainly also be a valuable tool for planning more targeted metagenomic or metatranscriptomic studies to investigate phylogenetic composition, functional potential and/or gene expression in the community context (Regalado et al., 2020). These “omics” approaches remain promising for improving the link between organisms and their ecological roles and circumvent methodological challenges introduced through amplicon sequencing, such as PCR bias. However, both sequencing and bioinformatic costs for gaining functionally relevant insights into ecosystem processes by “omics” approaches are typically orders of magnitude higher than those needed for analyzing amplicon sequencing data. The use of a limited number of metagenomes or metatranscriptomes as a complement to amplicon sequencing presents a cost-effective and informative approach for linking microbial community structure to function in the complex soil environment.

8. Summary and outlook

Amplicon sequencing is and will remain a valuable approach for investigating the structure of microbial communities in soils. However, the complex nature of soils and high diversity of organisms therein necessitate careful considerations, from sampling strategies to statistical analyses, to avoid mis- or over-interpretation of the data. Amplicon sequencing has largely been used in a descriptive manner, allowing one to catalogue nucleic acids of organisms present in a given sample. However, amplicon sequencing has great potential to increase our understanding of microbial community ecology in soils when used to test well-defined scientific questions. As one key goal of soil microbial ecology is to link organisms to environmental processes, sequencing-based studies need to be complemented with other data types, in addition to appropriate normalization and statistical approaches. By improving the quantitative nature of amplicon sequencing, as well as coupling sequence data with high-quality metadata and complementary measurements, we can reveal the mechanisms underlying microbial community structure in soil. Understanding the nature of amplicon data and the role of sequencing as a valuable tool for soil scientists will further expand our understanding of microbial community diversity and structure in the immensely complex soil environment.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

We thank Petra Pjevac for helpful comments on the manuscript. Figs. 2 and 4 were created with the help of BioRender.com. JS was supported by the Austrian Science Fund (FWF) DK+ program 'Microbial Nitrogen Cycling' (W1257-B20). RA was supported by the Czech Science Foundation (Junior Grant No. 19-24309Y) and MEYS (EF16_013/0001782 - SoWa Ecosystems Research). JJ was supported by the Czech Science Foundation (21-07275 S). LA, KS and CK have received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No 819446).

Appendix A. Supplementary data

The following is the Supplementary data to this article:

Multimedia component 1.
