Author: Sandi Yen
Contributors:
Updated: 11 May 2020
Executive Summary
It is common to sequence multiple projects together on a single run in order to obtain optimal numbers of sequenced reads per sample whilst minimising data redundancy. In this report I explore and validate methods to mitigate any downstream effects that cross-project contamination may have. The dataset used in this study included human stool (CMS001), two defined mouse microbiomes (GFU005, Nanopore), a DNA standard obtained from Zymo (DNA_Std), and a specific pathogen-free mouse microbiome (SPF).
In general, about 20-40 ASVs per project were identified as cross-project contaminants. However, the diversity of the samples is still overestimated, as evidenced by the DNA standards and the MM12 experiments, suggesting that this method may still be too conservative. A follow-up report will address the removal of contaminants introduced at the bench level (kit contaminants, PCR, library preparation, etc.).
Evidence of cross-project contamination
The DNA standards and the GFU005 project allow us to measure the extent of cross-project contamination, as the theoretical composition of those samples is known. Table 1 shows the number of ASVs observed in each project, compared with the expected number of ASVs and the number of ASVs identified in this report as project contaminants.
Table 1: Summary of number of ASVs in each project.
expt | n_sample | n_expected | n_observed | n_contam
dna_std | 2 | 10 | 161 | 18
Nanopore | 4 | 12 | 121 | 35
SPF | 8 | NA | 490 | 21
GFU005 | 112 | 12 | 1514 | 43
CMS001 | 116 | NA | 3804 | 22
Project prevalence
It is expected that cross-project contamination would occur rarely and randomly. Therefore, ASVs that are observed in a small percentage of samples within a project are good candidates for cross-project contaminants. The threshold for what is considered “low prevalence” was determined by tallying the number of ASVs observed at 5% prevalence, 4% prevalence and so on. The point where there is a large drop-off in ASV frequency is used as the threshold prevalence level. For example, Table 2 shows that a prevalence of 2% can be considered “low prevalence” in both CMS001 and GFU005.
Table 2: Number of ASVs observed at each prevalence level.
Sample Prevalence | CMS001 | GFU005
0.01 | 1817 | 953
0.02 | 2285 | 1110
0.03 | 2580 | 1179
0.04 | 2910 | 1234
0.05 | 3018 | 1267
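The prevalence tally behind Table 2 could be sketched in Python roughly as follows. This is a minimal illustration rather than the code used for the report: the input layout (one dict of read counts per sample) and the function names are hypothetical, and the tally is assumed to be cumulative (ASVs observed in at most the given fraction of samples), which the increasing counts in Table 2 suggest but the text does not state.

```python
def prevalence(samples):
    """Fraction of samples in which each ASV has at least one read.

    `samples` is a list of dicts mapping ASV id -> read count
    (a hypothetical layout chosen for this sketch).
    """
    n_samples = len(samples)
    n_present = {}
    for counts in samples:
        for asv, reads in counts.items():
            if reads > 0:
                n_present[asv] = n_present.get(asv, 0) + 1
    return {asv: n / n_samples for asv, n in n_present.items()}


def tally_by_prevalence(prev, levels=(0.01, 0.02, 0.03, 0.04, 0.05)):
    """Number of ASVs observed in at most each fraction of samples (assumed cumulative)."""
    return {level: sum(1 for p in prev.values() if p <= level) for level in levels}


def increments(tally):
    """ASVs added when moving from one prevalence level up to the next."""
    levels = sorted(tally)
    return {hi: tally[hi] - tally[lo] for lo, hi in zip(levels, levels[1:])}


# CMS001 column of Table 2
cms001 = {0.01: 1817, 0.02: 2285, 0.03: 2580, 0.04: 2910, 0.05: 3018}
print(increments(cms001))
```

Looking at the increments between successive levels is one possible way to make the drop-off visible; the report itself does not say how the drop-off was judged.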
Identifying candidate contaminants based on prevalence does not work for projects with few samples, as is the case with the DNA standards (2 samples), SPF (8 samples), and the “Nanopore” project (4 samples). In these cases, candidate contaminants can be identified using a read count cut-off of 0.1% of the total abundance in the project. For the DNA standards, the total abundance (of ASVs expected to be in the standard, based on genus) was 94320. Therefore, ASVs with less than or equal to 94 reads were considered candidate contaminants for the DNA standards. Contaminants to the Nanopore and SPF projects were identified in a similar way: 0.1% of the total abundance in each project was used to set cut-off thresholds of 212 and 398 reads for Nanopore and SPF, respectively.
Applying these threshold cut-offs to their respective projects identifies candidate project contaminants (Table 3). These low-abundance or low-prevalence ASVs warranted further investigation into their likelihood of originating from another project on the same sequencing run.
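As a worked example of the read count cut-off, the sketch below reproduces the DNA standards threshold (0.1% of 94320 reads). The function names are hypothetical, and rounding down is an assumption; rounding to the nearest integer gives the same value here. The four read counts are dna_std totals taken from the tables later in this report.

```python
import math


def abundance_cutoff(total_reads, fraction=0.001):
    """Read-count cut-off: a fraction of total abundance, rounded down (assumed)."""
    return math.floor(total_reads * fraction)


def low_abundance_candidates(project_counts, total_reads, fraction=0.001):
    """ASVs present in the project with reads less than or equal to the cut-off."""
    cutoff = abundance_cutoff(total_reads, fraction)
    return sorted(asv for asv, reads in project_counts.items() if 0 < reads <= cutoff)


# dna_std read totals for four ASVs, taken from the tables in this report
dna_std_counts = {"ASV122": 15195, "ASV106": 95, "ASV74": 94, "ASV141": 81}

print(abundance_cutoff(94320))
print(low_abundance_candidates(dna_std_counts, total_reads=94320))
```

With these inputs the cut-off is 94 reads, so ASV74 (94 reads) and ASV141 (81 reads) are candidates, while ASV106 (95 reads) falls just above the cut-off.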
Table 3: Number of candidate contaminants based on low abundance cut-off or low prevalence cut-off.
experiment | num. ASVs | num. candidate contaminants | cutoff method
Nanopore | 121 | 83 | low abundance
dna_std | 161 | 101 | low abundance
SPF | 490 | 119 | low abundance
GFU005 | 1514 | 91 | low prevalence
CMS001 | 3804 | 106 | low prevalence
Cross-Project comparison of ASV abundance
ASVs originating from other projects (rather than the current project) are expected to be very abundant in another project and rare in the current project. Therefore, these cross-project contaminants would have a very high fold change. Furthermore, the occurrence of cross-project contaminants should be rare. Therefore, ASVs with large fold changes and low frequencies can be considered cross-project contaminants. Fold changes were calculated between the current project (for example, CMS001) and the project with the highest abundance for a given ASV. Fold changes were calculated as “Other Project” / “Current Project”, so a high fold change indicates that the ASV was abundant in the other project and minimally present in the current project of interest.
Comparisons of the candidate contaminants across all projects can be summarized by looking at how common these fold changes are. The fold changes of candidate contaminants identified for each project are tallied, and the frequencies are summarized in Tables 5-9. Based on the frequency of fold changes, a cut-off for what is considered a ‘large-fold’ difference becomes evident. These cut-offs are summarized in Table 4.
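The fold-change calculation described above can be illustrated with a short Python sketch. The function name and the dict layout are hypothetical; the read totals for ASV199 and ASV70 are taken from the table of contaminants identified for the DNA standards later in this report.

```python
def fold_change(abundances, current):
    """Return (other_project, fold_change) for one ASV.

    `abundances` maps project name -> total reads for the ASV. The fold change
    is "Other Project" / "Current Project", where the other project is the one
    with the highest abundance for this ASV.
    """
    own = abundances[current]
    if own <= 0:
        raise ValueError("ASV must be observed in the current project")
    others = {proj: reads for proj, reads in abundances.items() if proj != current}
    top_project = max(others, key=others.get)
    return top_project, others[top_project] / own


# Per-project read totals for two ASVs, as reported for the DNA standards
asv199 = {"CMS001": 27343, "dna_std": 7, "GFU005": 298, "Nanopore": 0, "SPF": 0}
asv70 = {"CMS001": 321, "dna_std": 16, "GFU005": 1189, "Nanopore": 7, "SPF": 80352}

print(fold_change(asv199, current="dna_std"))
print(fold_change(asv70, current="dna_std"))
```

For ASV70 the most abundant other project is SPF, giving 80352 / 16 = 5022; for ASV199 it is CMS001, giving 27343 / 7, roughly 3906.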
Table 4: Fold change threshold for identifying project contaminants.
expt | threshold
dna_std | 80
Nanopore | 40
SPF | 60
GFU005 | 60
CMS001 | 30
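The tallying and thresholding steps could be sketched in Python as below. Two things are assumptions rather than statements from the report: fold changes are binned in steps of 10 and labelled by the bin upper bound (as the fold-change tally headers suggest), and an ASV is flagged when its fold change is strictly greater than the project threshold. The function names are hypothetical; the example fold changes are derived from the dna_std read totals reported later in this document.

```python
import math
from collections import Counter


def bin_upper_bound(fc, width=10):
    """Upper bound of the fixed-width bin that contains the fold change."""
    return math.ceil(fc / width) * width


def tally_fold_changes(fold_changes, width=10):
    """Number of ASVs per fold-change bin, keyed by bin upper bound."""
    counts = Counter(bin_upper_bound(fc, width) for fc in fold_changes.values())
    return dict(sorted(counts.items()))


def flag_contaminants(fold_changes, threshold):
    """ASVs whose fold change exceeds the project threshold (assumed strict)."""
    return sorted(asv for asv, fc in fold_changes.items() if fc > threshold)


# "Other Project" / "Current Project" for four dna_std candidate ASVs
dna_std_fold_changes = {
    "ASV101": 796 / 62,     # GFU005 / dna_std
    "ASV103": 893 / 11,     # GFU005 / dna_std
    "ASV199": 27343 / 7,    # CMS001 / dna_std
    "ASV70": 80352 / 16,    # SPF / dna_std
}

print(tally_fold_changes(dna_std_fold_changes))
print(flag_contaminants(dna_std_fold_changes, threshold=80))
```

With the dna_std threshold of 80, ASV103, ASV199 and ASV70 are flagged and ASV101 is kept, which agrees with the contaminant and remaining-ASV tables for the DNA standards.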
Table 5: Tally of fold changes of candidate contaminants to DNA Standards.
Fold change (bin upper bound) | Number of ASVs
10 | 14
20 | 23
30 | 15
40 | 15
50 | 6
60 | 5
70 | 4
80 | 1
90 | 1
100 | 1
110 | 1
180 | 1
200 | 1
220 | 1
230 | 1
270 | 1
330 | 1
370 | 1
460 | 1
570 | 1
740 | 1
900 | 1
950 | 1
1270 | 1
3910 | 1
5030 | 1
Table 6: Tally of fold changes of candidate contaminants to “Nanopore” project.
Fold change (bin upper bound) | Number of ASVs
10 | 21
20 | 16
30 | 9
40 | 2
50 | 2
60 | 1
70 | 1
80 | 2
90 | 1
100 | 1
140 | 2
160 | 1
190 | 1
210 | 1
220 | 1
230 | 1
250 | 1
260 | 1
280 | 2
300 | 1
330 | 1
350 | 2
390 | 1
420 | 2
450 | 1
460 | 1
470 | 1
510 | 1
590 | 1
640 | 1
670 | 1
1620 | 1
11480 | 1
Table 7: Tally of fold changes of candidate contaminants to SPF project.
Fold change (bin upper bound) | Number of ASVs
10 | 36
20 | 18
30 | 19
40 | 14
50 | 10
60 | 1
70 | 2
80 | 2
90 | 2
110 | 1
160 | 1
230 | 1
390 | 1
400 | 1
430 | 1
750 | 1
790 | 1
3330 | 1
4020 | 1
7220 | 1
21190 | 1
23790 | 1
34610 | 1
35020 | 1
Table 8: Tally of fold changes of candidate contaminants to GFU005 project.
Fold change (bin upper bound) | Number of ASVs
10 | 25
20 | 8
30 | 7
40 | 3
50 | 4
60 | 1
70 | 1
80 | 1
90 | 2
100 | 1
180 | 1
210 | 1
220 | 1
300 | 1
340 | 1
430 | 1
530 | 1
670 | 1
1100 | 1
1110 | 1
1730 | 1
1950 | 1
2610 | 1
2650 | 1
2850 | 1
2920 | 1
3050 | 1
3060 | 1
3650 | 1
3690 | 1
3720 | 1
3800 | 1
4140 | 1
4520 | 1
5720 | 1
6150 | 1
6160 | 1
6860 | 1
7810 | 1
9360 | 1
10810 | 1
11300 | 1
11680 | 1
13660 | 1
18340 | 1
20360 | 1
22530 | 1
41070 | 1
Table 9: Tally of fold changes of candidate contaminants to CMS001 project.
Fold change (bin upper bound) | Number of ASVs
10 | 59
20 | 19
30 | 6
40 | 3
50 | 2
60 | 2
70 | 1
80 | 1
90 | 4
130 | 1
150 | 1
190 | 1
240 | 1
250 | 1
740 | 1
800 | 1
1390 | 1
4030 | 1
Table 10: Contaminants identified for DNA standards. The total abundance of each ASV within each project is shown.
featureID | CMS001 | dna_std | GFU005 | Nanopore | SPF
ASV103 | 211 | 11 | 893 | 2 | 62
ASV141 | 17364 | 81 | 2303 | 0 | 163
ASV163 | 274 | 35 | 1031 | 0 | 3404
ASV167 | 4788 | 27 | 538 | 0 | 12
ASV194 | 62 | 13 | 240 | 0 | 7327
ASV199 | 27343 | 7 | 298 | 0 | 0
ASV201 | 16075 | 17 | 545 | 0 | 4
ASV203 | 103 | 9 | 378 | 0 | 911
ASV206 | 101 | 8 | 312 | 0 | 1575
ASV208 | 9 | 3 | 61 | 0 | 2198
ASV21 | 223 | 36 | 45474 | 2476 | 0
ASV211 | 8 | 5 | 94 | 0 | 4484
ASV218 | 6 | 4 | 36 | 0 | 1466
ASV241 | 320 | 30 | 612 | 0 | 9882
ASV242 | 3796 | 17 | 156 | 0 | 17
ASV243 | 31093 | 68 | 460 | 0 | 11037
ASV245 | 4468 | 17 | 375 | 0 | 1131
ASV70 | 321 | 16 | 1189 | 7 | 80352
Table 11: The taxonomic classifications associated with the ASVs identified as likely cross-project contaminants for DNA standards.
featureID | Taxon
ASV103 | k__Bacteria;p__NA;c__NA;o__NA;f__NA;g__NA;s__NA
ASV141 | k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__NA;g__NA;s__NA
ASV163 | k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae;g__NA;s__NA
ASV167 | k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae;g__Lachnospiracea incertae sedis;s__NA
ASV194 | k__Bacteria;p__Bacteroidetes;c__Bacteroidia;o__Bacteroidales;f__Porphyromonadaceae;g__NA;s__NA
ASV199 | k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Peptostreptococcaceae;g__Clostridium XI;s__NA
ASV201 | k__Bacteria;p__Bacteroidetes;c__Bacteroidia;o__Bacteroidales;f__Porphyromonadaceae;g__Parabacteroides;s__Parabacteroides_distasonis(AB238922)
ASV203 | k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae;g__NA;s__NA
ASV206 | k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae;g__NA;s__NA
ASV208 | k__Bacteria;p__Bacteroidetes;c__Bacteroidia;o__Bacteroidales;f__NA;g__NA;s__NA
ASV21 | k__Bacteria;p__Bacteroidetes;c__Bacteroidia;o__Bacteroidales;f__Porphyromonadaceae;g__Barnesiella;s__NA
ASV211 | k__Bacteria;p__Bacteroidetes;c__Bacteroidia;o__Bacteroidales;f__Porphyromonadaceae;g__NA;s__NA
ASV218 | k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Ruminococcaceae;g__Anaerotruncus;s__NA
ASV241 | k__Bacteria;p__Bacteroidetes;c__Bacteroidia;o__Bacteroidales;f__Porphyromonadaceae;g__Barnesiella;s__NA
ASV242 | k__Bacteria;p__Bacteroidetes;c__Bacteroidia;o__Bacteroidales;f__Porphyromonadaceae;g__Parabacteroides;s__Parabacteroides_distasonis(AB238922)
ASV243 | k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae;g__Clostridium XlVa;s__NA
ASV245 | k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Ruminococcaceae;g__Intestinimonas;s__NA
ASV70 | k__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__Lactobacillaceae;g__Lactobacillus;s__Lactobacillus_apodemi(AJ871178)
Table 12: ASVs remaining in the DNA standards after removal of contaminants. Expected members of the DNA standard are marked with *. The total abundance of each ASV in each project is shown.
featureID | expected_member | dna_std | Nanopore | SPF | GFU005 | CMS001
ASV10 | * | 8889 | 395 | 184 | 8310 | 122
ASV101 | | 62 | 6 | 0 | 796 | 291
ASV106 | | 95 | 7 | 27 | 2276 | 870
ASV110 | | 113 | 6 | 1570 | 4018 | 1594
ASV118 | | 383 | 7 | 6831 | 11339 | 3949
ASV120 | | 41 | 5 | 61 | 1500 | 513
ASV122 | * | 15195 | 0 | 0 | 4 | 11
ASV123 | * | 13128 | 0 | 0 | 12 | 0
ASV124 | * | 12401 | 0 | 0 | 3 | 0
ASV125 | * | 11653 | 0 | 0 | 6 | 0
ASV126 | * | 9231 | 0 | 985 | 5 | 39010
ASV127 | * | 8921 | 0 | 0 | 0 | 287
ASV128 | * | 6754 | 0 | 0 | 0 | 25
ASV129 | * | 2244 | 0 | 0 | 0 | 0
ASV13 | | 256 | 45 | 271 | 7023 | 2962
ASV130 | * | 1907 | 0 | 0 | 0 | 0
ASV131 | * | 1635 | 0 | 0 | 0 | 0
ASV132 | * | 1520 | 0 | 0 | 0 | 0
ASV133 | | 19 | 0 | 0 | 0 | 0
ASV134 | | 11 | 0 | 0 | 0 | 3
ASV135 | | 136 | 0 | 494 | 3695 | 1856
ASV136 | | 6 | 0 | 0 | 0 | 0
ASV137 | | 15 | 0 | 0 | 0 | 0
ASV138 | | 3 | 0 | 0 | 0 | 0
ASV139 | | 3 | 0 | 0 | 0 | 0
ASV140 | | 4 | 0 | 0 | 0 | 0
ASV142 | * | 3 | 0 | 0 | 0 | 0
ASV143 | | 12 | 0 | 0 | 0 | 0
ASV144 | | 5 | 0 | 0 | 0 | 0
ASV145 | * | 4 | 0 | 0 | 0 | 0
ASV146 | * | 3 | 0 | 0 | 0 | 0
ASV147 | | 3 | 0 | 0 | 0 | 0
ASV148 | | 4 | 0 | 0 | 0 | 0
ASV149 | | 2 | 0 | 0 | 0 | 0
ASV150 | * | 2 | 0 | 0 | 0 | 0
ASV151 | | 63 | 0 | 22 | 848 | 385
ASV152 | | 152 | 0 | 117 | 5351 | 1216
ASV153 | | 151 | 0 | 275 | 2459 | 846
ASV154 | | 135 | 0 | 92 | 3907 | 1534
ASV155 | | 102 | 0 | 59 | 2478 | 738
ASV156 | | 80 | 0 | 354 | 869 | 542
ASV157 | | 75 | 0 | 87 | 2827 | 1293
ASV158 | | 68 | 0 | 195 | 2330 | 543
ASV159 | | 54 | 0 | 3581 | 919 | 2256
ASV160 | | 43 | 0 | 19 | 924 | 568
ASV161 | | 39 | 0 | 42 | 755 | 167
ASV162 | | 41 | 0 | 0 | 1327 | 241
ASV164 | | 42 | 0 | 506 | 682 | 1968
ASV165 | | 25 | 0 | 18 | 689 | 196
ASV166 | | 28 | 0 | 0 | 38 | 20
ASV168 | | 33 | 0 | 0 | 506 | 109
ASV169 | | 31 | 0 | 0 | 354 | 82
ASV17 | * | 566 | 64 | 400 | 16373 | 6024
ASV170 | | 21 | 0 | 30 | 1081 | 342
ASV171 | | 18 | 0 | 18 | 209 | 30
ASV172 | | 19 | 0 | 18 | 899 | 259
ASV173 | | 26 | 0 | 13 | 355 | 199
ASV174 | | 17 | 0 | 34 | 675 | 296
ASV175 | | 16 | 0 | 16 | 142 | 251
ASV176 | | 25 | 0 | 311 | 332 | 85
ASV177 | | 14 | 0 | 28 | 385 | 165
ASV178 | | 16 | 0 | 0 | 484 | 92
ASV179 | | 11 | 0 | 51 | 264 | 97
ASV180 | | 12 | 0 | 0 | 156 | 10
ASV181 | | 17 | 0 | 0 | 93 | 14
ASV182 | | 11 | 0 | 462 | 127 | 0
ASV183 | | 16 | 0 | 0 | 849 | 212
ASV184 | | 8 | 0 | 0 | 55 | 0
ASV185 | | 9 | 0 | 0 | 21 | 0
ASV186 | | 10 | 0 | 663 | 256 | 24
ASV187 | | 11 | 0 | 87 | 129 | 15
ASV188 | | 5 | 0 | 0 | 0 | 0
ASV189 | | 8 | 0 | 32 | 424 | 84
ASV190 | | 7 | 0 | 0 | 73 | 0
ASV191 | | 6 | 0 | 0 | 154 | 118
ASV192 | | 2 | 0 | 0 | 0 | 0
ASV193 | | 2 | 0 | 0 | 0 | 0
ASV195 | | 5 | 0 | 0 | 13 | 2
ASV196 | | 54 | 0 | 30 | 1260 | 240
ASV197 | | 20 | 0 | 10 | 263 | 38
ASV198 | | 10 | 0 | 9 | 135 | 73
ASV200 | | 10 | 0 | 0 | 0 | 0
ASV202 | | 12 | 0 | 476 | 270 | 101
ASV204 | | 9 | 0 | 0 | 75 | 3
ASV205 | | 5 | 0 | 9 | 27 | 32
ASV207 | | 12 | 0 | 0 | 30 | 11
ASV209 | | 6 | 0 | 0 | 121 | 13
ASV210 | | 10 | 0 | 233 | 344 | 75
ASV212 | | 28 | 0 | 1635 | 161 | 0
ASV213 | | 15 | 0 | 0 | 405 | 421
ASV214 | | 2 | 0 | 0 | 2 | 0
ASV215 | | 7 | 0 | 0 | 0 | 0
ASV216 | | 8 | 0 | 0 | 22 | 36
ASV217 | | 32 | 0 | 66 | 1331 | 500
ASV219 | | 14 | 0 | 1065 | 53 | 44
ASV22 | | 106 | 8 | 75 | 2736 | 747
ASV220 | | 8 | 0 | 0 | 0 | 0
ASV221 | | 9 | 0 | 17 | 14 | 25
ASV222 | * | 6 | 0 | 0 | 81 | 42
ASV223 | | 3 | 0 | 0 | 0 | 0
ASV224 | | 4 | 0 | 165 | 0 | 0
ASV225 | * | 53 | 0 | 64 | 1543 | 1041
ASV226 | | 5 | 0 | 0 | 160 | 50
ASV227 | | 9 | 0 | 0 | 31 | 0
ASV228 | | 10 | 0 | 14 | 129 | 27
ASV229 | | 25 | 0 | 613 | 238 | 43
ASV23 | | 251 | 21 | 250 | 7173 | 2615
ASV230 | | 5 | 0 | 0 | 0 | 0
ASV231 | | 6 | 0 | 15 | 368 | 26
ASV232 | | 33 | 0 | 0 | 297 | 88
ASV233 | | 46 | 0 | 3102 | 1710 | 279
ASV234 | | 2 | 0 | 3 | 106 | 27
ASV235 | | 14 | 0 | 77 | 183 | 45
ASV236 | | 4 | 0 | 0 | 103 | 6
ASV237 | | 3 | 0 | 0 | 0 | 0
ASV238 | | 11 | 0 | 141 | 0 | 0
ASV239 | | 9 | 0 | 0 | 97 | 38
ASV24 | | 297 | 40 | 371 | 9958 | 4200
ASV240 | | 9 | 0 | 0 | 23 | 6
ASV244 | | 7 | 0 | 18 | 347 | 18
ASV246 | | 31 | 0 | 37 | 1163 | 413
ASV247 | | 3 | 0 | 0 | 46 | 0
ASV248 | | 7 | 0 | 61 | 0 | 0
ASV25 | | 333 | 22 | 293 | 9062 | 3716
ASV28 | | 290 | 37 | 274 | 8453 | 4121
ASV4 | | 192 | 70849 | 2893 | 1578750 | 3560
ASV43 | | 69 | 6 | 36 | 1654 | 483
ASV44 | * | 51 | 4 | 25 | 1111 | 333
ASV59 | | 30 | 2 | 15 | 828 | 308
ASV6 | * | 138 | 2903 | 5785 | 239901 | 1324
ASV67 | | 78 | 4 | 775 | 1805 | 2560
ASV68 | | 138 | 15 | 79 | 3058 | 946
ASV71 | | 174 | 9 | 156 | 4212 | 1137
ASV72 | | 39 | 6 | 0 | 1278 | 645
ASV73 | | 15 | 2 | 7 | 200 | 50
ASV74 | | 94 | 8 | 80 | 3096 | 884
ASV75 | | 10 | 3 | 10 | 187 | 46
ASV76 | | 93 | 5 | 114 | 2938 | 1080
ASV78 | | 104 | 6 | 98 | 2712 | 672
ASV80 | | 129 | 24 | 104 | 4487 | 1493
ASV81 | | 369 | 23 | 396 | 11567 | 4121
ASV84 | | 19 | 5 | 19 | 668 | 212
ASV96 | | 25 | 8 | 17 | 610 | 240
Table 13: The taxonomic classifications associated with the ASVs remaining in the DNA standards after removal of contaminants. Expected members of the DNA standard are marked with *.
featureID | expected_member | Taxon
ASV10 |