This vignette will demonstrate additional usage of the statomatic package beyond that shown in the basic workflow vignette. It is recommended to read the basic workflow vignette before reading this one.
In this example we’ll demonstrate how to use statomatic without creating a statomatic data set.
All the functions used by statomatic were first developed to be compatible with data.frames. The statomatic data set was added later as an optional way to keep data and results organized all in one place. The data.frame compatibility remains, and we can use most of statomatic’s functions individually.
We’ll use the same normalized counts that were used in the multi-group multi-factor example of the basic workflow vignette.
x <- read.csv("normalized_counts.csv")
head(x)
#> X Sample_600 Sample_601 Sample_602 Sample_616 Sample_617
#> 1 ENSMUSG00000051285 169.21935 309.83231 417.709088 214.03944 202.50044
#> 2 ENSMUSG00000057363 47.53353 66.64997 54.805802 41.23696 49.30862
#> 3 ENSMUSG00000100555 23.76676 55.84187 0.000000 48.10978 18.19153
#> 4 ENSMUSG00000078184 50.38554 59.44457 5.924952 12.76382 24.41495
#> 5 ENSMUSG00000061024 273.79311 293.62015 232.554350 144.32935 360.00079
#> 6 ENSMUSG00000025911 154.95929 525.99438 309.578721 208.14845 256.59631
#> Sample_603 Sample_604 Sample_605 Sample_606 Sample_621 Sample_607 Sample_608
#> 1 110.24045 171.38160 61.79141 190.642528 427.33305 228.73359 410.89116
#> 2 57.16171 0.00000 0.00000 103.558904 58.42021 63.76202 88.17407
#> 3 28.58086 151.21906 68.65713 5.884029 30.29196 21.25401 15.87133
#> 4 32.66384 10.08127 0.00000 44.718618 37.86495 19.22981 18.51655
#> 5 145.62627 55.44699 109.85140 222.416283 315.90190 253.02388 319.19013
#> 6 155.15322 171.38160 0.00000 264.781289 529.02750 301.60446 336.82494
#> Sample_609 Sample_610 Sample_623 Sample_611 Sample_612 Sample_613 Sample_614
#> 1 313.05044 412.25499 133.76758 93.303195 248.040131 46.21469 417.35080
#> 2 60.75650 14.45406 99.46820 103.366522 18.445254 0.00000 65.25144
#> 3 45.30993 40.01052 30.86944 5.093783 1.456204 390.25738 0.00000
#> 4 39.13131 36.86833 56.59398 57.025521 49.996347 0.00000 82.04884
#> 5 209.04355 211.36447 221.23100 240.525945 248.040131 302.96296 440.60874
#> 6 375.86649 528.09696 265.82019 148.589375 171.832106 10.26993 323.02694
#> Sample_615
#> 1 513.48498
#> 2 39.36603
#> 3 0.00000
#> 4 14.61838
#> 5 213.52043
#> 6 646.77697
We always need to make sure that the data inside the data.frame is purely numeric, containing only the values we intend to analyze. We can use the cf() function to set a specified column (column 1 by default) as the rownames of the data.frame.
x <- cf(x)
head(x)
#> Sample_600 Sample_601 Sample_602 Sample_616 Sample_617
#> ENSMUSG00000051285 169.21935 309.83231 417.709088 214.03944 202.50044
#> ENSMUSG00000057363 47.53353 66.64997 54.805802 41.23696 49.30862
#> ENSMUSG00000100555 23.76676 55.84187 0.000000 48.10978 18.19153
#> ENSMUSG00000078184 50.38554 59.44457 5.924952 12.76382 24.41495
#> ENSMUSG00000061024 273.79311 293.62015 232.554350 144.32935 360.00079
#> ENSMUSG00000025911 154.95929 525.99438 309.578721 208.14845 256.59631
#> Sample_603 Sample_604 Sample_605 Sample_606 Sample_621
#> ENSMUSG00000051285 110.24045 171.38160 61.79141 190.642528 427.33305
#> ENSMUSG00000057363 57.16171 0.00000 0.00000 103.558904 58.42021
#> ENSMUSG00000100555 28.58086 151.21906 68.65713 5.884029 30.29196
#> ENSMUSG00000078184 32.66384 10.08127 0.00000 44.718618 37.86495
#> ENSMUSG00000061024 145.62627 55.44699 109.85140 222.416283 315.90190
#> ENSMUSG00000025911 155.15322 171.38160 0.00000 264.781289 529.02750
#> Sample_607 Sample_608 Sample_609 Sample_610 Sample_623
#> ENSMUSG00000051285 228.73359 410.89116 313.05044 412.25499 133.76758
#> ENSMUSG00000057363 63.76202 88.17407 60.75650 14.45406 99.46820
#> ENSMUSG00000100555 21.25401 15.87133 45.30993 40.01052 30.86944
#> ENSMUSG00000078184 19.22981 18.51655 39.13131 36.86833 56.59398
#> ENSMUSG00000061024 253.02388 319.19013 209.04355 211.36447 221.23100
#> ENSMUSG00000025911 301.60446 336.82494 375.86649 528.09696 265.82019
#> Sample_611 Sample_612 Sample_613 Sample_614 Sample_615
#> ENSMUSG00000051285 93.303195 248.040131 46.21469 417.35080 513.48498
#> ENSMUSG00000057363 103.366522 18.445254 0.00000 65.25144 39.36603
#> ENSMUSG00000100555 5.093783 1.456204 390.25738 0.00000 0.00000
#> ENSMUSG00000078184 57.025521 49.996347 0.00000 82.04884 14.61838
#> ENSMUSG00000061024 240.525945 248.040131 302.96296 440.60874 213.52043
#> ENSMUSG00000025911 148.589375 171.832106 10.26993 323.02694 646.77697
We also always need a sample_info object to map our samples (columns of the data.frame) to the experimental factors (groups, etc.).
sample_info <- data.frame(samples = colnames(x), sex = c(rep("Male", 10), rep("Female", 10)),
genotype = c(rep("WT", 5), rep("KO", 5), rep("WT", 5), rep("KO", 5)),
group = c(rep("M_WT", 5), rep("M_KO", 5), rep("F_WT", 5), rep("F_KO", 5)))
sample_info
#> samples sex genotype group
#> 1 Sample_600 Male WT M_WT
#> 2 Sample_601 Male WT M_WT
#> 3 Sample_602 Male WT M_WT
#> 4 Sample_616 Male WT M_WT
#> 5 Sample_617 Male WT M_WT
#> 6 Sample_603 Male KO M_KO
#> 7 Sample_604 Male KO M_KO
#> 8 Sample_605 Male KO M_KO
#> 9 Sample_606 Male KO M_KO
#> 10 Sample_621 Male KO M_KO
#> 11 Sample_607 Female WT F_WT
#> 12 Sample_608 Female WT F_WT
#> 13 Sample_609 Female WT F_WT
#> 14 Sample_610 Female WT F_WT
#> 15 Sample_623 Female WT F_WT
#> 16 Sample_611 Female KO F_KO
#> 17 Sample_612 Female KO F_KO
#> 18 Sample_613 Female KO F_KO
#> 19 Sample_614 Female KO F_KO
#> 20 Sample_615 Female KO F_KO
Now, instead of building a statomatic data set, we can jump right into the analysis. To do all the same things that sds_analyze does, we can use the multigroup_main() function. This function sorts each row of the data based on its distribution and scedasticity and performs the applicable tests. It returns a list containing all the results of the analysis.
res <- multigroup_main(x, var1 = sample_info$sex, var2 = sample_info$genotype, var3 = sample_info$group)
#> sorting data 100% done...
#> Using two way anova
#> anova 100% done
#> Tukey test 100% done
#> Calculating fold changes...
#> Welch test 100% done
#> DunnettT3 test 100% done
#> Calculating fold changes...
#> Kruskal-Wallis Test 100% done
#> Dunn Test 100% done
#> Calculating fold changes...
names(res)
#> [1] "anova_results" "welch_results" "kw_results" "foldchanges"
#> [5] "ne" "nu" "nn"
You can access the elements of this list either with something like res[[1]] (double brackets recommended) or with res$anova_results. Also notice that instead of specifying a design like we did when building a statomatic data set, we used the var1, var2, and var3 arguments to tell statomatic what the experimental factors are. Based on these, statomatic knew to use a two way anova.
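For example, either form can be used to inspect the anova results (no output shown here; the columns depend on the factors supplied):
#res[[1]] and res$anova_results refer to the same table
head(res$anova_results)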
The multigroup_main function can handle both multi-factor and single-factor analyses. We could also do this:
res <- multigroup_main(x, var1 = sample_info$group)
#> sorting data 100% done...
#> Using one way anova
#> anova 100% done
#> Tukey test 100% done
#> Calculating fold changes...
#> Welch test 100% done
#> DunnettT3 test 100% done
#> Calculating fold changes...
#> Kruskal-Wallis Test 100% done
#> Dunn Test 100% done
#> Calculating fold changes...
names(res)
#> [1] "anova_results" "welch_results" "kw_results" "foldchanges"
#> [5] "ne" "nu" "nn"
And notice that now statomatic used a one way anova.
There is also a twogroup_main function, which works very similarly to the example directly above. Note that the twogroup_main function can only handle one factor with exactly two groups.
res <- twogroup_main(x, var = sample_info$sex)
#> sorting data 100% done...
#> t-test 100% done
#> Calculating fold changes...
#> Welch test 100% done
#> Calculating fold changes...
#> Wilcox-test 100% done
#> Calculating fold changes...
names(res)
#> [1] "ttest_results" "welch_results" "wilcox_results" "foldchanges"
#> [5] "ne" "nu" "nn"
The sds_analyze function is built from these two main functions, along with some control flow to route the data to the correct one. In the same way that sds_analyze is built from the multigroup_main and twogroup_main functions above, each of the two main functions is built from yet more functions, which can also be used independently.
We’ll first dig into the multigroup_main function.
By doing this:
multigroup_main
#> function (x, var1, var2 = NULL, var3 = NULL, padj = FALSE, write_files = FALSE,
#> file_names = c("anova_results.csv", "welch_results.csv",
#> "kw_results.csv"))
#> {
#> if (write_files == TRUE && length(file_names) != 3)
#> stop("This function creates three files but has not been provided three names")
#> if (is.null(var3)) {
#> var3 <- var1
#> }
#> nenunn <- test_norm(x, var1, var2)
#> ne <- nenunn[[1]]
#> nu <- nenunn[[2]]
#> nn <- nenunn[[3]]
#> all_fcs <- NULL
#> anova_results <- data.frame()
#> welch_results <- data.frame()
#> kw_results <- data.frame()
#> if (nrow(ne) > 0) {
#> my_anova <- run_anova(ne, var1 = var1, var2 = var2, padj)
#> if (padj == TRUE) {
#> if (is.null(var2)) {
#> colnames(my_anova) <- c(substitute(var1), paste0(deparse(substitute(var1)),
#> "_padj"))
#> }
#> else {
#> colnames(my_anova) <- c(substitute(var1), substitute(var2),
#> "interaction", paste0(deparse(substitute(var1)),
#> "_padj"), paste0(deparse(substitute(var2)),
#> "_padj"), "interaction_padj")
#> }
#> }
#> else {
#> if (is.null(var2)) {
#> colnames(my_anova) <- "Pval"
#> }
#> else {
#> colnames(my_anova) <- c(substitute(var1), substitute(var2),
#> "interaction")
#> }
#> }
#> tukey_res <- run_tukey(ne, var3)
#> fc_ne <- fold_change(ne, var3)
#> tukey_res <- combine_fc_pvals(tukey_res, fc_ne)
#> anova_results <- cbind(my_anova, tukey_res)
#> all_fcs <- rbind(all_fcs, fc_ne)
#> }
#> if (nrow(nu) > 0) {
#> welch_res <- run_welch(nu, var3, padj)
#> dunnett_res <- run_dunnett(nu, var3)
#> fc_nu <- fold_change(nu, var3)
#> dunnett_res <- combine_fc_pvals(dunnett_res, fc_nu)
#> welch_results <- cbind(welch_res, dunnett_res)
#> all_fcs <- rbind(all_fcs, fc_nu)
#> }
#> if (nrow(nn) > 0) {
#> krusk_res <- run_kruskal(nn, var3, padj)
#> dunn_res <- run_dunn(nn, var3)
#> fc_nn <- fold_change(nn, var3)
#> dunn_res <- combine_fc_pvals(dunn_res, fc_nn)
#> kw_results <- cbind(krusk_res, dunn_res)
#> all_fcs <- rbind(all_fcs, fc_nn)
#> }
#> if (write_files == TRUE) {
#> write.csv(anova_results, file_names[1])
#> write.csv(welch_results, file_names[2])
#> write.csv(kw_results, file_names[3])
#> return(NULL)
#> }
#> res <- list(anova_results = anova_results, welch_results = welch_results,
#> kw_results = kw_results, foldchanges = all_fcs, ne = ne,
#> nu = nu, nn = nn)
#> return(res)
#> }
#> <bytecode: 0x158f7fc70>
#> <environment: namespace:statomatic>
We can see the source code of the function.
Don’t worry about the whole thing, but notice the functions: test_norm, run_anova, run_tukey, run_welch, run_dunnett, run_kruskal, run_dunn, and fold_change.
Each of these functions is available to use independently of the rest.
The test_norm function sorts the data by distribution and scedasticity. It can handle either 1 or 2 factors.
The return value is a list of three data.frames. The first element contains the rows determined to have normally distributed residuals and equal variance among the factor levels. The second has normally distributed residuals but unequal variance. The third contains the rows with non-normally distributed residuals.
To use test_norm with only one factor, simply supply only one.
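For example (a sketch, not evaluated here), we could sort the normalized counts ourselves using the sex and genotype factors and count how many rows land in each category:
#sort each row by distribution and scedasticity
nenunn <- test_norm(x, sample_info$sex, sample_info$genotype)
#rows per category: normal/equal variance, normal/unequal variance, non-normal
sapply(nenunn, nrow)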
The usage of the rest of the functions is identical to that of the test_norm function, and their names are descriptive of what they do.
run_anova runs an anova test, run_tukey runs a tukey test, etc.
The run_anova function can handle either one factor or two, and either a one way or two way anova will be performed in accordance with the number of factors supplied. The rest of the functions only support using one factor.
We’ll demonstrate the run_anova function.
anova_res <- run_anova(x, sample_info$sex, sample_info$genotype)
#> Using two way anova
#> anova 100% done
head(anova_res)
#> sample_info$sex sample_info$genotype interaction
#> ENSMUSG00000051285 0.4181785 0.4267951 0.79596064
#> ENSMUSG00000057363 0.6363190 0.3757298 0.70346317
#> ENSMUSG00000100555 0.7760540 0.3688330 0.80312281
#> ENSMUSG00000078184 0.3813036 0.9576038 0.57458343
#> ENSMUSG00000061024 0.1879463 0.5524404 0.08030823
#> ENSMUSG00000025911 0.5146527 0.3078191 0.83172345
And the run_tukey function.
tukey_res <- run_tukey(x, sample_info$group)
#> Tukey test 100% done...
head(tukey_res)
#> Pval_F_KO_vs_F_WT Pval_F_KO_vs_M_KO Pval_F_KO_vs_M_WT
#> ENSMUSG00000051285 0.9790520 0.8652082 0.9999995
#> ENSMUSG00000057363 0.7955281 0.9998882 0.9899203
#> ENSMUSG00000100555 0.8379946 0.9800907 0.8257741
#> ENSMUSG00000078184 0.9699575 0.7280883 0.9050970
#> ENSMUSG00000061024 0.8094858 0.1413002 0.9469754
#> ENSMUSG00000025911 0.8062227 0.9883949 0.9925533
#> Pval_F_WT_vs_M_KO Pval_F_WT_vs_M_WT Pval_M_KO_vs_M_WT
#> ENSMUSG00000051285 0.6568296 0.9773126 0.8699999
#> ENSMUSG00000057363 0.7597698 0.9258447 0.9820468
#> ENSMUSG00000100555 0.9688062 0.9999939 0.9635951
#> ENSMUSG00000078184 0.9310571 0.9954529 0.9824965
#> ENSMUSG00000061024 0.5161829 0.9850262 0.3321566
#> ENSMUSG00000025911 0.6260176 0.9228793 0.9330056
#> test
#> ENSMUSG00000051285 tukey-test
#> ENSMUSG00000057363 tukey-test
#> ENSMUSG00000100555 tukey-test
#> ENSMUSG00000078184 tukey-test
#> ENSMUSG00000061024 tukey-test
#> ENSMUSG00000025911 tukey-test
We’ll only briefly discuss the twogroup_main function. You can view its source code the same way we did for the multigroup_main function.
This function is made up of the same test_norm function, plus run_ttest, run_welch, and run_wilcox, all of which are available to use independently.
run_ttest runs a t-test, run_welch runs a Welch test (the same Welch test used in the multigroup case), and run_wilcox runs a Wilcoxon test. Each of these three functions can handle only one factor, and their usage is identical to that of the run_tukey function.
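For example, a row-by-row t-test on the male vs. female comparison would look something like this (a sketch; as with run_tukey, we pass the data.frame and a single factor):
#run a t-test on each row, comparing the two levels of the sex factor
ttest_res <- run_ttest(x, sample_info$sex)
head(ttest_res)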
To end this vignette, we’ll demonstrate a more advanced use case of the multigroup_main function.
In this example we have six different tables we want to analyze. These tables are from a metagenomic analysis, and contain abundance information at the phylum, class, order, family, genus, and species level. We looked at this species data before, in the basic workflow vignette.
There are 24 samples in this experiment and 3 experimental groups, with 8 samples per group. We will call the groups group1 through group3. Since group is the only factor, we’ll use the multigroup_main function with one factor.
We’re going to apply the statomatic method to all six of these tables in order to identify differences between groups at each taxonomic level.
We read in the data all together and store it in a list.
#get all files in the current directory with abundance.csv in the name
a <- list.files(".", pattern = "abundance.csv")
a
#> [1] "class_abundance.csv" "family_abundance.csv" "genus_abundance.csv"
#> [4] "order_abundance.csv" "phylum_abundance.csv" "species_abundance.csv"
Notice that the file names are in alphabetical order.
#read them in with read.csv
read_in <- lapply(a, read.csv)
#look at first three rows of data
lapply(read_in, head, 3)
#> [[1]]
#> class sample_1 sample_2 sample_3 sample_4 sample_5 sample_6
#> 1 Clostridia 102 475 2910 1188 1278 324
#> 2 Bacteroidia 1066 1005 383 801 623 628
#> 3 Erysipelotrichia 588 186 141 157 127 123
#> sample_7 sample_8 sample_9 sample_10 sample_11 sample_12 sample_13 sample_14
#> 1 4350 641 1176 883 426 5880 82 158
#> 2 596 549 259 264 551 270 207 582
#> 3 752 139 129 132 229 249 131 464
#> sample_15 sample_16 sample_17 sample_18 sample_19 sample_20 sample_21
#> 1 1632 531 384 829 1708 4864 799
#> 2 972 700 477 724 1227 661 1754
#> 3 144 135 121 746 752 171 251
#> sample_22 sample_23 sample_24
#> 1 841 1262 524
#> 2 1656 688 1523
#> 3 230 410 62
#>
#> [[2]]
#> family sample_1 sample_2 sample_3 sample_4 sample_5 sample_6
#> 1 Lachnospiraceae 36 123 1888 426 586 69
#> 2 Oscillospiraceae 39 215 501 487 344 145
#> 3 unclassified 76 204 384 342 421 83
#> sample_7 sample_8 sample_9 sample_10 sample_11 sample_12 sample_13 sample_14
#> 1 2786 258 808 266 217 3684 38 56
#> 2 1066 255 175 324 66 1301 24 54
#> 3 585 159 310 229 56 1308 45 67
#> sample_15 sample_16 sample_17 sample_18 sample_19 sample_20 sample_21
#> 1 1099 395 39 136 764 3133 217
#> 2 173 71 32 258 491 896 293
#> 3 369 124 71 286 514 1188 273
#> sample_22 sample_23 sample_24
#> 1 290 508 98
#> 2 292 480 284
#> 3 288 262 197
#>
#> [[3]]
#> genus sample_1 sample_2 sample_3 sample_4 sample_5 sample_6 sample_7
#> 1 unclassified 281 304 1902 564 688 235 2828
#> 2 Dubosiella 583 181 118 145 112 118 665
#> 3 Bacteroides 406 92 91 106 111 185 106
#> sample_8 sample_9 sample_10 sample_11 sample_12 sample_13 sample_14 sample_15
#> 1 370 758 322 228 3085 114 152 1477
#> 2 131 121 117 113 210 120 460 117
#> 3 156 91 93 78 95 97 421 165
#> sample_16 sample_17 sample_18 sample_19 sample_20 sample_21 sample_22
#> 1 611 185 510 1002 3544 611 665
#> 2 105 114 737 728 123 245 220
#> 3 129 102 129 128 170 617 677
#> sample_23 sample_24
#> 1 717 323
#> 2 402 57
#> 3 140 774
#>
#> [[4]]
#> order sample_1 sample_2 sample_3 sample_4 sample_5 sample_6
#> 1 Eubacteriales 100 468 2888 1184 1261 318
#> 2 Bacteroidales 1065 1005 380 800 621 626
#> 3 Erysipelotrichales 588 186 141 157 127 123
#> sample_7 sample_8 sample_9 sample_10 sample_11 sample_12 sample_13 sample_14
#> 1 4338 632 1167 874 426 5858 82 158
#> 2 593 549 258 263 551 270 206 582
#> 3 752 139 129 132 229 249 131 464
#> sample_15 sample_16 sample_17 sample_18 sample_19 sample_20 sample_21
#> 1 1625 529 382 812 1696 4845 783
#> 2 967 696 477 716 1224 656 1745
#> 3 144 135 121 746 752 171 251
#> sample_22 sample_23 sample_24
#> 1 824 1245 521
#> 2 1650 687 1520
#> 3 230 410 62
#>
#> [[5]]
#> phylum sample_1 sample_2 sample_3 sample_4 sample_5 sample_6 sample_7
#> 1 Bacillota 710 790 3214 1471 1573 495 5279
#> 2 Bacteroidota 1071 1007 385 801 627 631 603
#> 3 Pseudomonadota 14 269 475 372 396 29 197
#> sample_8 sample_9 sample_10 sample_11 sample_12 sample_13 sample_14 sample_15
#> 1 836 1506 1117 1109 6689 234 668 1948
#> 2 551 261 267 551 274 207 584 978
#> 3 266 402 395 263 352 413 14 462
#> sample_16 sample_17 sample_18 sample_19 sample_20 sample_21 sample_22
#> 1 712 999 1707 2770 5492 1231 1214
#> 2 708 477 736 1235 670 1764 1671
#> 3 176 409 25 35 516 35 163
#> sample_23 sample_24
#> 1 1862 630
#> 2 704 1532
#> 3 29 18
#>
#> [[6]]
#> species sample_1 sample_2 sample_3 sample_4 sample_5
#> 1 Lachnospiraceae bacterium 14 33 245 224 167
#> 2 Dubosiella newyorkensis 583 181 118 145 112
#> 3 Parasutterella excrementihominis 4 241 431 327 343
#> sample_6 sample_7 sample_8 sample_9 sample_10 sample_11 sample_12 sample_13
#> 1 9 996 71 312 85 91 1731 22
#> 2 118 665 131 121 117 113 210 120
#> 3 21 128 241 367 364 246 255 393
#> sample_14 sample_15 sample_16 sample_17 sample_18 sample_19 sample_20
#> 1 12 221 289 11 35 168 1907
#> 2 460 117 105 114 737 728 123
#> 3 8 386 120 385 7 4 433
#> sample_21 sample_22 sample_23 sample_24
#> 1 42 103 136 28
#> 2 245 220 402 57
#> 3 4 117 10 1
Each of these tables needs the cf function applied so that the taxon names become the rownames.
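A minimal sketch of that step: since the taxon names are in the first column of each table (cf’s default), we can apply cf to every table at once with lapply, storing the result in the tables list used in the rest of this example.
#apply cf to each table; the first column becomes the rownames
tables <- lapply(read_in, cf)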
We can use the same sample info object for all of these tables, since they all contain the same samples.
#tables[[1]] is the first table in the tables list. all the tables have the same column names
sample_info <- data.frame(sample = colnames(tables[[1]]), group = c(rep("group1", 8), rep("group2", 8), rep("group3", 8)))
sample_info
#> sample group
#> 1 sample_1 group1
#> 2 sample_2 group1
#> 3 sample_3 group1
#> 4 sample_4 group1
#> 5 sample_5 group1
#> 6 sample_6 group1
#> 7 sample_7 group1
#> 8 sample_8 group1
#> 9 sample_9 group2
#> 10 sample_10 group2
#> 11 sample_11 group2
#> 12 sample_12 group2
#> 13 sample_13 group2
#> 14 sample_14 group2
#> 15 sample_15 group2
#> 16 sample_16 group2
#> 17 sample_17 group3
#> 18 sample_18 group3
#> 19 sample_19 group3
#> 20 sample_20 group3
#> 21 sample_21 group3
#> 22 sample_22 group3
#> 23 sample_23 group3
#> 24 sample_24 group3
Now we’ll apply the multigroup_main function to each of these tables, again using lapply.
res <- lapply(tables, multigroup_main, sample_info$group)
#>
#> Using one way anova
#> anova 100% done
#> Tukey test 100% done
#> Calculating fold changes...
#> Welch test 100% done
#> DunnettT3 test 100% done
#> Calculating fold changes...
#> Kruskal-Wallis Test 100% done
#> Dunn Test 100% done
#> Calculating fold changes...
#> sorting data 100% done...
#> Using one way anova
#> anova 100% done
#> Tukey test 100% done
#> Calculating fold changes...
#> Welch test 100% done
#> DunnettT3 test 100% done
#> Calculating fold changes...
#> Kruskal-Wallis Test 60% done...Kruskal-Wallis Test 100% done
#> Dunn Test 60% done...Dunn Test 100% done
#> Calculating fold changes...
#> sorting data 100% done...
#> Using one way anova
#> anova 100% done
#> Tukey test 100% done
#> Calculating fold changes...
#> Kruskal-Wallis Test 71% done...Kruskal-Wallis Test 100% done
#> Dunn Test 71% done...Dunn Test 100% done
#> Calculating fold changes...
#> sorting data 100% done...
#> Using one way anova
#> anova 100% done
#> Tukey test 100% done
#> Calculating fold changes...
#> Welch test 100% done
#> DunnettT3 test 100% done
#> Calculating fold changes...
#> Kruskal-Wallis Test 60% done...Kruskal-Wallis Test 100% done
#> Dunn Test 60% done...Dunn Test 100% done
#> Calculating fold changes...
#>
#> Using one way anova
#> anova 100% done
#> Tukey test 100% done
#> Calculating fold changes...
#> Welch test 100% done
#> DunnettT3 test 100% done
#> Calculating fold changes...
#> Kruskal-Wallis Test 100% done
#> Dunn Test 100% done
#> Calculating fold changes...
#> sorting data 100% done...
#> Using one way anova
#> anova 100% done
#> Tukey test 100% done
#> Calculating fold changes...
#> Welch test 100% done
#> DunnettT3 test 100% done
#> Calculating fold changes...
#> Kruskal-Wallis Test 63% done...Kruskal-Wallis Test 100% done
#> Dunn Test 63% done...Dunn Test 100% done
#> Calculating fold changes...
We know from before that multigroup_main returns a list, and we just applied this function to a list of data.frames. So what is res? It’s a list of lists.
#res has 6 elements, one per table we analyzed
length(res)
#> [1] 6
#each element of res has 7 elements
length(res[[1]])
#> [1] 7
names(res[[1]])
#> [1] "anova_results" "welch_results" "kw_results" "foldchanges"
#> [5] "ne" "nu" "nn"
Each element of res is the same as it was when we were only analyzing one table. The only difference is now we’re working with several of them at once.
We’ll look at the results of the first one just to orient ourselves, and we’re only going to look at the first three elements, the anova, welch, and kruskal-wallis results.
#get the first element of res (which is a list of results of the first analysis). This will be the class abundances since it is first alphabetically
x <- res[[1]]
lapply(x[1:3], head)
#> $anova_results
#> Pval Pval_group1_vs_group2
#> Betaproteobacteria 0.20000097 0.8263342
#> Coriobacteriia 0.06770511 0.5440510
#> Negativicutes 0.04907064 0.6193168
#> Tissierellia 0.15297673 0.1656226
#> Alphaproteobacteria 0.08962253 0.9493171
#> Flavobacteriia 0.16490962 0.5901316
#> log2FoldChange_group1_vs_group2 Pval_group1_vs_group3
#> Betaproteobacteria -0.2803858 0.4456629
#> Coriobacteriia -0.5572681 0.3576415
#> Negativicutes 0.6147098 0.2415234
#> Tissierellia 1.2537566 0.9467810
#> Alphaproteobacteria 0.2630344 0.1795866
#> Flavobacteriia -0.8744691 0.1411272
#> log2FoldChange_group1_vs_group3 Pval_group2_vs_group3
#> Betaproteobacteria 0.8590642 0.18512474
#> Coriobacteriia 1.3940111 0.05543407
#> Negativicutes -0.6890709 0.04158533
#> Tissierellia 0.1468414 0.27729400
#> Alphaproteobacteria -1.0000000 0.10326409
#> Flavobacteriia -1.4150375 0.59013161
#> log2FoldChange_group2_vs_group3 group1_mean group2_mean
#> Betaproteobacteria 1.1394500 229.000 278.125
#> Coriobacteriia 1.9512792 173.125 254.750
#> Negativicutes -1.3037807 6.125 4.000
#> Tissierellia -1.1069152 3.875 1.625
#> Alphaproteobacteria -1.2630344 1.500 1.250
#> Flavobacteriia -0.5405684 0.750 1.375
#> group3_mean test
#> Betaproteobacteria 126.250 tukey-test
#> Coriobacteriia 65.875 tukey-test
#> Negativicutes 9.875 tukey-test
#> Tissierellia 3.500 tukey-test
#> Alphaproteobacteria 3.000 tukey-test
#> Flavobacteriia 2.000 tukey-test
#>
#> $welch_results
#> Pval Pval_group1_vs_group2 log2FoldChange_group1_vs_group2
#> Bacteroidia 0.03093208 0.2372938 0.5706097
#> Chitinophagia 0.04254664 0.6030114 1.2223924
#> Pval_group1_vs_group3 log2FoldChange_group1_vs_group3
#> Bacteroidia 0.2150544 -0.6241665
#> Chitinophagia 0.2606331 -1.2801079
#> Pval_group2_vs_group3 log2FoldChange_group2_vs_group3 group1_mean
#> Bacteroidia 0.03416268 -1.194776 706.375
#> Chitinophagia 0.05656468 -2.502500 0.875
#> group2_mean group3_mean test
#> Bacteroidia 475.625 1088.750 dunnett-test
#> Chitinophagia 0.375 2.125 dunnett-test
#>
#> $kw_results
#> Pval Pval_group1_vs_group2
#> Clostridia 0.81668648 1
#> Erysipelotrichia 0.65079423 1
#> unclassified 0.73334806 1
#> Verrucomicrobiae 0.12045536 1
#> Bacilli 0.08206044 1
#> Gammaproteobacteria 0.79132952 1
#> log2FoldChange_group1_vs_group2 Pval_group1_vs_group3
#> Clostridia 0.06548115 1.00000000
#> Erysipelotrichia 0.45625701 1.00000000
#> unclassified -0.28237212 1.00000000
#> Verrucomicrobiae -0.47375524 0.19709373
#> Bacilli -2.15324626 0.08117608
#> Gammaproteobacteria -0.46247849 1.00000000
#> log2FoldChange_group1_vs_group3 Pval_group2_vs_group3
#> Clostridia 0.007316499 1.0000000
#> Erysipelotrichia -0.309751171 1.0000000
#> unclassified -0.561999137 1.0000000
#> Verrucomicrobiae -0.652276613 0.2581564
#> Bacilli -2.106153552 0.4871113
#> Gammaproteobacteria -0.173648087 1.0000000
#> log2FoldChange_group2_vs_group3 group1_mean group2_mean
#> Clostridia -0.05816465 1408.500 1346.000
#> Erysipelotrichia -0.76600818 276.625 201.625
#> unclassified -0.27962702 145.125 176.500
#> Verrucomicrobiae -0.17852137 82.000 113.875
#> Bacilli 0.04709271 25.375 112.875
#> Gammaproteobacteria 0.28883041 21.500 29.625
#> group3_mean test
#> Clostridia 1401.375 dunn-test
#> Erysipelotrichia 342.875 dunn-test
#> unclassified 214.250 dunn-test
#> Verrucomicrobiae 128.875 dunn-test
#> Bacilli 109.250 dunn-test
#> Gammaproteobacteria 24.250 dunn-test
Since all the results from each taxonomic level were tested against the same factor, the tables in each element of res have the same columns, and we can collate each one into a single data.frame.
For each list of results in res, we’ll rbind (row bind) the first three elements, save the collated table to a new list, and then give it a name.
For the names, we’ll use the result of the list.files function above, but we’ll trim off the .csv extension first.
#gsub .csv with nothing in list a
name_list <- gsub(".csv", "", a)
name_list
#> [1] "class_abundance" "family_abundance" "genus_abundance"
#> [4] "order_abundance" "phylum_abundance" "species_abundance"
Now we’ll collate and save the tables.
#define an empty list for our new tables
collated_tables <- list()
#use a for loop to collate and save our tables
for (i in 1:length(res)) {
#the current result we're processing
temp <- res[[i]]
#get just the p values and fold change results
first_three <- temp[1:3]
#rbind into one new table
new_table <- rbind(first_three[[1]], first_three[[2]], first_three[[3]])
#save the new table to the collated_tables list
collated_tables[[i]] <- new_table
}
#set the names of collated_tables to be the names from name_list
names(collated_tables) <- name_list
#look at some of the results
lapply(collated_tables, head, 3)
#> $class_abundance
#> Pval Pval_group1_vs_group2
#> Betaproteobacteria 0.20000097 0.8263342
#> Coriobacteriia 0.06770511 0.5440510
#> Negativicutes 0.04907064 0.6193168
#> log2FoldChange_group1_vs_group2 Pval_group1_vs_group3
#> Betaproteobacteria -0.2803858 0.4456629
#> Coriobacteriia -0.5572681 0.3576415
#> Negativicutes 0.6147098 0.2415234
#> log2FoldChange_group1_vs_group3 Pval_group2_vs_group3
#> Betaproteobacteria 0.8590642 0.18512474
#> Coriobacteriia 1.3940111 0.05543407
#> Negativicutes -0.6890709 0.04158533
#> log2FoldChange_group2_vs_group3 group1_mean group2_mean
#> Betaproteobacteria 1.139450 229.000 278.125
#> Coriobacteriia 1.951279 173.125 254.750
#> Negativicutes -1.303781 6.125 4.000
#> group3_mean test
#> Betaproteobacteria 126.250 tukey-test
#> Coriobacteriia 65.875 tukey-test
#> Negativicutes 9.875 tukey-test
#>
#> $family_abundance
#> Pval Pval_group1_vs_group2
#> Sutterellaceae 0.197457096 0.8104024
#> Atopobiaceae 0.066157338 0.5388124
#> Prevotellaceae 0.001577777 0.6487139
#> log2FoldChange_group1_vs_group2 Pval_group1_vs_group3
#> Sutterellaceae -0.2966753 0.45486902
#> Atopobiaceae -0.5663191 0.35585660
#> Prevotellaceae 0.7670228 0.01360419
#> log2FoldChange_group1_vs_group3 Pval_group2_vs_group3
#> Sutterellaceae 0.8589077 0.18096687
#> Atopobiaceae 1.4272001 0.05406228
#> Prevotellaceae -1.2857835 0.00171722
#> log2FoldChange_group2_vs_group3 group1_mean group2_mean
#> Sutterellaceae 1.155583 219.00 269.000
#> Atopobiaceae 1.993519 168.75 249.875
#> Prevotellaceae -2.052806 24.25 14.250
#> group3_mean test
#> Sutterellaceae 120.750 tukey-test
#> Atopobiaceae 62.750 tukey-test
#> Prevotellaceae 59.125 tukey-test
#>
#> $genus_abundance
#> Pval Pval_group1_vs_group2
#> Parasutterella 0.19869563 0.8113967
#> Muricaecibacterium 0.06812654 0.5464864
#> Anaerotruncus 0.51580712 0.8242216
#> log2FoldChange_group1_vs_group2 Pval_group1_vs_group3
#> Parasutterella -0.2961576 0.4560574
#> Muricaecibacterium -0.5635121 0.3573489
#> Anaerotruncus 0.3156143 0.8353710
#> log2FoldChange_group1_vs_group3 Pval_group2_vs_group3
#> Parasutterella 0.8580840 0.18218500
#> Muricaecibacterium 1.4364879 0.05582506
#> Anaerotruncus -0.2503234 0.48405009
#> log2FoldChange_group2_vs_group3 group1_mean group2_mean
#> Parasutterella 1.1542416 218.875 268.750
#> Muricaecibacterium 2.0000000 163.750 242.000
#> Anaerotruncus -0.5659377 35.625 28.625
#> group3_mean test
#> Parasutterella 120.750 tukey-test
#> Muricaecibacterium 60.500 tukey-test
#> Anaerotruncus 42.375 tukey-test
#>
#> $order_abundance
#> Pval Pval_group1_vs_group2
#> Burkholderiales 0.19886795 0.8240061
#> Coriobacteriales 0.06626659 0.5465790
#> Bacillales 0.04799237 0.1124440
#> log2FoldChange_group1_vs_group2 Pval_group1_vs_group3
#> Burkholderiales -0.2828907 0.4457946
#> Coriobacteriales -0.5561823 0.3504407
#> Bacillales 1.1020982 0.9373723
#> log2FoldChange_group1_vs_group3 Pval_group2_vs_group3
#> Burkholderiales 0.8616358 0.18382079
#> Coriobacteriales 1.4224441 0.05427840
#> Bacillales -0.1202942 0.05767428
#> log2FoldChange_group2_vs_group3 group1_mean group2_mean
#> Burkholderiales 1.144526 228.500 278.000
#> Coriobacteriales 1.978626 170.875 251.250
#> Bacillales -1.222392 20.125 9.375
#> group3_mean test
#> Burkholderiales 125.750 tukey-test
#> Coriobacteriales 63.750 tukey-test
#> Bacillales 21.875 tukey-test
#>
#> $phylum_abundance
#> Pval Pval_group1_vs_group2
#> Pseudomonadota 0.21409442 0.7870908
#> Actinomycetota 0.07304152 0.5632674
#> Thermodesulfobacteriota 0.80778822 0.8845479
#> log2FoldChange_group1_vs_group2 Pval_group1_vs_group3
#> Pseudomonadota -0.2956677 0.5019108
#> Actinomycetota -0.5316898 0.3622041
#> Thermodesulfobacteriota 0.2995603 0.9863854
#> log2FoldChange_group1_vs_group3 Pval_group2_vs_group3
#> Pseudomonadota 0.71426786 0.19346553
#> Actinomycetota 1.32040063 0.06022461
#> Thermodesulfobacteriota -0.08746284 0.80490509
#> log2FoldChange_group2_vs_group3 group1_mean group2_mean
#> Pseudomonadota 1.0099356 252.25 309.625
#> Actinomycetota 1.8520905 177.00 255.875
#> Thermodesulfobacteriota -0.3870231 2.00 1.625
#> group3_mean test
#> Pseudomonadota 153.750 tukey-test
#> Actinomycetota 70.875 tukey-test
#> Thermodesulfobacteriota 2.125 tukey-test
#>
#> $species_abundance
#> Pval Pval_group1_vs_group2
#> Parasutterella excrementihominis 0.199182490 0.8062039
#> Muricaecibacterium torontonense 0.068126536 0.5464864
#> Muribaculaceae bacterium 0.001032886 0.4923547
#> log2FoldChange_group1_vs_group2
#> Parasutterella excrementihominis -0.3011695
#> Muricaecibacterium torontonense -0.5635121
#> Muribaculaceae bacterium 0.6818240
#> Pval_group1_vs_group3
#> Parasutterella excrementihominis 0.46121140
#> Muricaecibacterium torontonense 0.35734892
#> Muribaculaceae bacterium 0.01419407
#> log2FoldChange_group1_vs_group3
#> Parasutterella excrementihominis 0.8531586
#> Muricaecibacterium torontonense 1.4364879
#> Muribaculaceae bacterium -1.0093379
#> Pval_group2_vs_group3
#> Parasutterella excrementihominis 0.182048409
#> Muricaecibacterium torontonense 0.055825056
#> Muribaculaceae bacterium 0.000975412
#> log2FoldChange_group2_vs_group3 group1_mean
#> Parasutterella excrementihominis 1.154328 217.00
#> Muricaecibacterium torontonense 2.000000 163.75
#> Muribaculaceae bacterium -1.691162 38.50
#> group2_mean group3_mean test
#> Parasutterella excrementihominis 267.375 120.125 tukey-test
#> Muricaecibacterium torontonense 242.000 60.500 tukey-test
#> Muribaculaceae bacterium 24.000 77.500 tukey-test
From here, we could do some further analysis or filtering of the results. One way to export these results is to call write.csv on each element of collated_tables, giving each file the same name as the original abundance table with results_ at the beginning.
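A minimal sketch of that export step, reusing the original file names stored in a (the tables in collated_tables are in the same order as a):
#write each collated table to a csv named results_<original file name>
for (i in seq_along(collated_tables)) {
  write.csv(collated_tables[[i]], paste0("results_", a[i]))
}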