This basic workflow vignette is intended to demonstate some basic usage of the statomatic package to analyze count data from an experiment.
There will be three different analyses included here, meant to demonstrate how statomatic handles different experiment designs. The three designs demonstrated here will be a multi group (> 2) two factor design, a two group one factor design, and a multi group one factor design.
First, load the statomatic package.
The data analyzed here is from an RNA Sequencing experiement which aimed to identify differentially expressed genes across several experiemental conditions. The details of this experiment won’t be discussed here.
Load the data to be analyzed. Here we analyze the normalized counts, as created by DESeq2.
norm_counts <- read.csv("normalized_counts.csv")
head(norm_counts)
#> X Sample_600 Sample_601 Sample_602 Sample_616 Sample_617
#> 1 ENSMUSG00000051285 169.21935 309.83231 417.709088 214.03944 202.50044
#> 2 ENSMUSG00000057363 47.53353 66.64997 54.805802 41.23696 49.30862
#> 3 ENSMUSG00000100555 23.76676 55.84187 0.000000 48.10978 18.19153
#> 4 ENSMUSG00000078184 50.38554 59.44457 5.924952 12.76382 24.41495
#> 5 ENSMUSG00000061024 273.79311 293.62015 232.554350 144.32935 360.00079
#> 6 ENSMUSG00000025911 154.95929 525.99438 309.578721 208.14845 256.59631
#> Sample_603 Sample_604 Sample_605 Sample_606 Sample_621 Sample_607 Sample_608
#> 1 110.24045 171.38160 61.79141 190.642528 427.33305 228.73359 410.89116
#> 2 57.16171 0.00000 0.00000 103.558904 58.42021 63.76202 88.17407
#> 3 28.58086 151.21906 68.65713 5.884029 30.29196 21.25401 15.87133
#> 4 32.66384 10.08127 0.00000 44.718618 37.86495 19.22981 18.51655
#> 5 145.62627 55.44699 109.85140 222.416283 315.90190 253.02388 319.19013
#> 6 155.15322 171.38160 0.00000 264.781289 529.02750 301.60446 336.82494
#> Sample_609 Sample_610 Sample_623 Sample_611 Sample_612 Sample_613 Sample_614
#> 1 313.05044 412.25499 133.76758 93.303195 248.040131 46.21469 417.35080
#> 2 60.75650 14.45406 99.46820 103.366522 18.445254 0.00000 65.25144
#> 3 45.30993 40.01052 30.86944 5.093783 1.456204 390.25738 0.00000
#> 4 39.13131 36.86833 56.59398 57.025521 49.996347 0.00000 82.04884
#> 5 209.04355 211.36447 221.23100 240.525945 248.040131 302.96296 440.60874
#> 6 375.86649 528.09696 265.82019 148.589375 171.832106 10.26993 323.02694
#> Sample_615
#> 1 513.48498
#> 2 39.36603
#> 3 0.00000
#> 4 14.61838
#> 5 213.52043
#> 6 646.77697
Notice that the row names of norm_counts are numbers, and that the gene ids are actually inside the data.frame. Statomatic will not be able to handle this textual data, so we need to make sure to set the gene ids as the row names and remove the gene ids column from inside the data.frame.
Statomatic has a built in function for this, cf(). This function will set a specified column of a data.frame to be the new row names, and then remove the column from inside the data.frame. The column which should be the row names can be specified by column_index = an integer, but this value defaults to 1 (i.e., the first column).
norm_counts <- cf(norm_counts)
head(norm_counts)
#> Sample_600 Sample_601 Sample_602 Sample_616 Sample_617
#> ENSMUSG00000051285 169.21935 309.83231 417.709088 214.03944 202.50044
#> ENSMUSG00000057363 47.53353 66.64997 54.805802 41.23696 49.30862
#> ENSMUSG00000100555 23.76676 55.84187 0.000000 48.10978 18.19153
#> ENSMUSG00000078184 50.38554 59.44457 5.924952 12.76382 24.41495
#> ENSMUSG00000061024 273.79311 293.62015 232.554350 144.32935 360.00079
#> ENSMUSG00000025911 154.95929 525.99438 309.578721 208.14845 256.59631
#> Sample_603 Sample_604 Sample_605 Sample_606 Sample_621
#> ENSMUSG00000051285 110.24045 171.38160 61.79141 190.642528 427.33305
#> ENSMUSG00000057363 57.16171 0.00000 0.00000 103.558904 58.42021
#> ENSMUSG00000100555 28.58086 151.21906 68.65713 5.884029 30.29196
#> ENSMUSG00000078184 32.66384 10.08127 0.00000 44.718618 37.86495
#> ENSMUSG00000061024 145.62627 55.44699 109.85140 222.416283 315.90190
#> ENSMUSG00000025911 155.15322 171.38160 0.00000 264.781289 529.02750
#> Sample_607 Sample_608 Sample_609 Sample_610 Sample_623
#> ENSMUSG00000051285 228.73359 410.89116 313.05044 412.25499 133.76758
#> ENSMUSG00000057363 63.76202 88.17407 60.75650 14.45406 99.46820
#> ENSMUSG00000100555 21.25401 15.87133 45.30993 40.01052 30.86944
#> ENSMUSG00000078184 19.22981 18.51655 39.13131 36.86833 56.59398
#> ENSMUSG00000061024 253.02388 319.19013 209.04355 211.36447 221.23100
#> ENSMUSG00000025911 301.60446 336.82494 375.86649 528.09696 265.82019
#> Sample_611 Sample_612 Sample_613 Sample_614 Sample_615
#> ENSMUSG00000051285 93.303195 248.040131 46.21469 417.35080 513.48498
#> ENSMUSG00000057363 103.366522 18.445254 0.00000 65.25144 39.36603
#> ENSMUSG00000100555 5.093783 1.456204 390.25738 0.00000 0.00000
#> ENSMUSG00000078184 57.025521 49.996347 0.00000 82.04884 14.61838
#> ENSMUSG00000061024 240.525945 248.040131 302.96296 440.60874 213.52043
#> ENSMUSG00000025911 148.589375 171.832106 10.26993 323.02694 646.77697
Now the data is ready to be analyzed.
We also must provide statomatic with metadata about our samples. In the case of this experiment, our samples come from different experimental groups. The way to do this is by creating another data.frame which contains our sample names and their experimental conditions. We’ll use the column names of norm_counts, since these are the samples, and we’ll specify conditions of the experiment in additional columns. The experimental conditions here are sex and genotype.
sample_info <- data.frame(samples = colnames(norm_counts), sex = c(rep("Male", 10), rep("Female", 10)),
genotype = c(rep("WT", 5), rep("KO", 5), rep("WT", 5), rep("KO", 5)),
group = c(rep("M_WT", 5), rep("M_KO", 5), rep("F_WT", 5), rep("F_KO", 5)))
sample_info
#> samples sex genotype group
#> 1 Sample_600 Male WT M_WT
#> 2 Sample_601 Male WT M_WT
#> 3 Sample_602 Male WT M_WT
#> 4 Sample_616 Male WT M_WT
#> 5 Sample_617 Male WT M_WT
#> 6 Sample_603 Male KO M_KO
#> 7 Sample_604 Male KO M_KO
#> 8 Sample_605 Male KO M_KO
#> 9 Sample_606 Male KO M_KO
#> 10 Sample_621 Male KO M_KO
#> 11 Sample_607 Female WT F_WT
#> 12 Sample_608 Female WT F_WT
#> 13 Sample_609 Female WT F_WT
#> 14 Sample_610 Female WT F_WT
#> 15 Sample_623 Female WT F_WT
#> 16 Sample_611 Female KO F_KO
#> 17 Sample_612 Female KO F_KO
#> 18 Sample_613 Female KO F_KO
#> 19 Sample_614 Female KO F_KO
#> 20 Sample_615 Female KO F_KO
Notice that the final column, group, is a linear combination of the first two columns. This is necessary to statomatic’s process. The final column of the sample_info should always describe the distribution of the rest of the experimental conditions (if there are any).
Now we build the statomatic data set. We’ll include the columns, the sample metdata (sample_info), and specify the experimental design. The experimental design here is sex + genotype.
sds <- make_sds(x = norm_counts, colData = sample_info, design = ~ sex + genotype)
sds
#> class: StatomaticDataset
#> dim: 100 20
#> metadata(0):
#> assays(1): counts
#> rownames(100): ENSMUSG00000051285 ENSMUSG00000057363 ...
#> ENSMUSG00000026384 ENSMUSG00000036975
#> rowData names(0):
#> colnames(20): Sample_600 Sample_601 ... Sample_614 Sample_615
#> colData names(4): samples sex genotype group
By viewing sds we can see information about our data such as the number of rows, column names, etc.
To analyze the data contained in sds, we use the sds_analyze() function. This functions automatically takes information from sds in order to perform the analysis. This function will use the experimental design we specified when we created sds
sds <- sds_analyze(sds)
#> sorting data 100% done...
#> Using two way anova
#> anova 100% done
#> Tukey test 100% done
#> Calculating fold changes...
#> Welch test 100% done
#> DunnettT3 test 100% done
#> Calculating fold changes...
#> Kruskal-Wallis Test 100% done
#> Dunn Test 100% done
#> Calculating fold changes...
sds
#> class: StatomaticDataset
#> dim: 100 20
#> metadata(0):
#> assays(1): counts
#> rownames(100): ENSMUSG00000051285 ENSMUSG00000057363 ...
#> ENSMUSG00000026384 ENSMUSG00000036975
#> rowData names(0):
#> colnames(20): Sample_600 Sample_601 ... Sample_614 Sample_615
#> colData names(4): samples sex genotype group
Notice that sds appears unchanged. The results of the analysis are stored within sds, and we can access them as follows.
To access all of the p-values from the multi-comparison tests and the fold changes, we can use the get_results() function.
res <- get_results(sds)
head(res)
#> Pval_F_KO_vs_F_WT log2FoldChange_F_KO_vs_F_WT
#> ENSMUSG00000051285 0.9790520 -0.1849281
#> ENSMUSG00000057363 0.7955281 -0.5285301
#> ENSMUSG00000078184 0.9699575 0.2579516
#> ENSMUSG00000061024 0.8094858 0.2521327
#> ENSMUSG00000025911 0.8062227 -0.4755035
#> ENSMUSG00000057335 0.8020005 -0.5990134
#> Pval_F_KO_vs_M_KO log2FoldChange_F_KO_vs_M_KO
#> ENSMUSG00000051285 0.8652082 0.4555891
#> ENSMUSG00000057363 0.9998882 0.0472020
#> ENSMUSG00000078184 0.7280883 0.7006521
#> ENSMUSG00000061024 0.1413002 0.7674774
#> ENSMUSG00000025911 0.9883949 0.2151199
#> ENSMUSG00000057335 0.9992630 0.1067049
#> Pval_F_KO_vs_M_WT log2FoldChange_F_KO_vs_M_WT
#> ENSMUSG00000051285 0.9999995 0.005584147
#> ENSMUSG00000057363 0.9899203 -0.196868100
#> ENSMUSG00000078184 0.9050970 0.413461170
#> ENSMUSG00000061024 0.9469754 0.148453255
#> ENSMUSG00000025911 0.9925533 -0.162232725
#> ENSMUSG00000057335 0.7267167 -0.672042886
#> Pval_F_WT_vs_M_KO log2FoldChange_F_WT_vs_M_KO
#> ENSMUSG00000051285 0.6568296 0.6405172
#> ENSMUSG00000057363 0.7597698 0.5757321
#> ENSMUSG00000078184 0.9310571 0.4427005
#> ENSMUSG00000061024 0.5161829 0.5153447
#> ENSMUSG00000025911 0.6260176 0.6906234
#> ENSMUSG00000057335 0.7340525 0.7057183
#> Pval_F_WT_vs_M_WT log2FoldChange_F_WT_vs_M_WT
#> ENSMUSG00000051285 0.9773126 0.19051226
#> ENSMUSG00000057363 0.9258447 0.33166197
#> ENSMUSG00000078184 0.9954529 0.15550958
#> ENSMUSG00000061024 0.9850262 -0.10367949
#> ENSMUSG00000025911 0.9228793 0.31327074
#> ENSMUSG00000057335 0.9990121 -0.07302953
#> Pval_M_KO_vs_M_WT log2FoldChange_M_KO_vs_M_WT F_KO_mean
#> ENSMUSG00000051285 0.8699999 -0.4500049 263.67876
#> ENSMUSG00000057363 0.9820468 -0.2440701 45.28585
#> ENSMUSG00000078184 0.9824965 -0.2871910 40.73782
#> ENSMUSG00000061024 0.3321566 -0.6190242 289.13164
#> ENSMUSG00000025911 0.9330056 -0.3773527 260.09906
#> ENSMUSG00000057335 0.6535879 -0.7787478 189.02091
#> F_WT_mean M_KO_mean M_WT_mean test
#> ENSMUSG00000051285 299.73955 192.27781 262.66013 tukey-test
#> ENSMUSG00000057363 65.32297 43.82817 51.90698 tukey-test
#> ENSMUSG00000078184 34.06800 25.06574 30.58677 tukey-test
#> ENSMUSG00000061024 242.77061 169.84857 260.85955 tukey-test
#> ENSMUSG00000025911 361.64261 224.06872 291.05543 tukey-test
#> ENSMUSG00000057335 286.30626 175.54500 301.17223 tukey-test
Notice that in addition to the p-values and fold changes, the mean of each group as well as the test used is at the end of the data.frame.
If you want to see the anova results, they can be accessed with get_anova().
anova_res <- get_anova(sds)
head(anova_res)
#> sex genotype interaction Pval_F_KO_vs_F_WT
#> ENSMUSG00000051285 0.4181785 0.4267951 0.79596064 0.9790520
#> ENSMUSG00000057363 0.6363190 0.3757298 0.70346317 0.7955281
#> ENSMUSG00000078184 0.3813036 0.9576038 0.57458343 0.9699575
#> ENSMUSG00000061024 0.1879463 0.5524404 0.08030823 0.8094858
#> ENSMUSG00000025911 0.5146527 0.3078191 0.83172345 0.8062227
#> ENSMUSG00000057335 0.9928111 0.1615905 0.85432131 0.8020005
#> log2FoldChange_F_KO_vs_F_WT Pval_F_KO_vs_M_KO
#> ENSMUSG00000051285 -0.1849281 0.8652082
#> ENSMUSG00000057363 -0.5285301 0.9998882
#> ENSMUSG00000078184 0.2579516 0.7280883
#> ENSMUSG00000061024 0.2521327 0.1413002
#> ENSMUSG00000025911 -0.4755035 0.9883949
#> ENSMUSG00000057335 -0.5990134 0.9992630
#> log2FoldChange_F_KO_vs_M_KO Pval_F_KO_vs_M_WT
#> ENSMUSG00000051285 0.4555891 0.9999995
#> ENSMUSG00000057363 0.0472020 0.9899203
#> ENSMUSG00000078184 0.7006521 0.9050970
#> ENSMUSG00000061024 0.7674774 0.9469754
#> ENSMUSG00000025911 0.2151199 0.9925533
#> ENSMUSG00000057335 0.1067049 0.7267167
#> log2FoldChange_F_KO_vs_M_WT Pval_F_WT_vs_M_KO
#> ENSMUSG00000051285 0.005584147 0.6568296
#> ENSMUSG00000057363 -0.196868100 0.7597698
#> ENSMUSG00000078184 0.413461170 0.9310571
#> ENSMUSG00000061024 0.148453255 0.5161829
#> ENSMUSG00000025911 -0.162232725 0.6260176
#> ENSMUSG00000057335 -0.672042886 0.7340525
#> log2FoldChange_F_WT_vs_M_KO Pval_F_WT_vs_M_WT
#> ENSMUSG00000051285 0.6405172 0.9773126
#> ENSMUSG00000057363 0.5757321 0.9258447
#> ENSMUSG00000078184 0.4427005 0.9954529
#> ENSMUSG00000061024 0.5153447 0.9850262
#> ENSMUSG00000025911 0.6906234 0.9228793
#> ENSMUSG00000057335 0.7057183 0.9990121
#> log2FoldChange_F_WT_vs_M_WT Pval_M_KO_vs_M_WT
#> ENSMUSG00000051285 0.19051226 0.8699999
#> ENSMUSG00000057363 0.33166197 0.9820468
#> ENSMUSG00000078184 0.15550958 0.9824965
#> ENSMUSG00000061024 -0.10367949 0.3321566
#> ENSMUSG00000025911 0.31327074 0.9330056
#> ENSMUSG00000057335 -0.07302953 0.6535879
#> log2FoldChange_M_KO_vs_M_WT F_KO_mean F_WT_mean M_KO_mean
#> ENSMUSG00000051285 -0.4500049 263.67876 299.73955 192.27781
#> ENSMUSG00000057363 -0.2440701 45.28585 65.32297 43.82817
#> ENSMUSG00000078184 -0.2871910 40.73782 34.06800 25.06574
#> ENSMUSG00000061024 -0.6190242 289.13164 242.77061 169.84857
#> ENSMUSG00000025911 -0.3773527 260.09906 361.64261 224.06872
#> ENSMUSG00000057335 -0.7787478 189.02091 286.30626 175.54500
#> M_WT_mean test
#> ENSMUSG00000051285 262.66013 tukey-test
#> ENSMUSG00000057363 51.90698 tukey-test
#> ENSMUSG00000078184 30.58677 tukey-test
#> ENSMUSG00000061024 260.85955 tukey-test
#> ENSMUSG00000025911 291.05543 tukey-test
#> ENSMUSG00000057335 301.17223 tukey-test
This data.frame also includes the Tukey multi-comparison results, as well as fold changes.
To access the welch and kruskal-wallis results, use get_welch_test and get_kw.
You can also access the data.frames ne, nu, and nn by using get_ne, get_nu, and get_nn.
To see what other results are stored in sds, you can do this:
names(sds@results)
#> [1] "anova_results" "welch_test_results" "kw_results"
#> [4] "all_results" "ne" "nu"
#> [7] "nn" "foldchange"
You can access any of these results by doing something like sds@results$anova_results.
First load the data. For this example we’ll look at some gene expression data from an RNA-Seq experiment. These counts were normalized by DESeq2. There are two groups in this data set. Group1 is samples 1-4, and group2 is samples 5-8. For the purposes of this example, we’ll say that group1 is “treated” and group2 is “untreated”.
x <- read.csv("two_group_normalized_counts.csv")
head(x)
#> X sample_1 sample_2 sample_3 sample_4 sample_5
#> 1 ENSMUSG00000000001 1707.89280 1843.60723 1687.02686 1501.02388 1806.65191
#> 2 ENSMUSG00000000028 27.84608 30.17596 29.61100 69.66777 48.57703
#> 3 ENSMUSG00000000056 468.74232 517.30211 556.20667 404.54810 455.79719
#> 4 ENSMUSG00000000058 60.33317 81.90617 75.22795 51.45915 116.79157
#> 5 ENSMUSG00000000078 2706.87086 3110.99739 2504.93077 2310.91176 2468.12630
#> 6 ENSMUSG00000000085 158.95470 145.13198 226.48416 192.37806 248.05289
#> sample_6 sample_7 sample_8
#> 1 1916.63101 1984.70503 2100.84931
#> 2 51.32071 42.77382 52.52123
#> 3 468.79492 497.03174 504.00183
#> 4 88.82430 71.86001 68.68161
#> 5 2762.43574 2557.01869 2697.77332
#> 6 224.03462 203.60336 198.97467
We need to format the data so that the gene ids in the first column are rownames, and not inside the data.frame. We can use the cf() function for this.
x <- cf(x)
head(x)
#> sample_1 sample_2 sample_3 sample_4 sample_5
#> ENSMUSG00000000001 1707.89280 1843.60723 1687.02686 1501.02388 1806.65191
#> ENSMUSG00000000028 27.84608 30.17596 29.61100 69.66777 48.57703
#> ENSMUSG00000000056 468.74232 517.30211 556.20667 404.54810 455.79719
#> ENSMUSG00000000058 60.33317 81.90617 75.22795 51.45915 116.79157
#> ENSMUSG00000000078 2706.87086 3110.99739 2504.93077 2310.91176 2468.12630
#> ENSMUSG00000000085 158.95470 145.13198 226.48416 192.37806 248.05289
#> sample_6 sample_7 sample_8
#> ENSMUSG00000000001 1916.63101 1984.70503 2100.84931
#> ENSMUSG00000000028 51.32071 42.77382 52.52123
#> ENSMUSG00000000056 468.79492 497.03174 504.00183
#> ENSMUSG00000000058 88.82430 71.86001 68.68161
#> ENSMUSG00000000078 2762.43574 2557.01869 2697.77332
#> ENSMUSG00000000085 224.03462 203.60336 198.97467
Now, we have to make a sample_info object to map our samples to their experimental factors.
sample_info <- data.frame(sample = colnames(x), group = c(rep("treated", 4), rep("untreated", 4)))
sample_info
#> sample group
#> 1 sample_1 treated
#> 2 sample_2 treated
#> 3 sample_3 treated
#> 4 sample_4 treated
#> 5 sample_5 untreated
#> 6 sample_6 untreated
#> 7 sample_7 untreated
#> 8 sample_8 untreated
Now we build the statomatic data set
sds <- make_sds(x, colData = sample_info, design = ~ group)
sds
#> class: StatomaticDataset
#> dim: 100 8
#> metadata(0):
#> assays(1): counts
#> rownames(100): ENSMUSG00000000001 ENSMUSG00000000028 ...
#> ENSMUSG00000000787 ENSMUSG00000000791
#> rowData names(0):
#> colnames(8): sample_1 sample_2 ... sample_7 sample_8
#> colData names(2): sample group
And we run sds_analyze.
sds <- sds_analyze(sds)
#> sorting data 100% done...
#> t-test 100% done
#> Calculating fold changes...
#> Welch test 100% done
#> Calculating fold changes...
#> Wilcox-test 100% done
#> Calculating fold changes...
And we can extract the results by doing:
res <- get_results(sds)
head(res)
#> Pval test log2FoldChange_treated_vs_untreated
#> ENSMUSG00000000001 0.02895439 t-test -0.21245532
#> ENSMUSG00000000056 0.88359469 t-test 0.01577680
#> ENSMUSG00000000058 0.18825794 t-test -0.36421697
#> ENSMUSG00000000078 0.84664860 t-test 0.02026956
#> ENSMUSG00000000085 0.12619557 t-test -0.27483781
#> ENSMUSG00000000088 0.82980304 t-test -0.02965276
#> treated_mean untreated_mean
#> ENSMUSG00000000001 1684.88769 1952.20932
#> ENSMUSG00000000056 486.69980 481.40642
#> ENSMUSG00000000058 67.23161 86.53937
#> ENSMUSG00000000078 2658.42770 2621.33851
#> ENSMUSG00000000085 180.73722 218.66639
#> ENSMUSG00000000088 593.82628 606.15792
You can access most results stored in sds with a function that starts with get_ and ends with the name of the result. To see a full list of results stored in sds, you can do this:
names(sds@results)
#> [1] "t_test_results" "welch_test_results" "wilcox_test_results"
#> [4] "all_results" "ne" "nu"
#> [7] "nn" "foldchange"
And any of these can be accessed by sds@results$result_name
Notice that we used the same function (sds_analyze) to analyze both multi-group and two-group data. The sds_analyze function uses the design information supplied during the creation of sds to decide whether to apply two-group or multi-group tests.
We’re going to use the same sds_analyze function again for the next part of this vignette.
The data we use here is some species level data from a metagenomic analysis. There are 24 samples, and three experimental factors with 8 samples per group. For our purposes here, we’ll call the factors group1, 2, and 3. The pace of this analysis will be a bit faster than the previous one, as most steps are very similar.
#read in data
x <- read.csv("species_abundance.csv")
#fix rownames
x <- cf(x)
head(x)
#> sample_1 sample_2 sample_3 sample_4 sample_5
#> Lachnospiraceae bacterium 14 33 245 224 167
#> Dubosiella newyorkensis 583 181 118 145 112
#> Parasutterella excrementihominis 4 241 431 327 343
#> Barnesiella sp. CU968 261 146 168 138 128
#> Bacteroides thetaiotaomicron 356 74 79 91 95
#> Lachnospiraceae bacterium 28-4 2 2 1176 7 92
#> sample_6 sample_7 sample_8 sample_9 sample_10
#> Lachnospiraceae bacterium 9 996 71 312 85
#> Dubosiella newyorkensis 118 665 131 121 117
#> Parasutterella excrementihominis 21 128 241 367 364
#> Barnesiella sp. CU968 175 251 144 129 129
#> Bacteroides thetaiotaomicron 155 80 129 76 83
#> Lachnospiraceae bacterium 28-4 5 1170 43 19 4
#> sample_11 sample_12 sample_13 sample_14
#> Lachnospiraceae bacterium 91 1731 22 12
#> Dubosiella newyorkensis 113 210 120 460
#> Parasutterella excrementihominis 246 255 393 8
#> Barnesiella sp. CU968 393 130 25 15
#> Bacteroides thetaiotaomicron 75 82 77 371
#> Lachnospiraceae bacterium 28-4 42 272 0 3
#> sample_15 sample_16 sample_17 sample_18
#> Lachnospiraceae bacterium 221 289 11 35
#> Dubosiella newyorkensis 117 105 114 737
#> Parasutterella excrementihominis 386 120 385 7
#> Barnesiella sp. CU968 417 191 138 132
#> Bacteroides thetaiotaomicron 119 91 78 76
#> Lachnospiraceae bacterium 28-4 671 25 3 3
#> sample_19 sample_20 sample_21 sample_22
#> Lachnospiraceae bacterium 168 1907 42 103
#> Dubosiella newyorkensis 728 123 245 220
#> Parasutterella excrementihominis 4 433 4 117
#> Barnesiella sp. CU968 123 116 213 284
#> Bacteroides thetaiotaomicron 75 116 513 582
#> Lachnospiraceae bacterium 28-4 14 461 3 5
#> sample_23 sample_24
#> Lachnospiraceae bacterium 136 28
#> Dubosiella newyorkensis 402 57
#> Parasutterella excrementihominis 10 1
#> Barnesiella sp. CU968 123 420
#> Bacteroides thetaiotaomicron 81 721
#> Lachnospiraceae bacterium 28-4 8 4
#make sample_info
sample_info <- data.frame(sample = colnames(x), group = c(rep("group1", 8), rep("group2", 8), rep("group3", 8)))
sample_info
#> sample group
#> 1 sample_1 group1
#> 2 sample_2 group1
#> 3 sample_3 group1
#> 4 sample_4 group1
#> 5 sample_5 group1
#> 6 sample_6 group1
#> 7 sample_7 group1
#> 8 sample_8 group1
#> 9 sample_9 group2
#> 10 sample_10 group2
#> 11 sample_11 group2
#> 12 sample_12 group2
#> 13 sample_13 group2
#> 14 sample_14 group2
#> 15 sample_15 group2
#> 16 sample_16 group2
#> 17 sample_17 group3
#> 18 sample_18 group3
#> 19 sample_19 group3
#> 20 sample_20 group3
#> 21 sample_21 group3
#> 22 sample_22 group3
#> 23 sample_23 group3
#> 24 sample_24 group3
#make sds
sds <- make_sds(x, colData = sample_info, design = ~ group)
sds <- sds_analyze(sds)
#> sorting data 100% done...
#> Using one way anova
#> anova 100% done
#> Tukey test 100% done
#> Calculating fold changes...
#> Welch test 100% done
#> DunnettT3 test 100% done
#> Calculating fold changes...
#> Kruskal-Wallis Test 63% done...Kruskal-Wallis Test 100% done
#> Dunn Test 63% done...Dunn Test 100% done
#> Calculating fold changes...
Notice here that statomatic tells us that it’s using a one way anova to analyze the normal-equal variance portion of the data. This is because of our specified design.
res <- get_results(sds)
head(res)
#> Pval Pval_group1_vs_group2
#> Parasutterella excrementihominis 0.1991824896 0.8062039
#> Muricaecibacterium torontonense 0.0681265365 0.5464864
#> Muribaculaceae bacterium 0.0010328865 0.4923547
#> Sangeribacter muris 0.3985184169 0.8117231
#> Anaerotruncus colihominis 0.6083336357 0.7787006
#> Bacteroides sp. CAG:927 0.0009232984 1.0000000
#> log2FoldChange_group1_vs_group2
#> Parasutterella excrementihominis -0.3011695
#> Muricaecibacterium torontonense -0.5635121
#> Muribaculaceae bacterium 0.6818240
#> Sangeribacter muris 0.2115041
#> Anaerotruncus colihominis 0.4340958
#> Bacteroides sp. CAG:927 0.0000000
#> Pval_group1_vs_group3
#> Parasutterella excrementihominis 0.461211401
#> Muricaecibacterium torontonense 0.357348922
#> Muribaculaceae bacterium 0.014194068
#> Sangeribacter muris 0.727643346
#> Anaerotruncus colihominis 0.949127148
#> Bacteroides sp. CAG:927 0.002511996
#> log2FoldChange_group1_vs_group3
#> Parasutterella excrementihominis 0.8531586
#> Muricaecibacterium torontonense 1.4364879
#> Muribaculaceae bacterium -1.0093379
#> Sangeribacter muris -0.2250666
#> Anaerotruncus colihominis -0.1610624
#> Bacteroides sp. CAG:927 -1.6455040
#> Pval_group2_vs_group3
#> Parasutterella excrementihominis 0.182048409
#> Muricaecibacterium torontonense 0.055825056
#> Muribaculaceae bacterium 0.000975412
#> Sangeribacter muris 0.366905054
#> Anaerotruncus colihominis 0.593457381
#> Bacteroides sp. CAG:927 0.002511996
#> log2FoldChange_group2_vs_group3 group1_mean
#> Parasutterella excrementihominis 1.1543281 217.000
#> Muricaecibacterium torontonense 2.0000000 163.750
#> Muribaculaceae bacterium -1.6911619 38.500
#> Sangeribacter muris -0.4365707 19.250
#> Anaerotruncus colihominis -0.5951583 15.875
#> Bacteroides sp. CAG:927 -1.6455040 8.750
#> group2_mean group3_mean test
#> Parasutterella excrementihominis 267.375 120.125 tukey-test
#> Muricaecibacterium torontonense 242.000 60.500 tukey-test
#> Muribaculaceae bacterium 24.000 77.500 tukey-test
#> Sangeribacter muris 16.625 22.500 tukey-test
#> Anaerotruncus colihominis 11.750 17.750 tukey-test
#> Bacteroides sp. CAG:927 8.750 27.375 tukey-test