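Summary: Use the ggbetweenstats() function from the ggstatsplot package to quickly draw publication-quality plots that include statistical tests.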


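The ggbetweenstats function is designed to facilitate data exploration and to produce highly customizable, publication-ready plots, with relevant statistical details included in the plot itself if desired.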

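To begin with, here are some situations in which you might want to use ggbetweenstats:

  • to check whether a continuous variable differs across multiple groups / conditions

  • to compare distributions visually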

# Comparing different groups with ggbetweenstats

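To illustrate how this function can be used, we will use the gapminder dataset throughout this example. This dataset provides life expectancy, GDP per capita, and population for each of 142 countries, every 5 years from 1952 to 2007 (courtesy of the Gapminder Foundation). Let's have a look at the structure of the data.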

r
library(gapminder)
Warning: package 'gapminder' was built under R version 4.3.2
r
dplyr::glimpse(gapminder::gapminder)
Rows: 1,704
Columns: 6
$ country   <fct> "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", …
$ continent <fct> Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, …
$ year      <int> 1952, 1957, 1962, 1967, 1972, 1977, 1982, 1987, 1992, 1997, …
$ lifeExp   <dbl> 28.801, 30.332, 31.997, 34.020, 36.088, 38.438, 39.854, 40.8…
$ pop       <int> 8425333, 9240934, 10267083, 11537966, 13079460, 14880372, 12…
$ gdpPercap <dbl> 779.4453, 820.8530, 853.1007, 836.1971, 739.9811, 786.1134, …

Note: for the remainder of the example, we will exclude Oceania from the analysis because it contains too few countries.

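Suppose the first thing we want to inspect is the distribution of life expectancy across the countries of each continent in 2007. We also want to know whether the mean differences in life expectancy between continents are statistically significant.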

The simplest form of the function call is:

r
library(ggstatsplot)
Warning: package 'ggstatsplot' was built under R version 4.3.2

You can cite this package as:
     Patil, I. (2021). Visualizations with statistical details: The 'ggstatsplot' approach.
     Journal of Open Source Software, 6(61), 3167, doi:10.21105/joss.03167
r
ggbetweenstats(
  data = dplyr::filter(gapminder::gapminder, year == 2007, continent != "Oceania"),
  x = continent,
  y = lifeExp
)

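Note:

  • The function automatically decides, based on the number of levels of the grouping variable, whether to run an independent samples t-test (2 groups) or a one-way ANOVA (3 or more groups).

  • The output of the function is a ggplot object, which means that it can be further modified with ggplot2 functions.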
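The first point can be pictured with a small base-R sketch. It only illustrates the decision rule described above and is not ggstatsplot's internal code: `choose_test` is a hypothetical helper, and using the Welch variant of both tests is an assumption made for this example. The built-in ToothGrowth (two groups) and iris (three groups) datasets serve as input.

```r
# Hypothetical helper (not part of ggstatsplot): choose a test from the
# number of levels of the grouping variable.
choose_test <- function(data, x, y) {
  df <- data.frame(
    group = droplevels(as.factor(data[[x]])),
    value = data[[y]]
  )
  n_groups <- nlevels(df$group)
  if (n_groups < 2) {
    stop("The grouping variable needs at least two levels.")
  }
  if (n_groups == 2) {
    # two groups: Welch's independent samples t-test
    stats::t.test(value ~ group, data = df, var.equal = FALSE)
  } else {
    # three or more groups: Welch's one-way ANOVA
    stats::oneway.test(value ~ group, data = df, var.equal = FALSE)
  }
}

res_two <- choose_test(ToothGrowth, "supp", "len")
res_three <- choose_test(iris, "Species", "Sepal.Length")

print(res_two$method)
print(res_three$method)
```

The helper takes column names as strings to keep the sketch free of non-standard evaluation; ggbetweenstats itself accepts unquoted column names, as in the call above.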

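As the plot shows, in addition to the frequentist test, the function reports a Bayes factor by default. If the null hypothesis cannot be rejected with the null hypothesis significance testing (NHST) approach, the Bayesian approach can help quantify the evidence in favor of the null hypothesis.

By default, the natural logarithm is displayed because Bayes factor values can sometimes be very large. Putting the values on a logarithmic scale also makes it easy to compare the evidence in favor of the alternative hypothesis with the evidence in favor of the null hypothesis.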
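The convenience of the log scale comes from a simple identity: the Bayes factor in favor of the null is the reciprocal of the one in favor of the alternative, so their logarithms differ only in sign. A minimal base-R check, using a made-up value that is not taken from any plot in this post:

```r
# Illustrative Bayes factor in favor of the alternative (hypothetical value)
bf10 <- 20

# The Bayes factor in favor of the null is its reciprocal
bf01 <- 1 / bf10

# Natural logarithms of both
log_bf10 <- log(bf10)
log_bf01 <- log(bf01)
print(c(log_bf10 = log_bf10, log_bf01 = log_bf01))

# The two log values are equal in magnitude and opposite in sign
print(isTRUE(all.equal(log_bf01, -log_bf10)))
```

A positive log value therefore favors one hypothesis and a negative value of the same size favors the other, whichever direction the plot happens to report.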

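We can make the output much more aesthetically pleasing and informative by using the many optional parameters of ggbetweenstats. We will add a title and a caption, as well as better x-axis and y-axis labels. We can, and will, also change the overall theme and the color palette in use.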

r
library(ggplot2)
Warning: package 'ggplot2' was built under R version 4.3.2
r
library(ggstatsplot)
ggbetweenstats(
  data = dplyr::filter(gapminder, year == 2007, continent != "Oceania"),
  x = continent, ## grouping/independent variable
  y = lifeExp, ## dependent variable
  type = "robust", ## type of statistics
  xlab = "Continent", ## label for the x-axis
  ylab = "Life expectancy", ## label for the y-axis
  ggtheme = ggplot2::theme_gray(), ## a different theme
  package = "yarrr", ## package from which color palette is to be taken
  palette = "info2", ## choosing a different color palette
  title = "Comparison of life expectancy across continents (Year: 2007)",
  caption = "Source: Gapminder Foundation"
) + ## modifying the plot further
  ggplot2::scale_y_continuous(
    limits = c(35, 85),
    breaks = seq(from = 35, to = 85, by = 5)
  )
Scale for y is already present.
Adding another scale for y, which will replace the existing scale.

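As the effect size (partial eta-squared) of 0.635 indicates, there are large differences in mean life expectancy across continents. Importantly, the plot also helps us appreciate the distribution within any given continent. For example, although Asian countries fare much better than African countries on average, Afghanistan has a particularly low life expectancy for the Asian continent, possibly reflecting the effects of war and political turmoil.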

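So far we have only used the default box-violin plot and two types of test, but other options are available:

  • The type argument also accepts the following abbreviations: "p" (parametric), "np" (non-parametric), "r" (robust), "bf" (Bayes factor).

  • The kind of plot to be displayed can also be modified (box plot, violin plot, or box-violin plot), for example by setting the width to 0 in boxplot.args or violin.args.

  • The color palettes can be modified.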
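The package and palette arguments name a palette by the package it comes from. The ggstatsplot documentation points to the `palettes_d_names` table in the paletteer package for the available combinations, so, assuming paletteer is installed, they can be listed as follows (a small sketch; the variable names are ours):

```r
# Table of discrete palettes known to paletteer
pal_names <- paletteer::palettes_d_names

# Palettes provided by the ggsci package
ggsci_palettes <- pal_names[pal_names$package == "ggsci", ]
print(head(ggsci_palettes))

# Check that a palette used in this post is among them
print("nrc_npg" %in% ggsci_palettes$palette)
```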

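Let's use the combine_plots function to build a single figure from four separate plots that demonstrate all of these options. We will compare life expectancy for all countries between an early year and the last year of available data (1957 and 2007). We will generate the plots one by one and then use combine_plots to merge them into a single figure with some common labels.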

r
library(ggplot2)
library(ggstatsplot)
## selecting subset of the data
df_year <- dplyr::filter(gapminder::gapminder, year == 2007 | year == 1957)
p1 <- ggbetweenstats(
  data = df_year,
  x = year,
  y = lifeExp,
  xlab = "Year",
  ylab = "Life expectancy",
  # to remove violin plot
  violin.args = list(width = 0),
  type = "p",
  conf.level = 0.99,
  title = "Parametric test",
  package = "ggsci",
  palette = "nrc_npg"
)
p2 <- ggbetweenstats(
  data = df_year,
  x = year,
  y = lifeExp,
  xlab = "Year",
  ylab = "Life expectancy",
  # to remove box plot
  boxplot.args = list(width = 0),
  type = "np",
  conf.level = 0.99,
  title = "Non-parametric Test",
  package = "ggsci",
  palette = "uniform_startrek"
)
p3 <- ggbetweenstats(
  data = df_year,
  x = year,
  y = lifeExp,
  xlab = "Year",
  ylab = "Life expectancy",
  type = "r",
  conf.level = 0.99,
  title = "Robust Test",
  tr = 0.005,
  package = "wesanderson",
  palette = "Royal2",
  k = 3
)
## Bayes factor for the parametric t-test, with the violin, box, and points hidden
p4 <- ggbetweenstats(
  data = df_year,
  x = year,
  y = lifeExp,
  xlab = "Year",
  ylab = "Life expectancy",
  type = "bayes",
  violin.args = list(width = 0),
  boxplot.args = list(width = 0),
  point.args = list(alpha = 0),
  title = "Bayesian Test",
  package = "ggsci",
  palette = "nrc_npg"
)
## combining the individual plots into a single plot
combine_plots(
  list(p1, p2, p3, p4),
  plotgrid.args = list(nrow = 2),
  annotation.args = list(
    title = "Comparison of life expectancy between 1957 and 2007",
    caption = "Source: Gapminder Foundation"
  )
)

# Grouped analysis with grouped_ggbetweenstats

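What if we want to analyze both by continent and across years between 1957 and 2007?

ggstatsplot provides a special helper function for such instances: grouped_ggbetweenstats. It is merely a wrapper around the combine_plots function. It applies ggbetweenstats across all levels of a specified grouping variable and then combines the individual plots into a single plot. Note that the grouping variable can be anything: conditions in a given study, groups in a study sample, different studies, etc.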
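To make the "apply, then combine" idea concrete, here is a rough manual equivalent built from functions already used in this post. It is a sketch of the concept rather than the actual implementation of grouped_ggbetweenstats, and it assumes that ggstatsplot, dplyr, and gapminder are installed; the variable names are ours.

```r
library(ggstatsplot)

# Same subset as used for the grouped analysis in this section
df <- dplyr::filter(
  gapminder::gapminder,
  year %in% c(1967, 1987, 2007),
  continent != "Oceania"
)

# One ggbetweenstats plot per level of the grouping variable (year)
plots <- unname(lapply(split(df, df$year), function(d) {
  ggbetweenstats(
    data = d,
    x = continent,
    y = lifeExp,
    title = as.character(unique(d$year))
  )
}))

# Combine the individual plots into a single figure
combined <- combine_plots(plots, plotgrid.args = list(nrow = 3))
combined
```

grouped_ggbetweenstats saves this bookkeeping: the grouping variable is passed once through grouping.var and the layout is controlled through plotgrid.args and annotation.args.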

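Let's focus on the same four continents for the following three years: 1967, 1987, and 2007. In addition, let's carry out pairwise comparisons to see whether there are differences between every pair of continents.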

r
## select part of the dataset and use it for plotting
gapminder::gapminder %>%
  dplyr::filter(year %in% c(1967, 1987, 2007), continent != "Oceania") %>%
  grouped_ggbetweenstats(
    ## arguments relevant for ggbetweenstats
    x = continent,
    y = lifeExp,
    grouping.var = year,
    xlab = "Continent",
    ylab = "Life expectancy",
    pairwise.display = "significant", ## display only significant pairwise comparisons
    p.adjust.method = "fdr", ## adjust p-values for multiple tests using this method
    # ggtheme = ggthemes::theme_tufte(),
    package = "ggsci",
    palette = "default_jco",
    ## arguments relevant for combine_plots
    annotation.args = list(title = "Changes in life expectancy across continents (1967-2007)"),
    plotgrid.args = list(nrow = 3)
  )

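As the plot shows, although life expectancy has improved steadily on every continent from 1967 to 2007, the improvement has not happened at the same rate everywhere. Moreover, whichever year we look at, we still find significant differences in life expectancy between continents, and these differences have been surprisingly consistent across the four decades (based on the observed effect sizes).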
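The p.adjust.method = "fdr" setting in the call above controls how the pairwise p-values are corrected for multiple testing. In base R, "fdr" is an alias for the Benjamini-Hochberg procedure in stats::p.adjust. The sketch below applies it to four made-up p-values; whether ggstatsplot calls stats::p.adjust internally is an assumption here, and the numbers are not taken from the plot.

```r
# Hypothetical raw p-values from four pairwise comparisons
p_raw <- c(0.001, 0.01, 0.03, 0.2)

# Benjamini-Hochberg (false discovery rate) adjustment
p_fdr <- stats::p.adjust(p_raw, method = "fdr")
print(p_fdr)

# Comparisons that remain below the conventional 0.05 threshold
print(p_fdr < 0.05)
```

With pairwise.display = "significant", only the comparisons that remain significant after adjustment are drawn on the plot.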

# Reporting

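If you wish to include statistical analysis results in a publication / report, the ideal reporting practice is a hybrid of two approaches:

  • the ggstatsplot approach, where the plot contains both the visual and numerical summaries of a statistical model

  • the standard narrative approach, which provides an interpretation of the reported statistics.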

r
ggbetweenstats(ToothGrowth, supp, len)

The narrative context can complement this plot either as a figure caption or in the main text.

Welch’s t-test revealed that, across 60 guinea pigs, although the
tooth length was higher when the animal received vitamin C via orange
juice as compared to via ascorbic acid, this effect was not
statistically significant. The effect size (g = 0.49) was medium, as
per Cohen’s (1988) conventions. The Bayes Factor for the same analysis
revealed that the data were 1.2 times more probable under the
alternative hypothesis as compared to the null hypothesis. This can be
considered weak evidence (Jeffreys, 1961) in favor of the alternative
hypothesis.
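The frequentist figures in this narrative can be cross-checked with base R alone. The sketch below runs Welch's t-test on the built-in ToothGrowth data and computes Hedges' g by hand, using the average of the two group variances as the standardizer and a common approximation for the small-sample correction. ggstatsplot's exact formula may differ slightly, so this is a sanity check rather than a reproduction of its output.

```r
# Welch's t-test on tooth length by supplement type
tt <- stats::t.test(len ~ supp, data = ToothGrowth, var.equal = FALSE)

oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Cohen's d, standardized by the average of the two group variances
d <- (mean(oj) - mean(vc)) / sqrt((stats::var(oj) + stats::var(vc)) / 2)

# Hedges' small-sample correction (approximate form)
n_total <- length(oj) + length(vc)
g <- d * (1 - 3 / (4 * (n_total - 2) - 1))

print(tt$p.value)
print(round(g, 2))
```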

A similar style of reporting can be followed when the function performs a one-way ANOVA instead of a t-test.

# Session info

r
devtools::session_info()
─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.3.1 (2023-06-16 ucrt)
 os       Windows 11 x64 (build 22621)
 system   x86_64, mingw32
 ui       RTerm
 language (EN)
 collate  Chinese (Simplified)_China.utf8
 ctype    Chinese (Simplified)_China.utf8
 tz       Asia/Hong_Kong
 date     2024-01-04
 pandoc   3.1.9 @ C:/Users/HANWAN~1/AppData/Local/Pandoc/ (via rmarkdown)

─ Packages ───────────────────────────────────────────────────────────────────
 ! package          * version    date (UTC) lib source
   BayesFactor        0.9.12-4.5 2023-09-21 [1] CRAN (R 4.3.2)
   bayestestR         0.13.1     2023-04-07 [1] CRAN (R 4.3.2)
   BWStest            0.2.3      2023-10-10 [1] CRAN (R 4.3.2)
   cachem             1.0.8      2023-05-01 [1] CRAN (R 4.3.1)
   callr              3.7.3      2022-11-02 [1] CRAN (R 4.3.1)
   cli                3.6.1      2023-03-23 [1] CRAN (R 4.3.1)
   coda               0.19-4     2020-09-30 [1] CRAN (R 4.3.2)
   colorspace         2.1-0      2023-01-23 [1] CRAN (R 4.3.1)
   correlation        0.8.4      2023-04-06 [1] CRAN (R 4.3.2)
   crayon             1.5.2      2022-09-29 [1] CRAN (R 4.3.1)
   datawizard         0.9.0      2023-09-15 [1] CRAN (R 4.3.2)
   devtools           2.4.5      2022-10-11 [1] CRAN (R 4.3.2)
   digest             0.6.33     2023-07-07 [1] CRAN (R 4.3.1)
   dplyr              1.1.3      2023-09-03 [1] CRAN (R 4.3.2)
   effectsize         0.8.6      2023-09-14 [1] CRAN (R 4.3.2)
   ellipsis           0.3.2      2021-04-29 [1] CRAN (R 4.3.1)
   evaluate           0.21       2023-05-05 [1] CRAN (R 4.3.1)
   fansi              1.0.4      2023-01-22 [1] CRAN (R 4.3.1)
   farver             2.1.1      2022-07-06 [1] CRAN (R 4.3.1)
   fastmap            1.1.1      2023-02-24 [1] CRAN (R 4.3.1)
   fs                 1.6.3      2023-07-20 [1] CRAN (R 4.3.1)
   gapminder        * 1.0.0      2023-03-10 [1] CRAN (R 4.3.2)
   generics           0.1.3      2022-07-05 [1] CRAN (R 4.3.1)
   ggplot2          * 3.4.4      2023-10-12 [1] CRAN (R 4.3.2)
   ggrepel            0.9.3      2023-02-03 [1] CRAN (R 4.3.1)
   ggsignif           0.6.4      2022-10-13 [1] CRAN (R 4.3.2)
   ggstatsplot      * 0.12.1     2023-09-20 [1] CRAN (R 4.3.2)
   glue               1.6.2      2022-02-24 [1] CRAN (R 4.3.1)
   gmp                0.7-2      2023-07-01 [1] CRAN (R 4.3.2)
   gtable             0.3.3      2023-03-21 [1] CRAN (R 4.3.1)
   htmltools          0.5.5      2023-03-23 [1] CRAN (R 4.3.1)
   htmlwidgets        1.6.2      2023-03-17 [1] CRAN (R 4.3.1)
   httpuv             1.6.11     2023-05-11 [1] CRAN (R 4.3.1)
   insight            0.19.6     2023-10-12 [1] CRAN (R 4.3.2)
   jsonlite           1.8.7      2023-06-29 [1] CRAN (R 4.3.1)
   knitr              1.45       2023-10-30 [1] CRAN (R 4.3.2)
   kSamples           1.2-10     2023-10-07 [1] CRAN (R 4.3.2)
   labeling           0.4.2      2020-10-20 [1] CRAN (R 4.3.0)
   later              1.3.1      2023-05-02 [1] CRAN (R 4.3.1)
   lattice            0.21-8     2023-04-05 [2] CRAN (R 4.3.1)
   lifecycle          1.0.3      2022-10-07 [1] CRAN (R 4.3.1)
   magrittr           2.0.3      2022-03-30 [1] CRAN (R 4.3.1)
   MASS               7.3-60     2023-05-04 [2] CRAN (R 4.3.1)
   Matrix             1.6-1.1    2023-09-18 [1] CRAN (R 4.3.2)
   MatrixModels       0.5-2      2023-07-10 [1] CRAN (R 4.3.2)
   memoise            2.0.1      2021-11-26 [1] CRAN (R 4.3.1)
   mime               0.12       2021-09-28 [1] CRAN (R 4.3.0)
   miniUI             0.1.1.1    2018-05-18 [1] CRAN (R 4.3.1)
   multcompView       0.1-9      2023-04-09 [1] CRAN (R 4.3.2)
   munsell            0.5.0      2018-06-12 [1] CRAN (R 4.3.1)
   mvtnorm            1.2-3      2023-08-25 [1] CRAN (R 4.3.2)
   paletteer          1.5.0      2022-10-19 [1] CRAN (R 4.3.2)
   parameters         0.21.3     2023-11-02 [1] CRAN (R 4.3.2)
   patchwork          1.1.2      2022-08-19 [1] CRAN (R 4.3.1)
   pbapply            1.7-2      2023-06-27 [1] CRAN (R 4.3.1)
   performance        0.10.8     2023-10-30 [1] CRAN (R 4.3.2)
   pillar             1.9.0      2023-03-22 [1] CRAN (R 4.3.1)
   pkgbuild           1.4.2      2023-06-26 [1] CRAN (R 4.3.1)
   pkgconfig          2.0.3      2019-09-22 [1] CRAN (R 4.3.1)
   pkgload            1.3.2.1    2023-07-08 [1] CRAN (R 4.3.1)
   plyr               1.8.8      2022-11-11 [1] CRAN (R 4.3.1)
   PMCMRplus          1.9.8      2023-10-09 [1] CRAN (R 4.3.2)
   prettyunits        1.1.1      2020-01-24 [1] CRAN (R 4.3.1)
   prismatic          1.1.1      2022-08-15 [1] CRAN (R 4.3.2)
   processx           3.8.2      2023-06-30 [1] CRAN (R 4.3.1)
   profvis            0.3.8      2023-05-02 [1] CRAN (R 4.3.1)
   promises           1.2.0.1    2021-02-11 [1] CRAN (R 4.3.1)
   ps                 1.7.5      2023-04-18 [1] CRAN (R 4.3.1)
   purrr              1.0.2      2023-08-10 [1] CRAN (R 4.3.2)
   R6                 2.5.1      2021-08-19 [1] CRAN (R 4.3.1)
   Rcpp               1.0.11     2023-07-06 [1] CRAN (R 4.3.1)
 D RcppParallel       5.1.7      2023-02-27 [1] CRAN (R 4.3.2)
   rematch2           2.1.2      2020-05-01 [1] CRAN (R 4.3.1)
   remotes            2.4.2.1    2023-07-18 [1] CRAN (R 4.3.1)
   reshape            0.8.9      2022-04-12 [1] CRAN (R 4.3.2)
   rlang              1.1.1      2023-04-28 [1] CRAN (R 4.3.1)
   rmarkdown          2.23       2023-07-01 [1] CRAN (R 4.3.1)
   Rmpfr              0.9-3      2023-08-08 [1] CRAN (R 4.3.1)
   rstantools         2.3.1.1    2023-07-18 [1] CRAN (R 4.3.2)
   rstudioapi         0.15.0     2023-07-07 [1] CRAN (R 4.3.1)
   scales             1.2.1      2022-08-20 [1] CRAN (R 4.3.1)
   sessioninfo        1.2.2      2021-12-06 [1] CRAN (R 4.3.1)
   shiny              1.7.4.1    2023-07-06 [1] CRAN (R 4.3.1)
   statsExpressions   1.5.2      2023-09-12 [1] CRAN (R 4.3.2)
   stringi            1.7.12     2023-01-11 [1] CRAN (R 4.3.0)
   stringr            1.5.0      2022-12-02 [1] CRAN (R 4.3.1)
   SuppDists          1.1-9.7    2022-01-03 [1] CRAN (R 4.3.2)
   tibble             3.2.1      2023-03-20 [1] CRAN (R 4.3.1)
   tidyr              1.3.0      2023-01-24 [1] CRAN (R 4.3.1)
   tidyselect         1.2.0      2022-10-10 [1] CRAN (R 4.3.1)
   urlchecker         1.0.1      2021-11-30 [1] CRAN (R 4.3.1)
   usethis            2.2.2      2023-07-06 [1] CRAN (R 4.3.1)
   utf8               1.2.3      2023-01-31 [1] CRAN (R 4.3.1)
   vctrs              0.6.3      2023-06-14 [1] CRAN (R 4.3.1)
   withr              2.5.0      2022-03-03 [1] CRAN (R 4.3.1)
   WRS2               1.1-5      2023-10-30 [1] CRAN (R 4.3.2)
   xfun               0.39       2023-04-20 [1] CRAN (R 4.3.1)
   xtable             1.8-4      2019-04-21 [1] CRAN (R 4.3.1)
   yaml               2.3.7      2023-01-23 [1] CRAN (R 4.3.0)
   zeallot            0.1.0      2018-01-28 [1] CRAN (R 4.3.2)

 [1] C:/Users/Han Wang/AppData/Local/R/win-library/4.3
 [2] C:/Program Files/R/R-4.3.1/library

 D ── DLL MD5 mismatch, broken installation.

──────────────────────────────────────────────────────────────────────────────