Generate summary statistics by subgroup with stargazer

Question

I am trying to generate a summary statistics table for LaTeX in R with stargazer. The table should contain the summary statistics grouped by three subgroups (Rwanda/Honduras/Nepal).

It worked fine when I made separate tables for each subgroup, so I thought the country variable might be the problem.

The all_summary data frame looks like this:

    structure(list(
      country = structure(c("Honduras", "Nepal", "Rwanda"), label = "Country", format.stata = "%8s"),
      headGender = structure(c(0, 1, 0), label = "head_gender", format.stata = "%9.0g"),
      femaleEduc = structure(c(1, 2, 2), label = "female_educ", format.stata = "%9.0g"),
      maleEduc = structure(c(1, 1, 2), label = "male_educ", format.stata = "%9.0g"),
      wVispeople = structure(c(0, 1, 0), label = "w_visitpeople", format.stata = "%9.0g"),
      wVismarket = structure(c(0, 1, 1), label = "w_vismarket", format.stata = "%9.0g"),
      wLeavevill = structure(c(0, 1, 0), label = "w_leavevill", format.stata = "%9.0g"),
      fridge = structure(c(1, 0, 0), label = "fridge_owned_desired", format.stata = "%9.0g"),
      radio = structure(c(1, 1, 1), label = "radio_owned_desired", format.stata = "%9.0g"),
      fan = structure(c(0, 0, 0), label = "fan_owned_desired", format.stata = "%9.0g"),
      pc = structure(c(0, 0, 0), label = "pc_owned_desired", format.stata = "%9.0g"),
      tv = structure(c(1, 0, 1), label = "tv_owned_desired", format.stata = "%9.0g"),
      minutesSolid = structure(c(3, 2, 448), label = "stoveuseminutes_solids", format.stata = "%9.0g"),
      minutesClean = structure(c(0, 0, 0), label = "stoveuseminutes_clean", format.stata = "%9.0g"),
      stoveClean = structure(c(0, 0, 0), label = "stove_clean", format.stata = "%9.0g")
    ), row.names = c(NA, -3L), class = c("tbl_df", "tbl", "data.frame"), label = "Written by R.")

This is what the code looks like:

    all_summary <- allcountries %>%
      select(Country, head_gender, female_educ, male_educ, w_visitpeople,
             w_vismarket, w_leavevill, fridge_owned_desired, radio_owned_desired,
             fan_owned_desired, pc_owned_desired, tv_owned_desired,
             stoveuseminutes_solids, stoveuseminutes_clean, stove_clean) %>%
      rename(country = Country,
             headGender = head_gender,
             femaleEduc = female_educ,
             maleEduc = male_educ,
             wVispeople = w_visitpeople,
             wVismarket = w_vismarket,
             wLeavevill = w_leavevill,
             fridge = fridge_owned_desired,
             radio = radio_owned_desired,
             fan = fan_owned_desired,
             pc = pc_owned_desired,
             tv = tv_owned_desired,
             minutesSolid = stoveuseminutes_solids,
             minutesClean = stoveuseminutes_clean,
             stoveClean = stove_clean)

    # Group by country
    all_summary_grouped <- all_summary %>% group_by(country)

    sumstats_all_grouped <-
      all_summary_grouped %>%
      summarise_each(funs(
        n = sum(!is.na(.)),
        min = min(., na.rm = TRUE),
        max = max(., na.rm = TRUE),
        mean = mean(., na.rm = TRUE)
      ))

    # Reshape data
    sumstatsA <- sumstats_all_grouped %>%
      gather(stat, val) %>%
      separate(stat, into = c("var", "stat"), sep = "_") %>%
      spread(stat, val) %>%
      select(var, n, min, max, mean)

    # Round
    sumstatsA <- sumstatsA %>%
      mutate(mean = round(as.numeric(mean), 2))

    # Produce table
    stargazer(
      sumstatsA,
      summary = FALSE,
      type = "text",
      digits = 2,
      header = FALSE,
      title = "Summary statistics for Honduras, Nepal and Rwanda",
      rownames = FALSE,
      out = "Manuscript/Tables/SummaryAll_grouped.tex")

The error occurs in the "# Reshape data" section:

    Error in spread():
    ! Each row of output must be identified by a unique combination of keys.
    Keys are shared for 171 rows:
    • 112, 113, 114
    • 91, 92, 93
    • 106, 107, 108
    (list continues)
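
A likely cause, judging from the code above: gather(stat, val) also gathers the country column, so after separate() every var/stat pair occurs once per country and spread() has no unique key left. This would fit the count in the message, since 14 variables × 4 statistics plus the country column gives 57 keys, and 57 × 3 countries = 171 rows. The following is a minimal sketch of one possible fix, not the only one. It assumes a small made-up all_summary (the values are illustrative, not the real data) and swaps in across(), pivot_longer() and pivot_wider() for the older summarise_each(), gather() and spread(); the key point is simply that country is kept out of the reshape.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for all_summary; values are invented for illustration only
all_summary <- tibble(
  country      = c("Honduras", "Honduras", "Nepal", "Nepal", "Rwanda", "Rwanda"),
  headGender   = c(0, 1, 1, 1, 0, NA),
  minutesSolid = c(3, 10, 2, 5, 448, 120)
)

sumstatsA <- all_summary %>%
  group_by(country) %>%
  # One column per variable/statistic pair, named e.g. headGender_mean
  summarise(across(
    everything(),
    list(
      n    = ~ sum(!is.na(.x)),
      min  = ~ min(.x, na.rm = TRUE),
      max  = ~ max(.x, na.rm = TRUE),
      mean = ~ mean(.x, na.rm = TRUE)
    )
  )) %>%
  # Exclude country from the reshape so it stays an identifying key
  pivot_longer(-country,
               names_to = c("var", "stat"),
               names_sep = "_",
               values_to = "val") %>%
  pivot_wider(names_from = stat, values_from = val) %>%
  mutate(mean = round(mean, 2)) %>%
  select(country, var, n, min, max, mean)

print(sumstatsA)
```

With the original functions, the equivalent minimal change would be gather(stat, val, -country) and keeping country in the final select(). The result has one row per country and variable, and can then be passed to the stargazer() call from the question; converting it with as.data.frame() first is a common precaution, since stargazer is often reported not to handle tibbles well.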

Answer 1

Score: 0

Here is some minimal raw data:

    library(tidyverse)
    data(mtcars)
    as_tibble(mtcars)

    # A tibble: 32 × 11
         mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
       <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
     1  21       6  160    110  3.9   2.62  16.5     0     1     4     4
     2  21       6  160    110  3.9   2.88  17.0     0     1     4     4
     3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
     4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
     5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
     6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
     7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
     8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2
     9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2
    10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
    # … with 22 more rows
    # ℹ Use `print(n = ...)` to see more rows

The gtsummary package produces simple summary statistics per group (e.g. by cyl), rendered as HTML by default; the table can be customized in various ways:

    library(gtsummary)
    mtcars %>% tbl_summary(by = cyl)

There are options to print this as .tex:

    # print as tex
    as_kable_extra(mtcars %>% tbl_summary(by = cyl), format = "latex")

    \begin{tabular}{l|c|c|c}
    \hline
    \textbf{Characteristic} & \textbf{4}, N = 11 & \textbf{6}, N = 7 & \textbf{8}, N = 14\\
    \hline
    mpg & 26.0 (22.8, 30.4) & 19.7 (18.6, 21.0) & 15.2 (14.4, 16.2)\\
    \hline
    disp & 108 (79, 121) & 168 (160, 196) & 350 (302, 390)\\
    \hline
    hp & 91 (66, 96) & 110 (110, 123) & 192 (176, 241)\\
    \hline
    drat & 4.08 (3.81, 4.16) & 3.90 (3.35, 3.91) & 3.12 (3.07, 3.22)\\
    \hline
    wt & 2.20 (1.89, 2.62) & 3.21 (2.82, 3.44) & 3.76 (3.53, 4.01)\\
    \hline
    qsec & 18.90 (18.56, 19.95) & 18.30 (16.74, 19.17) & 17.18 (16.10, 17.56)\\
    \hline
    vs & 10 (91\%) & 4 (57\%) & 0 (0\%)\\
    \hline
    am & 8 (73\%) & 3 (43\%) & 2 (14\%)\\
    \hline
    gear &  &  & \\
    \hline
    \hspace{1em}3 & 1 (9.1\%) & 2 (29\%) & 12 (86\%)\\
    \hline
    \hspace{1em}4 & 8 (73\%) & 4 (57\%) & 0 (0\%)\\
    \hline
    \hspace{1em}5 & 2 (18\%) & 1 (14\%) & 2 (14\%)\\
    \hline
    carb &  &  & \\
    \hline
    \hspace{1em}1 & 5 (45\%) & 2 (29\%) & 0 (0\%)\\
    \hline
    \hspace{1em}2 & 6 (55\%) & 0 (0\%) & 4 (29\%)\\
    \hline
    \hspace{1em}3 & 0 (0\%) & 0 (0\%) & 3 (21\%)\\
    \hline
    \hspace{1em}4 & 0 (0\%) & 4 (57\%) & 6 (43\%)\\
    \hline
    \hspace{1em}6 & 0 (0\%) & 1 (14\%) & 0 (0\%)\\
    \hline
    \hspace{1em}8 & 0 (0\%) & 0 (0\%) & 1 (7.1\%)\\
    \hline
    \multicolumn{4}{l}{\rule{0pt}{1em}\textsuperscript{1} Median (IQR); n (\%)}\\
    \end{tabular}
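
Since the question writes its table to a .tex file, the LaTeX source above can also be saved rather than printed. A minimal sketch, assuming the kableExtra package is installed (as_kable_extra() depends on it); the out_path name is hypothetical and points at a temporary file so the example runs anywhere:

```r
library(gtsummary)

# Build the grouped summary table and convert it to LaTeX source
tex <- mtcars %>%
  tbl_summary(by = cyl) %>%
  as_kable_extra(format = "latex")

# Hypothetical output path; replace with the real location of the manuscript tables
out_path <- tempfile(fileext = ".tex")

# The kable object is a character vector underneath, so it can be written as text
writeLines(as.character(tex), out_path)
```

The saved file can then be pulled into the manuscript with \input{}. The exact LaTeX produced may differ between gtsummary versions.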

Update

Various statistics can be specified via the statistic argument. The example below shows the number of non-missing values, the mean, the median with p25 and p75, and the range from min to max.

    mtcars %>% tbl_summary(by = am,
                           type = all_continuous() ~ "continuous2",
                           statistic = all_continuous() ~ c(
                             "{N_nonmiss}",
                             "{mean}",
                             "{median} ({p25}, {p75})",
                             "{min}, {max}"
                           ))

huangapple
  • Published on February 7, 2023, 03:59:59
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75366004.html