Summary statistics table in R with multiple vertical variables, subvariables, panels, subpanels, etc

Question

Apologies for the rather general question. I have tried stargazer and other packages, but I still cannot work out how to build an advanced summary statistics table in R. I am working with the following dataset:

> str(df_All)
tibble [5,064 × 29] (S3: tbl_df/tbl/data.frame)
 $ Net_IRR              : num [1:5064] 15.9 1.75 46 20 18.4 ...
 $ Age                  : num [1:5064] 1 1 1 1 1 1 1 1 1 1 ...
 $ Ln_Age               : num [1:5064] 0 0 0 0 0 0 0 0 0 0 ...
 $ Fund_Sequence        : num [1:5064] 1 1 1 1 1 1 1 1 1 1 ...
 $ Ln_Fund_Sequence     : num [1:5064] 0 0 0 0 0 0 0 0 0 0 ...
 $ Fund_Size            : num [1:5064] 50 46 423 96.9 81 ...
 $ Ln_Fund_Size         : num [1:5064] 3.91 3.83 6.05 4.57 4.39 ...
 $ Nr_Funds             : num [1:5064] 1 1 1 1 1 1 1 1 1 1 ...
 $ HHI_Industry         : num [1:5064] 0.427 0.243 0.36 0.333 1 ...
 $ HHI_Region           : num [1:5064] 1 1 1 1 1 ...
 $ Stock_Market_Returns : num [1:5064] 0.11936 -0.00711 -0.00643 -0.03869 -0.01931 ...
 $ GDP_Growth           : num [1:5064] 0.0284 0.0245 0.0261 0.0304 0.0104 ...
 $ Net_Multiple         : num [1:5064] 3.3 1.09 4.04 2.73 1.95 ...
 $ Ln_Fund_Size^2       : num [1:5064] 15.3 14.7 36.6 20.9 19.3 ...
 $ Size_Q1              : num [1:5064] 41.5 42.5 123.8 109.8 85.5 ...
 $ Size_Q2              : num [1:5064] 125.8 92.8 325.5 232 177.3 ...
 $ Size_Q3              : num [1:5064] 211 206 756 624 302 ...
 $ Size_Q4              : num [1:5064] 1000 1500 6114 5887 2600 ...
 $ Size_Spline_1        : num [1:5064] 0 0 0 1 1 1 0 1 0 0 ...
 $ Size_Spline_2        : num [1:5064] 1 1 0 0 0 0 0 0 1 0 ...
 $ Size_Spline_3        : num [1:5064] 0 0 1 0 0 0 1 0 0 1 ...
 $ Size_Spline_4        : num [1:5064] 0 0 0 0 0 0 0 0 0 0 ...
 $ Dummy_First_Time_Fund: num [1:5064] 1 1 1 1 1 1 1 1 1 1 ...
 $ Dummy_Industry       : num [1:5064] 1 0 0 0 1 0 0 1 1 0 ...
 $ Dummy_Region         : num [1:5064] 1 1 1 1 1 1 1 1 1 1 ...
 $ Fund_ID              : num [1:5064] 8360 3491 5576 48689 6016 ...
 $ Vintage_Year         : num [1:5064] 2002 2004 2000 1997 2006 ...
 $ Asset_Class          : chr [1:5064] "Venture Capital" "Venture Capital" "Private Equity" "Private Equity" ...
 $ Region_Focus         : chr [1:5064] "North America" "North America" "North America" "Europe" ...

I would like to build a summary statistics table in LaTeX format. The data should be aggregated by the following vertical/horizontal groups and subgroups:

  • Vertical dimensions:
    • Number of funds
    • Fund size ($mn)
    • IRR (%)
    • Multiple (x)
  • Vertical subdimensions (for each variable):
    • Median
    • Mean
    • Min
    • Max
    • Standard Deviation
  • Horizontal panels (data to divide in sub samples):
    • Whole sample
    • Private Equity
    • Private Debt
    • Real Estate
    • Infrastructure
  • Horizontal subdimensions (for each panel):
    • Regional focus
      • North America
      • Europe
      • Other (counting all other regions)
    • Fund size
      • < $100mn
      • $100mn to $500mn
      • $500mn to $1bn
      • more than $1bn
    • Fund sequence
      • 1
      • 2-3
      • 4-5
      • more than 5
    • #Funds
    • Age
    • HHI Industry
    • HHI Region

I hope my issue is clear and would highly appreciate any help!

The intended output is something along these lines:

[image: example of the intended table]

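Answer 1

Score: 1

I recommend `datasummary` from the modelsummary package. It lets you split the table both vertically and horizontally with a single formula: rows go on the left of the `~`, columns on the right. If you need more help, please provide a minimal example of your data; it is much easier to reproduce what you are asking for when the data is available.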

library(modelsummary)

data(mtcars)

datasummary(factor(cyl) * (mpg + drat) ~ factor(vs)*(Mean + Min + Max + SD), 
            data = mtcars)

[![Resulting table][1]][1]

[1]: https://i.stack.imgur.com/uNTey.png
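
The size, sequence and region buckets from the question are not columns of `df_All`, so they would probably have to be derived before any table can be split by them. Below is a minimal sketch using base R `cut()`. The small data frame is made up and only borrows column names from the question; the new column names (`Size_Bucket`, `Sequence_Bucket`, `Region_Group`) and the handling of boundary values (e.g. whether exactly $100mn counts as "<100") are my assumptions.

```r
# Toy stand-in for df_All: values are invented, column names follow the question
df <- data.frame(
  Fund_Size     = c(50, 100, 423, 500, 750, 1000, 2600),
  Fund_Sequence = c(1, 2, 3, 4, 5, 6, 9),
  Region_Focus  = c("North America", "Europe", "Asia", "Europe",
                    "North America", "Africa", "Europe"),
  stringsAsFactors = FALSE
)

# Fund size buckets in $mn. right = FALSE makes each interval closed on the
# left, so 100 lands in "100-500" and 1000 in ">=1000" (assumed convention).
df$Size_Bucket <- cut(
  df$Fund_Size,
  breaks = c(-Inf, 100, 500, 1000, Inf),
  labels = c("<100", "100-500", "500-1000", ">=1000"),
  right  = FALSE
)

# Fund sequence buckets: 1, 2-3, 4-5, more than 5.
# Default right = TRUE gives (0,1], (1,3], (3,5], (5,Inf].
df$Sequence_Bucket <- cut(
  df$Fund_Sequence,
  breaks = c(0, 1, 3, 5, Inf),
  labels = c("1", "2-3", "4-5", ">5")
)

# Collapse every region other than North America and Europe into "Other"
df$Region_Group <- factor(
  ifelse(df$Region_Focus %in% c("North America", "Europe"),
         df$Region_Focus, "Other"),
  levels = c("North America", "Europe", "Other")
)

print(df)
```

The resulting factor columns can then be used in a `datasummary` formula in the same way as `factor(cyl)` and `factor(vs)` above.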
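
To get closer to the layout in the question, the same idea can be turned around: variables and statistics in the rows, asset class and region in the columns. This is only a sketch on synthetic data, since I do not have `df_All`. The column names follow the question, except `Region_Group`, which is a hypothetical grouped version of `Region_Focus`. The whole-sample panel and the number-of-funds row are left out here.

```r
library(modelsummary)

set.seed(1)
n <- 400

# Synthetic stand-in for df_All (all values are simulated)
df <- data.frame(
  Asset_Class  = factor(sample(c("Private Equity", "Private Debt",
                                 "Real Estate", "Infrastructure"),
                               n, replace = TRUE)),
  Region_Group = factor(sample(c("North America", "Europe", "Other"),
                               n, replace = TRUE)),
  Fund_Size    = exp(rnorm(n, mean = 5, sd = 1)),
  Net_IRR      = rnorm(n, mean = 12, sd = 10),
  Net_Multiple = exp(rnorm(n, mean = 0.4, sd = 0.3))
)

# Rows: each variable crossed with each statistic.
# Columns: each asset class crossed with each region group.
tab <- datasummary(
  (Fund_Size + Net_IRR + Net_Multiple) * (Median + Mean + Min + Max + SD) ~
    Asset_Class * Region_Group,
  data   = df,
  output = "data.frame"  # use output = "latex" to get LaTeX code instead
)

print(tab)
```

Grouping variables should be factors (as with `factor(cyl)` above), which is why the character columns are converted when the data frame is built.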

huangapple
  • Posted on July 3, 2023, 21:33:44
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76605246.html