Summary statistics table in R with multiple vertical variables, subvariables, panels, subpanels, etc
Question
Apologies for the rather general question. I have tried `stargazer` and other packages, but I still cannot work out how to build an advanced summary statistics table in R. I am working with the following dataset:

    > str(df_All)
    tibble [5,064 × 29] (S3: tbl_df/tbl/data.frame)
     $ Net_IRR              : num [1:5064] 15.9 1.75 46 20 18.4 ...
     $ Age                  : num [1:5064] 1 1 1 1 1 1 1 1 1 1 ...
     $ Ln_Age               : num [1:5064] 0 0 0 0 0 0 0 0 0 0 ...
     $ Fund_Sequence        : num [1:5064] 1 1 1 1 1 1 1 1 1 1 ...
     $ Ln_Fund_Sequence     : num [1:5064] 0 0 0 0 0 0 0 0 0 0 ...
     $ Fund_Size            : num [1:5064] 50 46 423 96.9 81 ...
     $ Ln_Fund_Size         : num [1:5064] 3.91 3.83 6.05 4.57 4.39 ...
     $ Nr_Funds             : num [1:5064] 1 1 1 1 1 1 1 1 1 1 ...
     $ HHI_Industry         : num [1:5064] 0.427 0.243 0.36 0.333 1 ...
     $ HHI_Region           : num [1:5064] 1 1 1 1 1 ...
     $ Stock_Market_Returns : num [1:5064] 0.11936 -0.00711 -0.00643 -0.03869 -0.01931 ...
     $ GDP_Growth           : num [1:5064] 0.0284 0.0245 0.0261 0.0304 0.0104 ...
     $ Net_Multiple         : num [1:5064] 3.3 1.09 4.04 2.73 1.95 ...
     $ Ln_Fund_Size^2       : num [1:5064] 15.3 14.7 36.6 20.9 19.3 ...
     $ Size_Q1              : num [1:5064] 41.5 42.5 123.8 109.8 85.5 ...
     $ Size_Q2              : num [1:5064] 125.8 92.8 325.5 232 177.3 ...
     $ Size_Q3              : num [1:5064] 211 206 756 624 302 ...
     $ Size_Q4              : num [1:5064] 1000 1500 6114 5887 2600 ...
     $ Size_Spline_1        : num [1:5064] 0 0 0 1 1 1 0 1 0 0 ...
     $ Size_Spline_2        : num [1:5064] 1 1 0 0 0 0 0 0 1 0 ...
     $ Size_Spline_3        : num [1:5064] 0 0 1 0 0 0 1 0 0 1 ...
     $ Size_Spline_4        : num [1:5064] 0 0 0 0 0 0 0 0 0 0 ...
     $ Dummy_First_Time_Fund: num [1:5064] 1 1 1 1 1 1 1 1 1 1 ...
     $ Dummy_Industry       : num [1:5064] 1 0 0 0 1 0 0 1 1 0 ...
     $ Dummy_Region         : num [1:5064] 1 1 1 1 1 1 1 1 1 1 ...
     $ Fund_ID              : num [1:5064] 8360 3491 5576 48689 6016 ...
     $ Vintage_Year         : num [1:5064] 2002 2004 2000 1997 2006 ...
     $ Asset_Class          : chr [1:5064] "Venture Capital" "Venture Capital" "Private Equity" "Private Equity" ...
     $ Region_Focus         : chr [1:5064] "North America" "North America" "North America" "Europe" ...

I would like to build a summary statistics table in LaTeX format. The data should be aggregated by the following vertical/horizontal groups and subgroups:
- Vertical dimensions:
    - Number of funds
    - Fund size ($mn)
    - IRR (%)
    - Multiple (x)
- Vertical subdimensions (for each variable):
    - Median
    - Mean
    - Min
    - Max
    - Standard deviation
- Horizontal panels (data to divide into subsamples):
    - Whole sample
    - Private Equity
    - Private Debt
    - Real Estate
    - Infrastructure
- Horizontal subdimensions (for each panel):
    - Regional focus
        - North America
        - Europe
        - Other (all other regions combined)
    - Fund size
        - < $100 mn
        - $100 mn to $500 mn
        - $500 mn to $1 bn
        - More than $1 bn
    - Fund sequence
        - 1
        - 2-3
        - 4-5
        - More than 5
    - #Funds
    - Age
    - HHI Industry
    - HHI Region
    - Regional focus
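
The grouping rules in the list above can be turned into ordered factor columns before any table is built. The sketch below is a minimal base R illustration, not the asker's code: `df_toy` is a made-up six-row data frame that only borrows column names from `df_All`, it assumes `Fund_Size` is measured in USD millions, and it assumes a fund of exactly $100 mn, $500 mn or $1 bn falls into the higher bucket (`right = FALSE`). The helper `summ_stats()` is hypothetical and simply computes the five vertical subdimensions for one variable.

```r
# Minimal sketch (assumptions: toy data, Fund_Size in USD mn, boundary values
# go to the higher size bucket). df_toy and summ_stats() are hypothetical.
df_toy <- data.frame(
  Fund_Size     = c(50, 100, 423, 750, 1000, 2600),
  Fund_Sequence = c(1, 2, 3, 4, 6, 9),
  Region_Focus  = c("North America", "Europe", "Asia",
                    "North America", "Latin America", "Europe"),
  Net_IRR       = c(15.9, 1.75, 46, 20, 18.4, 9.1),
  stringsAsFactors = FALSE
)

# Regional focus: keep North America and Europe, pool all other regions
df_toy$Region_Group <- factor(
  ifelse(df_toy$Region_Focus %in% c("North America", "Europe"),
         df_toy$Region_Focus, "Other"),
  levels = c("North America", "Europe", "Other")
)

# Fund size buckets (USD mn): [-Inf,100), [100,500), [500,1000), [1000,Inf)
df_toy$Size_Group <- cut(
  df_toy$Fund_Size,
  breaks = c(-Inf, 100, 500, 1000, Inf),
  labels = c("Under 100", "100 to 500", "500 to 1,000", "1,000 and above"),
  right  = FALSE
)

# Fund sequence buckets: (-Inf,1], (1,3], (3,5], (5,Inf) -> 1, 2-3, 4-5, >5
df_toy$Seq_Group <- cut(
  df_toy$Fund_Sequence,
  breaks = c(-Inf, 1, 3, 5, Inf),
  labels = c("1", "2-3", "4-5", "Over 5")
)

# The five vertical subdimensions for one numeric vector
summ_stats <- function(x) {
  c(Median = median(x), Mean = mean(x), Min = min(x),
    Max = max(x), SD = sd(x))
}

# IRR statistics per regional-focus group (one row per group)
irr_by_region <- t(sapply(split(df_toy$Net_IRR, df_toy$Region_Group), summ_stats))
print(round(irr_by_region, 2))
```

Because `factor()` and `cut()` fix the level order, the groups keep the intended order in whatever table function consumes them later.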
I hope my issue is clear and would highly appreciate any help!
The intended output is something along these lines:
Answer 1
Score: 1
I recommend `datasummary` from the `modelsummary` package. Its formula interface makes it easy to split the table vertically and horizontally: the left-hand side of `~` defines the rows, the right-hand side defines the columns, `+` stacks terms and `*` nests one term inside another. If you need more help, please provide a minimal example of your data; it is much easier to replicate what you are asking for when the data are available.

    library(modelsummary)
    data(mtcars)
    datasummary(factor(cyl) * (mpg + drat) ~ factor(vs) * (Mean + Min + Max + SD),
                data = mtcars)

![datasummary output for the mtcars example][1]

[1]: https://i.stack.imgur.com/uNTey.png
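
Applied to the layout in the question, the same formula approach might look like the sketch below. It assumes the `modelsummary` package is installed and uses simulated data (`df_toy`), not the asker's `df_All`. The bucket thresholds, pooling all other regions into "Other", and keeping Venture Capital funds only in the whole sample are assumptions. The whole-sample panel is produced by stacking the full data on top of the asset-class subsamples, which is only one of several ways to get an overall panel. The summarised variables are copied to underscore-free names and the bucket labels avoid `$`, `<` and `>` so the LaTeX output compiles without relying on escaping. The #Funds, Age and HHI rows are left out because the question does not say how they should be grouped.

```r
# Sketch only: simulated data and assumed grouping rules; needs modelsummary.
library(modelsummary)

set.seed(1)
n <- 1000
df_toy <- data.frame(  # hypothetical stand-in for df_All
  Asset_Class   = sample(c("Private Equity", "Private Debt", "Real Estate",
                           "Infrastructure", "Venture Capital"), n, replace = TRUE),
  Region_Focus  = sample(c("North America", "Europe", "Asia"), n, replace = TRUE),
  Fund_Size     = rlnorm(n, meanlog = 5, sdlog = 1.2),  # assumed USD mn
  Fund_Sequence = sample(1:8, n, replace = TRUE),
  Net_IRR       = rnorm(n, mean = 12, sd = 15),
  Net_Multiple  = rlnorm(n, meanlog = 0.4, sdlog = 0.3),
  stringsAsFactors = FALSE
)

# Horizontal subdimensions as ordered factors (assumed thresholds)
df_toy$Region <- factor(
  ifelse(df_toy$Region_Focus %in% c("North America", "Europe"),
         df_toy$Region_Focus, "Other"),
  levels = c("North America", "Europe", "Other"))
df_toy$Size <- cut(df_toy$Fund_Size, breaks = c(-Inf, 100, 500, 1000, Inf),
                   labels = c("Under 100", "100 to 500", "500 to 1,000",
                              "1,000 and above"),
                   right = FALSE)
df_toy$Sequence <- cut(df_toy$Fund_Sequence, breaks = c(-Inf, 1, 3, 5, Inf),
                       labels = c("1", "2-3", "4-5", "Over 5"))

# Underscore-free copies of the summarised variables (LaTeX-safe headers)
df_toy$FundSize <- df_toy$Fund_Size
df_toy$IRR      <- df_toy$Net_IRR
df_toy$Multiple <- df_toy$Net_Multiple

# Horizontal panels: whole sample stacked on top of the asset-class subsamples
panels <- c("Whole sample", "Private Equity", "Private Debt",
            "Real Estate", "Infrastructure")
sub <- df_toy[df_toy$Asset_Class %in% panels[-1], ]
panel_data <- rbind(transform(df_toy, Panel = "Whole sample"),
                    transform(sub, Panel = Asset_Class))
panel_data$Panel <- factor(panel_data$Panel, levels = panels)

# Rows: panel x subgroup; columns: fund count, then each variable x statistic
f <- Panel * (Region + Size + Sequence) ~
  N + (FundSize + IRR + Multiple) * (Median + Mean + Min + Max + SD)

tab_df <- datasummary(f, data = panel_data, output = "data.frame")
datasummary(f, data = panel_data, output = "latex")
```

Passing a file name such as `output = "table.tex"` instead of `"latex"` writes the table to disk, which can then be `\input{}` into the LaTeX document.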