July 3, 2023, 21:33:44

Summary statistics table in R with multiple vertical variables, subvariables, panels, subpanels, etc

Question

Apologies for the rather general question. I have tried stargazer and other packages, but I still cannot work out how to build an advanced summary statistics table in R. I am working with the following dataset:

> str(df_All)
tibble [5,064 × 29] (S3: tbl_df/tbl/data.frame)
$ Net_IRR              : num [1:5064] 15.9 1.75 46 20 18.4 ...
$ Age                  : num [1:5064] 1 1 1 1 1 1 1 1 1 1 ...
$ Ln_Age               : num [1:5064] 0 0 0 0 0 0 0 0 0 0 ...
$ Fund_Sequence        : num [1:5064] 1 1 1 1 1 1 1 1 1 1 ...
$ Ln_Fund_Sequence     : num [1:5064] 0 0 0 0 0 0 0 0 0 0 ...
$ Fund_Size            : num [1:5064] 50 46 423 96.9 81 ...
$ Ln_Fund_Size         : num [1:5064] 3.91 3.83 6.05 4.57 4.39 ...
$ Nr_Funds             : num [1:5064] 1 1 1 1 1 1 1 1 1 1 ...
$ HHI_Industry         : num [1:5064] 0.427 0.243 0.36 0.333 1 ...
$ HHI_Region           : num [1:5064] 1 1 1 1 1 ...
$ Stock_Market_Returns : num [1:5064] 0.11936 -0.00711 -0.00643 -0.03869 -0.01931 ...
$ GDP_Growth           : num [1:5064] 0.0284 0.0245 0.0261 0.0304 0.0104 ...
$ Net_Multiple         : num [1:5064] 3.3 1.09 4.04 2.73 1.95 ...
$ Ln_Fund_Size^2       : num [1:5064] 15.3 14.7 36.6 20.9 19.3 ...
$ Size_Q1              : num [1:5064] 41.5 42.5 123.8 109.8 85.5 ...
$ Size_Q2              : num [1:5064] 125.8 92.8 325.5 232 177.3 ...
$ Size_Q3              : num [1:5064] 211 206 756 624 302 ...
$ Size_Q4              : num [1:5064] 1000 1500 6114 5887 2600 ...
$ Size_Spline_1        : num [1:5064] 0 0 0 1 1 1 0 1 0 0 ...
$ Size_Spline_2        : num [1:5064] 1 1 0 0 0 0 0 0 1 0 ...
$ Size_Spline_3        : num [1:5064] 0 0 1 0 0 0 1 0 0 1 ...
$ Size_Spline_4        : num [1:5064] 0 0 0 0 0 0 0 0 0 0 ...
$ Dummy_First_Time_Fund: num [1:5064] 1 1 1 1 1 1 1 1 1 1 ...
$ Dummy_Industry       : num [1:5064] 1 0 0 0 1 0 0 1 1 0 ...
$ Dummy_Region         : num [1:5064] 1 1 1 1 1 1 1 1 1 1 ...
$ Fund_ID              : num [1:5064] 8360 3491 5576 48689 6016 ...
$ Vintage_Year         : num [1:5064] 2002 2004 2000 1997 2006 ...
$ Asset_Class          : chr [1:5064] "Venture Capital" "Venture Capital" "Private Equity" "Private Equity" ...
$ Region_Focus         : chr [1:5064] "North America" "North America" "North America" "Europe" ...

I would like to build a summary statistics table in LaTeX format. The data should be aggregated by the following vertical and horizontal groups and subgroups:

Vertical dimensions:
- Number of funds
- Fund size ($mn)
- IRR (%)
- Multiple (x)
Vertical subdimensions (for each variable):
- Median
- Mean
- Min
- Max
- Standard Deviation
Horizontal panels (data to divide in sub samples):
- Whole sample
- Private Equity
- Private Debt
- Real Estate
- Infrastructure
Horizontal subdimensions (for each panel):
- Regional focus
  - North America
  - Europe
  - Other (counting all other regions)
- Fund size
  - < $100 mn
  - $100 mn to $500 mn
  - $500 mn to $1 bn
  - more than $1 bn
- Fund sequence
  - 1
  - 2-3
  - 4-5
  - more than 5
- #Funds
- Age
- HHI Industry
- HHI Region

I hope my issue is clear and would highly appreciate any help!

The output I am aiming for is something along these lines:

[image: example of the target table]

Answer 1

Score: 1

I recommend `datasummary` from the `modelsummary` package. It lets you split the table vertically and horizontally with a single formula: terms to the left of `~` define the rows, and terms to the right define the columns. If you need more help, please provide a minimal example of your data; it is much easier to reproduce what you are asking for when the data is available.

    library(modelsummary)
    data(mtcars)
    # Rows: mpg and drat nested within each level of cyl.
    # Columns: Mean, Min, Max and SD nested within each level of vs.
    datasummary(factor(cyl) * (mpg + drat) ~ factor(vs) * (Mean + Min + Max + SD),
                data = mtcars)

[![enter image description here][1]][1]

[1]: https://i.stack.imgur.com/uNTey.png
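To connect this to the layout in the question, here is a minimal sketch of how the horizontal subgroups (region, fund size, fund sequence) could be derived as factors before being passed to `datasummary`. The data frame `df` is a small invented stand-in for `df_All`, and the column names `Region_Group`, `Size_Group` and `Sequence_Group` are hypothetical. The treatment of boundary values (for example, whether a fund of exactly $100 mn belongs to the first or second bucket) is an assumption, since the question does not specify it.

```r
# Toy stand-in for df_All; the values are invented for illustration only.
df <- data.frame(
  Net_IRR       = c(15.9, 1.75, 46, 20, 18.4, 7.2, 12.1, -3.5),
  Net_Multiple  = c(3.3, 1.09, 4.04, 2.73, 1.95, 1.4, 1.8, 0.9),
  Fund_Size     = c(50, 46, 423, 96.9, 81, 750, 1500, 100),
  Fund_Sequence = c(1, 1, 2, 3, 4, 5, 6, 9),
  Region_Focus  = c("North America", "North America", "Europe", "Europe",
                    "Asia", "North America", "Europe", "Africa"),
  stringsAsFactors = FALSE
)

# Collapse every region other than North America and Europe into "Other".
df$Region_Group <- factor(
  ifelse(df$Region_Focus %in% c("North America", "Europe"),
         df$Region_Focus, "Other"),
  levels = c("North America", "Europe", "Other")
)

# Size buckets in $mn. right = FALSE gives left-closed intervals such as
# [100, 500), so a fund of exactly 100 lands in the second bucket (assumption).
df$Size_Group <- cut(
  df$Fund_Size,
  breaks = c(-Inf, 100, 500, 1000, Inf),
  labels = c("< $100 mn", "$100 mn to $500 mn", "$500 mn to $1 bn", "> $1 bn"),
  right = FALSE
)

# Sequence buckets 1, 2-3, 4-5, more than 5. The default right = TRUE gives
# right-closed intervals (0, 1], (1, 3], (3, 5], (5, Inf].
df$Sequence_Group <- cut(
  df$Fund_Sequence,
  breaks = c(0, 1, 3, 5, Inf),
  labels = c("1", "2-3", "4-5", "> 5")
)

# Rows: each variable crossed with its statistics.
# Columns: the three groupings, side by side.
# The call is skipped if modelsummary is not installed.
if (requireNamespace("modelsummary", quietly = TRUE)) {
  library(modelsummary)
  print(datasummary(
    (Fund_Size + Net_IRR + Net_Multiple) * (Median + Mean + Min + Max + SD) ~
      Region_Group + Size_Group + Sequence_Group,
    data = df,
    output = "latex"
  ))
}
```

One way to get the asset-class panels would be to run the same call on the whole sample and then on a subset per `Asset_Class`, combining the resulting LaTeX tables afterwards. This is a design suggestion, not something the answer above prescribes.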
