2023-02-19 14:13:45

R: create observations from frequency counts

Question

I have frequency counts based on three variables, y, Col1 and Col2, as shown below:

     Col1    Col2      y       n
     Good    Poor      0       0
     Good    Poor      1       0
     Good    Rich      1       13
     Good    Rich      0       8
     Bad     Poor      0       8
     Bad     Poor      1       0
     Bad     Rich      1       15
     Bad     Rich      0       5

How do I expand this table so that the dataset has, for each combination of responses in Col1, Col2 and y, the number of rows given in column n?

For example, the dataset should have 13 rows with Col1=Good, Col2=Rich, y=1, 8 rows with Col1=Good, Col2=Rich, y=0, and so on.

Answer 1

Score: 1

Use rep to repeat the row names, each as many times as its count in n, and subset the data frame with the result. Rows with n = 0 are repeated zero times, so they drop out.
In the first example below I explicitly create an index i; in the second, a one-liner does the same thing.

In the first example the output duplicates the rows (as asked for) and the row names show which rows are copies of which. In the second example, setting the row names to NULL recreates them as consecutive numbers starting at 1.

# Read the frequency table from the question
df1 <- "Col1    Col2      y       n
     Good    Poor      0       0
     Good    Poor      1       0
     Good    Rich      1       13
     Good    Rich      0       8
     Bad     Poor      0       8
     Bad     Poor      1       0
     Bad     Rich      1       15
     Bad     Rich      0       5"
df1 <- read.table(text = df1, header = TRUE)

# Example 1: build the index explicitly, then subset
i <- rep(row.names(df1), df1$n)
df2 <- df1[i, ]
head(df2)
#>     Col1 Col2 y  n
#> 3   Good Rich 1 13
#> 3.1 Good Rich 1 13
#> 3.2 Good Rich 1 13
#> 3.3 Good Rich 1 13
#> 3.4 Good Rich 1 13
#> 3.5 Good Rich 1 13

# Example 2: one-liner, then reset the row names to 1, 2, 3, ...
df2 <- df1[rep(row.names(df1), df1$n), ]
row.names(df2) <- NULL
head(df2)
#>   Col1 Col2 y  n
#> 1 Good Rich 1 13
#> 2 Good Rich 1 13
#> 3 Good Rich 1 13
#> 4 Good Rich 1 13
#> 5 Good Rich 1 13
#> 6 Good Rich 1 13

Created on 2023-02-19 with reprex v2.0.2
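
A possible variation on the same idea, offered only as a sketch: index by row position instead of row name, and drop the n column, since it no longer means anything once every row is a single observation. The names idx and expanded are my own and do not come from the answer above.

```r
# Rebuild the question's frequency table.
df1 <- read.table(text = "Col1 Col2 y n
Good Poor 0 0
Good Poor 1 0
Good Rich 1 13
Good Rich 0 8
Bad  Poor 0 8
Bad  Poor 1 0
Bad  Rich 1 15
Bad  Rich 0 5", header = TRUE)

# Repeat each row position n times; positions with n = 0 are repeated
# zero times and therefore disappear.
idx <- rep(seq_len(nrow(df1)), times = df1$n)

# Keep only the three response columns (idx and expanded are
# illustrative names, not part of the original answer).
expanded <- df1[idx, c("Col1", "Col2", "y")]
row.names(expanded) <- NULL

# Sanity check: one row per counted observation.
stopifnot(nrow(expanded) == sum(df1$n))
```

Integer positions avoid the detour through character row names, which can matter if the data frame carries non-default row names.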

Answer 2

Score: 1

You could use uncount from tidyr:

tidyr::uncount(df, n)

   Col1 Col2 y
1  Good Rich 1
2  Good Rich 1
3  Good Rich 1
4  Good Rich 1
5  Good Rich 1
6  Good Rich 1
7  Good Rich 1
8  Good Rich 1
9  Good Rich 1
:   :    :   :
:   :    :   :

The question is: why do you need this? You can still analyze the data as it is, using the counts directly. What if there were millions of counts for each row? Uncounting the data would not be wise in that case.
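
To illustrate the point about analyzing the counts directly, here is a minimal base R sketch. It assumes the goal is a cross-tabulation or a logistic regression of y on Col1; the thread does not say what analysis is actually intended, so that choice is an assumption, as are the names tab_counts, fit_counts, expanded and fit_expanded. The expanded copy is built only to check that both routes agree.

```r
# Rebuild the question's frequency table.
df1 <- read.table(text = "Col1 Col2 y n
Good Poor 0 0
Good Poor 1 0
Good Rich 1 13
Good Rich 0 8
Bad  Poor 0 8
Bad  Poor 1 0
Bad  Rich 1 15
Bad  Rich 0 5", header = TRUE)

# Cross-tabulate straight from the counts: n on the left-hand side of
# the formula is summed within each Col1 x y cell.
tab_counts <- xtabs(n ~ Col1 + y, data = df1)

# Logistic regression on the aggregated rows, using n as frequency weights.
fit_counts <- glm(y ~ Col1, family = binomial, data = df1, weights = n)

# For comparison only: the same analyses on the expanded data.
expanded <- df1[rep(seq_len(nrow(df1)), df1$n), c("Col1", "Col2", "y")]
tab_expanded <- table(expanded$Col1, expanded$y)
fit_expanded <- glm(y ~ Col1, family = binomial, data = expanded)

# Both routes give the same cell counts and the same coefficients.
stopifnot(all(as.vector(tab_counts) == as.vector(tab_expanded)))
stopifnot(isTRUE(all.equal(coef(fit_counts), coef(fit_expanded),
                           tolerance = 1e-6)))
```

The weighted fit works on 8 rows no matter how large the counts are, whereas the expanded data grows with sum(n).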

Answer 3

Score: 1

Here is an alternative using the expandRows function from the splitstackshape package:

library(splitstackshape)

expandRows(df, "n")

     Col1 Col2 y
3    Good Rich 1
3.1  Good Rich 1
3.2  Good Rich 1
3.3  Good Rich 1
3.4  Good Rich 1
3.5  Good Rich 1
3.6  Good Rich 1
3.7  Good Rich 1
3.8  Good Rich 1
....
