R: create observations from frequency counts

Question

I have frequency counts for three variables, y, Col1, and Col2, as shown below:

     Col1    Col2      y       n
     Good    Poor      0       0
     Good    Poor      1       0
     Good    Rich      1       13
     Good    Rich      0       8
     Bad     Poor      0       8
     Bad     Poor      1       0
     Bad     Rich      1       15
     Bad     Rich      0       5

How do I expand this table so that each combination of Col1, Col2, and y appears in the dataset as many times as column n indicates?

For example, the dataset should have 13 rows with Col1=Good, Col2=Rich, y=1, 8 rows with Col1=Good, Col2=Rich, y=0, and so on.

Answer 1

Score: 1

Use rep to repeat each row name n times, then subset the data frame with the result.
In the first example below I explicitly create an index i; in the second, a one-liner does the same thing.

In the first example the output duplicates rows (as asked for), and the row names show which original row each copy came from. In the second example, setting the row names to NULL resets them to consecutive numbers starting at 1.

df1 <- "Col1    Col2      y       n
     Good    Poor      0       0
     Good    Poor      1       0
     Good    Rich      1       13
     Good    Rich      0       8
     Bad     Poor      0       8
     Bad     Poor      1       0
     Bad     Rich      1       15
     Bad     Rich      0       5"
df1 <- read.table(text = df1, header = TRUE)

i <- rep(row.names(df1), df1$n)
df2 <- df1[i, ]
head(df2)
#>     Col1 Col2 y  n
#> 3   Good Rich 1 13
#> 3.1 Good Rich 1 13
#> 3.2 Good Rich 1 13
#> 3.3 Good Rich 1 13
#> 3.4 Good Rich 1 13
#> 3.5 Good Rich 1 13

df2 <- df1[rep(row.names(df1), df1$n), ]
row.names(df2) <- NULL
head(df2)
#>   Col1 Col2 y  n
#> 1 Good Rich 1 13
#> 2 Good Rich 1 13
#> 3 Good Rich 1 13
#> 4 Good Rich 1 13
#> 5 Good Rich 1 13
#> 6 Good Rich 1 13

Created on 2023-02-19 with reprex v2.0.2
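
As a quick check on the rep approach, the sketch below expands the table with an integer index instead of row names, drops the now-redundant n column, and then counts the rows again to confirm the original frequencies come back. The names idx, expanded, recount, and check are illustrative only and do not come from the answer.

```r
# Frequency table from the question.
df1 <- read.table(text = "
Col1 Col2 y n
Good Poor 0 0
Good Poor 1 0
Good Rich 1 13
Good Rich 0 8
Bad  Poor 0 8
Bad  Poor 1 0
Bad  Rich 1 15
Bad  Rich 0 5", header = TRUE)

# Repeat each row position n times; rows with n = 0 simply drop out.
idx <- rep(seq_len(nrow(df1)), times = df1$n)

# Keep only the response columns, since n carries no information per row now.
expanded <- df1[idx, c("Col1", "Col2", "y")]
row.names(expanded) <- NULL

# The expanded data should have one row per counted observation.
nrow(expanded) == sum(df1$n)

# Round trip: count the expanded rows and compare with the original n.
recount <- aggregate(n ~ Col1 + Col2 + y,
                     data = transform(expanded, n = 1L), FUN = sum)
check <- merge(recount, df1, by = c("Col1", "Col2", "y"))
all(check$n.x == check$n.y)
```

Combinations with n = 0 do not appear in recount, so the comparison covers only the five combinations that were actually observed.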

Answer 2

Score: 1

You could use uncount from tidyr (df here is the frequency table from the question):

tidyr::uncount(df, n)

   Col1 Col2 y
1  Good Rich 1
2  Good Rich 1
3  Good Rich 1
4  Good Rich 1
5  Good Rich 1
6  Good Rich 1
7  Good Rich 1
8  Good Rich 1
9  Good Rich 1
:   :    :   :
:   :    :   :

The question is why you need this. You can still analyze the data as it is, without expanding the counts. What if there were millions of counts for each row? Uncounting the data would not be wise in that case.
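
To illustrate the point about analyzing the counts directly, here is a minimal base R sketch. It assumes the analysis of interest is a cross-tabulation or a logistic regression of y on Col1; neither is stated in the question, so treat both as examples only. xtabs takes the counts on the left-hand side of the formula, and glm accepts them through its weights argument.

```r
# Frequency table from the question.
df1 <- read.table(text = "
Col1 Col2 y n
Good Poor 0 0
Good Poor 1 0
Good Rich 1 13
Good Rich 0 8
Bad  Poor 0 8
Bad  Poor 1 0
Bad  Rich 1 15
Bad  Rich 0 5", header = TRUE)

# Expanded version, built only so the two routes can be compared.
expanded <- df1[rep(seq_len(nrow(df1)), times = df1$n), ]

# Cross-tabulation straight from the counts vs. from the expanded rows.
tab_counts   <- xtabs(n ~ Col1 + y, data = df1)
tab_expanded <- table(expanded$Col1, expanded$y)
all(as.vector(tab_counts) == as.vector(tab_expanded))

# Logistic regression: counts as weights vs. one row per observation.
fit_counts   <- glm(y ~ Col1, family = binomial, data = df1, weights = n)
fit_expanded <- glm(y ~ Col1, family = binomial, data = expanded)
all.equal(coef(fit_counts), coef(fit_expanded), tolerance = 1e-6)
```

The weighted fit works on 8 rows regardless of how large the counts are, which is why expansion is often unnecessary.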

Answer 3

Score: 1

Here is an alternative using the expandRows function from the splitstackshape package:

library(splitstackshape)

expandRows(df, "n")
     Col1 Col2 y
3    Good Rich 1
3.1  Good Rich 1
3.2  Good Rich 1
3.3  Good Rich 1
3.4  Good Rich 1
3.5  Good Rich 1
3.6  Good Rich 1
3.7  Good Rich 1
3.8  Good Rich 1
....

huangapple
  • Posted on 2023-02-19 14:13:45
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75498343.html