R: Create observations from frequency counts
Question
I have frequency counts based on three variables, y, Col1 and Col2, as shown below:
Col1 Col2 y  n
Good Poor 0  0
Good Poor 1  0
Good Rich 1 13
Good Rich 0  8
Bad  Poor 0  8
Bad  Poor 1  0
Bad  Rich 1 15
Bad  Rich 0  5
How do I expand this table so that the dataset has, for each combination of responses in Col1, Col2 and y, the number of rows indicated in column n?
For example, the dataset should have 13 rows of Col1=Good, Col2=Rich, y=1, 8 rows of Col1=Good, Col2=Rich, y=0, and so on.
Answer 1
Score: 1
Use rep to repeat the row names, then subset the data frame with the result.
In the first example below I explicitly create an index i; in the second, a one-liner does the same thing.
In the first example the output duplicates rows (as asked for), and the row names show which rows are duplicates of which. In the second example, setting the row names to NULL recreates them as consecutive numbers starting at 1.
df1 <- "Col1 Col2 y n
Good Poor 0 0
Good Poor 1 0
Good Rich 1 13
Good Rich 0 8
Bad Poor 0 8
Bad Poor 1 0
Bad Rich 1 15
Bad Rich 0 5"
df1 <- read.table(text = df1, header = TRUE)
i <- rep(row.names(df1), df1$n)
df2 <- df1[i, ]
head(df2)
#> Col1 Col2 y n
#> 3 Good Rich 1 13
#> 3.1 Good Rich 1 13
#> 3.2 Good Rich 1 13
#> 3.3 Good Rich 1 13
#> 3.4 Good Rich 1 13
#> 3.5 Good Rich 1 13
df2 <- df1[rep(row.names(df1), df1$n), ]
row.names(df2) <- NULL
head(df2)
#> Col1 Col2 y n
#> 1 Good Rich 1 13
#> 2 Good Rich 1 13
#> 3 Good Rich 1 13
#> 4 Good Rich 1 13
#> 5 Good Rich 1 13
#> 6 Good Rich 1 13
Created on 2023-02-19 with reprex v2.0.2
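As a possible variation on the approach above, the sketch below indexes by row position instead of row name, drops the n column once it has served its purpose, and then checks the result against the original counts. The names idx, expanded and recount are illustrative and not part of the original answer.

```r
# Counts table from the question, rebuilt here so the block is self-contained.
df1 <- read.table(text = "Col1 Col2 y n
Good Poor 0 0
Good Poor 1 0
Good Rich 1 13
Good Rich 0 8
Bad Poor 0 8
Bad Poor 1 0
Bad Rich 1 15
Bad Rich 0 5", header = TRUE)

# Repeat each row position n times; rows with n = 0 drop out entirely.
idx <- rep(seq_len(nrow(df1)), times = df1$n)

# Subset by position and keep only the response columns.
expanded <- df1[idx, c("Col1", "Col2", "y")]
row.names(expanded) <- NULL

# Sanity check 1: one row per counted observation.
nrow(expanded) == sum(df1$n)

# Sanity check 2: counting the expanded rows again should give back
# the non-zero counts from the original table.
recount <- aggregate(n ~ Col1 + Col2 + y,
                     data = transform(expanded, n = 1L),
                     FUN = sum)
recount
```

Combinations with a count of zero do not appear in the expanded data, so recount only contains the four non-zero combinations.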
Answer 2
Score: 1
You could use uncount from the tidyr package (df here is the data frame of counts):
tidyr::uncount(df, n)
Col1 Col2 y
1 Good Rich 1
2 Good Rich 1
3 Good Rich 1
4 Good Rich 1
5 Good Rich 1
6 Good Rich 1
7 Good Rich 1
8 Good Rich 1
9 Good Rich 1
: : : :
: : : :
The question is: why do you need this? You can still analyze the data as it is, in counted form, without expanding it. What if there were millions of counts for each row? It would not be wise to uncount the data.
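To illustrate the point about analyzing the counts directly, here is a minimal sketch using base R only. It assumes the counts are stored in a data frame called df1 with the columns from the question; tab, p_weighted and p_expanded are illustrative names.

```r
# Counts table from the question, rebuilt here so the block is self-contained.
df1 <- read.table(text = "Col1 Col2 y n
Good Poor 0 0
Good Poor 1 0
Good Rich 1 13
Good Rich 0 8
Bad Poor 0 8
Bad Poor 1 0
Bad Rich 1 15
Bad Rich 0 5", header = TRUE)

# Three-way contingency table built straight from the count column.
tab <- xtabs(n ~ Col1 + Col2 + y, data = df1)
ftable(tab)

# Overall proportion of y = 1, weighting each row by its count.
p_weighted <- weighted.mean(df1$y, w = df1$n)

# The same proportion computed from the expanded data, for comparison only.
expanded <- df1[rep(seq_len(nrow(df1)), df1$n), ]
p_expanded <- mean(expanded$y)

all.equal(p_weighted, p_expanded)
```

Many modelling functions, such as glm, also take a weights argument that can be given the count column, which avoids building the expanded data set at all.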
Answer 3
Score: 1
Here is an alternative using the expandRows function from the splitstackshape package:
library(splitstackshape)
expandRows(df, "n")
Col1 Col2 y
3 Good Rich 1
3.1 Good Rich 1
3.2 Good Rich 1
3.3 Good Rich 1
3.4 Good Rich 1
3.5 Good Rich 1
3.6 Good Rich 1
3.7 Good Rich 1
3.8 Good Rich 1
....