'undefined columns selected'-error in R when expanding rows based on column values (not missing comma, not faulty heading)
Question
I have a data set titled 'google_removal' with 5 columns and 14,468 rows. I am trying to expand the number of rows based on a column whose values specify the frequency of the observation in each row.
However, I keep getting the 'undefined columns selected' error message. As far as I can see it is not a problem that stems from forgetting to add a comma and I have checked that the column in question has the particular heading I am using.
Frequency is specified in the column 'total', which is numeric. The other variables are character. I am very much a beginner, so apologies if the question is poorly structured.
Data looks like this. Happy to provide more if needed.
'period_ending' 'country' 'code' 'product' 'reason' 'total' 'year'
2011-06-30 Argentina AR Blogger Defamation 4 2011
2011-06-30 Argentina AR Blogger Privacy 1 2011
2011-06-30 Argentina AR GoogleAd Defamation 1 2011
2011-06-30 Argentina AR WebSearch Defamation 6 2011
I was trying to achieve the following:
'period_ending' 'country' 'code' 'product' 'reason' 'year'
2011-06-30 Argentina AR Blogger Defamation 2011
2011-06-30 Argentina AR Blogger Defamation 2011
2011-06-30 Argentina AR Blogger Defamation 2011
2011-06-30 Argentina AR Blogger Defamation 2011
2011-06-30 Argentina AR Blogger Privacy 2011
2011-06-30 Argentina AR GoogleAd Defamation 2011
2011-06-30 Argentina AR WebSearch Defamation 2011
2011-06-30 Argentina AR WebSearch Defamation 2011
2011-06-30 Argentina AR WebSearch Defamation 2011
2011-06-30 Argentina AR WebSearch Defamation 2011
2011-06-30 Argentina AR WebSearch Defamation 2011
2011-06-30 Argentina AR WebSearch Defamation 2011
I have tried the following ('total' is the heading of the column I want to use for expanding the number of rows) adapting it from an answer to a question on expanding rows.
google_removal_expanded <- google_removal[seq_len(nrow(google_removal), google_removal$total), 1:13000]
google_removal_expanded <- google_removal[rep(row.names(google_removal), google_removal$total), 1:13000]
I was expecting either of these to return the expanded data frame but I instead received variations of
> "Error in `[.data.frame`(google_removal, rep(row.names("total"),
> google_removal$total), : undefined columns selected" for both.
Grateful for any assistance. My attempt to solve the problem adapted the answers in
https://stackoverflow.com/questions/2894775/repeat-each-row-of-data-frame-the-number-of-times-specified-in-a-column
Here is some data to try it with:
dput(head(google_removal))
structure(list(period_ending = c("2011-06-30", "2011-06-30",
"2011-06-30", "2011-06-30", "2011-06-30", "2011-06-30"), country = c("Argentina",
"Argentina", "Argentina", "Argentina", "Argentina", "Argentina"
), code = c("AR", "AR", "AR", "AR", "AR", "AR"), product = c("Blogger",
"Blogger", "Google Ads", "Web Search", "Web Search", "Web Search"
), reason = c("Defamation", "Privacy and Security", "Defamation",
"Defamation", "Other", "Privacy and Security"), total = c(4L,
1L, 1L, 6L, 1L, 8L), year = c("2011", "2011", "2011", "2011",
"2011", "2011")), row.names = c(NA, 6L), class = "data.frame")
Answer 1
Score: 1
There is a `tidyr` verb, `uncount()`, for expanding frequency data such as yours:
library(tidyr)
google_removal %>%
uncount(total)
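As a rough, self-contained sketch of what that call does, the snippet below runs `uncount()` on a small stand-in data frame. The values are copied from the sample in the question, but the shortened column set and the name `expanded` are assumptions made for illustration. By default `uncount()` drops the weights column, which matches the desired output in the question.

```r
library(tidyr)

# Small stand-in for the real data (values taken from the question's sample).
google_removal <- data.frame(
  product = c("Blogger", "Blogger", "Google Ads", "Web Search"),
  reason  = c("Defamation", "Privacy and Security", "Defamation", "Defamation"),
  total   = c(4L, 1L, 1L, 6L),
  year    = c("2011", "2011", "2011", "2011")
)

# Repeat each row `total` times; the `total` column is removed by default.
expanded <- uncount(google_removal, total)

nrow(expanded)   # one row per counted observation
names(expanded)  # `total` is no longer among the columns
```

If the count column should be kept, `uncount()` accepts `.remove = FALSE`.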
Answer 2
Score: 0
The error comes from the `1:13000` part of your expression. The second index inside `[ , ]` selects columns, and your data frame has only 7 columns (6 once `total` is removed), so asking for columns 1 to 13000 fails. If your data frame is:
google_removal <- data.frame(
period_ending = c("2011-06-30", "2011-06-30","2011-06-30", "2011-06-30", "2011-06-30", "2011-06-30"),
country = c("Argentina", "Argentina", "Argentina", "Argentina", "Argentina", "Argentina"),
code = c("AR", "AR", "AR", "AR", "AR", "AR"),
product = c("Blogger","Blogger", "Google Ads", "Web Search", "Web Search", "Web Search"),
reason = c("Defamation", "Privacy and Security", "Defamation", "Defamation", "Other", "Privacy and Security"),
total = c(4L, 1L, 1L, 6L, 1L, 8L),
year = c("2011", "2011", "2011", "2011", "2011", "2011"))
Then this:
google_removal_expanded <- google_removal[rep(row.names(google_removal), google_removal$total), 1:7]
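This gives you the correct result: each row is repeated as many times as its `total` value says. The `total` column itself is still present because `1:7` selects all seven columns.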
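To make the cause of the error and the fix concrete, here is a minimal base R sketch under stated assumptions: the data frame is a shortened stand-in built from the question's sample values, and the names `err`, `idx` and `keep` are illustrative only. It first reproduces the error by requesting columns that do not exist, then expands the rows and drops `total` by name instead of by position.

```r
# Small stand-in for the real data (values taken from the question's sample).
google_removal <- data.frame(
  product = c("Blogger", "Blogger", "Google Ads", "Web Search"),
  reason  = c("Defamation", "Privacy and Security", "Defamation", "Defamation"),
  total   = c(4L, 1L, 1L, 6L),
  year    = c("2011", "2011", "2011", "2011")
)

# Requesting columns 1:13000 from a 4-column data frame raises the error;
# tryCatch() captures the message so the script keeps running.
err <- tryCatch(
  google_removal[rep(seq_len(nrow(google_removal)), google_removal$total), 1:13000],
  error = function(e) conditionMessage(e)
)
print(err)

# Repeat each row index `total` times, and keep every column except `total`.
idx  <- rep(seq_len(nrow(google_removal)), times = google_removal$total)
keep <- setdiff(names(google_removal), "total")
google_removal_expanded <- google_removal[idx, keep]

# Repeated rows get row names such as "1.1" and "1.2"; reset them to 1..n.
rownames(google_removal_expanded) <- NULL

nrow(google_removal_expanded)
```

Selecting columns by name rather than with a range like `1:7` keeps the code working if columns are later added or reordered.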