'undefined columns selected'-error in R when expanding rows based on column values (not missing comma, not faulty heading)

Question

I have a data set titled 'google_removal' with 5 columns and 14,468 rows. I am trying to expand the number of rows based on a column whose values specify the frequency of the observation in each row. However, I keep getting the 'undefined columns selected' error message. As far as I can see, the problem does not stem from forgetting to add a comma, and I have checked that the column in question has the heading I am using.

The frequency is specified in the column 'total', which is numeric. The other variables are character. I am very much a beginner here, so apologies if the question is poorly structured.

The data look like this (happy to provide more if needed):

    period_ending  country    code  product    reason      total  year
    2011-06-30     Argentina  AR    Blogger    Defamation  4      2011
    2011-06-30     Argentina  AR    Blogger    Privacy     1      2011
    2011-06-30     Argentina  AR    GoogleAd   Defamation  1      2011
    2011-06-30     Argentina  AR    WebSearch  Defamation  6      2011

I was trying to achieve the following:

    period_ending  country    code  product    reason      year
    2011-06-30     Argentina  AR    Blogger    Defamation  2011
    2011-06-30     Argentina  AR    Blogger    Defamation  2011
    2011-06-30     Argentina  AR    Blogger    Defamation  2011
    2011-06-30     Argentina  AR    Blogger    Defamation  2011
    2011-06-30     Argentina  AR    Blogger    Privacy     2011
    2011-06-30     Argentina  AR    GoogleAd   Defamation  2011
    2011-06-30     Argentina  AR    WebSearch  Defamation  2011
    2011-06-30     Argentina  AR    WebSearch  Defamation  2011
    2011-06-30     Argentina  AR    WebSearch  Defamation  2011
    2011-06-30     Argentina  AR    WebSearch  Defamation  2011
    2011-06-30     Argentina  AR    WebSearch  Defamation  2011
    2011-06-30     Argentina  AR    WebSearch  Defamation  2011

I have tried the following ('total' is the heading of the column I want to use for expanding the number of rows), adapting it from an answer to a question on expanding rows:

    google_removal_expanded <- google_removal[seq_len(nrow(google_removal), google_removal$total), 1:13000]
    google_removal_expanded <- google_removal[rep(row.names(google_removal), google_removal$total), 1:13000]

I was expecting either of these to return the expanded data frame, but instead I received variations of the following error for both:

    Error in `[.data.frame`(google_removal, rep(row.names("total"), google_removal$total), : undefined columns selected

Grateful for any assistance. My attempt to solve the problem adapted the answers in
https://stackoverflow.com/questions/2894775/repeat-each-row-of-data-frame-the-number-of-times-specified-in-a-column

Here is some data to try it with:

dput(head(google_removal))

structure(list(period_ending = c("2011-06-30", "2011-06-30",
"2011-06-30", "2011-06-30", "2011-06-30", "2011-06-30"), country = c("Argentina",
"Argentina", "Argentina", "Argentina", "Argentina", "Argentina"
), code = c("AR", "AR", "AR", "AR", "AR", "AR"), product = c("Blogger",
"Blogger", "Google Ads", "Web Search", "Web Search", "Web Search"
), reason = c("Defamation", "Privacy and Security", "Defamation",
"Defamation", "Other", "Privacy and Security"), total = c(4L,
1L, 1L, 6L, 1L, 8L), year = c("2011", "2011", "2011", "2011",
"2011", "2011")), row.names = c(NA, 6L), class = "data.frame")

Answer 1

Score: 1

There is a tidyr verb for expanding frequency data such as yours:

    library(tidyr)
    google_removal %>%
      uncount(total)
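A minimal sketch of uncount() applied to the six sample rows from the question, assuming tidyr is installed. By default uncount() removes the weights column (.remove = TRUE), which matches the desired output where total no longer appears; the .id argument can optionally add a column numbering the copies of each row.

```r
library(tidyr)

# The six sample rows from the question's dput() output
google_removal <- data.frame(
  period_ending = rep("2011-06-30", 6),
  country = rep("Argentina", 6),
  code = rep("AR", 6),
  product = c("Blogger", "Blogger", "Google Ads", "Web Search", "Web Search", "Web Search"),
  reason = c("Defamation", "Privacy and Security", "Defamation",
             "Defamation", "Other", "Privacy and Security"),
  total = c(4L, 1L, 1L, 6L, 1L, 8L),
  year = rep("2011", 6)
)

# Repeat each row 'total' times; the 'total' column itself is dropped by default
google_removal_expanded <- uncount(google_removal, total)

nrow(google_removal_expanded)   # 21, i.e. sum(google_removal$total)
names(google_removal_expanded)  # the original columns without 'total'
```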

Answer 2

Score: 0

The issue you are having is caused by the 1:13000 part of your expression. That part selects columns by position, but your data frame only has 7 columns (6 if you exclude total), so columns 8 to 13000 do not exist and R reports "undefined columns selected". If your data frame is:

    google_removal <- data.frame(
      period_ending = c("2011-06-30", "2011-06-30", "2011-06-30", "2011-06-30", "2011-06-30", "2011-06-30"),
      country = c("Argentina", "Argentina", "Argentina", "Argentina", "Argentina", "Argentina"),
      code = c("AR", "AR", "AR", "AR", "AR", "AR"),
      product = c("Blogger", "Blogger", "Google Ads", "Web Search", "Web Search", "Web Search"),
      reason = c("Defamation", "Privacy and Security", "Defamation", "Defamation", "Other", "Privacy and Security"),
      total = c(4L, 1L, 1L, 6L, 1L, 8L),
      year = c("2011", "2011", "2011", "2011", "2011", "2011"))

Then this:

    google_removal_expanded <- google_removal[rep(row.names(google_removal), google_removal$total), 1:7]

will give you the correct result, with each row repeated as many times as its value in the total column.
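A small sketch, using the same six sample rows, that reproduces the error and then shows one base R variation (not part of this answer) that selects columns by name and drops total, as in the question's desired output. It builds the row index with rep(seq_len(nrow(...)), ...) because the question's first attempt passes two arguments to seq_len(), which accepts only one, so that attempt fails for a separate reason.

```r
# The six sample rows from the question
google_removal <- data.frame(
  period_ending = rep("2011-06-30", 6),
  country = rep("Argentina", 6),
  code = rep("AR", 6),
  product = c("Blogger", "Blogger", "Google Ads", "Web Search", "Web Search", "Web Search"),
  reason = c("Defamation", "Privacy and Security", "Defamation",
             "Defamation", "Other", "Privacy and Security"),
  total = c(4L, 1L, 1L, 6L, 1L, 8L),
  year = rep("2011", 6)
)

# Asking for column positions beyond ncol() (7 here) triggers the error from the question
err <- tryCatch(
  google_removal[rep(row.names(google_removal), google_removal$total), 1:13000],
  error = function(e) conditionMessage(e)
)
print(err)  # "undefined columns selected"

# seq_len() takes a single argument, so the first attempt errors as well
bad <- tryCatch(seq_len(nrow(google_removal), google_removal$total),
                error = function(e) e)
inherits(bad, "error")  # TRUE

# Row index: row i repeated total[i] times
idx <- rep(seq_len(nrow(google_removal)), google_removal$total)

# Select columns by name instead of by position, leaving out 'total'
google_removal_expanded <- google_removal[idx, setdiff(names(google_removal), "total")]
rownames(google_removal_expanded) <- NULL  # replace "1", "1.1", ... with 1..21

nrow(google_removal_expanded)  # 21
```

Selecting by name keeps the call correct even if columns are later added or reordered, whereas a positional range such as 1:7 has to be kept in sync with ncol() by hand.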

huangapple
  • Posted on 29 June 2023, 18:13:31
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76580094.html