'undefined columns selected'-error in R when expanding rows based on column values (not missing comma, not faulty heading)


Question



I have a data set called 'google_removal' with 5 columns and 14,468 rows. I am trying to expand the number of rows based on a column whose values give the frequency of the observation in each row.
However, I keep getting the 'undefined columns selected' error message. As far as I can see, it does not stem from a forgotten comma, and I have checked that the column in question has the exact heading I am using.

The frequency is specified in the column 'total', which is numeric. The other variables are character. I am very much a beginner, so apologies if the question is poorly structured.

Data looks like this. Happy to provide more if needed. 

    'period ending'  'country' 'code' 'product'  'reason' 'total' 'year'
    2011-06-30       Argentina   AR    Blogger   Defamation 4      2011
    2011-06-30       Argentina   AR    Blogger   Privacy    1      2011 
    2011-06-30       Argentina   AR    GoogleAd  Defamation 1      2011  
    2011-06-30       Argentina   AR    WebSearch Defamation 6      2011 

I was trying to achieve the following: 

    'period_ending'  'country' 'code' 'product'  'reason'        'year'
    2011-06-30       Argentina   AR    Blogger   Defamation       2011
    2011-06-30       Argentina   AR    Blogger   Defamation       2011
    2011-06-30       Argentina   AR    Blogger   Defamation       2011
    2011-06-30       Argentina   AR    Blogger   Defamation       2011
    2011-06-30       Argentina   AR    Blogger   Privacy          2011 
    2011-06-30       Argentina   AR    GoogleAd  Defamation       2011  
    2011-06-30       Argentina   AR    WebSearch Defamation       2011 
    2011-06-30       Argentina   AR    WebSearch Defamation       2011 
    2011-06-30       Argentina   AR    WebSearch Defamation       2011 
    2011-06-30       Argentina   AR    WebSearch Defamation       2011 
    2011-06-30       Argentina   AR    WebSearch Defamation       2011 
    2011-06-30       Argentina   AR    WebSearch Defamation       2011 

I have tried the following ('total' is the heading of the column I want to use for expanding the number of rows), adapted from an answer to a question on expanding rows.

    google_removal_expanded <- google_removal[seq_len(nrow(google_removal), google_removal$total), 1:13000]

    google_removal_expanded <- google_removal[rep(row.names(google_removal), google_removal$total), 1:13000]

I was expecting either of these to return the expanded data frame, but for both I instead received variations of:

> Error in `[.data.frame`(google_removal, rep(row.names("total"),
> google_removal$total),  :    undefined columns selected

Grateful for any assistance. My attempt to solve the problem adapted the answers in 
https://stackoverflow.com/questions/2894775/repeat-each-row-of-data-frame-the-number-of-times-specified-in-a-column

Here is some data to try it with:

    dput(head(google_removal))

    structure(list(period_ending = c("2011-06-30", "2011-06-30",
    "2011-06-30", "2011-06-30", "2011-06-30", "2011-06-30"), country = c("Argentina",
    "Argentina", "Argentina", "Argentina", "Argentina", "Argentina"
    ), code = c("AR", "AR", "AR", "AR", "AR", "AR"), product = c("Blogger",
    "Blogger", "Google Ads", "Web Search", "Web Search", "Web Search"
    ), reason = c("Defamation", "Privacy and Security", "Defamation",
    "Defamation", "Other", "Privacy and Security"), total = c(4L,
    1L, 1L, 6L, 1L, 8L), year = c("2011", "2011", "2011", "2011",
    "2011", "2011")), row.names = c(NA, 6L), class = "data.frame")

Answer 1

Score: 1


There is a tidyr verb for expanding frequency data such as yours:

    library(tidyr)
    google_removal %>%
      uncount(total)
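
As a quick check of this approach on the six sample rows from the question, the sketch below runs `uncount()` and confirms that each row is repeated `total` times. By default `uncount()` drops the weights column (`.remove = TRUE`), which matches the desired output. The object name `expanded` is only illustrative.

```r
library(tidyr)

# Sample rows taken from the question's dput() output
google_removal <- data.frame(
  period_ending = rep("2011-06-30", 6),
  country = rep("Argentina", 6),
  code = rep("AR", 6),
  product = c("Blogger", "Blogger", "Google Ads",
              "Web Search", "Web Search", "Web Search"),
  reason = c("Defamation", "Privacy and Security", "Defamation",
             "Defamation", "Other", "Privacy and Security"),
  total = c(4L, 1L, 1L, 6L, 1L, 8L),
  year = rep("2011", 6)
)

# Repeat each row `total` times; the `total` column is dropped by default
expanded <- uncount(google_removal, total)

nrow(expanded)   # one row per counted observation, i.e. sum(google_removal$total)
names(expanded)  # the original columns without `total`
```

If the count column should be kept, `uncount(google_removal, total, .remove = FALSE)` leaves it in place.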

Answer 2

Score: 0


The issue is the `1:13000` part of your expression: it sits in the column position of `[`, but your data frame has only 7 columns (6 without `total`), so R reports undefined columns. If your data frame is:

    google_removal <- data.frame(
      period_ending = c("2011-06-30", "2011-06-30", "2011-06-30", "2011-06-30", "2011-06-30", "2011-06-30"),
      country = c("Argentina", "Argentina", "Argentina", "Argentina", "Argentina", "Argentina"),
      code = c("AR", "AR", "AR", "AR", "AR", "AR"),
      product = c("Blogger", "Blogger", "Google Ads", "Web Search", "Web Search", "Web Search"),
      reason = c("Defamation", "Privacy and Security", "Defamation", "Defamation", "Other", "Privacy and Security"),
      total = c(4L, 1L, 1L, 6L, 1L, 8L),
      year = c("2011", "2011", "2011", "2011", "2011", "2011"))

Then this:

    google_removal_expanded <- google_removal[rep(row.names(google_removal), google_removal$total), 1:7]

will give you the correct result, with each row repeated as many times as its `total` value says.
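
To see why the column index matters, here is a small base R sketch on the same six sample rows. It reproduces the error with `1:13000`, then expands the rows while selecting columns by name, so that `total` is dropped as in the desired output. The names `err`, `idx`, and `keep` are illustrative only.

```r
# Sample rows taken from the question's dput() output
google_removal <- data.frame(
  period_ending = rep("2011-06-30", 6),
  country = rep("Argentina", 6),
  code = rep("AR", 6),
  product = c("Blogger", "Blogger", "Google Ads",
              "Web Search", "Web Search", "Web Search"),
  reason = c("Defamation", "Privacy and Security", "Defamation",
             "Defamation", "Other", "Privacy and Security"),
  total = c(4L, 1L, 1L, 6L, 1L, 8L),
  year = rep("2011", 6)
)

# Asking for columns 1:13000 of a 7-column data frame triggers the error;
# tryCatch() captures the message instead of stopping the script
err <- tryCatch(
  google_removal[rep(row.names(google_removal), google_removal$total), 1:13000],
  error = function(e) conditionMessage(e)
)
print(err)

# Build the row index: row i appears total[i] times
idx <- rep(seq_len(nrow(google_removal)), times = google_removal$total)

# Select columns by name rather than by a hard-coded numeric range
keep <- setdiff(names(google_removal), "total")
google_removal_expanded <- google_removal[idx, keep]

dim(google_removal_expanded)  # rows = sum(total), columns = all except `total`
```

Selecting columns by name avoids having to know how many columns the data frame has. Note also that `seq_len()` accepts a single length argument, so the first attempt in the question would not work as a row index even with a valid column range.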

huangapple
  • Posted on 2023-06-29 18:13:31
  • Please keep this link when reposting: https://go.coder-hub.com/76580094.html