June 29, 2023 18:13:31

'undefined columns selected'-error in R when expanding rows based on column values (not missing comma, not faulty heading)

Question

I have a data set named 'google_removal' with 5 columns and 14,468 rows. I am trying to expand the number of rows based on a value that specifies the frequency of the observations in each row. However, I keep getting the 'undefined columns selected' error message. As far as I can see, it does not stem from forgetting to add a comma, and I have checked that the column in question has the heading I am using.

The frequency is specified in the column 'total', which is numeric. The other variables are character. I am very much a beginner, so apologies if the question is poorly structured.

The data looks like this. Happy to provide more if needed.

    'period_ending'  'country' 'code' 'product'  'reason' 'total' 'year'
    2011-06-30       Argentina   AR    Blogger   Defamation 4      2011
    2011-06-30       Argentina   AR    Blogger   Privacy    1      2011
    2011-06-30       Argentina   AR    GoogleAd  Defamation 1      2011
    2011-06-30       Argentina   AR    WebSearch Defamation 6      2011

I was trying to achieve the following:

    'period_ending'  'country' 'code' 'product'  'reason'        'year'
    2011-06-30       Argentina   AR    Blogger   Defamation       2011
    2011-06-30       Argentina   AR    Blogger   Defamation       2011
    2011-06-30       Argentina   AR    Blogger   Defamation       2011
    2011-06-30       Argentina   AR    Blogger   Defamation       2011
    2011-06-30       Argentina   AR    Blogger   Privacy          2011
    2011-06-30       Argentina   AR    GoogleAd  Defamation       2011
    2011-06-30       Argentina   AR    WebSearch Defamation       2011
    2011-06-30       Argentina   AR    WebSearch Defamation       2011
    2011-06-30       Argentina   AR    WebSearch Defamation       2011
    2011-06-30       Argentina   AR    WebSearch Defamation       2011
    2011-06-30       Argentina   AR    WebSearch Defamation       2011
    2011-06-30       Argentina   AR    WebSearch Defamation       2011

I have tried the following ('total' is the heading of the column I want to use for expanding the number of rows), adapting it from an answer to a question on expanding rows.

    google_removal_expanded <- google_removal[seq_len(nrow(google_removal), google_removal$total), 1:13000]

    google_removal_expanded <- google_removal[rep(row.names(google_removal), google_removal$total), 1:13000]

I was expecting either of these to return the expanded data frame, but both instead produced variations of:

> "Error in `[.data.frame`(google_removal, rep(row.names("total"),
> google_removal$total),  :    undefined columns selected"

Grateful for any assistance. My attempt to solve the problem adapted the answers in
https://stackoverflow.com/questions/2894775/repeat-each-row-of-data-frame-the-number-of-times-specified-in-a-column

Here is some data to try it with:

    dput(head(google_removal))

    structure(list(period_ending = c("2011-06-30", "2011-06-30",
    "2011-06-30", "2011-06-30", "2011-06-30", "2011-06-30"), country = c("Argentina",
    "Argentina", "Argentina", "Argentina", "Argentina", "Argentina"
    ), code = c("AR", "AR", "AR", "AR", "AR", "AR"), product = c("Blogger",
    "Blogger", "Google Ads", "Web Search", "Web Search", "Web Search"
    ), reason = c("Defamation", "Privacy and Security", "Defamation",
    "Defamation", "Other", "Privacy and Security"), total = c(4L,
    1L, 1L, 6L, 1L, 8L), year = c("2011", "2011", "2011", "2011",
    "2011", "2011")), row.names = c(NA, 6L), class = "data.frame")

Answer 1

Score: 1

There is a tidyr verb for expanding frequency data such as yours:

    library(tidyr)
    # uncount() repeats each row as many times as its value in total
    google_removal %>%
      uncount(total)
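
As a minimal sketch of the uncount() approach, the block below runs it on a trimmed, three-column version of the four rows shown in the question's table. The trimmed data frame and the name `expanded` are assumptions made for illustration; the tidyr package must be installed.

```r
library(tidyr)

# Trimmed illustration data: the four rows from the question's table,
# keeping only three of the columns
google_removal <- data.frame(
  product = c("Blogger", "Blogger", "Google Ads", "Web Search"),
  reason  = c("Defamation", "Privacy and Security", "Defamation", "Defamation"),
  total   = c(4L, 1L, 1L, 6L)
)

# Repeat each row as many times as its value in total.
# With the default .remove = TRUE, the total column is dropped afterwards.
expanded <- uncount(google_removal, total)

nrow(expanded)   # equals sum(google_removal$total)
names(expanded)  # total is no longer among the columns
```

Because the weights column is removed by default, the result matches the layout the question asks for, without a separate step to drop total.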

Answer 2

Score: 0

The issue is caused by the 1:13000 part of your expression. That position selects columns, and your data frame has only 7 columns (6 once total is removed), so asking for columns 1 through 13000 triggers 'undefined columns selected'. If your data frame is:

    google_removal <- data.frame(
      period_ending = c("2011-06-30", "2011-06-30", "2011-06-30", "2011-06-30", "2011-06-30", "2011-06-30"),
      country = c("Argentina", "Argentina", "Argentina", "Argentina", "Argentina", "Argentina"),
      code = c("AR", "AR", "AR", "AR", "AR", "AR"),
      product = c("Blogger", "Blogger", "Google Ads", "Web Search", "Web Search", "Web Search"),
      reason = c("Defamation", "Privacy and Security", "Defamation", "Defamation", "Other", "Privacy and Security"),
      total = c(4L, 1L, 1L, 6L, 1L, 8L),
      year = c("2011", "2011", "2011", "2011", "2011", "2011"))

then this:

    google_removal_expanded <- google_removal[rep(row.names(google_removal), google_removal$total), 1:7]

will give you the correct result, with each row repeated as many times as its total value says.
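
The following base R sketch illustrates the same diagnosis on the four rows shown in the question's table. It first captures the error raised by the out-of-range column selection, then expands the rows and drops total to match the desired output. The names `err`, `row_idx`, and `expanded` are hypothetical, and the exact wording of the error message may differ in a non-English locale.

```r
# Illustration data: the four rows from the question's table
google_removal <- data.frame(
  period_ending = rep("2011-06-30", 4),
  country = rep("Argentina", 4),
  code = rep("AR", 4),
  product = c("Blogger", "Blogger", "Google Ads", "Web Search"),
  reason = c("Defamation", "Privacy and Security", "Defamation", "Defamation"),
  total = c(4L, 1L, 1L, 6L),
  year = rep("2011", 4)
)

# The second index selects columns; only ncol(google_removal) of them exist,
# so requesting 1:13000 fails. tryCatch() captures the message instead of stopping.
err <- tryCatch(
  google_removal[rep(seq_len(nrow(google_removal)), google_removal$total), 1:13000],
  error = function(e) conditionMessage(e)
)
err

# Build a vector of row positions in which each position appears total times
row_idx <- rep(seq_len(nrow(google_removal)), times = google_removal$total)

# Select the repeated rows and every column except total
expanded <- google_removal[row_idx, names(google_removal) != "total"]

# Replace the duplicated row names (such as "1.1") with a plain sequence
rownames(expanded) <- NULL

nrow(expanded)  # equals sum(google_removal$total)
```

Selecting columns by name rather than by a numeric range avoids hard-coding the column count, so the same line keeps working if columns are added or removed.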
