2023年2月7日 04:23:55go评论65阅读模式

英文:

Count occurrences across multiple columns and group by year

问题

我有一个电影数据集，其中有一列表示年份，另外三列表示不同的电影类型。

以下是您提供的示例数据：

genre_structure<-structure(
  list(
    year = c(
      "2008",
      "2003",
      "2010",
      "2001",
      "2002",
      "1999",
      "1980",
      "2020",
      "1977",
      "1991",
      "1954",
      "2022",
      "1962",
      "2000",
      "1994",
      "2019",
      "2019",
      "1981",
      "2012",
      "2003"
    ),
    genre1 = c(
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action"
    ),
    genre2 = c(
      "Crime",
      "Adventure",
      "Adventure",
      "Adventure",
      "Adventure",
      "SciFi",
      "Adventure",
      "Drama",
      "Adventure",
      "SciFi",
      "Drama",
      "Drama",
      "Drama",
      "Adventure",
      "Crime",
      "Adventure",
      "Adventure",
      "Adventure",
      "Drama",
      "Drama"
    ),
    genre3 = c(
      "Drama",
      "Drama",
      "SciFi",
      "Drama",
      "Drama",
      "",
      "Fantasy",
      "",
      "Fantasy",
      "",
      "",
      "Mystery",
      "Mystery",
      "Drama",
      "Drama",
      "Crime",
      "Drama",
      "",
      "",
      "Mystery"
    )
  ),
  row.names = c(NA,-20L),
  class = "data.frame"
)

我正在尝试为每一年计算所有三种电影类型的数量。期望的结果如下（示例）：

genre | year | count
Action | 2008 | 1
Comedy | 2008 | 3
Drama | 2008 | 4
...

我尝试了以下代码：

genre_years_test<-genre_structure %>%
  group_by(genre1, genre2, genre3, year) %>%
  summarise(total=n(), .groups = "drop")

但是它在每次在该年份发布新的类型时重复年份。

英文:

I have a movies dataset which has a column for year and three for genres.

Here's an example:

genre_structure&lt;-structure(
list(
year = c(
&quot;2008&quot;,
&quot;2003&quot;,
&quot;2010&quot;,
&quot;2001&quot;,
&quot;2002&quot;,
&quot;1999&quot;,
&quot;1980&quot;,
&quot;2020&quot;,
&quot;1977&quot;,
&quot;1991&quot;,
&quot;1954&quot;,
&quot;2022&quot;,
&quot;1962&quot;,
&quot;2000&quot;,
&quot;1994&quot;,
&quot;2019&quot;,
&quot;2019&quot;,
&quot;1981&quot;,
&quot;2012&quot;,
&quot;2003&quot;
),
genre1 = c(
&quot;Action&quot;,
&quot;Action&quot;,
&quot;Action&quot;,
&quot;Action&quot;,
&quot;Action&quot;,
&quot;Action&quot;,
&quot;Action&quot;,
&quot;Action&quot;,
&quot;Action&quot;,
&quot;Action&quot;,
&quot;Action&quot;,
&quot;Action&quot;,
&quot;Action&quot;,
&quot;Action&quot;,
&quot;Action&quot;,
&quot;Action&quot;,
&quot;Action&quot;,
&quot;Action&quot;,
&quot;Action&quot;,
&quot;Action&quot;
),
genre2 = c(
&quot;Crime&quot;,
&quot;Adventure&quot;,
&quot;Adventure&quot;,
&quot;Adventure&quot;,
&quot;Adventure&quot;,
&quot;SciFi&quot;,
&quot;Adventure&quot;,
&quot;Drama&quot;,
&quot;Adventure&quot;,
&quot;SciFi&quot;,
&quot;Drama&quot;,
&quot;Drama&quot;,
&quot;Drama&quot;,
&quot;Adventure&quot;,
&quot;Crime&quot;,
&quot;Adventure&quot;,
&quot;Adventure&quot;,
&quot;Adventure&quot;,
&quot;Drama&quot;,
&quot;Drama&quot;
),
genre3 = c(
&quot;Drama&quot;,
&quot;Drama&quot;,
&quot;SciFi&quot;,
&quot;Drama&quot;,
&quot;Drama&quot;,
&quot;&quot;,
&quot;Fantasy&quot;,
&quot;&quot;,
&quot;Fantasy&quot;,
&quot;&quot;,
&quot;&quot;,
&quot;Mystery&quot;,
&quot;Mystery&quot;,
&quot;Drama&quot;,
&quot;Drama&quot;,
&quot;Crime&quot;,
&quot;Drama&quot;,
&quot;&quot;,
&quot;&quot;,
&quot;Mystery&quot;
)
),
row.names = c(NA,-20L),
class =  &quot;data.frame&quot;
)

I am trying to count all 3 genres for each year. Expected result is (example):

genre | year| count
Action |2008| 1
Comedy | 2008 | 3
Drama | 2008 | 4
...

I tried:

genre_years_test&lt;-genre_structure %&gt;% 
group_by(genre1, genre2, genre3, year) %&gt;% 
summarise(total=n(), .groups = &quot;drop&quot;)

But it is repeating the years each time a new genre is released in that year.

答案1

得分: 3

我们可以重塑为 'long' 形式，并获取 'count'。

library(dplyr)
library(tidyr)
genre_structure %>%
  pivot_longer(cols = -year, values_to = 'genre') %>%
  count(year, genre, name = 'count')

英文:

We may reshape to 'long' and get the count

library(dplyr)
library(tidyr)
genre_structure %&gt;% 
pivot_longer(cols = -year, values_to = &#39;genre&#39;) %&gt;%
count(year, genre, name = &#39;count&#39;)

答案2

得分: 3

以下是代码的中文翻译：

在base中的解决方案：

# 仅用于娱乐，以下是一个在“基础”中的解决方案：
subset(as.data.frame(
        table(cbind(genre_structure[1], stack(genre_structure[-1]))[-3])
                    ), Freq != 0)

在data.table中的解决方案：

# 使用“data.table”：
library(data.table)

melt(setDT(genre_structure), id.vars = c("year"),
                             variable.name = "genre")[, list(Freq = .N), 
                                                       .(year, value)]

希望这些翻译对您有帮助。

英文:

Just for laffs, here's a solution in base:

subset(as.data.frame(
table(cbind(genre_structure[1], stack(genre_structure[-1]))[-3])
), Freq != 0)

And in data.table:

library(data.table)
melt(setDT(genre_structure), id.vars = c(&quot;year&quot;),
variable.name = &quot;genre&quot;)[, list(Freq =.N), 
.(year, value)]

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

统计跨多列的发生次数，并按年份分组

问题

答案1

答案2

如何制作具有分组变量的散点图？

有没有办法使用R计算列中的个别方程？

使用R中的数组进行矩阵切片

匹配数组数据与子数组数据

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论