英文:
Count occurrences across multiple columns and group by year
问题
我有一个电影数据集,其中有一列表示年份,另外三列表示不同的电影类型。
以下是您提供的示例数据:
genre_structure<-structure(
list(
year = c(
"2008",
"2003",
"2010",
"2001",
"2002",
"1999",
"1980",
"2020",
"1977",
"1991",
"1954",
"2022",
"1962",
"2000",
"1994",
"2019",
"2019",
"1981",
"2012",
"2003"
),
genre1 = c(
"Action",
"Action",
"Action",
"Action",
"Action",
"Action",
"Action",
"Action",
"Action",
"Action",
"Action",
"Action",
"Action",
"Action",
"Action",
"Action",
"Action",
"Action",
"Action",
"Action"
),
genre2 = c(
"Crime",
"Adventure",
"Adventure",
"Adventure",
"Adventure",
"SciFi",
"Adventure",
"Drama",
"Adventure",
"SciFi",
"Drama",
"Drama",
"Drama",
"Adventure",
"Crime",
"Adventure",
"Adventure",
"Adventure",
"Drama",
"Drama"
),
genre3 = c(
"Drama",
"Drama",
"SciFi",
"Drama",
"Drama",
"",
"Fantasy",
"",
"Fantasy",
"",
"",
"Mystery",
"Mystery",
"Drama",
"Drama",
"Crime",
"Drama",
"",
"",
"Mystery"
)
),
row.names = c(NA,-20L),
class = "data.frame"
)
我正在尝试为每一年计算所有三种电影类型的数量。期望的结果如下(示例):
genre | year | count
Action | 2008 | 1
Comedy | 2008 | 3
Drama | 2008 | 4
...
我尝试了以下代码:
genre_years_test<-genre_structure %>%
group_by(genre1, genre2, genre3, year) %>%
summarise(total=n(), .groups = "drop")
但是它在每次在该年份发布新的类型时重复年份。
英文:
I have a movies dataset which has a column for year and three for genres.
Here's an example:
genre_structure<-structure(
list(
year = c(
"2008",
"2003",
"2010",
"2001",
"2002",
"1999",
"1980",
"2020",
"1977",
"1991",
"1954",
"2022",
"1962",
"2000",
"1994",
"2019",
"2019",
"1981",
"2012",
"2003"
),
genre1 = c(
"Action",
"Action",
"Action",
"Action",
"Action",
"Action",
"Action",
"Action",
"Action",
"Action",
"Action",
"Action",
"Action",
"Action",
"Action",
"Action",
"Action",
"Action",
"Action",
"Action"
),
genre2 = c(
"Crime",
"Adventure",
"Adventure",
"Adventure",
"Adventure",
"SciFi",
"Adventure",
"Drama",
"Adventure",
"SciFi",
"Drama",
"Drama",
"Drama",
"Adventure",
"Crime",
"Adventure",
"Adventure",
"Adventure",
"Drama",
"Drama"
),
genre3 = c(
"Drama",
"Drama",
"SciFi",
"Drama",
"Drama",
"",
"Fantasy",
"",
"Fantasy",
"",
"",
"Mystery",
"Mystery",
"Drama",
"Drama",
"Crime",
"Drama",
"",
"",
"Mystery"
)
),
row.names = c(NA,-20L),
class = "data.frame"
)
I am trying to count all 3 genres for each year. Expected result is (example):
genre | year| count
Action |2008| 1
Comedy | 2008 | 3
Drama | 2008 | 4
...
I tried:
genre_years_test<-genre_structure %>%
group_by(genre1, genre2, genre3, year) %>%
summarise(total=n(), .groups = "drop")
But it is repeating the years each time a new genre is released in that year.
答案1
得分: 3
我们可以重塑为 'long' 形式,并获取 'count'。
library(dplyr)
library(tidyr)
genre_structure %>%
pivot_longer(cols = -year, values_to = 'genre') %>%
count(year, genre, name = 'count')
英文:
We may reshape to 'long' and get the count
library(dplyr)
library(tidyr)
genre_structure %>%
pivot_longer(cols = -year, values_to = 'genre') %>%
count(year, genre, name = 'count')
答案2
得分: 3
以下是代码的中文翻译:
在base
中的解决方案:
# 仅用于娱乐,以下是一个在“基础”中的解决方案:
subset(as.data.frame(
table(cbind(genre_structure[1], stack(genre_structure[-1]))[-3])
), Freq != 0)
在data.table
中的解决方案:
# 使用“data.table”:
library(data.table)
melt(setDT(genre_structure), id.vars = c("year"),
variable.name = "genre")[, list(Freq = .N),
.(year, value)]
希望这些翻译对您有帮助。
英文:
Just for laffs, here's a solution in base
:
subset(as.data.frame(
table(cbind(genre_structure[1], stack(genre_structure[-1]))[-3])
), Freq != 0)
And in data.table
:
library(data.table)
melt(setDT(genre_structure), id.vars = c("year"),
variable.name = "genre")[, list(Freq =.N),
.(year, value)]
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论