Compressing/combining rows in R df based on multiple conditions

Question


I have a df containing the outputs of a simple game. Every 30 rows, the df switches to a new set of players. Here is a sample with just the first 2 sets of players:

    > dput(stackdf)
    structure(list(id = structure(list(`dataset_single$id` = c(20230420,
    20230420, 20230420, 20230420, 20230420, 20230420, 20230420, 20230420,
    20230420, 20230420, 20230420, 20230420, 20230420, 20230420, 20230420,
    20230420, 20230420, 20230420, 20230420, 20230420, 20230420, 20230420,
    20230420, 20230420, 20230420, 20230420, 20230420, 20230420, 20230420,
    20230420, 20230424, 20230424, 20230424, 20230424, 20230424, 20230424,
    20230424, 20230424, 20230424, 20230424, 20230424, 20230424, 20230424,
    20230424, 20230424, 20230424, 20230424, 20230424, 20230424, 20230424,
    20230424, 20230424, 20230424, 20230424, 20230424, 20230424, 20230424,
    20230424, 20230424, 20230424)), row.names = c(NA, -60L), class = "data.frame"),
    round = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10",
    "11", "12", "13", "14", "15", "16", "17", "18", "19", "20",
    "21", "22", "23", "24", "25", "26", "27", "28", "29", "30",
    "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11",
    "12", "13", "14", "15", "16", "17", "18", "19", "20", "21",
    "22", "23", "24", "25", "26", "27", "28", "29", "30"), win = c(1,
    0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0,
    1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1,
    1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0,
    0, 0), sound = c(1, 1, 2, 1, 2, 1, 1, 2, 2, 2, 1, 1, 2, 3,
    5, 6, 7, 8, 8, 1, 4, 8, 6, 7, 8, 5, 4, 8, 8, 8, 3, 4, 5,
    5, 6, 5, 2, 5, 7, 4, 7, 6, 8, 5, 1, 8, 5, 3, 7, 6, 3, 6,
    6, 5, 5, 8, 1, 6, 6, 5)), row.names = c(NA, -60L), class = c("tbl_df",
    "tbl", "data.frame"))

The "win" column indicates the winner (p1=1, p2=0), and the "sound" column indicates the score of the winner for that round. As you can see, there are streaks where p1 might win a few rounds in a row, and then p2 wins several, etc. When there is a streak, I want to compress the streak so that it becomes one row, where the "sound" column takes a mean of the sound scores in that streak. For example, this:

win sound
1 3
0 4
0 1
1 3
1 2
1 6

would become:

win sound
1 3
0 2.5
1 3.67

There are 2 things making this tricky: 1.) the full df is huge, so I am seeking an automated solution (probably a for loop?); and 2.) I need to confine the program to one set of players at a time so that it does not treat rows 27:31 (for example) as one big streak (row 31 actually belongs to a new set of players).

Any help would be great.

Answer 1

Score: 2


It seems the first column, id, is a data frame nested inside a column. I have converted it to a regular column:

    stackdf$id <- stackdf$id$`dataset_single$id`
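
To see what that line does, here is a minimal base-R sketch on a made-up three-row data frame. The name `toy` and its values are illustrative assumptions, not part of the question's data; only the nested column name is copied from the dput() output.

```r
# Hypothetical toy data: `id` is itself a one-column data frame,
# mimicking the structure visible in the dput() output.
toy <- data.frame(round = c("1", "2", "3"))
toy$id <- data.frame(`dataset_single$id` = c(20230420, 20230420, 20230424),
                     check.names = FALSE)
nested_before <- is.data.frame(toy$id)  # TRUE: the column holds a data frame

# Pull the inner vector out and overwrite the nested column with it.
toy$id <- toy$id$`dataset_single$id`
nested_after <- is.data.frame(toy$id)   # FALSE: now a plain numeric vector
```

Once `id` is an ordinary vector it can be used directly as a grouping variable.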

Here are two options, one using dplyr and one using data.table, which follow the same logic. Use the data.table option if you need more speed.

dplyr -

    library(dplyr)  # consecutive_id() needs dplyr >= 1.1.0

    stackdf %>%
      # one group per unbroken run of `win` within each player set
      group_by(id, cons_id = consecutive_id(win)) %>%
      summarise(win = first(win),
                sound = mean(sound), .groups = "drop")

data.table -

    library(data.table)
    setDT(stackdf)
    stackdf[, .(win = first(win), sound = mean(sound)), .(id, cons_id = rleid(win))]

consecutive_id in dplyr and rleid in data.table each create an identifier that increments every time the value changes, so every unbroken run of equal values gets its own id.
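
To make that concrete, here is a small base-R sketch on the six-row example from the question. The `cumsum(...)` line is only a stand-in that mimics what such a run identifier looks like, and the names `toy`, `run_id` and `res` are made up for illustration.

```r
# The six-row example from the question.
toy <- data.frame(win   = c(1, 0, 0, 1, 1, 1),
                  sound = c(3, 4, 1, 3, 2, 6))

# Run identifier: starts at 1 and increments whenever `win`
# differs from the previous row.
toy$run_id <- cumsum(c(TRUE, diff(toy$win) != 0))
# toy$run_id is 1 2 2 3 3 3

# Average `sound` within each run, then restore the original run order.
res <- aggregate(sound ~ run_id + win, data = toy, FUN = mean)
res <- res[order(res$run_id), c("win", "sound")]
# res$sound is 3, 2.5, 3.67 (rounded), matching the desired output
```

Grouping by this identifier together with `id`, as both options above do, is what keeps a streak from running across two sets of players.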

Answer 2

Score: 1


In the end, this looks like a group-and-summarise task.
Note that the first column looks like a nested list:

    library(tidyr)
    library(dplyr) # >= 1.1.0

    stackdf %>%
      unnest(id) %>%
      rename(id = `dataset_single$id`) %>%
      mutate(x = consecutive_id(win), .by = id) %>%
      summarise(mean_sound = mean(sound), .by = c(id, x, win)) %>%
      select(-x)

             id   win mean_sound
          <dbl> <dbl>      <dbl>
     1 20230420     1       1
     2 20230420     0       1
     3 20230420     1       1.5
     4 20230420     0       1.5
     5 20230420     1       2
     6 20230420     0       1.5
     7 20230420     1       2.75
     8 20230420     0       6
     9 20230420     1       7
    10 20230420     0       8
    # … with 23 more rows
    # ℹ Use `print(n = ...)` to see more rows
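
The role of `id` in the grouping can be checked on a tiny made-up example in which one set of players ends with a win for p1 and the next set starts with one. This is a base-R sketch with hypothetical data and names (`toy`, `run_id`, `out`), not the answerer's code; it restarts the run counter inside each `id` with `ave()`.

```r
# Hypothetical data: set 1 ends with win = 1, set 2 starts with win = 1.
toy <- data.frame(id    = c(1, 1, 1, 2, 2),
                  win   = c(0, 1, 1, 1, 0),
                  sound = c(2, 4, 6, 8, 1))

# Number the runs of `win` separately within each id.
toy$run_id <- ave(toy$win, toy$id,
                  FUN = function(w) cumsum(c(TRUE, diff(w) != 0)))

# One row per (id, run): the three consecutive 1s are split at the id boundary.
out <- aggregate(sound ~ id + run_id + win, data = toy, FUN = mean)
out <- out[order(out$id, out$run_id), ]
# out$sound is 2, 5, 8, 1
```

Without `id` in the grouping, rows 2 to 4 would collapse into a single streak with mean 6.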

huangapple
  • Posted on May 28, 2023 at 17:17:18
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76350769.html