2023年2月8日 15:54:39go评论104阅读模式

英文:

R: How to give the same id number to each person in a data frame based on a different data frame?

问题

我有两个数据集，df 和 df2。这是两个非常庞大且混乱的数据框的过于简化版本。

在原始的 df 中，我通过将它们按腰带（belt）和体重（weight）进行分组为每个人创建了一个唯一的 id。我希望在 df2 中，与 df 中相同的人具有相同的 id 号码。它们需要具有相同的名称，并且应该按腰带和体重进行分组。请注意，df2 中有一些在 df 中没有的人。

简化的 df 外观如下：

     belt     weight rank id        name
1  purple open class    1 55  Tom Cruise
2   black    rooster    2 79 Emma Watson
3    blue    feather    3 63    John Doe
4    blue    feather    4 63    John Doe
5  purple open class    5 55  Tom Cruise
6   brown      heavy    6  3  James Bond
7  purple open class    7 55  Tom Cruise
8  purple      heavy    8 61  Tom Cruise
9   black open class    9 70    Jane Doe
10 purple      heavy   10 61  Tom Cruise

第二个数据框的外观如下。在 df2 中的人，但不在 df 中的人应该为他们的 id 收到 NA。请注意，id 必须按腰带和体重分配，因为有些人根据他们参加的体重分组有不同的积分。

    belt2    weight2 rank2        name points
1  purple open class     1  Tom Cruise    100
2   black    rooster     2 Emma Watson     30
3    blue    feather     3    John Doe     50
4    blue    feather     4    John Doe     50
5  purple open class     5  Tom Cruise    100
6   brown      heavy     6  James Bond    200
7   black    rooster     7 Jon Snow     92
8  purple      heavy     8  Tom Cruise     77
9   black open class     9 Jane Doe     88
10 purple      heavy    10 Tom Cruise     77

这是我希望 df2 外观的方式：

    belt2    weight2 rank2 id           name points
1  purple open class     1 55     Tom Cruise    100
2   black    rooster     2 79    Emma Watson     30
3    blue    feather     3 63       John Doe     50
4    blue    feather     4 63       John Doe     50
5  purple open class     5 55     Tom Cruise    100
6   brown      heavy     6  3     James Bond    200
7   black    rooster     7 NA       Jon Snow     92
8  purple      heavy     8 61     Tom Cruise     77
9   black open class     9 70       Jane Doe     88
10 purple      heavy    10 61     Tom Cruise     77

基本上，我希望 df2 中的 id 号码与 df 中的 id 号码匹配。如果没有匹配，用 NA 填充。

以下是 df 和 df2 的数据：

# df
belt <- c("purple", "black", "blue", "blue", "purple", "brown", "purple", "purple", "black", "purple")
weight <- c("open class", "rooster", "feather", "feather", "open class", "heavy", "open class", "heavy", "open class", "heavy")
rank <- 1:10
id <- c(55, 79, 63, 63, 55, 3, 55, 61, 70, 61)
names <- c("Tom Cruise", "Emma Watson", "John Doe", "John Doe", "Tom Cruise", "James Bond", "Tom Cruise", "Tom Cruise", "Jane Doe", "Tom Cruise")
(df <- data.frame(belt, weight, rank, id, name = names))
# df2
belt2 <- c("purple", "black", "blue", "blue", "purple", "brown", "black", "purple", "black", "purple")
weight2 <- c("open class", "rooster", "feather", "feather", "open class", "heavy", "rooster", "heavy", "open class", "heavy")
rank2 <- 1:10
names2 <- c("Tom Cruise", "Emma Watson", "John Doe", "John Doe", "Tom Cruise", "James Bond", "Jon Snow", "Tom Cruise", "Jane Doe", "Tom Cruise")
points <- c(100, 30, 50, 50, 100, 200, 92, 77, 88, 77)
(df2 <- data.frame(belt2, weight2, rank2, name = names2, points))

英文:

I have two data sets, df and df2. This is an oversimplified version of two very large and messy data frames.

In the original df, I created a unique id for each person by grouping them by belt and weight. I want the same id numbers each person has in df to be given to the same people in df2. They need to have the same name, and should be grouped by belt and weight. Note that there are people in df2 who are not in df.

The simplified df looks something like below

     belt     weight rank id        name
1  purple open class    1 55  Tom Cruise
2   black    rooster    2 79 Emma Watson
3    blue    feather    3 63    John Doe
4    blue    feather    4 63    John Doe
5  purple open class    5 55  Tom Cruise
6   brown      heavy    6  3  James Bond
7  purple open class    7 55  Tom Cruise
8  purple      heavy    8 61  Tom Cruise
9   black open class    9 70    Jane Doe
10 purple      heavy   10 61  Tom Cruise

The second data frame looks something like this. A person who is in df2, but not in df should receive an NA for their id. Note that the id's must be given by belt and weight because some people have different points depending on which weight division they competed in

    belt2    weight2 rank2        name points
1  purple open class     1  Tom Cruise    100
2   black    rooster     2 Emma Watson     30
3    blue    feather     3    John Doe     50
4    blue    feather     4    John Doe     50
5  purple open class     5  Tom Cruise    100
6   brown      heavy     6  James Bond    200
7   black    rooster     7    Jon Snow     92
8  purple      heavy     8  Tom Cruise     77
9   black open class     9    Jane Doe     88
10 purple      heavy    10  Tom Cruise     77

This is what I would like df2 to look like:

    belt2    weight2 rank2 id           name points
1  purple open class     1 55     Tom Cruise    100
2   black    rooster     2 79    Emma Watson     30
3    blue    feather     3 63       John Doe     50
4    blue    feather     4 63       John Doe     50
5  purple open class     5 55     Tom Cruise    100
6   brown      heavy     6  3     James Bond    200
7   black    rooster     7 NA       Jon Snow     92
8  purple      heavy     8 61     Tom Cruise     77
9   black open class     9 70       Jane Doe     88
10 purple      heavy    10 61     Tom Cruise     77

Basically, I want the id numbers in df2 to match the id numbers in df. If there is not a match, fill with NA.

# df
belt &lt;- c(&quot;purple&quot;, &quot;black&quot;, &quot;blue&quot;, &quot;blue&quot;, &quot;purple&quot;, &quot;brown&quot;, &quot;purple&quot;, &quot;purple&quot;, &quot;black&quot;, &quot;purple&quot;)
weight &lt;- c(&quot;open class&quot;, &quot;rooster&quot;, &quot;feather&quot;, &quot;feather&quot;, &quot;open class&quot;, &quot;heavy&quot;, &quot;open class&quot;, &quot;heavy&quot;, &quot;open class&quot;, &quot;heavy&quot;)
rank &lt;- 1:10
id &lt;- c(55, 79, 63, 63, 55, 3, 55, 61, 70, 61)
names &lt;- c(&quot;Tom Cruise&quot;, &quot;Emma Watson&quot;, &quot;John Doe&quot;, &quot;John Doe&quot;, &quot;Tom Cruise&quot;, &quot;James Bond&quot;, &quot;Tom Cruise&quot;, &quot;Tom Cruise&quot;, &quot;Jane Doe&quot;, &quot;Tom Cruise&quot;)
(df &lt;- data.frame(belt, weight, rank, id, name = names))
#df2
belt2 &lt;- c(&quot;purple&quot;, &quot;black&quot;, &quot;blue&quot;, &quot;blue&quot;, &quot;purple&quot;, &quot;brown&quot;, &quot;black&quot;, &quot;purple&quot;, &quot;black&quot;, &quot;purple&quot;)
weight2 &lt;- c(&quot;open class&quot;, &quot;rooster&quot;, &quot;feather&quot;, &quot;feather&quot;, &quot;open class&quot;, &quot;heavy&quot;, &quot;rooster&quot;, &quot;heavy&quot;, &quot;open class&quot;, &quot;heavy&quot;)
rank2 &lt;- 1:10
names2 &lt;- c(&quot;Tom Cruise&quot;, &quot;Emma Watson&quot;, &quot;John Doe&quot;, &quot;John Doe&quot;, &quot;Tom Cruise&quot;, &quot;James Bond&quot;, &quot;Jon Snow&quot;, &quot;Tom Cruise&quot;, &quot;Jane Doe&quot;, &quot;Tom Cruise&quot;)
points &lt;- c(100, 30, 50, 50, 100, 200, 92, 77, 88, 77)
(df2 &lt;- data.frame(belt2, weight2, rank2, name = names2, points))

答案1

得分: 3

以下是代码部分的翻译：

This can be solved with a right join and removing the duplicates after it. I will use base function merge.

df3 &lt;- merge(
  df, df2, 
  by.x = c(&quot;belt&quot;, &quot;weight&quot;, &quot;rank&quot;, &quot;name&quot;), 
  by.y = c(&quot;belt2&quot;, &quot;weight2&quot;, &quot;rank2&quot;, &quot;name&quot;),
  all.y = TRUE
)
df3 &lt;- df3[!duplicated(df3),]
df3[order(df3$rank),]
#&gt;      belt     weight rank        name id points
#&gt; 9  purple open class    1  Tom Cruise 55    100
#&gt; 2   black    rooster    2 Emma Watson 79     30
#&gt; 4    blue    feather    3    John Doe 63     50
#&gt; 5    blue    feather    4    John Doe 63     50
#&gt; 10 purple open class    5  Tom Cruise 55    100
#&gt; 6   brown      heavy    6  James Bond  3    200
#&gt; 3   black    rooster    7    Jon Snow NA     92
#&gt; 7  purple      heavy    8  Tom Cruise 61     77
#&gt; 1   black open class    9    Jane Doe 70     88
#&gt; 8  purple      heavy   10  Tom Cruise 61     77

A dplyr right join is

suppressPackageStartupMessages({
  library(dplyr)
})
df %&gt;%
right_join(df2, 
             by = c(&quot;belt&quot; = &quot;belt2&quot;, &quot;weight&quot; = &quot;weight2&quot;, &quot;rank&quot; = &quot;rank2&quot;),
             suffix = c(&quot;.x&quot;, &quot;&quot;)) %&gt;%
  select(-name.x) %&gt;%
  arrange(rank)
#&gt;      belt     weight rank id        name points
#&gt; 1  purple open class    1 55  Tom Cruise    100
#&gt; 2   black    rooster    2 79 Emma Watson     30
#&gt; 3    blue    feather    3 63    John Doe     50
#&gt; 4    blue    feather    4 63    John Doe     50
#&gt; 5  purple open class    5 55  Tom Cruise    100
#&gt; 6   brown      heavy    6  3  James Bond    200
#&gt; 7   black    rooster    7 NA    Jon Snow     92
#&gt; 8  purple      heavy    8 61  Tom Cruise     77
#&gt; 9   black open class    9 70    Jane Doe     88
#&gt; 10 purple      heavy   10 61  Tom Cruise     77

希望这些翻译对您有所帮助。

英文:

This can be solved with a right join and removing the duplicates after it. I will use base function merge.

df3 &lt;- merge(
  df, df2, 
  by.x = c(&quot;belt&quot;, &quot;weight&quot;, &quot;rank&quot;, &quot;name&quot;), 
  by.y = c(&quot;belt2&quot;, &quot;weight2&quot;, &quot;rank2&quot;, &quot;name&quot;),
  all.y = TRUE
)
df3 &lt;- df3[!duplicated(df3),]
df3[order(df3$rank),]
#&gt;      belt     weight rank        name id points
#&gt; 9  purple open class    1  Tom Cruise 55    100
#&gt; 2   black    rooster    2 Emma Watson 79     30
#&gt; 4    blue    feather    3    John Doe 63     50
#&gt; 5    blue    feather    4    John Doe 63     50
#&gt; 10 purple open class    5  Tom Cruise 55    100
#&gt; 6   brown      heavy    6  James Bond  3    200
#&gt; 3   black    rooster    7    Jon Snow NA     92
#&gt; 7  purple      heavy    8  Tom Cruise 61     77
#&gt; 1   black open class    9    Jane Doe 70     88
#&gt; 8  purple      heavy   10  Tom Cruise 61     77

<sup>Created on 2023-02-08 with reprex v2.0.2</sup>

A dplyr right join is

suppressPackageStartupMessages({
  library(dplyr)
})
df %&gt;%
  right_join(df2, 
             by = c(&quot;belt&quot; = &quot;belt2&quot;, &quot;weight&quot; = &quot;weight2&quot;, &quot;rank&quot; = &quot;rank2&quot;),
             suffix = c(&quot;.x&quot;, &quot;&quot;)) %&gt;%
  select(-name.x) %&gt;%
  arrange(rank)
#&gt;      belt     weight rank id        name points
#&gt; 1  purple open class    1 55  Tom Cruise    100
#&gt; 2   black    rooster    2 79 Emma Watson     30
#&gt; 3    blue    feather    3 63    John Doe     50
#&gt; 4    blue    feather    4 63    John Doe     50
#&gt; 5  purple open class    5 55  Tom Cruise    100
#&gt; 6   brown      heavy    6  3  James Bond    200
#&gt; 7   black    rooster    7 NA    Jon Snow     92
#&gt; 8  purple      heavy    8 61  Tom Cruise     77
#&gt; 9   black open class    9 70    Jane Doe     88
#&gt; 10 purple      heavy   10 61  Tom Cruise     77

<sup>Created on 2023-02-08 with reprex v2.0.2</sup>

答案2

得分: 3

你可以通过在这两个数据框之间使用left join来完成此任务。

df2 = df2 %>% left_join(df, by = c("belt2" = "belt", "weight2" = "weight", "name" = "name")) %>% select(belt2, weight2, rank2, name, points, id)

翻译后的结果如下：

df2 = df2 %>% left_join(df, by = c("belt2" = "belt", "weight2" = "weight", "name" = "name")) %>% select(belt2, weight2, rank2, name, points, id)

英文:

You can achieve this task by using a left join between the two dataframes.

df2 = df2 %&gt;% left_join(df, by = c(&quot;belt2&quot; = &quot;belt&quot;, &quot;weight2&quot; = &quot;weight&quot;, &quot;name&quot; = &quot;name&quot;)) %&gt;% select(belt2, weight2, rank2, name, points, id)
    belt2    weight2 rank2        name points id
1  purple open class     1  Tom Cruise    100 55
2  purple open class     1  Tom Cruise    100 55
3  purple open class     1  Tom Cruise    100 55
4   black    rooster     2 Emma Watson     30 79
5    blue    feather     3    John Doe     50 63
6    blue    feather     3    John Doe     50 63
7    blue    feather     4    John Doe     50 63
8    blue    feather     4    John Doe     50 63
9  purple open class     5  Tom Cruise    100 55
10 purple open class     5  Tom Cruise    100 55
11 purple open class     5  Tom Cruise    100 55
12  brown      heavy     6  James Bond    200  3
13  black    rooster     7    Jon Snow     92 NA
14 purple      heavy     8  Tom Cruise     77 61
15 purple      heavy     8  Tom Cruise     77 61
16  black open class     9    Jane Doe     88 70
17 purple      heavy    10  Tom Cruise     77 61
18 purple      heavy    10  Tom Cruise     77 61

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

如何在基于不同数据框的情况下，为每个人分配相同的ID编号？

问题

答案1

答案2

TypeError: 字符串索引必须是整数，应该怎么办？

如何在将数据框转换为矩阵时取消列出？

数据框架：根据条件替换值及其周围的值。

将一个DataFrame的多层索引中的一部分作为列应用，是否可能？

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。