Why is the purrr::map function not correctly mapping a function to each piece of a split dataframe?

Question

I have the following dataframe that we can call df_all

df_all <- structure(list(ID = c("1738c0c7214e7fced61c1caa479a5385", "1738c0c7214e7fced61c1caa479a5385", 
"1738c0c7214e7fced61c1caa479a5385", "1738c0c7214e7fced61c1caa479a5385"
), Book = c("Bovada", "Bovada", "LowVig.ag", "LowVig.ag"), Home = c("Alabama Crimson Tide", 
"Alabama Crimson Tide", "Alabama Crimson Tide", "Alabama Crimson Tide"
), Away = c("San Diego St Aztecs", "San Diego St Aztecs", "San Diego St Aztecs", 
"San Diego St Aztecs"), Team = c("Alabama Crimson Tide", "San Diego St Aztecs", 
"Alabama Crimson Tide", "San Diego St Aztecs"), Price = c(-110, 
-110, -111, -101), Points = c(-7.5, 7.5, -7, 7)), row.names = c(NA, 
-4L), class = c("tbl_df", "tbl", "data.frame"))

and I have the following dataframe that we can call df_alt

df_alt <- structure(list(ID = c("1738c0c7214e7fced61c1caa479a5385", "1738c0c7214e7fced61c1caa479a5385", 
"1738c0c7214e7fced61c1caa479a5385", "1738c0c7214e7fced61c1caa479a5385", 
"1738c0c7214e7fced61c1caa479a5385", "1738c0c7214e7fced61c1caa479a5385", 
"1738c0c7214e7fced61c1caa479a5385", "1738c0c7214e7fced61c1caa479a5385", 
"1738c0c7214e7fced61c1caa479a5385", "1738c0c7214e7fced61c1caa479a5385", 
"1738c0c7214e7fced61c1caa479a5385", "1738c0c7214e7fced61c1caa479a5385", 
"1738c0c7214e7fced61c1caa479a5385", "1738c0c7214e7fced61c1caa479a5385", 
"1738c0c7214e7fced61c1caa479a5385", "1738c0c7214e7fced61c1caa479a5385", 
"1738c0c7214e7fced61c1caa479a5385", "1738c0c7214e7fced61c1caa479a5385"
), Book = c("Pinnacle", "Pinnacle", "Pinnacle", "Pinnacle", "Pinnacle", 
"Pinnacle", "Pinnacle", "Pinnacle", "Pinnacle", "Pinnacle", "Pinnacle", 
"Pinnacle", "Pinnacle", "Pinnacle", "Pinnacle", "Pinnacle", "Pinnacle", 
"Pinnacle"), Home = c("Alabama Crimson Tide", "Alabama Crimson Tide", 
"Alabama Crimson Tide", "Alabama Crimson Tide", "Alabama Crimson Tide", 
"Alabama Crimson Tide", "Alabama Crimson Tide", "Alabama Crimson Tide", 
"Alabama Crimson Tide", "Alabama Crimson Tide", "Alabama Crimson Tide", 
"Alabama Crimson Tide", "Alabama Crimson Tide", "Alabama Crimson Tide", 
"Alabama Crimson Tide", "Alabama Crimson Tide", "Alabama Crimson Tide", 
"Alabama Crimson Tide"), Away = c("San Diego St Aztecs", "San Diego St Aztecs", 
"San Diego St Aztecs", "San Diego St Aztecs", "San Diego St Aztecs", 
"San Diego St Aztecs", "San Diego St Aztecs", "San Diego St Aztecs", 
"San Diego St Aztecs", "San Diego St Aztecs", "San Diego St Aztecs", 
"San Diego St Aztecs", "San Diego St Aztecs", "San Diego St Aztecs", 
"San Diego St Aztecs", "San Diego St Aztecs", "San Diego St Aztecs", 
"San Diego St Aztecs"), Team = c("Alabama Crimson Tide", "Alabama Crimson Tide", 
"Alabama Crimson Tide", "Alabama Crimson Tide", "Alabama Crimson Tide", 
"Alabama Crimson Tide", "Alabama Crimson Tide", "Alabama Crimson Tide", 
"San Diego St Aztecs", "San Diego St Aztecs", "San Diego St Aztecs", 
"San Diego St Aztecs", "San Diego St Aztecs", "San Diego St Aztecs", 
"San Diego St Aztecs", "San Diego St Aztecs", "Alabama Crimson Tide", 
"San Diego St Aztecs"), Price = c(-149, -138, -126, -115, 105, 
114, 122, 132, 128, 119, 110, 102, -119, -131, -142, -154, -104, 
-108), Points = c(-5.5, -6, -6.5, -7, -8, -8.5, -9, -9.5, 5.5, 
6, 6.5, 7, 8, 8.5, 9, 9.5, -7.5, 7.5)), row.names = c(NA, -18L
), class = c("tbl_df", "tbl", "data.frame"))

I have the following function which looks for common/intersecting Points values between df_all and df_alt.

int_value <- function(df){
    
    df %>% 
            dplyr::select(c(ID, Team, Points)) %>%  
            dplyr::intersect(df_alt %>% dplyr::select(c(ID, Team,Points))) %>% 
            mutate(Book = 'Pinnacle')
    
    df %>% full_join(df_int)%>% left_join(df_alt %>% rename(price=Price)) %>% 
            mutate(Price=ifelse(is.na(price),Price,price))%>% 
            select(-price)
}
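For reference, dplyr::intersect() on two data frames keeps only the rows that appear in both, and it compares all columns (both inputs must have the same columns). That is why the same select() is applied to both sides: the selected columns decide what counts as a match. A minimal sketch with two made-up tibbles; book, alt and common are hypothetical names, not part of the original data:

```r
library(dplyr)

# Hypothetical mini tables, for illustration only
book <- tibble(Team = c("A", "B"), Points = c(-7.5, 7.5))
alt  <- tibble(Team = c("A", "B", "A"), Points = c(-7, 7, -7.5))

# Keep only the rows of `book` that also appear in `alt` (every column must match)
common <- dplyr::intersect(book, alt)
print(common)
```

Only the Team A row at -7.5 survives; Team B at 7.5 has no counterpart in alt.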

I am trying to apply int_value using the following map syntax.

df_all %>% 
    group_split(ID, Book) %>% 
    map(int_value)
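A quick diagnostic, sketched under the assumption that dplyr and purrr are installed: map a trivial function over the pieces and check which books each piece contains. If every element reports a single book, group_split() and map() are splitting correctly and the problem lies inside the mapped function. The toy table and the names toy, pieces and books_seen are hypothetical.

```r
library(dplyr)
library(purrr)

# Hypothetical two-book table, for illustration only
toy <- tibble(
  ID     = "g1",
  Book   = c("Bovada", "Bovada", "LowVig.ag", "LowVig.ag"),
  Points = c(-7.5, 7.5, -7, 7)
)

# One tibble per (ID, Book) combination
pieces <- toy %>% group_split(ID, Book)

# Each piece handed to the mapped function should contain exactly one book
books_seen <- map(pieces, ~ unique(.x$Book))
print(books_seen)
```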

This is the output that is returned, which is not the desired output:

[[1]]
# A tibble: 8 × 7
  ID                               Book      Home                 Away                Team                 Price Points
  <chr>                            <chr>     <chr>                <chr>               <chr>                <dbl>  <dbl>
1 1738c0c7214e7fced61c1caa479a5385 Bovada    Alabama Crimson Tide San Diego St Aztecs Alabama Crimson Tide  -110   -7.5
2 1738c0c7214e7fced61c1caa479a5385 Bovada    Alabama Crimson Tide San Diego St Aztecs San Diego St Aztecs   -110    7.5
3 1738c0c7214e7fced61c1caa479a5385 LowVig.ag Alabama Crimson Tide San Diego St Aztecs Alabama Crimson Tide  -111   -7
4 1738c0c7214e7fced61c1caa479a5385 LowVig.ag Alabama Crimson Tide San Diego St Aztecs San Diego St Aztecs   -101    7
5 1738c0c7214e7fced61c1caa479a5385 Pinnacle  Alabama Crimson Tide San Diego St Aztecs Alabama Crimson Tide  -104   -7.5
6 1738c0c7214e7fced61c1caa479a5385 Pinnacle  Alabama Crimson Tide San Diego St Aztecs San Diego St Aztecs   -108    7.5
7 1738c0c7214e7fced61c1caa479a5385 Pinnacle  Alabama Crimson Tide San Diego St Aztecs Alabama Crimson Tide  -115   -7
8 1738c0c7214e7fced61c1caa479a5385 Pinnacle  Alabama Crimson Tide San Diego St Aztecs San Diego St Aztecs    102    7

[[2]]
# A tibble: 8 × 7
  ID                               Book      Home                 Away                Team                 Price Points
  <chr>                            <chr>     <chr>                <chr>               <chr>                <dbl>  <dbl>
1 1738c0c7214e7fced61c1caa479a5385 Bovada    Alabama Crimson Tide San Diego St Aztecs Alabama Crimson Tide  -110   -7.5
2 1738c0c7214e7fced61c1caa479a5385 Bovada    Alabama Crimson Tide San Diego St Aztecs San Diego St Aztecs   -110    7.5
3 1738c0c7214e7fced61c1caa479a5385 LowVig.ag Alabama Crimson Tide San Diego St Aztecs Alabama Crimson Tide  -111   -7
4 1738c0c7214e7fced61c1caa479a5385 LowVig.ag Alabama Crimson Tide San Diego St Aztecs San Diego St Aztecs   -101    7
5 1738c0c7214e7fced61c1caa479a5385 Pinnacle  Alabama Crimson Tide San Diego St Aztecs Alabama Crimson Tide  -104   -7.5
6 1738c0c7214e7fced61c1caa479a5385 Pinnacle  Alabama Crimson Tide San Diego St Aztecs San Diego St Aztecs   -108    7.5
7 1738c0c7214e7fced61c1caa479a5385 Pinnacle  Alabama Crimson Tide San Diego St Aztecs Alabama Crimson Tide  -115   -7
8 1738c0c7214e7fced61c1caa479a5385 Pinnacle  Alabama Crimson Tide San Diego St Aztecs San Diego St Aztecs    102    7

This is the desired output and what I expected to be returned.

[[1]]
# A tibble: 4 × 7
  ID                               Book     Home                 Away                Team                 Price Points
  <chr>                            <chr>    <chr>                <chr>               <chr>                <dbl>  <dbl>
1 1738c0c7214e7fced61c1caa479a5385 Bovada   Alabama Crimson Tide San Diego St Aztecs Alabama Crimson Tide  -110   -7.5
2 1738c0c7214e7fced61c1caa479a5385 Bovada   Alabama Crimson Tide San Diego St Aztecs San Diego St Aztecs   -110    7.5
3 1738c0c7214e7fced61c1caa479a5385 Pinnacle Alabama Crimson Tide San Diego St Aztecs Alabama Crimson Tide  -104   -7.5
4 1738c0c7214e7fced61c1caa479a5385 Pinnacle Alabama Crimson Tide San Diego St Aztecs San Diego St Aztecs   -108    7.5

[[2]]
# A tibble: 4 × 7
  ID                               Book      Home                 Away                Team                 Price Points
  <chr>                            <chr>     <chr>                <chr>               <chr>                <dbl>  <dbl>
1 1738c0c7214e7fced61c1caa479a5385 LowVig.ag Alabama Crimson Tide San Diego St Aztecs Alabama Crimson Tide  -111   -7
2 1738c0c7214e7fced61c1caa479a5385 LowVig.ag Alabama Crimson Tide San Diego St Aztecs San Diego St Aztecs   -101    7
3 1738c0c7214e7fced61c1caa479a5385 Pinnacle  Alabama Crimson Tide San Diego St Aztecs Alabama Crimson Tide  -115   -7
4 1738c0c7214e7fced61c1caa479a5385 Pinnacle  Alabama Crimson Tide San Diego St Aztecs San Diego St Aztecs    102    7

The map function doesn't appear to be honoring the implied group_by based on the Book column. What am I missing?

Answer 1

Score: 1

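The solution was what @stefan had recommended: assign the result of the first pipeline to df_int inside the function, and keep Home and Away in the selected columns so that the Pinnacle rows carry them into the later joins. In the original version the first pipeline was evaluated and then thrown away, yet full_join(df_int) did not fail, because R found a df_int object in the global environment instead (presumably one left over from earlier work on the full df_all). Every piece of the split was therefore joined to the same rows. With df_int defined locally, the output is correct. Here is the updated function: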

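library(dplyr)

int_value <- function(df){
    
    # Rows of this piece whose ID/Home/Away/Team/Points also appear in df_alt,
    # relabeled as Pinnacle. Assigning the result to a local df_int is the fix.
    df_int <- df %>% 
            dplyr::select(c(ID, Home, Away, Team, Points)) %>%  
            dplyr::intersect(df_alt %>% dplyr::select(c(ID, Home, Away, Team, Points))) %>% 
            mutate(Book = 'Pinnacle')
    
    # Append the matching Pinnacle rows (their Price is NA at this point)
    df_join <- df %>% full_join(df_int)
    
    # Look up the Pinnacle prices in df_alt; rows without a match keep their own Price
    df_final <- df_join %>% left_join(df_alt %>% rename(price=Price)) %>% 
            mutate(Price=ifelse(is.na(price),Price,price))%>% 
            select(-price)
    
    # Return the result visibly (an assignment on the last line returns it invisibly)
    df_final
}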

And here is the updated output

[[1]]
# A tibble: 4 × 7
  ID                               Book     Home                 Away                Team                 Price Points
  <chr>                            <chr>    <chr>                <chr>               <chr>                <dbl>  <dbl>
1 1738c0c7214e7fced61c1caa479a5385 Bovada   Alabama Crimson Tide San Diego St Aztecs Alabama Crimson Tide  -110   -7.5
2 1738c0c7214e7fced61c1caa479a5385 Bovada   Alabama Crimson Tide San Diego St Aztecs San Diego St Aztecs   -110    7.5
3 1738c0c7214e7fced61c1caa479a5385 Pinnacle Alabama Crimson Tide San Diego St Aztecs Alabama Crimson Tide  -104   -7.5
4 1738c0c7214e7fced61c1caa479a5385 Pinnacle Alabama Crimson Tide San Diego St Aztecs San Diego St Aztecs   -108    7.5

[[2]]
# A tibble: 4 × 7
  ID                               Book      Home                 Away                Team                 Price Points
  <chr>                            <chr>     <chr>                <chr>               <chr>                <dbl>  <dbl>
1 1738c0c7214e7fced61c1caa479a5385 LowVig.ag Alabama Crimson Tide San Diego St Aztecs Alabama Crimson Tide  -111     -7
2 1738c0c7214e7fced61c1caa479a5385 LowVig.ag Alabama Crimson Tide San Diego St Aztecs San Diego St Aztecs   -101      7
3 1738c0c7214e7fced61c1caa479a5385 Pinnacle  Alabama Crimson Tide San Diego St Aztecs Alabama Crimson Tide  -115     -7
4 1738c0c7214e7fced61c1caa479a5385 Pinnacle  Alabama Crimson Tide San Diego St Aztecs San Diego St Aztecs    102      7
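The root cause can be reproduced without dplyr at all. The sketch below is only an illustration: it uses hypothetical character strings in place of data frames, hypothetical function names (int_value_buggy, int_value_fixed), and base vapply() in place of purrr::map(); the scoping behavior is the same. When a function computes a value without assigning it and then refers to df_int, R finds no local binding and looks in the function's enclosing environment (here the global environment), so every call returns the same stale object:

```r
# Hypothetical stand-in: a stale object left in the global environment
df_int <- "stale global df_int"

# Mirrors the original bug: the first value is computed but never assigned
int_value_buggy <- function(book) {
  paste("rows for", book)   # evaluated, then discarded
  df_int                    # no local df_int, so R looks it up in the enclosing (global) environment
}

# Mirrors the fix: the local assignment shadows the global object
int_value_fixed <- function(book) {
  df_int <- paste("rows for", book)
  df_int
}

books <- c("Bovada", "LowVig.ag")
buggy <- vapply(books, int_value_buggy, character(1), USE.NAMES = FALSE)
fixed <- vapply(books, int_value_fixed, character(1), USE.NAMES = FALSE)
print(buggy)  # both elements are "stale global df_int"
print(fixed)  # "rows for Bovada" and "rows for LowVig.ag"
```

If no global df_int had existed, the buggy version would have stopped with an "object 'df_int' not found" error instead of silently returning the wrong rows.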

huangapple
  • Posted on March 23, 2023 at 11:32:27
  • Please keep this link when reposting: https://go.coder-hub.com/75819037.html