
Group columns in dataframe by list and mutate in R

Question


I have a large dataframe containing binary columns. Here is a list of the column names:

```
[1] "imagetag_logos_position_Apple_BOTTOM_CENTER"         "imagetag_logos_position_Apple_BOTTOM_LEFT"           "imagetag_logos_position_Apple_BOTTOM_RIGHT"          "imagetag_logos_position_Apple_CENTER"                "imagetag_logos_position_Apple_CENTER_LEFT"
[6] "imagetag_logos_position_Apple_CENTER_RIGHT"          "imagetag_logos_position_Apple_TOP_CENTER"            "imagetag_logos_position_Apple_TOP_LEFT"              "imagetag_logos_position_Apple_TOP_RIGHT"             "imagetag_logos_position_Banana_BOTTOM_CENTER"
[11] "imagetag_logos_position_Banana_BOTTOM_LEFT"          "imagetag_logos_position_Banana_BOTTOM_RIGHT"         "imagetag_logos_position_Banana_CENTER_LEFT"          "imagetag_logos_position_Banana_CENTER_RIGHT"         "imagetag_logos_position_Banana_TOP_RIGHT"
[16] "imagetag_logos_position_Pear_BOTTOM_CENTER"      "imagetag_logos_position_Pear_BOTTOM_LEFT"        "imagetag_logos_position_Pear_BOTTOM_RIGHT"       "imagetag_logos_position_Pear_CENTER"             "imagetag_logos_position_Pear_CENTER_LEFT"
[21] "imagetag_logos_position_Pear_CENTER_RIGHT"       "imagetag_logos_position_Pear_TOP_RIGHT"          "imagetag_logos_position_Kiwi_BOTTOM_CENTER"    "imagetag_logos_position_Kiwi_BOTTOM_LEFT"      "imagetag_logos_position_Kiwi_BOTTOM_RIGHT"
[26] "imagetag_logos_position_Kiwi_CENTER_LEFT"      "imagetag_logos_position_Kiwi_CENTER_RIGHT"     "imagetag_logos_position_Kiwi_TOP_LEFT"         "Product_position_Product_0"                         "Product_position_Product_BOTTOM_CENTER"
[31] "Product_position_Product_BOTTOM_LEFT"               "Product_position_Product_BOTTOM_RIGHT"              "Product_position_Product_CENTER"                    "Product_position_Product_CENTER_LEFT"               "Product_position_Product_CENTER_RIGHT"
[36] "Product_position_Product_TOP_CENTER"                "Product_position_Product_TOP_LEFT"                  "Product_position_Product_TOP_RIGHT"                 "Person_position_Person_0"                           "Person_position_Person_BOTTOM_CENTER"
[41] "Person_position_Person_BOTTOM_LEFT"                 "Person_position_Person_BOTTOM_RIGHT"                "Person_position_Person_CENTER"                      "Person_position_Person_CENTER_LEFT"                 "Person_position_Person_CENTER_RIGHT"
[46] "Person_position_Person_TOP_CENTER"                  "Person_position_Person_TOP_LEFT"                    "Person_position_Person_TOP_RIGHT"                   "Logo_position_Logo_0"                               "Logo_position_Logo_BOTTOM_CENTER"
[51] "Logo_position_Logo_BOTTOM_LEFT"                     "Logo_position_Logo_BOTTOM_RIGHT"                    "Logo_position_Logo_CENTER"                          "Logo_position_Logo_CENTER_LEFT"                     "Logo_position_Logo_CENTER_RIGHT"
[56] "Logo_position_Logo_TOP_CENTER"                      "Logo_position_Logo_TOP_LEFT"                        "Logo_position_Logo_TOP_RIGHT"                       "CTA_ShopNow_position_Shop Now_0"                    "CTA_ShopNow_position_Shop Now_BOTTOM_CENTER"
[61] "CTA_ShopNow_position_Shop Now_BOTTOM_LEFT"          "CTA_ShopNow_position_Shop Now_BOTTOM_RIGHT"         "CTA_ShopNow_position_Shop Now_CENTER"               "CTA_ShopNow_position_Shop Now_CENTER_LEFT"          "CTA_ShopNow_position_Shop Now_CENTER_RIGHT"
[66] "CTA_ShopNow_position_Shop Now_TOP_CENTER"           "CTA_ShopNow_position_Shop Now_TOP_RIGHT"            "CTA_JoinNow_position_Join Now_0"                    "CTA_JoinNow_position_Join Now_BOTTOM_CENTER"        "CTA_JoinNow_position_Join Now_BOTTOM_LEFT"
[71] "CTA_JoinNow_position_Join Now_BOTTOM_RIGHT"         "CTA_JoinNow_position_Join Now_CENTER"               "CTA_JoinNow_position_Join Now_CENTER_RIGHT"         "CTA_JoinNow_position_Join Now_TOP_CENTER"           "CTA_JoinNow_position_Join Now_TOP_RIGHT"
[76] "CTA_position_CTA_0"                                 "CTA_position_CTA_BOTTOM_CENTER"                     "CTA_position_CTA_BOTTOM_LEFT"                       "CTA_position_CTA_BOTTOM_RIGHT"                      "CTA_position_CTA_CENTER"
[81] "CTA_position_CTA_CENTER_LEFT"                       "CTA_position_CTA_CENTER_RIGHT"                      "CTA_position_CTA_TOP_CENTER"                        "CTA_position_CTA_TOP_LEFT"                          "CTA_position_CTA_TOP_RIGHT"
[86] "Text_position_text_BOTTOM_CENTER"                   "Text_position_text_BOTTOM_LEFT"                     "Text_position_text_BOTTOM_RIGHT"                    "Text_position_text_CENTER"                          "Text_position_text_CENTER_LEFT"
[91] "Text_position_text_CENTER_RIGHT"                    "Text_position_text_TOP_CENTER"                      "Text_position_text_TOP_LEFT"                        "Text_position_text_TOP_RIGHT"                       "Product_position_Product_0_LF"
[96] "Product_position_Product_BOTTOM_CENTER_LF"          "Product_position_Product_BOTTOM_LEFT_LF"            "Product_position_Product_BOTTOM_RIGHT_LF"           "Product_position_Product_CENTER_LF"                 "Product_position_Product_CENTER_LEFT_LF"
[101] "Product_position_Product_CENTER_RIGHT_LF"           "Product_position_Product_TOP_CENTER_LF"             "Product_position_Product_TOP_LEFT_LF"               "Product_position_Product_TOP_RIGHT_LF"              "Logo_position_Logo_0_LF"
```

I want to combine some of these columns, for example by summing the columns that contain "BOTTOM_CENTER", "BOTTOM_RIGHT" or "BOTTOM_LEFT". However, the sums have to be computed within each matching prefix: one sum for the imagetag_logos_position_Apple columns, a separate sum for the imagetag_logos_position_Banana columns, and so on.

I did this to create a list of the unique prefixes:

```r
library(stringr)  # provides str_extract()

prefix_list <- str_extract(colnames(positionsdf), ".+?(?=([A-Z])([A-Z])([A-Z]))")
prefix_list1 <- unique(prefix_list)
```

```
> prefix_list1
[1] "imagetag_logos_position_Apple_"      "imagetag_logos_position_Banana_"     "imagetag_logos_position_Kiwi_"   "imagetag_logos_position_Pear_" NA                                   "Product_position_Product_"          "Person_position_Person_"
[8] "Logo_position_Logo_"                "CTA_ShopNow_position_Shop Now_"     "CTA_JoinNow_position_Join Now_"     "CTA_position_"                      "Text_position_text_"                "CTA_LearnMore_position_Learn More_" "Person_position_"
```
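The `NA` in this output comes from the lookahead: the prefix ends just before the first run of three capital letters, so a name with no such run, such as `Product_position_Product_0` or `Logo_position_Logo_0_LF`, gives `NA`. For the same reason `CTA_position_CTA_BOTTOM_CENTER` is cut to `CTA_position_`, and `Product_position_Product_BOTTOM_CENTER_LF` gets the same prefix as its non-`_LF` counterpart. A minimal base-R sketch of another way, assuming every column name ends in one of the position tokens visible in the list above (optionally followed by `_LF`), is to strip that trailing token instead. `position_tokens` and `get_prefix()` are hypothetical names, and giving the `_LF` columns their own prefix is an assumption about what is wanted.

```r
# Position tokens seen in the column names above (assumed to be the complete set)
position_tokens <- c("0", "TOP_LEFT", "TOP_CENTER", "TOP_RIGHT",
                     "CENTER_LEFT", "CENTER", "CENTER_RIGHT",
                     "BOTTOM_LEFT", "BOTTOM_CENTER", "BOTTOM_RIGHT")

# Hypothetical helper: remove the trailing position token from each name.
# The lazy (.*?) stops at the first "_" that is followed by a token and the end
# of the name. An optional "_LF" after the token is kept, so LF columns get
# their own prefix. Names that do not end in a known token come back unchanged.
get_prefix <- function(nms) {
  pattern <- paste0("^(.*?)_(", paste(position_tokens, collapse = "|"), ")(_LF)?$")
  sub(pattern, "\\1\\3", nms, perl = TRUE)
}

get_prefix(c("imagetag_logos_position_Apple_BOTTOM_CENTER",
             "CTA_position_CTA_0",
             "Product_position_Product_CENTER_LEFT_LF",
             "CTA_ShopNow_position_Shop Now_TOP_RIGHT"))
# returns c("imagetag_logos_position_Apple", "CTA_position_CTA",
#           "Product_position_Product_LF", "CTA_ShopNow_position_Shop Now")
```

On the real data this would be `unique(get_prefix(colnames(positionsdf)))`; the returned prefixes have no trailing underscore.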

I have tried different ways to group the data frame by the strings in this list so that I can add the matching columns together, but I cannot figure out how to do it. `%in%` does not support partial matching, so I am not sure which other function to use.
Thanks!

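```r
# sapply() loops over the column values, and %in% only tests exact equality,
# so this does not find the columns whose names start with each prefix
for (i in prefix_list1) {
  sapply(positionsdf, function(x) i %in% x)
}
```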
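For partial matching on column names, base R has `startsWith()` (fixed-string prefix test) and `grepl()` (regular-expression test); both return a logical vector that can be used to select columns. A small sketch with three of the real names; note also that the loop above applies `sapply()` to the column values rather than to `names(positionsdf)`, and its result is discarded because it is not assigned.

```r
nms <- c("imagetag_logos_position_Apple_BOTTOM_LEFT",
         "imagetag_logos_position_Apple_TOP_LEFT",
         "imagetag_logos_position_Banana_BOTTOM_LEFT")

# %in% compares whole strings, so a prefix never matches a longer name
"imagetag_logos_position_Apple_" %in% nms
# FALSE

# startsWith() tests a fixed-string prefix
startsWith(nms, "imagetag_logos_position_Apple_")
# TRUE TRUE FALSE

# grepl() tests a regular expression; "^" anchors it at the start of the name
grepl("^imagetag_logos_position_Apple_.*BOTTOM_(CENTER|LEFT|RIGHT)", nms)
# TRUE FALSE FALSE
```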

Answer 1

Score: 0

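We may do it like this: for each prefix, select the columns whose names contain it, keep only the BOTTOM_CENTER, BOTTOM_RIGHT and BOTTOM_LEFT ones, and sum them row-wise: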

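```r
sapply(prefix_list1, function(pat) {
  # names of all columns that contain the prefix
  nm1 <- grep(pat, names(positionsdf), value = TRUE)
  # of those, only the BOTTOM_CENTER, BOTTOM_RIGHT and BOTTOM_LEFT columns
  nm2 <- grep("BOTTOM_(CENTER|RIGHT|LEFT)", nm1, value = TRUE)
  # row-wise sum of the selected columns, one result column per prefix
  rowSums(positionsdf[nm2], na.rm = TRUE)
})
```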
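To see what this returns, here is the same logic run on a tiny made-up data frame; `toy_df`, its values and the two-element `prefixes` vector are hypothetical stand-ins for `positionsdf` and `prefix_list1`. For data with more than one row, `sapply()` returns a matrix with one column per prefix, which can then be attached to the data with `cbind()`; the `BOTTOM` suffix on the new column names is an arbitrary choice.

```r
# Hypothetical binary data using a few of the real column names
toy_df <- data.frame(
  imagetag_logos_position_Apple_BOTTOM_LEFT    = c(1, 0, 1),
  imagetag_logos_position_Apple_BOTTOM_RIGHT   = c(1, 1, 0),
  imagetag_logos_position_Apple_TOP_LEFT       = c(0, 1, 1),
  imagetag_logos_position_Banana_BOTTOM_CENTER = c(0, 1, 1),
  imagetag_logos_position_Banana_TOP_RIGHT     = c(1, 0, 0)
)
# Hypothetical stand-in for prefix_list1
prefixes <- c("imagetag_logos_position_Apple_", "imagetag_logos_position_Banana_")

# Same logic as the answer: one row-wise BOTTOM_* sum per prefix
bottom_sums <- sapply(prefixes, function(pat) {
  nm1 <- grep(pat, names(toy_df), value = TRUE)
  nm2 <- grep("BOTTOM_(CENTER|RIGHT|LEFT)", nm1, value = TRUE)
  rowSums(toy_df[nm2], na.rm = TRUE)
})
# 3 x 2 matrix: Apple column 2, 1, 1; Banana column 0, 1, 1

# "Mutate": attach the sums as new columns, e.g. imagetag_logos_position_Apple_BOTTOM
colnames(bottom_sums) <- paste0(colnames(bottom_sums), "BOTTOM")
toy_df <- cbind(toy_df, bottom_sums)
```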


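One caveat with selecting each prefix through `grep()`: the pattern is unanchored, so a short prefix such as `Person_position_` also matches every `Person_position_Person_...` column, and `Product_position_Product_` picks up the `_LF` columns as well. If those should stay separate, a sketch of a different approach (not part of the answer above) is to give each BOTTOM column a group key by deleting its `BOTTOM_CENTER`/`BOTTOM_LEFT`/`BOTTOM_RIGHT` part, then sum the columns that share a key. `toy_df`, its values and the `_BOTTOM` naming of the new columns are hypothetical.

```r
# Hypothetical data, including an "_LF" column and a name with a space
toy_df <- data.frame(
  Product_position_Product_BOTTOM_LEFT          = c(1, 0),
  Product_position_Product_BOTTOM_RIGHT         = c(1, 1),
  Product_position_Product_BOTTOM_LEFT_LF       = c(0, 1),
  Product_position_Product_TOP_LEFT             = c(1, 1),
  `CTA_ShopNow_position_Shop Now_BOTTOM_CENTER` = c(1, 0),
  check.names = FALSE
)

# Columns whose name ends in BOTTOM_CENTER/LEFT/RIGHT, optionally followed by _LF
bottom_cols <- grep("_BOTTOM_(CENTER|LEFT|RIGHT)(_LF)?$", names(toy_df), value = TRUE)

# Group key = name with the BOTTOM_* part removed, e.g. "Product_position_Product_LF"
group_key <- sub("_BOTTOM_(CENTER|LEFT|RIGHT)", "", bottom_cols)

# One row-wise sum per key, then attach the sums as new columns
groups <- split(bottom_cols, group_key)
sums <- data.frame(lapply(groups, function(cols) rowSums(toy_df[cols], na.rm = TRUE)))
names(sums) <- paste0(names(groups), "_BOTTOM")
toy_df <- cbind(toy_df, sums)
```

Because every column belongs to exactly one key, no column is counted under two different prefixes.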

  • Published on February 10, 2023 at 04:49:00
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75404292.html