Group columns in dataframe by list and mutate in R

Question

I have a large dataframe containing binary columns. Here is a list of the column names:

```
[1] "imagetag_logos_position_Apple_BOTTOM_CENTER" "imagetag_logos_position_Apple_BOTTOM_LEFT" "imagetag_logos_position_Apple_BOTTOM_RIGHT" "imagetag_logos_position_Apple_CENTER" "imagetag_logos_position_Apple_CENTER_LEFT"
[6] "imagetag_logos_position_Apple_CENTER_RIGHT" "imagetag_logos_position_Apple_TOP_CENTER" "imagetag_logos_position_Apple_TOP_LEFT" "imagetag_logos_position_Apple_TOP_RIGHT" "imagetag_logos_position_Banana_BOTTOM_CENTER"
[11] "imagetag_logos_position_Banana_BOTTOM_LEFT" "imagetag_logos_position_Banana_BOTTOM_RIGHT" "imagetag_logos_position_Banana_CENTER_LEFT" "imagetag_logos_position_Banana_CENTER_RIGHT" "imagetag_logos_position_Banana_TOP_RIGHT"
[16] "imagetag_logos_position_Pear_BOTTOM_CENTER" "imagetag_logos_position_Pear_BOTTOM_LEFT" "imagetag_logos_position_Pear_BOTTOM_RIGHT" "imagetag_logos_position_Pear_CENTER" "imagetag_logos_position_Pear_CENTER_LEFT"
[21] "imagetag_logos_position_Pear_CENTER_RIGHT" "imagetag_logos_position_Pear_TOP_RIGHT" "imagetag_logos_position_Kiwi_BOTTOM_CENTER" "imagetag_logos_position_Kiwi_BOTTOM_LEFT" "imagetag_logos_position_Kiwi_BOTTOM_RIGHT"
[26] "imagetag_logos_position_Kiwi_CENTER_LEFT" "imagetag_logos_position_Kiwi_CENTER_RIGHT" "imagetag_logos_position_Kiwi_TOP_LEFT" "Product_position_Product_0" "Product_position_Product_BOTTOM_CENTER"
[31] "Product_position_Product_BOTTOM_LEFT" "Product_position_Product_BOTTOM_RIGHT" "Product_position_Product_CENTER" "Product_position_Product_CENTER_LEFT" "Product_position_Product_CENTER_RIGHT"
[36] "Product_position_Product_TOP_CENTER" "Product_position_Product_TOP_LEFT" "Product_position_Product_TOP_RIGHT" "Person_position_Person_0" "Person_position_Person_BOTTOM_CENTER"
[41] "Person_position_Person_BOTTOM_LEFT" "Person_position_Person_BOTTOM_RIGHT" "Person_position_Person_CENTER" "Person_position_Person_CENTER_LEFT" "Person_position_Person_CENTER_RIGHT"
[46] "Person_position_Person_TOP_CENTER" "Person_position_Person_TOP_LEFT" "Person_position_Person_TOP_RIGHT" "Logo_position_Logo_0" "Logo_position_Logo_BOTTOM_CENTER"
[51] "Logo_position_Logo_BOTTOM_LEFT" "Logo_position_Logo_BOTTOM_RIGHT" "Logo_position_Logo_CENTER" "Logo_position_Logo_CENTER_LEFT" "Logo_position_Logo_CENTER_RIGHT"
[56] "Logo_position_Logo_TOP_CENTER" "Logo_position_Logo_TOP_LEFT" "Logo_position_Logo_TOP_RIGHT" "CTA_ShopNow_position_Shop Now_0" "CTA_ShopNow_position_Shop Now_BOTTOM_CENTER"
[61] "CTA_ShopNow_position_Shop Now_BOTTOM_LEFT" "CTA_ShopNow_position_Shop Now_BOTTOM_RIGHT" "CTA_ShopNow_position_Shop Now_CENTER" "CTA_ShopNow_position_Shop Now_CENTER_LEFT" "CTA_ShopNow_position_Shop Now_CENTER_RIGHT"
[66] "CTA_ShopNow_position_Shop Now_TOP_CENTER" "CTA_ShopNow_position_Shop Now_TOP_RIGHT" "CTA_JoinNow_position_Join Now_0" "CTA_JoinNow_position_Join Now_BOTTOM_CENTER" "CTA_JoinNow_position_Join Now_BOTTOM_LEFT"
[71] "CTA_JoinNow_position_Join Now_BOTTOM_RIGHT" "CTA_JoinNow_position_Join Now_CENTER" "CTA_JoinNow_position_Join Now_CENTER_RIGHT" "CTA_JoinNow_position_Join Now_TOP_CENTER" "CTA_JoinNow_position_Join Now_TOP_RIGHT"
[76] "CTA_position_CTA_0" "CTA_position_CTA_BOTTOM_CENTER" "CTA_position_CTA_BOTTOM_LEFT" "CTA_position_CTA_BOTTOM_RIGHT" "CTA_position_CTA_CENTER"
[81] "CTA_position_CTA_CENTER_LEFT" "CTA_position_CTA_CENTER_RIGHT" "CTA_position_CTA_TOP_CENTER" "CTA_position_CTA_TOP_LEFT" "CTA_position_CTA_TOP_RIGHT"
[86] "Text_position_text_BOTTOM_CENTER" "Text_position_text_BOTTOM_LEFT" "Text_position_text_BOTTOM_RIGHT" "Text_position_text_CENTER" "Text_position_text_CENTER_LEFT"
[91] "Text_position_text_CENTER_RIGHT" "Text_position_text_TOP_CENTER" "Text_position_text_TOP_LEFT" "Text_position_text_TOP_RIGHT" "Product_position_Product_0_LF"
[96] "Product_position_Product_BOTTOM_CENTER_LF" "Product_position_Product_BOTTOM_LEFT_LF" "Product_position_Product_BOTTOM_RIGHT_LF" "Product_position_Product_CENTER_LF" "Product_position_Product_CENTER_LEFT_LF"
[101] "Product_position_Product_CENTER_RIGHT_LF" "Product_position_Product_TOP_CENTER_LF" "Product_position_Product_TOP_LEFT_LF" "Product_position_Product_TOP_RIGHT_LF" "Logo_position_Logo_0_LF"
```

I want to group some of these columns, for example, sum the columns that contain "BOTTOM_CENTER", "BOTTOM_RIGHT", or "BOTTOM_LEFT". However, the sums must be computed within each matching prefix: for example, one sum for imagetag_logos_position_Apple and a separate sum for imagetag_logos_position_Banana.

I did this to create a list of the unique prefixes:

```r
library(stringr)  # for str_extract()

prefix_list <- str_extract(colnames(positionsdf), ".+?(?=([A-Z])([A-Z])([A-Z]))")
prefix_list1 <- unique(prefix_list)
```

```
> prefix_list1
[1] "imagetag_logos_position_Apple_" "imagetag_logos_position_Banana_" "imagetag_logos_position_Kiwi_" "imagetag_logos_position_Pear_" NA "Product_position_Product_" "Person_position_Person_"
[8] "Logo_position_Logo_" "CTA_ShopNow_position_Shop Now_" "CTA_JoinNow_position_Join Now_" "CTA_position_" "Text_position_text_" "CTA_LearnMore_position_Learn More_" "Person_position_"
```
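The lookahead pattern above can be checked without stringr. The sketch below is a base-R stand-in using `regexpr(..., perl = TRUE)` on a few names copied from the column list; the helper name `extract_prefix` is hypothetical. It also shows where the `NA` in `prefix_list1` comes from: names such as "Product_position_Product_0" contain no run of three capital letters, so nothing matches.

```r
# A few column names copied from the question's list.
cols <- c("imagetag_logos_position_Apple_BOTTOM_CENTER",
          "imagetag_logos_position_Banana_TOP_RIGHT",
          "Product_position_Product_0",
          "CTA_position_CTA_0",
          "Logo_position_Logo_0_LF")

# Hypothetical helper: base-R stand-in for stringr::str_extract() with the
# same pattern. perl = TRUE is required for the (?=...) lookahead.
extract_prefix <- function(x) {
  m <- regexpr(".+?(?=([A-Z])([A-Z])([A-Z]))", x, perl = TRUE)
  out <- rep(NA_character_, length(x))  # NA where nothing matched
  out[m != -1] <- regmatches(x, m)      # regmatches() returns matches only
  out
}

prefixes <- extract_prefix(cols)
prefixes
# values: "imagetag_logos_position_Apple_", "imagetag_logos_position_Banana_",
#         NA, "CTA_position_", NA
```

The lazy `.+?` stops at the first position followed by three capitals, which is why "CTA_position_CTA_0" yields the shorter prefix "CTA_position_".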

I have tried several ways of grouping the data frame's columns by the strings in the list so that I can add the columns together, but I cannot figure out how to do it. %in% does not support partial matching, so I am not sure which other function to use.
Thanks!

```r
for(i in prefix_list1){
  sapply(positionsdf, function(x) i %in% x)
}
```
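`%in%` compares whole values for exact equality, and the loop above applies it to the contents of each column (the 0/1 values) rather than to the column names, so it can never find a prefix; its result is also discarded, because nothing inside a `for` loop is auto-printed. Partial matching on names can be done with base R's `startsWith()` (literal prefix test) or `grepl(..., fixed = TRUE)` (literal substring test). A minimal sketch on a small hypothetical stand-in for `positionsdf`:

```r
# Hypothetical stand-in for positionsdf. check.names = FALSE stops
# data.frame() from rewriting non-syntactic names (the real data has names
# with spaces, e.g. "CTA_ShopNow_position_Shop Now_0").
positionsdf <- data.frame(
  imagetag_logos_position_Apple_BOTTOM_CENTER = c(1, 0, 1),
  imagetag_logos_position_Apple_TOP_LEFT      = c(0, 1, 0),
  imagetag_logos_position_Banana_BOTTOM_LEFT  = c(1, 1, 0),
  check.names = FALSE
)
prefix <- "imagetag_logos_position_Apple_"

# Exact match only: no column is named exactly like the prefix.
in_exact <- prefix %in% names(positionsdf)                    # FALSE

# Literal prefix test on every column name.
in_prefix <- startsWith(names(positionsdf), prefix)           # TRUE TRUE FALSE

# Literal substring test (the prefix may appear anywhere in the name).
in_substr <- grepl(prefix, names(positionsdf), fixed = TRUE)  # TRUE TRUE FALSE

# The matching columns can then be selected by name.
positionsdf[in_prefix]
```

`startsWith()` is the stricter choice here, since the prefixes are meant to match only at the beginning of a name.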

Answer 1

Score: 0

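We may do:

```r
# prefix_list1 contains an NA (names without a match), so drop it first
sapply(prefix_list1[!is.na(prefix_list1)], function(pat) {
  # all columns whose names contain this prefix
  nm1 <- grep(pat, names(positionsdf), value = TRUE)
  # of those, keep only the BOTTOM_CENTER / BOTTOM_RIGHT / BOTTOM_LEFT columns
  nm2 <- grep("BOTTOM_(CENTER|RIGHT|LEFT)", nm1, value = TRUE)
  # row-wise sum of the selected binary columns
  rowSums(positionsdf[nm2], na.rm = TRUE)
})
```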
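To see what the answer's approach returns, and to cover the "mutate" part of the title, here is a sketch on hypothetical toy data. It is a variant of the answer, not the answer itself: it uses `startsWith()` instead of `grep()` for the prefix test, so prefixes are matched literally and only at the start of a name, and it appends each per-prefix sum to the data frame as a new column. The `<prefix>bottom_total` column names are an assumption.

```r
# Hypothetical toy data in the shape of the question's binary columns.
positionsdf <- data.frame(
  imagetag_logos_position_Apple_BOTTOM_CENTER = c(1, 0, 1),
  imagetag_logos_position_Apple_BOTTOM_LEFT   = c(1, 0, 0),
  imagetag_logos_position_Apple_TOP_LEFT      = c(0, 1, 0),
  imagetag_logos_position_Banana_BOTTOM_RIGHT = c(0, 1, 1),
  imagetag_logos_position_Banana_CENTER_LEFT  = c(1, 1, 1),
  check.names = FALSE
)
# Stand-in for prefix_list1, including the NA that str_extract() produces.
prefix_list1 <- c("imagetag_logos_position_Apple_",
                  "imagetag_logos_position_Banana_", NA)
prefixes <- prefix_list1[!is.na(prefix_list1)]

bottom_sums <- sapply(prefixes, function(pat) {
  # Columns whose names start with this prefix (literal, no regex).
  nm1 <- names(positionsdf)[startsWith(names(positionsdf), pat)]
  # Of those, keep only the BOTTOM_CENTER / BOTTOM_RIGHT / BOTTOM_LEFT ones.
  nm2 <- grep("BOTTOM_(CENTER|RIGHT|LEFT)", nm1, value = TRUE)
  rowSums(positionsdf[nm2], na.rm = TRUE)
})

# sapply() simplifies to a plain vector for a one-row data frame, so reshape
# explicitly into one column per prefix and name the new columns.
bottom_sums <- matrix(bottom_sums, nrow = nrow(positionsdf),
                      dimnames = list(NULL, paste0(prefixes, "bottom_total")))

# "Mutate": append the per-prefix sums as new columns.
positionsdf <- cbind(positionsdf, bottom_sums)
positionsdf[grep("bottom_total$", names(positionsdf))]
```

With the real prefixes, overlapping entries need care: "Person_position_" also matches the Person_position_Person_* columns, and "Product_position_Product_" also matches the *_LF columns, so their BOTTOM_* columns end up in the same sum unless the prefixes are made more specific.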


huangapple
  • Published on 2023-02-10 at 04:49:00
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75404292.html