英文:
Group columns in dataframe by list and mutate in R
问题
我有一个包含二进制列的大型数据框。以下是列名的列表:
[1] "imagetag_logos_position_Apple_BOTTOM_CENTER" "imagetag_logos_position_Apple_BOTTOM_LEFT" "imagetag_logos_position_Apple_BOTTOM_RIGHT" "imagetag_logos_position_Apple_CENTER" "imagetag_logos_position_Apple_CENTER_LEFT"
[6] "imagetag_logos_position_Apple_CENTER_RIGHT" "imagetag_logos_position_Apple_TOP_CENTER" "imagetag_logos_position_Apple_TOP_LEFT" "imagetag_logos_position_Apple_TOP_RIGHT" "imagetag_logos_position_Banana_BOTTOM_CENTER"
[11] "imagetag_logos_position_Banana_BOTTOM_LEFT" "imagetag_logos_position_Banana_BOTTOM_RIGHT" "imagetag_logos_position_Banana_CENTER_LEFT" "imagetag_logos_position_Banana_CENTER_RIGHT" "imagetag_logos_position_Banana_TOP_RIGHT"
[16] "imagetag_logos_position_Pear_BOTTOM_CENTER" "imagetag_logos_position_Pear_BOTTOM_LEFT" "imagetag_logos_position_Pear_BOTTOM_RIGHT" "imagetag_logos_position_Pear_CENTER" "imagetag_logos_position_Pear_CENTER_LEFT"
[21] "imagetag_logos_position_Pear_CENTER_RIGHT" "imagetag_logos_position_Pear_TOP_RIGHT" "imagetag_logos_position_Kiwi_BOTTOM_CENTER" "imagetag_logos_position_Kiwi_BOTTOM_LEFT" "imagetag_logos_position_Kiwi_BOTTOM_RIGHT"
[26] "imagetag_logos_position_Kiwi_CENTER_LEFT" "imagetag_logos_position_Kiwi_CENTER_RIGHT" "imagetag_logos_position_Kiwi_TOP_LEFT" "Product_position_Product_0" "Product_position_Product_BOTTOM_CENTER"
[31] "Product_position_Product_BOTTOM_LEFT" "Product_position_Product_BOTTOM_RIGHT" "Product_position_Product_CENTER" "Product_position_Product_CENTER_LEFT" "Product_position_Product_CENTER_RIGHT"
[36] "Product_position_Product_TOP_CENTER" "Product_position_Product_TOP_LEFT" "Product_position_Product_TOP_RIGHT" "Person_position_Person_0" "Person_position_Person_BOTTOM_CENTER"
[41] "Person_position_Person_BOTTOM_LEFT" "Person_position_Person_BOTTOM_RIGHT" "Person_position_Person_CENTER" "Person_position_Person_CENTER_LEFT" "Person_position_Person_CENTER_RIGHT"
[46] "Person_position_Person_TOP_CENTER" "Person_position_Person_TOP_LEFT" "Person_position_Person_TOP_RIGHT" "Logo_position_Logo_0" "Logo_position_Logo_BOTTOM_CENTER"
[51] "Logo_position_Logo_BOTTOM_LEFT" "Logo_position_Logo_BOTTOM_RIGHT" "Logo_position_Logo_CENTER" "Logo_position_Logo_CENTER_LEFT" "Logo_position_Logo_CENTER_RIGHT"
[56] "Logo_position_Logo_TOP_CENTER" "Logo_position_Logo_TOP_LEFT" "Logo_position_Logo_TOP_RIGHT" "CTA_ShopNow_position_Shop Now_0" "CTA_ShopNow_position_Shop Now_BOTTOM_CENTER"
[61] "CTA_ShopNow_position_Shop Now_BOTTOM_LEFT" "CTA_ShopNow_position_Shop Now_BOTTOM_RIGHT" "CTA_ShopNow_position_Shop Now_CENTER" "CTA_ShopNow_position_Shop Now_CENTER_LEFT" "CTA_ShopNow_position_Shop Now_CENTER_RIGHT"
[66] "CTA_ShopNow_position_Shop Now_TOP_CENTER" "CTA_ShopNow_position_Shop Now_TOP_RIGHT" "CTA_JoinNow_position_Join Now_0" "CTA_JoinNow_position_Join Now_BOTTOM_CENTER" "CTA_JoinNow_position_Join Now_BOTTOM_LEFT"
[71] "CTA_JoinNow_position_Join Now_BOTTOM_RIGHT" "CTA_JoinNow_position_Join Now_CENTER" "CTA_JoinNow_position_Join Now_CENTER_RIGHT" "CTA_JoinNow_position_Join Now_TOP_CENTER" "CTA_JoinNow_position_Join Now_TOP_RIGHT"
[76] "CTA_position_CTA_0" "CTA_position_CTA_BOTTOM_CENTER" "CTA_position_CTA_BOTTOM_LEFT" "CTA_position_CTA_BOTTOM_RIGHT" "CTA_position_CTA_CENTER"
[81] "CTA_position_CTA_CENTER_LEFT" "CTA_position_CTA_CENTER_RIGHT" "CTA_position_CTA_TOP_CENTER" "CTA_position_CTA_TOP_LEFT" "CTA_position_CTA_TOP_RIGHT"
[86] "Text_position_text_BOTTOM_CENTER" "Text_position_text_BOTTOM_LEFT" "Text_position_text_BOTTOM_RIGHT" "Text_position_text_CENTER" "Text_position_text_CENTER_LEFT"
[91] "Text_position_text_CENTER_RIGHT" "Text_position_text_TOP_CENTER" "Text_position_text_TOP_LEFT" "Text_position_text_TOP_RIGHT" "Product_position_Product_0_LF"
[96] "Product_position_Product_BOTTOM_CENTER_LF" "Product_position_Product_BOTTOM_LEFT_LF" "Product_position_Product_BOTTOM_RIGHT_LF" "Product_position_Product_CENTER_LF" "Product_position_Product_CENTER_LEFT_LF"
[101] "Product_position_Product_CENTER_RIGHT_LF" "Product_position_Product_TOP_CENTER_LF" "Product_position_Product_TOP_LEFT_LF" "Product_position_Product_TOP_RIGHT_LF" "Logo_position_Logo_0_LF"
我想对其中一些列进行分组,例如对包含 "BOTTOM_CENTER"、"BOTTOM_RIGHT"、"BOTTOM_LEFT" 的列进行求和。但是,我需要在每个匹配的前缀内进行分组,例如,仅对 imagetag_logos_position_Apple 进行求和,对 imagetag_logos_position_Banana 进行单独求和。
我已经尝试过不同的方法来使数据框按列表中的字符串进行分组,以便执行列的相加操作,但是似乎无法弄清楚如何进行此操作。%in%
操作符不支持部分匹配,所以我不确定要使用哪个其他函数。
谢谢!
for(i in prefix_list1){
sapply(positionsdf, function(x) i %in% x)
}
英文:
I have a large dataframe containing binary columns. Here is a list of the column names:
[1] "imagetag_logos_position_Apple_BOTTOM_CENTER" "imagetag_logos_position_Apple_BOTTOM_LEFT" "imagetag_logos_position_Apple_BOTTOM_RIGHT" "imagetag_logos_position_Apple_CENTER" "imagetag_logos_position_Apple_CENTER_LEFT"
[6] "imagetag_logos_position_Apple_CENTER_RIGHT" "imagetag_logos_position_Apple_TOP_CENTER" "imagetag_logos_position_Apple_TOP_LEFT" "imagetag_logos_position_Apple_TOP_RIGHT" "imagetag_logos_position_Banana_BOTTOM_CENTER"
[11] "imagetag_logos_position_Banana_BOTTOM_LEFT" "imagetag_logos_position_Banana_BOTTOM_RIGHT" "imagetag_logos_position_Banana_CENTER_LEFT" "imagetag_logos_position_Banana_CENTER_RIGHT" "imagetag_logos_position_Banana_TOP_RIGHT"
[16] "imagetag_logos_position_Pear_BOTTOM_CENTER" "imagetag_logos_position_Pear_BOTTOM_LEFT" "imagetag_logos_position_Pear_BOTTOM_RIGHT" "imagetag_logos_position_Pear_CENTER" "imagetag_logos_position_Pear_CENTER_LEFT"
[21] "imagetag_logos_position_Pear_CENTER_RIGHT" "imagetag_logos_position_Pear_TOP_RIGHT" "imagetag_logos_position_Kiwi_BOTTOM_CENTER" "imagetag_logos_position_Kiwi_BOTTOM_LEFT" "imagetag_logos_position_Kiwi_BOTTOM_RIGHT"
[26] "imagetag_logos_position_Kiwi_CENTER_LEFT" "imagetag_logos_position_Kiwi_CENTER_RIGHT" "imagetag_logos_position_Kiwi_TOP_LEFT" "Product_position_Product_0" "Product_position_Product_BOTTOM_CENTER"
[31] "Product_position_Product_BOTTOM_LEFT" "Product_position_Product_BOTTOM_RIGHT" "Product_position_Product_CENTER" "Product_position_Product_CENTER_LEFT" "Product_position_Product_CENTER_RIGHT"
[36] "Product_position_Product_TOP_CENTER" "Product_position_Product_TOP_LEFT" "Product_position_Product_TOP_RIGHT" "Person_position_Person_0" "Person_position_Person_BOTTOM_CENTER"
[41] "Person_position_Person_BOTTOM_LEFT" "Person_position_Person_BOTTOM_RIGHT" "Person_position_Person_CENTER" "Person_position_Person_CENTER_LEFT" "Person_position_Person_CENTER_RIGHT"
[46] "Person_position_Person_TOP_CENTER" "Person_position_Person_TOP_LEFT" "Person_position_Person_TOP_RIGHT" "Logo_position_Logo_0" "Logo_position_Logo_BOTTOM_CENTER"
[51] "Logo_position_Logo_BOTTOM_LEFT" "Logo_position_Logo_BOTTOM_RIGHT" "Logo_position_Logo_CENTER" "Logo_position_Logo_CENTER_LEFT" "Logo_position_Logo_CENTER_RIGHT"
[56] "Logo_position_Logo_TOP_CENTER" "Logo_position_Logo_TOP_LEFT" "Logo_position_Logo_TOP_RIGHT" "CTA_ShopNow_position_Shop Now_0" "CTA_ShopNow_position_Shop Now_BOTTOM_CENTER"
[61] "CTA_ShopNow_position_Shop Now_BOTTOM_LEFT" "CTA_ShopNow_position_Shop Now_BOTTOM_RIGHT" "CTA_ShopNow_position_Shop Now_CENTER" "CTA_ShopNow_position_Shop Now_CENTER_LEFT" "CTA_ShopNow_position_Shop Now_CENTER_RIGHT"
[66] "CTA_ShopNow_position_Shop Now_TOP_CENTER" "CTA_ShopNow_position_Shop Now_TOP_RIGHT" "CTA_JoinNow_position_Join Now_0" "CTA_JoinNow_position_Join Now_BOTTOM_CENTER" "CTA_JoinNow_position_Join Now_BOTTOM_LEFT"
[71] "CTA_JoinNow_position_Join Now_BOTTOM_RIGHT" "CTA_JoinNow_position_Join Now_CENTER" "CTA_JoinNow_position_Join Now_CENTER_RIGHT" "CTA_JoinNow_position_Join Now_TOP_CENTER" "CTA_JoinNow_position_Join Now_TOP_RIGHT"
[76] "CTA_position_CTA_0" "CTA_position_CTA_BOTTOM_CENTER" "CTA_position_CTA_BOTTOM_LEFT" "CTA_position_CTA_BOTTOM_RIGHT" "CTA_position_CTA_CENTER"
[81] "CTA_position_CTA_CENTER_LEFT" "CTA_position_CTA_CENTER_RIGHT" "CTA_position_CTA_TOP_CENTER" "CTA_position_CTA_TOP_LEFT" "CTA_position_CTA_TOP_RIGHT"
[86] "Text_position_text_BOTTOM_CENTER" "Text_position_text_BOTTOM_LEFT" "Text_position_text_BOTTOM_RIGHT" "Text_position_text_CENTER" "Text_position_text_CENTER_LEFT"
[91] "Text_position_text_CENTER_RIGHT" "Text_position_text_TOP_CENTER" "Text_position_text_TOP_LEFT" "Text_position_text_TOP_RIGHT" "Product_position_Product_0_LF"
[96] "Product_position_Product_BOTTOM_CENTER_LF" "Product_position_Product_BOTTOM_LEFT_LF" "Product_position_Product_BOTTOM_RIGHT_LF" "Product_position_Product_CENTER_LF" "Product_position_Product_CENTER_LEFT_LF"
[101] "Product_position_Product_CENTER_RIGHT_LF" "Product_position_Product_TOP_CENTER_LF" "Product_position_Product_TOP_LEFT_LF" "Product_position_Product_TOP_RIGHT_LF" "Logo_position_Logo_0_LF"
I want to group some of these columns, for example sum the columns that contain "BOTTOM_CENTER", "BOTTOM_RIGHT", "BOTTOM_LEFT". However I need to group them within each prefix that matches, for example, only sum for imagetag_logos_position_Apple, and a separate sum for imagetag_logos_position_Banana.
I did this to create a list of the unique prefixes:
prefix_list <- str_extract(colnames(positionsdf),".+?(?=([A-Z])([A-Z])([A-Z]))")
prefix_list1 <- unique(prefix_list)
> prefix_list1
[1] "imagetag_logos_position_Apple_" "imagetag_logos_position_Banana_" "imagetag_logos_position_Kiwi_" "imagetag_logos_position_Pear_" NA "Product_position_Product_" "Person_position_Person_"
[8] "Logo_position_Logo_" "CTA_ShopNow_position_Shop Now_" "CTA_JoinNow_position_Join Now_" "CTA_position_" "Text_position_text_" "CTA_LearnMore_position_Learn More_" "Person_position_"
I have tried different ways to get the dataframe to group by the string in the list so that I can perform the addition of columns but can not seem to figure out how to go about this. %in% will not support partial match so I am not sure what other function to use
Thanks!
for(i in prefix_list1){
sapply(positionsdf, function(x) i %in% x)
}
答案1
得分: 0
可能执行以下操作:
sapply(prefix_list1, function(pat) {
nm1 <- grep(pat, names(positions_df), value = TRUE)
nm2 <- grep("BOTTOM_(CENTER|RIGHT|LEFT)", nm1, value = TRUE)
rowSums(positions_df[nm2], na.rm = TRUE)
})
<details>
<summary>英文:</summary>
We may do
sapply(prefix_list1, function(pat) {
nm1 <- grep(pat, names(positions_df), value = TRUE)
nm2 <- grep("BOTTOM_(CENTER|RIGHT|LEFT)", nm1, value = TRUE)
rowSums(positions_df[nm2], na.rm = TRUE)
})
</details>
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论