Split on colons and commas but ignore commas inside brackets in R?
Question
I have a data frame, and I want to split the strings in the Mem column by commas and colons. Here's my example:
df <- data.frame(ID=c("AM", "UA", "AS"),
Mem = c("WRR(World Happiness Report Index,WHRI)(Cs):1470,Country(%):60.2,UAM(The Star Spangled Banner,TSSB)(s):1380,City(%):69.7,TSSB/Cs(%):93.88,Note:pass",
"WRR(World Happiness Report Index,WHRI)(Cs):2280,Country(%):96.2,UAM(The Star Spangled Banner,TSSB)(s):2010,City(%):107.5,TSSB/Cs(%):88.16,Note:pass",
"WRR(World Happiness Report Index,WHRI)(Cs):3170,Country(%):101.6,UAM(The Star Spangled Banner,TSSB)(s):2950,City(%):95.5,TSSB/Cs(%):93.06,Note:pass"))
I want to split the strings in the Mem column by colon and comma, while keeping the commas inside brackets intact, so that each name becomes a column. The result should be:
ID WRR(World Happiness Report Index,WHRI)(Cs) Country(%) UAM(The Star Spangled Banner,TSSB)(s) City(%) TSSB/Cs(%) Note
1: AM 1470 60.2 1380 69.7 93.88 pass
2: UA 2280 96.2 2010 107.5 88.16 pass
3: AS 3170 101.6 2950 95.5 93.06 pass
Any help would be greatly appreciated!
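One way to address the "ignore brackets" part of the title directly is sketched below in base R; it is not taken from either answer. It splits only on commas that are not inside parentheses (using a PCRE negative lookahead, which needs perl = TRUE), then cuts each key:value pair at its last colon. The helper name split_mem is hypothetical, and the sketch assumes every row lists the same keys in the same order and that parentheses are never nested.

```r
# Hypothetical base-R sketch (not from either answer): split on commas that
# are outside parentheses, then cut each "key:value" pair at its last colon.
df <- data.frame(ID = c("AM", "UA", "AS"),
  Mem = c("WRR(World Happiness Report Index,WHRI)(Cs):1470,Country(%):60.2,UAM(The Star Spangled Banner,TSSB)(s):1380,City(%):69.7,TSSB/Cs(%):93.88,Note:pass",
          "WRR(World Happiness Report Index,WHRI)(Cs):2280,Country(%):96.2,UAM(The Star Spangled Banner,TSSB)(s):2010,City(%):107.5,TSSB/Cs(%):88.16,Note:pass",
          "WRR(World Happiness Report Index,WHRI)(Cs):3170,Country(%):101.6,UAM(The Star Spangled Banner,TSSB)(s):2950,City(%):95.5,TSSB/Cs(%):93.06,Note:pass"),
  stringsAsFactors = FALSE)

split_mem <- function(x) {
  # A comma is a separator unless the next parenthesis after it is ")",
  # i.e. unless it sits inside a (...) group. Lookaheads need perl = TRUE.
  pairs <- strsplit(x, ",(?![^()]*\\))", perl = TRUE)
  rows <- lapply(pairs, function(p) {
    vals <- sub("^.*:", "", p)            # text after the last colon
    names(vals) <- sub(":[^:]*$", "", p)  # text before the last colon
    vals
  })
  # Assumes every row has the same keys in the same order
  out <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
  # "1470" -> integer, "60.2" -> double, "pass" stays character
  out[] <- lapply(out, type.convert, as.is = TRUE)
  out
}

res <- data.frame(ID = df$ID, split_mem(df$Mem), check.names = FALSE)
print(res)
```

check.names = FALSE keeps names such as "Country(%)" as they are instead of turning them into syntactic names like "Country...".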
Answer 1
Score: 1
The pattern ":[^,]+,*" separates the variable names from their values. It means:
- a colon, ":";
- followed by one or more ("+") characters other than a comma ("[^,]");
- then zero or more ("*") commas, so a trailing comma is consumed when there is one.
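To see exactly what ":[^,]+,*" consumes, this small check (base R's regmatches() and gregexpr(), used here instead of stringr) lists every match in the first row. Each match starts at a colon and cannot cross a comma, so the commas inside the bracketed names are never touched.

```r
x <- "WRR(World Happiness Report Index,WHRI)(Cs):1470,Country(%):60.2,UAM(The Star Spangled Banner,TSSB)(s):1380,City(%):69.7,TSSB/Cs(%):93.88,Note:pass"

# List every substring that ":[^,]+,*" consumes: a colon, the value
# (no commas allowed), then any trailing commas.
matches <- regmatches(x, gregexpr(":[^,]+,*", x))[[1]]
print(matches)
```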
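We can then save these names in a variable with:
library(tidyverse)  # attaches stringr (str_split_1() needs stringr >= 1.5.0), dplyr and tidyr
variables <- str_split_1(df$Mem[1], ":[^,]+,*") %>% head(-1)
Note: the pattern also matches the final ":pass", which leaves an empty string as the last element; head(-1) removes it.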
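For comparison, here is a base-R sketch of the same step (not part of the original answer). The strsplit() documentation states that a match at the very end of the string does not produce a trailing empty element, so no head(-1) is needed there.

```r
x <- "WRR(World Happiness Report Index,WHRI)(Cs):1470,Country(%):60.2,UAM(The Star Spangled Banner,TSSB)(s):1380,City(%):69.7,TSSB/Cs(%):93.88,Note:pass"

# Base R counterpart of str_split_1(...) %>% head(-1): the final ":pass"
# match at the end of the string simply disappears, so nothing is dropped.
variables <- strsplit(x, ":[^,]+,*")[[1]]
print(variables)
```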
Then, to get the values, we want the rest of the string. We can get it by removing every element of variables from the string. I don't know if there's already a function that does this, but here is a custom one:
# Remove every element of patterns from x, matching each one literally (fixed())
str_remove_multiple <- function(x, patterns){
  for(i in patterns) x <- str_remove_all(x, fixed(i))
  x
}
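If stringr is not available, a base-R counterpart can fold over the patterns with Reduce() and gsub(fixed = TRUE). The name remove_multiple_base is hypothetical; this is a sketch of the same idea, not the answer's function.

```r
# Hypothetical base-R counterpart of str_remove_multiple(): remove each
# pattern in turn, matching it literally (fixed = TRUE), not as a regex.
remove_multiple_base <- function(x, patterns) {
  Reduce(function(acc, p) gsub(p, "", acc, fixed = TRUE), patterns, init = x)
}

x <- "WRR(World Happiness Report Index,WHRI)(Cs):1470,Country(%):60.2,UAM(The Star Spangled Banner,TSSB)(s):1380,City(%):69.7,TSSB/Cs(%):93.88,Note:pass"
variables <- c("WRR(World Happiness Report Index,WHRI)(Cs)", "Country(%)",
               "UAM(The Star Spangled Banner,TSSB)(s)", "City(%)",
               "TSSB/Cs(%)", "Note")

# Remove the names and the colons, leaving only the comma-separated values
cleaned <- remove_multiple_base(x, c(variables, ":"))
print(cleaned)
```

As with the loop version, the removals happen in order, so if one name were a substring of another, removing the shorter one first would corrupt the longer one; that does not happen with these names.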
After "cleaning" the Mem variable, we can split it on the remaining commas and save each value to a new column named after the corresponding element of variables:
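df %>%
  # strip the names and the colons, leaving e.g. "1470,60.2,1380,69.7,93.88,pass"
  mutate(Mem = str_remove_multiple(Mem, c(variables, ":"))) %>%
  # split on the remaining commas, one column per name
  separate(Mem, into = variables, sep = ",") %>%
  # every column except ID and Note holds numbers
  mutate(across(-c(ID, Note), as.numeric))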
Result:
ID WRR(World Happiness Report Index,WHRI)(Cs) Country(%) UAM(The Star Spangled Banner,TSSB)(s) City(%) TSSB/Cs(%) Note
1 AM 1470 60.2 1380 69.7 93.88 pass
2 UA 2280 96.2 2010 107.5 88.16 pass
3 AS 3170 101.6 2950 95.5 93.06 pass
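The separate() and across() steps can also be mirrored in base R. This sketch is not part of the original answer; it starts from the cleaned strings (hard-coded here for illustration, matching what the removal step produces for the example data).

```r
variables <- c("WRR(World Happiness Report Index,WHRI)(Cs)", "Country(%)",
               "UAM(The Star Spangled Banner,TSSB)(s)", "City(%)",
               "TSSB/Cs(%)", "Note")
ID <- c("AM", "UA", "AS")
# Strings as they look after the names and colons have been removed
cleaned <- c("1470,60.2,1380,69.7,93.88,pass",
             "2280,96.2,2010,107.5,88.16,pass",
             "3170,101.6,2950,95.5,93.06,pass")

# Split each string on "," and stack the pieces: one row per string
vals <- do.call(rbind, strsplit(cleaned, ",", fixed = TRUE))
colnames(vals) <- variables
out <- data.frame(ID = ID, vals, check.names = FALSE, stringsAsFactors = FALSE)

# Mirror mutate(across(-c(ID, Note), as.numeric))
num_cols <- setdiff(names(out), c("ID", "Note"))
out[num_cols] <- lapply(out[num_cols], as.numeric)
print(out)
```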
Answer 2
Score: 1
Can this help?
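library(data.table)
# Split on each run of non-digit/non-dot characters ending in ":"; piece 1 is "", so keep 2:7
dt = setDT(tstrsplit(df$Mem, "([^0-9.]+:)", keep=2:7, type.convert=TRUE))
setnames(dt, unlist(strsplit(df$Mem[1], "(:[^,]+,)|(:pass$)")))  # names from the first row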
WRR(World Happiness Report Index,WHRI)(Cs) Country(%) UAM(The Star Spangled Banner,TSSB)(s) City(%) TSSB/Cs(%) Note
<int> <num> <int> <num> <num> <char>
1: 1470 60.2 1380 69.7 93.88 pass
2: 2280 96.2 2010 107.5 88.16 pass
3: 3170 101.6 2950 95.5 93.06 pass
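To see why keep=2:7 is used, this base-R check shows the pieces produced by the value pattern on the first row: the match at the very start leaves an empty first piece. The pattern relies on the names containing no digits or dots. The second split is a hypothetical generalisation of the name pattern (the same idea as Answer 1) that avoids hard-coding ":pass$". Note also that the output above has no ID column; with data.table, dt[, ID := df$ID] would add it back.

```r
x <- "WRR(World Happiness Report Index,WHRI)(Cs):1470,Country(%):60.2,UAM(The Star Spangled Banner,TSSB)(s):1380,City(%):69.7,TSSB/Cs(%):93.88,Note:pass"

# Values: "[^0-9.]+:" consumes each name plus its colon (and the comma in
# front of it), assuming names never contain digits or dots.
pieces <- strsplit(x, "[^0-9.]+:")[[1]]
print(pieces)

# Hypothetical generalisation for the names: a colon, the value, then an
# optional comma, so the last value does not need to be "pass".
nms <- strsplit(x, ":[^,]+,?")[[1]]
print(nms)
```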