Split by colon and comma but ignore brackets in R?

Question

I have a data frame, and I want to split the strings in the Mem column by commas and colons. Here's my example:

    df <- data.frame(ID = c("AM", "UA", "AS"),
                     Mem = c("WRR(World Happiness Report Index,WHRI)(Cs):1470,Country(%):60.2,UAM(The Star Spangled Banner,TSSB)(s):1380,City(%):69.7,TSSB/Cs(%):93.88,Note:pass",
                             "WRR(World Happiness Report Index,WHRI)(Cs):2280,Country(%):96.2,UAM(The Star Spangled Banner,TSSB)(s):2010,City(%):107.5,TSSB/Cs(%):88.16,Note:pass",
                             "WRR(World Happiness Report Index,WHRI)(Cs):3170,Country(%):101.6,UAM(The Star Spangled Banner,TSSB)(s):2950,City(%):95.5,TSSB/Cs(%):93.06,Note:pass"))

I want to split the strings in the Mem column by colon and comma. The result should be:

       ID WRR(World Happiness Report Index,WHRI)(Cs) Country(%) UAM(The Star Spangled Banner,TSSB)(s) City(%) TSSB/Cs(%) Note
    1: AM                                       1470       60.2                                  1380    69.7      93.88 pass
    2: UA                                       2280       96.2                                  2010   107.5      88.16 pass
    3: AS                                       3170      101.6                                  2950    95.5      93.06 pass

Any help would be greatly appreciated!

Answer 1

Score: 1

The pattern ":[^,]+,*" can separate the variable names from their values. It means:

  • A colon (:);
  • Followed by one or more (+) characters that are not commas ([^,]);
  • Then zero or more (*) commas (,).

We can then save these names in a variable with:

    library(tidyverse)  # loads stringr, dplyr, tidyr and the %>% pipe
    variables <- str_split_1(df$Mem[1], ":[^,]+,*") %>% head(-1)

Note: the last match sits at the very end of the string, so the split leaves an empty string after it; head(-1) drops that empty element.
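
To see where that trailing piece comes from, it can help to list what the pattern actually matches. The following is a minimal sketch in base R on a shortened, made-up string in the same format; the names x, pattern, matched and variables_base are illustrative and not part of the answer.

```r
# Shortened, made-up string in the same "name:value,name:value" format
x <- "A(x,y)(Cs):1470,B(%):60.2,Note:pass"
pattern <- ":[^,]+,*"

# Every ":value," chunk the pattern matches; the last match ends the string
matched <- regmatches(x, gregexpr(pattern, x))[[1]]
print(matched)

# Base strsplit() does not return an empty piece after a match at the end
# of the input, so this variant needs no head(-1)
variables_base <- strsplit(x, pattern)[[1]]
print(variables_base)
```

The comma inside A(x,y) is left alone because the pattern only starts matching at a colon.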

Then, to get the values, we want the rest of the string. We can do that by removing every element of variables from it. I don't know if there's already a function that does this, but here is a custom one:

    # Remove every literal (non-regex) pattern in `patterns` from `x`
    str_remove_multiple <- function(x, patterns) {
      for (i in patterns) x <- str_remove_all(x, fixed(i))
      x
    }
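
Literal matching matters here because the variable names contain characters such as ( and ) that a regular expression would treat as grouping syntax. A rough base-R equivalent of the same idea, shown on a made-up string, could look like this; remove_fixed is a hypothetical helper name, not a stringr function.

```r
# Hypothetical base-R counterpart of the helper above:
# remove each pattern in turn; fixed = TRUE turns off regex interpretation
remove_fixed <- function(x, patterns) {
  for (p in patterns) x <- gsub(p, "", x, fixed = TRUE)
  x
}

# Made-up input in the same format as Mem
x <- "A(x,y)(Cs):1470,B(%):60.2,Note:pass"
cleaned <- remove_fixed(x, c("A(x,y)(Cs)", "B(%)", "Note", ":"))
print(cleaned)
```

Only the values and the commas between them remain, which is what the later split on "," relies on.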

After "cleaning" the Mem variable, we can split it on the remaining "," and save each value to a new column named after the elements of variables:

    df %>%
      mutate(Mem = str_remove_multiple(Mem, c(variables, ":"))) %>%
      separate(Mem, into = variables, sep = ",") %>%
      mutate(across(-c(ID, Note), as.numeric))

Result:

      ID WRR(World Happiness Report Index,WHRI)(Cs) Country(%) UAM(The Star Spangled Banner,TSSB)(s) City(%) TSSB/Cs(%) Note
    1 AM                                       1470       60.2                                  1380    69.7      93.88 pass
    2 UA                                       2280       96.2                                  2010   107.5      88.16 pass
    3 AS                                       3170      101.6                                  2950    95.5      93.06 pass
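
As a cross-check on this result, a different technique is to split directly on commas that sit outside parentheses, using a negative lookahead with perl = TRUE. This is only a sketch in base R on made-up rows: it assumes brackets are never nested and that values contain no colon, and the names mem, fields, col_names and wide are illustrative.

```r
# Two made-up rows in the same format as Mem
mem <- c("A(x,y)(Cs):1470,B(%):60.2,Note:pass",
         "A(x,y)(Cs):2280,B(%):96.2,Note:pass")

# Split on a comma unless it is followed by ")" with no parenthesis in
# between, i.e. unless the comma is inside a (...) group
fields <- strsplit(mem, ",(?![^()]*\\))", perl = TRUE)

# In each "name:value" field the name is everything before the last colon
col_names <- sub(":[^:]*$", "", fields[[1]])
values <- lapply(fields, function(f) sub("^.*:", "", f))

# Stack the rows, name the columns, and convert numeric-looking columns
wide <- as.data.frame(do.call(rbind, values), stringsAsFactors = FALSE)
names(wide) <- col_names
wide[] <- lapply(wide, type.convert, as.is = TRUE)
print(wide)
```

The column names are taken from the first row only, so this assumes every row lists the same fields in the same order.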

Answer 2

Score: 1

Can this help?

    library(data.table)
    dt <- setDT(tstrsplit(df$Mem, "([^0-9.]+:)", keep=2:7, type.convert=TRUE))
    setnames(dt, unlist(strsplit(df$Mem[1], "(:[^,]+,)|(:pass$)")))

Result:

       WRR(World Happiness Report Index,WHRI)(Cs) Country(%) UAM(The Star Spangled Banner,TSSB)(s) City(%) TSSB/Cs(%)   Note
                                            <int>      <num>                                 <int>   <num>      <num> <char>
    1:                                       1470       60.2                                  1380    69.7      93.88   pass
    2:                                       2280       96.2                                  2010   107.5      88.16   pass
    3:                                       3170      101.6                                  2950    95.5      93.06   pass
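
The name pattern in setnames() ends with :pass$, so it relies on the last value being pass. A possible variation, assuming the same name:value layout, replaces that branch with a generic "comma or end of string" alternative. The sketch below uses base R and a made-up string; x and col_names are illustrative names.

```r
# Made-up string whose last value is not "pass"
x <- "A(x,y)(Cs):1470,B(%):60.2,Note:fail"

# ":value" followed by either a comma or the end of the string
col_names <- strsplit(x, ":[^,]+(,|$)")[[1]]
print(col_names)
```

The resulting vector could be passed to setnames() in place of the original strsplit() call.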

huangapple
  • Posted on June 16, 2023, 10:19:47
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76486582.html