June 16, 2023 10:19:47

Split on colons and commas but ignore brackets in R?

Question

I have a data frame and want to split the strings in the Mem column on commas and colons, without breaking at the commas inside parentheses. Here is my example:

df <- data.frame(ID = c("AM", "UA", "AS"),
                 Mem = c("WRR(World Happiness Report Index,WHRI)(Cs):1470,Country(%):60.2,UAM(The Star Spangled Banner,TSSB)(s):1380,City(%):69.7,TSSB/Cs(%):93.88,Note:pass",
                         "WRR(World Happiness Report Index,WHRI)(Cs):2280,Country(%):96.2,UAM(The Star Spangled Banner,TSSB)(s):2010,City(%):107.5,TSSB/Cs(%):88.16,Note:pass",
                         "WRR(World Happiness Report Index,WHRI)(Cs):3170,Country(%):101.6,UAM(The Star Spangled Banner,TSSB)(s):2950,City(%):95.5,TSSB/Cs(%):93.06,Note:pass"))

The text before each colon should become a column name, and the text after it the value in that column. The expected result is:

    ID  WRR(World Happiness Report Index,WHRI)(Cs)  Country(%)  UAM(The Star Spangled Banner,TSSB)(s)  City(%)  TSSB/Cs(%)  Note
1:  AM  1470  60.2  1380   69.7  93.88  pass
2:  UA  2280  96.2  2010  107.5  88.16  pass
3:  AS  3170 101.6  2950   95.5  93.06  pass

Any help would be greatly appreciated!

Answer 1

Score: 1

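The pattern ":[^,]+,*" separates the variable names from their values. It matches:

- a colon (:);
- followed by one or more (+) characters other than a comma ([^,]);
- then zero or more (*) commas (,).

Splitting the first row on this pattern leaves only the names, which we can save in a variable:

library(dplyr)
library(stringr)  # str_split_1() needs stringr >= 1.5.0
library(tidyr)

variables <- str_split_1(df$Mem[1], ":[^,]+,*") %>% head(-1)

Note: the split leaves an empty string at the end, hence the head(-1). The names come from the first row only, so this assumes every row lists the same fields in the same order.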
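
To see what this pattern leaves behind, here is a minimal sketch in base R on a shortened, made-up string (x and field_names are illustrative names, not part of the answer). Base strsplit() drops the trailing empty piece on its own, so this variant needs no head(-1):

```r
# Illustrative input in the question's format (not the real data).
x <- "A(x,y)(u):1470,B(%):60.2,Note:pass"

# ":[^,]+,*" matches a colon, the value after it, and any commas that follow,
# so only the field names survive the split. The commas inside the
# parentheses are untouched because every match has to start at a colon.
field_names <- strsplit(x, ":[^,]+,*")[[1]]

# field_names is now c("A(x,y)(u)", "B(%)", "Note")
print(field_names)
```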

To get the values, we want the rest of the string, which we obtain by removing every element of variables from it. I don't know whether a function for this already exists, so here is a custom one:

# Remove every literal pattern in `patterns` from each string in `x`.
str_remove_multiple <- function(x, patterns) {
  for (i in patterns) x <- str_remove_all(x, fixed(i))
  x
}
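
The same removal step can be sketched in base R, which may help to check the idea without stringr. The helper name remove_literals and the sample input are assumptions made for illustration. Matching literally (fixed = TRUE) matters here because the field names contain parentheses, which would otherwise be read as regex syntax:

```r
# Base R analogue of the helper above: delete each literal pattern in turn.
remove_literals <- function(x, patterns) {
  for (p in patterns) x <- gsub(p, "", x, fixed = TRUE)
  x
}

# Illustrative input and field names (not the question's real data).
x <- "A(x,y)(u):1470,B(%):60.2,Note:pass"
field_names <- c("A(x,y)(u)", "B(%)", "Note")

# Strip the names and the colons; only the comma-separated values remain.
values <- remove_literals(x, c(field_names, ":"))
pieces <- strsplit(values, ",")[[1]]

print(values)  # "1470,60.2,pass"
print(pieces)  # "1470" "60.2" "pass"
```

One caveat of this approach: a value that happens to contain one of the field names would be altered as well.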

After "cleaning" the Mem column this way, we can split it on the remaining commas and save each value to a new column named after variables:

df %>%
  # strip the names and the colons, leaving "1470,60.2,..."
  mutate(Mem = str_remove_multiple(Mem, c(variables, ":"))) %>%
  # one new column per value
  separate(Mem, into = variables, sep = ",") %>%
  # every column except ID and Note holds numbers
  mutate(across(-c(ID, Note), as.numeric))

Result:

  ID WRR(World Happiness Report Index,WHRI)(Cs) Country(%) UAM(The Star Spangled Banner,TSSB)(s) City(%) TSSB/Cs(%) Note
1 AM                                       1470       60.2                                  1380    69.7      93.88 pass
2 UA                                       2280       96.2                                  2010   107.5      88.16 pass
3 AS                                       3170      101.6                                  2950    95.5      93.06 pass

Answer 2

Score: 1

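Can this help?

library(data.table)

# Split each string on the "name:" runs; piece 1 is empty, pieces 2 to 7 are the values.
dt <- setDT(tstrsplit(df$Mem, "([^0-9.]+:)", keep = 2:7, type.convert = TRUE))
# Take the column names from the first row by splitting on ":value," and the final ":pass".
setnames(dt, unlist(strsplit(df$Mem[1], "(:[^,]+,)|(:pass$)")))

Result:

   WRR(World Happiness Report Index,WHRI)(Cs) Country(%) UAM(The Star Spangled Banner,TSSB)(s) City(%) TSSB/Cs(%)   Note
                                        <int>      <num>                                 <int>   <num>      <num> <char>
1:                                       1470       60.2                                  1380    69.7      93.88   pass
2:                                       2280       96.2                                  2010   107.5      88.16   pass
3:                                       3170      101.6                                  2950    95.5      93.06   pass

The result has no ID column; if needed, dt[, ID := df$ID] adds it. The :pass$ part of the second pattern relies on the first row ending in "pass".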
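
A different route, not taken from either answer above and offered only as a sketch: split on the commas that sit outside parentheses, then cut each name:value pair at its last colon. The helper name split_pairs and the sample strings are made up for illustration, and the lookahead assumes the parentheses are balanced and never nested.

```r
# Split one string into a named character vector of values (base R only).
split_pairs <- function(x) {
  # A comma is a separator unless a ")" follows it before any "(" or ")",
  # which is the case exactly when the comma is inside parentheses.
  pairs <- strsplit(x, ",(?![^()]*\\))", perl = TRUE)[[1]]
  values <- sub("^.*:", "", pairs)            # text after the last colon
  names(values) <- sub(":[^:]*$", "", pairs)  # text before the last colon
  values
}

# Illustrative input in the question's format (not the real data).
mem <- c("A(x,y)(u):1,B(%):2.5,Note:pass",
         "A(x,y)(u):3,B(%):4.5,Note:fail")

# One row per input string; check.names = FALSE keeps names such as "B(%)".
out <- data.frame(do.call(rbind, lapply(mem, split_pairs)),
                  check.names = FALSE, stringsAsFactors = FALSE)
# Convert each column to integer or numeric where every value allows it.
out[] <- lapply(out, type.convert, as.is = TRUE)
print(out)
```

Unlike the answers above, this does not depend on what the values look like, but it still takes the column names from the first string and assumes every string lists the same fields in the same order.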
