2023-06-19 14:54:48

Extract the value between a specific string and a colon in R

Question

I have a table like this:

No, Memo
  1, Date: 2020/10/22 City: UA Note: True mastery of any skill takes a lifetime.
  2, Date: 2022/11/01 City: CH Note: Sweat is the lubricant of success.
  3, Date: 2022y11m1d City: UA Note: Every noble work is at first impossible.
  4, Date: 2022y2m15d City: AA Note: Live beautifully, dream passionately, love completely.

I want to extract the strings that follow Date:, City: and Note:.
For example, for No. 1 I need to extract "2020/10/22", which is between Date: and City:, "UA", which is between City: and Note:, and "True mastery of any skill takes a lifetime.", which comes after Note:.

Desired output:

 No Date       City Note
  1 2020/10/22 UA   True mastery of any skill takes a lifetime.
  2 2022/11/01 CH   Sweat is the lubricant of success.
  3 2022y11m1d UA   Every noble work is at first impossible.
  4 2022y2m15d AA   Live beautifully, dream passionately, love completely.

Does anyone know how to do this? Any help would be great. Thank you.

Answer 1

Score: 4

My solution uses regex with the stringr and dplyr packages.

library(stringr)
library(dplyr)

# Read the sample data; ";" is the separator because Memo itself contains commas
df <- read.table(
  text = "No; Memo
  1; Date: 2020/10/22 City: UA Note: True mastery of any skill takes a lifetime.
  2; Date: 2022/11/01 City: CH Note: Sweat is the lubricant of success.
  3; Date: 2022y11m1d City: UA Note: Every noble work is at first impossible.
  4; Date: 2022y2m15d City: AA Note: Live beautifully, dream passionately, love completely.",
  sep = ";",
  header = TRUE
)

# Extract each field from Memo, then drop the original column
df_test <- df %>% mutate(date = str_extract(Memo, "(?<=Date: )(.*)(?= City)"),
                         city = str_extract(Memo, "(?<=City: )(.*)(?= Note)"),
                         note = str_extract(Memo, "(?<=Note: ).*")) %>%
  select(-Memo)

> df_test
  No       date city                                                   note
1  1 2020/10/22   UA            True mastery of any skill takes a lifetime.
2  2 2022/11/01   CH                     Sweat is the lubricant of success.
3  3 2022y11m1d   UA               Every noble work is at first impossible.
4  4 2022y2m15d   AA Live beautifully, dream passionately, love completely.

Each pattern uses a positive lookbehind, such as (?<=Date: ), and a positive lookahead, such as (?= City), so the match is everything between the two markers without the markers themselves.
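The same lookaround idea can be tried without any packages. The following is only a sketch in base R, not part of the original answer: extract_first is a hypothetical helper introduced here for illustration, and the patterns use a lazy .*? in place of the answer's greedy .* as an assumption about what is wanted if a marker ever appears twice.

```r
# Sample memos copied from the question.
memo <- c(
  "Date: 2020/10/22 City: UA Note: True mastery of any skill takes a lifetime.",
  "Date: 2022/11/01 City: CH Note: Sweat is the lubricant of success.",
  "Date: 2022y11m1d City: UA Note: Every noble work is at first impossible.",
  "Date: 2022y2m15d City: AA Note: Live beautifully, dream passionately, love completely."
)

# Hypothetical helper (not from the original answer): return the first match
# of a Perl-style pattern in each element of x, or NA where nothing matches.
extract_first <- function(x, pattern) {
  m <- regexpr(pattern, x, perl = TRUE)
  hit <- !is.na(m) & m != -1
  out <- rep(NA_character_, length(x))
  out[hit] <- regmatches(x, m)
  out
}

# (?<=Date: ) is a lookbehind: the match must be preceded by "Date: ".
# (?= City:)  is a lookahead: the match must be followed by " City:".
# .*? is lazy, so it stops at the first following marker.
result <- data.frame(
  No   = seq_along(memo),
  Date = extract_first(memo, "(?<=Date: ).*?(?= City:)"),
  City = extract_first(memo, "(?<=City: ).*?(?= Note:)"),
  Note = extract_first(memo, "(?<=Note: ).*")
)

print(result)
```

Because the lookbehind and lookahead consume no characters, the markers themselves never end up in the extracted values.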

Answer 2

Score: 0

Place a newline before each keyword in Memo. At that point the text is in DCF format, so it can be read with read.dcf. This is general, in that it does not depend on the particular keywords in Memo, and it does not depend on any packages.

DF |>
  transform(Memo = gsub("(\\w+: )", "\n\\1", Memo)) |>
  with(data.frame(No, read.dcf(textConnection(Memo))))

giving

  No       Date City                                                   Note
1  1 2020/10/22   UA            True mastery of any skill takes a lifetime.
2  2 2022/11/01   CH                     Sweat is the lubricant of success.
3  3 2022y11m1d   UA               Every noble work is at first impossible.
4  4 2022y2m15d   AA Live beautifully, dream passionately, love completely.

Note

# Sample input used above
DF <- data.frame(
  No = 1:4,
  Memo = c(
    "Date: 2020/10/22 City: UA Note: True mastery of any skill takes a lifetime.",
    "Date: 2022/11/01 City: CH Note: Sweat is the lubricant of success.",
    "Date: 2022y11m1d City: UA Note: Every noble work is at first impossible.",
    "Date: 2022y2m15d City: AA Note: Live beautifully, dream passionately, love completely."
  )
)
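To see why read.dcf can parse the memo, it may help to look at the intermediate text for a single record. This is a minimal sketch on one string from the sample data. The variable names are illustrative, and the replacement "\n\\1" (a newline followed by the captured keyword) is assumed from the answer's description of placing a newline before each keyword.

```r
# One memo string from the sample data.
memo <- "Date: 2022/11/01 City: CH Note: Sweat is the lubricant of success."

# Put each "Keyword: " on its own line; \\1 re-inserts the captured keyword.
dcf_text <- gsub("(\\w+: )", "\n\\1", memo)
cat(dcf_text, "\n")

# read.dcf parses "Field: value" lines into a character matrix with one
# column per field and one row per record.
con <- textConnection(dcf_text)
fields <- read.dcf(con)
close(con)

print(fields)
```

Note that the pattern \\w+: treats any word followed by a colon and a space as a keyword, so a note that itself contains such text would be split into an extra field.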
