Extract value between specific string and colon in R

Question

I have a table like this:

No, Memo
  1, Date: 2020/10/22 City: UA Note: True mastery of any skill takes a lifetime.
  2, Date: 2022/11/01 City: CH Note: Sweat is the lubricant of success.
  3, Date: 2022y11m1d City: UA Note: Every noble work is at first impossible.
  4, Date: 2022y2m15d City: AA Note: Live beautifully, dream passionately, love completely.

I want to extract the strings after Date:, City: and Note:.
For example, for No. 1 I need to extract "2020/10/22", which is between Date: and City:, "UA", which is between City: and Note:, and "True mastery of any skill takes a lifetime.", which comes after Note:.

Desired output:

 No Date       City Note
  1 2020/10/22 UA   True mastery of any skill takes a lifetime.
  2 2022/11/01 CH   Sweat is the lubricant of success.
  3 2022y11m1d UA   Every noble work is at first impossible.
  4 2022y2m15d AA   Live beautifully, dream passionately, love completely.

Does anyone know how to do this? Any help would be appreciated. Thank you.

Answer 1

Score: 4

My solution uses regular expressions with the stringr and dplyr packages.

library(stringr)
library(dplyr)

# Read the sample data; ";" separates the No and Memo columns.
df <- read.table(
  text = "No; Memo
  1; Date: 2020/10/22 City: UA Note: True mastery of any skill takes a lifetime.
  2; Date: 2022/11/01 City: CH Note: Sweat is the lubricant of success.
  3; Date: 2022y11m1d City: UA Note: Every noble work is at first impossible.
  4; Date: 2022y2m15d City: AA Note: Live beautifully, dream passionately, love completely.",
  sep = ";",
  header = TRUE
)

# The lookbehind fixes where each match starts, the lookahead where it ends.
df_test <- df %>% mutate(date = str_extract(Memo, "(?<=Date: )(.*)(?= City)"),
                         city = str_extract(Memo, "(?<=City: )(.*)(?= Note)"),
                         note = str_extract(Memo, "(?<=Note: ).*")) %>%
  select(-Memo)

> df_test
  No       date city                                                   note
1  1 2020/10/22   UA            True mastery of any skill takes a lifetime.
2  2 2022/11/01   CH                     Sweat is the lubricant of success.
3  3 2022y11m1d   UA               Every noble work is at first impossible.
4  4 2022y2m15d   AA Live beautifully, dream passionately, love completely.

Each regex matches everything between the specified keywords, using a positive lookbehind for the keyword before the value and a positive lookahead for the keyword after it.
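
The same lookaround idea can also be tried without stringr. The following is a minimal base R sketch, not part of the original answer: the helper `extract_first` and the vector `memo` are names made up for illustration, and the patterns use lazy `.*?` instead of the greedy `.*` above, which is an assumption about wanting the shortest match between two keywords.

```r
# Hypothetical sample input (two of the rows from the question).
memo <- c(
  "Date: 2020/10/22 City: UA Note: True mastery of any skill takes a lifetime.",
  "Date: 2022y2m15d City: AA Note: Live beautifully, dream passionately, love completely."
)

# Illustrative helper: return the first match of a PCRE pattern in each
# element of x, or NA where the pattern does not match.
extract_first <- function(x, pattern) {
  m <- regexpr(pattern, x, perl = TRUE)   # perl = TRUE enables lookarounds
  out <- rep(NA_character_, length(x))
  out[m != -1L] <- regmatches(x, m)       # regmatches() drops non-matches
  out
}

date <- extract_first(memo, "(?<=Date: ).*?(?= City:)")
city <- extract_first(memo, "(?<=City: ).*?(?= Note:)")
note <- extract_first(memo, "(?<=Note: ).*")

data.frame(date, city, note)
```

Returning NA for rows without a match mirrors what `str_extract` does, so a memo that lacks one of the keywords does not shift the other values.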

Answer 2

Score: 0

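Place a newline before each keyword in Memo. At that point the text is in DCF format (one "Field: value" pair per line, with a blank line between records), so it can be read with read.dcf. This approach is general: it does not depend on the particular keywords in Memo, and it does not need any packages (the native pipe |> requires R 4.1 or later).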

DF |>
  transform(Memo = gsub("(\\w+: )", "\n\\1", Memo)) |>
  with(data.frame(No, read.dcf(textConnection(Memo))))

giving

  No       Date City                                                   Note
1  1 2020/10/22   UA            True mastery of any skill takes a lifetime.
2  2 2022/11/01   CH                     Sweat is the lubricant of success.
3  3 2022y11m1d   UA               Every noble work is at first impossible.
4  4 2022y2m15d   AA Live beautifully, dream passionately, love completely.

Note

DF <- data.frame(
  No = 1:4,
  Memo = c(
    "Date: 2020/10/22 City: UA Note: True mastery of any skill takes a lifetime.",
    "Date: 2022/11/01 City: CH Note: Sweat is the lubricant of success.",
    "Date: 2022y11m1d City: UA Note: Every noble work is at first impossible.",
    "Date: 2022y2m15d City: AA Note: Live beautifully, dream passionately, love completely."
  )
)
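
To see why the read.dcf approach works, it may help to look at the intermediate text for a single record. This is only an illustrative sketch: the names `memo`, `dcf_text` and `fields` are assumptions introduced here, and the backreference `\\1` re-inserts the matched keyword after the added newline.

```r
# Hypothetical single record, taken from the first row of the question.
memo <- "Date: 2020/10/22 City: UA Note: True mastery of any skill takes a lifetime."

# Put a newline before every "keyword: " so each field sits on its own line.
dcf_text <- gsub("(\\w+: )", "\n\\1", memo)
cat(dcf_text, "\n")

# read.dcf() parses "Field: value" lines into a character matrix
# with one column per field and one row per record.
con <- textConnection(dcf_text)
fields <- read.dcf(con)
close(con)

fields
```

One caveat of this design: the pattern `\\w+: ` treats any word followed by a colon and a space as a keyword, so a note that itself contains such text would be split into an extra field.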

huangapple
  • Posted on June 19, 2023, 14:54:48
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76504262.html