Extract the value between a specific string and a colon in R

Question

I have a table like this:

    No, Memo
    1, Date: 2020/10/22 City: UA Note: True mastery of any skill takes a lifetime.
    2, Date: 2022/11/01 City: CH Note: Sweat is the lubricant of success.
    3, Date: 2022y11m1d City: UA Note: Every noble work is at first impossible.
    4, Date: 2022y2m15d City: AA Note: Live beautifully, dream passionately, love completely.

I want to extract the strings that follow Date:, City: and Note:.
For example, for No. 1 I need to extract "2020/10/22", which sits between Date: and City:; "UA", which sits between City: and Note:; and "True mastery of any skill takes a lifetime.", which comes after Note:.

Desired output:

    No  Date        City  Note
    1   2020/10/22  UA    True mastery of any skill takes a lifetime.
    2   2022/11/01  CH    Sweat is the lubricant of success.
    3   2022y11m1d  UA    Every noble work is at first impossible.
    4   2022y2m15d  AA    Live beautifully, dream passionately, love completely.

Does anyone know how to do this? Any help would be great. Thank you.

Answer 1

Score: 4

My solution uses regular expressions with the stringr and dplyr packages.

    library(stringr)
    library(dplyr)

    # Rebuild the sample table; ";" separates the two columns
    df <- read.table(
      text = "No; Memo
    1; Date: 2020/10/22 City: UA Note: True mastery of any skill takes a lifetime.
    2; Date: 2022/11/01 City: CH Note: Sweat is the lubricant of success.
    3; Date: 2022y11m1d City: UA Note: Every noble work is at first impossible.
    4; Date: 2022y2m15d City: AA Note: Live beautifully, dream passionately, love completely.",
      sep = ";",
      header = TRUE,
      strip.white = TRUE  # drop the whitespace around each field
    )

    # The look-behind marks where a field starts, the look-ahead where it ends
    df_test <- df %>%
      mutate(
        date = str_extract(Memo, "(?<=Date: ).*(?= City)"),
        city = str_extract(Memo, "(?<=City: ).*(?= Note)"),
        note = str_extract(Memo, "(?<=Note: ).*")
      ) %>%
      select(-Memo)

    > df_test
      No       date city                                                   note
    1  1 2020/10/22   UA            True mastery of any skill takes a lifetime.
    2  2 2022/11/01   CH                     Sweat is the lubricant of success.
    3  3 2022y11m1d   UA               Every noble work is at first impossible.
    4  4 2022y2m15d   AA Live beautifully, dream passionately, love completely.

Each regular expression uses a positive lookbehind and a positive lookahead to match everything between the specified keywords.
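The lookaround idea does not depend on stringr. As a minimal sketch, assuming the Memo format shown in the question, similar patterns can be applied in base R with regexpr(..., perl = TRUE) and regmatches(). The helper name extract_between is made up for this illustration, and the lazy .*? is a small variation that stops at the first following keyword rather than the last.

```r
# Hypothetical helper: return the first match of `pattern` in each element
# of `x`, or NA where the pattern does not match.
extract_between <- function(x, pattern) {
  m <- regexpr(pattern, x, perl = TRUE)   # perl = TRUE enables lookarounds
  hit <- as.vector(m) != -1               # -1 means "no match"
  out <- rep(NA_character_, length(x))
  out[hit] <- regmatches(x, m)            # regmatches() returns matches only
  out
}

memo <- c(
  "Date: 2020/10/22 City: UA Note: True mastery of any skill takes a lifetime.",
  "a memo without any keywords"
)

date <- extract_between(memo, "(?<=Date: ).*?(?= City:)")
city <- extract_between(memo, "(?<=City: ).*?(?= Note:)")
note <- extract_between(memo, "(?<=Note: ).*")

result <- data.frame(date, city, note)
print(result)
```

Rows that lack a keyword come back as NA instead of raising an error, which makes malformed memos easy to spot.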

Answer 2

Score: 0

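Place a newline before each keyword in Memo. The text is then in DCF format, so it can be read with read.dcf. This approach is general: it does not depend on the particular keywords in Memo, and it needs no packages beyond base R. It does assume that no field value itself contains a word followed by a colon and a space.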

    DF |>
      transform(Memo = gsub("(\\w+: )", "\n\\1", Memo)) |>  # newline before each "keyword: "
      with(data.frame(No, read.dcf(textConnection(Memo))))  # parse the records as DCF

giving

      No       Date City                                                   Note
    1  1 2020/10/22   UA            True mastery of any skill takes a lifetime.
    2  2 2022/11/01   CH                     Sweat is the lubricant of success.
    3  3 2022y11m1d   UA               Every noble work is at first impossible.
    4  4 2022y2m15d   AA Live beautifully, dream passionately, love completely.
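
To see why read.dcf can parse the result, it may help to look at the intermediate text for a single memo. The sketch below assumes the replacement string is "\n\\1", that is, a newline followed by the matched keyword; the variable names are illustrative only.

```r
memo <- "Date: 2020/10/22 City: UA Note: True mastery of any skill takes a lifetime."

# Put every "keyword: " on its own line, which is the "Field: value" layout
# that read.dcf expects
dcf_text <- gsub("(\\w+: )", "\n\\1", memo)
cat(dcf_text, "\n")

# read.dcf returns a character matrix with one column per field name
con <- textConnection(dcf_text)
fields <- read.dcf(con)
close(con)

print(colnames(fields))
print(fields)
```

Because the column names come from the text itself, a memo with different keywords would produce different columns without any change to the code.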

Note

    DF <- data.frame(
      No = 1:4,
      Memo = c(
        "Date: 2020/10/22 City: UA Note: True mastery of any skill takes a lifetime.",
        "Date: 2022/11/01 City: CH Note: Sweat is the lubricant of success.",
        "Date: 2022y11m1d City: UA Note: Every noble work is at first impossible.",
        "Date: 2022y2m15d City: AA Note: Live beautifully, dream passionately, love completely."
      )
    )

huangapple
  • Posted on June 19, 2023, 14:54:48
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76504262.html