March 7, 2023 09:18:59


Extract values after a word/alphabet from one column to a new column

Question

Based on the data below, how can I create another column from the values that follow the letters "US" in every row of the GEOID column?

Data:

id = c(1, 2, 3, 4, 5)
GEOID = c("1400000US42101000101", "1400000US42101000102", "1400000US42101000103",
          "1400000US42101000104", "1400000US42101000105")
df = data.frame(id, GEOID)

Desired output:

id = c(1, 2, 3, 4, 5)
GEOID = c("1400000US42101000101", "1400000US42101000102", "1400000US42101000103",
          "1400000US42101000104", "1400000US42101000105")
GEOID_Final = c("42101000101", "42101000102", "42101000103", "42101000104", "42101000105")
Desired_df = data.frame(id, GEOID, GEOID_Final)

Code:

library(stringr)
library(dplyr)
desired_df = df %>% word(?, sep = "US") # Stuck

Answer 1

Score: 3

I would use a regular expression to match the digits before "US" together with "US" itself, and remove that match from the string, keeping only the digits after "US".

library(stringr); library(dplyr)
id = c(1, 2, 3, 4, 5)
GEOID = c("1400000US42101000101", "1400000US42101000102", "1400000US42101000103",
          "1400000US42101000104", "1400000US42101000105")
df = data.frame(id, GEOID)
df %>%
  mutate(
    # replace the digits before US, and US itself, with an empty string
    GEOID_final = str_replace(GEOID, pattern = "\\d+US", replacement = "")
  )
# id                GEOID GEOID_final
# 1  1 1400000US42101000101 42101000101
# 2  2 1400000US42101000102 42101000102
# 3  3 1400000US42101000103 42101000103
# 4  4 1400000US42101000104 42101000104
# 5  5 1400000US42101000105 42101000105

`str_replace` takes a string (here a character vector, GEOID) and replaces the first match of `pattern` with `replacement`. The pattern I use is `\\d+US`, which means "one or more digits, followed by US". The match is replaced with an empty string, so only the digits after US remain. You could also use `.+US`, which means "everything up to and including US".

Hope this helps!
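
For comparison, the same idea can be sketched in base R without loading any packages. This is an assumption-level alternative, not part of the answer above: it swaps `str_replace` for `sub()` and anchors the pattern at the start of the string, assuming every GEOID starts with digits followed by "US".

```r
# Base R sketch (not the answerer's code): strip the prefix with sub()
id <- c(1, 2, 3, 4, 5)
GEOID <- c("1400000US42101000101", "1400000US42101000102", "1400000US42101000103",
           "1400000US42101000104", "1400000US42101000105")
df <- data.frame(id, GEOID)

# "^[0-9]+US" matches the leading digits plus the literal "US";
# sub() replaces that first match with an empty string,
# leaving only the digits that follow "US".
df$GEOID_final <- sub("^[0-9]+US", "", df$GEOID)

print(df)
```

Anchoring with `^` keeps the match at the start of the string, so a value that does not begin with digits and "US" is returned unchanged rather than partially stripped.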

Answer 2

Score: 1

If you want to use word(), as in your attempt:

desired_df = df %>% word(?, sep = "US") # Stuck

try this with stringr::word, after replacing "US" with a space so that the value you want becomes the second word:

word(string = str_replace(GEOID, pattern = 'US', replacement = " "), start = 2, end = 2)

result
> word(string = str_replace(GEOID,pattern = 'US',replacement = " "),start = 2,end = 2)
[1] "42101000101" "42101000102" "42101000103" "42101000104" "42101000105"

Save the result in a third column, or use it inside mutate():

df %>% mutate(
  GEOID_Final = word(string = str_replace(GEOID, pattern = 'US', replacement = " "), start = 2, end = 2)
)

result

    id                GEOID GEOID_Final
1  1 1400000US42101000101 42101000101
2  2 1400000US42101000102 42101000102
3  3 1400000US42101000103 42101000103
4  4 1400000US42101000104 42101000104
5  5 1400000US42101000105 42101000105
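
The question's original attempt passed `sep = "US"` to `word()`. As a hedged sketch, that argument can likely be used directly, without the intermediate `str_replace` step, because `word()` splits the string on `sep` and returns the piece at position `start`. This assumes each GEOID contains "US" exactly once; the name `df_word` is made up for illustration.

```r
library(stringr)
library(dplyr)

id <- c(1, 2, 3, 4, 5)
GEOID <- c("1400000US42101000101", "1400000US42101000102", "1400000US42101000103",
           "1400000US42101000104", "1400000US42101000105")
df <- data.frame(id, GEOID)

# Treat "US" as the word separator and take the second "word",
# i.e. everything after "US" (assumes "US" occurs once per value).
df_word <- df %>%
  mutate(GEOID_Final = word(GEOID, start = 2, sep = "US"))

print(df_word)
```

Note that `word()` has to be called inside `mutate()` on the GEOID column; piping the whole data frame into `word()`, as in the attempt, passes a data frame where a character vector is expected.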
