May 17, 2023 11:59:21


Get rid of all the numbers and commas that are at least 2 spaces to the right of the word

# Question

I'm trying to scrape this [table of regions](https://learn.microsoft.com/en-us/azure/cognitive-services/Speech-Service/regions#speech-service) that support Microsoft's Speech service. I've managed to get the following character vector:

```R
region <- c("southafricanorth 6", "eastasia 5", "southeastasia 1,2,3,4,5",
"australiaeast 1,2,3,4", "centralindia 1,2,3,4,5", "japaneast 2,5",
"japanwest", "koreacentral 2", "canadacentral 1", "northeurope 1,2,4,5",
"westeurope 1,2,3,4,5", "francecentral", "germanywestcentral",
"norwayeast", "switzerlandnorth 6", "switzerlandwest", "uksouth 1,2,3,4",
"uaenorth 6", "brazilsouth 6", "centralus", "eastus 1,2,3,4,5",
"eastus2 1,2,4,5", "northcentralus 4,6", "southcentralus 1,2,3,4,5,6",
"westcentralus 5", "westus 2,5", "westus2 1,2,4,5", "westus3"
)
```

What regex gets rid of all the numbers and commas that are at least 2 spaces to the right of the words? For example, I just want `westus2` instead of `westus2 1,2,4,5`.

I've tried this to no avail: `gsub("\\s{2,}\\d+.*", "", region)`
# Answer 1
**Score**: 4

The region names without the superscripts are contained inside `<code>` tags in the HTML. So you could avoid the need for regexes by modifying your scraping code to something like:

```R
library(rvest)

url <- "https://learn.microsoft.com/en-us/azure/cognitive-services/Speech-Service/regions"
regions <- read_html(url) %>%
  # first table only
  html_element("table") %>%
  html_elements("code") %>%
  html_text()

regions
#  [1] "southafricanorth"   "eastasia"           "southeastasia"      "australiaeast"
#  [5] "centralindia"       "japaneast"          "japanwest"          "koreacentral"
#  [9] "canadacentral"      "northeurope"        "westeurope"         "francecentral"
# [13] "germanywestcentral" "norwayeast"         "switzerlandnorth"   "switzerlandwest"
# [17] "uksouth"            "uaenorth"           "brazilsouth"        "centralus"
# [21] "eastus"             "eastus2"            "northcentralus"     "southcentralus"
# [25] "westcentralus"      "westus"             "westus2"            "westus3"
```
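
To see why selecting the `<code>` nodes sidesteps the regex entirely, here is a minimal offline sketch of the same pipeline. The HTML fragment is hypothetical and only assumes that each region name sits in a `<code>` tag next to a `<sup>` superscript; the real page's markup may differ.

```R
library(rvest)

# Hypothetical fragment standing in for the live page (an assumption, not the real markup).
html <- '<table>
  <tr><td><code>westus2</code> <sup>1,2,4,5</sup></td></tr>
  <tr><td><code>westus3</code></td></tr>
</table>
<table>
  <tr><td><code>not-a-speech-region</code></td></tr>
</table>'

regions <- read_html(html) %>%
  html_element("table") %>%   # first table only, later tables are ignored
  html_elements("code") %>%   # one <code> node per region name
  html_text()                 # superscripts live outside <code>, so they never appear

regions
# [1] "westus2" "westus3"
```

Because the superscripts are never selected, no cleanup step is needed afterwards.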

# Answer 2
**Score**: 2

Another elegant solution is the `word()` function from the stringr package. By default it returns the first word:

```R
word(string, start = 1L, end = start, sep = fixed(" "))
```

```R
library(stringr)
word(region)
#  [1] "southafricanorth"   "eastasia"           "southeastasia"      "australiaeast"
#  [5] "centralindia"       "japaneast"          "japanwest"          "koreacentral"
#  [9] "canadacentral"      "northeurope"        "westeurope"         "francecentral"
# [13] "germanywestcentral" "norwayeast"         "switzerlandnorth"   "switzerlandwest"
# [17] "uksouth"            "uaenorth"           "brazilsouth"        "centralus"
# [21] "eastus"             "eastus2"            "northcentralus"     "southcentralus"
# [25] "westcentralus"      "westus"             "westus2"            "westus3"
```
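
If adding stringr as a dependency is not desirable, roughly the same first-word extraction can be sketched in base R. This is not part of the answer above; `region_sample` is a small illustrative subset of the asker's vector, and the split is on a single literal space, mirroring the default `sep` of `word()`.

```R
# Illustrative subset of the asker's vector (an assumption for brevity).
region_sample <- c("southafricanorth 6", "japanwest", "westus2 1,2,4,5")

# Split each string on a literal space and keep the first piece.
parts <- strsplit(region_sample, " ", fixed = TRUE)
first_words <- vapply(parts, function(p) p[1], character(1))

first_words
# [1] "southafricanorth" "japanwest"        "westus2"
```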

# Answer 3
**Score**: 2

Your regex does not match because your strings do not contain two consecutive spaces. If you change `\\s{2,}` to `\\s` or a literal space, it should give the expected result.

```R
sub("\\s\\d+.*", "", region)
#  [1] "southafricanorth"   "eastasia"           "southeastasia"
#  [4] "australiaeast"      "centralindia"       "japaneast"
#  [7] "japanwest"          "koreacentral"       "canadacentral"
# [10] "northeurope"        "westeurope"         "francecentral"
# [13] "germanywestcentral" "norwayeast"         "switzerlandnorth"
# [16] "switzerlandwest"    "uksouth"            "uaenorth"
# [19] "brazilsouth"        "centralus"          "eastus"
# [22] "eastus2"            "northcentralus"     "southcentralus"
# [25] "westcentralus"      "westus"             "westus2"
# [28] "westus3"
```

In this case it looks like it could be simplified to

```R
sub(" .*", "", region)
```

or

```R
sub(" .+", "", region)
```
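
The mismatch can be checked directly. The sketch below, using a small illustrative subset named `region_sample`, first confirms that the original pattern matches nothing, then tries a stricter variant that removes only a trailing run of digits and commas. The stricter pattern is a suggestion beyond the answer above, for cases where text after the first space should not be dropped wholesale.

```R
# Illustrative subset of the asker's vector (an assumption for brevity).
region_sample <- c("southafricanorth 6", "japanwest", "westus2 1,2,4,5")

# The original pattern needs two or more whitespace characters, so nothing matches.
grepl("\\s{2,}\\d+.*", region_sample)
# [1] FALSE FALSE FALSE

# One or more whitespace characters, then only digits and commas up to the end.
cleaned <- sub("\\s+[0-9,]+$", "", region_sample)
cleaned
# [1] "southafricanorth" "japanwest"        "westus2"
```

The digit in `westus2` survives because it is not preceded by whitespace.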
