August 12, 2021 04:46:01 · go

A Regular Expression to make acronyms with word boundaries and remove characters preceding a word

Question

Go Version

go version go1.16.7 linux/amd64

Problem

I am going through an Exercise about creating acronyms and I chose to do it with regular expressions.

Some of the test cases given to me are the following:

	input:    "Ruby on Rails",
	expected: "ROR"

	input:    "GNU Image Manipulation Program",
	expected: "GIMP"

	input:    "Complementary metal-oxide semiconductor",
	expected: "CMOS"

	input:    "Something - I made up from thin air",
	expected: "SIMUFTA"

	input:    "Halley's Comet",
	expected: "HC"

	input:    "The Road _Not_ Taken",
	expected: "TRNT"

The following code passes many of the simple tests, where each word starts with a capital letter and that letter is extracted to build the acronym:

 Portable Network Graphics -> PNG

Code

// Package acronym creates an acronym based on capitalized letters.
package acronym

import (
	"regexp"
	"strings"
)

// Abbreviate creates an acronym for a full form string.
func Abbreviate(s string) string {
	re := regexp.MustCompile(`\b[A-Za-z]`)
	abbreviation := strings.Join(re.FindAllString(s, -1), "")
	return strings.ToUpper(abbreviation)
}

The only tests I am failing are:

=== RUN   TestAcronym
    acronym_test.go:11: Acronym test [Halley's Comet], expected [HC], actual [HSC]
    acronym_test.go:11: Acronym test [The Road _Not_ Taken], expected [TRNT], actual [TRT]
--- FAIL: TestAcronym (0.00s)
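
To make the two failures easier to see, here is a minimal sketch that prints the individual matches of the pattern above. The helper name Matches and the variable firstLetterRe are hypothetical and exist only for this illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// firstLetterRe is the pattern from the question: an ASCII letter
// that directly follows a word boundary.
var firstLetterRe = regexp.MustCompile(`\b[A-Za-z]`)

// Matches returns every substring matched by the question's pattern.
// Hypothetical helper, not part of the original exercise.
func Matches(s string) []string {
	return firstLetterRe.FindAllString(s, -1)
}

func main() {
	for _, s := range []string{"Halley's Comet", "The Road _Not_ Taken"} {
		fmt.Printf("%q -> %q\n", s, Matches(s))
	}
	// "Halley's Comet" -> ["H" "s" "C"]
	// "The Road _Not_ Taken" -> ["T" "R" "T"]
}
```

The apostrophe is not a word character, so there is a word boundary right before the s in Halley's. The underscore is a word character, so there is no boundary between _ and N in _Not_.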

Regex101 Playground

Link to Playground in Regex 101

Problem

I am unable to figure out how to extract only HC for the Halley's Comet test case and how to obtain the N in the The Road _Not_ Taken test case.

One of the reasons I have to keep the lower-case characters [a-z] is the Complementary metal-oxide semiconductor case, along with other lower-case characters in certain test cases.

I could remove characters such as - or _ before applying the regexp, but I think that would not make my function more generic (it would rather be a hack just to pass the tests).

How do I ignore the characters ' and _ in order to make the acronym function more robust?

Answer 1

Score: 1

You may use

// Abbreviate creates an acronym for a full form string.
func Abbreviate(s string) string {
    var abbreviation = ""
    re := regexp.MustCompile(`\w'\w|(?:_|\b)([A-Za-z])`)
    for _, match := range re.FindAllStringSubmatch(s, -1) {
        // match[1] is empty when the \w'\w alternative matched.
        abbreviation = abbreviation + match[1]
    }
    return strings.ToUpper(abbreviation)
}

Details:

\w'\w - a word char, ', a word char (this consumes a ' between word chars so that the letter after it is not picked up; if you have issues with consecutive matches, replace it with \b'\w)
| - or
(?:_|\b) - either _ or a word boundary
([A-Za-z]) - Group 1: an ASCII letter (use \p{L} to match any Unicode letter).
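
As a rough check, the pattern can be run against the test cases from the question. The sketch below assumes a standalone main package. The names acronymRe and cases are illustrative, and strings.Builder is swapped in for the string concatenation used above.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// acronymRe is compiled once at package level (illustrative name).
// The first alternative consumes x'y sequences without capturing anything;
// the second captures an ASCII letter after "_" or a word boundary.
var acronymRe = regexp.MustCompile(`\w'\w|(?:_|\b)([A-Za-z])`)

// Abbreviate creates an acronym for a full form string.
func Abbreviate(s string) string {
	var b strings.Builder
	for _, m := range acronymRe.FindAllStringSubmatch(s, -1) {
		b.WriteString(m[1]) // empty when the apostrophe alternative matched
	}
	return strings.ToUpper(b.String())
}

func main() {
	// Test cases copied from the question.
	cases := []struct{ input, expected string }{
		{"Ruby on Rails", "ROR"},
		{"GNU Image Manipulation Program", "GIMP"},
		{"Complementary metal-oxide semiconductor", "CMOS"},
		{"Something - I made up from thin air", "SIMUFTA"},
		{"Halley's Comet", "HC"},
		{"The Road _Not_ Taken", "TRNT"},
	}
	for _, c := range cases {
		fmt.Printf("%q: expected %s, got %s\n", c.input, c.expected, Abbreviate(c.input))
	}
}
```

One caveat for the \p{L} variant: in Go's regexp syntax both \w and \b are ASCII-only, so replacing [A-Za-z] with \p{L} does not by itself make the word boundaries Unicode-aware.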
