A Regular Expression to make acronyms with word boundaries and remove characters preceding a word
Go Version
go version go1.16.7 linux/amd64
Problem
I am working through an exercise about creating acronyms, and I chose to do it with regular expressions.
Some of the test cases given to me are as follows:
input: "Ruby on Rails",
expected: "ROR"
input: "GNU Image Manipulation Program",
expected: "GIMP"
input: "Complementary metal-oxide semiconductor",
expected: "CMOS"
input: "Something - I made up from thin air",
expected: "SIMUFTA"
input: "Halley's Comet",
expected: "HC"
input: "The Road _Not_ Taken",
expected: "TRNT"
The following code passes many of the simple tests, where the first letter of each word is extracted and the letters are joined into an acronym:
Portable Network Graphics -> PNG
Code
// Package acronym creates an acronym based on Capitalized Letters
package acronym

import (
	"regexp"
	"strings"
)

// Abbreviate creates an acronym for a full form string
func Abbreviate(s string) string {
	re := regexp.MustCompile(`\b[A-Za-z]`)
	abbreviation := strings.Join(re.FindAllString(s, -1), "")
	return strings.ToUpper(abbreviation)
}
The only tests I am failing are:
=== RUN TestAcronym
acronym_test.go:11: Acronym test [Halley's Comet], expected [HC], actual [HSC]
acronym_test.go:11: Acronym test [The Road _Not_ Taken], expected [TRNT], actual [TRT]
--- FAIL: TestAcronym (0.00s)
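The two failures can be reproduced in isolation. The following is a minimal sketch, assuming Go's regexp semantics where \w (and therefore \b) is ASCII-only and treats _ as a word character; firstLetters is a hypothetical helper name used only for this illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// firstLetters (hypothetical helper) returns every match of the
// original pattern from the question.
func firstLetters(s string) []string {
	re := regexp.MustCompile(`\b[A-Za-z]`)
	return re.FindAllString(s, -1)
}

func main() {
	// The apostrophe is a non-word character, so there is a word
	// boundary between ' and s, and the s is picked up.
	fmt.Println(firstLetters("Halley's Comet")) // [H s C]

	// The underscore is a word character, so there is no word
	// boundary between _ and N, and the N is skipped.
	fmt.Println(firstLetters("The Road _Not_ Taken")) // [T R T]
}
```

So the pattern needs to skip a letter that directly follows an apostrophe inside a word, and to accept a letter that directly follows an underscore.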
Regex101 Playground
Link to Playground in Regex 101
Problem
I am unable to figure out how to extract only HC for the "Halley's Comet" test case, and how to obtain the N in the "The Road _Not_ Taken" test case.
One of the reasons I have to keep the lower-case characters [a-z] is the "Complementary metal-oxide semiconductor" case, along with words that start with a lower-case letter in certain other test cases.
I could remove characters such as - or _ before applying the regexp, but I think that would not make my function more generic (it would rather be a hack just to pass the tests).
How do I handle the characters ' and _ in order to make the acronym function more robust?
Answer 1
Score: 1
You may use
// Abbreviate creates an acronym for a full form string
func Abbreviate(s string) string {
	abbreviation := ""
	re := regexp.MustCompile(`\w'\w|(?:_|\b)([A-Za-z])`)
	for _, match := range re.FindAllStringSubmatch(s, -1) {
		// match[1] is empty when the \w'\w alternative matched
		abbreviation += match[1]
	}
	return strings.ToUpper(abbreviation)
}
See the Go demo. Details:
\w'\w - a word char, ', and a word char (this consumes a ' between word chars so that the letter after it is not captured; if you have issues with consecutive matches, replace it with \b'\w)
| - or
(?:_|\b) - either _ or a word boundary
([A-Za-z]) - Group 1: an ASCII letter (use \p{L} to match any Unicode letter).
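To check the pattern against the test cases from the question, and to illustrate the "consecutive matches" remark, here is a small self-contained sketch. The helper abbreviateWith, the variable names reAnswer and reVariant, and the input "O'Neil Brown" are made up for this illustration and are not part of the exercise.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	// Pattern from the answer.
	reAnswer = regexp.MustCompile(`\w'\w|(?:_|\b)([A-Za-z])`)
	// Variant from the details, with \b'\w as the first alternative.
	reVariant = regexp.MustCompile(`\b'\w|(?:_|\b)([A-Za-z])`)
)

// abbreviateWith (hypothetical helper) joins capture group 1 of every
// match and upper-cases the result.
func abbreviateWith(re *regexp.Regexp, s string) string {
	var b strings.Builder
	for _, m := range re.FindAllStringSubmatch(s, -1) {
		b.WriteString(m[1]) // empty when the apostrophe alternative matched
	}
	return strings.ToUpper(b.String())
}

func main() {
	cases := []struct{ input, expected string }{
		{"Ruby on Rails", "ROR"},
		{"GNU Image Manipulation Program", "GIMP"},
		{"Complementary metal-oxide semiconductor", "CMOS"},
		{"Something - I made up from thin air", "SIMUFTA"},
		{"Halley's Comet", "HC"},
		{"The Road _Not_ Taken", "TRNT"},
	}
	for _, c := range cases {
		got := abbreviateWith(reAnswer, c.input)
		fmt.Printf("%q -> %s (expected %s)\n", c.input, got, c.expected)
	}

	// A one-letter prefix before the apostrophe is swallowed by \w'\w,
	// because that alternative matches "O'N" before O can be captured.
	fmt.Println(abbreviateWith(reAnswer, "O'Neil Brown"))  // B
	fmt.Println(abbreviateWith(reVariant, "O'Neil Brown")) // OB
}
```

With \b'\w the apostrophe alternative no longer consumes the character before the ', so a leading single letter such as the O in "O'Neil" is still captured by the second alternative.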