Replace sequence of non-alphabet with one dash using Pattern

Question

What I am trying to achieve:

I have a string that looks like this: ��[1]����important?@~?@~?@~?@~?@~alsoimportant�@~@~@~

Basically, the string contains words (made up of letters only) with gibberish (one or more non-letter characters) before and after them.

I would like to keep the letters and replace each sequence of non-letter characters with exactly one dash. If possible, I would like to use Java's Pattern, because it is quite efficient.

What I tried:

public static void main(String[] args) {
    final String input = "��[1]����important?@~?@~?@~?@~?@~alsoimportant�@~@~@~";
    final String clean = input.replaceAll("[^a-zA-Z]", "-");
    System.out.println(clean);
    // prints ---------important---------------alsoimportant-------
    // would like -important-alsoimportant-
}

Question:

With my current solution:

  • It is not using Pattern.
  • The number of dashes equals the number of non-alphabet characters; I just need one.

What is a good way to achieve this?

Answer 1

Score: 2

You can use the + quantifier in your regular expression to match one or more consecutive non-alphabet characters:

 final String input = "��[1]����important?@~?@~?@~?@~?@~alsoimportant�@~@~@~";
 final String clean = input.replaceAll("[^a-zA-Z]+", "-"); 
 System.out.println(clean); // prints -important-alsoimportant-

The same goes for Pattern (which might be helpful if you're doing this replacement multiple times):

final Pattern NON_ALPHABET = Pattern.compile("[^a-zA-Z]+");
final String input = "��[1]����important?@~?@~?@~?@~?@~alsoimportant�@~@~@~";
final String clean = NON_ALPHABET.matcher(input).replaceAll("-");
System.out.println(clean); // prints the same
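
If the replacement runs many times, one common arrangement is to compile the pattern once into a static field and reuse it. The following is only a minimal sketch of that idea; the class name DashCollapser and the method name collapse are made up for illustration and are not from the answer above. The sample input is a simplified ASCII version of the string in the question.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, named here only for illustration.
final class DashCollapser {
    // Compiled once and reused; Pattern instances are immutable and safe to share.
    private static final Pattern NON_LETTERS = Pattern.compile("[^a-zA-Z]+");

    private DashCollapser() {}

    /** Replaces every run of characters outside a-z/A-Z with a single dash. */
    static String collapse(String input) {
        return NON_LETTERS.matcher(input).replaceAll("-");
    }

    public static void main(String[] args) {
        // Simplified ASCII input standing in for the string from the question.
        System.out.println(collapse("[1]important?@~?@~alsoimportant@~@~"));
        // prints -important-alsoimportant-
    }
}
```

Each call creates a new Matcher, which is cheap compared with recompiling the regular expression on every call.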

Answer 2

Score: 1

Here's another way if you focus on the words. I'd like to use Pattern.compile("[a-zA-Z]+") to solve the problem. After getting the expected words, say, "important" and "alsoimportant", it is convenient to use StringBuilder to join them together.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String input = "��[1]����important?@~?@~?@~?@~?@~alsoimportant�@~@~@~";
        Pattern pattern = Pattern.compile("[a-zA-Z]+");
        Matcher matcher = pattern.matcher(input);
        // Start with a dash, then append each word followed by a dash.
        StringBuilder builder = new StringBuilder("-");
        while (matcher.find()) {
            builder.append(matcher.group()).append("-");
        }
        System.out.println(builder); // output "-important-alsoimportant-"
    }
}

Note: The pattern above assumes that all words consist only of the letters 'a' to 'z' or 'A' to 'Z'. A word like "mp3" would therefore be treated as the word "mp" followed by the gibberish '3'. To avoid this, change the pattern to Pattern.compile("\\w+"), which also matches digits and the underscore.
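
To make the difference between these patterns concrete, here is a small sketch that runs the same find-and-append loop with three patterns. The class name WordJoiner, the method joinWords, and the sample input are assumptions made for this illustration. Because \w also matches the underscore, a letters-and-digits class such as [a-zA-Z0-9]+ may be closer to what is wanted if underscores count as gibberish.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named here only for illustration.
final class WordJoiner {
    private WordJoiner() {}

    /** Same loop as in the answer above, with the word pattern passed in. */
    static String joinWords(Pattern wordPattern, String input) {
        Matcher matcher = wordPattern.matcher(input);
        StringBuilder builder = new StringBuilder("-");
        while (matcher.find()) {
            builder.append(matcher.group()).append('-');
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        String input = "??mp3@~__player!!"; // made-up sample input

        // Letters only: the digit splits "mp3".
        System.out.println(joinWords(Pattern.compile("[a-zA-Z]+"), input));    // -mp-player-
        // \w keeps digits but also keeps the underscores.
        System.out.println(joinWords(Pattern.compile("\\w+"), input));         // -mp3-__player-
        // Letters and digits only: underscores are treated as gibberish.
        System.out.println(joinWords(Pattern.compile("[a-zA-Z0-9]+"), input)); // -mp3-player-
    }
}
```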

Answer 3

Score: 1

As I have said before, regular expressions, like streams, are a good general tool but not always as efficient as imperative programming. The following was about 37% more efficient than using String.replaceAll:

boolean seen = false;
StringBuilder sb = new StringBuilder();
for (char c : input.toCharArray()) {
    if (!Character.isAlphabetic(c)) {
        if (!seen) {
            sb.append("-");
            seen = true;
        }
    } else {
        sb.append(c);
        seen = false;
    }
}
String result = sb.toString();

Although the time differences were significant, the time required was very small. As such, it wouldn't really make a big difference in this particular problem, and I would probably have opted for String.replaceAll myself.
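
One caveat with the loop above: Character.isAlphabetic accepts any Unicode letter, so a character such as 'é' is kept, whereas the regex [^a-zA-Z]+ would replace it. If the loop should behave like that regex, an explicit ASCII range check is one option. The following is a sketch under that assumption; the class name AsciiDashLoop and its method names are invented for illustration, and no timing claim is made for this variant.

```java
// Hypothetical helper class, named here only for illustration.
final class AsciiDashLoop {
    private AsciiDashLoop() {}

    // True only for a-z and A-Z, mirroring the regex character class [a-zA-Z].
    private static boolean isAsciiLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    /** Replaces each run of non-ASCII-letter characters with one dash. */
    static String collapse(String input) {
        StringBuilder sb = new StringBuilder(input.length());
        boolean inRun = false; // true while inside a run of non-letters
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (isAsciiLetter(c)) {
                sb.append(c);
                inRun = false;
            } else if (!inRun) {
                sb.append('-'); // emit one dash at the start of each run
                inRun = true;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // \u00e9 is 'é': a letter for Character.isAlphabetic, but outside a-zA-Z.
        System.out.println(collapse("[1]important?@~caf\u00e9@~")); // -important-caf-
    }
}
```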

Answer 4

Score: -1

> "It is not using Pattern"

It is, actually.
The String#replaceAll method is utilizing the Pattern class.

> "there are a number of dashes equal to the number of non-alphabet. I just need one. What is a good way to achieve this?"

The effect you're looking to achieve can be done with the + quantifier.
This will attempt to match 1 or more of the preceding value, in this case the character class.

final String input = "��[1]����important?@~?@~?@~?@~?@~alsoimportant�@~@~@~";
final String clean = input.replaceAll("[^a-zA-Z]+", "-");
System.out.println(clean);
// prints -important-alsoimportant-

Output

-important-alsoimportant-

If you'd like to use the Pattern and Matcher classes explicitly, you can use the following.

Pattern pattern = Pattern.compile("[^a-zA-Z]+");
Matcher matcher = pattern.matcher(input);
System.out.println(matcher.replaceAll("-"));

Output

-important-alsoimportant-

huangapple
  • This article was posted on June 9, 2023, 01:29:29
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76434332.html