June 9, 2023, 01:29:29

Replace sequence of non-alphabet with one dash using Pattern

Question

What I am trying to achieve:

I have a string that looks like this: ��[1]��important?@~?@~?@~?@~?@~alsoimportant�@~@~@~

Basically, the string contains words (made up only of letters) with gibberish (one or more non-letter characters) before and after them.

I would like to keep the words and replace each sequence of non-letter characters with a single dash. If possible, use Java Pattern, because it is quite efficient.

What I tried:

public static void main(String[] args) {
    final String input = "��[1]����important?@~?@~?@~?@~?@~alsoimportant�@~@~@~";
    final String clean = input.replaceAll("[^a-zA-Z]", "-");
    System.out.println(clean);
    // print ---------important---------------alsoimportant-------
    // would like -important-alsoimportant-
}

Question:

With my current solution:

It is not using Pattern.
It produces as many dashes as there are non-letter characters. I just need one.

What is a good way to achieve this?

Answer 1

Score: 2

You can use the + quantifier in your regular expression to match one or more consecutive non-alphabet characters:

final String input = "��[1]����important?@~?@~?@~?@~?@~alsoimportant�@~@~@~";
final String clean = input.replaceAll("[^a-zA-Z]+", "-");
System.out.println(clean); // prints -important-alsoimportant-

The same works with Pattern (which might be helpful if you're doing this replacement multiple times):

final Pattern NON_ALPHABET = Pattern.compile("[^a-zA-Z]+");
final String input = "��[1]����important?@~?@~?@~?@~?@~alsoimportant�@~@~@~";
final String clean = NON_ALPHABET.matcher(input).replaceAll("-");
System.out.println(clean); // prints the same
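
If the replacement runs over many strings, one way to benefit from the precompiled Pattern is to keep it in a static final field and reuse it. The following is a minimal sketch under that assumption; the class name DashCollapser, the method collapse, and the sample inputs are hypothetical and not part of the original answer.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, named here only for illustration.
class DashCollapser {
    // Compiled once and reused; Pattern instances are immutable and thread-safe.
    private static final Pattern NON_ALPHABET = Pattern.compile("[^a-zA-Z]+");

    // Replaces every run of non-letter characters with a single dash.
    static String collapse(String input) {
        return NON_ALPHABET.matcher(input).replaceAll("-");
    }

    public static void main(String[] args) {
        // Sample inputs are assumptions made for this example.
        String[] inputs = {"??[1]important?@~?@~alsoimportant@~@~", "a1b22c"};
        for (String input : inputs) {
            System.out.println(collapse(input));
        }
        // prints:
        // -important-alsoimportant-
        // a-b-c
    }
}
```

Each call creates a new Matcher, which is cheap; only the compilation of the regular expression is shared.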

Answer 2

Score: 1

Here's another way, if you focus on the words instead. I'd use Pattern.compile("[a-zA-Z]+") to solve the problem. After getting the expected words, say, "important" and "alsoimportant", it is convenient to use StringBuilder to join them together.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String input = "��[1]����important?@~?@~?@~?@~?@~alsoimportant�@~@~@~";
        Pattern pattern = Pattern.compile("[a-zA-Z]+");
        Matcher matcher = pattern.matcher(input);
        // Start with the leading dash, then append each word followed by a dash.
        StringBuilder builder = new StringBuilder("-");
        while (matcher.find()) {
            builder.append(matcher.group()).append("-");
        }
        System.out.println(builder); // output "-important-alsoimportant-"
    }
}

Note: The pattern above assumes that every word consists only of the letters 'a' to 'z' or 'A' to 'Z'. Words such as "mp3" would therefore be regarded as the word "mp" followed by the gibberish '3'. To avoid this, just change the pattern to Pattern.compile("\\w+"). Keep in mind that \w matches digits and the underscore as well as letters.
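
To make the difference between the two patterns concrete, here is a small sketch that collects the matches of each one from the same string. The class name WordPatterns, the helper words, and the sample input are assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, named here only for illustration.
class WordPatterns {
    // Collects every match of the given regex, in order of appearance.
    static List<String> words(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "?@~mp3?@~my_file@~"; // sample input assumed for this example
        System.out.println(words("[a-zA-Z]+", input)); // [mp, my, file]
        System.out.println(words("\\w+", input));      // [mp3, my_file]
    }
}
```

With \w+ the digit stays attached to "mp3", but the underscore also becomes part of a word, which may or may not be what you want.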

Answer 3

Score: 1

As I have said before, regular expressions, like streams, are a good general tool but not always as efficient as imperative programming. The following was about 37% more efficient than using String.replaceAll:

// Note: Character.isAlphabetic accepts any Unicode letter, not just a-z and A-Z.
boolean seen = false; // true while inside a run of non-letters
StringBuilder sb = new StringBuilder();
for (char c : input.toCharArray()) {
    if (!Character.isAlphabetic(c)) {
        // Emit a dash only for the first non-letter of a run.
        if (!seen) {
            sb.append("-");
            seen = true;
        }
    } else {
        sb.append(c);
        seen = false;
    }
}
String result = sb.toString();

Although the time differences were significant, the time required was very small. As such it wouldn't really make a big difference in this particular problem and I would have probably opted for String.replaceAll myself.
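
One caveat with the loop: Character.isAlphabetic also returns true for letters outside a-z and A-Z, such as 'é', so its output can differ from the [^a-zA-Z]+ regex. The sketch below is a variant, not the answer's original code, that swaps in an explicit ASCII check so both approaches agree. The class name LoopCollapser, its methods, and the sample input are hypothetical.

```java
// Hypothetical class, named here only for illustration.
class LoopCollapser {
    // True only for the letters matched by the regex class [a-zA-Z].
    static boolean isAsciiLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    // Same idea as the loop above, with the ASCII-only letter test swapped in.
    static String collapse(String input) {
        boolean inGap = false; // true while inside a run of non-letters
        StringBuilder sb = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (isAsciiLetter(c)) {
                sb.append(c);
                inGap = false;
            } else if (!inGap) {
                sb.append('-'); // first non-letter of a run
                inGap = true;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String input = "??[1]caf\u00e9?@~important@~"; // sample input assumed for this example
        System.out.println(collapse(input));                     // -caf-important-
        System.out.println(input.replaceAll("[^a-zA-Z]+", "-")); // -caf-important-
    }
}
```

With Character.isAlphabetic the 'é' would be kept as part of the word, so which test to use depends on whether non-ASCII letters count as gibberish in your data.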

Answer 4

Score: -1

> "It is not using Pattern"

It is, actually.
The String#replaceAll method is utilizing the Pattern class.

> "there are a number of dashes equal to the number of non-alphabet. I just need one. What is a good way to achieve this?"

The effect you're looking to achieve can be done with the + quantifier.
This will attempt to match 1 or more of the preceding value, in this case the character class.

final String input = "��[1]����important?@~?@~?@~?@~?@~alsoimportant�@~@~@~";
final String clean = input.replaceAll("[^a-zA-Z]+", "-");
System.out.println(clean);
// prints -important-alsoimportant-

Output

-important-alsoimportant-

If you'd like to implement the Pattern and Matcher classes explicitly, you can use the following.

Pattern pattern = Pattern.compile("[^a-zA-Z]+");
Matcher matcher = pattern.matcher(input);
System.out.println(matcher.replaceAll("-"));

Output

-important-alsoimportant-
