August 1, 2020 18:38:50

Regex splitting resulting in an extra space, using \\s but not getting rid of it

Question

I am trying to parse a string using a regex. The string starts with an opening brace and ends with a closing brace, and between them lowercase English letters are listed, separated by commas. Each comma is followed by a space, e.g. {a, b, c}. Here is my code:

import java.util.Set;
import java.util.HashSet;
import java.io.*;

public class StringLetters {

    public static void main(String[] args) {
        try {
            BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
            String[] temp = br.readLine().split("\\s*(\\{|,|\\}|\\s)\\s*");

            for (int i = 0; i < temp.length; i++) {
                System.out.println("temp[" + i + "] ===>" + temp[i]);
            }

            Set<String> set = new HashSet<>();
            for (String a : temp) {
                set.add(a);
            }

            System.out.println(set.size());
        } catch (IOException ioe) {
            ioe.printStackTrace();
        }
    }
}

When I pass {a, b, c} as input, I get:

$ java StringLetters
{a, b, c}
temp[0] ===>
temp[1] ===>a
temp[2] ===>b
temp[3] ===>c
4

Other possible inputs include {}, {s, h, a, n, o, n, o}, {h, e, a, l, h, t}, and so on.

When I run:

$ java StringLetters
{}
0

the result is correct: for an empty pair of braces I expect 0.

The empty string at index 0 of the temp array is not what I want. To get rid of it I added \s inside the group (\{|,|\}|\s), but that does not help.

Answer 1

Score: 2

If all you need are the individual letters from the input, I would use a different approach than splitting.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) throws IOException {
        BufferedReader br = new BufferedReader(new InputStreamReader(System.in));

        Pattern pattern = Pattern.compile("\\p{Alpha}");
        Matcher matcher = pattern.matcher(br.readLine());

        Set<String> set = new HashSet<>();

        while (matcher.find()) {
            System.out.println(matcher.group());
            set.add(matcher.group());
        }

        System.out.println(set.size());
    }
}

A sample run:

{a, b, c}
a
b
c
3

Another sample run:

{}
0

Note: \p{Alpha} matches a single ASCII letter and is equivalent to [A-Za-z] (not [A-Za-Z], which is an invalid range). See the java.util.regex.Pattern documentation for more about these character classes; the Java regex tutorial is also worth a look.
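To illustrate the note, here is a small sketch that runs both character classes over one of the sample inputs. It also shows that \p{Alpha} is ASCII-only unless the UNICODE_CHARACTER_CLASS flag is used, and that the misspelled range [A-Za-Z] is rejected by Pattern.compile. The class AlphaClassDemo and its letters helper are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class AlphaClassDemo {

    // Collects every match of regex in input, in order of appearance.
    static List<String> letters(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "{h, e, a, l, h, t}";
        System.out.println(letters("\\p{Alpha}", input)); // [h, e, a, l, h, t]
        System.out.println(letters("[A-Za-z]", input));   // [h, e, a, l, h, t]

        // \p{Alpha} is US-ASCII only by default, so the accented letter is skipped.
        System.out.println(letters("\\p{Alpha}", "{\u00e9, a}")); // [a]

        // "a-Z" is a descending range, which Pattern rejects.
        try {
            Pattern.compile("[A-Za-Z]");
        } catch (PatternSyntaxException e) {
            System.out.println("invalid pattern: [A-Za-Z]");
        }
    }
}
```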

If you want to stick to your own way of doing it (i.e. splitting the string), you can do it as follows:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.HashSet;
import java.util.Set;

public class Main {
    public static void main(String[] args) throws IOException {
        BufferedReader br = new BufferedReader(new InputStreamReader(System.in));

        String[] temp = br.readLine().split("\\s*(\\{|,|\\}|\\s)\\s*");

        for (int i = 1; i < temp.length; i++) { // Start with index 1
            System.out.println("temp[" + i + "] ===>" + temp[i]);
        }

        Set<String> set = new HashSet<>();
        for (String a : temp) {
            if (!a.isBlank() && !a.isEmpty()) { // Skip strings that are empty or blank (isBlank needs Java 11+)
                set.add(a);
            }
        }

        System.out.println(set.size());
    }
}

A sample run:

{a, b, c}
temp[1] ===>a
temp[2] ===>b
temp[3] ===>c
3

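I've put comments in the code to make the changes easier to spot. They are needed because when the delimiter matches at the very start of the input (here, the {), String#split puts an empty string at index 0, whereas trailing empty strings are discarded. For example, System.out.println("Hello".split("$").length) prints 1, while splitting {} with the same pattern returns an empty array, which is why that input already gives 0.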
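A quick way to see these split rules is to run the question's delimiter pattern on a few inputs. This is a sketch: the class name SplitEdgeCases is hypothetical, and the values in the comments follow from String#split with the default limit of 0 (leading empty strings kept, trailing ones removed) and with a negative limit (nothing removed).

```java
import java.util.Arrays;

public class SplitEdgeCases {

    // The delimiter pattern from the question.
    static final String DELIM = "\\s*(\\{|,|\\}|\\s)\\s*";

    public static void main(String[] args) {
        // A delimiter at the very start yields a leading empty string...
        System.out.println(Arrays.toString("{a, b, c}".split(DELIM))); // [, a, b, c]
        // ...but trailing empty strings are dropped, so "{}" gives an empty array.
        System.out.println("{}".split(DELIM).length);                  // 0
        // Without a leading delimiter, no empty string is produced.
        System.out.println(Arrays.toString("a, b, c".split(DELIM)));   // [a, b, c]
        // The example from the answer.
        System.out.println("Hello".split("$").length);                 // 1
        // A negative limit keeps the trailing empty string as well.
        System.out.println(Arrays.toString("{a}".split(DELIM, -1)));   // [, a, ]
    }
}
```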

Answer 2

Score: 1

The regex treats the { as the first delimiter, so split returns an empty string for the (empty) text before it. The simplest fix is to filter the array as you build the Set:

Set<String> set = new HashSet<>();
for (String a : temp) {
    if (a != null && !a.isEmpty())
        set.add(a);
}
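Filtering works, but another option is to keep the empty string from being created at all: remove the braces before splitting, so split never sees a delimiter at position 0. A minimal sketch, assuming the input always has the {x, y, z} shape from the question; SplitAfterStrippingBraces and countDistinct are hypothetical names. The explicit empty check matters because "".split(...) returns an array holding one empty string, not an empty array.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class SplitAfterStrippingBraces {

    // Counts distinct letters in input such as "{a, b, c}" by removing the
    // surrounding braces first, so no leading empty string is produced.
    static int countDistinct(String line) {
        // Drop the opening brace (plus following spaces) and the closing brace.
        String body = line.trim().replaceAll("^\\{\\s*|\\s*\\}$", "");
        if (body.isEmpty()) {
            return 0; // "".split(...) would yield [""], which is not what we want
        }
        Set<String> set = new HashSet<>(Arrays.asList(body.split("\\s*,\\s*")));
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(countDistinct("{a, b, c}"));             // 3
        System.out.println(countDistinct("{}"));                    // 0
        System.out.println(countDistinct("{s, h, a, n, o, n, o}")); // 5
    }
}
```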

Answer 3

Score: 0

Without calling split, the task can be done as follows:

import java.util.Set;
import java.util.HashSet;
import java.io.*;

public class StringLetters {

    public static void main(String[] args) {
        try {
            BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
            String str = br.readLine();
            Set<Character> set = new HashSet<>();
            for (int i = 0; i < str.length(); i++) {
                char c = str.charAt(i);
                // Keep every character except braces, spaces and commas.
                if (c != '{' && c != '}' && c != ' ' && c != ',') {
                    set.add(c);
                }
            }

            System.out.println(set.size());
        } catch (IOException ioe) {
            ioe.printStackTrace();
        }
    }
}

The task itself is easy, but I am interested in the regex and would like to know how this can be done using a regex.
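One regex-centred way to do it is to check the whole line against a pattern that describes the format, then pull the letters out with Matcher.find. This sketch assumes exactly one space after each comma and only lowercase a-z letters, as described in the question; RegexOnlyCount, countDistinct and the -1 "malformed input" return value are hypothetical choices.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexOnlyCount {

    // Assumed format: "{", then zero or more lowercase letters separated by ", ", then "}".
    private static final Pattern FORMAT =
            Pattern.compile("\\{(?:[a-z](?:, [a-z])*)?\\}");
    private static final Pattern LETTER = Pattern.compile("[a-z]");

    // Returns the number of distinct letters, or -1 if the line does not match FORMAT.
    static int countDistinct(String line) {
        if (!FORMAT.matcher(line.trim()).matches()) {
            return -1;
        }
        Set<String> set = new HashSet<>();
        Matcher m = LETTER.matcher(line);
        while (m.find()) {
            set.add(m.group());
        }
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(countDistinct("{a, b, c}"));          // 3
        System.out.println(countDistinct("{}"));                 // 0
        System.out.println(countDistinct("{h, e, a, l, h, t}")); // 5
        System.out.println(countDistinct("{a,b}"));              // -1 (no space after the comma)
    }
}
```

Validating first means malformed lines are reported instead of silently producing a count; if that strictness is not wanted, the LETTER loop alone (as in Answer 1) is enough.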
