Regex split leaves an extra empty string; using \\s in the pattern does not get rid of it

Question

I am trying to parse a string using a regex. The string starts with an opening brace and ends with a closing brace; between them, lowercase English letters are listed, separated by commas, and each comma is followed by a space. This is my code:

import java.util.Set;
import java.util.HashSet;
import java.io.*;

public class StringLetters {

    public static void main(String[] args) {
        try {
            BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
            String[] temp = br.readLine().split("\\s*(\\{|,|\\}|\\s)\\s*");

            for (int i = 0; i < temp.length; i++) {
                System.out.println("temp[" + i + "] ===>" + temp[i]);
            }

            Set<String> set = new HashSet<>();
            for (String a : temp) {
                set.add(a);
            }

            System.out.println(set.size());
        } catch (IOException ioe) {

        }
    }
}

When I pass {a, b, c} as input, I get:

$ java StringLetters
{a, b, c}
temp[0] ===>
temp[1] ===>a
temp[2] ===>b
temp[3] ===>c
4

Other inputs can be {}, {s, h, a, n, o, n, o}, {h, e, a, l, h, t}, etc.

So when I run:

$ java StringLetters
{}
0

the result is correct: for an empty set I should get 0.

The empty string at index 0 of the temp array is not what I want. To get rid of it I added \s inside the group (\{|,|\}|\s), but that does not help here.
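To make the behaviour described above easier to reproduce, here is a minimal sketch that applies the same pattern to the two inputs from the question. The class name SplitLeadingEmptyDemo and the helper splitLikeQuestion are made up for this illustration; only String#split and the pattern itself come from the question.

```java
import java.util.Arrays;

class SplitLeadingEmptyDemo {

    // The delimiter pattern from the question, unchanged.
    static final String DELIMITERS = "\\s*(\\{|,|\\}|\\s)\\s*";

    // Hypothetical helper: splits one input line exactly as the question does.
    static String[] splitLikeQuestion(String line) {
        return line.split(DELIMITERS);
    }

    public static void main(String[] args) {
        // "{" is matched as a delimiter at index 0, so the (empty) text
        // in front of it becomes the first array element.
        System.out.println(Arrays.toString(splitLikeQuestion("{a, b, c}"))); // [, a, b, c]

        // Here every piece is empty, and split drops trailing empty strings.
        System.out.println(splitLikeQuestion("{}").length); // 0
    }
}
```

The extra element comes from the opening brace being treated as a delimiter, not from whitespace, which would explain why adding \s to the pattern changes nothing.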

Answer 1

Score: 2

If all you need are the individual letters from the input, I would take a different approach than splitting and match the letters directly.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) throws IOException {
        BufferedReader br = new BufferedReader(new InputStreamReader(System.in));

        // Match one letter at a time instead of splitting on delimiters
        Pattern pattern = Pattern.compile("\\p{Alpha}");
        Matcher matcher = pattern.matcher(br.readLine());

        Set<String> set = new HashSet<>();

        while (matcher.find()) {
            System.out.println(matcher.group());
            set.add(matcher.group());
        }

        System.out.println(set.size());
    }
}

A sample run:

{a, b, c}
a
b
c
3

Another sample run:

{}
0

Note: \p{Alpha} matches a single ASCII letter and can be replaced with [A-Za-z]. You can learn more about these patterns in the Javadoc for java.util.regex.Pattern. You may also want to check the Java regex tutorial.
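As a quick check of the note above, the following sketch runs both character classes over the same input and compares the results. LetterClassDemo and its letters helper are hypothetical names introduced only for this example; a TreeSet is used so the printed order is predictable.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LetterClassDemo {

    // Hypothetical helper: collects every match of the given regex in the line.
    static Set<String> letters(String regex, String line) {
        Set<String> found = new TreeSet<>();
        Matcher matcher = Pattern.compile(regex).matcher(line);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "{s, h, a, n, o, n, o}";
        Set<String> posixClass = letters("\\p{Alpha}", input);
        Set<String> explicitRange = letters("[A-Za-z]", input);

        System.out.println(posixClass);                       // [a, h, n, o, s]
        System.out.println(posixClass.equals(explicitRange)); // true
    }
}
```

Since the inputs in the question contain only lowercase letters, [a-z] alone would presumably be enough as well.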

If you want to stick to your own way of doing it (i.e. splitting the string), you can do it as follows:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.HashSet;
import java.util.Set;

public class Main {
    public static void main(String[] args) throws IOException {
        BufferedReader br = new BufferedReader(new InputStreamReader(System.in));

        String[] temp = br.readLine().split("\\s*(\\{|,|\\}|\\s)\\s*");

        for (int i = 1; i < temp.length; i++) {// Start with index 1 to skip the leading empty string
            System.out.println("temp[" + i + "] ===>" + temp[i]);
        }

        Set<String> set = new HashSet<>();
        for (String a : temp) {
            if (!a.isBlank()) {// Skip empty or whitespace-only strings (isBlank() needs Java 11+)
                set.add(a);
            }
        }

        System.out.println(set.size());
    }
}

A sample run:

{a, b, c}
temp[1] ===>a
temp[2] ===>b
temp[3] ===>c
3

I've put comments in the code to make the changes easier to spot. They are needed because the input starts with a delimiter ({), so String#split returns an empty string as the first element of the array; only trailing empty strings are dropped from the result.
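Another way to keep the split approach, sketched here under the assumption that the input always has the form {x, y, ...}, is to strip the braces before splitting so that no delimiter sits at the start of the string. TrimThenSplit and parse are hypothetical names used only for this illustration.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class TrimThenSplit {

    // Hypothetical helper: returns the distinct elements listed between the braces.
    static Set<String> parse(String line) {
        // Remove the opening brace at the start and the closing brace at the end,
        // together with any whitespace next to them.
        String inner = line.trim().replaceAll("^\\{\\s*|\\s*\\}$", "");

        Set<String> set = new LinkedHashSet<>();
        if (inner.isEmpty()) {
            return set; // "{}" contains no elements
        }
        // Only commas (with optional surrounding whitespace) are delimiters now.
        for (String element : inner.split("\\s*,\\s*")) {
            set.add(element);
        }
        return set;
    }

    public static void main(String[] args) {
        System.out.println(parse("{a, b, c}").size());             // 3
        System.out.println(parse("{}").size());                    // 0
        System.out.println(parse("{s, h, a, n, o, n, o}").size()); // 5
    }
}
```

The explicit isEmpty() check matters because splitting an empty string yields a one-element array containing the empty string, which would otherwise be counted.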

Answer 2

Score: 1

The regex sees the { as the first delimiter, which gives you an empty string as the first array element. The simple fix is to filter the array as you build the Set:

Set<String> set = new HashSet<>();
for (String a : temp) {
    if (a != null && !a.isEmpty())
        set.add(a);
}
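
The same filtering can be written with streams. This is only a sketch of an equivalent formulation, not part of the original answer; FilterEmptyDemo and nonEmpty are hypothetical names.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;

class FilterEmptyDemo {

    // Hypothetical helper: keeps every non-empty piece and collects the distinct ones.
    static Set<String> nonEmpty(String[] parts) {
        return Arrays.stream(parts)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        // Same delimiter pattern as in the question.
        String regex = "\\s*(\\{|,|\\}|\\s)\\s*";
        System.out.println(nonEmpty("{a, b, c}".split(regex)).size()); // 3
        System.out.println(nonEmpty("{}".split(regex)).size());        // 0
    }
}
```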

Answer 3

Score: 0

The task can also be done without calling split, as follows:

import java.util.Set;
import java.util.HashSet;
import java.io.*;

public class StringLetters {

    public static void main(String[] args) {
        try {
            BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
            String str = br.readLine();
            Set<Character> set = new HashSet<>();
            for (int i = 0; i < str.length(); i++) {
                char c = str.charAt(i);
                // Skip the braces, spaces and commas; everything else is an element
                if (c != '{' && c != '}' && c != ' ' && c != ',') {
                    set.add(c);
                }
            }

            System.out.println(set.size());
        } catch (IOException ioe) {
            ioe.printStackTrace();
        }
    }
}

The task itself is easy, but I am interested in the regex approach and would like to know how this can be done with a regex.
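
One possible regex-based variant, given only as a sketch and assuming the elements are single lowercase ASCII letters as in the question, is to delete everything that is not a letter and count the distinct characters that remain. DistinctLetters and countDistinct are hypothetical names.

```java
class DistinctLetters {

    // Hypothetical helper: counts the distinct lowercase letters in the line.
    static long countDistinct(String line) {
        // Delete every character that is not a lowercase ASCII letter...
        String lettersOnly = line.replaceAll("[^a-z]", "");
        // ...then count the distinct characters that are left.
        return lettersOnly.chars().distinct().count();
    }

    public static void main(String[] args) {
        System.out.println(countDistinct("{a, b, c}"));          // 3
        System.out.println(countDistinct("{}"));                 // 0
        System.out.println(countDistinct("{h, e, a, l, h, t}")); // 5
    }
}
```

Because nothing is split, there is no leading delimiter and therefore no empty element to filter out.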

huangapple
  • Posted on 2020-08-01 18:38:50
  • Link to this post: https://go.coder-hub.com/63204271.html