Split string and extract text and number

Question

I need to split an address into street and number. Examples:

Lievensberg 31D
Jablunkovska 21/2
Weimarstraat 113 A
Pastoor Baltesenstraat 22
Van Musschenboek strasse 84

I need to split like this:

Street1: Lievensberg
Number1: 31D

Street2: Jablunkovska
Number2: 21/2

Street3: Weimarstraat
Number3: 113 A

Street4: Pastoor Baltesenstraat
Number4: 22

Street5: Van Musschenboek strasse
Number5: 84

I tried the code below, but it doesn't work: it splits on every whitespace character, whereas I need to split only when the character after the whitespace is a digit:

String[] arrSplit = address_line.split("\\s");
for (int i = 0; i < arrSplit.length; i++) {
    System.out.println(arrSplit[i]);
}

I don't know how to meet all of these requirements. Any ideas?

Answer 1

Score: 2

If the number is optional, then instead of using split you could use two capturing groups, where the second group is optional.

^([^\d\r\n]+?)(?:\h*(\d.*)|$)

Explanation

  • ^ Start of the string (start of each line with Pattern.MULTILINE)
  • ([^\d\r\n]+?) Capture group 1: match 1+ chars other than a digit or newline, non-greedy
  • (?: Non-capturing group
    • \h*(\d.*) Match 0+ horizontal whitespace chars, then capture group 2: a digit followed by the rest of the line
    • | Or
    • $ End of the string (end of each line with Pattern.MULTILINE)
  • ) Close the non-capturing group

Example code

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String regex = "^([^\\d\\r\\n]+?)(?:\\h*(\\d.*)|$)";
String string = "Lievensberg 31D\n"
 + "Jablunkovska 21/2\n"
 + "Weimarstraat 113 A\n"
 + "Pastoor Baltesenstraat 22\n"
 + "Van Musschenboek strasse 84\n"
 + "Lievensberg";

// MULTILINE makes ^ and $ match at the start and end of every line
Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
Matcher matcher = pattern.matcher(string);

while (matcher.find()) {
    System.out.println("Street: " + matcher.group(1));
    // Group 2 is null when the line has no number
    if (matcher.group(2) != null) {
        System.out.println("Number: " + matcher.group(2));
    }
    System.out.println("------------------");
}

Output

Street: Lievensberg
Number: 31D
------------------
Street: Jablunkovska
Number: 21/2
------------------
Street: Weimarstraat
Number: 113 A
------------------
Street: Pastoor Baltesenstraat
Number: 22
------------------
Street: Van Musschenboek strasse
Number: 84
------------------
Street: Lievensberg
------------------
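
To reuse the pattern on one address line at a time, it could be wrapped in a small helper. This is only a sketch: the class name AddressSplitter and the method splitAddress are made up for illustration and are not part of the answer. The helper returns a two-element array whose second element is null when no number is found.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AddressSplitter {

    // Same pattern as in the answer, compiled once and reused.
    // MULTILINE is not needed because each call handles a single line.
    private static final Pattern ADDRESS =
            Pattern.compile("^([^\\d\\r\\n]+?)(?:\\h*(\\d.*)|$)");

    // Hypothetical helper: returns {street, number}; number is null if absent.
    public static String[] splitAddress(String line) {
        Matcher m = ADDRESS.matcher(line);
        if (!m.find()) {
            // For example an empty line or a line that starts with a digit
            return new String[] { line, null };
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = splitAddress("Weimarstraat 113 A");
        System.out.println("Street: " + parts[0]);
        System.out.println("Number: " + parts[1]);
    }
}
```

Returning null for a missing number mirrors the null check on group 2 in the answer's loop; callers that prefer an empty string could map it at the call site.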

Answer 2

Score: 1

You can use a regex to check first whether the input matches, and process it only if it does.

String str1 = "Lievensberg 31D"; // street = Lievensberg, number = 31D
String str2 = "Lievensberg NN31D"; // doesn't match
String str3 = "Lievensberg"; // street = Lievensberg, number = null
String str4 = "Pastoor Baltesenstraat 22"; // street = Pastoor Baltesenstraat, number = 22

Pattern pattern = Pattern.compile("([a-zA-Z ]+?)(\\s(\\d+)(.*))?");
Matcher matcher = pattern.matcher(str1);
if (matcher.matches()) {
    String street = matcher.group(1);
    // Group 2 is the optional " <digits><rest>" part; groups 3 and 4 are its pieces
    String number = matcher.group(2) != null ? matcher.group(3) + matcher.group(4) : null;
    System.out.println("street = " + street);
    System.out.println("number = " + number);
}
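
The snippet above only runs the matcher on str1. As a rough sketch, the same pattern can be applied to all four inputs to check the results stated in the comments. The class name AddressMatchDemo and the helper describe are hypothetical names chosen for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AddressMatchDemo {

    // Pattern taken unchanged from the answer.
    private static final Pattern PATTERN =
            Pattern.compile("([a-zA-Z ]+?)(\\s(\\d+)(.*))?");

    // Hypothetical helper: summarises the match result for one input.
    public static String describe(String input) {
        Matcher matcher = PATTERN.matcher(input);
        if (!matcher.matches()) {
            return "no match";
        }
        String street = matcher.group(1);
        String number = matcher.group(2) != null
                ? matcher.group(3) + matcher.group(4)
                : null;
        return "street = " + street + ", number = " + number;
    }

    public static void main(String[] args) {
        String[] inputs = {
            "Lievensberg 31D",
            "Lievensberg NN31D",
            "Lievensberg",
            "Pastoor Baltesenstraat 22"
        };
        for (String input : inputs) {
            System.out.println(input + " -> " + describe(input));
        }
    }
}
```

Note that the character class [a-zA-Z ] only accepts ASCII letters and spaces, so street names containing accents, dots, or hyphens would be reported as "no match" with this pattern.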

Answer 3

Score: 1

Something like this:

    ArrayList<String> list = new ArrayList<>();
    list.add("Lievensberg 31D");
    list.add("Jablunkovska 21/2");
    list.add("Weimarstraat 113 A");
    list.add("Pastoor Baltesenstraat 22");
    list.add("Van Musschenboek strasse 84");

    for (int i = 0; i < list.size(); i++) {
        // Split once, on whitespace that is followed by a digit
        String[] parts = list.get(i).split("\\s+(?=\\d)");
        System.out.println("Street" + (i + 1) + ": " + parts[0]);
        // parts has a single element when the address contains no number
        System.out.println("Number" + (i + 1) + ": " + (parts.length > 1 ? parts[1] : ""));
    }

Answer 4

Score: 0

You can use this logic:

  1. Find the index of the first digit
  2. Split the string at this index

The code below illustrates this:

public static void main(String[] args) {

    String address_line = "Weimarstraat 113 A";

    // Find the index of the first digit (stays -1 if there is none)
    int i = -1;
    for (int k = 0; k < address_line.length(); k++) {
        char c = address_line.charAt(k);
        if ('0' <= c && c <= '9') {
            i = k;
            break;
        }
    }

    // Split the string at that index; trim() drops the separating space
    if (i >= 0) {
        System.out.println(address_line.substring(0, i).trim());
        System.out.println(address_line.substring(i));
    } else {
        // No digit found: the whole line is the street
        System.out.println(address_line);
    }

}

Its output will be:

Weimarstraat
113 A

Answer 5

Score: -1

Here's a simple solution using regex and split (it needs import java.util.Arrays):

String str = "Jablunkovska 21/2";
String[] split = str.split("\\s(?=\\d)", 2);
System.out.println(Arrays.toString(split));

Output:

[Jablunkovska, 21/2]

The expression (?=\\d) is a lookahead for a digit: it only checks that a digit follows the whitespace, so the digit is not consumed by the split and stays in the second part.
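
The limit argument 2 caps the result at two elements, so a later space followed by a digit does not create a third part, and an address without a number yields a one-element array. The sketch below illustrates both cases. The class name SplitDemo, the helper split, and the input "Main Road 12 34" are invented for this example.

```java
import java.util.Arrays;

public class SplitDemo {

    // Hypothetical helper: splits at the first whitespace followed by a digit.
    public static String[] split(String address) {
        // Limit 2: at most one split, so the number part keeps any later spaces
        return address.split("\\s(?=\\d)", 2);
    }

    public static void main(String[] args) {
        // Regular case from the question
        System.out.println(Arrays.toString(split("Jablunkovska 21/2")));
        // Invented input with two "space + digit" positions
        System.out.println(Arrays.toString(split("Main Road 12 34")));
        // No number: the array has a single element, so check the length
        String[] parts = split("Lievensberg");
        System.out.println(parts.length > 1 ? parts[1] : "(no number)");
    }
}
```

Checking the array length before reading index 1 avoids an ArrayIndexOutOfBoundsException for addresses that have no number.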

huangapple
  • Published on September 17, 2020 at 20:58:56
  • When reposting, please keep the link to this article: https://go.coder-hub.com/63938555.html