Split string and extract text and number

Question

I have to divide an address into street and number. Examples:

    Lievensberg 31D
    Jablunkovska 21/2
    Weimarstraat 113 A
    Pastoor Baltesenstraat 22
    Van Musschenboek strasse 84

I need to split like this:

    Street1: Lievensberg
    Number1: 31D
    Street2: Jablunkovska
    Number2: 21/2
    Street3: Weimarstraat
    Number3: 113 A
    Street4: Pastoor Baltesenstraat
    Number4: 22
    Street5: Van Musschenboek strasse
    Number5: 84

I tried the code below, but it does not work: it splits on every whitespace character, whereas I need to split only when the character after the whitespace is a digit:

    String[] arrSplit = address_line.split("\\s");
    for (int i = 0; i < arrSplit.length; i++) {
        System.out.println(arrSplit[i]);
    }

I don't know how to meet all of these requirements. Any ideas?

Answer 1

Score: 2


If the number is optional, then instead of using split you could use two capturing groups, the second of which is optional.

    ^([^\d\r\n]+?)(?:\h*(\d.*)|$)

Explanation

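  • ^ Start of the string (start of each line when Pattern.MULTILINE is set, as in the example code)
  • ([^\d\r\n]+?) Capture group 1: match 1 or more characters other than a digit or a newline, non-greedily
  • (?: Non-capturing group
    • \h*(\d.*) Match 0 or more horizontal whitespace characters, then capture in group 2 a digit followed by the rest of the line
    • | Or
    • $ End of the string (end of each line with Pattern.MULTILINE)
  • ) Close the non-capturing group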


Example code

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class SplitAddress {
        public static void main(String[] args) {
            String regex = "^([^\\d\\r\\n]+?)(?:\\h*(\\d.*)|$)";
            String string = "Lievensberg 31D\n"
                    + "Jablunkovska 21/2\n"
                    + "Weimarstraat 113 A\n"
                    + "Pastoor Baltesenstraat 22\n"
                    + "Van Musschenboek strasse 84\n"
                    + "Lievensberg";
            // MULTILINE makes ^ and $ match at the start and end of every line
            Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
            Matcher matcher = pattern.matcher(string);
            while (matcher.find()) {
                System.out.println("Street: " + matcher.group(1));
                // Group 2 is null when the line has no number
                if (matcher.group(2) != null) {
                    System.out.println("Number: " + matcher.group(2));
                }
                System.out.println("------------------");
            }
        }
    }

Output

    Street: Lievensberg
    Number: 31D
    ------------------
    Street: Jablunkovska
    Number: 21/2
    ------------------
    Street: Weimarstraat
    Number: 113 A
    ------------------
    Street: Pastoor Baltesenstraat
    Number: 22
    ------------------
    Street: Van Musschenboek strasse
    Number: 84
    ------------------
    Street: Lievensberg
    ------------------
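
For a single address line, the same pattern could be wrapped in a small helper. The following is only a sketch under assumptions: the class name `AddressParser`, the method `parse`, and the group names `street` and `number` are hypothetical and not part of the answer above. It uses named groups instead of numbered ones and returns an empty `Optional` when the line does not match.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative, not from the answer.
final class AddressParser {
    // Same pattern as in the answer, with named groups instead of numbered ones.
    // MULTILINE is not needed because each call handles one address line.
    private static final Pattern ADDRESS =
            Pattern.compile("^(?<street>[^\\d\\r\\n]+?)(?:\\h*(?<number>\\d.*)|$)");

    // Returns {street, number}; number is null when the line has no digit.
    // Returns an empty Optional when the line does not match at all,
    // for example when it starts with a digit.
    static Optional<String[]> parse(String addressLine) {
        Matcher matcher = ADDRESS.matcher(addressLine);
        if (!matcher.find()) {
            return Optional.empty();
        }
        return Optional.of(new String[] {matcher.group("street"), matcher.group("number")});
    }

    public static void main(String[] args) {
        parse("Weimarstraat 113 A").ifPresent(parts ->
                System.out.println("Street: " + parts[0] + ", Number: " + parts[1]));
    }
}
```

Returning an `Optional` keeps the "no match" case separate from the "street without a number" case, which the caller would otherwise have to tell apart by checking for null twice.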

Answer 2

Score: 1


You can first use a regex to check whether the input matches, and process it only if it does.

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class ValidateAddress {
        public static void main(String[] args) {
            String str1 = "Lievensberg 31D"; // street = Lievensberg, number = 31D
            String str2 = "Lievensberg NN31D"; // doesn't match
            String str3 = "Lievensberg"; // street = Lievensberg, number = null
            String str4 = "Pastoor Baltesenstraat 22"; // street = Pastoor Baltesenstraat, number = 22
            Pattern pattern = Pattern.compile("([a-zA-Z ]+?)(\\s(\\d+)(.*))?");
            Matcher matcher = pattern.matcher(str1);
            if (matcher.matches()) {
                String street = matcher.group(1);
                String number = matcher.group(2) != null ? matcher.group(3) + matcher.group(4) : null;
                System.out.println("street = " + street);
                System.out.println("number = " + number);
            }
        }
    }
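
The snippet above runs the matcher on `str1` only. As a minimal sketch, assuming a hypothetical helper `describe` and class name `AddressValidator` (neither is from the answer), the same pattern can be applied to all four sample strings to see the validate-then-process behaviour:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper around the pattern from the answer.
final class AddressValidator {
    private static final Pattern PATTERN =
            Pattern.compile("([a-zA-Z ]+?)(\\s(\\d+)(.*))?");

    // Returns a description of the parsed address, or "no match" when
    // the whole input does not fit the pattern.
    static String describe(String address) {
        Matcher matcher = PATTERN.matcher(address);
        if (!matcher.matches()) {
            return "no match";
        }
        String street = matcher.group(1);
        // Group 2 is the optional " <digits><rest>" part; null when absent.
        String number = matcher.group(2) != null
                ? matcher.group(3) + matcher.group(4)
                : null;
        return "street = " + street + ", number = " + number;
    }

    public static void main(String[] args) {
        String[] samples = {
            "Lievensberg 31D",
            "Lievensberg NN31D",
            "Lievensberg",
            "Pastoor Baltesenstraat 22"
        };
        for (String sample : samples) {
            System.out.println(sample + " -> " + describe(sample));
        }
    }
}
```

Note that the character class `[a-zA-Z ]` only accepts ASCII letters and spaces, so a street name containing accented letters, hyphens, or dots would be reported as "no match" by this pattern.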

Answer 3

Score: 1


Something like this:

    import java.util.ArrayList;

    public class SplitAddresses {
        public static void main(String[] args) {
            ArrayList<String> list = new ArrayList<>();
            list.add("Lievensberg 31D");
            list.add("Jablunkovska 21/2");
            list.add("Weimarstraat 113 A");
            list.add("Pastoor Baltesenstraat 22");
            list.add("Van Musschenboek strasse 84");
            for (int i = 0; i < list.size(); i++) {
                // Split once, at whitespace that is followed by a digit
                String[] parts = list.get(i).split("\\s+(?=\\d)", 2);
                System.out.println("Street" + (i + 1) + ": " + parts[0]);
                // Guard against addresses without a number
                if (parts.length > 1) {
                    System.out.println("Number" + (i + 1) + ": " + parts[1]);
                }
            }
        }
    }

Answer 4

Score: 0


You can use this logic:

  1. Find the index of the first digit
  2. Split the string at this index

For example:

    public class SplitAtFirstDigit {
        public static void main(String[] args) {
            String address_line = "Weimarstraat 113 A";

            // Find the index of the first digit (-1 if there is none)
            int firstDigit = -1;
            for (int i = 0; i < address_line.length(); i++) {
                char c = address_line.charAt(i);
                if ('0' <= c && c <= '9') {
                    firstDigit = i;
                    break;
                }
            }

            // Split the string at that index
            if (firstDigit < 0) {
                System.out.println(address_line.trim());
            } else {
                System.out.println(address_line.substring(0, firstDigit).trim());
                System.out.println(address_line.substring(firstDigit));
            }
        }
    }

Its output will be:

    Weimarstraat
    113 A
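
The first step can also be done with a `Matcher` for `\d` instead of a manual loop. This is a sketch only: the class `FirstDigitSplit` and the method `splitAtFirstDigit` are hypothetical names, and returning null for a missing number is an assumption about what the caller wants.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper illustrating the two steps with Matcher.start().
final class FirstDigitSplit {
    private static final Pattern DIGIT = Pattern.compile("\\d");

    // Returns {street, number}; number is null when the line has no digit.
    static String[] splitAtFirstDigit(String addressLine) {
        Matcher matcher = DIGIT.matcher(addressLine);
        // Step 1: find the index of the first digit
        if (!matcher.find()) {
            return new String[] {addressLine.trim(), null};
        }
        int index = matcher.start();
        // Step 2: split the string at that index
        return new String[] {
            addressLine.substring(0, index).trim(),
            addressLine.substring(index)
        };
    }

    public static void main(String[] args) {
        String[] parts = splitAtFirstDigit("Weimarstraat 113 A");
        System.out.println(parts[0]);
        System.out.println(parts[1]);
    }
}
```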

Answer 5

Score: -1


Here's a simple solution using regex and split:

    // requires: import java.util.Arrays;
    String str = "Jablunkovska 21/2";
    String[] split = str.split("\\s(?=\\d)", 2);
    System.out.println(Arrays.toString(split));

Output:

    [Jablunkovska, 21/2]

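The expression (?=\\d) is a lookahead for a digit: it asserts that a digit follows the whitespace without consuming it, so the digit is not removed by the split. The limit of 2 means the string is split at most once, into at most two parts.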
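
To see what the limit argument changes, here is a small sketch. The class `LookaheadSplitDemo`, the method `splitAddress`, and the input "Plein 1940 12" are made up for illustration and do not come from the question.

```java
import java.util.Arrays;

// Hypothetical demo class; the input "Plein 1940 12" is made up for illustration.
final class LookaheadSplitDemo {
    // Splits at the first whitespace character that is followed by a digit.
    static String[] splitAddress(String address) {
        return address.split("\\s(?=\\d)", 2);
    }

    public static void main(String[] args) {
        String address = "Plein 1940 12";
        // Without a limit, every whitespace followed by a digit is a split point.
        System.out.println(Arrays.toString(address.split("\\s(?=\\d)")));
        // With limit 2, only the first such whitespace splits the string.
        System.out.println(Arrays.toString(splitAddress(address)));
        // No digit at all: the result has a single element, so check the length
        // before reading index 1.
        System.out.println(splitAddress("Lievensberg").length);
    }
}
```

With a limit of 2, everything after the first split point stays in the second element, which is usually what is wanted for numbers such as "113 A".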

huangapple
  • Posted on September 17, 2020 at 20:58:56
  • When reposting, please keep the link to this article: https://go.coder-hub.com/63938555.html