September 17, 2020, 16:54:29

Java/Groovy regex parse Key-Value pairs without delimiters

Question

I'm having trouble extracting key-value pairs with my regex.

Code so far:

String raw = '''
MA1
 D. Mueller Gießer 
MA2 Peter 
Mustermann 2. Mann  
MA3 Ulrike Mastorius Schmelzer 
MA4 Heiner Becker s 3.Mann
 MA5 Rudolf Peters 
Gießer 
'''
Map map = [:]
ArrayList<String> split = raw.findAll("(MA\\d)+(.*)"){ full, name, value ->  map[name] = value }
println map

Output is:
[MA1:, MA2: Peter, MA3: Ulrike Mastorius Schmelzer, MA4: Heiner Becker, MA5: Rudolf Peters]

In my case the keys are:
MA1, MA2, MA3, ... i.e. MA\d (MA followed by a single digit).

The value is everything up to the next key, including line breaks, tabs, spaces, etc.

Does anybody have a clue how to do this?

Thanks in advance,
Sebastian

Answer 1

Score: 3

You can capture in the second group everything that follows the key on the same line, plus all subsequent lines that do not start with a key.

^(MA\d+)(.*(?:\R(?!MA\d).*)*)

The pattern matches:

^ Start of the string, or of each line when the multiline flag is enabled (needed here to match every key)
(MA\d+) Capture group 1, matching MA and 1+ digits
( Capture group 2
- .* Match the rest of the line
- (?:\R(?!MA\d).*)* Match all following lines that do not start with MA followed by a digit, where \R matches any Unicode newline sequence
) Close group 2

In Java, with the backslashes doubled:

final String regex = "^(MA\\d+)(.*(?:\\R(?!MA\\d).*)*)";
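
As a minimal sketch of how this pattern might be applied, the following collects the matches into a map. The class name `KeyValueParser`, the `parse` helper, the shortened sample input and the call to `trim()` are assumptions made for illustration and are not part of the answer. `Pattern.MULTILINE` is also added here, on the assumption that `^` has to match at the start of every line rather than only at the start of the input.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeyValueParser {
    // Pattern from the answer. MULTILINE is an addition so that ^ matches
    // at the start of each line, not only at the start of the input.
    private static final Pattern ENTRY =
            Pattern.compile("^(MA\\d+)(.*(?:\\R(?!MA\\d).*)*)", Pattern.MULTILINE);

    // Returns key -> value in order of appearance; values are trimmed,
    // inner line breaks are kept.
    static Map<String, String> parse(String raw) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = ENTRY.matcher(raw);
        while (m.find()) {
            result.put(m.group(1), m.group(2).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        // Shortened, hypothetical sample in the shape of the question's input.
        String raw = "\nMA1\n D. Mueller\nMA2 Peter \nMustermann 2. Mann\nMA3 Ulrike\n";
        parse(raw).forEach((k, v) -> System.out.println(k + " -> [" + v + "]"));
    }
}
```

Note that the question's input contains a line starting with a space (" MA5 Rudolf Peters"). With the pattern as written, that line does not start with a key, so it would be treated as part of the MA4 value. If keys may be indented, allowing optional horizontal whitespace, for example `^\h*(MA\d+)` together with `(?!\h*MA\d)`, should cover that case.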

Answer 2

Score: 0

Use

(?ms)^(MA\d+)(.*?)(?=\nMA\d|\z)

See proof: https://regex101.com/r/NOkli9/1

Explanation

                         EXPLANATION
--------------------------------------------------------------------------------
  (?ms)                    set flags for this block (with ^ and $
                           matching start and end of line) (with .
                           matching \n) (case-sensitive) (matching
                           whitespace and # normally)
--------------------------------------------------------------------------------
  ^                        the beginning of a "line"
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    MA                       'MA'
--------------------------------------------------------------------------------
    \d+                      digits (0-9) (1 or more times (matching
                             the most amount possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  (                        group and capture to \2:
--------------------------------------------------------------------------------
    .*?                      any character (0 or more times (matching
                             the least amount possible))
--------------------------------------------------------------------------------
  )                        end of \2
--------------------------------------------------------------------------------
  (?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
    \n                       '\n' (newline)
--------------------------------------------------------------------------------
    MA                       'MA'
--------------------------------------------------------------------------------
    \d                       digits (0-9)
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    \z                       the end of the string
--------------------------------------------------------------------------------
  )                        end of look-ahead
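
The lazy-match variant above could be used from Java roughly as follows. This is only a sketch: the class name `LazyKeyValueParser`, the `parse` helper, the sample input and the `trim()` call are assumptions for illustration. The flags are set inline through `(?ms)`, so no extra `Pattern` flags are passed.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyKeyValueParser {
    // (?m): ^ matches at each line start; (?s): . also matches line breaks.
    // The lazy group stops just before "\nMA<digit>" or at the end of the input.
    private static final Pattern ENTRY =
            Pattern.compile("(?ms)^(MA\\d+)(.*?)(?=\\nMA\\d|\\z)");

    // Returns key -> trimmed value in order of appearance.
    static Map<String, String> parse(String raw) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = ENTRY.matcher(raw);
        while (m.find()) {
            result.put(m.group(1), m.group(2).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample in the shape of the question's input.
        String raw = "\nMA1\n D. Mueller\nMA2 Peter \nMustermann\nMA3 Ulrike\n";
        parse(raw).forEach((k, v) -> System.out.println(k + " -> [" + v + "]"));
    }
}
```

Because the look-ahead hard-codes `\n`, input with Windows line endings would leave a trailing `\r` in each captured value; the `trim()` call in this sketch removes it.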
