July 30, 2020, 00:32:23

Java and regex lexer

Question

I am trying to write a lexer in Java using regex for a custom Markdown-like "language" I'm making. It's my first time working with this kind of thing, so I'm a little lost on a few points.
An example of a possible syntax is:
Some <#000000>*text* [<#ffffff>Some more](action: Other <#gradient>text) and **finally** some more <#000>text!
I was able to capture a few things. For example, I'm using (?<hex><#\w+>) to capture the "hex" and (?<action>\[[^]]*]\([^]]*\)) to get the entire "action" block.
My problem is capturing everything together, that is, how to combine these patterns. For example, the lexer needs to output something like:

TEXT - Some
HEX - <#000000>
TEXT - *text*
ACTION - [<#ffffff>Some more](action: Other <#gradient>text)
TEXT - and **finally** some more
HEX - <#000>
TEXT - text!

I'll handle the bold and italic later.
I would appreciate any suggestions on how to combine all of them!

Answer 1

Score: 2

One option is to use an alternation that matches each of the separate parts. For the text part, you could use a character class such as [\w!* ]+

In Java, you can then check which named capturing group matched.

(?<hex><#\w+>)|(?<action>\[[^]]*]\([^]]*\))|(?<text>[\w!* ]+)

Explanation

(?<hex><#\w+>) Capture group hex, match <#, then 1+ word chars, then >
| Or
(?<action> Capture group action
- \[[^]]*]\([^]]*\) Match [...] followed by (...)
) Close group
| Or
(?<text>[\w!* ]+) Capture group text, match 1+ times any char listed in the character class


Example code:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        // One alternative per token type; the named groups tell us which one matched.
        String regex = "(?<hex><#\\w+>)|(?<action>\\[[^]]*]\\([^]]*\\))|(?<text>[\\w!* ]+)";
        String string = "Some <#000000>*text* [<#ffffff>Some more](action: Other <#gradient>text) and **finally** some more <#000>text!";

        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(string);

        // Each successful find() yields the next token; exactly one group is non-null.
        while (matcher.find()) {
            if (matcher.group("hex") != null) {
                System.out.println("HEX - " + matcher.group("hex"));
            }
            if (matcher.group("text") != null) {
                System.out.println("TEXT - " + matcher.group("text"));
            }
            if (matcher.group("action") != null) {
                System.out.println("ACTION - " + matcher.group("action"));
            }
        }
    }
}

Output

TEXT - Some 
HEX - <#000000>
TEXT - *text* 
ACTION - [<#ffffff>Some more](action: Other <#gradient>text)
TEXT -  and **finally** some more 
HEX - <#000>
TEXT - text!
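
One caveat with the alternation approach above: find() silently skips any character that none of the alternatives match, so punctuation outside [\w!* ] (a comma, for instance) would disappear from the output. A possible variation, sketched here under some assumptions, matches only the HEX and ACTION tokens and treats whatever lies between two matches as TEXT. The names MarkupLexer, Token, Type, and tokenize are hypothetical and not part of the answer. The action pattern uses [^)]* inside the parentheses instead of the answer's [^]]*, and trimming TEXT tokens is a choice made to reproduce the output requested in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; none of these names come from the answer.
final class MarkupLexer {

    enum Type { TEXT, HEX, ACTION }

    // Minimal immutable token: a type plus the matched source text.
    static final class Token {
        final Type type;
        final String value;

        Token(Type type, String value) {
            this.type = type;
            this.value = value;
        }
    }

    // Only HEX and ACTION are matched explicitly; the gaps between matches become TEXT.
    private static final Pattern SPECIAL = Pattern.compile(
            "(?<hex><#\\w+>)|(?<action>\\[[^\\]]*\\]\\([^)]*\\))");

    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        Matcher m = SPECIAL.matcher(input);
        int last = 0; // index just past the previous match
        while (m.find()) {
            addText(tokens, input.substring(last, m.start()));
            if (m.group("hex") != null) {
                tokens.add(new Token(Type.HEX, m.group("hex")));
            } else {
                tokens.add(new Token(Type.ACTION, m.group("action")));
            }
            last = m.end();
        }
        addText(tokens, input.substring(last)); // trailing text after the last match
        return tokens;
    }

    // Trim surrounding whitespace and skip gaps that are empty or blank.
    private static void addText(List<Token> tokens, String raw) {
        String text = raw.trim();
        if (!text.isEmpty()) {
            tokens.add(new Token(Type.TEXT, text));
        }
    }

    public static void main(String[] args) {
        String input = "Some <#000000>*text* [<#ffffff>Some more](action: Other <#gradient>text)"
                + " and **finally** some more <#000>text!";
        for (Token t : tokenize(input)) {
            System.out.println(t.type + " - " + t.value);
        }
    }
}
```

For the sample input, main prints the seven lines requested in the question, without the leading and trailing spaces that the character-class version leaves on TEXT tokens. Bold and italic markers stay inside the TEXT tokens and could be handled in a later pass.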

Answer 2

Score: 0

You can achieve this with regex capturing groups, like this:
^(.*?) (?<hex1><#\w+>)(\*[^*]*\*) (?<action>\[[^]]*]\([^]]*\)) (.*?) (?<hex2><#\w+>)(.*)$
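
As a rough illustration of how this pattern could be used from Java, the sketch below applies it with matches(). The group names text1, italic, text2, and text3 and the class name FixedShapeMatch are hypothetical additions for readability; the pattern as posted leaves those groups unnamed. Because the pattern is anchored with ^ and $ and spells out every part in order, it only accepts input with exactly this sequence of parts, so it is less general than a token-by-token lexer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper class around the single anchored pattern.
final class FixedShapeMatch {

    // Same structure as the posted pattern; the unnamed groups are given
    // illustrative names so they can be read back by name.
    static final Pattern LINE = Pattern.compile(
            "^(?<text1>.*?) (?<hex1><#\\w+>)(?<italic>\\*[^*]*\\*) "
            + "(?<action>\\[[^\\]]*\\]\\([^\\]]*\\)) (?<text2>.*?) (?<hex2><#\\w+>)(?<text3>.*)$");

    public static void main(String[] args) {
        String input = "Some <#000000>*text* [<#ffffff>Some more](action: Other <#gradient>text)"
                + " and **finally** some more <#000>text!";
        Matcher m = LINE.matcher(input);
        if (m.matches()) {
            // Every part sits at a fixed position, so the groups are read in order.
            System.out.println("TEXT - " + m.group("text1"));
            System.out.println("HEX - " + m.group("hex1"));
            System.out.println("TEXT - " + m.group("italic"));
            System.out.println("ACTION - " + m.group("action"));
            System.out.println("TEXT - " + m.group("text2"));
            System.out.println("HEX - " + m.group("hex2"));
            System.out.println("TEXT - " + m.group("text3"));
        } else {
            System.out.println("Input does not have the expected shape");
        }
    }
}
```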
