2020-08-05 21:22:28

Extraction of string from particular tag using java

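Question

Why am I not getting the expected output?

You are not getting the expected output because your regular expression is too greedy. In a regex, .* is commonly used to match any characters, but by default it is greedy and matches as many characters as possible. As a result, your regex matched all the text between the first <AT> and the last </AT>.

To fix this, make the match non-greedy by using .*? instead of .*. The regex then matches as few characters as possible, so each match stops at the nearest closing </AT> and every <AT>...</AT> pair is found separately.

Here is the corrected regular expression and code: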

private static final Pattern TAG_REGEX = Pattern.compile("<AT>(.*?)</AT>");

public static void getText(String text) {
    final Matcher matcher = TAG_REGEX.matcher(text);

    while (matcher.find()) {
        String url = matcher.group(1);
        System.out.println("Extracted URL::" + url);
    }
}

With the corrected regular expression, you should get the expected output:

Extracted URL::EXTRACT_URL
Extracted URL::EXTRACT_URL
Extracted URL::EXTRACT_URL
Extracted URL::EXTRACT_URL
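
As a minimal, self-contained sketch of the fix above, the snippet below wraps the reluctant pattern in a complete class and returns the matches as a list instead of printing them. The class name AtTagExtractor, the method name extractAll, and the shortened sample string are assumptions made for illustration; they do not appear in the original post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper class; only the pattern itself comes from the post.
public class AtTagExtractor {
    // Reluctant quantifier: each match stops at the nearest closing </AT>.
    private static final Pattern TAG_REGEX = Pattern.compile("<AT>(.*?)</AT>");

    // Collects the text of every <AT>...</AT> pair, in order of appearance.
    public static List<String> extractAll(String text) {
        List<String> values = new ArrayList<>();
        Matcher matcher = TAG_REGEX.matcher(text);
        while (matcher.find()) {
            values.add(matcher.group(1));
        }
        return values;
    }

    public static void main(String[] args) {
        // Shortened stand-in for the test string in the question.
        String sample = "<a href=\"<AT>EXTRACT_URL</AT>\" target=\"_blank\">Sign In</a> "
                + "<a href=\"<AT>EXTRACT_URL</AT>\" target=\"_blank\">";
        for (String url : extractAll(sample)) {
            System.out.println("Extracted URL::" + url);
        }
    }
}
```

Returning a list rather than printing inside the loop keeps the extraction reusable; the caller decides what to do with the values.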

Original question (English):

I have a few custom tags inside some HTML. As you can see in the HTML below, it contains <AT></AT> tags, and I need to extract the text inside each of them.

I followed the approach below:

I wrote a regex that extracts the text from the AT tag.

Below is the test string:

href="<AT>EXTRACT_URL</AT>" target="_blank" style="font-weight: bold;letter-spacing: normal;line-height: 100%;text-align: center;text-decoration: none;color: #FFFFFF;">Sign In</a></td></tr></tbody></table></td></tr> <a href="<AT>EXTRACT_URL</AT>" target="_blank" title="" class="" target="_blank"> <a href="<AT>EXTRACT_URL</AT>" target="_blank" title="" class="" target="_blank"> <a href="<AT>EXTRACT_URL</AT>" target="_blank" title="" class="" target="_blank">

I used the program below to extract the text from the AT tag:

// Greedy pattern from the question: (.*) runs to the last </AT> in the input.
private static final Pattern TAG_REGEX = Pattern.compile("<AT>(.*)</AT>");

// Declared void because the method only prints and never returns a value.
public static void getText(String text) {
    final Matcher matcher = TAG_REGEX.matcher(text);

    while (matcher.find()) {
        String url = matcher.group(1);

        System.out.println("Extracted URL::" + url);
    }
}

Output of the above program:

Extracted URL::EXTRACT_URL</AT>" target="_blank" style="font-weight: bold;letter-spacing: normal;line-height: 100%;text-align: center;text-decoration: none;color: #FFFFFF;">Sign In</a></td></tr></tbody></table></td></tr> <a href="<AT>EXTRACT_URL</AT>" target="_blank" title="" class="" target="_blank"> <a href="<AT>EXTRACT_URL</AT>" target="_blank" title="" class="" target="_blank"> <a href="<AT>EXTRACT_URL

Expected Output:

Extracted URL::EXTRACT_URL
Extracted URL::EXTRACT_URL
Extracted URL::EXTRACT_URL
Extracted URL::EXTRACT_URL

Why am I not getting the expected output?

Answer 1

Score: 2

It's because of the pattern.

The correct pattern in this case would be

private static final Pattern TAG_REGEX = Pattern.compile("<AT>(.*?)</AT>");

Both will match any sequence of characters (except line terminators by default), but

.* is greedy and will match as much as possible (it will end at the last </AT>)
.*? is reluctant and will match as little as possible

For more information, see this tutorial.
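
To make the difference concrete, here is a small sketch that runs both quantifiers against the same short input. The class GreedyVsReluctant, the helper firstGroup, and the input string are hypothetical names and data chosen for illustration. The third pattern, which uses a negated character class, is an alternative that the answer does not mention; it assumes the tag content never contains a < character.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original answer.
public class GreedyVsReluctant {
    // Returns group 1 of the first match, or null when nothing matches.
    public static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "<AT>one</AT> and <AT>two</AT>";

        // Greedy: runs to the last </AT>, prints "one</AT> and <AT>two"
        System.out.println(firstGroup("<AT>(.*)</AT>", input));

        // Reluctant: stops at the first </AT>, prints "one"
        System.out.println(firstGroup("<AT>(.*?)</AT>", input));

        // Negated character class (assumes no '<' inside the tag), prints "one"
        System.out.println(firstGroup("<AT>([^<]*)</AT>", input));
    }
}
```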
