2020-09-19 06:40:16

How to strip down a string to the first duplicate?

Question

I have converted a few web pages into strings, and each string contains lines like these (along with other code):

<div class="r"><a href="https://www.apple.com/ca/"
<div class="r"><a href="https://www.facebook.com/ca/"
<div class="r"><a href="https://www.utorrent.com/ca/"

I just want to extract the link in the first line (https://www.apple.com/ca/) and ignore the rest of the HTML and code. How do I do that?

Answer 1

Score: 2

The easy way:

String url = input.replaceAll("(?s).*?href=\"(.*?)\".*", "$1");

Key points of why this works:

The regex matches the whole input but captures only the target. The replacement is that capture (group 1), so the call effectively extracts the target.
(?s) means "dot matches newline".
.*? reluctantly matches (as little input as possible) up to href=".
(.*?) reluctantly captures everything up to the next ".
.* greedily matches (as much as possible) the rest of the input, newlines included thanks to (?s) above.
The replacement is $1, the first (and only) group in the match.
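
One caveat with the replaceAll approach: when the input contains no href="..." at all, nothing is replaced and the whole input comes back unchanged. The sketch below applies the same regex through Pattern and Matcher so that the "no match" case can be told apart. The class name FirstHref and the helper firstHref are hypothetical names chosen for illustration; they are not part of the original answer.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FirstHref {
    // Same regex as the one-liner: reluctant prefix, captured href value, greedy rest.
    private static final Pattern WHOLE_INPUT =
            Pattern.compile("(?s).*?href=\"(.*?)\".*");

    /** Returns the first href value, or empty if the input contains no href="...". */
    static Optional<String> firstHref(String input) {
        Matcher m = WHOLE_INPUT.matcher(input);
        // matches() requires the regex to cover the entire input, as in the one-liner.
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String input = "<div class=\"r\"><a href=\"https://www.apple.com/ca/\"\n"
                + "<div class=\"r\"><a href=\"https://www.facebook.com/ca/\"\n";
        System.out.println(firstHref(input).orElse("(no link found)"));

        // Without an href, replaceAll leaves the input as it was.
        System.out.println("plain text".replaceAll("(?s).*?href=\"(.*?)\".*", "$1"));
    }
}
```

Returning an Optional is only one way to signal a missing link; checking whether the result of replaceAll equals the input would also work when the input itself can never be a bare URL.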

Answer 2

Score: 1

Using the URL regex from the referenced answer, here is a solution that uses the Java regex API. Note that it prints every URL in the string, not only the first:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        // Sample input: three anchor tags, one per line.
        String str = "<div class=\"r\"><a href=\"https://www.apple.com/ca/\">Hello</a>\n"
                + "<div class=\"r\"><a href=\"https://www.facebook.com/ca/\">Hello</a>\n"
                + "<div class=\"r\"><a href=\"https://www.utorrent.com/ca/\">Hello</a>";
        // Matches http, https, ftp, and file URLs.
        String regex = "\\b(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]";
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(str);
        // find() advances to the next match on each call, so the loop prints every URL.
        while (matcher.find()) {
            System.out.println(matcher.group());
        }
    }
}

Output:

https://www.apple.com/ca/
https://www.facebook.com/ca/
https://www.utorrent.com/ca/
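
Since the question asks only for the first link, the loop above can be reduced to a single call to find(). The following is a minimal sketch under that assumption, reusing the same URL regex; the class name FirstUrl and the helper firstUrl are hypothetical names, not part of the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FirstUrl {
    // Same URL regex as in the answer above.
    private static final Pattern URL = Pattern.compile(
            "\\b(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]");

    /** Returns the first URL found in the text, or null if there is none. */
    static String firstUrl(String text) {
        Matcher matcher = URL.matcher(text);
        // A single find() stops at the first match instead of scanning them all.
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        String str = "<div class=\"r\"><a href=\"https://www.apple.com/ca/\">Hello</a>\n"
                + "<div class=\"r\"><a href=\"https://www.facebook.com/ca/\">Hello</a>\n"
                + "<div class=\"r\"><a href=\"https://www.utorrent.com/ca/\">Hello</a>";
        System.out.println(firstUrl(str)); // prints https://www.apple.com/ca/
    }
}
```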
