How to strip down a string to the first duplicate?

Question

I have converted a few web pages into a string, and the string contains these lines (along with other code):

<div class="r"><a href="https://www.apple.com/ca/"
<div class="r"><a href="https://www.facebook.com/ca/"
<div class="r"><a href="https://www.utorrent.com/ca/"

But I just want to extract the link in the first line (https://www.apple.com/ca/) and ignore the rest of the HTML and code. How do I do that?

Answer 1

Score: 2

The easy way:

String url = input.replaceAll("(?s).*?href=\"(.*?)\".*", "$1");

Key points of why this works:

  • The regex matches the whole input but captures only the target. The replacement is the capture (group 1), so this approach effectively extracts the target.
  • (?s) means “dot matches newline”.
  • .*? reluctantly (using as little input as possible) matches everything up to the first “href="”.
  • (.*?) reluctantly captures everything up to the next “"”.
  • .* greedily (using as much input as possible) matches the rest, including all later lines (thanks to (?s) above).
  • The replacement is $1, the first (and only) group in the match.
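
One caveat with the one-liner: when the input contains no href at all, replaceAll finds nothing to replace and returns the input unchanged, so the caller cannot tell "no link" apart from a result. The sketch below puts the one-liner next to a Matcher-based alternative that reports the missing case explicitly. The class name FirstHref and the helper names viaReplaceAll and viaFind are made up for illustration and are not part of the original answer.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FirstHref {
    // Reluctant capture of an href value. No (?s) is needed here because
    // find() scans forward on its own and stops at the first match.
    private static final Pattern HREF = Pattern.compile("href=\"(.*?)\"");

    // The one-liner from the answer: match the whole input, keep only group 1.
    static String viaReplaceAll(String input) {
        return input.replaceAll("(?s).*?href=\"(.*?)\".*", "$1");
    }

    // Alternative sketch: first href value, or empty when none is present.
    static Optional<String> viaFind(String input) {
        Matcher m = HREF.matcher(input);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String input = "<div class=\"r\"><a href=\"https://www.apple.com/ca/\"\n"
                + "<div class=\"r\"><a href=\"https://www.facebook.com/ca/\"\n";
        System.out.println(viaReplaceAll(input));                // https://www.apple.com/ca/
        System.out.println(viaFind(input).orElse("(no link)"));  // https://www.apple.com/ca/
        System.out.println(viaReplaceAll("no links here"));      // no links here (unchanged)
        System.out.println(viaFind("no links here").isPresent()); // false
    }
}
```

Both variants assume the attribute is written exactly as href="..." with double quotes, as in the question's sample. For arbitrary HTML, an HTML parser is generally more robust than a regex.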

Answer 2

Score: 1

Using the URL-matching regex from a linked answer, here is a solution that uses the Java regex API:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
	public static void main(String[] args) {
		String str = "<div class=\"r\"><a href=\"https://www.apple.com/ca/\">Hello</a>\n"
				+ "<div class=\"r\"><a href=\"https://www.facebook.com/ca/\">Hello</a>\n"
				+ "<div class=\"r\"><a href=\"https://www.utorrent.com/ca/\">Hello</a>";
		String regex = "\\b(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]";
		Pattern pattern = Pattern.compile(regex);
		Matcher matcher = pattern.matcher(str);
		while (matcher.find()) {
			System.out.println(matcher.group());
		}
	}
}

Output:

https://www.apple.com/ca/
https://www.facebook.com/ca/
https://www.utorrent.com/ca/
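
The loop above prints every URL, while the question asks only for the first one. A minimal sketch of that variant, assuming the same regex, calls find() once instead of looping. The class name FirstUrl and the method firstUrl are hypothetical names introduced here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FirstUrl {
    // Same URL regex as in the answer above, compiled once for reuse.
    private static final Pattern URL = Pattern.compile(
            "\\b(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]");

    // Returns the first URL found in the text, or null when there is none.
    static String firstUrl(String text) {
        Matcher matcher = URL.matcher(text);
        // A single find() call stops at the first match; later URLs are never scanned.
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        String str = "<div class=\"r\"><a href=\"https://www.apple.com/ca/\">Hello</a>\n"
                + "<div class=\"r\"><a href=\"https://www.facebook.com/ca/\">Hello</a>\n"
                + "<div class=\"r\"><a href=\"https://www.utorrent.com/ca/\">Hello</a>";
        System.out.println(firstUrl(str)); // https://www.apple.com/ca/
    }
}
```

Note that this matches any URL in the text, not only those inside an href attribute, so a URL appearing earlier in the page (for example in a script or comment) would be returned instead.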

huangapple
  • Published on September 19, 2020, 06:40:16
  • When reposting, please keep the link to this article: https://go.coder-hub.com/63963621.html