2020-07-26 22:53:18

How to get the file name part from HTML src attribute of <script> tag using Regex pattern in Java

Question

        String javaScript = "<script src=\"https://www.xxx.co.uk/rta2/v-0.52.min.js\" class=\"RTA2-loader\" data-hosts=\"ted.xxx.co.uk\"></script>";
        Pattern scriptPattern = Pattern.compile("<script[^>]+src\\s*=\\s*[\"'](.*?)[\"'][^>]*>");
        Matcher script = scriptPattern.matcher(javaScript);
        if (script.find()) {
            String srcValue = script.group(1);
            String[] pathSegments = srcValue.split("[\\\\/]");
            String fileName = pathSegments[pathSegments.length - 1];
            System.out.println(fileName);
        }

Output:

v-0.52.min.js

I need to get the file name from the src attribute of an HTML 'script' tag. I can extract the value of the entire src attribute, but I am not sure how to get only the file name, including its extension. My current code, with an example, is below.

        String javaScript = "<script src=\"https://www.xxx.co.uk/rta2/v-0.52.min.js\" class=\"RTA2-loader\" data-hosts=\"ted.xxx.co.uk\"></script>";
        Pattern scriptPattern = Pattern.compile("<script[^>]+src\\s*=\\s*[\"'](.*?)[\"'][^>]*>");
        Matcher script = scriptPattern.matcher(javaScript);
        if (script.find()) {
            System.out.println(script.group(1));
        }

This prints https://www.xxx.co.uk/rta2/v-0.52.min.js

Instead of the entire URL, I want only the file name, i.e.

v-0.52.min.js

It should also support both '/' and '\' as path separators.

Please help.

Answer 1

Score: 0

String javaScript = "<script src=\"https://www.xxx.co.uk/rta2/v-0.52.min.js\" class=\"RTA2-loader\" data-hosts=\"ted.xxx.co.uk\"></script>";
Pattern pattern = Pattern.compile("<script src=\"[^\"]+(?:/|\\\\)([^\"]+)\"");
Matcher matcher = pattern.matcher(javaScript);
if (matcher.find()) {
    String src = matcher.group(1);
    System.out.println(src);
}

The regular expression searches for the literal string <script src=
followed by a single double quote character, i.e. "
followed by one or more characters that are not the double quote character
followed by either a single forward slash, i.e. /, or a single backslash, i.e. \
again followed by one or more characters that are not the double quote character (and these characters are placed in a capturing group)
and finally followed by another double quote character.

The above code displays the following:

v-0.52.min.js

Nonetheless, I wish to point out that using an HTML parser is preferred over regular expressions when it comes to parsing HTML.
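
The pattern described above can be wrapped in a small helper and tried against both path separators. This is only a sketch: the class name `ScriptSrcFileName`, the method `fileNameOf`, and the Windows-style sample path are made up for illustration and are not part of the original answer. Keep in mind that this pattern only matches when `src` is the first attribute and is double-quoted, and it finds nothing when the value contains no separator at all.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ScriptSrcFileName {
    // Same pattern as in the answer: everything after the last '/' or '\'
    // inside the double-quoted src value ends up in group 1.
    private static final Pattern SCRIPT_SRC_FILE =
            Pattern.compile("<script src=\"[^\"]+(?:/|\\\\)([^\"]+)\"");

    // Hypothetical helper: returns the file name, or empty if the pattern does not match.
    static Optional<String> fileNameOf(String html) {
        Matcher matcher = SCRIPT_SRC_FILE.matcher(html);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String forward = "<script src=\"https://www.xxx.co.uk/rta2/v-0.52.min.js\" class=\"RTA2-loader\"></script>";
        String backward = "<script src=\"C:\\rta2\\v-0.52.min.js\"></script>"; // assumed sample path
        String noSeparator = "<script src=\"v-0.52.min.js\"></script>";

        System.out.println(fileNameOf(forward));     // Optional[v-0.52.min.js]
        System.out.println(fileNameOf(backward));    // Optional[v-0.52.min.js]
        System.out.println(fileNameOf(noSeparator)); // Optional.empty
    }
}
```

Returning an `Optional` rather than a bare `String` makes the no-match case explicit, which matters here because the pattern is fairly strict about the shape of the tag.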
