How to get the file name part from HTML src attribute of <script> tag using Regex pattern in Java
Question
String javaScript = "<script src=\"https://www.xxx.co.uk/rta2/v-0.52.min.js\" class=\"RTA2-loader\" data-hosts=\"ted.xxx.co.uk\"></script>";
Pattern scriptPattern = Pattern.compile("<script[^>]+src\\s*=\\s*[\"'](.*?)[\"'][^>]*>");
Matcher script = scriptPattern.matcher(javaScript);
if (script.find()) {
    String srcValue = script.group(1);
    String[] pathSegments = srcValue.split("[\\\\/]");
    String fileName = pathSegments[pathSegments.length - 1];
    System.out.println(fileName);
}
Output:
v-0.52.min.js
I need to get the file name from the src attribute of an HTML <script> tag. I can already extract the entire src value, but I am not sure how to get only the file name, including its extension. Below is my code with an example.
String javaScript = "<script src=\"https://www.xxx.co.uk/rta2/v-0.52.min.js\" class=\"RTA2-loader\" data-hosts=\"ted.xxx.co.uk\"></script>";
Pattern scriptPattern = Pattern.compile("<script[^>]+src\\s*=\\s*[\"'](.*?)[\"'][^>]*>");
Matcher script = scriptPattern.matcher(javaScript);
if (script.find()) {
    System.out.println(script.group(1));
}
The code above prints https://www.xxx.co.uk/rta2/v-0.52.min.js
Instead of the entire URL, I want only the file name, i.e.
v-0.52.min.js
It should also support both '/' and '\' as path separators.
Please help.
Answer 1
Score: 0
String javaScript = "<script src=\"https://www.xxx.co.uk/rta2/v-0.52.min.js\" class=\"RTA2-loader\" data-hosts=\"ted.xxx.co.uk\"></script>";
Pattern pattern = Pattern.compile("<script src=\"[^\"]+(?:/|\\\\)([^\"]+)\"");
Matcher matcher = pattern.matcher(javaScript);
if (matcher.find()) {
    String src = matcher.group(1);
    System.out.println(src);
}
The regular expression searches for the literal string <script src=
followed by a single double quote character, i.e. "
followed by one or more characters that are not a double quote
followed by either a single forward slash, i.e. /, or a single backslash, i.e. \ (written as \\\\ inside the Java string literal)
followed by one or more characters that are not a double quote (these characters are placed in the capturing group)
and finally followed by another double quote character.
Because the first "not a double quote" part is greedy, the separator that ends up being matched is the last one in the attribute value, so the capturing group holds only the file name.
The above code displays the following:
v-0.52.min.js
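If attributes may appear before src, or the value may be single-quoted, one possible variation is to combine the more permissive tag prefix from the question with the same "capture what follows the last separator" idea. The following is only a minimal sketch under those assumptions: the class name ScriptFileName, the method fileNameOf, and the second and third sample inputs are made up for illustration and do not come from the original post.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ScriptFileName {

    // <script ... src = "dir/part/" + captured file name + closing quote.
    // The directory part is optional and, being greedy, ends at the LAST '/' or '\'.
    // The capturing group excludes quotes and both separators.
    private static final Pattern SCRIPT_SRC = Pattern.compile(
            "<script[^>]*?\\ssrc\\s*=\\s*[\"'](?:[^\"']*[\\\\/])?([^\"'\\\\/]+)[\"']",
            Pattern.CASE_INSENSITIVE);

    // Returns the file name of the first <script> src found, or empty if none matches.
    static Optional<String> fileNameOf(String html) {
        Matcher m = SCRIPT_SRC.matcher(html);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Sample input from the question (forward slashes, double quotes).
        String fromQuestion = "<script src=\"https://www.xxx.co.uk/rta2/v-0.52.min.js\" "
                + "class=\"RTA2-loader\" data-hosts=\"ted.xxx.co.uk\"></script>";
        // Hypothetical input: src is not the first attribute, single quotes, backslashes.
        String backslashes = "<script type='text/javascript' src='C:\\assets\\js\\app.js'></script>";
        // Hypothetical input: no directory part at all.
        String bareName = "<script src=\"app.js\"></script>";

        System.out.println(fileNameOf(fromQuestion).orElse("(no match)")); // v-0.52.min.js
        System.out.println(fileNameOf(backslashes).orElse("(no match)"));  // app.js
        System.out.println(fileNameOf(bareName).orElse("(no match)"));     // app.js
    }
}
```

Note that a query string or fragment after the file name (for example ?v=2) would be captured as part of the name, so this sketch assumes plain paths.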
Nonetheless, I wish to point out that using an HTML parser is preferred over regular expressions when it comes to parsing HTML.