How to get the file name part from HTML src attribute of <script> tag using Regex pattern in Java



    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String javaScript = "<script src=\"\" class=\"RTA2-loader\" data-hosts=\"\"></script>";
    Pattern scriptPattern = Pattern.compile("<script[^>]+src\\s*=\\s*[\"'](.*?)[\"'][^>]*>");
    Matcher script = scriptPattern.matcher(javaScript);
    if (script.find()) {
        String srcValue = script.group(1);                  // the whole src value
        String[] pathSegments = srcValue.split("[\\\\/]");  // split on '\' or '/'
        String fileName = pathSegments[pathSegments.length - 1];
        System.out.println(fileName);
    }

    v-0.52.min.js

I need to get the file name from the src attribute of an HTML <script> tag. I managed to get the value of the entire src attribute, but I'm not sure how to get only the file name, including its extension. Below is my code with an example.

    String javaScript = "<script src=\"\" class=\"RTA2-loader\" data-hosts=\"\"></script>";
    Pattern scriptPattern = Pattern.compile("<script[^>]+src\\s*=\\s*[\"'](.*?)[\"'][^>]*>");
    Matcher script = scriptPattern.matcher(javaScript);
    if (script.find()) {
        System.out.println(script.group(1));
    }

The code above prints the entire src value, i.e. the full URL.

Instead of the entire URL, I want only the file name, i.e.

    v-0.52.min.js

It should also support both '/' and '\' as path separators.

Please help.
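One way to cover both separators, sketched under the assumption that the src value is captured with the question's own pattern: take everything after whichever of '/' or '\' occurs last. The class name ScriptFileName, its methods, and the example URLs are hypothetical, since the real src value is not shown.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ScriptFileName {

    // Same pattern as in the question: group 1 holds the whole src value.
    private static final Pattern SCRIPT_SRC =
            Pattern.compile("<script[^>]+src\\s*=\\s*[\"'](.*?)[\"'][^>]*>");

    // Returns the text after the last '/' or '\' (the whole value if neither occurs).
    static String fileName(String src) {
        int cut = Math.max(src.lastIndexOf('/'), src.lastIndexOf('\\'));
        return src.substring(cut + 1);
    }

    // Returns the file name from the first <script src=...> found, or null if there is none.
    static String scriptFileName(String html) {
        Matcher m = SCRIPT_SRC.matcher(html);
        return m.find() ? fileName(m.group(1)) : null;
    }

    public static void main(String[] args) {
        // Hypothetical inputs: one URL with '/', one relative path with '\'.
        String forward = "<script src=\"https://example.com/js/v-0.52.min.js\" class=\"RTA2-loader\"></script>";
        String backward = "<script src='assets\\js\\v-0.52.min.js'></script>";
        System.out.println(scriptFileName(forward));
        System.out.println(scriptFileName(backward));
    }
}
```

Unlike split, lastIndexOf does not build an array, and a value containing no separator at all is returned unchanged.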


Score: 0

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String javaScript = "<script src=\"\" class=\"RTA2-loader\" data-hosts=\"\"></script>";
    Pattern pattern = Pattern.compile("<script src=\"[^\"]+(?:/|\\\\)([^\"]+)\"");
    Matcher matcher = pattern.matcher(javaScript);
    if (matcher.find()) {
        String src = matcher.group(1); // text after the last '/' or '\'
        System.out.println(src);
    }

The regular expression searches for:
  • the literal string <script src=
  • followed by a single double-quote character, i.e. "
  • followed by one or more characters that are not a double quote
  • followed by either a single forward slash, i.e. /, or a single backslash, i.e. \
  • followed again by one or more characters that are not a double quote (these characters are placed in a capturing group)
  • and finally another double-quote character.

Because the first [^"]+ is greedy, it initially consumes the whole src value and then backtracks only as far as the last / or \, so the capturing group holds just the final path segment, i.e. the file name.

The above code displays the following:

    v-0.52.min.js
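A small sketch for checking how the answer's pattern behaves on a few inputs. The class name AnswerRegexDemo, the extract helper, and all src values are hypothetical. It shows the greedy backtracking to the last separator with mixed '/' and '\', and two cases the pattern does not match: a value with no separator, and a tag where src is not the first attribute.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnswerRegexDemo {

    // The pattern from the answer: group 1 is whatever follows the last '/' or '\' inside src="...".
    private static final Pattern PATTERN =
            Pattern.compile("<script src=\"[^\"]+(?:/|\\\\)([^\"]+)\"");

    // Returns group 1 of the first match, or null when the tag does not fit the pattern.
    static String extract(String html) {
        Matcher matcher = PATTERN.matcher(html);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical inputs; the original src URL is not shown in the question.
        System.out.println(extract("<script src=\"https://example.com/js/v-0.52.min.js\"></script>"));
        // Mixed separators: backtracking stops at the last one, which here is '\'.
        System.out.println(extract("<script src=\"C:\\scripts/lib\\v-0.52.min.js\"></script>"));
        // No separator in the value, and src not directly after "<script ": both print null.
        System.out.println(extract("<script src=\"v-0.52.min.js\"></script>"));
        System.out.println(extract("<script class=\"x\" src=\"/js/app.js\"></script>"));
    }
}
```

The last two cases are where the question's more permissive pattern (any attribute order, single or double quotes) combined with a lastIndexOf or split step on the captured value is more forgiving.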

Nonetheless, I wish to point out that using an HTML parser is preferred over regular expressions when it comes to parsing HTML.


  • Published on July 26, 2020 at 22:53:18



