Regex to match movie file

Question

I am trying to write a regex that extracts the movie title from a file name. It should match the title in all of the example files, but my current pattern, ^(.+).(\d{4}p), only works for some of them.
I am using the java.util.regex package in Java.

I would like it to work when the movie file format is:

  • {title} {year} {resolution} etc.
  • {title} {resolution} {year} etc.
  • {title} {resolution} etc.
  • {title} {year} etc.
  • the title contains a year, or is itself a year, as with the movie 2012 (2009)

Example files:

    Film.2017.720p.BluRay.H264.AAC.mp4
    Film.And.The.Film.2017.1080p.BluRay.x264.mp4
    152.Seconds.2010.1080p.BluRay.x264.mp4
    2015.2005.1080p.BluRay.x264.mp4

Java code:

    import java.util.ArrayList;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class Main
    {
        public static void main(String[] args)
        {
            ArrayList<String> movies = new ArrayList<>();
            movies.add("Film.2017.720p.BluRay.H264.AAC.mp4");
            movies.add("Film.And.The.Film.2017.1080p.BluRay.x264.mp4");
            movies.add("152.Seconds.2010.1080p.BluRay.x264.mp4");
            movies.add("2015.2005.1080p.BluRay.x264.mp4");
            for (String s : movies)
            {
                System.out.println("original file: \t" + s);
                System.out.println("new file: \t\t" + getTitleFromFile(s) + "\n");
            }
        }

        private static String getTitleFromFile(String fileName)
        {
            // Current attempt: only works for some of the file names
            Pattern pattern = Pattern.compile("^(.+).(\\d{4}p)");
            Matcher m = pattern.matcher(fileName);
            if (m.find())
            {
                return m.group();
            }
            else
            {
                return null;
            }
        }
    }

Actual Output:

    original file: Film.2017.720p.BluRay.H264.AAC.mp4
    new file: null
    original file: Film.And.The.Film.2017.1080p.BluRay.x264.mp4
    new file: null
    original file: Film 2015 1080p BluRay x264 DTS.mp4
    new file: Film 2015 1080p
    original file: Film.1080p.BrRip.x264.mp4
    new file: Film.1080p

Expected Output:

    original file: Film.2017.720p.BluRay.H264.AAC.mp4
    new file: Film
    original file: Film.And.The.Film.2017.1080p.BluRay.x264.mp4
    new file: Film And The Film
    original file: Film 2015 1080p BluRay x264 DTS.mp4
    new file: Film
    original file: Film.1080p.BrRip.x264.mp4
    new file: Film
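
To make the current behaviour easier to reproduce, here is a minimal sketch that runs the pattern ^(.+).(\d{4}p) in isolation. The class name OriginalPatternCheck and its two helper methods are hypothetical, introduced only for this illustration. It suggests two separate issues: \d{4}p needs exactly four digits before the p, so 720p never matches, and m.group() returns the whole match rather than the first capture group. Even group(1) still contains the year, because the greedy (.+) takes everything up to the resolution.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, used only to inspect the pattern from the question.
class OriginalPatternCheck {
    // The pattern exactly as written in the question.
    static final Pattern ORIGINAL = Pattern.compile("^(.+).(\\d{4}p)");

    // What the question's code returns: the whole match, or null if nothing matches.
    static String wholeMatch(String fileName) {
        Matcher m = ORIGINAL.matcher(fileName);
        return m.find() ? m.group() : null;
    }

    // The first capture group only, or null if nothing matches.
    static String firstGroup(String fileName) {
        Matcher m = ORIGINAL.matcher(fileName);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // 720p has three digits, so \d{4}p cannot match anywhere in this name.
        System.out.println(wholeMatch("Film.2017.720p.BluRay.H264.AAC.mp4"));   // null
        // group() includes the separator and the resolution.
        System.out.println(wholeMatch("Film.1080p.BrRip.x264.mp4"));            // Film.1080p
        // group(1) drops the resolution but the greedy (.+) keeps the year.
        System.out.println(firstGroup("Film 2015 1080p BluRay x264 DTS.mp4"));  // Film 2015
    }
}
```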

Answer 1

Score: 1

You may use

    ^(.*?)\W(?:(\d{4})(?:\W(\d+p)?)|(\d+p)(?:\W(\d{4}))?)\b

See the regex demo.

Details

  • ^ - start of string
  • (.*?) - Group 1: name, any 0 or more chars other than line break chars, as few as possible
  • \W - a non-word char
  • (?:(\d{4})(?:\W(\d+p)?)|(\d+p)(?:\W(\d{4}))?) - either of
    • (\d{4})(?:\W(\d+p)?) - Group 2: four digits, followed by a non-word char and then an optional Group 3 matching one or more digits and p
    • | - or
    • (\d+p)(?:\W(\d{4}))? - Group 4: one or more digits and p, followed by an optional group matching a non-word char and then four digits captured in Group 5
  • \b - word boundary
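
As a rough sketch of how this pattern could stand in for getTitleFromFile from the question, the helper below returns only Group 1 with the dots turned into spaces. The class name MovieTitleExtractor and the method titleOf are hypothetical names chosen for illustration; the pattern itself is the one given above, unchanged.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative and not part of the answer.
class MovieTitleExtractor {
    // The pattern from the answer, compiled once and reused.
    private static final Pattern TITLE = Pattern.compile(
            "^(.*?)\\W(?:(\\d{4})(?:\\W(\\d+p)?)|(\\d+p)(?:\\W(\\d{4}))?)\\b");

    // Returns the title (Group 1) with dots replaced by spaces,
    // or null when the file name contains no year or resolution.
    static String titleOf(String fileName) {
        Matcher m = TITLE.matcher(fileName);
        return m.find() ? m.group(1).replace('.', ' ') : null;
    }

    public static void main(String[] args) {
        // File names taken from the question.
        String[] files = {
            "Film.2017.720p.BluRay.H264.AAC.mp4",
            "Film.And.The.Film.2017.1080p.BluRay.x264.mp4",
            "152.Seconds.2010.1080p.BluRay.x264.mp4",
            "2015.2005.1080p.BluRay.x264.mp4",
            "Film 2015 1080p BluRay x264 DTS.mp4",
            "Film.1080p.BrRip.x264.mp4"
        };
        for (String f : files) {
            System.out.println(f + " -> " + titleOf(f));
        }
    }
}
```

Because Group 1 is lazy, it stops at the first separator that is followed by a year or a resolution, which is why a title such as 2015 survives when a second four-digit number follows it.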

Java demo:

    import java.util.Arrays;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class Demo {
        public static void main(String[] args) {
            List<String> strs = Arrays.asList("Film.The.Film.720p.BrRip.x264.BOKUTOX.mp4",
                    "Film.The.Film.2020.BrRip.x264.mp4",
                    "Film.The.Film.720p.2020.BrRip.x264.mp4",
                    "Film.The.Film.720p.BrRip.x264.mp4");
            Pattern p = Pattern.compile("^(.*?)\\W(?:(\\d{4})(?:\\W(\\d+p)?)|(\\d+p)(?:\\W(\\d{4}))?)\\b");
            for (String str : strs) {
                Matcher m = p.matcher(str);
                if (m.find()) {
                    System.out.println("\n--------\nName: " + m.group(1).replace(".", " "));
                    if (m.group(2) != null) {
                        // Year comes first, optionally followed by the resolution
                        System.out.println("Year: " + m.group(2));
                        if (m.group(3) != null) {
                            System.out.println("Resolution: " + m.group(3));
                        }
                    }
                    else {
                        // Resolution comes first, optionally followed by the year
                        System.out.println("Resolution: " + m.group(4));
                        if (m.group(5) != null) {
                            System.out.println("Year: " + m.group(5));
                        }
                    }
                }
            }
        }
    }

Output:

    --------
    Name: Film The Film
    Resolution: 720p
    --------
    Name: Film The Film
    Year: 2020
    --------
    Name: Film The Film
    Resolution: 720p
    Year: 2020
    --------
    Name: Film The Film
    Resolution: 720p
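
If the branching on Groups 2 to 5 gets in the way, one option is to fold it into a small value type. The sketch below assumes a hypothetical MovieInfo class with a parse method (neither is part of the original demo) and simply takes the year from whichever of Groups 2 and 5 participated in the match, and the resolution from whichever of Groups 3 and 4 did.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical value type; the class, field, and method names are illustrative only.
class MovieInfo {
    // Same pattern as in the demo above.
    private static final Pattern P = Pattern.compile(
            "^(.*?)\\W(?:(\\d{4})(?:\\W(\\d+p)?)|(\\d+p)(?:\\W(\\d{4}))?)\\b");

    final String name;
    final String year;        // null when the file name has no year
    final String resolution;  // null when the file name has no resolution

    private MovieInfo(String name, String year, String resolution) {
        this.name = name;
        this.year = year;
        this.resolution = resolution;
    }

    // Returns null when the file name contains neither a year nor a resolution.
    static MovieInfo parse(String fileName) {
        Matcher m = P.matcher(fileName);
        if (!m.find()) {
            return null;
        }
        // Only one alternative takes part in a match, so at most one group of each pair is non-null.
        String year = m.group(2) != null ? m.group(2) : m.group(5);
        String resolution = m.group(3) != null ? m.group(3) : m.group(4);
        return new MovieInfo(m.group(1).replace('.', ' '), year, resolution);
    }

    @Override
    public String toString() {
        return "name=" + name + ", year=" + year + ", resolution=" + resolution;
    }

    public static void main(String[] args) {
        System.out.println(parse("Film.The.Film.720p.2020.BrRip.x264.mp4"));
        System.out.println(parse("Film.The.Film.2020.BrRip.x264.mp4"));
    }
}
```

Callers then check the fields for null instead of inspecting group indexes directly.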

huangapple
  • Posted on August 1, 2020, 00:45:17
  • When reposting, please keep the link to this article: https://go.coder-hub.com/63195937.html