Regex to match movie file

Question

I am trying to write a regex that extracts the movie title from a file name. The regex should match the title in all of the example files below. Currently I can only get it to work for some of them, using the regex ^(.+).(\d{4}p).
I am using this in Java, with the java.util.regex package.

I would like it to work when the movie file format is:

  • {title} {year} {resolution} etc.
  • {title} {resolution} {year} etc.
  • {title} {resolution} etc.
  • {title} {year} etc.
  • when the title contains a year or is just a year, like the movie 2012 (2009)

Example files:

Film.2017.720p.BluRay.H264.AAC.mp4
Film.And.The.Film.2017.1080p.BluRay.x264.mp4
152.Seconds.2010.1080p.BluRay.x264.mp4
2015.2005.1080p.BluRay.x264.mp4

Java code:

public static void main(String[] args)
{
    ArrayList<String> movies = new ArrayList<>();
    movies.add("Film.2017.720p.BluRay.H264.AAC.mp4");
    movies.add("Film.And.The.Film.2017.1080p.BluRay.x264.mp4");
    movies.add("152.Seconds.2010.1080p.BluRay.x264.mp4");
    movies.add("2015.2005.1080p.BluRay.x264.mp4");

    for (String s : movies)
    {
        System.out.println("original file: \t" + s);
        System.out.println("new file: \t\t" + getTitleFromFile(s) + "\n");
    }
}

private static String getTitleFromFile(String fileName)
{
    Pattern pattern = Pattern.compile("^(.+).(\\d{4}p)");
    Matcher m = pattern.matcher(fileName);

    if (m.find())
    {
        return m.group();
    }
    else
    {
        return null;
    }
}

Actual Output:

original file: 	Film.2017.720p.BluRay.H264.AAC.mp4
new file: 		null

original file: 	Film.And.The.Film.2017.1080p.BluRay.x264.mp4
new file: 		null

original file: 	Film 2015 1080p BluRay x264 DTS.mp4
new file: 		Film 2015 1080p

original file: 	Film.1080p.BrRip.x264.mp4
new file: 		Film.1080p

Expected Output:

original file: 	Film.2017.720p.BluRay.H264.AAC.mp4
new file: 		Film

original file: 	Film.And.The.Film.2017.1080p.BluRay.x264.mp4
new file: 		Film And The Film

original file: 	Film 2015 1080p BluRay x264 DTS.mp4
new file: 		Film

original file: 	Film.1080p.BrRip.x264.mp4
new file: 		Film

Answer 1

Score: 1

You may use

^(.*?)\W(?:(\d{4})(?:\W(\d+p)?)|(\d+p)(?:\W(\d{4}))?)\b

See the regex demo.

Details

  • ^ - start of string
  • (.*?) - Group 1 (name): zero or more characters other than line break characters, as few as possible
  • \W - a non-word character
  • (?:(\d{4})(?:\W(\d+p)?)|(\d+p)(?:\W(\d{4}))?) - either of
    • (\d{4})(?:\W(\d+p)?) - Group 2: four digits, followed by a non-word character and then, optionally, one or more digits and p captured in Group 3
    • | - or
    • (\d+p)(?:\W(\d{4}))? - Group 4: one or more digits and p, followed by an optional group matching a non-word character and then four digits captured in Group 5
  • \b - word boundary
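
As a rough sketch of how this pattern could be dropped into the getTitleFromFile method from the question: the class name TitleExtractor and the constant TITLE are made up for illustration, and returning null for a non-matching name simply mirrors the question's code. The key difference from the question is that it returns Group 1 (the title) rather than the whole match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper class; only the regex itself comes from the answer.
public class TitleExtractor {
    // Compiled once and reused for every file name.
    private static final Pattern TITLE = Pattern.compile(
        "^(.*?)\\W(?:(\\d{4})(?:\\W(\\d+p)?)|(\\d+p)(?:\\W(\\d{4}))?)\\b");

    // Returns the title with dots replaced by spaces,
    // or null when no year or resolution marker is found.
    public static String getTitleFromFile(String fileName) {
        Matcher m = TITLE.matcher(fileName);
        // group(1) is the title only; group() would return the whole match.
        return m.find() ? m.group(1).replace('.', ' ') : null;
    }

    public static void main(String[] args) {
        String[] files = {
            "Film.2017.720p.BluRay.H264.AAC.mp4",
            "Film.And.The.Film.2017.1080p.BluRay.x264.mp4",
            "152.Seconds.2010.1080p.BluRay.x264.mp4",
            "2015.2005.1080p.BluRay.x264.mp4"
        };
        for (String f : files) {
            System.out.println(f + " -> " + getTitleFromFile(f));
        }
    }
}
```

Because Group 1 is lazy, the title ends at the first year or resolution token that follows a separator, so a title that is itself a year (2015 in the last file) is kept as long as it sits at the very start of the name. A year appearing later inside a title would still be taken as the release year.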

Java demo:

List<String> strs = Arrays.asList("Film.The.Film.720p.BrRip.x264.BOKUTOX.mp4",
        "Film.The.Film.2020.BrRip.x264.mp4",
        "Film.The.Film.720p.2020.BrRip.x264.mp4",
        "Film.The.Film.720p.BrRip.x264.mp4");
Pattern p = Pattern.compile("^(.*?)\\W(?:(\\d{4})(?:\\W(\\d+p)?)|(\\d+p)(?:\\W(\\d{4}))?)\\b");
for (String str : strs) {
    Matcher m = p.matcher(str);
    if (m.find()) {
        System.out.println("\n--------\nName: " + m.group(1).replace(".", " "));
        if (m.group(2) != null) {
            System.out.println("Year: " + m.group(2));
            if (m.group(3) != null) {
                System.out.println("Resolution: " + m.group(3));
            }
        }
        else {
            System.out.println("Resolution: " + m.group(4));
            if (m.group(5) != null) {
                System.out.println("Year: " + m.group(5));
            }
        }
    }
}

Output:

--------
Name: Film The Film
Resolution: 720p

--------
Name: Film The Film
Year: 2020

--------
Name: Film The Film
Resolution: 720p
Year: 2020

--------
Name: Film The Film
Resolution: 720p
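
Since the year ends up in either Group 2 or Group 5, and the resolution in either Group 3 or Group 4, the branching in the demo can be folded into one small helper. This is only a sketch under assumptions: the class name MovieFileParser, the parse method and the three-element array it returns are invented for illustration and are not part of the answer.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; only the regex itself comes from the answer.
public class MovieFileParser {
    private static final Pattern P = Pattern.compile(
        "^(.*?)\\W(?:(\\d{4})(?:\\W(\\d+p)?)|(\\d+p)(?:\\W(\\d{4}))?)\\b");

    // Returns {title, year, resolution}; year and resolution may be null.
    // Returns null when the name has neither a year nor a resolution marker.
    public static String[] parse(String fileName) {
        Matcher m = P.matcher(fileName);
        if (!m.find()) {
            return null;
        }
        // Only one alternative can match, so at most one group of each pair is set.
        String year = m.group(2) != null ? m.group(2) : m.group(5);
        String resolution = m.group(3) != null ? m.group(3) : m.group(4);
        return new String[] { m.group(1).replace('.', ' '), year, resolution };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("Film.The.Film.720p.2020.BrRip.x264.mp4")));
        System.out.println(Arrays.toString(parse("Film.The.Film.2020.BrRip.x264.mp4")));
    }
}
```

Java does not allow the same group name to be used twice in one pattern, which is why this sketch merges the numbered groups in code instead of naming both year groups "year".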

huangapple
  • Posted on August 1, 2020 at 00:45:17
  • When reposting, please keep the link to this article: https://go.coder-hub.com/63195937.html