Java PathMatcher not working properly on Windows

Question

I am trying to write a JUnit test for my SimpleFileVisitor, but the PathMatcher it uses doesn't work properly on Windows. The problem seems to be that a PathMatcher with a regex pattern behaves differently on Linux and Windows:

import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

public class TestApp {

    public static void main(String[] args) {
        final PathMatcher glob = FileSystems.getDefault().getPathMatcher("glob:{/,/test}");
        final PathMatcher regex = FileSystems.getDefault().getPathMatcher("regex:/|/test");

        System.err.println(glob.matches(Paths.get("/")));       // --> Linux=true  Windows=true
        System.err.println(glob.matches(Paths.get("/test")));   // --> Linux=true  Windows=true
        System.err.println(glob.matches(Paths.get("/test2")));  // --> Linux=false Windows=false

        System.err.println(regex.matches(Paths.get("/")));      // --> Linux=true  Windows=false
        System.err.println(regex.matches(Paths.get("/test")));  // --> Linux=true  Windows=false
        System.err.println(regex.matches(Paths.get("/test2"))); // --> Linux=false Windows=false
    }
}

However, my real regex contains a longer list of alternatives for multiple files, which is not easy to migrate to glob syntax. With glob I would either need nested groups, which are not allowed, or an even longer list with every pattern written out without groups.

What is the best way to do this in a cross-platform manner?

Answer 1

Score: 2

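First, I want to say that this is undocumented behavior in the glob handling of the PathMatcher. It appears to convert backslashes (the usual separator on Windows file systems) to forward slashes, or vice versa, so a glob pattern works on both Linux and Windows. The regex syntax apparently gets no such treatment, so a regex has to match the path's string form, which differs between the two platforms.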

The following line demonstrates the different output:

System.out.println(Paths.get("/test")); // Will output '\test' on Windows, '/test' on Linux

To solve the original question we need to get some RegexFu going.

FileSystems.getDefault().getPathMatcher("regex:/|/test");

Needs to become

FileSystems.getDefault().getPathMatcher("regex:(/|\\\\)|((/|\\\\)test)");
  • The first group matches either / or \ (the regex needs \\ to escape the \, and because it sits inside a Java string literal it has to be written as \\\\).
  • The second group has two parts: the first again matches either / or \, and the second is the text from the question.

Thanks to @user3775041 for a slightly cleaner regex:

FileSystems.getDefault().getPathMatcher("regex:[/\\\\]|[/\\\\]test");

This has been tested on Windows 10 and Ubuntu 20.04, and both produce the following output:

true
true
false
true
true
false
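
For convenience, the separator-agnostic regex above can be wrapped in a small runnable class. This is only a sketch: the class name SeparatorAgnosticRegex and the helper matches(String) are made up for illustration and are not part of the question or of the java.nio API.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

public class SeparatorAgnosticRegex {

    // [/\\\\] is the Java literal for the regex character class [/\\],
    // which accepts either a forward slash or a backslash.
    static final String PATTERN = "regex:[/\\\\]|[/\\\\]test";

    // Hypothetical helper: matches a path string against PATTERN
    // using the default file system's PathMatcher.
    static boolean matches(String path) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher(PATTERN);
        return matcher.matches(Paths.get(path));
    }

    public static void main(String[] args) {
        System.out.println(matches("/"));
        System.out.println(matches("/test"));
        System.out.println(matches("/test2"));
    }
}
```

The regex half of the question's test can then call matches(...) instead of building the PathMatcher inline.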

Edit: a good site to test regex patterns in Java is https://www.regexplanet.com/advanced/java/index.html

Answer 2

Score: 1

If you want a version that does not include the Windows file separator character in the regex when the code runs on Linux, you can also use:

String sep = Pattern.quote(File.separator);
PathMatcher regex = FileSystems.getDefault().getPathMatcher("regex:" + sep + "|" + sep + "test");

This prints the same output on Linux and Windows.
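
A self-contained sketch of this approach, including the imports the snippet needs, might look as follows. The class name QuotedSeparatorRegex and its two helper methods are hypothetical names chosen for this example.

```java
import java.io.File;
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.regex.Pattern;

public class QuotedSeparatorRegex {

    // Hypothetical helper: builds the matcher from the question, with the
    // platform separator quoted so that '\' is treated literally on Windows.
    static PathMatcher rootOrTestMatcher() {
        String sep = Pattern.quote(File.separator);
        return FileSystems.getDefault()
                .getPathMatcher("regex:" + sep + "|" + sep + "test");
    }

    // Hypothetical helper: matches a path string against that matcher.
    static boolean matches(String path) {
        return rootOrTestMatcher().matches(Paths.get(path));
    }

    public static void main(String[] args) {
        System.out.println(matches("/"));
        System.out.println(matches("/test"));
        System.out.println(matches("/test2"));
    }
}
```

Pattern.quote wraps the separator in \Q...\E, so the backslash needs no manual escaping in the pattern.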

Answer 3

Score: 1

This code worked well on both Windows and Linux:

String pattern = "regex:\\./src/main/java/.*\\.java|\\./src/main/java/.*\\.txt";
String newPattern;

if (File.separator.equals("\\")) { // Windows: turn each '/' into an escaped backslash
    newPattern = pattern.replace("/", "\\\\");
} else { // Linux: the pattern already uses the right separator
    newPattern = pattern;
}

PathMatcher pathMatcher = FileSystems.getDefault().getPathMatcher(newPattern);
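
A different route, not taken by any of the answers here, is to leave the regex alone and normalize the path string instead. The sketch below assumes it is acceptable to bypass FileSystem.getPathMatcher and use java.util.regex.Pattern directly. The class name NormalizedRegexMatcher and the helper forwardSlashRegex are hypothetical.

```java
import java.io.File;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.regex.Pattern;

public class NormalizedRegexMatcher {

    // Hypothetical helper: the regex is written with '/' only. Before
    // matching, the platform separator in the path string is replaced
    // by '/', so the same regex can be used on Linux and Windows.
    static PathMatcher forwardSlashRegex(String regex) {
        Pattern compiled = Pattern.compile(regex);
        return path -> compiled
                .matcher(path.toString().replace(File.separatorChar, '/'))
                .matches();
    }

    public static void main(String[] args) {
        // Nested groups are fine here because this is a plain regex.
        PathMatcher matcher =
                forwardSlashRegex("\\./src/main/java/.*\\.(java|txt)");

        System.out.println(matcher.matches(Paths.get("./src/main/java/App.java")));
        System.out.println(matcher.matches(Paths.get("./src/main/java/notes.txt")));
        System.out.println(matcher.matches(Paths.get("./src/main/java/App.class")));
    }
}
```

Because PathMatcher is a functional interface, the returned lambda can be used wherever the SimpleFileVisitor expects a PathMatcher. Unlike the Windows matcher from getPathMatcher, this sketch does nothing special about case sensitivity.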

huangapple
  • Posted on September 28, 2020 at 20:17:54
  • When reposting, please keep the link to this article: https://go.coder-hub.com/64102053.html