Check out single subdirectory from minimal repository using JGit

Question


I'm using JGit 6.5.x with Java 17. I have a remote repository that is huge (gigabytes), but I only need temporary access to a single subdirectory (e.g. foo/bar/) for processing. The subdirectory itself is really small (hundreds of kilobytes). A shallow, bare clone of the repository is relatively small as well:

try (final Git git = Git.cloneRepository()
    .setURI(REMOTE_REPOSITORY_URI.toASCIIString())
    .setDirectory(LOCAL_REPOSITORY_PATH.toFile())
    .setBare(true)
    .setDepth(1)
    .call()) {
  System.out.println("cloned shallow, bare repository");
}

Is there a way to clone a shallow, bare repository like that (or any other minimal version of the repository), and then check out just the single subdirectory foo/bar to some other directory temporarily so that I can process those files using the normal Java file system API?

Note that I only just got the clone above working and haven't yet started looking into how I might check out just a single subdirectory from this bare repository.

Answer 1

Score: 2


Try the solution below.

Note: Before applying any Git changes, make sure you have a backup of any necessary files.

Use the git object to create a TreeWalk that will allow you to traverse the repository's tree and find the subdirectory you're interested in. Specify the starting path as the root of the repository:

try (Git git = Git.open(LOCAL_REPOSITORY_PATH.toFile())) {
    Repository repository = git.getRepository();
    Path temporaryDirectory = Files.createTempDirectory("subdirectory");

    try (RevWalk revWalk = new RevWalk(repository);
         TreeWalk treeWalk = new TreeWalk(repository)) {
        // Get the tree for the repository's HEAD commit
        RevCommit commit = revWalk.parseCommit(repository.resolve(Constants.HEAD));

        // Walk the tree recursively from the root of the repository,
        // visiting only entries under the subdirectory of interest
        treeWalk.addTree(commit.getTree());
        treeWalk.setRecursive(true);
        treeWalk.setFilter(PathFilter.create("foo/bar"));

        boolean found = false;
        while (treeWalk.next()) {
            found = true;
            // Write each blob to the same relative path under the temporary directory
            Path target = temporaryDirectory.resolve(treeWalk.getPathString());
            Files.createDirectories(target.getParent());
            try (OutputStream out = Files.newOutputStream(target)) {
                repository.open(treeWalk.getObjectId(0)).copyTo(out);
            }
        }
        if (!found) {
            throw new IllegalStateException("Subdirectory not found");
        }
    }

    // Now you can use the Java file system API to process the files in the temporary directory

    // Clean up the temporary directory when you're done
    FileUtils.delete(temporaryDirectory.toFile(), FileUtils.RECURSIVE);
}

In the code above, a TreeWalk restricted by a PathFilter visits every file under the subdirectory you specified (foo/bar) in the tree of the HEAD commit. Each file's blob is read from the object database and written to the matching relative path in a temporary directory, so the files end up under foo/bar inside that directory, where you can process them with the Java file system API. CheckoutCommand is not used here because a bare repository has no working tree to check out into, and the command has no option for redirecting output to another directory. This copies file contents only; executable bits and symbolic links are not recreated. Don't forget to clean up the temporary directory when you're done.

Note that the code assumes you have the necessary JGit and Java I/O imports in place; FileUtils here is org.eclipse.jgit.util.FileUtils.
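The answer above ends by cleaning up the temporary directory. If you would rather not depend on a library helper for that step, a minimal sketch using only the JDK might look like the following. The class and method names (TempDirectoryCleanup, deleteRecursively) are hypothetical and do not come from the answer or from JGit.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical helper, not part of JGit: removes a temporary directory tree.
final class TempDirectoryCleanup {

    private TempDirectoryCleanup() {}

    /** Deletes root and everything beneath it; does nothing if root does not exist. */
    static void deleteRecursively(Path root) throws IOException {
        if (Files.notExists(root)) {
            return;
        }
        try (Stream<Path> paths = Files.walk(root)) {
            // Reverse ordering visits children before their parent directories,
            // so every directory is already empty when it is deleted.
            paths.sorted(Comparator.reverseOrder()).forEach(path -> {
                try {
                    Files.delete(path);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }
}
```

Calling TempDirectoryCleanup.deleteRecursively(temporaryDirectory) in a finally block would ensure the extracted files are removed even if processing throws.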

Answer 2

Score: 1


Inspired by another answer, I was able to get a single-depth clone and check out only a single path without needing a bare clone, while using a similarly small amount of file system space. The benefit of this approach is that only a single top-level directory is needed; the bare repository approach, on the other hand, requires a manual traversal and saving the files to a separate directory.

The key is to use setNoCheckout(true) (in addition to setDepth(1)), and then after cloning manually perform a separate checkout specifying the requested path. Note that you must specify setStartPoint("HEAD") or specify a hash starting point, as there will be no branch because there is not yet a checkout.

try (final Git git = Git.cloneRepository()
    .setURI(REMOTE_REPOSITORY_URI.toASCIIString())
    .setDirectory(LOCAL_REPOSITORY_PATH.toFile())
    .setNoCheckout(true)
    .setDepth(1)
    .call()) {

  // Check out only the requested path from HEAD into the working tree
  git.checkout()
    .setStartPoint("HEAD")
    .addPath("foo/bar")
    .call();

}

This seems to work very nicely! I would imagine it uses something similar to Satyajit Bhatt's answer under the hood.
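With this approach the files land in the clone's own working tree, next to the .git directory, so processing can start from the subdirectory itself. A minimal JDK-only sketch of listing those files is shown below. The class and method names (SubdirectoryFiles, listRegularFiles) are hypothetical, and it assumes the checked-out path sits directly under the clone directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of JGit: enumerates files in a checked-out subdirectory.
final class SubdirectoryFiles {

    private SubdirectoryFiles() {}

    /**
     * Returns the regular files under workTree/subdirectory, as sorted paths
     * relative to that subdirectory. Starting the walk at the subdirectory
     * means the .git directory at the top of the work tree is never visited.
     */
    static List<Path> listRegularFiles(Path workTree, String subdirectory) throws IOException {
        Path base = workTree.resolve(subdirectory);
        try (Stream<Path> paths = Files.walk(base)) {
            return paths.filter(Files::isRegularFile)
                    .map(base::relativize)
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```

For the example in the question, SubdirectoryFiles.listRegularFiles(LOCAL_REPOSITORY_PATH, "foo/bar") would be the natural call after the checkout completes.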

Posted by huangapple on June 1, 2023, 01:29:02
When reposting, please keep the link to this article: https://go.coder-hub.com/76375987.html