January 9, 2023, 08:23:10

How to extract many files from multiple 7z using Python?

Question

I need to extract 700k JPG files that are spread across 50 7z archives. I have a text file with one row for each file I need. Each row contains the target 7z archive, followed by the file's path and name inside that archive.

This is what the txt file looks like:

A20000to22000.7z, A20000to22000/rows/A21673.Lo1sign.jpg
B20000to22000.7z, B20000to22000/rows/B21673.Lo1sign.jpg

I can currently extract files with Python, but only from one 7z archive at a time. I use this command to do that:

7zz e A0000to22000.7z @f1.txt

This is taking far too long, though. Is there any way to edit the command, or another approach I could use, so that I can extract many different files from many different 7z archives at once?

Answer 1

Score: 2

Updated Answer

With the new information that there are lots of files to retrieve from each archive, a modified approach is needed.

First we must generate a list of the files needed from each 7z archive, then process that list in parallel. So this code should do that:

awk -F, '{sub("7z","txt",$1); print $2 > $1}' joblist.txt

That should make a file called A20000to22000.txt that contains all the files to be extracted from the archive A20000to22000.7z and similarly for B20000to22000.7z it should produce B20000to22000.txt.

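Don't proceed past here until the files ending in .txt look correct. In particular, because the sample rows separate the two fields with a comma followed by a space and awk splits on the comma alone, check whether each entry starts with a stray space.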
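Since the question mentions Python, the same grouping step can be sketched there as well. This is only an illustration, not part of the original answer: the function name `split_joblist` and the `out_dir` parameter are made up for this example, and it assumes every row has the form `archive.7z, path/inside/archive` as in the sample above.

```python
from collections import defaultdict
from pathlib import Path


def split_joblist(joblist_path, out_dir="."):
    """Write one list file per archive; return {archive_name: list_file_path}.

    Hypothetical helper: assumes each row is "archive.7z, member/path".
    """
    members = defaultdict(list)
    with open(joblist_path, encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue  # skip blank lines
            archive, member = line.split(",", 1)
            # strip() drops the space after the comma and the trailing newline
            members[archive.strip()].append(member.strip())

    list_files = {}
    for archive, names in members.items():
        # A20000to22000.7z -> A20000to22000.txt
        list_path = Path(out_dir) / Path(archive).with_suffix(".txt").name
        list_path.write_text("\n".join(names) + "\n", encoding="utf-8")
        list_files[archive] = list_path
    return list_files
```

Calling `split_joblist("joblist.txt")` should leave one .txt file per archive in the current directory, with the paths already stripped of surrounding whitespace.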

Now we need to process the .txt files in parallel with GNU Parallel. That should look something like this:

parallel --dry-run 7zz e {.}.7z @{} ::: *to*.txt

I used *to*.txt in order to avoid processing the original joblist.txt.

If that command looks correct, remove --dry-run and run for real.
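If GNU Parallel is not available, a rough Python equivalent is to start one `7zz e ARCHIVE @LISTFILE` process per list file from a thread pool. The sketch below is an assumption-laden illustration rather than the answer's method: `build_commands`, `extract_all`, the `dry_run` flag, and the default of four workers are all invented here, and it assumes `7zz` is on the PATH and that each list file sits next to the archive with the same base name.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_commands(list_files, exe="7zz"):
    """Return one [exe, "e", ARCHIVE, "@LISTFILE"] argument list per list file."""
    commands = []
    for list_file in sorted(Path(p) for p in list_files):
        # A20000to22000.txt -> A20000to22000.7z (assumed naming convention)
        archive = list_file.with_suffix(".7z")
        commands.append([exe, "e", str(archive), f"@{list_file}"])
    return commands


def extract_all(list_files, max_workers=4, dry_run=True):
    """Print the commands (dry run) or run them, several archives at a time."""
    commands = build_commands(list_files)
    if dry_run:
        for cmd in commands:
            print(" ".join(cmd))
        return []
    # Threads are enough here: each worker only waits on an external process.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda cmd: subprocess.run(cmd, check=False), commands))
    return [r.returncode for r in results]
```

As with `--dry-run` above, calling `extract_all(Path(".").glob("*to*.txt"))` first only prints the commands; passing `dry_run=False` runs them and returns each process's exit code.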

Original Answer

Assuming joblist.txt looks like this:

A20000to22000.7z, A20000to22000/rows/A21673.Lo1sign.jpg
B20000to22000.7z, B20000to22000/rows/B21673.Lo1sign.jpg

and that corresponds to needing to run a command like:

7zz e A20000to22000.7z A20000to22000/rows/A21673.Lo1sign.jpg

you can do that in parallel with GNU Parallel like this:

parallel --dry-run --colsep , 7zz e {1} {2} :::: joblist.txt

If it looks right, remove --dry-run and run for real.

Note that this is done in the terminal/shell and without Python, so it falls under the "another approach" you mentioned.
