August 10, 2023 19:50:12


How to extract a subdir with all its subsequent files using zipfile

Question

Yes, I have read the other posts on this subject, but I am running into a weird problem:

When I extract a certain item from the namelist, it only gives me an empty folder, not the actual files inside.

My zip file has the following hierarchy:

myzip.zip -> FolderA -> FolderB -> FolderC -> FolderIWantA, FolderIWantB, ... FolderIWantN.

So there are a lot of preceding folders I do not wish to extract. I know how to identify the ones I want from the namelist:

import os
import sys
import zipfile
try:
	zip_file_path = sys.argv[1]
except IndexError:
	sys.exit('No zip file provided.')
archive = zipfile.ZipFile(zip_file_path)
for i, file in enumerate(archive.namelist()):
	if os.path.basename(file[:-1]).startswith('ABC-'): # identify relevant folders
		old_name = os.path.basename(file[:-1])
		new_name = 'new_%d' % i # create a new name
		
		archive.extract(file, new_name)

This does extract the folders I want, but the extracted folders are empty. On top of that, the newly extracted folders contain the preceding folders A, B and C, and I do not know why.

Here's a script that builds a test zip for your convenience:

import os
import shutil
prefolders = r'testzip\FolderA\FolderB\FolderC'
try:
	os.makedirs(prefolders)
except FileExistsError:
	pass
for i in 'ABC':
	try:
		new_folder = 'ABC-Folder%s' % i
		os.mkdir(os.path.join(prefolders, new_folder))
	except FileExistsError:
		pass
	for j in range(2):
		file_path = os.path.join(prefolders, new_folder, 'somefile%s.txt' % j)
		with open(file_path, 'w'): pass
shutil.make_archive('testzip', 'zip', 'testzip')
shutil.rmtree('testzip')

I thought this would take like 10 minutes and I am losing my mind over this...

Answer 1

Score: 1

You're checking whether the basename() starts with ABC-, so you never match entries whose names don't start with that. The files in your example start with somefile. extract() extracts only the single entry that is named, and in your case every entry that starts with ABC- is a directory.

To find the files that have a directory somewhere in their path that starts with ABC-, you could use:

    if os.path.basename(file) != '' and ('/ABC-' in os.path.dirname(file) or os.path.dirname(file).startswith('ABC-')):

(The names returned by namelist() use forward slashes on every platform, so the slash should not need changing on Windows.)
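As a rough sketch of the same test, the condition can be wrapped in a small helper. The name is_file_under_prefix and the sample names list below are made up for illustration and are not part of zipfile; the helper only looks at the member name as a string.

```python
from pathlib import PurePosixPath

def is_file_under_prefix(name, prefix='ABC-'):
    """Hypothetical helper: True if `name` is a file entry that has an
    ancestor directory whose name starts with `prefix`."""
    if name.endswith('/'):
        # Directory entries in a zip namelist end with a slash; skip them.
        return False
    # Every path component except the last one is a parent directory.
    # Zip member names use forward slashes, hence PurePosixPath.
    parents = PurePosixPath(name).parts[:-1]
    return any(part.startswith(prefix) for part in parents)

# Sample names shaped like the test archive from the question (assumed).
names = [
    'FolderA/',
    'FolderA/FolderB/FolderC/ABC-FolderA/',
    'FolderA/FolderB/FolderC/ABC-FolderA/somefile0.txt',
    'FolderA/FolderB/FolderC/other.txt',
]
selected = [n for n in names if is_file_under_prefix(n)]
print(selected)
```

With a real archive, the list would come from archive.namelist() instead of the hard-coded sample.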

This will still extract the file and all of the parent directories as named in file. If you want just the file by itself in new_n, then you will need to use read() on the entry, and then write the data to the desired destination file.
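A minimal sketch of the read()-and-write approach might look like the following. The function name extract_flat and the in-memory demo archive are assumptions made for illustration; only ZipFile.read(), ZipFile.writestr() and the standard os, io and tempfile calls come from the standard library.

```python
import io
import os
import posixpath
import tempfile
import zipfile

def extract_flat(archive, member, dest_dir):
    """Hypothetical helper: write one archive member directly into
    dest_dir, without recreating the parent directories stored in the zip."""
    os.makedirs(dest_dir, exist_ok=True)
    # Member names use forward slashes, so take the basename with posixpath.
    target = os.path.join(dest_dir, posixpath.basename(member))
    with open(target, 'wb') as out:
        out.write(archive.read(member))  # read() returns the member's bytes
    return target

# Demo on a throwaway in-memory archive (assumed layout, mirrors the question).
member = 'FolderA/FolderB/FolderC/ABC-FolderA/somefile0.txt'
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    zf.writestr(member, 'hello')

with zipfile.ZipFile(buf) as zf, tempfile.TemporaryDirectory() as tmp:
    path = extract_flat(zf, member, os.path.join(tmp, 'new_0'))
    print(os.path.relpath(path, tmp))
```

Note that two members with the same file name would overwrite each other in the same destination directory, and read() loads the whole member into memory, so for large files streaming via archive.open() with shutil.copyfileobj() may be a better fit.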
