How to extract a subdir with all its subsequent files using zipfile

Question

Yes, I have read the other posts on this subject, but I am running into a weird problem:

When I extract a certain item from the namelist, it only gives me an empty folder, not the actual files inside.

My zip file has the following hierarchy:

myzip.zip -> FolderA -> FolderB -> FolderC -> FolderIWantA, FolderIWantB, ... FolderIWantN.

So there are a lot of preceding folders that I do not want to extract. I know how to identify the folders I want from the namelist:

import os
import sys
import zipfile

try:
	zip_file_path = sys.argv[1]
except IndexError:
	sys.exit('No zip file provided.')

archive = zipfile.ZipFile(zip_file_path)

for i,file in enumerate(archive.namelist()):
	if os.path.basename(file[:-1]).startswith('ABC-'): # identify relevant folders
		old_name = os.path.basename(file[:-1])
		new_name = 'new_%d'%i # Create a new name
		
		archive.extract(file, new_name)

This does extract the folders I want, but the extracted folders are empty. On top of that, each new folder I extract into also contains the preceding folders A, B, and C.

I do not know why it does that...
Here's a script that builds a test zip for your convenience:

import os
import shutil

prefolders = r'testzip\FolderA\FolderB\FolderC'

try:
	os.makedirs(prefolders)
except FileExistsError:
	pass


for i in 'ABC':
	try:
		new_folder = 'ABC-Folder%s'%i
		os.mkdir(os.path.join(prefolders,new_folder))
	except FileExistsError:
		pass

	for j in range(2):
		file_path = os.path.join(prefolders,new_folder,'somefile%s.txt'%j)
		with open(file_path,'w'): pass

shutil.make_archive('testzip', 'zip', 'testzip')
shutil.rmtree('testzip')

I thought this would take like 10 minutes and I am losing my mind over this...

Answer 1

Score: 1

Your filter checks whether the basename() starts with ABC-, so it never matches the files, whose basenames start with somefile. In your archive, the only entries whose basename starts with ABC- are directories, and extract() extracts only the one member you name, not its contents. That is why you end up with empty folders.
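To see this in an archive, the following minimal sketch builds a small in-memory zip shaped like the one in the question and reports which members are directories and which are files. The helper names `build_sample_archive` and `classify_members` are made up for illustration, and the sample layout is an assumption about how the real archive is organized.

```python
import io
import zipfile


def build_sample_archive() -> bytes:
    """Build a small in-memory zip that mimics the layout in the question."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # A directory is its own member: its name ends with "/" and it has no data.
        archive.writestr("FolderA/FolderB/FolderC/ABC-FolderA/", "")
        # The files inside that directory are separate members.
        archive.writestr("FolderA/FolderB/FolderC/ABC-FolderA/somefile0.txt", "zero")
        archive.writestr("FolderA/FolderB/FolderC/ABC-FolderA/somefile1.txt", "one")
    return buffer.getvalue()


def classify_members(zip_bytes: bytes) -> dict:
    """Map every member name to True if it is a directory entry, else False."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return {info.filename: info.is_dir() for info in archive.infolist()}


members = classify_members(build_sample_archive())
for name, is_directory in members.items():
    print("dir " if is_directory else "file", name)
```

Extracting only the member that ends in `ABC-FolderA/` creates that directory and nothing else, because the two text files are listed under their own names.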

To find the files that have a directory starting with ABC- somewhere in their path, you could use a condition like this:

    if os.path.basename(file) != '' and ('/ABC-' in os.path.dirname(file) or os.path.dirname(file).startswith('ABC-')):

(Member names returned by namelist() use forward slashes on every platform, Windows included, so the slash should not need changing.)
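As a rough alternative to the string tests above, the same check can be written per path component, which some readers may find easier to follow. This is only a sketch: `is_wanted_file` is a hypothetical helper, the `ABC-` prefix comes from the question, and the sample names are invented to match the test archive.

```python
from pathlib import PurePosixPath


def is_wanted_file(member_name: str, prefix: str = "ABC-") -> bool:
    """Return True for file members that sit below a directory starting with prefix."""
    if member_name.endswith("/"):
        # Directory entries end with a slash; they carry no file data.
        return False
    # Every component except the last one is a parent directory.
    parent_parts = PurePosixPath(member_name).parts[:-1]
    return any(part.startswith(prefix) for part in parent_parts)


names = [
    "FolderA/",
    "FolderA/ABC-notes.txt",  # a file named ABC-..., but not inside an ABC- folder
    "FolderA/FolderB/FolderC/ABC-FolderA/",
    "FolderA/FolderB/FolderC/ABC-FolderA/somefile0.txt",
    "FolderA/FolderB/FolderC/ABC-FolderA/somefile1.txt",
]
wanted = [name for name in names if is_wanted_file(name)]
print(wanted)
```

In real use, `names` would come from `archive.namelist()` rather than a hard-coded list.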

This still extracts each file under its full archive path, parent directories included, because extract() recreates the member's path below the target directory. If you want just the file by itself in new_n, you will need to call read() on the entry and then write the data to the desired destination file yourself.
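One way the read-and-write approach could look is sketched below. `extract_flattened` is a hypothetical helper, not part of zipfile. Numbering the target folders `new_0`, `new_1`, ... in order of first appearance is an assumption, since the question numbers them by namelist index instead. The sketch skips names containing `..` but is otherwise not hardened against untrusted archives.

```python
import io
import os
import tempfile
import zipfile
from pathlib import PurePosixPath


def extract_flattened(archive, destination_root, prefix="ABC-"):
    """Write files found below a 'prefix' folder to destination_root/new_<n>/."""
    folder_numbers = {}  # archive folder path -> number used in the new name
    written = []
    for info in archive.infolist():
        if info.is_dir():
            continue  # directories are recreated on demand below
        parts = PurePosixPath(info.filename).parts
        if ".." in parts:
            continue  # skip names that try to climb out of the destination
        # Index of the first parent directory whose name starts with the prefix.
        anchor = next(
            (i for i, part in enumerate(parts[:-1]) if part.startswith(prefix)),
            None,
        )
        if anchor is None:
            continue
        source_folder = "/".join(parts[: anchor + 1])
        number = folder_numbers.setdefault(source_folder, len(folder_numbers))
        # Keep only the path below the matched folder, dropping the leading folders.
        target = os.path.join(destination_root, "new_%d" % number, *parts[anchor + 1:])
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with open(target, "wb") as out_file:
            out_file.write(archive.read(info))
        written.append(target)
    return written


# Demo on an in-memory archive laid out like the test zip in the question.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as demo:
    for folder in ("ABC-FolderA", "ABC-FolderB"):
        for j in range(2):
            demo.writestr("FolderA/FolderB/FolderC/%s/somefile%d.txt" % (folder, j), "data")

with tempfile.TemporaryDirectory() as destination:
    with zipfile.ZipFile(buffer) as demo:
        paths = extract_flattened(demo, destination)
    relative_paths = sorted(
        os.path.relpath(path, destination).replace(os.sep, "/") for path in paths
    )
print(relative_paths)
```

For large members, `archive.open(info)` combined with `shutil.copyfileobj` would avoid loading each file fully into memory; `read()` is used here only because it is the call the answer names.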

huangapple
  • Posted on August 10, 2023, 19:50:12
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76875486.html