TarFile.extractall base path wrong, python?
Question
I have the following code, which downloads a dataset and extracts it into a directory. The problem is this line:
housing_tarball.extractall(path='datasets')
If I set path='dataset', the files are extracted into /dataset/housing.
If I set path='dataset/housing/', they are extracted into /dataset/housing/housing.
So it automatically adds a housing directory and puts the files in it, without my specifying it in the path. Does it take that path from where the tar file lies, i.e. tarball_path = Path("datasets/housing.tgz")?
Here is the complete code:
from pathlib import Path
import pandas as pd
import tarfile
import urllib.request
import os

def load_housing_data():
    tarball_path = Path("datasets/housing/housing.tgz")
    if not tarball_path.is_file():
        # create the directory if it does not exist
        Path("datasets/housing").mkdir(parents=True, exist_ok=True)
        url = "https://github.com/ageron/data/raw/main/housing.tgz"
        # open the file for writing in binary mode and write the response to it
        with urllib.request.urlopen(url) as response, tarball_path.open(mode="wb") as tarball_file:
            tarball_file.write(response.read())
        # open the tarball and extract all the files to the datasets directory
        with tarfile.open(tarball_path) as housing_tarball:
            housing_tarball.extractall(path='datasets')
        # remove the tarball file
        os.remove(tarball_path)
    return pd.read_csv("datasets/housing/housing.csv")
Answer 1
Score: 1
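Yes, extractall does automatically create a housing/ directory, but the name does not come from where the .tgz file lies.
If you think about it, this isn't really surprising. A tarball is usually an archived folder, and every member is stored with its path relative to the archive root. extractall(path=...) joins each member's stored name onto path, so an archive containing housing/housing.csv ends up at datasets/housing/housing.csv with path='datasets', and at datasets/housing/housing/housing.csv with path='datasets/housing/'. I assume that whoever created the tarball you're using tarred a folder named housing as opposed to a single file.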
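To check this without downloading anything, here is a minimal sketch. It builds a small stand-in archive (not the real housing.tgz), prints the member names stored inside it, and extracts it to two different destinations. The helpers make_sample_tarball, extract_and_list and demo are hypothetical names made up for this illustration, and the CSV content is dummy data.

```python
import tarfile
import tempfile
from pathlib import Path


def make_sample_tarball(tarball: Path) -> None:
    """Write a small .tgz whose only file is stored as 'housing/housing.csv'."""
    with tempfile.TemporaryDirectory() as src:
        folder = Path(src) / "housing"
        folder.mkdir()
        (folder / "housing.csv").write_text("col_a,col_b\n1,2\n")  # dummy data
        tarball.parent.mkdir(parents=True, exist_ok=True)
        with tarfile.open(tarball, "w:gz") as tar:
            # arcname is the name recorded inside the archive for this folder.
            tar.add(folder, arcname="housing")


def extract_and_list(tarball: Path, dest: Path, root: Path) -> list[str]:
    """Extract tarball into dest, then list the extracted files relative to root."""
    with tarfile.open(tarball) as tar:
        # Pass the safer "data" filter only where this Python version provides it.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(path=dest, **kwargs)
    return sorted(p.relative_to(root).as_posix() for p in dest.rglob("*") if p.is_file())


def demo() -> dict:
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        # The tarball deliberately lives in an unrelated directory.
        tarball = tmp / "downloads" / "sample.tgz"
        make_sample_tarball(tarball)
        with tarfile.open(tarball) as tar:
            names = tar.getnames()
        return {
            "member_names": names,
            "path=datasets": extract_and_list(
                tarball, tmp / "a" / "datasets", tmp / "a"
            ),
            "path=datasets/housing": extract_and_list(
                tarball, tmp / "b" / "datasets" / "housing", tmp / "b"
            ),
        }


for key, value in demo().items():
    print(key, "->", value)
```

The member names come out as ['housing', 'housing/housing.csv'], and the two extractions produce datasets/housing/housing.csv and datasets/housing/housing/housing.csv, even though the tarball itself sits in a downloads folder. On your real file, calling getnames() on the opened tarball before extracting should show whether it has the same housing/ prefix.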