TarFile.extractall base path wrong in Python?

Question

I have the following code, which downloads a dataset and extracts it into a directory. The problem is this line:

housing_tarball.extractall(path='datasets') 

If I set it to `path='datasets'`, it extracts into `datasets/housing`.

If I set it to `path='datasets/housing/'`, it extracts into a `datasets/housing/housing` directory.

So it automatically adds a `housing` directory and puts the files there, even though I never specified it in the path. Does it take that path from where the tarfile lives: `tarball_path = Path("datasets/housing.tgz")`?
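A quick experiment can check this guess. The sketch below builds a throwaway archive under an unrelated name (`archive.tgz`) in an unrelated directory, assuming (as the question's own code implies, but not verified here) that the real `housing.tgz` stores its CSV as the member `housing/housing.csv`. It then extracts that archive with both `path` values from the question. The helper names `build_archive` and `extract_layout`, the file names, and the CSV contents are all made up for the demonstration.

```python
import io
import tarfile
import tempfile
from pathlib import Path
from typing import List

# Pass filter="data" only on Python versions that support extraction filters.
EXTRACT_KWARGS = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}


def build_archive(tarball_path: Path) -> None:
    """Write a tiny .tgz whose single member is named 'housing/housing.csv'."""
    data = b"longitude,latitude\n-122.23,37.88\n"
    info = tarfile.TarInfo(name="housing/housing.csv")
    info.size = len(data)
    tarball_path.parent.mkdir(parents=True, exist_ok=True)
    with tarfile.open(tarball_path, mode="w:gz") as tar:
        tar.addfile(info, io.BytesIO(data))


def extract_layout(tarball_path: Path, dest: str) -> List[str]:
    """Extract into <fresh dir>/dest and list the files relative to <fresh dir>."""
    with tempfile.TemporaryDirectory() as work_dir:
        work = Path(work_dir)
        with tarfile.open(tarball_path) as tar:
            tar.extractall(path=work / dest, **EXTRACT_KWARGS)
        return sorted(
            p.relative_to(work).as_posix() for p in work.rglob("*") if p.is_file()
        )


with tempfile.TemporaryDirectory() as tmp:
    # Deliberately unrelated location and file name for the archive.
    tarball = Path(tmp) / "somewhere" / "else" / "archive.tgz"
    build_archive(tarball)
    with tarfile.open(tarball) as tar:
        print(tar.getnames())  # ['housing/housing.csv']
    print(extract_layout(tarball, "datasets"))           # ['datasets/housing/housing.csv']
    print(extract_layout(tarball, "datasets/housing/"))  # ['datasets/housing/housing/housing.csv']
```

Even though this archive is neither called `housing.tgz` nor stored in a `housing` directory, the extra `housing/` level still appears, because it is part of the member name that `getnames()` reports.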

Here is the complete code:

```python
from pathlib import Path
import pandas as pd
import tarfile
import urllib.request
import os

def load_housing_data():
    tarball_path = Path("datasets/housing/housing.tgz")

    if not tarball_path.is_file():
        Path("datasets/housing").mkdir(parents=True, exist_ok=True) # create the directory if it does not exist
        url = "https://github.com/ageron/data/raw/main/housing.tgz"

        with urllib.request.urlopen(url) as response, tarball_path.open(mode="wb") as tarball_file: # open the file for writing in binary mode
            tarball_file.write(response.read()) # write the response to the file

    with tarfile.open(tarball_path) as housing_tarball: # open the tarball
        housing_tarball.extractall(path='datasets') # extract all the files to the datasets directory

    # remove the tarball file
    os.remove(tarball_path)

    return pd.read_csv("datasets/housing/housing.csv")
```

Answer 1

Score: 1

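Yes, it automatically creates a `housing/` directory, but not because of where the tarball is stored. Each member of a tarball is saved with a relative path, and `extractall()` recreates those paths underneath the `path` you pass.

If you think about it, this isn't really surprising: tarballs are usually (compressed) folders, so they extract as folders. I assume that whoever created the tarball you're using archived a folder rather than a single file, so every member name starts with `housing/`.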
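If you want the files to land directly in a directory of your choice (for example `datasets/housing/housing.csv` when passing `datasets/housing`), one option is to strip the leading `housing/` component from each member name before extracting. The helper `extract_without_top_dir` below is a hypothetical sketch, not part of the `tarfile` API; it only uses `TarFile.getmembers()` and the `members` argument of `extractall()`, and it assumes the archive holds plain files and directories (link targets are not rewritten). The demo archive is made up and only assumed to mirror the layout of `housing.tgz`.

```python
import io
import tarfile
import tempfile
from pathlib import Path, PurePosixPath

# Use the "data" extraction filter where this Python version supports it.
EXTRACT_KWARGS = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}


def extract_without_top_dir(tarball_path: Path, dest: Path) -> None:
    """Extract all members with their first path component removed.

    'housing/housing.csv' is written to dest / 'housing.csv'; the bare
    top-level directory entry ('housing'), if present, is skipped.
    Link targets are NOT rewritten, so this assumes plain files/dirs only.
    """
    with tarfile.open(tarball_path) as tar:
        members = []
        for member in tar.getmembers():
            parts = PurePosixPath(member.name).parts
            if len(parts) < 2:
                continue  # nothing left once the top-level directory is removed
            member.name = PurePosixPath(*parts[1:]).as_posix()
            members.append(member)
        tar.extractall(path=dest, members=members, **EXTRACT_KWARGS)


# Usage with a throwaway archive laid out the way housing.tgz is assumed to be.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    tarball = root / "housing.tgz"
    payload = b"longitude,latitude\n-122.23,37.88\n"
    with tarfile.open(tarball, mode="w:gz") as tar:
        top = tarfile.TarInfo("housing")
        top.type = tarfile.DIRTYPE
        top.mode = 0o755
        tar.addfile(top)
        csv_info = tarfile.TarInfo("housing/housing.csv")
        csv_info.size = len(payload)
        tar.addfile(csv_info, io.BytesIO(payload))

    extract_without_top_dir(tarball, root / "datasets" / "housing")
    print(sorted(p.relative_to(root).as_posix() for p in (root / "datasets").rglob("*")))
    # ['datasets/housing', 'datasets/housing/housing.csv']
```

Note that this produces the same final layout as simply calling `extractall(path='datasets')`, as the question's code already does; stripping the prefix mainly helps when you want to choose the target directory name yourself.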

huangapple
  • This article was published on 29 May 2023 at 00:37:20
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76352518.html