Question

I have this code to download a zip file from a URL and extract the contents.
But the name of the Excel file changes every month, so each run leaves another file behind and duplicates accumulate. The name cannot be predicted each time new data is published at the URL.

zip_file_url = "https://www.insee.fr/en/statistiques/series/xlsx/famille/102391902"

import requests, zipfile, io
r = requests.get(zip_file_url)
z = zipfile.ZipFile(io.BytesIO(r.content))

z.extractall()

In the end I need to load the spreadsheet in pandas. Can that be done without knowing the name of the spreadsheet inside the zip file?

Is it possible to rename the spreadsheet on every download so that the file is overwritten and no duplicates are created? And how can I load the spreadsheet in pandas in future without knowing the file name?

So the best way would be to extract the file, save it under the same file name each time, and overwrite the previous version. That way we also know the name of the spreadsheet to load in pandas.

Answer 1

Score: 1

import os
import requests
import zipfile
import io

zip_file_url = "https://www.insee.fr/en/statistiques/series/xlsx/famille/102391902"

# Download the zip archive into memory
r = requests.get(zip_file_url)
z = zipfile.ZipFile(io.BytesIO(r.content))

# Extract the archive into the current directory
z.extractall()

# Get the list of file names in the zip archive
zip_file_names = z.namelist()

# Check if there is one .xlsx file in the current directory
# (the original expression was lost from the post; os.listdir() is an assumed reconstruction)
xlsx_files = [f for f in os.listdir() if f.endswith(".xlsx")]
if len(xlsx_files) == 2:
    # The file that did not come from the archive is the previous download
    for name in xlsx_files:
        if name not in zip_file_names:
            xlsx_file_name = name
    # Overwrite the existing .xlsx file with the file from the zip archive
    if len(zip_file_names) == 1 and zip_file_names[0].endswith(".xlsx"):
        zip_xlsx_file_name = zip_file_names[0]
        os.replace(zip_xlsx_file_name, xlsx_file_name)
        print("File overwritten successfully.")

The idea is that you know there is exactly one xlsx file in the current directory before the download, so you can get its name by listing the .xlsx files in that directory.

Afterwards, you know that there is exactly one xlsx file in your zip archive too, so you know which file to replace with which.

The check

if len(xlsx_files) == 2:

is needed because the first time you run the script, there will be only one xlsx file in the directory (the one just extracted), so there is nothing to replace yet.

I hope this is clear; you may need to adapt this code to your use case.
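As a possible alternative to extracting first and renaming afterwards, the single spreadsheet can be read out of the archive by whatever name it has and written under a fixed name. This is only a sketch, not the code from the answer above: the function names and the target name `latest.xlsx` are made up for illustration, and it assumes the archive contains exactly one `.xlsx` member. The last helper also assumes pandas and an Excel engine such as openpyxl are installed.

```python
import io
import zipfile
from pathlib import Path


def single_xlsx_name(archive: zipfile.ZipFile) -> str:
    """Return the name of the only .xlsx member, whatever it is called."""
    names = [n for n in archive.namelist() if n.lower().endswith(".xlsx")]
    if len(names) != 1:
        raise ValueError(f"expected exactly one .xlsx member, found {len(names)}")
    return names[0]


def save_under_fixed_name(zip_bytes: bytes, target: str = "latest.xlsx") -> Path:
    """Write the archive's single spreadsheet to `target`, replacing any old copy."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        data = archive.read(single_xlsx_name(archive))
    path = Path(target)
    path.write_bytes(data)  # write_bytes overwrites an existing file
    return path


def load_dataframe(zip_bytes: bytes):
    """Read the spreadsheet straight from memory, without extracting it to disk."""
    import pandas as pd  # imported lazily so the helpers above work without pandas

    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        data = archive.read(single_xlsx_name(archive))
    return pd.read_excel(io.BytesIO(data))
```

With the question's code, `save_under_fixed_name(r.content)` should leave a single `latest.xlsx` in the working directory, which `pd.read_excel("latest.xlsx")` can then open. `load_dataframe(r.content)` skips the file on disk entirely, so no duplicates can accumulate.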
