Question

I have a set of Excel files whose data I want to summarize. The data in each file is spread over 5 sheets. I now want to create a new Excel file with 5 sheets, where each sheet summarizes the data from the corresponding sheet of all input files.

My plan was to create a list of lists of DataFrames, where each row collects the data from one sheet across all files. I would then concatenate each row, so that I end up with 5 DataFrames that I can write to the 5 sheets of a new Excel file. The code I wrote for this looks like this:

import glob
import pandas as pd
from tkinter import filedialog


def select_base_path():
    # Ask the user for the folder that contains the Excel files
    root = filedialog.askdirectory(
        title='Select base path',
        mustexist=True)
    return root


if __name__ == '__main__':
    base_path = select_base_path()

    # Collect all matching workbooks below the base path
    files = []
    for file in glob.glob(str(base_path) + '/**/10x 0.45 SFR average .xlsx', recursive=True):
        files.append(file)

    sheets = ['Center', 'north west', 'south west', 'north east', 'south east']
    Frames = [[pd.DataFrame()] * len(files)] * len(sheets)
    data_frames = [[]] * len(sheets)

    # Build a column key per file from the '#..' part of its path
    ids = []
    for k in range(len(files)):
        ids.append('Adapter ' + files[k][files[k].find('#'):files[k].find('#')+3])

    # The first file also contributes the reference columns (A, G, ...)
    for i, file in enumerate(files):
        if i == 0:
            for j, sheet in enumerate(sheets):
                if sheet == 'Center':
                    Frames[j][i] = pd.read_excel(io=file, sheet_name=sheet, header=2, usecols='A,E,G,K,M,Q,S,W')
                else:
                    Frames[j][i] = pd.read_excel(io=file, sheet_name=sheet, header=2, usecols='A,E,G,K')
        else:
            for j, sheet in enumerate(sheets):
                if sheet == 'Center':
                    Frames[j][i] = pd.read_excel(io=file, sheet_name=sheet, header=2, usecols='E,K,Q,W')
                else:
                    Frames[j][i] = pd.read_excel(io=file, sheet_name=sheet, header=2, usecols='E,K')

    # One concatenated DataFrame per sheet
    for m in range(len(sheets)):
        data_frames[m] = pd.concat(Frames[m], axis=1, keys=ids)

The problem is that when the loop iterates over the sheets, it does not write to a single location Frames[j][i] in the list of lists of DataFrames. Instead, the data ends up in Frames[:, i] (every row at position i), so it is overwritten on each pass through the sheets. In the end, all rows are identical for any given i.

In the debugger, after the first pass (i=0, j=0), I already see data in Frames[:, i], whereas I expected data only in Frames[j][i]. What am I misunderstanding?

Answer 1

Score: 2

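The expression [[pd.DataFrame()] * len(files)] * len(sheets) repeats references rather than copying: every row of Frames is the same inner list, so an assignment to Frames[j][i] shows up in all rows. A list comprehension builds a new, independent object for each element, which should avoid the observed problem: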

Frames2 = [[pd.DataFrame() for i in range(len(files))]
           for j in range(len(sheets))]
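
To see the difference directly, here is a small sketch that uses plain values as stand-ins for the DataFrames, so it runs without any Excel files. The names n_files, n_sheets, aliased and independent are made up for illustration; the sketch compares object identity with `is`:

```python
n_files, n_sheets = 3, 5

# Same construction as in the question: the outer * repeats a reference
# to ONE inner list, it does not copy it.
aliased = [[None] * n_files] * n_sheets
aliased[0][0] = "data"

print(aliased[1] is aliased[0])      # True: all rows are the same list object
print([row[0] for row in aliased])   # the write shows up in every row

# Nested list comprehension: the inner expression is evaluated once per row,
# so each row is its own list.
independent = [[None for _ in range(n_files)] for _ in range(n_sheets)]
independent[0][0] = "data"

print(independent[1] is independent[0])  # False: rows are separate objects
print([row[0] for row in independent])   # only the first row changed
```

The inner `[None] * n_files` would be harmless on its own here, because each slot is later replaced by assignment; it is the outer multiplication that makes all rows share one list.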
