List of list of DataFrame overwrites previous values (pandas, python)

Question

I have a set of Excel files from which I want to summarize some data. The data in each Excel file is spread over 5 sheets. I now want to create a new Excel file with 5 sheets, where each sheet collects the data from the corresponding sheet of all input files.

My approach is to create a list of lists of DataFrames, where each row holds the data from one sheet across all files, and then to concatenate each row, so that I end up with 5 DataFrames I can write to the 5 sheets of a new Excel file. The code I wrote for this looks like:

import glob
import pandas as pd
from tkinter import filedialog


def select_base_path():

    root = filedialog.askdirectory(
        title='Select base path',
        mustexist=True)

    return root


if __name__ == '__main__':

    base_path = select_base_path()

    files = []
    for file in glob.glob(str(base_path) + '/**/10x 0.45 SFR average .xlsx', recursive=True):
        files.append(file)


    sheets = ['Center', 'north west', 'south west', 'north east', 'south east']
    Frames = [[pd.DataFrame()] * len(files)] * len(sheets)
    data_frames = [[]] * len(sheets)
    ids = []
    for k in range(len(files)):
        ids.append('Adapter ' + files[k][files[k].find('#'):files[k].find('#')+3])

    for i, file in enumerate(files):
        if i == 0:
            for j, sheet in enumerate(sheets):
                if sheet == 'Center':
                    Frames[j][i] = pd.read_excel(io=file, sheet_name=sheet, header=2, usecols='A,E,G,K,M,Q,S,W')
                else:
                    Frames[j][i] = pd.read_excel(io=file, sheet_name=sheet, header=2, usecols='A,E,G,K')
        else:
            for j, sheet in enumerate(sheets):
                if sheet == 'Center':
                    Frames[j][i] = pd.read_excel(io=file, sheet_name=sheet, header=2, usecols='E,K,Q,W')
                else:
                    Frames[j][i] = pd.read_excel(io=file, sheet_name=sheet, header=2, usecols='E,K')

    for m in range(len(sheets)):
        data_frames[m] = pd.concat(Frames[m], axis=1, keys=ids)

The problem is that when the loop fills Frames, an assignment does not write to the single location Frames[j][i] in the list of lists of DataFrames. Instead, the data shows up in Frames[:,i] (that is, at index i of every row), so each pass over the sheets overwrites the data from the previous sheet. In the end, for any given i, the entries are identical across all rows.

In the debugger, after the first pass (i=0, j=0), I already see data in Frames[:,i]. I expected to have data only in Frames[j][i]. What am I misunderstanding?
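The symptom described above can be reproduced without pandas or Excel files. The following is a minimal sketch, assuming plain placeholder values instead of DataFrames; the names `grid`, `n_files` and `n_sheets` are illustrative and do not appear in the original code.

```python
# Minimal reproduction of the symptom with plain lists (no pandas needed).
n_files, n_sheets = 3, 2

# Same construction as `Frames` in the question: the inner list is built once,
# and the outer `*` repeats a reference to that one list.
grid = [[None] * n_files] * n_sheets

# Write to what looks like a single cell.
grid[0][0] = "sheet 0, file 0"

# Every "row" is the same list object, so the write is visible in all rows.
rows_are_same_object = all(row is grid[0] for row in grid)
column_0 = [row[0] for row in grid]

print(rows_are_same_object)  # True
print(column_0)              # ['sheet 0, file 0', 'sheet 0, file 0']
```

The identity check with `is` shows that the outer list holds the same inner list several times, which matches what the debugger displays after the first assignment.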


Answer 1

Score: 2

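Multiplying the outer list with * only repeats a reference to the same inner list, so every row of Frames is the same object. A list comprehension creates a new, unrelated inner list on each iteration and should avoid the observed problem: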

Frames2 = [[pd.DataFrame() for i in range(len(files))]
           for j in range(len(sheets))]
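To check that the comprehension really yields independent rows, the same experiment can be run with plain lists. This is a sketch under the assumption that placeholder values stand in for the DataFrames; `grid`, `n_files` and `n_sheets` are illustrative names.

```python
n_files, n_sheets = 3, 2

# Comprehension: the inner list expression is evaluated once per outer
# iteration, so each row is a distinct list object.
grid = [[None for _ in range(n_files)] for _ in range(n_sheets)]

# Write to a single cell.
grid[0][0] = "sheet 0, file 0"

rows_are_distinct = grid[0] is not grid[1]
column_0 = [row[0] for row in grid]

print(rows_are_distinct)  # True
print(column_0)           # ['sheet 0, file 0', None]
```

In the question's code, replacing the `Frames = ...` line with the comprehension should be the only change needed for this issue. The line `data_frames = [[]] * len(sheets)` uses the same pattern, but it is harmless there because each slot `data_frames[m]` is reassigned rather than modified in place.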

huangapple
  • Posted on March 7, 2023, 17:34:47
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75660182.html