How can I deep copy a pandas DataFrame where some values are lists, make changes in the copied dataframe, and not change the list in the original?


Question

If a pandas DataFrame has a column where each row contains a list, and that DataFrame is (deep) copied, changes made to that column in the new DataFrame affect the original DataFrame. This happens regardless of how I instantiate the column. How can this be avoided? Is this an inherent flaw of using lists inside a DataFrame?

import pandas as pd
import numpy as np

df = pd.DataFrame({'Person': ['Jack', 'Bob', 'Alice'], 'Age': [20, 30, 40]})
df['Flavors'] = np.empty((len(df), 0)).tolist()
df['Colors'] = [[] for _ in range(len(df))]
df['Friends'] = [[],[],[]]
print(df)
df2 = df.copy(deep=True)
df2.loc[0, 'Flavors'].append('Apple')
df2.loc[0, 'Colors'].append('Red')
df2.loc[0, 'Friends'].append('Rick')
print(df)
print(df2)

This gives the output:

  Person  Age Flavors Colors Friends
0   Jack   20      []     []      []
1    Bob   30      []     []      []
2  Alice   40      []     []      []
  Person  Age  Flavors Colors Friends
0   Jack   20  [Apple]  [Red]  [Rick]
1    Bob   30       []     []      []
2  Alice   40       []     []      []
  Person  Age  Flavors Colors Friends
0   Jack   20  [Apple]  [Red]  [Rick]
1    Bob   30       []     []      []
2  Alice   40       []     []      []

I would expect the output to be this:

  Person  Age Flavors Colors Friends
0   Jack   20      []     []      []
1    Bob   30      []     []      []
2  Alice   40      []     []      []
  Person  Age  Flavors Colors Friends
0   Jack   20  [Apple]     []      []
1    Bob   30       []     []      []
2  Alice   40       []     []      []
  Person  Age  Flavors Colors Friends
0   Jack   20  [Apple]  [Red]  [Rick]
1    Bob   30       []     []      []
2  Alice   40       []     []      []

Answer 1

Score: 3

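It's not the best solution, but one idea is to serialize the DataFrame with pickle, either to disk or to an in-memory buffer, and then reload it. The pandas documentation notes that copy(deep=True) copies the data but does not recursively copy the Python objects stored in it, only the references to them, so both DataFrames end up pointing at the same lists. Unpickling builds new list objects instead: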

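import io

import numpy as np
import pandas as pd

df = pd.DataFrame({'Person': ['Jack', 'Bob', 'Alice'], 'Age': [20, 30, 40]})
df['Flavors'] = np.empty((len(df), 0)).tolist()
df['Colors'] = [[] for _ in range(len(df))]
df['Friends'] = [[], [], []]
print(df)

# Round-trip through pickle: unpickling rebuilds every list as a new object.
with io.BytesIO() as buf:
    df.to_pickle(buf)
    buf.seek(0)  # rewind so read_pickle starts at the beginning
    df2 = pd.read_pickle(buf)

# These appends now touch only the lists owned by df2.
df2.loc[0, 'Flavors'].append('Apple')
df2.loc[0, 'Colors'].append('Red')
df2.loc[0, 'Friends'].append('Rick')
print(df)
print(df2)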

Output:

  Person  Age Flavors Colors Friends
0   Jack   20      []     []      []
1    Bob   30      []     []      []
2  Alice   40      []     []      []
  Person  Age Flavors Colors Friends
0   Jack   20      []     []      []
1    Bob   30      []     []      []
2  Alice   40      []     []      []
  Person  Age  Flavors Colors Friends
0   Jack   20  [Apple]  [Red]  [Rick]
1    Bob   30       []     []      []
2  Alice   40       []     []      []
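
If the pickle round trip feels too heavy, another option is to copy the DataFrame as usual and then deep copy the elements of the object columns yourself. The following is only a minimal sketch: copy_with_objects is a hypothetical helper name, not part of pandas, and it assumes the column labels are unique and that every stored object can be handled by copy.deepcopy.

```python
import copy

import pandas as pd
from pandas.api.types import is_object_dtype


def copy_with_objects(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: copy df, then deep copy each value in object columns."""
    out = df.copy(deep=True)
    for col in out.columns:
        if is_object_dtype(out[col]):
            # Replace each element with an independent copy of itself.
            out[col] = out[col].map(copy.deepcopy)
    return out


df = pd.DataFrame({'Person': ['Jack', 'Bob', 'Alice'], 'Age': [20, 30, 40]})
df['Flavors'] = [[] for _ in range(len(df))]

df2 = copy_with_objects(df)
df2.loc[0, 'Flavors'].append('Apple')  # mutates only the list held by df2
print(df)
print(df2)
```

This only touches object-dtype columns, so numeric columns are copied once by df.copy and left alone. For large frames with many object values, the per-element copy may be slow.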

huangapple
  • Published on 2023-02-16 04:28:38
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75465146.html