How can I deep copy a pandas DataFrame where some values are lists, make changes in the copied dataframe, and not change the list in the original?

Question

If a pandas DataFrame has a column where each row contains a list, and that DataFrame is (deep) copied, changes made to that column in the new DataFrame also affect the original DataFrame. This happens regardless of how I instantiate the column. How can this be avoided? Is this an inherent flaw of using lists inside a DataFrame?

    import pandas as pd
    import numpy as np

    df = pd.DataFrame({'Person': ['Jack', 'Bob', 'Alice'], 'Age': [20, 30, 40]})
    df['Flavors'] = np.empty((len(df), 0)).tolist()
    df['Colors'] = [[] for _ in range(len(df))]
    df['Friends'] = [[], [], []]
    print(df)

    df2 = df.copy(deep=True)
    df2.loc[0, 'Flavors'].append('Apple')
    df2.loc[0, 'Colors'].append('Red')
    df2.loc[0, 'Friends'].append('Rick')
    print(df)
    print(df2)

Gives output:

      Person  Age Flavors Colors Friends
    0   Jack   20      []     []      []
    1    Bob   30      []     []      []
    2  Alice   40      []     []      []
      Person  Age  Flavors Colors Friends
    0   Jack   20  [Apple]  [Red]  [Rick]
    1    Bob   30       []     []      []
    2  Alice   40       []     []      []
      Person  Age  Flavors Colors Friends
    0   Jack   20  [Apple]  [Red]  [Rick]
    1    Bob   30       []     []      []
    2  Alice   40       []     []      []

I would expect the output to be this:

      Person  Age Flavors Colors Friends
    0   Jack   20      []     []      []
    1    Bob   30      []     []      []
    2  Alice   40      []     []      []
      Person  Age  Flavors Colors Friends
    0   Jack   20  [Apple]     []      []
    1    Bob   30       []     []      []
    2  Alice   40       []     []      []
      Person  Age  Flavors Colors Friends
    0   Jack   20  [Apple]  [Red]  [Rick]
    1    Bob   30       []     []      []
    2  Alice   40       []     []      []
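
The sharing can be confirmed directly. The pandas documentation for `DataFrame.copy` notes that with `deep=True` the data is copied, but Python objects stored in it are not copied recursively; only the references are. A minimal sketch of an identity check, using a smaller frame than the one above (the variable names `frames_distinct` and `same_list` are just for illustration):

```python
import pandas as pd

df = pd.DataFrame({'Person': ['Jack', 'Bob'], 'Flavors': [[], []]})
df2 = df.copy(deep=True)

# The two frames are distinct objects...
frames_distinct = df2 is not df
# ...but row 0 of 'Flavors' refers to the very same list object in both.
same_list = df2.loc[0, 'Flavors'] is df.loc[0, 'Flavors']

print(frames_distinct, same_list)
```

Because both cells point at one list, an in-place `append` through either frame is visible through the other.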

Answer 1

Score: 3

It's not the most elegant solution, but one option is to serialize your DataFrame to disk (or to an in-memory buffer) and then load it back, which rebuilds every nested list as a new object:

    import io
    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'Person': ['Jack', 'Bob', 'Alice'], 'Age': [20, 30, 40]})
    df['Flavors'] = np.empty((len(df), 0)).tolist()
    df['Colors'] = [[] for _ in range(len(df))]
    df['Friends'] = [[], [], []]
    print(df)

    # Round-trip through an in-memory pickle so the nested lists are recreated
    with io.BytesIO() as buf:
        df.to_pickle(buf)
        buf.seek(0)  # rewind before reading the pickle back
        df2 = pd.read_pickle(buf)

    df2.loc[0, 'Flavors'].append('Apple')
    df2.loc[0, 'Colors'].append('Red')
    df2.loc[0, 'Friends'].append('Rick')
    print(df)
    print(df2)

Output:

      Person  Age Flavors Colors Friends
    0   Jack   20      []     []      []
    1    Bob   30      []     []      []
    2  Alice   40      []     []      []
      Person  Age Flavors Colors Friends
    0   Jack   20      []     []      []
    1    Bob   30      []     []      []
    2  Alice   40      []     []      []
      Person  Age  Flavors Colors Friends
    0   Jack   20  [Apple]  [Red]  [Rick]
    1    Bob   30       []     []      []
    2  Alice   40       []     []      []
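
If serializing feels heavy, another option might be to copy the frame and then deep copy the elements of each object-dtype column with the standard library's `copy.deepcopy`. This is only a sketch: `deep_copy_frame` is a hypothetical helper name, not a pandas API, and it assumes unique column names and that the mutable values live in object-dtype columns.

```python
import copy

import pandas as pd


def deep_copy_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: copy df and the Python objects stored in it."""
    out = df.copy(deep=True)
    for col in out.columns:
        # Only object-dtype columns can hold lists, dicts, and similar objects.
        if out[col].dtype == object:
            # Replace each element with its own independent deep copy.
            out[col] = df[col].map(copy.deepcopy)
    return out


df = pd.DataFrame({'Person': ['Jack', 'Bob', 'Alice'], 'Age': [20, 30, 40]})
df['Flavors'] = [[] for _ in range(len(df))]

df2 = deep_copy_frame(df)
df2.loc[0, 'Flavors'].append('Apple')

print(df.loc[0, 'Flavors'])   # the original list is untouched
print(df2.loc[0, 'Flavors'])  # only the copy received 'Apple'
```

Note that calling `copy.deepcopy(df)` on the whole frame would likely not help here, since pandas implements it in terms of `df.copy(deep=True)`.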

huangapple
  • Published on February 16, 2023, 04:28:38
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75465146.html