Conditionally append a value to a list of lists in pandas

Question

I'm trying to conditionally append a value to one of the inner lists in a column of lists of lists in pandas:

    import pandas as pd

    df = pd.DataFrame(data={'A': [1, 2, 3]})
    df['B'] = [[[1],[1],[1]]] * df.shape[0]
    df

       A                B
    0  1  [[1], [1], [1]]
    1  2  [[1], [1], [1]]
    2  3  [[1], [1], [1]]

    # attempt to append 2 to the first inner list in column B where A == 2
    df['B'] = df['B'].mask(df.A == 2, df['B'].apply(lambda x: x[0].append(2)))
    df

       A                         B
    0  1  [[1, 2, 2, 2], [1], [1]]
    1  2                      None
    2  3  [[1, 2, 2, 2], [1], [1]]

    # the result I'm hoping for is:
    df['B'] = [[[1],[1],[1]],[[1,2],[1],[1]],[[1],[1],[1]]]
    df

       A                   B
    0  1     [[1], [1], [1]]
    1  2  [[1, 2], [1], [1]]
    2  3     [[1], [1], [1]]

Answer 1

Score: 2

Try using a lambda function to conditionally build a modified list of lists, like this:

    import pandas as pd

    df = pd.DataFrame(data={'A': [1, 2, 3]})
    df['B'] = [[[1],[1],[1]]] * df.shape[0]
    df['B'] = df.apply(lambda row: row['B'] if row['A'] != 2 else [row['B'][0] + [2]] + row['B'][1:], axis=1)
    print(df)

Here we build a new list of lists that contains the modified first inner list, and return that. Because row['B'][0] + [2] creates a new list instead of mutating the existing one, the other rows are not affected. The apply method is called on the entire DataFrame with axis=1, so the lambda function runs row by row.
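The same idea can be written with a named helper, which may be easier to read than a long lambda. This is only a sketch: `append_to_first` and the `shared` variable are names made up for this illustration, not part of the answer above.

```python
import pandas as pd

def append_to_first(nested, value):
    """Return a new list of lists with `value` added to the first inner list."""
    # `+` builds new lists, so the object passed in is left untouched.
    return [nested[0] + [value]] + nested[1:]

shared = [[1], [1], [1]]
df = pd.DataFrame({'A': [1, 2, 3]})
df['B'] = [shared] * len(df)  # every row references the same list object

df['B'] = df.apply(
    lambda row: append_to_first(row['B'], 2) if row['A'] == 2 else row['B'],
    axis=1,
)

print(df['B'].tolist())
print(shared)  # the shared original has not been mutated
```

Since nothing is modified in place, this works even though the rows initially share one list object.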

Answer 2

Score: 1

list.append works in place, so it actually returns None instead of a list. This is why your new df has None on the second row.
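A minimal plain-Python sketch of that behaviour (the variable names are made up for illustration):

```python
inner = [1]

returned = inner.append(2)  # mutates `inner` in place
print(returned)             # None
print(inner)                # [1, 2]

new_inner = inner + [3]     # concatenation builds a new list instead
print(new_inner)            # [1, 2, 3]
print(inner)                # [1, 2]
```

This is why a lambda that returns `x[0].append(2)` yields None for every row, while one that returns `x[0] + [2]` yields a list.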

Below is one way to add 2 to the list. For the row where A equals 2, we take the first inner list, concatenate [2] onto it, and unpack the remaining inner lists to form the expected output:

    df['B'].mask(df['A'].eq(2), lambda s: s.map(lambda x: [x[0] + [2], *x[1:]]))

Output:

    0       [[1], [1], [1]]
    1    [[1, 2], [1], [1]]
    2       [[1], [1], [1]]

Answer 3

Score: 0

The problem is the way you generate your data frame:

    import pandas as pd

    df = pd.DataFrame(data={'A': [1, 2, 3]})
    df['B'] = [[[1],[1],[1]]] * df.shape[0]
    df

The expression [[[1],[1],[1]]] * df.shape[0] is tricky in Python: it repeats a reference rather than copying the list, so the [[1],[1],[1]] in every row of your df points to the same object, a single list [[1], [1], [1]].
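The sharing can be checked without pandas by comparing object identity with `is`. The sketch below uses made-up names (`rows_shared`, `rows_fresh`) and contrasts `*` with building each row separately:

```python
# `*` repeats a reference to one list object
rows_shared = [[[1], [1], [1]]] * 3
# a comprehension evaluates the literal once per row, giving separate objects
rows_fresh = [[[1], [1], [1]] for _ in range(3)]

shared_same = rows_shared[0] is rows_shared[1]
fresh_same = rows_fresh[0] is rows_fresh[1]
print(shared_same)  # True
print(fresh_same)   # False

# mutating "row 1" affects every row only in the shared case
rows_shared[1][0].append(2)
rows_fresh[1][0].append(2)
print(rows_shared)  # [[[1, 2], [1], [1]], [[1, 2], [1], [1]], [[1, 2], [1], [1]]]
print(rows_fresh)   # [[[1], [1], [1]], [[1, 2], [1], [1]], [[1], [1], [1]]]
```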

Try this:

    df.at[1, 'B'][0] = [1, 2]

.at[] lets you access a cell by index label and column name at the same time, and the [0] selects the first element of the list stored in that cell. So no lambda is needed here.

You might expect that only the list in the second row's 'B' column changes, from [[1], [1], [1]] to [[1, 2], [1], [1]]. But if you look at the entire data frame:

    df

it returns:

       A                   B
    0  1  [[1, 2], [1], [1]]
    1  2  [[1, 2], [1], [1]]
    2  3  [[1, 2], [1], [1]]

Because the elements in B all point to the identical object, they all get mutated at once.

That is why you should avoid [ ] * <number> when generating such constructs.

Instead, use e.g. a list comprehension:

    import pandas as pd

    df = pd.DataFrame(data={'A': [1, 2, 3]})
    df['B'] = [[[1],[1],[1]] for _ in range(df.shape[0])]
    df

    # and mutate by:
    df.at[1, 'B'][0] = [1, 2]
    df

       A                   B
    0  1     [[1], [1], [1]]
    1  2  [[1, 2], [1], [1]]
    2  3     [[1], [1], [1]]

Avoid append in your lambda

How about:

    import copy
    import pandas as pd

    df = pd.DataFrame(data={'A': [1, 2, 3]})
    df['B'] = [[[1],[1],[1]] for _ in range(df.shape[0])]
    df

    def myfun(x):
        l = copy.deepcopy(x)   # work on a copy, leave x untouched
        l[0] = l[0] + [2]
        return l

    df.at[1, 'B'] = myfun(df.at[1, 'B'])
    df

       A                   B
    0  1     [[1], [1], [1]]
    1  2  [[1, 2], [1], [1]]
    2  3     [[1], [1], [1]]

One could use np.where() to get the row index. np.where returns positions, so this assumes the default integer index, where position and label coincide:

    import numpy as np
    df.at[np.where(df['A'] == 2)[0][0], 'B'] = myfun(df.at[np.where(df['A'] == 2)[0][0], 'B'])
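If the index is not the default one, or more than one row matches, a boolean mask on `df.index` gives the matching labels directly. The sketch below is an alternative, not the answer's own method; the string index and the fourth row are made up for illustration, and it appends in place, which is only safe because each row was built as its own list.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 2]}, index=['w', 'x', 'y', 'z'])
# one independent list of lists per row
df['B'] = [[[1], [1], [1]] for _ in range(len(df))]

# labels of every row where A == 2
matching_labels = df.index[df['A'] == 2]

for label in matching_labels:
    # .at returns the list stored in the cell; append mutates only that row's list
    df.at[label, 'B'][0].append(2)

print(df['B'].tolist())
```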

Answer 4

Score: 0

    import pandas as pd
    import copy

    df = pd.DataFrame(data={'A': [1, 2, 3]})
    df['B'] = [[[1],[1],[1]]] * df.shape[0]
    i, j = 1, 1  # row and column positions of the cell to change
    l = copy.deepcopy(df.iat[i, j])  # deep copy to avoid the shared-reference problem
    l[0] = [1, 2]
    df.iat[i, j] = l
    print(df)

       A                   B
    0  1     [[1], [1], [1]]
    1  2  [[1, 2], [1], [1]]
    2  3     [[1], [1], [1]]
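The reason for a deep copy rather than a shallow one can be seen in plain Python. This is an illustrative sketch with made-up names: a shallow copy duplicates only the outer list, so its inner lists are still shared with the original.

```python
import copy

original = [[1], [1], [1]]
shallow = copy.copy(original)   # new outer list, same inner lists
deep = copy.deepcopy(original)  # new outer list and new inner lists

shallow[0].append(2)            # also changes original[0]
deep[0].append(3)               # changes only the deep copy

print(original)  # [[1, 2], [1], [1]]
print(shallow)   # [[1, 2], [1], [1]]
print(deep)      # [[1, 3], [1], [1]]
```

In the code above, `l[0] = [1, 2]` replaces the first slot rather than mutating the inner list, so a shallow copy would also have left the other rows alone there; a deep copy is the safer default when inner lists might later be mutated in place.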

huangapple
  • Posted on April 20, 2023, 06:00:51
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76059118.html