Conditionally append a value to a list of lists in pandas

Question

I'm trying to conditionally append a value to a list of lists stored in a pandas column:

import pandas as pd
df = pd.DataFrame(data={'A': [1, 2, 3]})
df['B'] = [[[1],[1],[1]]] * df.shape[0] 
df

       A                B
    0  1  [[1], [1], [1]]
    1  2  [[1], [1], [1]]
    2  3  [[1], [1], [1]]

# attempting to append 1st list of lists in B column with 2
df['B'] = df['B'].mask(df.A == 2, df['B'].apply(lambda x: x[0].append(2)))
df
       A                         B
    0  1  [[1, 2, 2, 2], [1], [1]]
    1  2                      None
    2  3  [[1, 2, 2, 2], [1], [1]]

#expected result I'm hoping for is: 
df['B'] = [[[1],[1],[1]],[[1,2],[1],[1]],[[1],[1],[1]]]
df

       A                   B
    0  1     [[1], [1], [1]]
    1  2  [[1, 2], [1], [1]]
    2  3     [[1], [1], [1]]

Answer 1

Score: 2

Try using a lambda function to conditionally modify the list of lists, like this:

import pandas as pd

df = pd.DataFrame(data={'A': [1, 2, 3]})
df['B'] = [[[1],[1],[1]]] * df.shape[0] 

df['B'] = df.apply(lambda row: row['B'] if row['A'] != 2 else [row['B'][0] + [2]] + row['B'][1:], axis=1)

print(df)

Here the lambda builds a new list of lists that contains the modified first list and returns it; rows where A is not 2 are returned unchanged. The apply method is called on the entire DataFrame with axis=1 so that the lambda runs row-wise.
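The same idea can be sketched in plain Python, without pandas. The helper name with_appended and the variables a_values, shared and rows are made up for illustration; they do not come from the answer.

```python
def with_appended(cell, value):
    """Return a new outer list whose first inner list has `value` appended.

    `cell[0] + [value]` creates a new inner list, so the original is untouched.
    The remaining inner lists are reused as-is (shallow copy of the tail).
    """
    return [cell[0] + [value], *cell[1:]]


# Same setup as in the question: every row refers to one shared object.
shared = [[1], [1], [1]]
rows = [shared] * 3
a_values = [1, 2, 3]

# Only the row whose A value equals 2 gets a modified copy.
new_rows = [
    with_appended(cell, 2) if a == 2 else cell
    for a, cell in zip(a_values, rows)
]

print(new_rows)  # [[[1], [1], [1]], [[1, 2], [1], [1]], [[1], [1], [1]]]
print(shared)    # [[1], [1], [1]]
```

Because nothing is mutated in place, this should work even though all rows initially share a single list object.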

Answer 2

Score: 1

list.append works in place, so it actually returns None instead of a list. This is why your new df has None on the second row.
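To see this in isolation, here is a minimal plain-Python sketch that mirrors the question's setup without pandas. The names shared, rows and results are made up for illustration.

```python
# One list object referenced three times, like [[[1],[1],[1]]] * 3.
shared = [[1], [1], [1]]
rows = [shared] * 3

# Mimics df['B'].apply(lambda x: x[0].append(2)): append runs once per row
# and returns None every time.
results = [row[0].append(2) for row in rows]

print(results)    # [None, None, None]
print(shared[0])  # [1, 2, 2, 2]
```

In the question, apply ran the lambda on every row, not only where A == 2, and every row holds the same list. That would explain the [1, 2, 2, 2] in rows 0 and 2, while mask placed the None return value in row 1.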

Below is one way to append 2 to the list. For the row where A equals 2, we take the first inner list and add [2], then unpack the remaining inner lists to form the expected output:

df['B'].mask(df['A'].eq(2), lambda s: s.map(lambda x: [x[0] + [2], *x[1:]]))

Output:

0       [[1], [1], [1]]
1    [[1, 2], [1], [1]]
2       [[1], [1], [1]]

Answer 3

Score: 0

The problem is the way you generate your data frame:

import pandas as pd
df = pd.DataFrame(data={'A': [1, 2, 3]})
df['B'] = [[[1],[1],[1]]] * df.shape[0] 
df

The expression [[[1],[1],[1]]] * df.shape[0] is tricky in Python: it repeats a reference instead of copying,
so the [[1],[1],[1]] in every row of your df points to the
same object, a single list [[1], [1], [1]].

Try this:

df.at[1, 'B'][0] = [1, 2] 

.at[] lets you access a cell by index label and column name at the same time; the [0] then selects the first element of the list stored in that cell.
So no lambda is needed here.

One might think that only the list in the second row of column 'B' is changed,
from [[1], [1], [1]] to [[1, 2], [1], [1]].
But if you look at the entire data frame:

df

it returns:

   A                   B
0  1  [[1, 2], [1], [1]]
1  2  [[1, 2], [1], [1]]
2  3  [[1, 2], [1], [1]]

Because the elements in B all point to the identical object, they all get mutated at once.

That is why you should avoid [ ] * <number> when generating such constructs.

Instead, use e.g. a list comprehension:

import pandas as pd
df = pd.DataFrame(data={'A': [1, 2, 3]})
df['B'] = [[[1],[1],[1]] for _ in range(df.shape[0])] 
df

# and mutate by:
df.at[1, 'B'][0] = [1, 2]

df

   A                   B
0  1     [[1], [1], [1]]
1  2  [[1, 2], [1], [1]]
2  3     [[1], [1], [1]]
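The difference between the two ways of building the column can be checked with the is operator. This is a small plain-Python sketch; the names template, aliased and independent are chosen here for illustration.

```python
template = [[1], [1], [1]]

# Multiplication repeats the reference: every row is the same object.
aliased = [template] * 3
print(aliased[0] is aliased[1])          # True

# A comprehension evaluates the literal once per row: separate objects.
independent = [[[1], [1], [1]] for _ in range(3)]
print(independent[0] is independent[1])  # False

# Mutating one row now leaves the others alone.
independent[1][0] = [1, 2]
print(independent)
# [[[1], [1], [1]], [[1, 2], [1], [1]], [[1], [1], [1]]]
```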

Avoid append in your lambda

How about:

import copy
import pandas as pd

df = pd.DataFrame(data={'A': [1, 2, 3]})
df['B'] = [[[1],[1],[1]] for _ in range(df.shape[0])] 
df

def myfun(x):
    # work on a deep copy so the original cell value is never mutated
    l = copy.deepcopy(x)
    l[0] = l[0] + [2]
    return l
    
df.at[1, 'B'] = myfun(df.at[1, 'B']) 

df

   A                   B
0  1     [[1], [1], [1]]
1  2  [[1, 2], [1], [1]]
2  3     [[1], [1], [1]]

One could use np.where() to get the row index. Note that np.where() returns positions, not labels, so this relies on the default integer index:

import numpy as np
df.at[np.where(df['A'] == 2)[0][0], 'B'] = myfun(df.at[np.where(df['A'] == 2)[0][0], 'B'])
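If the index is not the default 0, 1, 2, ... or more than one row matches, a label-based loop may be safer. The following is only a sketch under those assumptions: the string index labels and the extra fourth row are invented for illustration, and myfun is repeated from above so the block runs on its own.

```python
import copy

import pandas as pd

# Hypothetical frame with a non-default index and two rows where A == 2.
df = pd.DataFrame({'A': [1, 2, 3, 2]}, index=['w', 'x', 'y', 'z'])
df['B'] = [[[1], [1], [1]] for _ in range(len(df))]


def myfun(x):
    # Same helper as above: modify a deep copy, not the stored object.
    l = copy.deepcopy(x)
    l[0] = l[0] + [2]
    return l


# Boolean indexing on df.index yields the labels of all matching rows.
for label in df.index[df['A'] == 2]:
    df.at[label, 'B'] = myfun(df.at[label, 'B'])

print(df)
```

Selecting labels through df.index[...] avoids mixing positions with labels, which is what the np.where() version does implicitly.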

Answer 4

Score: 0

import pandas as pd
import copy

df = pd.DataFrame(data={'A': [1, 2, 3]})
df['B'] = [[[1],[1],[1]]] * df.shape[0] 

i, j = 1, 1  # row and column position of the cell to change
l = copy.deepcopy(df.iat[i, j])  # deep copy so the list shared by all rows is not mutated
l[0] = [1, 2]
df.iat[i, j] = l

print(df)
   A                   B
0  1     [[1], [1], [1]]
1  2  [[1, 2], [1], [1]]
2  3     [[1], [1], [1]]

huangapple
  • Published on April 20, 2023, 06:00:51
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76059118.html