pandas: add a column holding the item that is present in the previous row of another column but not in the current row

Question

Suppose this is my `df`:
```python
import pandas as pd

df = pd.DataFrame(
    {'accuracy': [0.773, 0.841, 0.862, 0.874, 0.883, 0.913],
     'code': [('D',), ('D', 'F'), ('B', 'D', 'F'),
              ('B', 'F', 'K'), ('B', 'F', 'I', 'K'),
              ('F', 'I', 'K')]})

df
#    accuracy          code
# 0     0.773          (D,)
# 1     0.841        (D, F)
# 2     0.862     (B, D, F)
# 3     0.874     (B, F, K)
# 4     0.883  (B, F, I, K)
# 5     0.913     (F, I, K)
```

I would like to add a column `dropped` whose value is the item that appears in `code` in the previous row but not in `code` in the current row.

Expected:

       accuracy          code dropped
    0     0.773          (D,)       -
    1     0.841        (D, F)       -
    2     0.862     (B, D, F)       -
    3     0.874     (B, F, K)       D
    4     0.883  (B, F, I, K)       -
    5     0.913     (F, I, K)       B

Answer 1

Score: 4

This is straightforward if you use sets and `shift`:

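    # Convert each tuple to a set so that rows can be compared with set difference
    s = df['code'].apply(set)

    # Items in the previous row's set that are missing from the current row's set
    df['dropped'] = s.shift(fill_value=set()) - s

Output:

       accuracy          code dropped
    0     0.773          (D,)      {}
    1     0.841        (D, F)      {}
    2     0.862     (B, D, F)      {}
    3     0.874     (B, F, K)     {D}
    4     0.883  (B, F, I, K)      {}
    5     0.913     (F, I, K)     {B}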
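
To make the logic of the one-liner easier to follow, here is a minimal plain-Python sketch of the same computation. It is not part of the original answer: the helper name `dropped_sets` and the variable `codes` are hypothetical, and the list simply repeats the `code` values from the question.

```python
def dropped_sets(codes):
    """For each row, return the items in the previous row that the current row lacks."""
    previous = set()  # the first row has no predecessor, so nothing can be dropped
    result = []
    for current in map(set, codes):
        # Set difference: elements of `previous` that are not in `current`
        result.append(previous - current)
        previous = current
    return result


# Same values as the `code` column in the question
codes = [("D",), ("D", "F"), ("B", "D", "F"),
         ("B", "F", "K"), ("B", "F", "I", "K"), ("F", "I", "K")]

dropped = dropped_sets(codes)
print(dropped)
```

Starting from an empty `previous` set plays the same role as `fill_value=set()` in the `shift` call: the first row is compared against nothing, so its result is empty. The returned list should also be assignable to a column, for example `df['dropped'] = dropped_sets(df['code'])`, although the vectorised version above is the more idiomatic pandas form.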

If you insist on the expected format (and have at most one dropped item per row):

    s = df['code'].apply(set)

    # Take the single dropped item from each set; empty sets yield NaN, replaced by '-'
    df['dropped'] = (s.shift(fill_value=set()).sub(s)
                      .apply(list).str[0].fillna('-')
                    )

Output:

       accuracy          code dropped
    0     0.773          (D,)       -
    1     0.841        (D, F)       -
    2     0.862     (B, D, F)       -
    3     0.874     (B, F, K)       D
    4     0.883  (B, F, I, K)       -
    5     0.913     (F, I, K)       B
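
The formatted version keeps only the first element of each set, so it relies on the "at most one dropped item per row" condition. If several items could disappear at once, one possible variation is to join them into a single string. The sketch below is an assumption, not part of the original answer: the helper `format_dropped`, its `placeholder` and `sep` parameters, and the extra final row `('K',)` are all made up for illustration.

```python
def format_dropped(codes, placeholder="-", sep=", "):
    """Label each row with the items dropped since the previous row, or a placeholder."""
    previous = set()  # nothing precedes the first row
    labels = []
    for current in map(set, codes):
        gone = sorted(previous - current)  # sorted so the label order is deterministic
        labels.append(sep.join(gone) if gone else placeholder)
        previous = current
    return labels


# The question's data plus one hypothetical row that drops two items at once
codes = [("D",), ("D", "F"), ("B", "D", "F"),
         ("B", "F", "K"), ("B", "F", "I", "K"), ("F", "I", "K"),
         ("K",)]

labels = format_dropped(codes)
print(labels)  # ['-', '-', '-', 'D', '-', 'B', 'F, I']
```

Sorting before joining matters because sets have no guaranteed iteration order, which is also why taking element `[0]` of a list built from a set is only safe when that set has a single member.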

huangapple
  • Published on June 22, 2023, 01:40:29
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76525891.html