pandas: add column whose value is available in previous row but not in current, of another column
Question
Suppose this is my `df`:
```python
import pandas as pd

df = pd.DataFrame({'accuracy': [0.773, 0.841, 0.862, 0.874, 0.883, 0.913],
                   'code': [('D',), ('D', 'F'), ('B', 'D', 'F'),
                            ('B', 'F', 'K'), ('B', 'F', 'I', 'K'),
                            ('F', 'I', 'K')]})
print(df)
```

```
   accuracy          code
0     0.773          (D,)
1     0.841        (D, F)
2     0.862     (B, D, F)
3     0.874     (B, F, K)
4     0.883  (B, F, I, K)
5     0.913     (F, I, K)
```
I would like to add a column `dropped` whose value is the item of `code` in the previous row that is not present in the current row.
Expected:

```
   accuracy          code  dropped
0     0.773          (D,)        -
1     0.841        (D, F)        -
2     0.862     (B, D, F)        -
3     0.874     (B, F, K)        D
4     0.883  (B, F, I, K)        -
5     0.913     (F, I, K)        B
```
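To pin down what `dropped` should contain, here is a minimal plain-Python sketch of the same rule (no pandas involved): each row's items are compared with the previous row's, and the first row is treated as having an empty predecessor. The helper name `dropped_items` is hypothetical, and returning a sorted list per row (rather than a single item or `-`) is an assumption made so that rows losing several items are still represented.

```python
# Plain-Python restatement of the requested "dropped" column (a sketch):
# for each row, collect the items that were in the previous row's tuple
# but are missing from the current row's tuple.
codes = [('D',), ('D', 'F'), ('B', 'D', 'F'),
         ('B', 'F', 'K'), ('B', 'F', 'I', 'K'),
         ('F', 'I', 'K')]


def dropped_items(codes):
    """Hypothetical helper: one sorted list per row of items dropped since the previous row."""
    result = []
    previous = set()  # the first row has no predecessor, so nothing can be dropped
    for current in codes:
        current_set = set(current)
        result.append(sorted(previous - current_set))
        previous = current_set
    return result


print(dropped_items(codes))
# → [[], [], [], ['D'], [], ['B']]
```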
Answer 1
Score: 4
It's very easy if you use sets and `shift`:
```python
s = df['code'].apply(set)                       # each tuple -> set
df['dropped'] = s.shift(fill_value=set()) - s   # previous row's set minus current row's set
```
Output:

```
   accuracy          code  dropped
0     0.773          (D,)       {}
1     0.841        (D, F)       {}
2     0.862     (B, D, F)       {}
3     0.874     (B, F, K)      {D}
4     0.883  (B, F, I, K)       {}
5     0.913     (F, I, K)      {B}
```
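To see why the one-liner works, it may help to line up the two operands: `shift` moves each set down by one row, so row *i* of the shifted Series holds row *i-1*'s set, and `fill_value=set()` gives row 0 an empty set instead of `NaN`. The sketch below rebuilds `df` from the question and puts the shifted sets, the current sets and their difference side by side in a throwaway frame; the names `prev` and `check` are only for illustration. No output is listed because the display order of multi-element sets is not guaranteed.

```python
import pandas as pd

df = pd.DataFrame({'accuracy': [0.773, 0.841, 0.862, 0.874, 0.883, 0.913],
                   'code': [('D',), ('D', 'F'), ('B', 'D', 'F'),
                            ('B', 'F', 'K'), ('B', 'F', 'I', 'K'),
                            ('F', 'I', 'K')]})

s = df['code'].apply(set)
# shift(1) moves every set down one row, so row i holds row i-1's set;
# fill_value=set() puts an empty set (instead of NaN) in row 0
prev = s.shift(fill_value=set())

# Side-by-side view of the two operands and their element-wise set difference
check = pd.DataFrame({'previous': prev, 'current': s, 'dropped': prev - s})
print(check)
```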
If you insist on the format (and have at most one dropped item per row):
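```python
s = df['code'].apply(set)
# set difference -> list -> its first element (NaN if the list is empty) -> '-' for NaN
df['dropped'] = (s.shift(fill_value=set()).sub(s)
                  .apply(list).str[0].fillna('-')
                 )
```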
Output:

```
   accuracy          code  dropped
0     0.773          (D,)        -
1     0.841        (D, F)        -
2     0.862     (B, D, F)        -
3     0.874     (B, F, K)        D
4     0.883  (B, F, I, K)        -
5     0.913     (F, I, K)        B
```
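The second snippet keeps only one element of each difference (`.str[0]`), so a row that loses several items would show just one of them. If that can happen in your data, one possible variant (a sketch, assuming the items are strings so they can be sorted into a stable order) joins all dropped items and still falls back to `-` for empty sets. `format_dropped` is a hypothetical helper, not a pandas function, and the two-row `multi` input is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'accuracy': [0.773, 0.841, 0.862, 0.874, 0.883, 0.913],
                   'code': [('D',), ('D', 'F'), ('B', 'D', 'F'),
                            ('B', 'F', 'K'), ('B', 'F', 'I', 'K'),
                            ('F', 'I', 'K')]})


def format_dropped(items):
    """Hypothetical helper: sorted, comma-joined items, or '-' for an empty set."""
    return ', '.join(sorted(items)) or '-'


s = df['code'].apply(set)
# Previous row's set minus current row's set; row 0 is compared with an empty set
df['dropped'] = (s.shift(fill_value=set()) - s).apply(format_dropped)
print(df)

# Made-up two-row input in which two items disappear at once
multi = pd.Series([('A', 'B', 'C'), ('C',)]).apply(set)
print((multi.shift(fill_value=set()) - multi).apply(format_dropped).tolist())
# → ['-', 'A, B']
```

Sorting before joining is a design choice: Python sets have no guaranteed iteration order, so without it a row that loses two items could render them in either order.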