English:
Pandas VLOOKUP values that do not match
Question
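For example, I have two tables:

`w_results.xlsx`:

[![enter image description here][1]][1]

```
           Item    Type  Score
0  mashpotatoes    food     30
1        daniel  person     99
2         paper   thing     24
3          john  person    195
4          lamp   thing      5
5       jessica  person     92
6         pizza    food     42
7          meat    food      2
8         jacob  person     83
```

`w_runnable.xlsx`:

[![enter image description here][2]][2]

```
           Item    Type  Score
0  mashpotatoes    food     30
1          john  person    195
2     raspberry    food     50
3        iphone   thing     75
4          boat   thing      5
5          jake  person     25
6         pizza    food     42
7          meat    food      2
8        daniel  person     99
```

How can I get a table of all values that appear in the `Item` column of `w_results` but not in the `Item` column of `w_runnable`?

I know that `pd.merge` is probably the way to go, but it only outputs the matches. My attempt is below:

```python
import pandas as pd

w_results = 'w_results.xlsx'
w_runnable = 'w_runnable.xlsx'

df_results_mylist = pd.read_excel(w_results, sheet_name='my_list')
df_runnable_mylist = pd.read_excel(w_runnable, sheet_name='my_list')

# Left join: keep every row of w_results and attach matches from w_runnable
left_join = pd.merge(df_results_mylist,
                     df_runnable_mylist,
                     on='Item',
                     how='left')
print(left_join)
```

Output:

```
           Item  Type_x  Score  Type_y  Score
0  mashpotatoes    food     30    food   30.0
1        daniel  person     99  person   99.0
2         paper   thing     24     NaN    NaN
3          john  person    195  person  195.0
4          lamp   thing      5     NaN    NaN
5       jessica  person     92     NaN    NaN
6         pizza    food     42    food   42.0
7          meat    food      2    food    2.0
8         jacob  person     83     NaN    NaN
```

The values that do not match appear as `NaN` (paper, lamp, jessica, jacob).

I don't know whether there is a cleaner way to do this so that these values appear in a separate table. Any help would be appreciated.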
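
To reproduce the behaviour without the Excel files, here is a minimal sketch that builds the two tables inline. The literal DataFrames stand in for the `pd.read_excel` calls (an assumption made only so the snippet is self-contained); the values are copied from the tables above.

```python
import pandas as pd

# Inline stand-ins for the two Excel sheets (assumption: same columns as the tables above)
df_results_mylist = pd.DataFrame({
    'Item': ['mashpotatoes', 'daniel', 'paper', 'john', 'lamp',
             'jessica', 'pizza', 'meat', 'jacob'],
    'Type': ['food', 'person', 'thing', 'person', 'thing',
             'person', 'food', 'food', 'person'],
    'Score': [30, 99, 24, 195, 5, 92, 42, 2, 83],
})
df_runnable_mylist = pd.DataFrame({
    'Item': ['mashpotatoes', 'john', 'raspberry', 'iphone', 'boat',
             'jake', 'pizza', 'meat', 'daniel'],
    'Type': ['food', 'person', 'food', 'thing', 'thing',
             'person', 'food', 'food', 'person'],
    'Score': [30, 195, 50, 75, 5, 25, 42, 2, 99],
})

# A left join keeps every row of the left table; rows without a partner
# get NaN in the columns that come from the right table.
left_join = pd.merge(df_results_mylist, df_runnable_mylist, on='Item', how='left')

# Items whose right-hand columns are NaN had no match in w_runnable.
# With these inline frames pandas names the overlapping columns
# Type_x/Type_y and Score_x/Score_y.
unmatched_items = left_join.loc[left_join['Type_y'].isna(), 'Item'].tolist()
print(unmatched_items)
```

Filtering on the `NaN` rows like this already isolates the unmatched items, although it leaves the empty right-hand columns in place if the whole rows are selected.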
Desired output:

[![enter image description here][3]][3]
<details>
<summary>English:</summary>

For example, I have two tables:

`w_results.xlsx`:

[![enter image description here][1]][1]

```
           Item    Type  Score
0  mashpotatoes    food     30
1        daniel  person     99
2         paper   thing     24
3          john  person    195
4          lamp   thing      5
5       jessica  person     92
6         pizza    food     42
7          meat    food      2
8         jacob  person     83
```

`w_runnable.xlsx`:

[![enter image description here][2]][2]

```
           Item    Type  Score
0  mashpotatoes    food     30
1          john  person    195
2     raspberry    food     50
3        iphone   thing     75
4          boat   thing      5
5          jake  person     25
6         pizza    food     42
7          meat    food      2
8        daniel  person     99
```

How can I get a table of all values that appear in the w_results 'Item' column but not in the w_runnable 'Item' column?

I know that using `pd.merge` is probably the way to go, but it just outputs the matches. My attempt below:

```python
import pandas as pd

w_results = 'w_results.xlsx'
w_runnable = 'w_runnable.xlsx'

df_results_mylist = pd.read_excel(w_results, sheet_name='my_list')
df_runnable_mylist = pd.read_excel(w_runnable, sheet_name='my_list')

left_join = pd.merge(df_results_mylist,
                     df_runnable_mylist,
                     on='Item',
                     how='left')
print(left_join)
```

Output:

```
           Item  Type_x  Score  Type_y  Score
0  mashpotatoes    food     30    food   30.0
1        daniel  person     99  person   99.0
2         paper   thing     24     NaN    NaN
3          john  person    195  person  195.0
4          lamp   thing      5     NaN    NaN
5       jessica  person     92     NaN    NaN
6         pizza    food     42    food   42.0
7          meat    food      2    food    2.0
8         jacob  person     83     NaN    NaN
```

The values that do not match appear as `NaN` (paper, lamp, jessica, jacob).

I don't know if there's a cleaner way to do this, so that these values appear in a separate table. Any help would be appreciated.

Desired output:
[![enter image description here][3]][3]
[1]: https://i.stack.imgur.com/xialb.png
[2]: https://i.stack.imgur.com/HETwK.png
[3]: https://i.stack.imgur.com/ptMJA.png
</details>
# Answer 1

**Score**: 1
**Code**

Convert the `Item` and `Type` columns into a Series in which each element is a list:

```python
cols = ['Item', 'Type']
df_results_mylist[cols].agg(list, axis=1)
```

Result:

```
0    [mashpotatoes, food]
1        [daniel, person]
2          [paper, thing]
3          [john, person]
4           [lamp, thing]
5       [jessica, person]
6           [pizza, food]
7            [meat, food]
8         [jacob, person]
dtype: object
```

Build a condition with `isin`, applying the same conversion to both tables:

```python
cond = df_results_mylist[cols].agg(list, axis=1).isin(df_runnable_mylist[cols].agg(list, axis=1))
```

`cond`:

```
0     True
1     True
2    False
3     True
4    False
5    False
6     True
7     True
8    False
dtype: bool
```
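
The list-based comparison above checks `Item` and `Type` together. As a possible alternative for the same multi-column check, here is a sketch that uses `pd.MultiIndex.from_frame` and `MultiIndex.isin`, which compare rows as tuples (tuples are hashable, lists are not). This is not part of the original answer, and the inline DataFrames are stand-ins for the Excel sheets, assumed only to keep the snippet self-contained.

```python
import pandas as pd

# Inline stand-ins for the two sheets (assumption; only the compared columns are needed)
df_results_mylist = pd.DataFrame({
    'Item': ['mashpotatoes', 'daniel', 'paper', 'john', 'lamp',
             'jessica', 'pizza', 'meat', 'jacob'],
    'Type': ['food', 'person', 'thing', 'person', 'thing',
             'person', 'food', 'food', 'person'],
})
df_runnable_mylist = pd.DataFrame({
    'Item': ['mashpotatoes', 'john', 'raspberry', 'iphone', 'boat',
             'jake', 'pizza', 'meat', 'daniel'],
    'Type': ['food', 'person', 'food', 'thing', 'thing',
             'person', 'food', 'food', 'person'],
})

cols = ['Item', 'Type']

# Treat each (Item, Type) row as a single tuple-like key
results_keys = pd.MultiIndex.from_frame(df_results_mylist[cols])
runnable_keys = pd.MultiIndex.from_frame(df_runnable_mylist[cols])

# MultiIndex.isin returns a NumPy boolean array with one entry per row
cond = results_keys.isin(runnable_keys)
print(cond.tolist())

# The array can be used for boolean indexing just like the Series version
print(df_results_mylist[~cond])
```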
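
If only the `Item` column needs to be compared, create the condition with the code below instead of the code above:

```python
cond = df_results_mylist['Item'].isin(df_runnable_mylist['Item'].unique())
```

However `cond` was built, negate it with `~cond` and use it for boolean indexing:

```python
df_results_mylist[~cond]
```

Output:

```
      Item    Type  Score
2    paper   thing     24
4     lamp   thing      5
5  jessica  person     92
8    jacob  person     83
```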
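
Since the question started from `pd.merge`, the same rows can also be obtained with an anti-join built on the `indicator=True` option of `merge`. This is a sketch of an alternative to the `isin` approach above, not part of the original answer; the inline DataFrames again stand in for the Excel sheets (an assumption for self-containment).

```python
import pandas as pd

# Inline stand-ins for the two sheets (assumption; values copied from the question)
df_results_mylist = pd.DataFrame({
    'Item': ['mashpotatoes', 'daniel', 'paper', 'john', 'lamp',
             'jessica', 'pizza', 'meat', 'jacob'],
    'Type': ['food', 'person', 'thing', 'person', 'thing',
             'person', 'food', 'food', 'person'],
    'Score': [30, 99, 24, 195, 5, 92, 42, 2, 83],
})
df_runnable_mylist = pd.DataFrame({
    'Item': ['mashpotatoes', 'john', 'raspberry', 'iphone', 'boat',
             'jake', 'pizza', 'meat', 'daniel'],
    'Type': ['food', 'person', 'food', 'thing', 'thing',
             'person', 'food', 'food', 'person'],
    'Score': [30, 195, 50, 75, 5, 25, 42, 2, 99],
})

# Only the key column is needed from the right table; dropping duplicates
# ensures a repeated key cannot multiply rows of the left table.
right_keys = df_runnable_mylist[['Item']].drop_duplicates()

# indicator=True adds a '_merge' column holding 'left_only', 'right_only' or 'both'
merged = df_results_mylist.merge(right_keys, on='Item', how='left', indicator=True)

# Keep the rows found only in w_results, then drop the helper column
only_in_results = merged[merged['_merge'] == 'left_only'].drop(columns='_merge')
print(only_in_results)
```

Merging against the key column alone avoids the `_x`/`_y` suffixed columns seen in the question's attempt.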

English:

**Code**

Make the `Item` and `Type` columns into lists in a Series:

```python
cols = ['Item', 'Type']
df_results_mylist[cols].agg(list, axis=1)
```

Result:

```
0    [mashpotatoes, food]
1        [daniel, person]
2          [paper, thing]
3          [john, person]
4           [lamp, thing]
5       [jessica, person]
6           [pizza, food]
7            [meat, food]
8         [jacob, person]
dtype: object
```

Create the condition using the `isin` function in a similar way:

```python
cond = df_results_mylist[cols].agg(list, axis=1).isin(df_runnable_mylist[cols].agg(list, axis=1))
```

`cond`:

```
0     True
1     True
2    False
3     True
4    False
5    False
6     True
7     True
8    False
dtype: bool
```

If it is necessary to check equality only in the `Item` column, create the condition with the code below instead of the code above.

```python
cond = df_results_mylist['Item'].isin(df_runnable_mylist['Item'].unique())
```

No matter how `cond` was made, let's do boolean indexing with `~cond`:

```python
df_results_mylist[~cond]
```

Output:

```
      Item    Type  Score
2    paper   thing     24
4     lamp   thing      5
5  jessica  person     92
8    jacob  person     83
```