Pandas VLOOKUP values that do not match
# Question

For example, I have two tables:

`w_results.xlsx`:

[![enter image description here][1]][1]

               Item    Type  Score
    0  mashpotatoes    food     30
    1        daniel  person     99
    2         paper   thing     24
    3          john  person    195
    4          lamp   thing      5
    5       jessica  person     92
    6         pizza    food     42
    7          meat    food      2
    8         jacob  person     83

`w_runnable.xlsx`:

[![enter image description here][2]][2]

               Item    Type  Score    
    0  mashpotatoes    food         30
    1          john  person        195
    2     raspberry    food         50
    3        iphone   thing         75
    4          boat   thing          5
    5          jake  person         25
    6         pizza    food         42
    7          meat    food          2
    8        daniel  person         99

How can I get a table of all values that appear in the `Item` column of `w_results` but not in the `Item` column of `w_runnable`?

I know that `pd.merge` is probably the way to go, but my attempt does not separate out the rows that fail to match:

```python
import pandas as pd

w_results = 'w_results.xlsx'
w_runnable = 'w_runnable.xlsx'

df_results_mylist = pd.read_excel(w_results, sheet_name='my_list')
df_runnable_mylist = pd.read_excel(w_runnable, sheet_name='my_list')

# Left join on Item: every row of w_results is kept, matched or not
left_join = pd.merge(df_results_mylist,
                     df_runnable_mylist,
                     on='Item',
                     how='left')
print(left_join)
```

Output:

               Item  Type_x  Score  Type_y  Score    
    0  mashpotatoes    food     30    food       30.0
    1        daniel  person     99  person       99.0
    2         paper   thing     24     NaN        NaN
    3          john  person    195  person      195.0
    4          lamp   thing      5     NaN        NaN
    5       jessica  person     92     NaN        NaN
    6         pizza    food     42    food       42.0
    7          meat    food      2    food        2.0
    8         jacob  person     83     NaN        NaN

The values that do not match appear as `NaN` (paper, lamp, jessica, jacob).

Is there a cleaner way to do this, so that these values appear in a separate table? Any help would be appreciated.

Desired output:

[![enter image description here][3]][3]

  [1]: https://i.stack.imgur.com/xialb.png
  [2]: https://i.stack.imgur.com/HETwK.png
  [3]: https://i.stack.imgur.com/ptMJA.png
# Answer 1
**Score**: 1
**Code**

Combine the `Item` and `Type` values of each row into a list, which gives a Series of lists:

```python
cols = ['Item', 'Type']
df_results_mylist[cols].agg(list, axis=1)
```

Result:

    0    [mashpotatoes, food]
    1        [daniel, person]
    2          [paper, thing]
    3          [john, person]
    4           [lamp, thing]
    5       [jessica, person]
    6           [pizza, food]
    7            [meat, food]
    8         [jacob, person]
    dtype: object

Build a condition with `isin`, treating `df_runnable_mylist` in the same way:

```python
# True where the (Item, Type) pair of a w_results row also occurs in w_runnable
cond = df_results_mylist[cols].agg(list, axis=1).isin(df_runnable_mylist[cols].agg(list, axis=1))
```

`cond`:

    0     True
    1     True
    2    False
    3     True
    4    False
    5    False
    6     True
    7     True
    8    False
    dtype: bool

If only the `Item` column needs to be compared, create the condition with the code below instead of the code above:

```python
cond = df_results_mylist['Item'].isin(df_runnable_mylist['Item'].unique())
```

However `cond` was built, select the non-matching rows by boolean indexing with `~cond`:

```python
df_results_mylist[~cond]
```

Output:

          Item    Type  Score
    2    paper   thing     24
    4     lamp   thing      5
    5  jessica  person     92
    8    jacob  person     83
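
The question's original idea, a `pd.merge` left join, can also isolate the non-matching rows when the `indicator=True` option is used. The following is a minimal sketch under a few assumptions: the two Excel sheets are replaced by in-memory DataFrames copied from the tables in the question, and `anti_join` is a hypothetical helper name, not a pandas function.

```python
import pandas as pd

# Assumption: in-memory copies of the question's tables stand in for the Excel sheets.
df_results_mylist = pd.DataFrame({
    "Item": ["mashpotatoes", "daniel", "paper", "john", "lamp",
             "jessica", "pizza", "meat", "jacob"],
    "Type": ["food", "person", "thing", "person", "thing",
             "person", "food", "food", "person"],
    "Score": [30, 99, 24, 195, 5, 92, 42, 2, 83],
})
df_runnable_mylist = pd.DataFrame({
    "Item": ["mashpotatoes", "john", "raspberry", "iphone", "boat",
             "jake", "pizza", "meat", "daniel"],
    "Type": ["food", "person", "food", "thing", "thing",
             "person", "food", "food", "person"],
    "Score": [30, 195, 50, 75, 5, 25, 42, 2, 99],
})


def anti_join(left, right, on):
    """Hypothetical helper: rows of `left` whose key(s) never occur in `right`."""
    keys = [on] if isinstance(on, str) else list(on)
    # Keep only the de-duplicated key columns of `right`, so the merge neither
    # adds suffixed columns nor repeats rows of `left`.
    right_keys = right[keys].drop_duplicates()
    # indicator=True adds a `_merge` column: 'left_only', 'right_only' or 'both'.
    merged = left.merge(right_keys, on=keys, how="left", indicator=True)
    # The merge renumbers rows 0..n-1; with a default RangeIndex on `left`
    # these labels coincide with the original ones.
    return merged.loc[merged["_merge"] == "left_only", list(left.columns)]


only_in_results = anti_join(df_results_mylist, df_runnable_mylist, "Item")
print(only_in_results)
```

With this data the result holds the rows for paper, lamp, jessica and jacob. Passing `on=['Item', 'Type']` compares both columns, much like the list-based condition above.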