Pandas VLOOKUP values that do not match

# Question

For example, I have two tables:

`w_results.xlsx`:

![w_results.xlsx table][1]

               Item    Type  Score
    0  mashpotatoes    food     30
    1        daniel  person     99
    2         paper   thing     24
    3          john  person    195
    4          lamp   thing      5
    5       jessica  person     92
    6         pizza    food     42
    7          meat    food      2
    8         jacob  person     83

`w_runnable.xlsx`:

![w_runnable.xlsx table][2]

               Item    Type  Score
    0  mashpotatoes    food     30
    1          john  person    195
    2     raspberry    food     50
    3        iphone   thing     75
    4          boat   thing      5
    5          jake  person     25
    6         pizza    food     42
    7          meat    food      2
    8        daniel  person     99

How can I get a table of all values that appear in the `Item` column of `w_results` but not in the `Item` column of `w_runnable`?
I know that `pd.merge` is probably the way to go, but it only gives me the matches. My attempt:

```python
import pandas as pd

w_results = 'w_results.xlsx'
w_runnable = 'w_runnable.xlsx'

df_results_mylist = pd.read_excel(w_results, sheet_name='my_list')
df_runnable_mylist = pd.read_excel(w_runnable, sheet_name='my_list')

left_join = pd.merge(df_results_mylist,
                     df_runnable_mylist,
                     on='Item',
                     how='left')

print(left_join)
```

Output:

               Item  Type_x  Score_x  Type_y  Score_y
    0  mashpotatoes    food       30    food     30.0
    1        daniel  person       99  person     99.0
    2         paper   thing       24     NaN      NaN
    3          john  person      195  person    195.0
    4          lamp   thing        5     NaN      NaN
    5       jessica  person       92     NaN      NaN
    6         pizza    food       42    food     42.0
    7          meat    food        2    food      2.0
    8         jacob  person       83     NaN      NaN

The values that do not match appear as `NaN` (paper, lamp, jessica, jacob).
Is there a cleaner way to do this, so that these values appear in a separate table? Any help would be appreciated.

Desired output:

![Desired output][3]
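Since the attempt already uses a left join, here is a minimal sketch of finishing it with the `indicator=True` option of `pd.merge`, which adds a `_merge` column whose value is `'left_only'` for rows that found no match. The DataFrames are typed in from the sample tables above; that in-memory data is an assumption standing in for the `pd.read_excel(..., sheet_name='my_list')` calls.

```python
import pandas as pd

# Sample data copied from the question (assumed in place of pd.read_excel).
df_results_mylist = pd.DataFrame({
    'Item': ['mashpotatoes', 'daniel', 'paper', 'john', 'lamp',
             'jessica', 'pizza', 'meat', 'jacob'],
    'Type': ['food', 'person', 'thing', 'person', 'thing',
             'person', 'food', 'food', 'person'],
    'Score': [30, 99, 24, 195, 5, 92, 42, 2, 83],
})
df_runnable_mylist = pd.DataFrame({
    'Item': ['mashpotatoes', 'john', 'raspberry', 'iphone', 'boat',
             'jake', 'pizza', 'meat', 'daniel'],
    'Type': ['food', 'person', 'food', 'thing', 'thing',
             'person', 'food', 'food', 'person'],
    'Score': [30, 195, 50, 75, 5, 25, 42, 2, 99],
})

# Join against the de-duplicated Item column only, so each left row matches
# at most once and the left columns keep their names (no _x/_y suffixes).
# indicator=True adds '_merge' with the value 'both' or 'left_only'.
merged = df_results_mylist.merge(
    df_runnable_mylist[['Item']].drop_duplicates(),
    on='Item',
    how='left',
    indicator=True,
)

# Rows that exist only on the left side are the "VLOOKUP misses".
not_in_runnable = (
    merged.loc[merged['_merge'] == 'left_only']
          .drop(columns='_merge')
)
print(not_in_runnable)
```

Joining against just the key column, rather than the whole right table, is a design choice: it avoids duplicating left rows when the right table repeats an item and keeps the output shaped like the input.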

  [1]: https://i.stack.imgur.com/xialb.png
  [2]: https://i.stack.imgur.com/HETwK.png
  [3]: https://i.stack.imgur.com/ptMJA.png

# Answer 1
**Score**: 1

**Code**

Combine the `Item` and `Type` values of each row into a list, giving a Series of lists:

```python
cols = ['Item', 'Type']
df_results_mylist[cols].agg(list, axis=1)
```

Result:

    0    [mashpotatoes, food]
    1        [daniel, person]
    2          [paper, thing]
    3          [john, person]
    4           [lamp, thing]
    5       [jessica, person]
    6           [pizza, food]
    7            [meat, food]
    8         [jacob, person]
    dtype: object

Then build the condition with `isin` in the same way:

```python
cond = df_results_mylist[cols].agg(list, axis=1).isin(
    df_runnable_mylist[cols].agg(list, axis=1)
)
```

`cond`:

    0     True
    1     True
    2    False
    3     True
    4    False
    5    False
    6     True
    7     True
    8    False
    dtype: bool
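The key above stores each row as a Python list, and lists are not hashable. A sketch of the same two-column check using hashable `(Item, Type)` tuples instead, via `pd.MultiIndex.from_frame` and `MultiIndex.isin`; this is a swapped-in alternative rather than the answerer's code, and the in-memory sample data is copied from the question as an assumption in place of the Excel files.

```python
import pandas as pd

# Sample data copied from the question (assumed in place of pd.read_excel).
df_results_mylist = pd.DataFrame({
    'Item': ['mashpotatoes', 'daniel', 'paper', 'john', 'lamp',
             'jessica', 'pizza', 'meat', 'jacob'],
    'Type': ['food', 'person', 'thing', 'person', 'thing',
             'person', 'food', 'food', 'person'],
    'Score': [30, 99, 24, 195, 5, 92, 42, 2, 83],
})
df_runnable_mylist = pd.DataFrame({
    'Item': ['mashpotatoes', 'john', 'raspberry', 'iphone', 'boat',
             'jake', 'pizza', 'meat', 'daniel'],
    'Type': ['food', 'person', 'food', 'thing', 'thing',
             'person', 'food', 'food', 'person'],
    'Score': [30, 195, 50, 75, 5, 25, 42, 2, 99],
})

cols = ['Item', 'Type']

# Each row of the selected columns becomes one (Item, Type) tuple.
left_keys = pd.MultiIndex.from_frame(df_results_mylist[cols])
right_keys = pd.MultiIndex.from_frame(df_runnable_mylist[cols])

# Boolean NumPy array: True where the (Item, Type) pair also appears in w_runnable.
cond = left_keys.isin(right_keys)

print(df_results_mylist[~cond])
```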

If only the `Item` column needs to match, create the condition with the code below instead of the code above:

```python
cond = df_results_mylist['Item'].isin(df_runnable_mylist['Item'].unique())
```

However `cond` was created, keep the non-matching rows with boolean indexing on `~cond`:

```python
df_results_mylist[~cond]
```

Output:

          Item    Type  Score
    2    paper   thing     24
    4     lamp   thing      5
    5  jessica  person     92
    8    jacob  person     83
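To try the `Item`-only version without the Excel files, here is a self-contained sketch; the two DataFrames are typed in from the sample tables in the question, an assumption replacing `pd.read_excel`.

```python
import pandas as pd

# Sample data copied from the question (assumed in place of pd.read_excel).
df_results_mylist = pd.DataFrame({
    'Item': ['mashpotatoes', 'daniel', 'paper', 'john', 'lamp',
             'jessica', 'pizza', 'meat', 'jacob'],
    'Type': ['food', 'person', 'thing', 'person', 'thing',
             'person', 'food', 'food', 'person'],
    'Score': [30, 99, 24, 195, 5, 92, 42, 2, 83],
})
df_runnable_mylist = pd.DataFrame({
    'Item': ['mashpotatoes', 'john', 'raspberry', 'iphone', 'boat',
             'jake', 'pizza', 'meat', 'daniel'],
    'Type': ['food', 'person', 'food', 'thing', 'thing',
             'person', 'food', 'food', 'person'],
    'Score': [30, 195, 50, 75, 5, 25, 42, 2, 99],
})

# True for each row whose Item appears anywhere in w_runnable's Item column.
cond = df_results_mylist['Item'].isin(df_runnable_mylist['Item'].unique())

# ~ inverts the mask, so only the unmatched rows are kept.
missing = df_results_mylist[~cond]
print(missing)
```

Boolean indexing keeps the original row labels, which is why the result carries the index values 2, 4, 5 and 8.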

huangapple
  • Posted on July 6, 2023 at 10:58:49
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76625215.html