2023-07-06 10:58:49

Pandas VLOOKUP values that do not match

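Question

For example, I have two tables:

`w_results.xlsx`:

[![enter image description here][1]][1]

               Item    Type  Score
    0  mashpotatoes    food     30
    1        daniel  person     99
    2         paper   thing     24
    3          john  person    195
    4          lamp   thing      5
    5       jessica  person     92
    6         pizza    food     42
    7          meat    food      2
    8         jacob  person     83

`w_runnable.xlsx`:

[![enter image description here][2]][2]

               Item    Type  Score
    0  mashpotatoes    food     30
    1          john  person    195
    2     raspberry    food     50
    3        iphone   thing     75
    4          boat   thing      5
    5          jake  person     25
    6         pizza    food     42
    7          meat    food      2
    8        daniel  person     99

How can I get a table of all values that appear in the `Item` column of `w_results` but not in the `Item` column of `w_runnable`?

I know that `pd.merge` is probably the way to go, but it only outputs the matches. My attempt:

```python
import pandas as pd

w_results = 'w_results.xlsx'
w_runnable = 'w_runnable.xlsx'

# Read the 'my_list' sheet from each workbook
df_results_mylist = pd.read_excel(w_results, sheet_name='my_list')
df_runnable_mylist = pd.read_excel(w_runnable, sheet_name='my_list')

# Left join: keep every row of w_results, matched on Item
left_join = pd.merge(df_results_mylist,
                     df_runnable_mylist,
                     on='Item',
                     how='left')
print(left_join)
```

Output:

               Item  Type_x  Score  Type_y  Score
    0  mashpotatoes    food     30    food   30.0
    1        daniel  person     99  person   99.0
    2         paper   thing     24     NaN    NaN
    3          john  person    195  person  195.0
    4          lamp   thing      5     NaN    NaN
    5       jessica  person     92     NaN    NaN
    6         pizza    food     42    food   42.0
    7          meat    food      2    food    2.0
    8         jacob  person     83     NaN    NaN

The values that do not match appear as `NaN` (paper, lamp, jessica, jacob).

Is there a cleaner way to do this, so that these values appear in a separate table? Any help would be appreciated.

Desired output:

[![enter image description here][3]][3]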

  [1]: https://i.stack.imgur.com/xialb.png
  [2]: https://i.stack.imgur.com/HETwK.png
  [3]: https://i.stack.imgur.com/ptMJA.png
# Answer 1

**Score**: 1

**Code**

Combine the `Item` and `Type` columns into a Series of lists:

```python
cols = ['Item', 'Type']
df_results_mylist[cols].agg(list, axis=1)
```

Result:

    0    [mashpotatoes, food]
    1        [daniel, person]
    2          [paper, thing]
    3          [john, person]
    4           [lamp, thing]
    5       [jessica, person]
    6           [pizza, food]
    7            [meat, food]
    8         [jacob, person]
    dtype: object

Build a condition with `isin` in the same way:

```python
cond = df_results_mylist[cols].agg(list, axis=1).isin(df_runnable_mylist[cols].agg(list, axis=1))
```

`cond`:

    0     True
    1     True
    2    False
    3     True
    4    False
    5    False
    6     True
    7     True
    8    False
    dtype: bool

If only the `Item` column needs to be compared, create the condition with the code below instead of the code above:

```python
cond = df_results_mylist['Item'].isin(df_runnable_mylist['Item'].unique())
```

However `cond` was built, select the unmatched rows by boolean indexing with `~cond`:

```python
df_results_mylist[~cond]
```

Output:

          Item    Type  Score
    2    paper   thing     24
    4     lamp   thing      5
    5  jessica  person     92
    8    jacob  person     83
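
Since the question started from `pd.merge`, the same result can probably also be reached by staying with a left join and asking pandas to label where each row came from. The following is only a sketch, not part of the original answer: the two DataFrames are built in memory from the tables in the question as stand-ins for the Excel sheets, and the names `keys`, `merged` and `only_in_results` are made up for illustration.

```python
import pandas as pd

# In-memory stand-ins for the 'my_list' sheets (assumption: same data as the question's tables).
df_results_mylist = pd.DataFrame({
    "Item": ["mashpotatoes", "daniel", "paper", "john", "lamp",
             "jessica", "pizza", "meat", "jacob"],
    "Type": ["food", "person", "thing", "person", "thing",
             "person", "food", "food", "person"],
    "Score": [30, 99, 24, 195, 5, 92, 42, 2, 83],
})
df_runnable_mylist = pd.DataFrame({
    "Item": ["mashpotatoes", "john", "raspberry", "iphone", "boat",
             "jake", "pizza", "meat", "daniel"],
    "Type": ["food", "person", "food", "thing", "thing",
             "person", "food", "food", "person"],
    "Score": [30, 195, 50, 75, 5, 25, 42, 2, 99],
})

# Join against the key column only, so no _x/_y columns are created;
# drop_duplicates keeps the join from multiplying rows of the left table.
keys = df_runnable_mylist[["Item"]].drop_duplicates()
merged = df_results_mylist.merge(keys, on="Item", how="left", indicator=True)

# indicator=True adds a "_merge" column; "left_only" marks rows without a match.
only_in_results = merged.loc[merged["_merge"] == "left_only"].drop(columns="_merge")
print(only_in_results)
```

Compared with the `isin` approach, this keeps the logic in a single `merge` call, at the cost of an extra helper column that has to be dropped afterwards.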

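
The difference between comparing on `Item` alone and comparing on the `Item`/`Type` pair does not show up with the question's data, because every shared item has the same type in both tables. The sketch below uses two small hypothetical tables (`left` and `right`, not from the question) in which `john` has a different `Type` on each side. It builds one tuple per row instead of a list; this is a swapped-in variant, chosen because tuples are hashable and therefore behave predictably as lookup keys.

```python
import pandas as pd

# Hypothetical miniature tables; "john" has a different Type on each side.
left = pd.DataFrame({"Item": ["pizza", "lamp", "john"],
                     "Type": ["food", "thing", "person"]})
right = pd.DataFrame({"Item": ["pizza", "john"],
                      "Type": ["food", "thing"]})

cols = ["Item", "Type"]

# One tuple per row, so each (Item, Type) pair acts as a single key.
left_keys = left[cols].apply(tuple, axis=1)
right_keys = right[cols].apply(tuple, axis=1)

# Rows of `left` whose (Item, Type) pair is absent from `right`.
missing_by_pair = left[~left_keys.isin(right_keys)]

# Rows of `left` whose Item alone is absent from `right`.
missing_by_item = left[~left["Item"].isin(right["Item"])]

print(missing_by_pair)
print(missing_by_item)
```

With these tables, `john` counts as missing only in the pair-based comparison, which is the case to watch for when deciding which condition to use.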
