July 10, 2023, 20:43:46

Finding List Index of Values in Dataframe Column

Question

Given the following dataframe, how do I create a new column called "MemWeight" containing the index position in "mem_list" of each value in the "Weighting" column?

```python
import pandas as pd

data = {'MemRef': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20], 'MemName': ['a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a'], 'Weighting': [2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1.97, 2, 2, 2, 2, 2, 2, 2, 2]}
df = pd.DataFrame.from_dict(data)
mem_list = [1.96, 1.97, 1.98, 1.99, 2]
```

The following does not work and returns the error below:

```python
df["MemWeight"] = mem_list.index(df["Weighting"])
```

```
Traceback (most recent call last):
  File "E:/Documents/PycharmProjects/test.py", line 270, in <module>
    df["MemWeight"] = mem_list.index(df["Weighting"])
  File "C:\Users\xxxx\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\generic.py", line 1538, in __nonzero__
    f"The truth value of a {type(self).__name__} is ambiguous. "
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
```

None of the suggestions in the error message work; they produce a myriad of other errors.

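To solve this, you can use the `apply` method with a custom function to fill the "MemWeight" column:

```python
df["MemWeight"] = df["Weighting"].apply(lambda x: mem_list.index(x) if x in mem_list else -1)
```

This assigns each row the index position of its "Weighting" value in "mem_list", or -1 if the value is not in "mem_list".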
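
The original call fails because `list.index` expects a single value: comparing the list elements against a whole Series produces a Series, and its truth value is ambiguous. As a minimal sketch of the row-by-row alternative, the snippet below uses a hypothetical helper `position_or_default` and a made-up sample that includes 2.5 (deliberately absent from `mem_list`) to show the -1 fallback:

```python
import pandas as pd

mem_list = [1.96, 1.97, 1.98, 1.99, 2]

# Hypothetical sample data: 2.5 is deliberately not in mem_list.
weights = pd.Series([2, 1.97, 2.5])


def position_or_default(value, default=-1):
    """Return the position of value in mem_list, or default if it is absent."""
    try:
        return mem_list.index(value)
    except ValueError:
        # list.index raises ValueError when the value is not found.
        return default


# apply calls the helper once per element, so list.index sees scalars only.
positions = weights.apply(position_or_default)
print(positions.tolist())  # → [4, 1, -1]
```

Using `try`/`except` here avoids scanning the list twice per row, which the `x in mem_list` check followed by `mem_list.index(x)` would do.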

Answer 1

Score: 1

You can use [`enumerate`](https://docs.python.org/3/library/functions.html#enumerate), a dictionary comprehension, and [`map`](https://pandas.pydata.org/docs/reference/api/pandas.Series.map.html):

```python
# Build a {value: position} lookup once, then map every row through it.
df['MemWeight'] = df['Weighting'].map({value: position for position, value in enumerate(mem_list)})
```

Or [`merge`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.merge.html) (creating a new DataFrame):

```python
out = df.merge(pd.DataFrame({'Weighting': mem_list,
                             'MemWeight': range(len(mem_list)),
                            }), how='left')
```

*Note that you could also use `df['MemWeight'] = df['Weighting'].apply(mem_list.index)`, but that would be inefficient because it searches the list again for every row.*

Output:

```
    MemRef MemName  Weighting  MemWeight
0        1       a       2.00          4
1        2       a       2.00          4
2        3       a       2.00          4
3        4       a       2.00          4
4        5       a       2.00          4
5        6       a       2.00          4
6        7       a       2.00          4
7        8       a       2.00          4
8        9       a       2.00          4
9       10       a       2.00          4
10      11       a       2.00          4
11      12       a       1.97          1
12      13       a       2.00          4
13      14       a       2.00          4
14      15       a       2.00          4
15      16       a       2.00          4
16      17       a       2.00          4
17      18       a       2.00          4
18      19       a       2.00          4
19      20       a       2.00          4
```
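
One thing to keep in mind with `map`: a value that has no key in the dictionary comes back as NaN, which also turns the result into floats. The sample data in the question never hits this case, so the following is only a sketch with a made-up value (2.5) and an assumed convention of using -1 for "not found":

```python
import pandas as pd

mem_list = [1.96, 1.97, 1.98, 1.99, 2]

# Hypothetical sample data: 2.5 is deliberately not in mem_list.
df = pd.DataFrame({'Weighting': [2, 1.97, 2.5]})

# Build the {value: position} lookup once.
lookup = {value: position for position, value in enumerate(mem_list)}

# Unmatched values become NaN after map.
mapped = df['Weighting'].map(lookup)

# Assumed convention: replace NaN with -1 and restore an integer column.
df['MemWeight'] = mapped.fillna(-1).astype(int)
print(df['MemWeight'].tolist())  # → [4, 1, -1]
```

If NaN is an acceptable marker for missing values, the `fillna` step can simply be dropped.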


Answer 2

Score: 1

You can use `pd.cut`, since your `mem_list` is sorted:

```python
import numpy as np

# Each value in mem_list becomes the right edge of one bin; labels=False returns the bin number.
df['MemWeight'] = pd.cut(df['Weighting'], bins=[-np.inf, *mem_list, np.inf], labels=False)
print(df)
```

Output:

```
    MemRef MemName  Weighting  MemWeight
0        1       a       2.00          4
1        2       a       2.00          4
2        3       a       2.00          4
3        4       a       2.00          4
4        5       a       2.00          4
5        6       a       2.00          4
6        7       a       2.00          4
7        8       a       2.00          4
8        9       a       2.00          4
9       10       a       2.00          4
10      11       a       2.00          4
11      12       a       1.97          1
12      13       a       2.00          4
13      14       a       2.00          4
14      15       a       2.00          4
15      16       a       2.00          4
16      17       a       2.00          4
17      18       a       2.00          4
18      19       a       2.00          4
19      20       a       2.00          4
```

Or with `np.searchsorted`:

```python
df['MemWeight'] = np.searchsorted(mem_list, df['Weighting'], side='left')
```
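
Both `pd.cut` and `np.searchsorted` report where a value falls among the sorted entries, so a value that is not in `mem_list` still receives a position instead of being flagged. If exact membership matters, one possible approach is to check the candidate position afterwards. This is a sketch with made-up values (1.975 and 2.5) and an assumed -1 marker for "not found":

```python
import numpy as np
import pandas as pd

mem_list = [1.96, 1.97, 1.98, 1.99, 2]

# Hypothetical sample data: 1.975 and 2.5 are deliberately not in mem_list.
weights = pd.Series([2, 1.97, 1.975, 2.5])

sorted_values = np.asarray(mem_list, dtype=float)
values = weights.to_numpy()

# Insertion points: equal to the index for members, but also defined for non-members.
positions = np.searchsorted(sorted_values, values, side='left')

# Clip so that an insertion point past the end can still be used for indexing.
safe_positions = np.clip(positions, 0, len(sorted_values) - 1)

# A value is a member only if the entry at its insertion point equals it.
is_member = sorted_values[safe_positions] == values

result = np.where(is_member, positions, -1)
print(result.tolist())  # → [4, 1, -1, -1]
```

Without the membership check, 1.975 would be reported at position 2 and 2.5 at position 5, which is past the end of the list.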
