Finding List Index of Values in Dataframe Column

Question

Given the following dataframe, how do I create a new column called "MemWeight" containing the index position in "mem_list" of each value in the "Weighting" column?

```python
import pandas as pd

data = {'MemRef': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
        'MemName': ['a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a'],
        'Weighting': [2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1.97, 2, 2, 2, 2, 2, 2, 2, 2]}
df = pd.DataFrame.from_dict(data)

mem_list = [1.96, 1.97, 1.98, 1.99, 2]
```

The following does not work and returns the error below:

```python
df["MemWeight"] = mem_list.index(df["Weighting"])
```

```
Traceback (most recent call last):
  File "E:/Documents/PycharmProjects/test.py", line 270, in <module>
    df["MemWeight"] = mem_list.index(df["Weighting"])
  File "C:\Users\xxxx\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\generic.py", line 1538, in __nonzero__
    f"The truth value of a {type(self).__name__} is ambiguous. "
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
```

None of the suggestions in the error message work; they produce a myriad of other errors.

To solve this, you can use the `apply` method with a custom function to fill the "MemWeight" column:

```python
df["MemWeight"] = df["Weighting"].apply(lambda x: mem_list.index(x) if x in mem_list else -1)
```

This gives each row the index position of its "Weighting" value in "mem_list", or -1 if the value is not in "mem_list".
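For context, here is a minimal sketch of why the original call fails and how the `apply` fallback behaves for a missing value. The small `demo` frame and the value 1.5 are made up for illustration and are not part of the question's data.

```python
import pandas as pd

mem_list = [1.96, 1.97, 1.98, 1.99, 2]
# Hypothetical small frame; 1.5 is deliberately absent from mem_list
demo = pd.DataFrame({"Weighting": [2, 1.97, 1.5]})

# list.index compares the whole Series with each list element using ==,
# then needs a single True/False, which a Series cannot provide.
try:
    mem_list.index(demo["Weighting"])
    raised = False
except ValueError:
    raised = True

# Looking up one scalar per row avoids the ambiguity.
demo["MemWeight"] = demo["Weighting"].apply(
    lambda x: mem_list.index(x) if x in mem_list else -1
)
result = demo["MemWeight"].tolist()
print(raised, result)
```

The -1 marker is only a convention; any value that cannot be a valid list position would serve the same purpose.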


Answer 1

Score: 1

You can use [`enumerate`](https://docs.python.org/3/library/functions.html#enumerate), a dictionary comprehension, and [`map`](https://pandas.pydata.org/docs/reference/api/pandas.Series.map.html):

```python
# Build a value -> position lookup once, then map every row through it
df['MemWeight'] = df['Weighting'].map({value: i for i, value in enumerate(mem_list)})
```

Or use [`merge`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.merge.html) (this creates a new DataFrame):

```python
# Left-join on the shared 'Weighting' column
out = df.merge(pd.DataFrame({'Weighting': mem_list,
                             'MemWeight': range(len(mem_list)),
                            }), how='left')
```

Note that you could also use `df['MemWeight'] = df['Weighting'].apply(mem_list.index)`, but that is inefficient because it searches the list again for every row.

Output:

```
    MemRef MemName  Weighting  MemWeight
0        1       a       2.00          4
1        2       a       2.00          4
2        3       a       2.00          4
3        4       a       2.00          4
4        5       a       2.00          4
5        6       a       2.00          4
6        7       a       2.00          4
7        8       a       2.00          4
8        9       a       2.00          4
9       10       a       2.00          4
10      11       a       2.00          4
11      12       a       1.97          1
12      13       a       2.00          4
13      14       a       2.00          4
14      15       a       2.00          4
15      16       a       2.00          4
16      17       a       2.00          4
17      18       a       2.00          4
18      19       a       2.00          4
19      20       a       2.00          4
```
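One thing the dictionary lookup does not show with this data is what happens to values missing from `mem_list`: `map` returns NaN for them. The sketch below uses a made-up `demo` frame containing 1.5 (not in the list), and the choice of -1 as a "not found" marker is an assumption, not part of the answer.

```python
import pandas as pd

mem_list = [1.96, 1.97, 1.98, 1.99, 2]
# Hypothetical small frame; 1.5 is deliberately absent from mem_list
demo = pd.DataFrame({"Weighting": [2, 1.97, 1.5]})

# Build the value -> position lookup once
positions = {value: i for i, value in enumerate(mem_list)}

# Rows without a matching key come back as NaN
mapped = demo["Weighting"].map(positions)

# Replace NaN with -1 (assumed marker) and restore an integer dtype
demo["MemWeight"] = mapped.fillna(-1).astype(int)
print(demo)
```

If `mem_list` contained duplicate values, the comprehension would keep the last position of each value, whereas `list.index` returns the first.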




# Answer 2
**Score**: 1

You can use `pd.cut`, since your `mem_list` is sorted:

```python
import numpy as np

# Each bin's right edge is a mem_list value, so the bin number equals the list position
df['MemWeight'] = pd.cut(df['Weighting'], bins=[-np.inf, *mem_list, np.inf], labels=False)
print(df)
```

Output:

```
    MemRef MemName  Weighting  MemWeight
0        1       a       2.00          4
1        2       a       2.00          4
2        3       a       2.00          4
3        4       a       2.00          4
4        5       a       2.00          4
5        6       a       2.00          4
6        7       a       2.00          4
7        8       a       2.00          4
8        9       a       2.00          4
9       10       a       2.00          4
10      11       a       2.00          4
11      12       a       1.97          1
12      13       a       2.00          4
13      14       a       2.00          4
14      15       a       2.00          4
15      16       a       2.00          4
16      17       a       2.00          4
17      18       a       2.00          4
18      19       a       2.00          4
19      20       a       2.00          4
```

Or with `np.searchsorted`:

```python
df['MemWeight'] = np.searchsorted(mem_list, df['Weighting'], side='left')
```
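Both `pd.cut` and `np.searchsorted` locate a position in a sorted sequence rather than test for an exact match, so a value that is not in `mem_list` still receives a number. The sketch below, which uses a made-up `demo` frame and an assumed -1 marker, shows one way to keep only exact matches.

```python
import numpy as np
import pandas as pd

mem_list = [1.96, 1.97, 1.98, 1.99, 2]
# Hypothetical frame; 1.975 and 2.5 are deliberately absent from mem_list
demo = pd.DataFrame({"Weighting": [2, 1.97, 1.975, 2.5]})

values = np.asarray(mem_list, dtype=float)
weights = demo["Weighting"].to_numpy()

# Insertion points: defined even for values that are not in the list
pos = np.searchsorted(values, weights, side="left")

# Keep a position only if it is inside the list and holds an equal value
in_range = pos < len(values)
exact = in_range & (values[np.minimum(pos, len(values) - 1)] == weights)
demo["MemWeight"] = np.where(exact, pos, -1)  # -1 is an assumed "not found" marker
print(demo)
```

When every value is known to be in the list, as in the question's data, the extra check is unnecessary.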

huangapple
  • Published on July 10, 2023 at 20:43:46
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76653864.html