Finding List Index of Values in Dataframe Column
# Question
Given the following dataframe, how do I create a new column called "MemWeight" containing the index position in "mem_list" of each value in the "Weighting" column?
```python
import pandas as pd

data = {
    'MemRef': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
    'MemName': ['a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a'],
    'Weighting': [2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1.97, 2, 2, 2, 2, 2, 2, 2, 2],
}
df = pd.DataFrame.from_dict(data)
mem_list = [1.96, 1.97, 1.98, 1.99, 2]
```
The following does not work and returns the error below:
```python
df["MemWeight"] = mem_list.index(df["Weighting"])
```
```text
Traceback (most recent call last):
  File "E:/Documents/PycharmProjects/test.py", line 270, in <module>
    df["MemWeight"] = mem_list.index(df["Weighting"])
  File "C:\Users\xxxx\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\generic.py", line 1538, in __nonzero__
    f"The truth value of a {type(self).__name__} is ambiguous. "
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
```
None of the suggestions in the error message work. They give a myriad of other errors.

To solve this, you can use the `apply` method with a custom function to fill the "MemWeight" column:
```python
df["MemWeight"] = df["Weighting"].apply(lambda x: mem_list.index(x) if x in mem_list else -1)
```
This assigns each row the index position of its "Weighting" value in "mem_list", or -1 if the value is not in "mem_list".
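As a rough illustration of why the original call fails: `list.index` compares each list element with its argument using `==` and then needs a single true/false answer. When the argument is a whole Series, the comparison is element-wise and produces another Series, which has no single truth value. The small `weighting` Series below is a made-up stand-in for the question's column.

```python
import pandas as pd

mem_list = [1.96, 1.97, 1.98, 1.99, 2]
weighting = pd.Series([2, 1.97, 2])  # hypothetical stand-in for df["Weighting"]

# Comparing one list element with a Series is element-wise and returns a Series.
comparison = mem_list[0] == weighting
print(type(comparison).__name__)  # Series

# list.index needs one boolean per comparison, so converting that Series fails.
error_message = None
try:
    mem_list.index(weighting)
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

This is why the lookup has to be applied per value (with `apply`, `map`, or a vectorized search) rather than to the column as a whole.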
# Answer 1
**Score**: 1
You can use [`enumerate`](https://docs.python.org/3/library/functions.html#enumerate), a dictionary comprehension, and [`map`](https://pandas.pydata.org/docs/reference/api/pandas.Series.map.html):
```python
df['MemWeight'] = df['Weighting'].map({k: v for v, k in enumerate(mem_list)})
```
Or [`merge`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.merge.html) (creating a new DataFrame):
```python
out = df.merge(pd.DataFrame({'Weighting': mem_list,
                             'MemWeight': range(len(mem_list)),
                             }), how='left')
```
*Note that you could also use `df['MemWeight'] = df['Weighting'].apply(mem_list.index)`, but that would be inefficient (it searches the list again for every row) and raises `ValueError` for any value that is not in `mem_list`.*

Output:
```text
    MemRef MemName  Weighting  MemWeight
0        1       a       2.00          4
1        2       a       2.00          4
2        3       a       2.00          4
3        4       a       2.00          4
4        5       a       2.00          4
5        6       a       2.00          4
6        7       a       2.00          4
7        8       a       2.00          4
8        9       a       2.00          4
9       10       a       2.00          4
10      11       a       2.00          4
11      12       a       1.97          1
12      13       a       2.00          4
13      14       a       2.00          4
14      15       a       2.00          4
15      16       a       2.00          4
16      17       a       2.00          4
17      18       a       2.00          4
18      19       a       2.00          4
19      20       a       2.00          4
```
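One thing the sample data does not exercise is a "Weighting" value that is missing from `mem_list`. With the `map` approach such rows come back as `NaN`. The sketch below shows one possible way to turn that into a `-1` sentinel; the extra value `1.5`, the `positions` name, and the `-1` convention are assumptions for illustration, not part of the original answer.

```python
import pandas as pd

mem_list = [1.96, 1.97, 1.98, 1.99, 2]
# Small hypothetical frame; 1.5 is deliberately absent from mem_list.
df = pd.DataFrame({"Weighting": [2, 1.97, 1.5]})

# Build the value -> position lookup once instead of searching the list per row.
positions = {value: idx for idx, value in enumerate(mem_list)}

# map() yields NaN for values without a key, so fill those before casting to int.
mapped = df["Weighting"].map(positions)
df["MemWeight"] = mapped.fillna(-1).astype(int)

print(df["MemWeight"].tolist())  # [4, 1, -1]
```

If `mem_list` could contain duplicates, note that the dictionary keeps the last position of a repeated value, whereas `list.index` returns the first.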
# Answer 2
**Score**: 1
You can use `pd.cut`, since your `mem_list` is sorted:
```python
import numpy as np

df['MemWeight'] = pd.cut(df['Weighting'], bins=[-np.inf, *mem_list, np.inf], labels=False)
print(df)
```
Output:
```text
    MemRef MemName  Weighting  MemWeight
0        1       a       2.00          4
1        2       a       2.00          4
2        3       a       2.00          4
3        4       a       2.00          4
4        5       a       2.00          4
5        6       a       2.00          4
6        7       a       2.00          4
7        8       a       2.00          4
8        9       a       2.00          4
9       10       a       2.00          4
10      11       a       2.00          4
11      12       a       1.97          1
12      13       a       2.00          4
13      14       a       2.00          4
14      15       a       2.00          4
15      16       a       2.00          4
16      17       a       2.00          4
17      18       a       2.00          4
18      19       a       2.00          4
19      20       a       2.00          4
```
Or with `np.searchsorted`:
```python
df['MemWeight'] = np.searchsorted(mem_list, df['Weighting'], side='left')
```
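Both `pd.cut` and `np.searchsorted` locate where a value would fall in the sorted list; they do not check that the value is actually present. A value such as 1.975 would therefore receive the position of the next larger entry. If exact membership matters, a check along the following lines might be added. The sample `weights`, the helper names, and the `-1` sentinel are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

mem_list = [1.96, 1.97, 1.98, 1.99, 2]
# Hypothetical values: 1.975 and 2.5 are not in mem_list.
weights = pd.Series([2, 1.97, 1.975, 2.5])

mem_arr = np.asarray(mem_list, dtype=float)
values = weights.to_numpy()

# Insertion positions in the sorted array (len(mem_arr) means "past the end").
pos = np.searchsorted(mem_arr, values, side="left")

# Keep a position only if it is in range and the element there equals the value.
in_range = pos < len(mem_arr)
safe_pos = np.where(in_range, pos, 0)  # avoid indexing past the end
exact = in_range & (mem_arr[safe_pos] == values)
result = np.where(exact, pos, -1)

print(result.tolist())  # [4, 1, -1, -1]
```

The comparison uses exact floating-point equality, which works here because the values are the same literals as in `mem_list`; values produced by arithmetic may need a tolerance instead.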