Finding List Index of Values in Dataframe Column


Question

Given the following dataframe, how do I create a new column called "MemWeight" containing the index position in `mem_list` of each value in the "Weighting" column?

```python
import pandas as pd

data = {'MemRef': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20], 'MemName': ['a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a'], 'Weighting': [2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1.97, 2, 2, 2, 2, 2, 2, 2, 2]}
df = pd.DataFrame.from_dict(data)
mem_list = [1.96, 1.97, 1.98, 1.99, 2]
```

The following does not work and returns the error below:

```python
df["MemWeight"] = mem_list.index(df["Weighting"])
```

```
Traceback (most recent call last):
  File "E:/Documents/PycharmProjects/test.py", line 270, in <module>
    df["MemWeight"] = mem_list.index(df["Weighting"])
  File "C:\Users\xxxx\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\generic.py", line 1538, in __nonzero__
    f"The truth value of a {type(self).__name__} is ambiguous. "
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
```

None of the suggestions in the error message work; they produce a myriad of other errors.
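Why the original attempt fails: `list.index` compares each list element with its argument using `==` and needs a single True/False from every comparison. When the argument is a whole Series, `1.96 == df["Weighting"]` is evaluated element-wise and yields a boolean Series, and turning that into one bool is exactly what pandas refuses to do. A minimal reproduction, using a small hypothetical Series in place of the real column:

```python
import pandas as pd

mem_list = [1.96, 1.97, 1.98, 1.99, 2]
weights = pd.Series([2.0, 1.97])  # hypothetical stand-in for df["Weighting"]

# Comparing a scalar with a Series is element-wise: the result is a Series, not a bool.
comparison = mem_list[0] == weights
print(comparison.tolist())  # [False, False]

# list.index needs a single True/False per comparison; Series.__bool__ raises instead.
error_message = ""
try:
    mem_list.index(weights)
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```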

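To solve this, you can use the `apply` method with a custom function to fill the "MemWeight" column:

```python
df["MemWeight"] = df["Weighting"].apply(lambda x: mem_list.index(x) if x in mem_list else -1)
```

This sets each value in "MemWeight" to the index position of the corresponding "Weighting" value in `mem_list`, or to -1 if the value is not in `mem_list`.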
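To illustrate, here is the `apply` version run on a small hypothetical frame that deliberately contains a value (1.95) missing from `mem_list`. Note that `x in mem_list` followed by `mem_list.index(x)` scans the list up to twice per row, which is harmless for a 5-element list but grows with its length.

```python
import pandas as pd

mem_list = [1.96, 1.97, 1.98, 1.99, 2]
# Hypothetical sample: 1.95 is deliberately absent from mem_list.
df = pd.DataFrame({"Weighting": [2, 1.97, 1.95]})

# Look up each value's position; fall back to -1 when the value is not in the list.
df["MemWeight"] = df["Weighting"].apply(
    lambda x: mem_list.index(x) if x in mem_list else -1
)
print(df["MemWeight"].tolist())  # [4, 1, -1]
```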


Answer 1

Score: 1

You can use [`enumerate`](https://docs.python.org/3/library/functions.html#enumerate), a dictionary comprehension, and [`map`](https://pandas.pydata.org/docs/reference/api/pandas.Series.map.html):

```python
df['MemWeight'] = df['Weighting'].map({k: v for v, k in enumerate(mem_list)})
```
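A sketch of what the dictionary comprehension builds and how `map` treats values that are not in `mem_list`, using a small hypothetical frame. Unlike the `apply` version in the question, unmatched rows become NaN (so the column turns into floats); the `fillna(-1).astype(int)` step at the end is an optional addition here, not part of the answer.

```python
import pandas as pd

mem_list = [1.96, 1.97, 1.98, 1.99, 2]
# Hypothetical sample; 1.95 is not in mem_list.
df = pd.DataFrame({"Weighting": [2, 1.97, 1.95]})

# enumerate yields (position, value) pairs; the comprehension inverts them to {value: position}.
lookup = {k: v for v, k in enumerate(mem_list)}
print(lookup)  # {1.96: 0, 1.97: 1, 1.98: 2, 1.99: 3, 2: 4}

# map does one dictionary lookup per row; values without a key become NaN.
df["MemWeight"] = df["Weighting"].map(lookup)
print(df["MemWeight"].tolist())  # [4.0, 1.0, nan]

# Optional: replace NaN with -1 and convert back to integers.
filled = df["MemWeight"].fillna(-1).astype(int)
print(filled.tolist())  # [4, 1, -1]
```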

Or [`merge`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.merge.html) (creating a new DataFrame):

```python
# With no `on`, merge joins on the columns both frames share; here that is 'Weighting'.
out = df.merge(pd.DataFrame({'Weighting': mem_list,
                             'MemWeight': range(len(mem_list)),
                             }), how='left')
```
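The same idea with `merge`, shown on a hypothetical three-row frame. The lookup table `positions` is just the answer's inline DataFrame given a name, and `on="Weighting"` is spelled out here, whereas the answer relies on merge joining on the shared column. With `how="left"` every row of `df` is kept in its original order and unmatched values get NaN.

```python
import pandas as pd

mem_list = [1.96, 1.97, 1.98, 1.99, 2]
# Hypothetical sample; 1.95 is not in mem_list.
df = pd.DataFrame({"MemRef": [1, 2, 3], "Weighting": [2, 1.97, 1.95]})

# Two-column lookup table: each value and its position in mem_list.
positions = pd.DataFrame({"Weighting": mem_list, "MemWeight": range(len(mem_list))})

# how="left" keeps all rows of df in their original order; unmatched rows get NaN.
out = df.merge(positions, on="Weighting", how="left")
print(out["MemWeight"].tolist())  # [4.0, 1.0, nan]
```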

*Note that you could also use `df['MemWeight'] = df['Weighting'].apply(mem_list.index)`, but that would be inefficient (the list is searched again for every row).*

Output:

```
    MemRef MemName  Weighting  MemWeight
0        1       a       2.00          4
1        2       a       2.00          4
2        3       a       2.00          4
3        4       a       2.00          4
4        5       a       2.00          4
5        6       a       2.00          4
6        7       a       2.00          4
7        8       a       2.00          4
8        9       a       2.00          4
9       10       a       2.00          4
10      11       a       2.00          4
11      12       a       1.97          1
12      13       a       2.00          4
13      14       a       2.00          4
14      15       a       2.00          4
15      16       a       2.00          4
16      17       a       2.00          4
17      18       a       2.00          4
18      19       a       2.00          4
19      20       a       2.00          4
```
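Besides speed, the `apply(mem_list.index)` variant from the note differs from the `map` approach in how it handles missing values: `list.index` raises `ValueError` for a value that is not in the list, so a single unmatched row aborts the whole `apply`. A quick check on a hypothetical Series:

```python
import pandas as pd

mem_list = [1.96, 1.97, 1.98, 1.99, 2]
# Hypothetical sample; 1.95 is not in mem_list.
weights = pd.Series([2, 1.97, 1.95])

# Works while every value is present; each row triggers a linear scan of mem_list.
print(weights.iloc[:2].apply(mem_list.index).tolist())  # [4, 1]

# One missing value makes list.index raise, which aborts the whole apply.
failed = False
try:
    weights.apply(mem_list.index)
except ValueError:
    failed = True
print(failed)  # True
```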
Answer 2

Score: 1

You can use `pd.cut`, since your `mem_list` is already sorted:

```python
import numpy as np
import pandas as pd

df['MemWeight'] = pd.cut(df['Weighting'], bins=[-np.inf, *mem_list, np.inf], labels=False)
print(df)
```

Output:

```
    MemRef MemName  Weighting  MemWeight
0        1       a       2.00          4
1        2       a       2.00          4
2        3       a       2.00          4
3        4       a       2.00          4
4        5       a       2.00          4
5        6       a       2.00          4
6        7       a       2.00          4
7        8       a       2.00          4
8        9       a       2.00          4
9       10       a       2.00          4
10      11       a       2.00          4
11      12       a       1.97          1
12      13       a       2.00          4
13      14       a       2.00          4
14      15       a       2.00          4
15      16       a       2.00          4
16      17       a       2.00          4
17      18       a       2.00          4
18      19       a       2.00          4
19      20       a       2.00          4
```
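One caveat worth checking: `pd.cut` assigns bins rather than matching values exactly, so values that are not in `mem_list` still receive a position (for values up to 2, the position of the first entry greater than or equal to them). The values 1.965 and 1.5 below are hypothetical:

```python
import numpy as np
import pandas as pd

mem_list = [1.96, 1.97, 1.98, 1.99, 2]
# Hypothetical values: 1.965 and 1.5 are NOT in mem_list.
weights = pd.Series([2, 1.97, 1.965, 1.5])

# Bins are right-closed intervals: (-inf, 1.96], (1.96, 1.97], ..., (1.99, 2], (2, inf).
codes = pd.cut(weights, bins=[-np.inf, *mem_list, np.inf], labels=False)
print(codes.tolist())  # [4, 1, 1, 0]
```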

Or with `np.searchsorted`:

```python
df['MemWeight'] = np.searchsorted(mem_list, df['Weighting'], side='left')
```
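`np.searchsorted` has the same caveat: it returns insertion points, so missing values still get an index (and values above the last entry get `len(mem_list)`). If an exact lookup is needed, the sketch below, which assumes `mem_list` is sorted ascending, verifies each hit and uses -1 for misses; the -1 convention comes from the `apply` snippet in the question, not from this answer.

```python
import numpy as np
import pandas as pd

mem_list = [1.96, 1.97, 1.98, 1.99, 2]
# Hypothetical sample: 1.965 and 2.5 are not in mem_list.
df = pd.DataFrame({"Weighting": [2, 1.97, 1.965, 2.5]})

arr = np.asarray(mem_list, dtype=float)  # searchsorted assumes ascending order
values = df["Weighting"].to_numpy()

# Binary search: the index where each value would be inserted to keep arr sorted.
idx = np.searchsorted(arr, values, side="left")

# idx equals len(arr) for values above every entry; clip it before indexing arr.
safe_idx = np.minimum(idx, len(arr) - 1)
found = arr[safe_idx] == values

# Keep the position only for exact matches; mark everything else with -1.
df["MemWeight"] = np.where(found, idx, -1)
print(df["MemWeight"].tolist())  # [4, 1, -1, -1]
```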

huangapple
  • Posted on July 10, 2023 at 20:43:46
  • When reposting, please keep this article's link: https://go.coder-hub.com/76653864.html