Search values in dataframe by condition on another column

Question

I need to get the value in column A that corresponds to the closest value in column B for each multiple of 'trigger'.

For instance, in the dataframe below:

```python
import pandas as pd

trigger = 100

# B starts as floats so that assigning i*3.78 does not change the column dtype
info2 = {'A': [0]*100, 'B': [0.0]*100}
dfA = pd.DataFrame(info2)

for i in range(1, len(dfA)):
    dfA.loc[i, 'B'] = i*3.78
    dfA.loc[i, 'A'] = i*10

dfA
```

[![enter image description here][1]][1]

Since the closest value to trigger*1 is 98.28, from row 26,
the closest value to trigger*2 is 200.34, from row 53,
and the closest value to trigger*3 is 298.62, from row 79,

the expected result is:

```python
result = [260, 530, 790]
```

[1]: https://i.stack.imgur.com/LPP2v.png
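The expected rows can be checked by hand. The sketch below recomputes B for each expected row and its two neighbours using the question's increments (3.78 for B, 10 for A). The names `step_b`, `step_a`, and `closest` are introduced here for illustration only and do not appear in the question.

```python
trigger = 100
step_b, step_a = 3.78, 10  # per-row increments taken from the question's loop

closest = {}
for k, row in zip((1, 2, 3), (26, 53, 79)):
    target = k * trigger
    candidates = (row - 1, row, row + 1)
    for i in candidates:
        b = i * step_b
        print(f"target={target} row={i} B={b:.2f} distance={abs(b - target):.2f} A={i * step_a}")
    # Keep the candidate row whose B value is nearest to the target
    closest[target] = min(candidates, key=lambda i: abs(i * step_b - target))

print(closest)
```

In each group of three the middle row has the smallest distance, which is why the corresponding A values are 260, 530, and 790.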
# Answer 1
**Score**: 1

This could work:

```python
import numpy as np

triggers = {'100': 100, '200': 200, '300': 300}
for k, v in triggers.items():
    # Absolute distance of every B value from the current trigger multiple
    dfA['delta_val'] = np.abs(dfA['B'] - v)
    # Take A from the row with the smallest distance (as a plain int)
    triggers[k] = int(dfA[dfA.delta_val == dfA.delta_val.min()]['A'].values[0])
print(triggers)
# {'100': 260, '200': 530, '300': 790}
```

Answer 2

Score: 1

Another approach is:

```python
import pandas as pd
import numpy as np

trigger = 100
info2 = {'A': [0]*100, 'B': [0.0]*100}
dfA = pd.DataFrame(info2)
for i in range(1, len(dfA)):
    dfA.loc[i, 'B'] = i*3.78
    dfA.loc[i, 'A'] = i*10

result = []
for t in np.arange(trigger, trigger*4, trigger):
    # Index label of the row whose B value is nearest to t
    idx = (np.abs(dfA['B'] - t)).idxmin()
    result.append(int(dfA.loc[idx, 'A']))
print(result)
```

which gives what you expected.
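The loop over trigger multiples can also be replaced by a single array operation. The following is a minimal sketch using NumPy broadcasting, not part of the answer itself. It assumes the same data as the question (built here without a loop), and the names `targets`, `distances`, and `positions` are illustrative.

```python
import numpy as np
import pandas as pd

trigger = 100
# Same values as in the question: A = 10*i, B = 3.78*i for i in 0..99
dfA = pd.DataFrame({'A': np.arange(100) * 10,
                    'B': np.arange(100) * 3.78})

targets = np.arange(trigger, trigger * 4, trigger)  # 100, 200, 300
b = dfA['B'].to_numpy()

# distances[i, j] = |B[j] - targets[i]|, shape (3, 100)
distances = np.abs(b[None, :] - targets[:, None])

# Position of the nearest row for each target
positions = distances.argmin(axis=1)

result = dfA['A'].to_numpy()[positions].tolist()
print(result)  # [260, 530, 790]
```

Note that `argmin` returns positions rather than index labels, so this sketch indexes the underlying array instead of using `.loc`. The distance matrix has one row per target, which may be costly if both the dataframe and the list of targets are very large.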

Answer 3

Score: 0

```python
# `target` is not defined in this answer; presumably it is a single value such as trigger*1.
# dfA - target subtracts target from every column, so the sum runs over A and B together.
difference = abs(dfA - target)
min_index = difference.sum(axis=1).idxmin()
result = dfA.loc[min_index, :]
print(result)
```
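To see how this snippet behaves on the question's data, here is a small sketch that assumes `target = 100` (the answer does not define `target`, so this value is an assumption). It compares the row selected when the distance is summed over both columns with the row selected when only column B is considered. The names `all_columns_idx` and `b_only_idx` are illustrative.

```python
import numpy as np
import pandas as pd

# Same values as in the question: A = 10*i, B = 3.78*i for i in 0..99
dfA = pd.DataFrame({'A': np.arange(100) * 10,
                    'B': np.arange(100) * 3.78})
target = 100  # assumption: one trigger multiple

# As posted: |A - target| + |B - target| per row
all_columns_idx = abs(dfA - target).sum(axis=1).idxmin()

# Restricted to column B, as the question asks
b_only_idx = (dfA['B'] - target).abs().idxmin()

print(all_columns_idx, b_only_idx)  # 10 26
```

Under this assumption the posted version picks row 10, where A itself equals 100, while the B-only distance picks row 26 as the question expects.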

Answer 4

Score: 0

Use merge_asof:

```python
import numpy as np
import pandas as pd

pd.merge_asof(pd.Series(np.arange(trigger, dfA['B'].max(), trigger), name='B'),
              dfA, on='B', direction='nearest')
```

NB. dfA must first be sorted on B.

Output:

```
       B    A
0  100.0  260
1  200.0  530
2  300.0  790
```

If you also want the value of B:

```python
pd.merge_asof(pd.Series(np.arange(trigger, dfA['B'].max(), trigger), name='trigger'),
              dfA, left_on='trigger', right_on='B', direction='nearest')
```

Output:

```
   trigger    A       B
0    100.0  260   98.28
1    200.0  530  200.34
2    300.0  790  298.62
```

huangapple
  • Posted on February 8, 2023 at 20:43:52
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75385976.html