Pandas: return first row where column value satisfies condition against a list of values

Question

I have a dataframe df and a list of floats T. df.B is a time series of values ordered by time, newest first: index 0 holds the most recent timestamp and the last index holds the oldest.

df = pd.DataFrame({'A': [1.1, 2.2, 3.3, 4.4], 'B': [5.5, 6.6, 7.7, 8.8]})
T = [t1, t2, ..., tn] # floats

What I am looking to do

I would like to compare the values of column B against the list of values T, one t at a time, and return the first row of df that satisfies the condition against t. By "first row" I mean: walk through the time series and, for each t in T, find the earliest point in time at which the value in df.B becomes larger than t.

What I've attempted:

df.loc[df.apply(lambda x: x.B >= T, axis=1)]
# => TypeError: unhashable type: 'numpy.ndarray'

df2 = df.query('B >= @T')
# => 'Lengths must match to compare'

[ df[df['B'] >= t] for t in T ]
# => Technically this works and then I can iterate again to retrieve the first row, but I get the warning -- pydevd warning: Computing repr of a (list) was slow 

EDIT, an example:

T = [3.5, 4.5, 8.0, 8.5, 10.0, 11.0]
df.B = [5.5, 8.8, 6.6, 7.7]

# I'm hoping that the expected output would have the rows corresponding to the following values in `df.B`:
[7.7, 7.7, 8.8, 8.8, None, None]

Answer 1

Score: 0

You can sort values in column B, and then use numpy.searchsorted:

import numpy as np
sorted_values = np.sort(df.B)
np.append(sorted_values, np.nan)[np.searchsorted(sorted_values, T)]
# array([5.5, 5.5, 5.5, 7.7, nan, nan])
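
As a self-contained check, here is a minimal sketch of the same idea run on the example T and df.B from the question's edit (the output therefore differs from the one shown above). The names B, positions, and smallest_at_or_above are illustrative and not part of the original answer.

```python
import numpy as np

B = np.array([5.5, 8.8, 6.6, 7.7])      # example df.B from the question
T = [3.5, 4.5, 8.0, 8.5, 10.0, 11.0]    # example thresholds from the question

sorted_values = np.sort(B)               # [5.5, 6.6, 7.7, 8.8]
# Leftmost position in sorted_values whose value is >= t; len(B) means no match.
positions = np.searchsorted(sorted_values, T)
# The appended NaN gives the "no match" position something to index.
smallest_at_or_above = np.append(sorted_values, np.nan)[positions]
print(smallest_at_or_above)              # [5.5 5.5 8.8 8.8 nan nan]
```

Note that this yields the smallest value at or above each t, regardless of where that value sits in time.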

To get data frame rows, first find the indices (np.searchsorted assumes df.B is already sorted in ascending order, as it is in the original df):

indices = np.searchsorted(df.B, T)
indices
# array([0, 0, 0, 2, 4, 4], dtype=int32)
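
If df.B is not sorted (as in the question's edited example), the positions returned for a sorted copy no longer line up with rows of df. One possible way around this, sketched here as an assumption rather than as part of the original answer, is the sorter argument of np.searchsorted together with np.argsort; the names b, order, pos, and row_positions are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1.1, 2.2, 3.3, 4.4], 'B': [5.5, 8.8, 6.6, 7.7]})
T = [3.5, 4.5, 8.0, 8.5, 10.0, 11.0]

b = df['B'].to_numpy()
order = np.argsort(b)                        # row positions in ascending order of B
pos = np.searchsorted(b, T, sorter=order)    # positions within the sorted view of b
# Translate back to row positions in df; pos == len(df) means no value >= t.
row_positions = [int(order[p]) if p < len(df) else None for p in pos]
print(row_positions)                         # [0, 0, 1, 1, None, None]
```

These positions can be passed to df.iloc in the same way as the indices above.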

Then retrieve corresponding rows:

[df.iloc[i] if i < len(df) else None for i in indices]

[A    1.1
B    5.5
Name: 0, dtype: float64, A    1.1
B    5.5
Name: 0, dtype: float64, A    1.1
B    5.5
Name: 0, dtype: float64, A    3.3
B    7.7
Name: 2, dtype: float64, None, None]
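
Sorting B discards the time order, so the approach above returns the smallest value at or above each t rather than the first one in time. A minimal sketch that keeps the time order might look like the following. It assumes index 0 is the newest row (so the frame is reversed to walk oldest to newest) and uses >= as in the question's attempts; the names oldest_first, hits, first, and found are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1.1, 2.2, 3.3, 4.4], 'B': [5.5, 8.8, 6.6, 7.7]})
T = np.array([3.5, 4.5, 8.0, 8.5, 10.0, 11.0])

oldest_first = df.iloc[::-1]                 # index 0 is newest, so reverse the rows
b = oldest_first['B'].to_numpy()             # [7.7, 6.6, 8.8, 5.5]
hits = b[:, None] >= T[None, :]              # boolean matrix, shape (rows, thresholds)
first = hits.argmax(axis=0)                  # first True per threshold (0 if none)
found = hits.any(axis=0)                     # whether any row met the threshold

rows = [oldest_first.iloc[i] if ok else None for i, ok in zip(first, found)]
values = [None if r is None else float(r['B']) for r in rows]
print(values)                                # [7.7, 7.7, 8.8, 8.8, None, None]
```

The comparison matrix has len(df) * len(T) entries, so this trades memory for a single vectorized pass; for very large inputs a loop over T with an early exit may be preferable.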

huangapple
  • Posted on January 9, 2023, 06:31:24
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75051724.html