Pandas: return first row where column value satisfies condition against a list of values
Question
I have a dataframe df and a list of floats T. df.B is a time series of values ordered by time, where index 0 is the most recent timestamp and the last index is the oldest timestamp.
df = pd.DataFrame({'A': [1.1, 2.2, 3.3, 4.4], 'B': [5.5, 6.6, 7.7, 8.8]})
T = [t1, t2, ..., tn] # floats
What I am looking to do
I would like to compare the values of column B against the list of values T, one t at a time, and return the first row of df that satisfies the condition against t. By first row of df I mean: walk through the time series and, for each t in T, find the first instance in time where the value in df.B becomes larger than t.
What I've attempted:
df.loc[df.apply(lambda x: x.B >= T, axis=1)]
# => TypeError: unhashable type: 'numpy.ndarray'
df2 = df.query('B >= @T')
# => 'Lengths must match to compare'
[ df[df['B'] >= t] for t in T ]
# => Technically this works and then I can iterate again to retrieve the first row, but I get the warning -- pydevd warning: Computing repr of a (list) was slow
EDIT, an example:
T = [3.5, 4.5, 8.0, 8.5, 10.0, 11.0]
df.B = [5.5, 8.8, 6.6, 7.7]
# I'm hoping that the expected output would have the rows corresponding to the following values in `df.B`:
[7.7, 7.7, 8.8, 8.8, None, None]
Answer 1
Score: 0
You can sort the values in column B and then use numpy.searchsorted, which finds, for each t in T, the position of the smallest sorted value that is at least t:
import numpy as np
sorted_values = np.sort(df.B)
np.append(sorted_values, np.nan)[np.searchsorted(sorted_values, T)]
# array([5.5, 5.5, 5.5, 7.7, nan, nan])
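As a concrete check, here is a minimal sketch of the same steps run on the df from the question with the T from the question's edit. The T behind the output shown above is not given, so the numbers here differ. The variable names positions, padded and result are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1.1, 2.2, 3.3, 4.4], 'B': [5.5, 6.6, 7.7, 8.8]})
T = [3.5, 4.5, 8.0, 8.5, 10.0, 11.0]  # thresholds taken from the question's edit

sorted_values = np.sort(df['B'].to_numpy())
# With the default side='left', searchsorted returns the first position
# whose value is >= t; a position equal to len(sorted_values) means no match.
positions = np.searchsorted(sorted_values, T)
# Append NaN so that the "one past the end" position maps to NaN.
padded = np.append(sorted_values, np.nan)
result = padded[positions]
print(result)  # [5.5 5.5 8.8 8.8 nan nan]
```

Each entry is the smallest value in B that is at least t, regardless of where that value sits in time.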
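To get the data frame rows, first find the indices. This passes df.B directly, so it relies on column B already being sorted in ascending order, as it is in the original df: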
indices = np.searchsorted(df.B, T)
indices
# array([0, 0, 0, 2, 4, 4], dtype=int32)
Then retrieve the corresponding rows:
[df.iloc[i] if i < len(df) else None for i in indices]
[A 1.1
B 5.5
Name: 0, dtype: float64, A 1.1
B 5.5
Name: 0, dtype: float64, A 1.1
B 5.5
Name: 0, dtype: float64, A 3.3
B 7.7
Name: 2, dtype: float64, None, None]
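Note that searchsorted on sorted values picks the smallest qualifying value, which is not necessarily the earliest one in time. If the goal is the first match in time, as in the question's edit, one possible sketch is to compare every value against every threshold with NumPy broadcasting. The helper name first_rows_reaching is hypothetical, and the code assumes the question's layout (last index is the oldest) and uses >= as in the question's attempts.

```python
import numpy as np
import pandas as pd

# Data from the question's edit: index 0 is the most recent, last index the oldest.
df = pd.DataFrame({'A': [1.1, 2.2, 3.3, 4.4], 'B': [5.5, 8.8, 6.6, 7.7]})
T = [3.5, 4.5, 8.0, 8.5, 10.0, 11.0]

def first_rows_reaching(df, thresholds, col='B'):
    """Hypothetical helper: for each threshold, return the earliest row
    (walking from oldest to newest) whose value is >= the threshold, or None."""
    # Reverse so that position 0 is the oldest observation.
    values = df[col].to_numpy()[::-1]
    t = np.asarray(thresholds, dtype=float)
    # mask[i, j] is True when the j-th oldest value is >= thresholds[i].
    mask = values[None, :] >= t[:, None]
    found = mask.any(axis=1)
    # argmax gives the first True per row (and 0 when there is none,
    # which is why `found` is checked separately).
    first = mask.argmax(axis=1)
    # Map positions in the reversed array back to positions in df.
    pos = len(df) - 1 - first
    return [df.iloc[p] if ok else None for p, ok in zip(pos, found)]

rows = first_rows_reaching(df, T)
matched_B = [None if r is None else float(r['B']) for r in rows]
print(matched_B)  # [7.7, 7.7, 8.8, 8.8, None, None]
```

The comparison builds a len(T) by len(df) boolean array, so this is convenient for moderate sizes but may use a lot of memory for very long series or threshold lists.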