January 9, 2023 06:31:24

Pandas: return first row where column value satisfies condition against a list of values

Question

I have a dataframe df and a list of floats T. df.B is a time series of values sorted in chronological order, where the 0th index is the most recent timestamp and the last index is the oldest timestamp.

import pandas as pd
df = pd.DataFrame({'A': [1.1, 2.2, 3.3, 4.4], 'B': [5.5, 6.6, 7.7, 8.8]})
T = [t1, t2, ..., tn] # floats

What I am looking to do

I would like to compare the values of column B against the list of values T, one t at a time, and return the first row of df that satisfies the condition against t. By first row of df I mean: walk through the time series and, for each t in T, find the first instance in time where the value in df.B becomes larger than t.

What I've attempted:

df.loc[df.apply(lambda x: x.B >= T, axis=1)]
# => TypeError: unhashable type: 'numpy.ndarray'
df2 = df.query('B >= @T')
# => 'Lengths must match to compare'
[ df[df['B'] >= t] for t in T ]
# => Technically this works and then I can iterate again to retrieve the first row, but I get the warning -- pydevd warning: Computing repr of a (list) was slow

EDIT, an example:

T = [3.5, 4.5, 8.0, 8.5, 10.0, 11.0]
df.B = [5.5, 8.8, 6.6, 7.7]
# I'm hoping that the expected output would have the rows corresponding to the following values in `df.B`:
[7.7, 7.7, 8.8, 8.8, None, None]

Answer 1

Score: 0

You can sort the values in column B and then use numpy.searchsorted, which for each t finds the smallest value in B that is >= t:

import numpy as np
sorted_values = np.sort(df.B)
# Appending NaN makes an out-of-range index (no value >= t) map to NaN.
np.append(sorted_values, np.nan)[np.searchsorted(sorted_values, T)]
# With T = [3.5, 4.5, 8.0, 8.5, 10.0, 11.0] from the question's example:
# array([5.5, 5.5, 8.8, 8.8, nan, nan])

To get data frame rows, first find the indices (np.searchsorted requires df.B itself to be sorted in ascending order, as it is in the original df):

indices = np.searchsorted(df.B, T)
indices
# array([0, 0, 3, 3, 4, 4])

Then retrieve the corresponding rows:

[df.iloc[i] if i < len(df) else None for i in indices]
[A    1.1
B    5.5
Name: 0, dtype: float64, A    1.1
B    5.5
Name: 0, dtype: float64, A    4.4
B    8.8
Name: 3, dtype: float64, A    4.4
B    8.8
Name: 3, dtype: float64, None, None]
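
Note that the sort-based approach above returns the smallest value in B that is >= t, which is not necessarily the first one in time. If the goal is the question's expected output, one possible sketch is to walk from the oldest row to the newest and binary-search a running maximum instead of the sorted values. The helper name first_rows_in_time is hypothetical, and the sketch assumes the last row is the oldest (as the question states) and that B contains no NaN:

```python
import numpy as np
import pandas as pd

# Data from the question's example; row 0 is the most recent, the last row the oldest.
df = pd.DataFrame({'A': [1.1, 2.2, 3.3, 4.4], 'B': [5.5, 8.8, 6.6, 7.7]})
T = [3.5, 4.5, 8.0, 8.5, 10.0, 11.0]

def first_rows_in_time(df, thresholds, column='B'):
    """Hypothetical helper: for each t, the earliest row in time with df[column] >= t."""
    # Reverse so that position 0 is the oldest observation.
    values = df[column].to_numpy()[::-1]
    # The running maximum is non-decreasing, so binary search is valid on it.
    running_max = np.maximum.accumulate(values)
    # First chronological position where the running max (hence the value) reaches t.
    pos = np.searchsorted(running_max, thresholds, side='left')
    n = len(df)
    # Map back to original row positions; pos == n means no value ever reached t.
    return [df.iloc[n - 1 - p] if p < n else None for p in pos]

rows = first_rows_in_time(df, T)
first_values = [None if r is None else float(r['B']) for r in rows]
print(first_values)  # [7.7, 7.7, 8.8, 8.8, None, None]
```

The running maximum first reaches t exactly at the first observation that is >= t, so this avoids building one filtered data frame per t while still respecting the time order.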
