January 6, 2020, 18:20:28

Pandas DataFrame checking condition before a specific row

Question

[Image: DataFrame]

I have the above DataFrame with millions of rows and wish to groupby(['Instrument', 'Date']) for some data analysis.

I want to compare the last row of each group with the earlier rows and find the first Value that equals or exceeds the Value of the last row. For instance, as shown in the image, Instrument AAD on Date 4/18/2012 has a last Value of 32437.5, at Time 9:59:44 AM. The first Value to equal or exceed it is 37491.87, at Time 9:42:39 AM; this is the result that I want.

What would be the best way to code this in Python with pandas?

Thank you.

Answer 1

Score: 0

Unless you edit the dataframe so that the last value comes first and can be compared against, or create some sort of temporary array/buffer to store and compare the values, you'd have to run two checks: first to find the final row of the group, then to find the first overtaking value in the group. I recommend creating an array that stores the values of the group, then taking the last value and running a 'while not' statement:

group = [1, 2, 3, 4, 5, 6, 3]
overtake = False
i = 0
while not overtake:
    # Compare each entry, in order, with the last entry of the group
    if group[i] >= group[-1]:
        overtake_value = group[i]
        overtake = True
    i += 1
print(overtake_value)
# prints 3

You just need a way to assign the group's column of values to a temporary array for this method to work.

Edit note: the array/list should contain only the group's values, meaning it should be a 1-dimensional array.
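
One way to get each group's values into a temporary list is sketched below. The sample data and the helper name first_overtake are made up for illustration; only the column names Instrument, Date and Value come from the question. Note that a Python-level loop like this may be slow on millions of rows.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "Instrument": ["AAD", "AAD", "AAD", "AAD", "AAE", "AAE"],
    "Date": ["4/18/2012"] * 6,
    "Value": [100.0, 300.0, 150.0, 200.0, 50.0, 40.0],
})

def first_overtake(values):
    """Return the first entry of a 1-D list that is >= its last entry."""
    last = values[-1]
    for v in values:
        if v >= last:
            return v

# Turn each group's Value column into a plain list, then scan it.
result = {
    key: first_overtake(grp["Value"].tolist())
    for key, grp in df.groupby(["Instrument", "Date"])
}
print(result)
```

Each key of result is an (Instrument, Date) tuple, and each entry is the first value in that group that is at least as large as the group's last value.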

Answer 2

Score: 0

This should work:

def f(grp):
    # First value in the group that is >= the group's last value
    return grp.loc[grp >= grp.iloc[-1]].iloc[0]

res = df.groupby(['Instrument', 'Date'])['Value'].agg(f)
res.head()

If you are not certain that there will always be a value greater than or equal to the last row's, use the following f().

import numpy as np

def f(grp):
    try:
        return grp.loc[grp >= grp.iloc[-1]].iloc[0]
    except IndexError:
        # No value in the group satisfied the condition
        return np.nan
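
Because the last row always compares as greater than or equal to itself (unless it is NaN), the filter above also matches the last row. If only rows before the last one should count, as the question seems to ask, a variant could look like the sketch below. The helper name first_earlier_overtake and most of the sample data are assumptions for illustration; only the two AAD values 37491.87 and 32437.5 come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names and the two AAD values
# 37491.87 and 32437.5 come from the question.
df = pd.DataFrame({
    "Instrument": ["AAD"] * 4 + ["AAE"] * 2,
    "Date": ["4/18/2012"] * 6,
    "Value": [30000.0, 37491.87, 31000.0, 32437.5, 10.0, 20.0],
})

def first_earlier_overtake(values):
    """First value before the last row that is >= the last row's value."""
    earlier = values.iloc[:-1]                        # exclude the last row
    hits = earlier[earlier >= values.iloc[-1]]        # rows meeting the condition
    return hits.iloc[0] if len(hits) else np.nan      # NaN when nothing qualifies

res = df.groupby(["Instrument", "Date"])["Value"].agg(first_earlier_overtake)
print(res)
```

Here the AAD group yields 37491.87, while the AAE group yields NaN because no earlier row reaches its last value.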
