Pandas DataFrame checking condition before a specific row

Question

[Image: DataFrame]

I have the DataFrame above, with millions of rows, and want to groupby(['Instrument', 'Date']) for some data analysis.

For each group, I want to take the Value in the last row and compare it with the earlier Values, finding the first one that is greater than or equal to it. For instance, as shown in the image, Instrument AAD on Date 4/18/2012 has a Value of 32437.5 in its last row, at Time 9:59:44 AM. The first earlier Value that equals or exceeds it is 37491.87, at Time 9:42:39 AM --> this is the result I want.

What would be the best way to code this in pandas?

Thank you.

Answer 1

Score: 0

Unless you reorder the DataFrame so that the last value comes first, or copy the values into a temporary array/buffer to compare against, you'd have to make two passes: one to find the final row of the group, then another to find the first value in the group that overtakes it. I recommend creating an array of the group's values, taking the last value, and scanning with a 'while not' loop until you find an earlier value that is greater than or equal to it:

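group = [1, 2, 3, 4, 5, 6, 3]

last = group[-1]
overtake = False
i = 0
# Scan the entries before the last one until one is >= the last value
while not overtake and i < len(group) - 1:
    if group[i] >= last:
        overtake = True
    else:
        i += 1
overtake_value = group[i] if overtake else None  # None if nothing qualifies
print(overtake_value)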

>> 3

You just need a way to copy the group's column of values into a temporary array for this method to work.

Edit: the array/list should hold only the group's Value entries, i.e. a one-dimensional array.
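Applied to a DataFrame, one way to fill that temporary array is to iterate over the groups and pull each group's Value column out as a one-dimensional NumPy array. This is a minimal sketch, assuming columns named Instrument, Date and Value (as in the question) and rows already in time order within each group; the sample data and the helper name first_overtake are hypothetical:

```python
import pandas as pd


def first_overtake(values):
    """Return the first earlier entry that is >= the last entry, or None."""
    last = values[-1]
    for value in values[:-1]:  # exclude the last entry itself
        if value >= last:
            return value
    return None


# Hypothetical sample data standing in for the real DataFrame
df = pd.DataFrame({
    'Instrument': ['AAD', 'AAD', 'AAD', 'BBB', 'BBB'],
    'Date': ['4/18/2012'] * 5,
    'Value': [30000.0, 37491.87, 32437.5, 10.0, 20.0],
})

results = {}
for key, grp in df.groupby(['Instrument', 'Date']):
    # One-dimensional array of this group's Values, in row order
    values = grp['Value'].to_numpy()
    results[key] = first_overtake(values)

print(results)
```

Looping over groups in plain Python keeps the logic easy to follow, but it may be slow on millions of rows.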

Answer 2

Score: 0

This should work:

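def f(grp):
    # Only look at rows before the last one; the last row always matches itself
    earlier = grp.iloc[:-1]
    # First earlier Value that is >= the group's last Value
    return earlier[earlier >= grp.iloc[-1]].iloc[0]

# df is assumed to be sorted by Time within each (Instrument, Date) group
res = df.groupby(['Instrument', 'Date'])['Value'].agg(f)
res.head()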
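A detail worth checking in any version of f: comparing the whole group against its own last Value always matches the last row itself, so the last row has to be excluded before searching, otherwise the "no earlier match" case silently returns the last Value instead of failing. A minimal check on a toy Series (the data is hypothetical):

```python
import pandas as pd

# Toy group: the last Value (20.0) is higher than every earlier Value
s = pd.Series([10.0, 5.0, 20.0])

# Searching the whole group: the last row always satisfies s >= s.iloc[-1]
with_last = s.loc[s >= s.iloc[-1]].iloc[0]

# Searching only the rows before the last one: nothing qualifies
earlier = s.iloc[:-1]
without_last = earlier[earlier >= s.iloc[-1]]

print(with_last)           # 20.0
print(without_last.empty)  # True
```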

If you are not certain that an earlier Value greater than or equal to the last row's always exists, use the following f() instead, which returns NaN when there is none.

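import numpy as np

def f(grp):
    earlier = grp.iloc[:-1]
    try:
        return earlier[earlier >= grp.iloc[-1]].iloc[0]
    except IndexError:
        # No earlier row reaches the last Value (or the group has one row)
        return np.nan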
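With millions of rows, calling a Python function once per group can be slow. A possibly faster alternative (not part of the original answer, and not benchmarked here) stays within vectorized pandas operations: broadcast each group's last Value onto its rows with transform('last'), flag each group's final row with cumcount(ascending=False), keep earlier rows that reach the last Value, and take the first one per group. This sketch assumes the question's column names and that rows are already in time order within each group; the sample data is hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the real DataFrame.
# Assumption: rows are already in time order within each (Instrument, Date) group.
df = pd.DataFrame({
    'Instrument': ['AAD'] * 5 + ['BBB'] * 3,
    'Date': ['4/18/2012'] * 8,
    'Value': [30000.0, 37491.87, 31000.0, 40000.0, 32437.5, 10.0, 5.0, 20.0],
})

keys = ['Instrument', 'Date']
grouped = df.groupby(keys)

# Each group's last Value, broadcast back onto every row of that group
# ('last' skips NaN, which is fine as long as Value has no missing data)
last_value = grouped['Value'].transform('last')

# True only on the final row of each group (cumcount counts down to 0 there)
is_last_row = grouped.cumcount(ascending=False) == 0

# Earlier rows whose Value is at least the group's last Value
candidates = df[(df['Value'] >= last_value) & ~is_last_row]

# First qualifying row per group; groups with no match get NaN
res = candidates.groupby(keys)['Value'].first()
res = res.reindex(grouped['Value'].last().index)

print(res)
```

Both versions should give the same result under the stated assumptions; the vectorized one trades readability for avoiding a Python-level call per group.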

huangapple
  • This article was posted on January 6, 2020 at 18:20:28
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59610318.html