Pandas update previous records because future peeking is not possible

Question

This is what I have so far:

import numpy as np
import pandas as pd

df = pd.DataFrame({'color': [None, None, 'blue', None, None, None, 'orange', None, None, None, None],
                   'bottom': [1, 2, 7, 5, 9, 9, 5, 4, 5, 5, 3],
                   'top': [5, 5, 11, 8, 10, 10, 9, 7, 10, 6, 7]})

print(df)

"""
     color  bottom  top
0     None       1    5
1     None       2    5
2     blue       7   11
3     None       5    8
4     None       9   10
5     None       9   10
6   orange       5    9
7     None       4    7
8     None       5   10
9     None       5    6
10    None       3    7
"""

# lookback period
N = 3

# Pivot each color to own column and shift
df2 = (df.pivot(columns='color', values=['top', 'bottom'])
         .drop(columns=np.nan, level=1)
         .ffill(limit=N-1).shift()
       )


# compare the current top with the bottom & top of the color occurrence
out = df.join((df2['bottom'].le(df['top'], axis=0)
               & df2['top'].ge(df['top'], axis=0)).astype(int))
print(out)


"""
     color  bottom  top  blue  orange
0     None       1    5     0       0
1     None       2    5     0       0
2     blue       7   11     0       0
3     None       5    8     1       0
4     None       9   10     1       0
5     None       9   10     1       0
6   orange       5    9     0       0
7     None       4    7     0       1
8     None       5   10     0       0
9     None       5    6     0       1
10    None       3    7     0       0
"""
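
To make the lookback window concrete, here is a minimal sketch of what `.ffill(limit=N-1).shift()` does to a single column. The Series `s` and the name `window` are toy values made up for illustration; only `N = 3` comes from the code above.

```python
import numpy as np
import pandas as pd

N = 3  # lookback period, as in the code above

# Toy column: one marker value at row 2, NaN everywhere else
s = pd.Series([np.nan, np.nan, 11.0, np.nan, np.nan, np.nan, np.nan])

# ffill(limit=N-1) copies the value into the next N-1 rows;
# shift() then moves everything down by one row, so the value is
# visible on the N rows after the marker and not on the marker row itself.
window = s.ffill(limit=N - 1).shift()

visible_rows = window.dropna().index.tolist()
print(visible_rows)
```

Because both steps only carry values forward, no row ever sees data from a later row.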

Question:

I only want to consume each color once. That is, for every blue or orange occurrence there can be only one 1 in the following 3 rows.
(Two blues in a row should result in two 1s: one 1 for every blue.)

"""
     color  bottom  top  blue  orange
0     None       1    5     0       0
1     None       2    5     0       0
2     blue       7   11     0       0
3     None       5    8     1       0
4     None       9   10     1       0 --> this should be 0, blue already consumed on row 3
5     None       9   10     1       0 --> this should be 0, blue already consumed on row 3
6   orange       5    9     0       0
7     None       4    7     0       1
8     None       5   10     0       0
9     None       5    6     0       1 --> this should be 0, orange already consumed on row 7
10    None       3    7     0       0
"""

One constraint is that, for this to work correctly, I am not allowed to peek into the future. So I cannot use .shift(-3) or iloc[-1], for example.

That rules out my initial idea of tracking a consumed state with something like .rolling(-3).max() == 1.

Answer 1

Score: 1

You can post-process the output to only keep the first 1 per group:

# lookback period
N = 3

# Pivot each color to own column and shift
df2 = (df.pivot(columns='color', values=['top', 'bottom'])
         .drop(columns=np.nan, level=1)
         .ffill(limit=N-1).shift()
       )

# compare the current top with the bottom & top of the color occurrence
out = df.join((df2['bottom'].le(df['top'], axis=0)
               & df2['top'].ge(df['top'], axis=0)).astype(int))

# post process the output to keep only the first 1
cols = list(df['color'].dropna().unique())

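# each color occurrence starts a new group; count the 1s seen so far in it
group = df['color'].notna().cumsum()
running = out.groupby(group)[cols].cumsum()

# a 1 is kept only where the running count still equals 1 (the first hit)
out[cols] = out[cols].mask(out[cols].ne(running), 0)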
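
To see why comparing against a grouped cumulative sum keeps only the first 1, here is a small standalone sketch. The `flags` and `groups` values are made up for illustration and are not taken from the question's data.

```python
import pandas as pd

# Hypothetical 0/1 flags and the group each row belongs to
flags = pd.Series([0, 1, 1, 1, 0, 1, 0, 1])
groups = pd.Series([0, 0, 0, 0, 1, 1, 1, 1])

# Running count of 1s inside each group
running = flags.groupby(groups).cumsum()

# The first 1 of a group is the only row where flag == running count == 1;
# every later 1 has a running count of 2 or more and is reset to 0.
first_only = flags.mask(flags.ne(running), 0)
print(first_only.tolist())
```

Rows that were already 0 may also be masked, but they are replaced with 0, so they do not change.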

Or with a loop:

cols = list(df['color'].dropna().unique())

g = out.groupby(df['color'].notna().cumsum())
for c in cols:
    out[c] = np.where(out[c].eq(1) & df.index.isin(g[c].idxmax()), 1, 0)

Output:

     color  bottom  top  blue  orange
0     None       1    5     0       0
1     None       2    5     0       0
2     blue       7   11     0       0
3     None       5    8     1       0
4     None       9   10     0       0
5     None       9   10     0       0
6   orange       5    9     0       0
7     None       4    7     0       1
8     None       5   10     0       0
9     None       5    6     0       0
10    None       3    7     0       0
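
Both variants above group rows by "most recent color occurrence", so a color's window is split when another color appears inside it. If that matters, one possible alternative is an explicit forward-only loop that tracks each occurrence until it is consumed or expires. This is only a sketch, not part of the original answer: the function name `consume_once` is hypothetical, and it assumes that a row may consume at most one open occurrence per color, oldest first.

```python
import pandas as pd


def consume_once(df: pd.DataFrame, n: int = 3) -> pd.DataFrame:
    """Flag the first row, within n rows after a color occurrence, whose
    top lies between that occurrence's bottom and top. Only the current
    row and earlier rows are ever read."""
    colors = list(df["color"].dropna().unique())
    flags = {c: [0] * len(df) for c in colors}
    open_occurrences = []  # each entry: [color, bottom, top, rows_left]

    for i, (color, bottom, top) in enumerate(zip(df["color"], df["bottom"], df["top"])):
        hit_colors = set()
        remaining = []
        for c, lo, hi, rows_left in open_occurrences:
            # Assumption: one row consumes at most one occurrence per color
            if c not in hit_colors and lo <= top <= hi:
                flags[c][i] = 1
                hit_colors.add(c)
                continue  # consumed, so it is not carried forward
            if rows_left > 1:
                remaining.append([c, lo, hi, rows_left - 1])
        open_occurrences = remaining

        # A new occurrence becomes eligible starting with the next row
        if pd.notna(color):
            open_occurrences.append([color, bottom, top, n])

    return df.assign(**flags)


df = pd.DataFrame({"color": [None, None, "blue", None, None, None, "orange", None, None, None, None],
                   "bottom": [1, 2, 7, 5, 9, 9, 5, 4, 5, 5, 3],
                   "top": [5, 5, 11, 8, 10, 10, 9, 7, 10, 6, 7]})

out = consume_once(df, n=3)
print(out)
```

On the sample data this flags blue on row 3 and orange on row 7 only, the same result as the output shown above. Unlike the pivot-based code, it keeps a repeated color's earlier occurrence open instead of overwriting it, so results can differ when the same color appears twice within the lookback period.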

huangapple
  • Published on March 8, 2023 at 17:10:32
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75671167.html