Applying Groupby with an np.where function to detect a pattern

Question

I've written a function that uses np.where to find a pattern in a DataFrame column. It looks for three consecutive values below 0, each lower than the previous one; if the 4th value is higher than the 3rd, the function returns 1.
The function works, but I need to use groupby to apply it to each name in the table.

Here is the working code:

import pandas as pd
import numpy as np

def PFunc1():
    val = np.where((
        (df1['Score'].shift(+3) < 0) &
        (df1['Score'].shift(+1) < 0) &
        (df1['Score'].shift(+2) < df1['Score'].shift(+3)) &
        (df1['Score'].shift(+1) < df1['Score'].shift(+2)) &
        (df1['Score'] > df1['Score'].shift(+1))), 1, 0)
    return val

df1 = pd.DataFrame()
df1['Name'] = ['A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A',
               'B','B','B','B','B','B','B','B','B','B','B','B','B','B','B','B','B','B','B','B']
df1['Score'] = np.random.randint(-4,3,df1.shape[0])
df1['Pattern'] = PFunc1()
df1.head(50)

When I run the line below, I get the error: TypeError: unhashable type: 'numpy.ndarray'
Applying the same with a lambda function results in NaNs.

df1['Pattern2'] = df1.groupby('Name')['Score'].apply(PFunc1())

Is this possible with np.where or is a different approach needed?
Many thanks

Answer 1

Score: 1

> The function works but I need to use groupby to apply it to all the names in the table.

It looks like you aren't performing an aggregation; you're performing an item-by-item transformation. Therefore, don't use GroupBy.apply(), use GroupBy.transform().
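
To make the aggregation/transformation distinction concrete, here is a minimal sketch on a made-up four-row frame (the data and the variable names `per_group` and `per_row` are assumptions for illustration, not part of the question). An aggregation returns one value per group, while a transformation returns one value per input row, aligned to the original index, which is what a new column needs.

```python
import pandas as pd

# Toy data, invented for this illustration
df = pd.DataFrame({"Name": ["A", "A", "B", "B"], "Score": [1, -2, 3, -4]})

# Aggregation: each group collapses to a single value, indexed by group key
per_group = df.groupby("Name")["Score"].min()

# Transformation: one result per input row, aligned to df's index.
# The first row of each group is NaN because shift() restarts per group.
per_row = df.groupby("Name")["Score"].transform(lambda s: s - s.shift(1))

print(per_group)
print(per_row)
```

Because `per_row` has the same index as `df`, it can be assigned straight to a column such as `df["Diff"] = per_row`.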

The transforming function needs to accept a Series as input, so you should modify PFunc1 to accept an argument.
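
As a small sketch of what "accept a Series" means in practice: pandas calls the function once per group and passes that group's Score values. The helper `rose` and the toy data below are hypothetical, chosen only to show the effect. Note that the function object is passed without parentheses; in the question, `apply(PFunc1())` calls `PFunc1` first and hands `apply` the resulting NumPy array rather than a function, which is the likely source of the TypeError.

```python
import pandas as pd

# Toy data, invented for this illustration
df = pd.DataFrame({"Name": ["A", "A", "A", "B", "B"],
                   "Score": [-1, -2, -3, 0, 5]})

def rose(scores):
    """Hypothetical helper: 1 where a score is higher than the previous one."""
    return (scores > scores.shift(1)).astype(int)

# Whole column at once: B's first row is compared with A's last row
ungrouped = rose(df["Score"])

# Per group: pass the function itself (no parentheses); each group
# starts fresh, so B's first row has no previous value to compare with
grouped = df.groupby("Name")["Score"].transform(rose)

print(ungrouped.tolist())  # [0, 0, 0, 1, 1]
print(grouped.tolist())    # [0, 0, 0, 0, 1]
```

The two results differ only at the first row of group B, which is exactly the boundary that grouping is meant to protect.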

Also, PFunc1 can be slightly streamlined:

  • It doesn't require np.where
  • You don't need to check if scores.shift(+1) < 0
    • (If the first item is less than 0 and each of the next two items is lower than the one before it, then they must also be less than 0, so there's no need to check them.)

import pandas as pd
import numpy as np

def PFunc1(scores):
    return (
        (scores.shift(+3) < 0) &
        (scores.shift(+2) < scores.shift(+3)) &
        (scores.shift(+1) < scores.shift(+2)) &
        (scores > scores.shift(+1))
    ).astype(int)

df1 = pd.DataFrame()
df1['Name'] = ['A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A',
               'B','B','B','B','B','B','B','B','B','B','B','B','B','B','B','B','B','B','B','B']
df1['Score'] = np.random.randint(-4,0,df1.shape[0])

# Process all Scores without respect to Name
df1['Pattern'] = PFunc1(df1['Score'])

# Process Scores for each Name independently
df1['Pattern2'] = df1.groupby('Name')['Score'].transform(PFunc1)
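
As a sanity check on the claim that the `shift(+1) < 0` test is redundant, the sketch below compares the question's np.where logic (rewritten here to take a Series, under the assumed name `pattern_with_where`) against the streamlined version (here called `pattern_streamlined`) on seeded random data. The function names, seed, and sample size are illustrative choices, not part of the original answer.

```python
import numpy as np
import pandas as pd

def pattern_with_where(scores):
    """The question's logic, parameterized; keeps the shift(+1) < 0 check."""
    return np.where(
        (scores.shift(3) < 0) &
        (scores.shift(1) < 0) &
        (scores.shift(2) < scores.shift(3)) &
        (scores.shift(1) < scores.shift(2)) &
        (scores > scores.shift(1)), 1, 0)

def pattern_streamlined(scores):
    """The answer's logic: no np.where, no shift(+1) < 0 check."""
    return (
        (scores.shift(3) < 0) &
        (scores.shift(2) < scores.shift(3)) &
        (scores.shift(1) < scores.shift(2)) &
        (scores > scores.shift(1))
    ).astype(int)

# Same value range as the question: integers from -4 to 2
rng = np.random.default_rng(0)
scores = pd.Series(rng.integers(-4, 3, size=1000))

same = np.array_equal(pattern_with_where(scores),
                      pattern_streamlined(scores).to_numpy())
print(same)  # True

# Hand-checkable case: three falling negatives, then a rise
small = pd.Series([-1, -2, -3, 0])
print(pattern_streamlined(small).tolist())  # [0, 0, 0, 1]
```

The first three rows of any series (or of any group, when used with transform) are always 0, because comparisons against the NaN values produced by shift() evaluate to False.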

huangapple
  • Posted on May 20, 2023 at 23:44:28
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76296050.html