Applying Groupby with an np.where function to detect a pattern
Question
I've made a function that uses np.where to find a pattern in a DataFrame column. It looks for a run of three negative values, each lower than the one before. If the 4th value is higher than the 3rd, the function returns 1.
The function works but I need to use groupby to apply it to all the names in the table.
Here is the working code:

    import pandas as pd
    import numpy as np

    def PFunc1():
        val = np.where((
            (df1['Score'].shift(+3) < 0) &
            (df1['Score'].shift(+1) < 0) &
            (df1['Score'].shift(+2) < df1['Score'].shift(+3)) &
            (df1['Score'].shift(+1) < df1['Score'].shift(+2)) &
            (df1['Score'] > df1['Score'].shift(+1))), 1, 0)
        return val

    df1 = pd.DataFrame()
    df1['Name'] = ['A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A',
                   'B','B','B','B','B','B','B','B','B','B','B','B','B','B','B','B','B','B','B','B']
    df1['Score'] = np.random.randint(-4, 3, df1.shape[0])
    df1['Pattern'] = PFunc1()
    df1.head(50)

When I run the line below, I get the error: TypeError: unhashable type: 'numpy.ndarray'
Applying the same with a lambda function results in NaNs.

    df1['Pattern2'] = df1.groupby('Name')['Score'].apply(PFunc1())

Is this possible with np.where or is a different approach needed?
Many thanks
Answer 1
Score: 1
> The function works but I need to use groupby to apply it to all the names in the table.

It looks like you aren't performing an aggregation; you're performing an item-by-item transformation. Therefore, don't use GroupBy.apply(), use GroupBy.transform().

The transforming function needs to accept a Series as input, so you should modify PFunc1 to accept an argument.
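
As a minimal sketch of what transform expects, the example below uses a made-up helper (change_from_previous and the small DataFrame are hypothetical, not from the question). The function receives one group's values as a Series and returns a result of the same length, which pandas places back on the original index:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["A", "A", "B", "B"],
    "Score": [1, 3, 10, 30],
})

def change_from_previous(scores: pd.Series) -> pd.Series:
    # Receives one group's Score values at a time as a Series, so
    # shift(1) never reaches back into another group's rows.
    return scores - scores.shift(1)

# Pass the function itself (no parentheses); pandas calls it per group.
result = df.groupby("Name")["Score"].transform(change_from_previous)
print(result)
```

The first row of each group comes out as NaN because there is no previous value inside that group, which is the per-group behaviour the question is after.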

Also, PFunc1 can be slightly streamlined:
- It doesn't require np.where
- You don't need to check if scores.shift(+1) < 0
  (If the first item is less than 0 and the next two items are even less than the first, then there's no need to check if they are also less than 0.)
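
The second point can be sanity-checked empirically. The sketch below compares a version that keeps the extra check against one that drops it, on seeded random data; the function names with_extra_check and without_extra_check are illustrative only:

```python
import numpy as np
import pandas as pd

def with_extra_check(scores):
    # Same conditions as the question's version, including shift(1) < 0.
    return (
        (scores.shift(3) < 0) &
        (scores.shift(1) < 0) &
        (scores.shift(2) < scores.shift(3)) &
        (scores.shift(1) < scores.shift(2)) &
        (scores > scores.shift(1))
    ).astype(int)

def without_extra_check(scores):
    # shift(1) < shift(2) < shift(3) < 0 already implies shift(1) < 0.
    return (
        (scores.shift(3) < 0) &
        (scores.shift(2) < scores.shift(3)) &
        (scores.shift(1) < scores.shift(2)) &
        (scores > scores.shift(1))
    ).astype(int)

rng = np.random.default_rng(0)  # fixed seed so the check is repeatable
scores = pd.Series(rng.integers(-4, 3, size=1000))

same = with_extra_check(scores).equals(without_extra_check(scores))
print(same)
```

A random check like this is not a proof, but the two versions agree for the reason given in the list: the ordering conditions already force the later values below zero.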

    import pandas as pd
    import numpy as np

    def PFunc1(scores):
        # 1 where the previous three values are negative and strictly
        # falling, and the current value is higher than the one before it.
        return (
            (scores.shift(+3) < 0) &
            (scores.shift(+2) < scores.shift(+3)) &
            (scores.shift(+1) < scores.shift(+2)) &
            (scores > scores.shift(+1))
        ).astype(int)

    df1 = pd.DataFrame()
    df1['Name'] = ['A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A',
                   'B','B','B','B','B','B','B','B','B','B','B','B','B','B','B','B','B','B','B','B']
    df1['Score'] = np.random.randint(-4, 0, df1.shape[0])

    # Process all Scores without respect to Name
    df1['Pattern'] = PFunc1(df1['Score'])

    # Process Scores for each Name independently
    df1['Pattern2'] = df1.groupby('Name')['Score'].transform(PFunc1)

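
Because the data above is random, the difference between the two columns is easy to miss. Here is a small deterministic sketch (the scores are made up for illustration) where the ungrouped version flags the first B row by looking back into A's values, while the grouped version does not:

```python
import pandas as pd

def PFunc1(scores):
    # Copied from the answer above so this block runs on its own.
    return (
        (scores.shift(3) < 0) &
        (scores.shift(2) < scores.shift(3)) &
        (scores.shift(1) < scores.shift(2)) &
        (scores > scores.shift(1))
    ).astype(int)

df = pd.DataFrame({
    "Name":  ["A", "A", "A", "A", "A", "A", "A", "B", "B"],
    "Score": [-1,  -2,  -3,   0,  -1,  -2,  -3,   2,  -1],
})

# Ignores Name: the first B row sees A's trailing -1, -2, -3.
ungrouped = PFunc1(df["Score"])
# Restarts the look-back window at the start of each Name.
grouped = df.groupby("Name")["Score"].transform(PFunc1)

print(ungrouped.tolist())  # [0, 0, 0, 1, 0, 0, 0, 1, 0]
print(grouped.tolist())    # [0, 0, 0, 1, 0, 0, 0, 0, 0]
```

The match at index 3 lies entirely inside group A, so both versions report it. Only the match at index 7, which straddles the A/B boundary, disappears once the function is applied per group.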