2023-05-20 23:44:28

Applying Groupby with an np.where function to detect a pattern

Question

I've made a function that finds a pattern in a series in a dataframe using np.where. The function finds a series of three <0 values where each is lower than the previous. If the 4th value is higher than the 3rd, the function returns 1.

Here is the working code:

import pandas as pd
import numpy as np
def PFunc1():
    val = np.where((
        (df1['Score'].shift(+3) < 0) &
        (df1['Score'].shift(+1) < 0) &
        (df1['Score'].shift(+2) < df1['Score'].shift(+3)) &
        (df1['Score'].shift(+1) < df1['Score'].shift(+2)) &
        (df1['Score'] > df1['Score'].shift(+1))), 1, 0)
    return val
df1 = pd.DataFrame()
df1['Name'] = ['A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A',
               'B','B','B','B','B','B','B','B','B','B','B','B','B','B','B','B','B','B','B','B']
df1['Score'] = np.random.randint(-4,3,df1.shape[0])
df1['Pattern'] = PFunc1()
df1.head(50)

When I run the line below, I get the error: TypeError: unhashable type: 'numpy.ndarray'
Applying the same with a lambda function results in NaNs.

df1['Pattern2'] = df1.groupby('Name')['Score'].apply(PFunc1())

Is this possible with np.where or is a different approach needed?
Many thanks

Answer 1

Score: 1

> The function works but I need to use groupby to apply it to all the names in the table.

It looks like you aren't performing an aggregation; you're performing an item-by-item transformation. Therefore, don't use GroupBy.apply(), use GroupBy.transform().

The transforming function needs to accept a Series as input, so you should modify PFunc1 to accept an argument.
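
To answer the np.where part of the question: np.where itself is not the obstacle. In `apply(PFunc1())` the parentheses call PFunc1 first, so apply receives the returned array rather than a function, which is presumably what triggers the TypeError. A minimal sketch, assuming a hypothetical `pattern_with_where` helper and a small hand-made `frame` (neither is from the question), keeps the original conditions and np.where but takes the Series as an argument:

```python
import numpy as np
import pandas as pd

def pattern_with_where(scores):
    """Hypothetical np.where variant of the question's PFunc1 that takes a Series."""
    flags = np.where(
        (scores.shift(3) < 0)
        & (scores.shift(1) < 0)
        & (scores.shift(2) < scores.shift(3))
        & (scores.shift(1) < scores.shift(2))
        & (scores > scores.shift(1)),
        1,
        0,
    )
    # Wrap the ndarray in a Series so the result keeps the group's index.
    return pd.Series(flags, index=scores.index)

# Hand-made data (an assumption, chosen so each group contains one match).
frame = pd.DataFrame({
    "Name": list("AAAAA") + list("BBBBB"),
    "Score": [-1, -2, -3, 1, 0, -1, -3, -4, -2, 2],
})

# Pass the function itself (no parentheses); transform calls it once per group.
frame["Pattern"] = frame.groupby("Name")["Score"].transform(pattern_with_where)
print(frame)
```

Here the fourth row of each group is flagged, because it is the first rise after three successively lower negative scores.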

Also, PFunc1 can be slightly streamlined:

- It doesn't require np.where.
- You don't need to check if scores.shift(+1) < 0. (If the first item is less than 0 and each of the next two items is lower than the one before it, those two are necessarily less than 0 as well.)

import pandas as pd
import numpy as np
def PFunc1(scores):
    return (
        (scores.shift(+3) < 0) &
        (scores.shift(+2) < scores.shift(+3)) &
        (scores.shift(+1) < scores.shift(+2)) &
        (scores > scores.shift(+1))
    ).astype(int)
df1 = pd.DataFrame()
df1['Name'] = ['A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A',
               'B','B','B','B','B','B','B','B','B','B','B','B','B','B','B','B','B','B','B','B']
df1['Score'] = np.random.randint(-4,0,df1.shape[0])
# Process all Scores without respect to Name
df1['Pattern'] = PFunc1(df1['Score'])
# Process Scores for each Name independently
df1['Pattern2'] = df1.groupby('Name')['Score'].transform(PFunc1)
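
Because the scores above are random, the difference between the two columns is easy to miss. The following sketch uses a small fixed `demo` frame (hypothetical data, not from the question) in which one match straddles the boundary between the two names, so the ungrouped and grouped results differ in a predictable place:

```python
import pandas as pd

def PFunc1(scores):
    # Same logic as the answer's function: three falling negative values, then a rise.
    return (
        (scores.shift(3) < 0)
        & (scores.shift(2) < scores.shift(3))
        & (scores.shift(1) < scores.shift(2))
        & (scores > scores.shift(1))
    ).astype(int)

# Hypothetical data: A ends with -1, -2, -3 and B starts with 0.
demo = pd.DataFrame({
    "Name": ["A", "A", "A", "B", "B", "B", "B", "B"],
    "Score": [-1, -2, -3, 0, -1, -2, -3, -2],
})

# Ignoring Name, the first B row looks like a rise after A's three falling values.
demo["Pattern"] = PFunc1(demo["Score"])

# Per Name, shift() only sees rows from the same group, so that match disappears.
demo["Pattern2"] = demo.groupby("Name")["Score"].transform(PFunc1)

print(demo)
```

Row 3 is flagged only in Pattern, since its three preceding values belong to A. Row 7 is flagged in both columns, since all four values involved belong to B.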
