February 24, 2023, 01:05:34

Pandas comparing values based on month and year

Question

In my current problem, I would like to compare values in two different columns that are aligned to a specific month and return a value that acts as an identifier. My dataframe looks like the following:

    Account    year   month   value_1  value_2
    A          2021   Jan              9
    A          2021   Feb
    A          2021   Mar       7
    A          2021   Apr       8
    A          2021   May
    B          2021   Jan       2
    B          2021   Feb              10
    B          2021   Mar       5
    B          2021   Apr       7

In the above scenario, for account A, the value 9 in the value_2 column appeared in an earlier month than any value in value_1, and for account B, the value 2 in the value_1 column appeared before the value 10 in the value_2 column. Essentially, I want to check, for every account, whether the first value_2 value appeared before the first value_1 value. The rows are already sorted by year and month.

What I want to do is to create a resultant dataframe that shows the following:

    account      result
    A            value_2 appeared before value_1
    B            value_1 appeared before value_2

How can I achieve this?

Answer 1

Score: 1

Assuming each group has at least one value and no row ever has values in both columns, this should work for you.

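# df is the question's DataFrame, already sorted by year and month, with NaN in blank cells
def check_order(group):
    # Index label of the first non-null entry in each column (None if there is none);
    # comparing labels assumes the index increases with row order (e.g. the default RangeIndex)
    first_1 = group['value_1'].first_valid_index()
    first_2 = group['value_2'].first_valid_index()
    if first_1 is None and first_2 is None:
        return None  # no values at all in this group
    if first_2 is None or (first_1 is not None and first_1 < first_2):
        return 'value_1 appeared before value_2'
    if first_1 is None or first_2 < first_1:
        return 'value_2 appeared before value_1'
    return None  # both first values share a row, which the assumption rules out

result = df.groupby('Account').apply(check_order).reset_index(name='result')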
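
To try the idea end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question, assuming the blank cells are NaN rather than empty strings. It is a variation on the answer, not the answer's exact code: it compares row positions within each group instead of index labels, so it does not depend on the index, and it loops over the groups explicitly instead of calling `groupby(...).apply`. The helper name `first_position` is made up for this illustration.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question; blank cells are assumed to be NaN.
df = pd.DataFrame({
    "Account": ["A"] * 5 + ["B"] * 4,
    "year": [2021] * 9,
    "month": ["Jan", "Feb", "Mar", "Apr", "May", "Jan", "Feb", "Mar", "Apr"],
    "value_1": [np.nan, np.nan, 7, 8, np.nan, 2, np.nan, 5, 7],
    "value_2": [9, np.nan, np.nan, np.nan, np.nan, np.nan, 10, np.nan, np.nan],
})


def first_position(series):
    """Return the 0-based row position of the first non-null entry, or None."""
    mask = series.notna().to_numpy()
    return int(mask.argmax()) if mask.any() else None


def check_order(group):
    """Say which column has the earlier first value within one account."""
    pos_1 = first_position(group["value_1"])
    pos_2 = first_position(group["value_2"])
    if pos_1 is None and pos_2 is None:
        return None  # neither column has a value
    if pos_2 is None or (pos_1 is not None and pos_1 < pos_2):
        return "value_1 appeared before value_2"
    if pos_1 is None or pos_2 < pos_1:
        return "value_2 appeared before value_1"
    return None  # both first values are in the same row


# Rows are assumed to be sorted by year and month already, as in the question.
rows = [
    {"Account": account, "result": check_order(group)}
    for account, group in df.groupby("Account", sort=True)
]
result = pd.DataFrame(rows)
print(result)
```

With the sample data, account A should come out as "value_2 appeared before value_1" and account B as "value_1 appeared before value_2". If the real data stores blanks as empty strings, they would need to be converted to NaN first for `notna()` to treat them as missing.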
