2023-02-24 15:41:47

Iterating on group of columns in a dataframe from custom list - pandas

Question

I have a DataFrame df like this:

TxnId  TxnDate     TxnCount
  100  2023-02-01         2
  500  2023-02-01         1
  400  2023-02-01         4
  100  2023-02-02         3
  500  2023-02-02         5
  100  2023-02-03         3
  500  2023-02-03         5
  400  2023-02-03         2

I have the following custom lists:

import datetime

datelist = [datetime.date(2023, 2, 3), datetime.date(2023, 2, 2)]
txnlist = [400, 500]

I want to iterate over df using the logic below:

for every txn in txnlist:
     sum = 0
     for every date in datelist:
           sum += df[txn][date].TxnCount
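
Read literally, the pseudocode above corresponds to something like the following loop. This is only a sketch: the sample DataFrame is rebuilt from the table in the question, TxnDate is assumed to be stored as strings, and the names dates, totals and total are introduced here for illustration.

```python
import datetime

import pandas as pd

# Sample data from the question; TxnDate is assumed to hold strings.
df = pd.DataFrame({
    "TxnId": [100, 500, 400, 100, 500, 100, 500, 400],
    "TxnDate": ["2023-02-01", "2023-02-01", "2023-02-01", "2023-02-02",
                "2023-02-02", "2023-02-03", "2023-02-03", "2023-02-03"],
    "TxnCount": [2, 1, 4, 3, 5, 3, 5, 2],
})
datelist = [datetime.date(2023, 2, 3), datetime.date(2023, 2, 2)]
txnlist = [400, 500]

# Convert once so each row can be compared with datetime.date values.
dates = pd.to_datetime(df["TxnDate"]).dt.date

totals = {}
for txn in txnlist:
    total = 0
    for date in datelist:
        # Rows matching this (TxnId, date) pair; an empty selection sums to 0.
        rows = df[(df["TxnId"] == txn) & (dates == date)]
        total += int(rows["TxnCount"].sum())
    totals[txn] = total

print(totals)
```

The loop scans the whole frame once per (txn, date) pair, which is why vectorized filtering is usually preferred for larger data.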

I would also like to understand how to find the average TxnCount for the filtered TxnIds.

After the sum step, based on the above input and filters:

 TxnId         TxnCount
  400          2
  500          10

Average corresponding to TxnId 400 = (2+0)/2 = 1

Average corresponding to TxnId 500 = (5+5)/2 = 5

If the average is greater than 3, add the row to breachList:

breachList = [[500, 10]]

How can I do this in pandas?

Answer 1

Score: 2

First filter the DataFrame by both lists, using boolean indexing with Series.isin:

df1 = df[df['TxnId'].isin(txnlist) & pd.to_datetime(df['TxnDate']).dt.date.isin(datelist)]
print(df1)
   TxnId     TxnDate  TxnCount
4    500  2023-02-02         5
6    500  2023-02-03         5
7    400  2023-02-03         2

Then sum the TxnCount column per group:

out = df1.groupby('TxnId', as_index=False)['TxnCount'].sum()
print(out)
   TxnId  TxnCount
0    400         2
1    500        10
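
To run the two steps above end to end, a minimal self-contained sketch follows. The DataFrame is rebuilt from the sample in the question; storing TxnDate as strings is an assumption, and the names id_mask, date_mask and sums are introduced here only for readability.

```python
import datetime

import pandas as pd

# Sample data from the question; TxnDate is assumed to hold strings.
df = pd.DataFrame({
    "TxnId": [100, 500, 400, 100, 500, 100, 500, 400],
    "TxnDate": ["2023-02-01", "2023-02-01", "2023-02-01", "2023-02-02",
                "2023-02-02", "2023-02-03", "2023-02-03", "2023-02-03"],
    "TxnCount": [2, 1, 4, 3, 5, 3, 5, 2],
})
datelist = [datetime.date(2023, 2, 3), datetime.date(2023, 2, 2)]
txnlist = [400, 500]

# One boolean mask per list; both must hold for a row to be kept.
id_mask = df["TxnId"].isin(txnlist)
date_mask = pd.to_datetime(df["TxnDate"]).dt.date.isin(datelist)

df1 = df[id_mask & date_mask]
sums = df1.groupby("TxnId", as_index=False)["TxnCount"].sum()
print(sums)
```

Splitting the condition into two named masks makes it easier to inspect which of the two filters dropped a given row.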

EDIT: To filter TxnId values by their average (here, greater than 4), use:

df1 = df[df['TxnId'].isin(txnlist) & pd.to_datetime(df['TxnDate']).dt.date.isin(datelist)]
print(df1)
   TxnId     TxnDate  TxnCount
4    500  2023-02-02         5
6    500  2023-02-03         5
7    400  2023-02-03         2

# create averages per TxnId
out = df1.groupby('TxnId')['TxnCount'].mean()
print(out)
TxnId
400    2.0
500    5.0
Name: TxnCount, dtype: float64

# get TxnId values whose average is greater than 4
TxnId = out[out > 4].index
print(TxnId)
Int64Index([500], dtype='int64', name='TxnId')

Filter rows in df or df1:

df2 = df[df['TxnId'].isin(TxnId)]
print(df2)
   TxnId     TxnDate  TxnCount
1    500  2023-02-01         1
4    500  2023-02-02         5
6    500  2023-02-03         5

df3 = df1[df1['TxnId'].isin(TxnId)]
print(df3)
   TxnId     TxnDate  TxnCount
4    500  2023-02-02         5
6    500  2023-02-03         5
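
One thing to keep in mind with the averages above: groupby(...).mean() divides by the number of rows that survived the filter, not by the number of dates in datelist, so TxnId 400 averages to 2 rather than the 1 expected in the question. A small sketch of the difference, using the three filtered rows shown above (n_dates, mean_present and mean_per_date are names introduced here for illustration):

```python
import pandas as pd

# The three rows left after filtering (df1 in the answer).
df1 = pd.DataFrame(
    {"TxnId": [500, 500, 400],
     "TxnDate": ["2023-02-02", "2023-02-03", "2023-02-03"],
     "TxnCount": [5, 5, 2]},
    index=[4, 6, 7],
)
n_dates = 2  # len(datelist) in the question

grouped = df1.groupby("TxnId")["TxnCount"]
mean_present = grouped.mean()            # averages only over rows that exist
mean_per_date = grouped.sum() / n_dates  # counts a missing date as 0

print(mean_present)
print(mean_per_date)
```

Dividing the sum by len(datelist) is one way to get the per-date average the question describes; pivoting with fill_value=0 is another.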

EDIT1: For the expected output:

First filter by the lists (to avoid processing all rows):

df1 = df[df['TxnId'].isin(txnlist) & pd.to_datetime(df['TxnDate']).dt.date.isin(datelist)]
print(df1)
   TxnId     TxnDate  TxnCount
4    500  2023-02-02         5
6    500  2023-02-03         5
7    400  2023-02-03         2

Pivot to get all TxnDate/TxnId combinations:

out = df1.pivot_table(index='TxnId',
                      columns='TxnDate',
                      values='TxnCount',
                      aggfunc='sum',
                      fill_value=0)
print(out)
TxnDate  2023-02-02  2023-02-03
TxnId                          
400               0           2
500               5           5

Finally, keep the row sums whose row-wise mean is greater than 3 and convert them to a list:

breachList = out.sum(axis=1)[out.mean(axis=1).gt(3)].reset_index().to_numpy().tolist()
print(breachList)
[[500, 10]]
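
The filter, pivot and threshold steps could also be wrapped in a small helper. The sketch below is one possible packaging, not part of the original answer: find_breaches and its threshold parameter are hypothetical names, and TxnDate is again assumed to be stored as strings.

```python
import datetime

import pandas as pd


def find_breaches(df, txnlist, datelist, threshold):
    """Return [TxnId, total] pairs whose per-date average exceeds threshold."""
    mask = (df["TxnId"].isin(txnlist)
            & pd.to_datetime(df["TxnDate"]).dt.date.isin(datelist))
    # One row per TxnId, one column per date; missing pairs become 0.
    table = df[mask].pivot_table(index="TxnId", columns="TxnDate",
                                 values="TxnCount", aggfunc="sum",
                                 fill_value=0)
    totals = table.sum(axis=1)
    keep = table.mean(axis=1).gt(threshold)
    return totals[keep].reset_index().to_numpy().tolist()


# Sample data from the question.
df = pd.DataFrame({
    "TxnId": [100, 500, 400, 100, 500, 100, 500, 400],
    "TxnDate": ["2023-02-01", "2023-02-01", "2023-02-01", "2023-02-02",
                "2023-02-02", "2023-02-03", "2023-02-03", "2023-02-03"],
    "TxnCount": [2, 1, 4, 3, 5, 3, 5, 2],
})
datelist = [datetime.date(2023, 2, 3), datetime.date(2023, 2, 2)]
txnlist = [400, 500]

breach_list = find_breaches(df, txnlist, datelist, threshold=3)
print(breach_list)
```

Note that a date in datelist with no rows at all after filtering does not become a column here, so it would not count toward the average; reindexing the columns against datelist is one way to handle that case.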

Answer 2

Score: 0

The fact that you are using a nested loop is reminiscent of a 2D pivot_table (or crosstab):

df['TxnDate'] = pd.to_datetime(df['TxnDate'])

out = (df.pivot_table(index='TxnId', columns='TxnDate',
                      values='TxnCount', aggfunc='sum',
                      fill_value=0)
         # keyword arguments; datelist is converted to match the datetime columns
         .reindex(index=txnlist, columns=pd.to_datetime(datelist))
       )

Output:

TxnDate  2023-02-03  2023-02-02
TxnId                          
400               2           0
500               5           5

And if you want to further aggregate on Ids (or Date):

out.sum(axis=1)

TxnId
400     2
500    10
dtype: int64
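
As a runnable sketch of this pivot-and-reindex approach, extended to the breach list asked for in the question: the sample data is rebuilt from the question, the threshold of 3 comes from the question, and the names totals, averages and breach_list are introduced here for illustration. Passing fill_value=0 to reindex is an addition so that a requested date with no data at all would still count as 0.

```python
import datetime

import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "TxnId": [100, 500, 400, 100, 500, 100, 500, 400],
    "TxnDate": ["2023-02-01", "2023-02-01", "2023-02-01", "2023-02-02",
                "2023-02-02", "2023-02-03", "2023-02-03", "2023-02-03"],
    "TxnCount": [2, 1, 4, 3, 5, 3, 5, 2],
})
df["TxnDate"] = pd.to_datetime(df["TxnDate"])

datelist = [datetime.date(2023, 2, 3), datetime.date(2023, 2, 2)]
txnlist = [400, 500]

out = (df.pivot_table(index="TxnId", columns="TxnDate",
                      values="TxnCount", aggfunc="sum",
                      fill_value=0)
         # Keep only the requested ids and dates, in the requested order.
         .reindex(index=txnlist, columns=pd.to_datetime(datelist),
                  fill_value=0))

totals = out.sum(axis=1)
averages = out.mean(axis=1)  # divides by len(datelist)
breach_list = totals[averages.gt(3)].reset_index().to_numpy().tolist()
print(breach_list)
```

Because the pivot is reindexed against datelist, the row-wise mean divides by the number of requested dates, which matches the (2+0)/2 arithmetic in the question.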
