Save a pandas groupby object into a CSV file

Question

I have an issue I cannot figure out, even after reading similar posts such as https://stackoverflow.com/questions/47602097/pandas-groupby-to-to-csv; the approach there does not work for me. I am trying to split a groupby object into its groups and save each group into its own Excel spreadsheet.

Below is a toy example of the code I used to build my groupby object in pandas, with some of the columns.

Now I need to save each group from this object into a separate CSV file, or at least into a separate worksheet in Excel.

import pandas as pd

# Toy inventory data: one row per SKU, two distinct suppliers
dff = pd.DataFrame({
    'SKU': ['001', '002', '003'],
    'revenue_contribution_in_percentage': [0.2, 0.5, 0.3],
    'BuyPrice': [2, 3, 4],
    'SellPrice': [5, 6, 6],
    'margin': [3, 3, 2],
    'Avg_per_week': [3, 2, 5],
    'StockOnHand': [4, 10, 20],
    'StockOnOrder': [0, 0, 0],
    'Supplier': ['ABC', 'ABC', 'ABZ'],
    'SupplierLeadTime': [5, 5, 5],
    'cumul_value': [0.4, 0.6, 1],
    'class_mention': ['A', 'A', 'B'],
    'std_week': [1, 2, 1],
    'review_time': [2, 2, 2],
    'holding_cost': [0.35, 0.35, 0.35],
    'aggregate_order_placement_cost': [200, 230, 210],
})

I did the following to get an object grouped by supplier:

groups = [group.reset_index().set_index(['SKU'])[[
    'revenue_contribution_in_percentage',
    'BuyPrice',
    'SellPrice',
    'margin',
    'Avg_per_week',
    'StockOnHand',
    'StockOnOrder',
    'Supplier',
    'SupplierLeadTime',
    'cumul_value',
    'class_mention',
    'std_week',
    'review_time',
    'holding_cost',
    'aggregate_order_placement_cost',
    'periods']] for _, group in dff.groupby('Supplier')]
df_group = pd.DataFrame(groups).sum()
group_to_excel = df_group.to_csv('results.csv')

The output I would like to get is the following: two distinct datasets that can be saved in CSV format and look like this:

   SKU  revenue_contribution_in_percentage  BuyPrice  SellPrice  margin  \
0  001                                 0.2         2          5       3   
1  002                                 0.5         3          6       3   

   Avg_per_week  StockOnHand  StockOnOrder Supplier  SupplierLeadTime  \
0             3            4             0      ABC                 5   
1             2           10             0      ABC                 5   

   cumul_value class_mention  std_week  review_time  holding_cost  \
0          0.4             A         1            2          0.35   
1          0.6             A         2            2          0.35   

   aggregate_order_placement_cost  
0                             200  
1                             230  

and

   SKU  revenue_contribution_in_percentage  BuyPrice  SellPrice  margin  \
0  003                                 0.3         4          6       2   

   Avg_per_week  StockOnHand  StockOnOrder Supplier  SupplierLeadTime  \
0             5           20             0      ABZ                 5   

   cumul_value class_mention  std_week  review_time  holding_cost  \
0            1             B         1            2          0.35   

   aggregate_order_placement_cost  
0                             210  

At this point my code gives a single (horrendous) worksheet with pretty much nothing on it. I am not sure what is wrong.
I would greatly appreciate some help on this one. Thanks a lot!

Answer 1

Score: 1

You don't need groupby as you are not aggregating anything. What you really want is to slice dff by each unique supplier and export them to their own files. Try this:

# Columns to export, in the order they should appear in each file
cols = [
    'SKU',
    'revenue_contribution_in_percentage',
    'BuyPrice',
    'SellPrice',
    'margin',
    'Avg_per_week',
    'StockOnHand',
    'StockOnOrder',
    'Supplier',
    'SupplierLeadTime',
    'cumul_value',
    'class_mention',
    'std_week',
    'review_time',
    'holding_cost',
    'aggregate_order_placement_cost'
]

# Write one CSV per distinct supplier, e.g. ABC_data.csv and ABZ_data.csv
for supplier in dff['Supplier'].unique():
    sub_dff = dff[dff['Supplier'] == supplier][cols]
    sub_dff.to_csv(f'{supplier}_data.csv')
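
If you would rather keep groupby, iterating over it gives the same per-supplier split. The following is only a minimal sketch under some assumptions: it uses a reduced version of the toy data (fewer columns), the helper name `save_groups_to_csv` is made up for illustration, and the files go to a temporary directory rather than a real output folder.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Reduced version of the toy data from the question (fewer columns for brevity).
dff = pd.DataFrame({
    'SKU': ['001', '002', '003'],
    'BuyPrice': [2, 3, 4],
    'SellPrice': [5, 6, 6],
    'Supplier': ['ABC', 'ABC', 'ABZ'],
})


def save_groups_to_csv(frame, key, out_dir):
    """Hypothetical helper: write one CSV per distinct value of `key`."""
    out_dir = Path(out_dir)
    paths = []
    # Iterating over a groupby yields (group_value, sub_frame) pairs.
    for value, group in frame.groupby(key):
        path = out_dir / f'{value}_data.csv'
        # index=False leaves the row index out of the file.
        group.to_csv(path, index=False)
        paths.append(path)
    return paths


# Assumed output location: a temporary directory, for illustration only.
out_dir = tempfile.mkdtemp()
written = save_groups_to_csv(dff, 'Supplier', out_dir)
print([p.name for p in written])  # ['ABC_data.csv', 'ABZ_data.csv']
```

Note that reading these files back with pd.read_csv will turn SKU values like '001' into integers unless you pass dtype={'SKU': str}. For separate Excel worksheets instead of separate files, the same loop should work with pd.ExcelWriter and group.to_excel(writer, sheet_name=str(value)), provided an Excel engine such as openpyxl is installed.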

huangapple
  • Posted on January 6, 2020 at 21:36:53
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59613095.html