Pandas: Apply function to each group and store result in new column
Question
You can try the following code to achieve this:
import pandas as pd
import numpy as np

item_df = pd.DataFrame({
    'BarCode': ['12345678AAAA', '12345678BBBB', '12345678CCCC',
                '12345678ABCD', '12345678EFGH', '12345678IJKL',
                '67890123XXXX', '67890123YYYY', '67890123ZZZZ',
                '67890123ABCD', '67890123EFGH', '67890123IJKL'],
    'Extracted_Code': ['12345678', '12345678', '12345678', '12345678', '12345678', '12345678',
                       '67890123', '67890123', '67890123', '67890123', '67890123', '67890123'],
    'Description': ['Fruits', 'Fruits', 'Fruits', 'Apples', 'Oranges', 'Mangoes',
                    'Snacks', 'Snacks', 'Snacks', 'Yoghurt', 'Cookies', 'Oats'],
    'Category': ['H', 'H', 'H', 'M', 'T', 'S', 'H', 'H', 'H', 'M', 'M', 'F'],
    'Code': ['0', '2', '3', '1', '2', '4', '0', '2', '3', '3', '4', '2'],
    'Quantity': [99, 77, 10, 52, 11, 90, 99, 77, 10, 52, 11, 90],
    'Price': [12.0, 10.5, 11.0, 15.6, 12.9, 67.0, 12.0, 10.5, 11.0, 15.6, 12.9, 67.0]})
item_df = item_df.sort_values(by=['Extracted_Code', 'Category', 'Code'])

# Columns kept in each record, matching the desired output.
record_cols = ['BarCode', 'Description', 'Category', 'Code', 'Quantity', 'Price']

def create_combined(row, group):
    # Rows with Category 'H' get NaN.
    if row['Category'] == 'H':
        return np.nan
    # Other rows collect the 'H' rows of the same group whose Code is <= their own.
    # Code is stored as a string, so it is compared numerically here.
    group_h = group[group['Category'] == 'H']
    group_h = group_h[group_h['Code'].astype(int) <= int(row['Code'])]
    return group_h[record_cols].to_dict('records')

# group_keys=False keeps the original row labels, so the assignment aligns by index
# (resetting the index here would attach results to the wrong rows after sorting).
item_df['Combined'] = item_df.groupby('Extracted_Code', group_keys=False).apply(
    lambda group: group.apply(lambda row: create_combined(row, group), axis=1))
print(item_df)
This code applies the condition within each group and creates the Combined column. I hope this meets your needs.
I have an item dataframe such as:
import pandas as pd
import numpy as np

item_df = pd.DataFrame({
    'BarCode': ['12345678AAAA', '12345678BBBB', '12345678CCCC',
                '12345678ABCD', '12345678EFGH', '12345678IJKL',
                '67890123XXXX', '67890123YYYY', '67890123ZZZZ',
                '67890123ABCD', '67890123EFGH', '67890123IJKL'],
    'Extracted_Code': ['12345678', '12345678', '12345678', '12345678', '12345678', '12345678',
                       '67890123', '67890123', '67890123', '67890123', '67890123', '67890123'],
    'Description': ['Fruits', 'Fruits', 'Fruits', 'Apples', 'Oranges', 'Mangoes',
                    'Snacks', 'Snacks', 'Snacks', 'Yoghurt', 'Cookies', 'Oats'],
    'Category': ['H', 'H', 'H', 'M', 'T', 'S', 'H', 'H', 'H', 'M', 'M', 'F'],
    'Code': ['0', '2', '3', '1', '2', '4', '0', '2', '3', '3', '4', '2'],
    'Quantity': [99, 77, 10, 52, 11, 90, 99, 77, 10, 52, 11, 90],
    'Price': [12.0, 10.5, 11.0, 15.6, 12.9, 67.0, 12.0, 10.5, 11.0, 15.6, 12.9, 67.0]})
item_df = item_df.sort_values(by=['Extracted_Code', 'Category', 'Code'])
item_df['Combined'] = np.nan  # np.NaN was removed in NumPy 2.0
What I am trying to achieve is a bit tricky. I have to group by ['Extracted_Code'] and, for each group, create a new column Combined. The value of Combined depends on the row:
- For rows with Category='H', Combined is NaN.
- For rows with any other Category (say Category='M'), Combined holds a list of row JSONs (dicts): the rows in the same group that have Category='H' and whose Code is less than or equal to the Code of that row.
My desired result is:
BarCode Extracted_Code Description Category Code Quantity Price Combined
0 12345678AAAA 12345678 Fruits H 0 99 12.0 NaN
1 12345678BBBB 12345678 Fruits H 2 77 10.5 NaN
2 12345678CCCC 12345678 Fruits H 3 10 11.0 NaN
3 12345678ABCD 12345678 Apples M 1 52 15.6 [{'BarCode': '12345678AAAA', 'Description': 'Fruits', 'Category': 'H', 'Code': '0', 'Quantity': 99, 'Price': 12.0}]
4 12345678IJKL 12345678 Mangoes S 4 90 67.0 [{'BarCode': '12345678AAAA', 'Description': 'Fruits', 'Category': 'H', 'Code': '0', 'Quantity': 99, 'Price': 12.0},
{'BarCode': '12345678BBBB', 'Description': 'Fruits', 'Category': 'H', 'Code': '2', 'Quantity': 77, 'Price': 10.5},
{'BarCode': '12345678CCCC', 'Description': 'Fruits', 'Category': 'H', 'Code': '3', 'Quantity': 10, 'Price': 11.0}]
5 12345678EFGH 12345678 Oranges T 2 11 12.9 [{'BarCode': '12345678AAAA', 'Description': 'Fruits', 'Category': 'H', 'Code': '0', 'Quantity': 99, 'Price': 12.0},
{'BarCode': '12345678BBBB', 'Description': 'Fruits', 'Category': 'H', 'Code': '2', 'Quantity': 77, 'Price': 10.5}]
6 67890123IJKL 67890123 Oats F 2 90 67.0 [{'BarCode': '67890123XXXX', 'Description': 'Snacks', 'Category': 'H', 'Code': '0', 'Quantity': 99, 'Price': 12.0},
{'BarCode': '67890123YYYY', 'Description': 'Snacks', 'Category': 'H', 'Code': '2', 'Quantity': 77, 'Price': 10.5}]
7 67890123XXXX 67890123 Snacks H 0 99 12.0 NaN
8 67890123YYYY 67890123 Snacks H 2 77 10.5 NaN
9 67890123ZZZZ 67890123 Snacks H 3 10 11.0 NaN
10 67890123ABCD 67890123 Yoghurt M 3 52 15.6 [{'BarCode': '67890123XXXX', 'Description': 'Snacks', 'Category': 'H', 'Code': '0', 'Quantity': 99, 'Price': 12.0},
{'BarCode': '67890123YYYY', 'Description': 'Snacks', 'Category': 'H', 'Code': '2', 'Quantity': 77, 'Price': 10.5},
{'BarCode': '67890123ZZZZ', 'Description': 'Snacks', 'Category': 'H', 'Code': '3', 'Quantity': 10, 'Price': 11.0}]
11 67890123EFGH 67890123 Cookies M 4 11 12.9 [{'BarCode': '67890123XXXX', 'Description': 'Snacks', 'Category': 'H', 'Code': '0', 'Quantity': 99, 'Price': 12.0},
{'BarCode': '67890123YYYY', 'Description': 'Snacks', 'Category': 'H', 'Code': '2', 'Quantity': 77, 'Price': 10.5},
{'BarCode': '67890123ZZZZ', 'Description': 'Snacks', 'Category': 'H', 'Code': '3', 'Quantity': 10, 'Price': 11.0}]
This is what I have done to get the list of row JSONs:
item_df.groupby(['Extracted_Code', 'Category', 'Code']).apply(lambda x: x.to_dict('records')).reset_index(name='Combined')
But I am confused about how to apply the condition to each group without losing any columns in the final result.
Answer 1
Score: 2
You could perform a self-merge and keep only the rows that match your criteria (df here is the question's item_df):
m = df.reset_index().merge(df, on="Extracted_Code", suffixes=("_x", ""))
m = m[(m["Category"] == "H") & (m["Code"] <= m["Code_x"]) & (m["Category_x"] != "H")]
index BarCode_x Extracted_Code Description_x Category_x Code_x Quantity_x Price_x BarCode Description Category Code Quantity Price
18 3 12345678ABCD 12345678 Apples M 1 52 15.6 12345678AAAA Fruits H 0 99 12.0
24 5 12345678IJKL 12345678 Mangoes S 4 90 67.0 12345678AAAA Fruits H 0 99 12.0
25 5 12345678IJKL 12345678 Mangoes S 4 90 67.0 12345678BBBB Fruits H 2 77 10.5
26 5 12345678IJKL 12345678 Mangoes S 4 90 67.0 12345678CCCC Fruits H 3 10 11.0
30 4 12345678EFGH 12345678 Oranges T 2 11 12.9 12345678AAAA Fruits H 0 99 12.0
31 4 12345678EFGH 12345678 Oranges T 2 11 12.9 12345678BBBB Fruits H 2 77 10.5
37 11 67890123IJKL 67890123 Oats F 2 90 67.0 67890123XXXX Snacks H 0 99 12.0
38 11 67890123IJKL 67890123 Oats F 2 90 67.0 67890123YYYY Snacks H 2 77 10.5
61 9 67890123ABCD 67890123 Yoghurt M 3 52 15.6 67890123XXXX Snacks H 0 99 12.0
62 9 67890123ABCD 67890123 Yoghurt M 3 52 15.6 67890123YYYY Snacks H 2 77 10.5
63 9 67890123ABCD 67890123 Yoghurt M 3 52 15.6 67890123ZZZZ Snacks H 3 10 11.0
67 10 67890123EFGH 67890123 Cookies M 4 11 12.9 67890123XXXX Snacks H 0 99 12.0
68 10 67890123EFGH 67890123 Cookies M 4 11 12.9 67890123YYYY Snacks H 2 77 10.5
69 10 67890123EFGH 67890123 Cookies M 4 11 12.9 67890123ZZZZ Snacks H 3 10 11.0
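One caveat about the filter above: Code is stored as strings in the question's frame, so `m["Code"] <= m["Code_x"]` compares text rather than numbers. That agrees with numeric order for the single-digit codes shown here, but would not for multi-digit ones. A minimal sketch of the difference, using made-up codes that are not in the question's data:

```python
import pandas as pd

# Hypothetical codes, chosen only to show the ordering difference.
codes = pd.Series(["2", "10"])

# String comparison is lexicographic: "10" sorts before "2".
as_text = (codes <= "2").tolist()
print(as_text)     # [True, True]

# Converting to numbers restores numeric ordering.
as_numbers = (pd.to_numeric(codes) <= 2).tolist()
print(as_numbers)  # [True, False]
```

If your real codes can have more than one digit, converting with `pd.to_numeric` (or `.astype(int)`) before filtering is probably the safer choice.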
The .reset_index() then allows you to .groupby("index"), to which you can add your .to_dict("records"):
combined = m.groupby("index").apply(lambda group:
    group[["BarCode", "Description", "Category",
           "Code", "Quantity", "Price"
    ]].to_dict("records")
).rename("Combined")
Which you can then .join:
>>> df.join(combined)
BarCode Extracted_Code Description Category Code Quantity Price Combined
0 12345678AAAA 12345678 Fruits H 0 99 12.0 NaN
1 12345678BBBB 12345678 Fruits H 2 77 10.5 NaN
2 12345678CCCC 12345678 Fruits H 3 10 11.0 NaN
3 12345678ABCD 12345678 Apples M 1 52 15.6 [{'BarCode': '12345678AAAA', 'Description': 'F...
5 12345678IJKL 12345678 Mangoes S 4 90 67.0 [{'BarCode': '12345678AAAA', 'Description': 'F...
4 12345678EFGH 12345678 Oranges T 2 11 12.9 [{'BarCode': '12345678AAAA', 'Description': 'F...
11 67890123IJKL 67890123 Oats F 2 90 67.0 [{'BarCode': '67890123XXXX', 'Description': 'S...
6 67890123XXXX 67890123 Snacks H 0 99 12.0 NaN
7 67890123YYYY 67890123 Snacks H 2 77 10.5 NaN
8 67890123ZZZZ 67890123 Snacks H 3 10 11.0 NaN
9 67890123ABCD 67890123 Yoghurt M 3 52 15.6 [{'BarCode': '67890123XXXX', 'Description': 'S...
10 67890123EFGH 67890123 Cookies M 4 11 12.9 [{'BarCode': '67890123XXXX', 'Description': 'S...
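For reference, the steps above can be collected into one runnable sketch. This is only one way to arrange it: the helper name `add_combined` is made up, the data is a reduced sample (the first group from the question), Code is compared numerically rather than as text, and the function assumes a default unnamed index with no existing column called "index".

```python
import pandas as pd

# Reduced sample: the first group from the question, in its original row order.
item_df = pd.DataFrame({
    "BarCode": ["12345678AAAA", "12345678BBBB", "12345678CCCC",
                "12345678ABCD", "12345678EFGH", "12345678IJKL"],
    "Extracted_Code": ["12345678"] * 6,
    "Description": ["Fruits", "Fruits", "Fruits", "Apples", "Oranges", "Mangoes"],
    "Category": ["H", "H", "H", "M", "T", "S"],
    "Code": ["0", "2", "3", "1", "2", "4"],
    "Quantity": [99, 77, 10, 52, 11, 90],
    "Price": [12.0, 10.5, 11.0, 15.6, 12.9, 67.0],
})

RECORD_COLS = ["BarCode", "Description", "Category", "Code", "Quantity", "Price"]

def add_combined(df):
    """Hypothetical helper: return a copy of df with the Combined column attached."""
    # Self-merge within each Extracted_Code; the left side keeps its row label
    # in the "index" column and gets the "_x" suffix on overlapping columns.
    m = df.reset_index().merge(df, on="Extracted_Code", suffixes=("_x", ""))
    keep = (
        (m["Category"] == "H")
        & (m["Category_x"] != "H")
        & (pd.to_numeric(m["Code"]) <= pd.to_numeric(m["Code_x"]))
    )
    # One list of record dicts per original row label.
    records = {idx: grp[RECORD_COLS].to_dict("records")
               for idx, grp in m[keep].groupby("index")}
    out = df.copy()
    # Assignment aligns on the index; rows without matches stay NaN.
    out["Combined"] = pd.Series(records, dtype=object)
    return out

result = add_combined(item_df)
print(result[["BarCode", "Category", "Code", "Combined"]])
```

Because the assignment aligns on the index, it should not matter whether the frame was sorted beforehand.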