Pandas:对每个分组应用函数并将结果存储在新列中

huangapple go评论102阅读模式
英文:

Pandas: Apply function to each group and store result in new column

问题

你可以尝试以下代码来实现你的需求:

  1. import pandas as pd
  2. import numpy as np
  3. item_df = pd.DataFrame({'BarCode': ['12345678AAAA', '12345678BBBB', '12345678CCCC',
  4. '12345678ABCD', '12345678EFGH', '12345678IJKL',
  5. '67890123XXXX', '67890123YYYY', '67890123ZZZZ',
  6. '67890123ABCD', '67890123EFGH', '67890123IJKL'],
  7. 'Extracted_Code': ['12345678','12345678', '12345678','12345678','12345678','12345678',
  8. '67890123','67890123', '67890123','67890123','67890123','67890123'],
  9. 'Description': ['Fruits', 'Fruits', 'Fruits', 'Apples', 'Oranges', 'Mangoes',
  10. 'Snacks', 'Snacks', 'Snacks', 'Yoghurt', 'Cookies', 'Oats'],
  11. 'Category': ['H', 'H', 'H', 'M', 'T', 'S', 'H', 'H', 'H', 'M', 'M', 'F'],
  12. 'Code': ['0', '2', '3', '1', '2', '4', '0', '2', '3', '3', '4', '2'],
  13. 'Quantity': [99, 77, 10, 52, 11, 90, 99, 77, 10, 52, 11, 90],
  14. 'Price': [12.0, 10.5, 11.0, 15.6, 12.9, 67.0, 12.0, 10.5, 11.0, 15.6, 12.9, 67.0]})
  15. item_df = item_df.sort_values(by=['Extracted_Code', 'Category', 'Code'])
  16. item_df['Combined'] = np.NaN
  17. def create_combined(row, group):
  18. if row['Category'] == 'H':
  19. return np.NaN
  20. else:
  21. group_h = group[group['Category'] == 'H']
  22. group_h = group_h[group_h['Code'] <= row['Code']]
  23. return group_h.to_dict('records')
  24. item_df['Combined'] = item_df.groupby(['Extracted_Code']).apply(lambda group: group.apply(lambda row: create_combined(row, group), axis=1)).reset_index(drop=True)
  25. print(item_df)

这段代码将为每个组应用条件并创建Combined列。希望这可以满足你的需求。

英文:

I have an item dataframe such as:

  1. item_df = pd.DataFrame({&#39;BarCode&#39;: [&#39;12345678AAAA&#39;, &#39;12345678BBBB&#39;, &#39;12345678CCCC&#39;,
  2. &#39;12345678ABCD&#39;, &#39;12345678EFGH&#39;, &#39;12345678IJKL&#39;,
  3. &#39;67890123XXXX&#39;, &#39;67890123YYYY&#39;, &#39;67890123ZZZZ&#39;,
  4. &#39;67890123ABCD&#39;, &#39;67890123EFGH&#39;, &#39;67890123IJKL&#39;],
  5. &#39;Extracted_Code&#39;: [&#39;12345678&#39;,&#39;12345678&#39;, &#39;12345678&#39;,&#39;12345678&#39;,&#39;12345678&#39;,&#39;12345678&#39;,
  6. &#39;67890123&#39;,&#39;67890123&#39;, &#39;67890123&#39;,&#39;67890123&#39;, &#39;67890123&#39;,&#39;67890123&#39;],
  7. &#39;Description&#39;: [&#39;Fruits&#39;, &#39;Fruits&#39;, &#39;Fruits&#39;, &#39;Apples&#39;, &#39;Oranges&#39;, &#39;Mangoes&#39;,
  8. &#39;Snacks&#39;, &#39;Snacks&#39;, &#39;Snacks&#39;, &#39;Yoghurt&#39;, &#39;Cookies&#39;, &#39;Oats&#39;],
  9. &#39;Category&#39;: [&#39;H&#39;, &#39;H&#39;, &#39;H&#39;, &#39;M&#39;, &#39;T&#39;, &#39;S&#39;, &#39;H&#39;, &#39;H&#39;, &#39;H&#39;, &#39;M&#39;, &#39;M&#39;, &#39;F&#39;],
  10. &#39;Code&#39;: [&#39;0&#39;, &#39;2&#39;, &#39;3&#39;, &#39;1&#39;, &#39;2&#39;, &#39;4&#39;, &#39;0&#39;, &#39;2&#39;, &#39;3&#39;, &#39;3&#39;, &#39;4&#39;, &#39;2&#39;],
  11. &#39;Quantity&#39;: [99, 77, 10, 52, 11, 90, 99, 77, 10, 52, 11, 90],
  12. &#39;Price&#39;: [12.0, 10.5, 11.0, 15.6, 12.9, 67.0, 12.0, 10.5, 11.0, 15.6, 12.9, 67.0]})
  13. item_df = item_df.sort_values(by=[&#39;Extracted_Code&#39;, &#39;Category&#39;, &#39;Code&#39;])
  14. item_df[&#39;Combined&#39;] = np.NaN

What I am trying to achieve is a bit tricky. I have to perform groupby on [&#39;Extracted_Code&#39;] and for each group, create a new column Combined. The column Combined will have value based on:

  1. For rows with Category='H', Combined will have NaN values.
  2. For rows with Category other than 'H', suppose if we take a row with Category='M', then Combined column of that particular row will have a list of row jsons that has Category='H' in the same group and whose Code is less than or equal to Code of that particular row.

My desired result is:

  1. BarCode Extracted_Code Description Category Code Quantity Price Combined
  2. 0 12345678AAAA 12345678 Fruits H 0 99 12.0 NaN
  3. 1 12345678BBBB 12345678 Fruits H 2 77 10.5 NaN
  4. 2 12345678CCCC 12345678 Fruits H 3 10 11.0 NaN
  5. 3 12345678ABCD 12345678 Apples M 1 52 15.6 [{&#39;BarCode&#39;: &#39;12345678AAAA&#39;, &#39;Description&#39;: &#39;Fruits&#39;, &#39;Category&#39;: &#39;H&#39;, &#39;Code&#39;: &#39;0&#39;, &#39;Quantity&#39;: 99, &#39;Price&#39;: 12.0}]
  6. 4 12345678IJKL 12345678 Mangoes S 4 90 67.0 [{&#39;BarCode&#39;: &#39;12345678AAAA&#39;, &#39;Description&#39;: &#39;Fruits&#39;, &#39;Category&#39;: &#39;H&#39;, &#39;Code&#39;: &#39;0&#39;, &#39;Quantity&#39;: 99, &#39;Price&#39;: 12.0},
  7. {&#39;BarCode&#39;: &#39;12345678BBBB&#39;, &#39;Description&#39;: &#39;Fruits&#39;, &#39;Category&#39;: &#39;H&#39;, &#39;Code&#39;: &#39;2&#39;, &#39;Quantity&#39;: 77, &#39;Price&#39;: 10.5},
  8. {&#39;BarCode&#39;: &#39;12345678CCCC&#39;, &#39;Description&#39;: &#39;Fruits&#39;, &#39;Category&#39;: &#39;H&#39;, &#39;Code&#39;: &#39;3&#39;, &#39;Quantity&#39;: 10, &#39;Price&#39;: 11.0}]
  9. 5 12345678EFGH 12345678 Oranges T 2 11 12.9 [{&#39;BarCode&#39;: &#39;12345678AAAA&#39;, &#39;Description&#39;: &#39;Fruits&#39;, &#39;Category&#39;: &#39;H&#39;, &#39;Code&#39;: &#39;0&#39;, &#39;Quantity&#39;: 99, &#39;Price&#39;: 12.0},
  10. {&#39;BarCode&#39;: &#39;12345678BBBB&#39;, &#39;Description&#39;: &#39;Fruits&#39;, &#39;Category&#39;: &#39;H&#39;, &#39;Code&#39;: &#39;2&#39;, &#39;Quantity&#39;: 77, &#39;Price&#39;: 10.5}]
  11. 6 67890123IJKL 67890123 Oats F 2 90 67.0 [{&#39;BarCode&#39;: &#39;67890123XXXX&#39;, &#39;Description&#39;: &#39;Snacks&#39;, &#39;Category&#39;: &#39;H&#39;, &#39;Code&#39;: &#39;0&#39;, &#39;Quantity&#39;: 99, &#39;Price&#39;: 12.0},
  12. {&#39;BarCode&#39;: &#39;67890123YYYY&#39;, &#39;Description&#39;: &#39;Snacks&#39;, &#39;Category&#39;: &#39;H&#39;, &#39;Code&#39;: &#39;2&#39;, &#39;Quantity&#39;: 77, &#39;Price&#39;: 10.5}]
  13. 7 67890123XXXX 67890123 Snacks H 0 99 12.0 NaN
  14. 8 67890123YYYY 67890123 Snacks H 2 77 10.5 NaN
  15. 9 67890123ZZZZ 67890123 Snacks H 3 10 11.0 NaN
  16. 10 67890123ABCD 67890123 Yoghurt M 3 52 15.6 [{&#39;BarCode&#39;: &#39;67890123XXXX&#39;, &#39;Description&#39;: &#39;Snacks&#39;, &#39;Category&#39;: &#39;H&#39;, &#39;Code&#39;: &#39;0&#39;, &#39;Quantity&#39;: 99, &#39;Price&#39;: 12.0},
  17. {&#39;BarCode&#39;: &#39;67890123YYYY&#39;, &#39;Description&#39;: &#39;Snacks&#39;, &#39;Category&#39;: &#39;H&#39;, &#39;Code&#39;: &#39;2&#39;, &#39;Quantity&#39;: 77, &#39;Price&#39;: 10.5},
  18. {&#39;BarCode&#39;: &#39;67890123ZZZZ&#39;, &#39;Description&#39;: &#39;Snacks&#39;, &#39;Category&#39;: &#39;H&#39;, &#39;Code&#39;: &#39;3&#39;, &#39;Quantity&#39;: 10, &#39;Price&#39;: 11.0}]
  19. 11 67890123EFGH 67890123 Cookies M 4 11 12.9 [{&#39;BarCode&#39;: &#39;67890123XXXX&#39;, &#39;Description&#39;: &#39;Snacks&#39;, &#39;Category&#39;: &#39;H&#39;, &#39;Code&#39;: &#39;0&#39;, &#39;Quantity&#39;: 99, &#39;Price&#39;: 12.0},
  20. {&#39;BarCode&#39;: &#39;67890123YYYY&#39;, &#39;Description&#39;: &#39;Snacks&#39;, &#39;Category&#39;: &#39;H&#39;, &#39;Code&#39;: &#39;2&#39;, &#39;Quantity&#39;: 77, &#39;Price&#39;: 10.5},
  21. {&#39;BarCode&#39;: &#39;67890123ZZZZ&#39;, &#39;Description&#39;: &#39;Snacks&#39;, &#39;Category&#39;: &#39;H&#39;, &#39;Code&#39;: &#39;3&#39;, &#39;Quantity&#39;: 10, &#39;Price&#39;: 11.0}]

This is what I have done to get list of row jsons:

  1. item_df.groupby([&#39;Extracted_Code&#39;, &#39;Category&#39;, &#39;Code&#39;]).apply(lambda x: x.to_dict(&#39;records&#39;)).reset_index(name=&#39;Combined&#39;)

But I am confused on how to apply the condition to each group without losing any columns in the end result.

答案1

得分: 2

  1. 你可以执行自我合并,并筛选出符合条件的行:
  2. ```python
  3. m = df.reset_index().merge(df, on="Extracted_Code", suffixes=("_x", ""))
  4. m = m[(m["Category"] == "H") & (m["Code"] <= m["Code_x"]) & (m["Category_x"] != "H")]
  1. # .reset_index() 允许你执行 .groupby("index"),然后可以将 .to_dict("records") 添加到:
  2. combined = m.groupby("index").apply(lambda group:
  3. group[["BarCode", "Description", "Category", "Code", "Quantity", "Price"]].to_dict("records")
  4. ).rename("Combined")
  1. # 然后你可以执行 .join:
  2. df.join(combined)
英文:

You could perform a self-merge, and filter out rows that match your criteria:

  1. m = df.reset_index().merge(df, on=&quot;Extracted_Code&quot;, suffixes=(&quot;_x&quot;, &quot;&quot;))
  2. m = m[ (m[&quot;Category&quot;] == &quot;H&quot;) &amp; (m[&quot;Code&quot;] &lt;= m[&quot;Code_x&quot;]) &amp; (m[&quot;Category_x&quot;] != &quot;H&quot;) ]
  1. index BarCode_x Extracted_Code Description_x Category_x Code_x Quantity_x Price_x BarCode Description Category Code Quantity Price
  2. 18 3 12345678ABCD 12345678 Apples M 1 52 15.6 12345678AAAA Fruits H 0 99 12.0
  3. 24 5 12345678IJKL 12345678 Mangoes S 4 90 67.0 12345678AAAA Fruits H 0 99 12.0
  4. 25 5 12345678IJKL 12345678 Mangoes S 4 90 67.0 12345678BBBB Fruits H 2 77 10.5
  5. 26 5 12345678IJKL 12345678 Mangoes S 4 90 67.0 12345678CCCC Fruits H 3 10 11.0
  6. 30 4 12345678EFGH 12345678 Oranges T 2 11 12.9 12345678AAAA Fruits H 0 99 12.0
  7. 31 4 12345678EFGH 12345678 Oranges T 2 11 12.9 12345678BBBB Fruits H 2 77 10.5
  8. 37 11 67890123IJKL 67890123 Oats F 2 90 67.0 67890123XXXX Snacks H 0 99 12.0
  9. 38 11 67890123IJKL 67890123 Oats F 2 90 67.0 67890123YYYY Snacks H 2 77 10.5
  10. 61 9 67890123ABCD 67890123 Yoghurt M 3 52 15.6 67890123XXXX Snacks H 0 99 12.0
  11. 62 9 67890123ABCD 67890123 Yoghurt M 3 52 15.6 67890123YYYY Snacks H 2 77 10.5
  12. 63 9 67890123ABCD 67890123 Yoghurt M 3 52 15.6 67890123ZZZZ Snacks H 3 10 11.0
  13. 67 10 67890123EFGH 67890123 Cookies M 4 11 12.9 67890123XXXX Snacks H 0 99 12.0
  14. 68 10 67890123EFGH 67890123 Cookies M 4 11 12.9 67890123YYYY Snacks H 2 77 10.5
  15. 69 10 67890123EFGH 67890123 Cookies M 4 11 12.9 67890123ZZZZ Snacks H 3 10 11.0

The .reset_index() allows you to then .groupby(&quot;index&quot;) which you could then add your .to_dict(&quot;records&quot;) to:

  1. combined = m.groupby(&quot;index&quot;).apply(lambda group:
  2. group[[&quot;BarCode&quot;, &quot;Description&quot;, &quot;Category&quot;,
  3. &quot;Code&quot;, &quot;Quantity&quot;, &quot;Price&quot;
  4. ]].to_dict(&quot;records&quot;)
  5. ).rename(&quot;Combined&quot;)

Which you can then .join:

  1. &gt;&gt;&gt; df.join(combined)
  2. BarCode Extracted_Code Description Category Code Quantity Price Combined
  3. 0 12345678AAAA 12345678 Fruits H 0 99 12.0 NaN
  4. 1 12345678BBBB 12345678 Fruits H 2 77 10.5 NaN
  5. 2 12345678CCCC 12345678 Fruits H 3 10 11.0 NaN
  6. 3 12345678ABCD 12345678 Apples M 1 52 15.6 [{&#39;BarCode&#39;: &#39;12345678AAAA&#39;, &#39;Description&#39;: &#39;F...
  7. 5 12345678IJKL 12345678 Mangoes S 4 90 67.0 [{&#39;BarCode&#39;: &#39;12345678AAAA&#39;, &#39;Description&#39;: &#39;F...
  8. 4 12345678EFGH 12345678 Oranges T 2 11 12.9 [{&#39;BarCode&#39;: &#39;12345678AAAA&#39;, &#39;Description&#39;: &#39;F...
  9. 11 67890123IJKL 67890123 Oats F 2 90 67.0 [{&#39;BarCode&#39;: &#39;67890123XXXX&#39;, &#39;Description&#39;: &#39;S...
  10. 6 67890123XXXX 67890123 Snacks H 0 99 12.0 NaN
  11. 7 67890123YYYY 67890123 Snacks H 2 77 10.5 NaN
  12. 8 67890123ZZZZ 67890123 Snacks H 3 10 11.0 NaN
  13. 9 67890123ABCD 67890123 Yoghurt M 3 52 15.6 [{&#39;BarCode&#39;: &#39;67890123XXXX&#39;, &#39;Description&#39;: &#39;S...
  14. 10 67890123EFGH 67890123 Cookies M 4 11 12.9 [{&#39;BarCode&#39;: &#39;67890123XXXX&#39;, &#39;Description&#39;: &#39;S...

huangapple
  • 本文由 发表于 2023年5月26日 00:02:39
  • 转载请务必保留本文链接:https://go.coder-hub.com/76334272.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定