Getting calculation errors in pandas

Question

I get incorrect results when I run a groupby with an aggregate function inside a loop. Outside the loop, the same aggregation works and gives the correct results.

    import pandas as pd
    import numpy as np

    # Example DataFrame
    df = pd.DataFrame({
        'GroupA': ['A', 'A', 'B', 'B', 'B', 'C'],
        'GroupB': ['X', 'Y', 'Z', 'X', 'Y', 'X'],
        'POP': [10, 20, 30, 40, 50, 60],
        'LF': [1, 2, 3, 4, 5, 6],
        'WRK': [100, 200, 300, 400, 500, 600]
    })

    groupby_cols = [[], ['GroupA'], ['GroupB'], ['GroupA', 'GroupB']]

    def test(df, gby):
        # Perform groupby and aggregation
        groupby_columns = groupby_cols[gby]
        w2 = df.groupby(groupby_columns).agg(
            pophat=('POP', lambda x: np.sum(x * df['CMULT'])),
            lfhat=('LF', lambda x: np.sum(x * df['CMULT'])),
            wrkhat=('WRK', lambda x: np.sum(x * df['CMULT']))
        ).reset_index()

        # Calculate CMULT column based on the current groupby configuration
        if len(groupby_columns) == 1:
            w2['CMULT'] = w2[groupby_columns[0]].map({'A': 0.5, 'B': 0.3, 'C': 0.2})
        else:
            w2['CMULT'] = w2['GroupA'].map({'A': 0.5, 'B': 0.3, 'C': 0.2})
        print(w2, groupby_columns)

    for i in range(len(groupby_cols)):
        if i == 0:
            df['CMULT'] = df['GroupA'].map({'A': 0.5, 'B': 0.3, 'C': 0.2})
            df['POP'] = pd.to_numeric(df['POP']) * df['CMULT']
            df['LF'] = pd.to_numeric(df['LF']) * df['CMULT']
            df['WRK'] = pd.to_numeric(df['WRK']) * df['CMULT']
            df['no_sam'] = df.shape[0]
            agg_dict = {'POP': 'sum', 'LF': 'sum', 'WRK': 'sum', 'no_sam': 'count'}
            # Group the data by the current groupby configuration and calculate the aggregates
            w2 = df.agg(agg_dict).to_frame().T
            print(w2, groupby_cols[i])
        else:
            test(df, i)

This is the code that gives the calculation errors. The results are:

        POP   LF    WRK  no_sam
    0  63.0  6.3  630.0     6.0 []
      GroupA  pophat  lfhat  wrkhat  CMULT
    0      A     7.5   0.75    75.0    0.5
    1      B    10.8   1.08   108.0    0.3
    2      C     2.4   0.24    24.0    0.2 ['GroupA']
      GroupB  pophat  lfhat  wrkhat  CMULT
    0      X     8.5   0.85    85.0    NaN
    1      Y     9.5   0.95    95.0    NaN
    2      Z     2.7   0.27    27.0    NaN ['GroupB']
      GroupA GroupB  pophat  lfhat  wrkhat  CMULT
    0      A      X     2.5   0.25    25.0    0.5
    1      A      Y     5.0   0.50    50.0    0.5
    2      B      X     3.6   0.36    36.0    0.3
    3      B      Y     4.5   0.45    45.0    0.3
    4      B      Z     2.7   0.27    27.0    0.3
    5      C      X     2.4   0.24    24.0    0.2 ['GroupA', 'GroupB']

Outside the loop, however, the results are different (you can verify this by changing the index in groupby_cols[NNNNNNNNN]):

    import pandas as pd
    import numpy as np

    # Example DataFrame
    df = pd.DataFrame({
        'GroupA': ['A', 'A', 'B', 'B', 'B', 'C'],
        'GroupB': ['X', 'Y', 'Z', 'X', 'Y', 'X'],
        'POP': [10, 20, 30, 40, 50, 60],
        'LF': [1, 2, 3, 4, 5, 6],
        'WRK': [100, 200, 300, 400, 500, 600]
    })

    groupby_cols = [[], ['GroupA'], ['GroupB'], ['GroupA', 'GroupB']]

    df['CMULT'] = df.groupby(groupby_cols[i])['GroupA'].transform(lambda x: x.map({'A': 0.5, 'B': 0.3, 'C': 0.2}))

    # Perform groupby and aggregation based on the current groupby configuration
    w2 = df.groupby(groupby_cols[3]).agg(
        pophat=('POP', lambda x: np.sum(x * df['CMULT'])),
        lfhat=('LF', lambda x: np.sum(x * df['CMULT'])),
        wrkhat=('WRK', lambda x: np.sum(x * df['CMULT']))
    ).reset_index()
    print(w2)

The results outside the loop are:

        POP   LF    WRK  no_sam
    0  63.0  6.3  630.0     6.0
      GroupA  pophat  lfhat  wrkhat
    0      A    15.0    1.5   150.0
    1      B    36.0    3.6   360.0
    2      C    12.0    1.2   120.0
      GroupB  pophat  lfhat  wrkhat
    0      X    29.0    2.9   290.0
    1      Y    25.0    2.5   250.0
    2      Z     9.0    0.9    90.0
      GroupA GroupB  pophat  lfhat  wrkhat
    0      A      X     5.0    0.5    50.0
    1      A      Y    10.0    1.0   100.0
    2      B      X    12.0    1.2   120.0
    3      B      Y    15.0    1.5   150.0
    4      B      Z     9.0    0.9    90.0
    5      C      X    12.0    1.2   120.0

So am I misunderstanding groupby and aggregation, given that it does not work in the loop, or do these functions behave differently inside a loop? How is this possible?
# Answer 1

**Score**: 1

IIUC, multiply the columns before the loop and then aggregate with `sum` only:

```python
# Assumes pd and df are already defined as in the question's setup.
groupby_cols = [[], ['GroupA'], ['GroupB'], ['GroupA', 'GroupB']]

def test(df, gby):
    # Perform groupby and aggregation; the columns are already weighted,
    # so a plain sum is enough here.
    groupby_columns = groupby_cols[gby]
    w2 = df.groupby(groupby_columns).agg(
        pophat=('POP', 'sum'),
        lfhat=('LF', 'sum'),
        wrkhat=('WRK', 'sum')
    ).reset_index()

    # Calculate CMULT column based on the current groupby configuration
    if len(groupby_columns) == 1:
        w2['CMULT'] = w2[groupby_columns[0]].map({'A': 0.5, 'B': 0.3, 'C': 0.2})
    else:
        w2['CMULT'] = w2['GroupA'].map({'A': 0.5, 'B': 0.3, 'C': 0.2})
    print(w2, groupby_columns)

# Apply the CMULT weights once, before the loop
df['CMULT'] = df['GroupA'].map({'A': 0.5, 'B': 0.3, 'C': 0.2})
df['POP'] = pd.to_numeric(df['POP']) * df['CMULT']
df['LF'] = pd.to_numeric(df['LF']) * df['CMULT']
df['WRK'] = pd.to_numeric(df['WRK']) * df['CMULT']
df['no_sam'] = df.shape[0]

for i in range(len(groupby_cols)):
    if i == 0:
        agg_dict = {'POP': 'sum', 'LF': 'sum', 'WRK': 'sum', 'no_sam': 'count'}
        # No grouping columns: aggregate over the whole DataFrame
        w2 = df.agg(agg_dict).to_frame().T
        print(w2, groupby_cols[i])
    else:
        test(df, i)
```

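
A minimal sketch of why the two runs in the question seem to disagree, assuming the cause is that the weights get applied twice inside the loop: the `i == 0` branch overwrites `POP` with `POP * CMULT`, and the aggregation lambda then multiplies by `CMULT` again. The names `weights`, `once`, and `twice` are made up for this illustration, and only the `POP` column is shown.

```python
import pandas as pd

# Hypothetical name for the weight mapping used throughout the question.
weights = {'A': 0.5, 'B': 0.3, 'C': 0.2}

df = pd.DataFrame({
    'GroupA': ['A', 'A', 'B', 'B', 'B', 'C'],
    'POP': [10, 20, 30, 40, 50, 60],
})
df['CMULT'] = df['GroupA'].map(weights)

# Weight applied once, then a plain sum per group.
# This corresponds to the "outside the loop" numbers.
once = (df['POP'] * df['CMULT']).groupby(df['GroupA']).sum()

# Weight applied twice: POP was already replaced by POP * CMULT,
# and the aggregation multiplies by CMULT a second time.
# This corresponds to the "inside the loop" numbers.
twice = (df['POP'] * df['CMULT'] * df['CMULT']).groupby(df['GroupA']).sum()

print(once)
print(twice)
```

Up to floating-point rounding, `once` should match the `pophat` values the question reports outside the loop, and `twice` the values reported inside it. Separately, the `NaN` values in the `CMULT` column of the `GroupB` output come from mapping `X`, `Y`, and `Z` through a dictionary that only has the keys `A`, `B`, and `C`.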

huangapple
  • Posted on June 1, 2023 at 13:33:55
  • Please keep this link when reposting: https://go.coder-hub.com/76378911.html