Python Pandas DataFrame,添加列并标记已调整和插入的行。

huangapple go评论96阅读模式
英文:

python pandas dataframe, add column and tag an adjusted and inserted row

问题

以下是代码的翻译部分:

  1. 我有以下的数据框
  2. import pandas as pd
  3. df = pd.DataFrame()
  4. df['number'] = (651,651,651,4267,4267,4267,4267,4267,4267,4267,8806,8806,8806,6841,6841,6841,6841)
  5. df['name']=('Alex','Alex','Alex','Ankit','Ankit','Ankit','Ankit','Ankit','Ankit','Ankit','Abhishek','Abhishek','Abhishek','Blake','Blake','Blake','Blake)
  6. df['hours']=(8.25,7.5,7.5,7.5,14,12,15,11,6.5,14,15,15,13.5,8,8,8,8)
  7. df['loc']=('Nar','SCC','RSL','UNIT-C','UNIT-C','UNIT-C','UNIT-C','UNIT-C','UNIT-C','UNIT-C','UNI','UNI','UNI','UNKING','UNKING','UNKING','UNKING)
  8. print(df)
  9. 如果某个人的工时累积达到38小时将对达到38小时的单元格进行调整插入一行重复的数据行并将工时余额添加到以下行以下代码执行此操作可以看到原始数据和调整后数据之间的差异
  10. s = df.groupby('number')['hours'].cumsum()
  11. m = s.gt(38)
  12. idx = m.groupby(df['number']).idxmax()
  13. delta = s.groupby(df['number']).shift().rsub(38).fillna(s)
  14. out = df.loc[df.index.repeat((df.index.isin(idx)&m)+1)]
  15. out.loc[out.index.duplicated(keep='last'), 'hours'] = delta
  16. out.loc[out.index.duplicated(), 'hours'] -= delta
  17. print(out)
  18. 对于被调整的行和被插入的行我需要通过插入另一列并添加字符例如 'x'来标记它们以突出显示已调整和已插入的行
  19. <details>
  20. <summary>英文:</summary>
  21. I have the following data-frame
  22. import pandas as pd
  23. df = pd.DataFrame()
  24. df[&#39;number&#39;] = (651,651,651,4267,4267,4267,4267,4267,4267,4267,8806,8806,8806,6841,6841,6841,6841)
  25. df[&#39;name&#39;]=(&#39;Alex&#39;,&#39;Alex&#39;,&#39;Alex&#39;,&#39;Ankit&#39;,&#39;Ankit&#39;,&#39;Ankit&#39;,&#39;Ankit&#39;,&#39;Ankit&#39;,&#39;Ankit&#39;,&#39;Ankit&#39;,&#39;Abhishek&#39;,&#39;Abhishek&#39;,&#39;Abhishek&#39;,&#39;Blake&#39;,&#39;Blake&#39;,&#39;Blake&#39;,&#39;Blake&#39;)
  26. df[&#39;hours&#39;]=(8.25,7.5,7.5,7.5,14,12,15,11,6.5,14,15,15,13.5,8,8,8,8)
  27. df[&#39;loc&#39;]=(&#39;Nar&#39;,&#39;SCC&#39;,&#39;RSL&#39;,&#39;UNIT-C&#39;,&#39;UNIT-C&#39;,&#39;UNIT-C&#39;,&#39;UNIT-C&#39;,&#39;UNIT-C&#39;,&#39;UNIT-C&#39;,&#39;UNIT-C&#39;,&#39;UNI&#39;,&#39;UNI&#39;,&#39;UNI&#39;,&#39;UNKING&#39;,&#39;UNKING&#39;,&#39;UNKING&#39;,&#39;UNKING&#39;)
  28. print(df)
  29. If the running balance of an individuals hours reach 38 an adjustment to the cell that reached the 38th hour is made, a duplicate row is inserted and the balance of hours is added to the following row. The following code performs this and the difference in output of original data to adjusted data can be seen.
  30. s = df.groupby(&#39;number&#39;)[&#39;hours&#39;].cumsum()
  31. m = s.gt(38)
  32. idx = m.groupby(df[&#39;number&#39;]).idxmax()
  33. delta = s.groupby(df[&#39;number&#39;]).shift().rsub(38).fillna(s)
  34. out = df.loc[df.index.repeat((df.index.isin(idx)&amp;m)+1)]
  35. out.loc[out.index.duplicated(keep=&#39;last&#39;), &#39;hours&#39;] = delta
  36. out.loc[out.index.duplicated(), &#39;hours&#39;] -= delta
  37. print(out)
  38. For the row that got adjusted and the row that got inserted I need to tag them via inserting another column and adding a character such as an &#39;x&#39; to highlight the adjusted and inserted row
  39. </details>
  40. # 答案1
  41. **得分**: 1
  42. 当你复制索引时可以使用`out.index.duplicated`作为布尔掩码
  43. ```python
  44. # or out['mod'] = np.where(out.index.duplicated(keep=False), 'x', '-')
  45. out.loc[out.index.duplicated(keep=False), 'mod'] = 'x'
  46. print(out)
  47. # 输出
  48. number name hours loc mod
  49. 0 651 Alex 8.25 Nar NaN
  50. 1 651 Alex 7.50 SCC NaN
  51. 2 651 Alex 7.50 RSL NaN
  52. 3 4267 Ankit 7.50 UNIT-C NaN
  53. 4 4267 Ankit 14.00 UNIT-C NaN
  54. 5 4267 Ankit 12.00 UNIT-C NaN
  55. 6 4267 Ankit 4.50 UNIT-C x # 索引6
  56. 6 4267 Ankit 10.50 UNIT-C x # 重复
  57. 7 4267 Ankit 11.00 UNIT-C NaN
  58. 8 4267 Ankit 6.50 UNIT-C NaN
  59. 9 4267 Ankit 14.00 UNIT-C NaN
  60. 10 8806 Abhishek 15.00 UNI NaN
  61. 11 8806 Abhishek 15.00 UNI NaN
  62. 12 8806 Abhishek 8.00 UNI x # 索引12
  63. 12 8806 Abhishek 5.50 UNI x # 重复
  64. 13 6841 Blake 8.00 UNKING NaN
  65. 14 6841 Blake 8.00 UNKING NaN
  66. 15 6841 Blake 8.00 UNKING NaN
  67. 16 6841 Blake 8.00 UNKING NaN

请注意,这是代码的翻译,不包括代码部分。

英文:

As you duplicate index, you can use out.index.duplicated as boolean mask:

  1. # or out[&#39;mod&#39;] = np.where(out.index.duplicated(keep=False), &#39;x&#39;, &#39;-&#39;)
  2. out.loc[out.index.duplicated(keep=False), &#39;mod&#39;] = &#39;x&#39;
  3. print(out)
  4. # Output
  5. number name hours loc mod
  6. 0 651 Alex 8.25 Nar NaN
  7. 1 651 Alex 7.50 SCC NaN
  8. 2 651 Alex 7.50 RSL NaN
  9. 3 4267 Ankit 7.50 UNIT-C NaN
  10. 4 4267 Ankit 14.00 UNIT-C NaN
  11. 5 4267 Ankit 12.00 UNIT-C NaN
  12. 6 4267 Ankit 4.50 UNIT-C x # index 6
  13. 6 4267 Ankit 10.50 UNIT-C x # twice
  14. 7 4267 Ankit 11.00 UNIT-C NaN
  15. 8 4267 Ankit 6.50 UNIT-C NaN
  16. 9 4267 Ankit 14.00 UNIT-C NaN
  17. 10 8806 Abhishek 15.00 UNI NaN
  18. 11 8806 Abhishek 15.00 UNI NaN
  19. 12 8806 Abhishek 8.00 UNI x # index 12
  20. 12 8806 Abhishek 5.50 UNI x # twice
  21. 13 6841 Blake 8.00 UNKING NaN
  22. 14 6841 Blake 8.00 UNKING NaN
  23. 15 6841 Blake 8.00 UNKING NaN
  24. 16 6841 Blake 8.00 UNKING NaN

huangapple
  • 本文由 发表于 2023年2月19日 06:34:38
  • 转载请务必保留本文链接:https://go.coder-hub.com/75496801.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定