根据数据框列的值拆分数字。

huangapple go评论82阅读模式
英文:

Split numbers based on dataframe column values

问题

我是一个初学者的Python程序员,我手头有一个需要设计一些逻辑的任务。我有一个数据框,其中包含用户、任务和用户为该任务投入的工时的以下数值:

用户 任务 工时
约翰 T1 8
约翰 T2 32
亚历克斯 A1 40
鲍勃 B1 16

现在我需要做的是,每周生成一个新的数据框,其中包含用户、任务代码和分配到5天的工时(一周有7天)。请注意,每周的最大工时不能超过40小时,即每天8小时。

对于亚历克斯
分配很简单。每天8小时,一周5天,总共40小时工作时间。

用户 任务 工时 天数
亚历克斯 A1 8 1
亚历克斯 A1 8 2
亚历克斯 A1 8 3
亚历克斯 A1 8 4
亚历克斯 A1 8 5

现在变得复杂了。

对于鲍勃
分配将是16/5 = 3.2。但我需要输入整数(4)并在最后一天调整剩余的工时。

用户 任务 工时 天数
鲍勃 B1 3 1
鲍勃 B1 3 2
鲍勃 B1 3 3
鲍勃 B1 3 4
鲍勃 B1 4 5

对于约翰

这是最棘手的一个。我需要将2个任务的工时(32和8)分配到5天。

我需要将8小时分为2-2-2-2,分配到4天,或者分为4-4,分配到2天。

并且将32小时分为6-6-6-6-8,分配到5天,或者分为4-4-8-8-8,分配到5天。

考虑第一种情况,我们会得到以下结果:

用户 任务 工时 天数
约翰 T1 2 1
约翰 T2 6 1
约翰 T1 2 2
约翰 T2 6 2
约翰 T1 2 3
约翰 T2 6 3
约翰 T1 2 4
约翰 T2 6 4
约翰 T2 8 5

如果我的格式或语言不当或不符合指南,请原谅。

如果你能帮助我找出这个问题的逻辑,我将非常感激。提前感谢你。

英文:

I am a novice python programmer and I have a task at hand which needs a little logic designing. So I have a data frame with the below mentioned values in User, Task and Hours invested by the user for that task:

User Task hours
John T1 8
John T2 32
Alex A1 40
Bob B1 16

Now what I need to do is, generate a new dataframe on a weekly basis which contains the user, task code and the hours divided into 5 (5 working days in a week of 7 days). Note that max hours per week cannot exceed 40 i.e 8 hours per day.

For Alex:
The division is simple. 8 hours per day for 5 days will give me 40 hours of worktime.

User Task hours Day
Alex A1 8 1
Alex A1 8 2
Alex A1 8 3
Alex A1 8 4
Alex A1 8 5

Now it gets tricky.

For Bob:
The division will be 16/5 = 3.2. But I need to enter whole numbers (4) and adjust the leftover on the last day.

User Task hours Day
Bob B1 3 1
Bob B1 3 2
Bob B1 3 3
Bob B1 3 4
Bob B1 4 5

For John:

This is the most tricky one. I need to split the 2 task's hours (32 and 8) among 5 days.

I need 8 split up into 2-2-2-2 for 4 days or 4-4 for 2 days

and 32 split up into 6-6-6-6-8 for 5 days or 4-4-8-8-8 for 5 days

Considering the 1st cases for both, we would get something like this:

User Task hours Day
John T1 2 1
John T2 6 1
John T1 2 2
John T2 6 2
John T1 2 3
John T2 6 3
John T1 2 4
John T2 6 4
John T2 8 5

I am sorry if my format or language is inappropriate or not adhering the guidelines. My apologies.

I would be extremely grateful of you could help me figure out a logic for this. A big thank you in advance.

答案1

得分: 1

首先,import numpy as np

我们的数据框:

  1. df = pd.DataFrame([
  2. ["John", "T1", 8],
  3. ["John", "T2", 32],
  4. ["Alex", "A1", 40],
  5. ["Bob", "B1", 16]],
  6. columns=["User", "Task", 'Hours'])
  1. # 创建一个列,记录平均工作小时数
  2. df["avg"] = df["Hours"] / 5
  3. # 创建一个列,记录一周前四天的剩余小时数
  4. df["leftover"] = (df["avg"] - np.floor(df2["avg"])) * 4
  5. # 创建一个函数,将每天的工作小时数存储在列表中,第五天加入剩余小时数
  6. # 列表的元素遵循格式(<工作小时数>,<第几天>)
  7. def days_list(row):
  8. days = [(np.floor(row["avg"]), i+1) for i in range(4)]
  9. days.append((np.floor(row["leftover"] + row["avg"]), 5))
  10. return days
  11. df["list_of_days"] = df.apply(lambda row: days_list(row), axis=1) # 应用该函数
  12. # 展开列 'list_of_days',使每个元组都有一行
  13. df = df.explode("list_of_days")
  14. # 最后,将每个元组的值分别放入两列 'Hours' & 'Days'
  15. df['Hours'], df['Days'] = zip(*df["list_of_days"])

保留我们需要的列并打印数据框:

  1. final_df = df[["User", "Task", "Hours", "Days"]]
  2. final_df
User Task Hours Days
John T1 1.0 1
John T1 1.0 2
John T1 1.0 3
John T1 1.0 4
John T1 4.0 5
John T2 6.0 1
John T2 6.0 2
John T2 6.0 3
John T2 6.0 4
John T2 8.0 5
Alex A1 8.0 1
Alex A1 8.0 2
Alex A1 8.0 3
Alex A1 8.0 4
Alex A1 8.0 5
Bob B1 3.0 1
Bob B1 3.0 2
Bob B1 3.0 3
Bob B1 3.0 4
Bob B1 4.0 5
  1. <details>
  2. <summary>英文:</summary>
  3. First, `import numpy as np`
  4. Our dataframe:
  5. ```python
  6. df = pd.DataFrame([
  7. [&quot;John&quot;, &quot;T1&quot;,8],
  8. [&quot;John&quot;, &quot;T2&quot;, 32],
  9. [&quot;Alex&quot;,&quot;A1&quot;,40],
  10. [&quot;Bob&quot;,&quot;B1&quot;,16]],
  11. columns=[&quot;User&quot;, &quot;Task&quot;, &#39;Hours&#39;])
  1. # We create a column with the average work hours
  2. df[&quot;avg&quot;] = df[&quot;Hours&quot;] / 5
  3. # A column with the leftover hours of the first 4 days of week
  4. df[&quot;leftover&quot;] = (df[&quot;avg&quot;] - np.floor(df2[&quot;avg&quot;]) ) * 4
  5. # We create a function, in which we will store in a list
  6. # the working hours for each day, adding the leftovers in the 5th day.
  7. # The elements of the list are tuples following the format (&lt;working-hours&gt;, &lt;nth-day&gt;)
  8. def days_list(row):
  9. days = [(np.floor(row[&quot;avg&quot;]), i+1) for i in range(4)]
  10. days.append((np.floor(row[&quot;leftover&quot;] + row[&quot;avg&quot;]), 5))
  11. return days
  12. df[&quot;list_of_days&quot;] = df.apply(lambda row: days_list(row), axis=1) # applying the function
  13. # We explode the column &#39;list_of_days&#39; so as to have a row for each tuple
  14. df = df.explode(&quot;list_of_days&quot;)
  15. # Finally, we unzip the values of each tuple in the 2 columns &#39;Hours&#39; &amp; &#39;Days&#39; accordingly.
  16. df[&#39;Hours&#39;], df[&#39;Days&#39;] = zip(*df[&quot;list_of_days&quot;])

Keeping the columns we need and printing the Dataframe :

  1. final_df = df[[&quot;User&quot;, &quot;Task&quot;, &quot;Hours&quot;, &quot;Days&quot;]]
  2. final_df
User Task Hours Days
John T1 1.0 1
John T1 1.0 2
John T1 1.0 3
John T1 1.0 4
John T1 4.0 5
John T2 6.0 1
John T2 6.0 2
John T2 6.0 3
John T2 6.0 4
John T2 8.0 5
Alex A1 8.0 1
Alex A1 8.0 2
Alex A1 8.0 3
Alex A1 8.0 4
Alex A1 8.0 5
Bob B1 3.0 1
Bob B1 3.0 2
Bob B1 3.0 3
Bob B1 3.0 4
Bob B1 4.0 5

huangapple
  • 本文由 发表于 2023年6月12日 20:12:31
  • 转载请务必保留本文链接:https://go.coder-hub.com/76456569.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定