英文:
Split numbers based on dataframe column values
问题
我是一个初学者的Python程序员,我手头有一个需要设计一些逻辑的任务。我有一个数据框,其中包含用户、任务和用户为该任务投入的工时的以下数值:
用户 | 任务 | 工时 |
---|---|---|
约翰 | T1 | 8 |
约翰 | T2 | 32 |
亚历克斯 | A1 | 40 |
鲍勃 | B1 | 16 |
现在我需要做的是,每周生成一个新的数据框,其中包含用户、任务代码和分配到5天的工时(一周有7天)。请注意,每周的最大工时不能超过40小时,即每天8小时。
对于亚历克斯:
分配很简单。每天8小时,一周5天,总共40小时工作时间。
用户 | 任务 | 工时 | 天数 |
---|---|---|---|
亚历克斯 | A1 | 8 | 1 |
亚历克斯 | A1 | 8 | 2 |
亚历克斯 | A1 | 8 | 3 |
亚历克斯 | A1 | 8 | 4 |
亚历克斯 | A1 | 8 | 5 |
现在变得复杂了。
对于鲍勃:
分配将是16/5 = 3.2。但我需要输入整数(4)并在最后一天调整剩余的工时。
用户 | 任务 | 工时 | 天数 |
---|---|---|---|
鲍勃 | B1 | 3 | 1 |
鲍勃 | B1 | 3 | 2 |
鲍勃 | B1 | 3 | 3 |
鲍勃 | B1 | 3 | 4 |
鲍勃 | B1 | 4 | 5 |
对于约翰:
这是最棘手的一个。我需要将2个任务的工时(32和8)分配到5天。
我需要将8小时分为2-2-2-2,分配到4天,或者分为4-4,分配到2天。
并且将32小时分为6-6-6-6-8,分配到5天,或者分为4-4-8-8-8,分配到5天。
考虑第一种情况,我们会得到以下结果:
用户 | 任务 | 工时 | 天数 |
---|---|---|---|
约翰 | T1 | 2 | 1 |
约翰 | T2 | 6 | 1 |
约翰 | T1 | 2 | 2 |
约翰 | T2 | 6 | 2 |
约翰 | T1 | 2 | 3 |
约翰 | T2 | 6 | 3 |
约翰 | T1 | 2 | 4 |
约翰 | T2 | 6 | 4 |
约翰 | T2 | 8 | 5 |
如果我的格式或语言不当或不符合指南,请原谅。
如果你能帮助我找出这个问题的逻辑,我将非常感激。提前感谢你。
英文:
I am a novice python programmer and I have a task at hand which needs a little logic designing. So I have a data frame with the below mentioned values in User, Task and Hours invested by the user for that task:
User | Task | hours |
---|---|---|
John | T1 | 8 |
John | T2 | 32 |
Alex | A1 | 40 |
Bob | B1 | 16 |
Now what I need to do is, generate a new dataframe on a weekly basis which contains the user, task code and the hours divided into 5 (5 working days in a week of 7 days). Note that max hours per week cannot exceed 40 i.e 8 hours per day.
For Alex:
The division is simple. 8 hours per day for 5 days will give me 40 hours of worktime.
User | Task | hours | Day |
---|---|---|---|
Alex | A1 | 8 | 1 |
Alex | A1 | 8 | 2 |
Alex | A1 | 8 | 3 |
Alex | A1 | 8 | 4 |
Alex | A1 | 8 | 5 |
Now it gets tricky.
For Bob:
The division will be 16/5 = 3.2. But I need to enter whole numbers (4) and adjust the leftover on the last day.
User | Task | hours | Day |
---|---|---|---|
Bob | B1 | 3 | 1 |
Bob | B1 | 3 | 2 |
Bob | B1 | 3 | 3 |
Bob | B1 | 3 | 4 |
Bob | B1 | 4 | 5 |
For John:
This is the most tricky one. I need to split the 2 task's hours (32 and 8) among 5 days.
I need 8 split up into 2-2-2-2 for 4 days or 4-4 for 2 days
and 32 split up into 6-6-6-6-8 for 5 days or 4-4-8-8-8 for 5 days
Considering the 1st cases for both, we would get something like this:
User | Task | hours | Day |
---|---|---|---|
John | T1 | 2 | 1 |
John | T2 | 6 | 1 |
John | T1 | 2 | 2 |
John | T2 | 6 | 2 |
John | T1 | 2 | 3 |
John | T2 | 6 | 3 |
John | T1 | 2 | 4 |
John | T2 | 6 | 4 |
John | T2 | 8 | 5 |
I am sorry if my format or language is inappropriate or not adhering the guidelines. My apologies.
I would be extremely grateful of you could help me figure out a logic for this. A big thank you in advance.
答案1
得分: 1
首先,import numpy as np
。
我们的数据框:
df = pd.DataFrame([
["John", "T1", 8],
["John", "T2", 32],
["Alex", "A1", 40],
["Bob", "B1", 16]],
columns=["User", "Task", 'Hours'])
# 创建一个列,记录平均工作小时数
df["avg"] = df["Hours"] / 5
# 创建一个列,记录一周前四天的剩余小时数
df["leftover"] = (df["avg"] - np.floor(df2["avg"])) * 4
# 创建一个函数,将每天的工作小时数存储在列表中,第五天加入剩余小时数
# 列表的元素遵循格式(<工作小时数>,<第几天>)
def days_list(row):
days = [(np.floor(row["avg"]), i+1) for i in range(4)]
days.append((np.floor(row["leftover"] + row["avg"]), 5))
return days
df["list_of_days"] = df.apply(lambda row: days_list(row), axis=1) # 应用该函数
# 展开列 'list_of_days',使每个元组都有一行
df = df.explode("list_of_days")
# 最后,将每个元组的值分别放入两列 'Hours' & 'Days'
df['Hours'], df['Days'] = zip(*df["list_of_days"])
保留我们需要的列并打印数据框:
final_df = df[["User", "Task", "Hours", "Days"]]
final_df
User | Task | Hours | Days |
---|---|---|---|
John | T1 | 1.0 | 1 |
John | T1 | 1.0 | 2 |
John | T1 | 1.0 | 3 |
John | T1 | 1.0 | 4 |
John | T1 | 4.0 | 5 |
John | T2 | 6.0 | 1 |
John | T2 | 6.0 | 2 |
John | T2 | 6.0 | 3 |
John | T2 | 6.0 | 4 |
John | T2 | 8.0 | 5 |
Alex | A1 | 8.0 | 1 |
Alex | A1 | 8.0 | 2 |
Alex | A1 | 8.0 | 3 |
Alex | A1 | 8.0 | 4 |
Alex | A1 | 8.0 | 5 |
Bob | B1 | 3.0 | 1 |
Bob | B1 | 3.0 | 2 |
Bob | B1 | 3.0 | 3 |
Bob | B1 | 3.0 | 4 |
Bob | B1 | 4.0 | 5 |
<details>
<summary>英文:</summary>
First, `import numpy as np`
Our dataframe:
```python
df = pd.DataFrame([
["John", "T1",8],
["John", "T2", 32],
["Alex","A1",40],
["Bob","B1",16]],
columns=["User", "Task", 'Hours'])
# We create a column with the average work hours
df["avg"] = df["Hours"] / 5
# A column with the leftover hours of the first 4 days of week
df["leftover"] = (df["avg"] - np.floor(df2["avg"]) ) * 4
# We create a function, in which we will store in a list
# the working hours for each day, adding the leftovers in the 5th day.
# The elements of the list are tuples following the format (<working-hours>, <nth-day>)
def days_list(row):
days = [(np.floor(row["avg"]), i+1) for i in range(4)]
days.append((np.floor(row["leftover"] + row["avg"]), 5))
return days
df["list_of_days"] = df.apply(lambda row: days_list(row), axis=1) # applying the function
# We explode the column 'list_of_days' so as to have a row for each tuple
df = df.explode("list_of_days")
# Finally, we unzip the values of each tuple in the 2 columns 'Hours' & 'Days' accordingly.
df['Hours'], df['Days'] = zip(*df["list_of_days"])
Keeping the columns we need and printing the Dataframe :
final_df = df[["User", "Task", "Hours", "Days"]]
final_df
User | Task | Hours | Days |
---|---|---|---|
John | T1 | 1.0 | 1 |
John | T1 | 1.0 | 2 |
John | T1 | 1.0 | 3 |
John | T1 | 1.0 | 4 |
John | T1 | 4.0 | 5 |
John | T2 | 6.0 | 1 |
John | T2 | 6.0 | 2 |
John | T2 | 6.0 | 3 |
John | T2 | 6.0 | 4 |
John | T2 | 8.0 | 5 |
Alex | A1 | 8.0 | 1 |
Alex | A1 | 8.0 | 2 |
Alex | A1 | 8.0 | 3 |
Alex | A1 | 8.0 | 4 |
Alex | A1 | 8.0 | 5 |
Bob | B1 | 3.0 | 1 |
Bob | B1 | 3.0 | 2 |
Bob | B1 | 3.0 | 3 |
Bob | B1 | 3.0 | 4 |
Bob | B1 | 4.0 | 5 |
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论