英文:
Duplicating rows and and creating ID column and repeating column in Python
问题
以下是您要求的翻译部分:
# 在输入数据框中,我想要每一行复制三次。
# 计算得到的“id”列是重复三次的相同数字,表示行号。
# 计算得到的“type”列对于每个原始记录依次为1、2、3。
# 如何在Python中完成这个任务?
# 预期输出:
# id type c1 c2 c3 c4 c5 c6 c7 c8 c9 c10 c11
# 1 1 56 1 2 4 1.0 1 4.0 1.0 2 2 18000.0
# 1 2 56 1 2 4 1.0 1 4.0 1.0 2 2 18000.0
# 1 3 56 1 2 4 1.0 1 4.0 1.0 2 2 18000.0
# 2 1 52 2 2 5 3.0 1 4.0 1.0 1 1 0.0
# 2 2 52 2 2 5 3.0 1 4.0 1.0 1 1 0.0
# 2 3 52 2 2 5 3.0 1 4.0 1.0 1 1 0.0
# 3 1 82 2 2 5 4.0 2 4.0 1.0 1 1 0.0
# 3 2 82 2 2 5 4.0 2 4.0 1.0 1 1 0.0
# 3 3 82 2 2 5 4.0 2 4.0 1.0 1 1 0.0
# 4 1 26 1 2 4 2.0 1 4.0 1.0 2 2 12000.0
# 4 2 26 1 2 4 2.0 1 4.0 1.0 2 2 12000.0
# 4 3 26 1 2 4 2.0 1 4.0 1.0 2 2 12000.0
# 5 1 65 1 2 4 1.0 1 4.0 23.0 2 1 324900.0
# 5 2 65 1 2 4 1.0 1 4.0 23.0 2 1 324900.0
# 5 3 65 1 2 4 1.0 1 4.0 23.0 2 1 324900.0
我有以下代码,但我不喜欢它,因为我不得不使用assign()两次,而且我不确定如何计算id列...我只是在那里放置了占位符代码。不过,我似乎正确获取了type列。
我的尝试:
(df
.dropna()
.assign(id = lambda x: range(1, len(x) + 1)
)
.pipe(lambda x: x.loc[x.index.repeat(3)])
.assign(id = lambda x: np.r_[:len(x)] % 3 + 1,
type = lambda x: np.r_[:len(x)] % 3 + 1))
英文:
I have the following input dataframe:
Input Dataframe:
c1 c2 c3 c4 c5 c6 c7 c8 c9 c10 c11
56 1 2 4 1.0 1 4.0 1.0 2 2 18000.0
52 2 2 5 3.0 1 4.0 1.0 1 1 0.0
82 2 2 5 4.0 2 4.0 1.0 1 1 0.0
26 1 2 4 2.0 1 4.0 1.0 2 2 12000.0
65 1 2 4 1.0 1 4.0 23.0 2 1 324900.0
In the input datframe, I want duplicate each row 3 times.
The calculated id
column is the same number that repeats three times and represents the row number.
The calculated type
column goes 1,2,3 for each original record.
How can I do this in Python?
Expected Output:
id type c1 c2 c3 c4 c5 c6 c7 c8 c9 c10 c11
1 1 56 1 2 4 1.0 1 4.0 1.0 2 2 18000.0
1 2 56 1 2 4 1.0 1 4.0 1.0 2 2 18000.0
1 3 56 1 2 4 1.0 1 4.0 1.0 2 2 18000.0
2 1 52 2 2 5 3.0 1 4.0 1.0 1 1 0.0
2 2 52 2 2 5 3.0 1 4.0 1.0 1 1 0.0
2 3 52 2 2 5 3.0 1 4.0 1.0 1 1 0.0
3 1 82 2 2 5 4.0 2 4.0 1.0 1 1 0.0
3 2 82 2 2 5 4.0 2 4.0 1.0 1 1 0.0
3 3 82 2 2 5 4.0 2 4.0 1.0 1 1 0.0
4 1 26 1 2 4 2.0 1 4.0 1.0 2 2 12000.0
4 2 26 1 2 4 2.0 1 4.0 1.0 2 2 12000.0
4 3 26 1 2 4 2.0 1 4.0 1.0 2 2 12000.0
5 1 65 1 2 4 1.0 1 4.0 23.0 2 1 324900.0
5 2 65 1 2 4 1.0 1 4.0 23.0 2 1 324900.0
5 3 65 1 2 4 1.0 1 4.0 23.0 2 1 324900.0
I have the following code but I don't like it because I have to use assign() twice and also I'm not sure how to calculate the id column...I just put placeholder code there. I seem to get the type column correct though.
My Attempt:
(df
.dropna()
.assign(id = lambda x: range(1, len(x) + 1)
)
.pipe(lambda x: x.loc[x.index.repeat(3)])
.assign(id = lambda x: np.r_[:len(x)] % 3 + 1,
type = lambda x: np.r_[:len(x)] % 3 + 1))
答案1
得分: 3
一个交叉连接将完成这项工作:
pd.merge(
df.rename_axis("id").reset_index(),
pd.DataFrame({"type": [1, 2, 3]}),
how="cross",
)
# 如果您使用的是 pandas 1.5 或更新版本
pd.merge(
df.reset_index(names="id"),
pd.DataFrame({"type": [1, 2, 3]}),
how="cross",
)
英文:
A cross join will do the job:
pd.merge(
df.rename_axis("id").reset_index(),
pd.DataFrame({"type": [1, 2, 3]}),
how="cross",
)
# If you are using pandas 1.5 or later
pd.merge(
df.reset_index(names="id"),
pd.DataFrame({"type": [1, 2, 3]}),
how="cross",
)
答案2
得分: 1
使用 concat
、groupby
和 cumcount
来稍微复杂一些的方法:
df3 = pd.concat([df, df, df]).sort_index().reset_index(names='id')
df3['type'] = df3.groupby('id').cumcount() + 1
输出:
id c1 c2 c3 c4 c5 c6 c7 c8 c9 c10 c11 type
0 0 56 1 2 4 1.0 1 4.0 1.0 2 2 18000.0 1
1 0 56 1 2 4 1.0 1 4.0 1.0 2 2 18000.0 2
2 0 56 1 2 4 1.0 1 4.0 1.0 2 2 18000.0 3
3 1 52 2 2 5 3.0 1 4.0 1.0 1 1 0.0 1
4 1 52 2 2 5 3.0 1 4.0 1.0 1 1 0.0 2
5 1 52 2 2 5 3.0 1 4.0 1.0 1 1 0.0 3
6 2 82 2 2 5 4.0 2 4.0 1.0 1 1 0.0 1
7 2 82 2 2 5 4.0 2 4.0 1.0 1 1 0.0 2
8 2 82 2 2 5 4.0 2 4.0 1.0 1 1 0.0 3
9 3 26 1 2 4 2.0 1 4.0 1.0 2 2 12000.0 1
10 3 26 1 2 4 2.0 1 4.0 1.0 2 2 12000.0 2
11 3 26 1 2 4 2.0 1 4.0 1.0 2 2 12000.0 3
12 4 65 1 2 4 1.0 1 4.0 23.0 2 1 324900.0 1
13 4 65 1 2 4 1.0 1 4.0 23.0 2 1 324900.0 2
14 4 65 1 2 4 1.0 1 4.0 23.0 2 1 324900.0 3
英文:
A slightly more complicated way to do this using concat
and groupby
with cumcount
:
df3 = pd.concat([df, df, df]).sort_index().reset_index(names='id')
df3['type'] = df3.groupby('id').cumcount() + 1
Output:
id c1 c2 c3 c4 c5 c6 c7 c8 c9 c10 c11 type
0 0 56 1 2 4 1.0 1 4.0 1.0 2 2 18000.0 1
1 0 56 1 2 4 1.0 1 4.0 1.0 2 2 18000.0 2
2 0 56 1 2 4 1.0 1 4.0 1.0 2 2 18000.0 3
3 1 52 2 2 5 3.0 1 4.0 1.0 1 1 0.0 1
4 1 52 2 2 5 3.0 1 4.0 1.0 1 1 0.0 2
5 1 52 2 2 5 3.0 1 4.0 1.0 1 1 0.0 3
6 2 82 2 2 5 4.0 2 4.0 1.0 1 1 0.0 1
7 2 82 2 2 5 4.0 2 4.0 1.0 1 1 0.0 2
8 2 82 2 2 5 4.0 2 4.0 1.0 1 1 0.0 3
9 3 26 1 2 4 2.0 1 4.0 1.0 2 2 12000.0 1
10 3 26 1 2 4 2.0 1 4.0 1.0 2 2 12000.0 2
11 3 26 1 2 4 2.0 1 4.0 1.0 2 2 12000.0 3
12 4 65 1 2 4 1.0 1 4.0 23.0 2 1 324900.0 1
13 4 65 1 2 4 1.0 1 4.0 23.0 2 1 324900.0 2
14 4 65 1 2 4 1.0 1 4.0 23.0 2 1 324900.0 3
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论