复制行并创建ID列以及在Python中重复列。

huangapple go评论71阅读模式
英文:

Duplicating rows and and creating ID column and repeating column in Python

问题

以下是您要求的翻译部分:

# 在输入数据框中,我想要每一行复制三次。
# 计算得到的“id”列是重复三次的相同数字,表示行号。
# 计算得到的“type”列对于每个原始记录依次为1、2、3。

# 如何在Python中完成这个任务?

# 预期输出:

#  id type  c1  c2  c3  c4  c5  c6  c7  c8   c9 c10 c11
#  1   1   56  1   2   4   1.0  1   4.0  1.0  2   2   18000.0
#  1   2   56  1   2   4   1.0  1   4.0  1.0  2   2   18000.0
#  1   3   56  1   2   4   1.0  1   4.0  1.0  2   2   18000.0
#  2   1   52  2   2   5   3.0  1   4.0  1.0  1   1   0.0
#  2   2   52  2   2   5   3.0  1   4.0  1.0  1   1   0.0
#  2   3   52  2   2   5   3.0  1   4.0  1.0  1   1   0.0
#  3   1   82  2   2   5   4.0  2   4.0  1.0  1   1   0.0
#  3   2   82  2   2   5   4.0  2   4.0  1.0  1   1   0.0
#  3   3   82  2   2   5   4.0  2   4.0  1.0  1   1   0.0
#  4   1   26  1   2   4   2.0  1   4.0  1.0  2   2   12000.0
#  4   2   26  1   2   4   2.0  1   4.0  1.0  2   2   12000.0
#  4   3   26  1   2   4   2.0  1   4.0  1.0  2   2   12000.0
#  5   1   65  1   2   4   1.0  1   4.0  23.0 2   1   324900.0
#  5   2   65  1   2   4   1.0  1   4.0  23.0 2   1   324900.0
#  5   3   65  1   2   4   1.0  1   4.0  23.0 2   1   324900.0

我有以下代码,但我不喜欢它,因为我不得不使用assign()两次,而且我不确定如何计算id列...我只是在那里放置了占位符代码。不过,我似乎正确获取了type列。

我的尝试:

(df
     .dropna()
     .assign(id =  lambda x: range(1, len(x) + 1)
            )
      .pipe(lambda x: x.loc[x.index.repeat(3)])
      .assign(id = lambda x: np.r_[:len(x)] % 3 + 1,
              type = lambda x: np.r_[:len(x)] % 3 + 1))
英文:

I have the following input dataframe:

Input Dataframe:

c1  c2  c3  c4  c5  c6  c7  c8    c9 c10 c11
56	1	2	4	1.0	1	4.0	1.0	  2	 2	 18000.0
52	2	2	5	3.0	1	4.0	1.0	  1	 1	 0.0
82	2	2	5	4.0	2	4.0	1.0	  1	 1	 0.0
26	1	2	4	2.0	1	4.0	1.0	  2	 2	 12000.0
65	1	2	4	1.0	1	4.0	23.0  2	 1	 324900.0

In the input datframe, I want duplicate each row 3 times.
The calculated id column is the same number that repeats three times and represents the row number.
The calculated type column goes 1,2,3 for each original record.

How can I do this in Python?

Expected Output:

id type  c1  c2  c3  c4  c5  c6  c7  c8   c9 c10 c11
1  1     56	1	2	4	1.0	1	4.0	1.0	  2	 2	 18000.0
1  2     56	1	2	4	1.0	1	4.0	1.0	  2	 2	 18000.0
1  3     56	1	2	4	1.0	1	4.0	1.0	  2	 2	 18000.0
2  1     52	2	2	5	3.0	1	4.0	1.0	  1	 1	 0.0
2  2     52	2	2	5	3.0	1	4.0	1.0	  1	 1	 0.0
2  3     52	2	2	5	3.0	1	4.0	1.0	  1	 1	 0.0
3  1     82	2	2	5	4.0	2	4.0	1.0	  1	 1	 0.0
3  2     82	2	2	5	4.0	2	4.0	1.0	  1	 1	 0.0
3  3     82	2	2	5	4.0	2	4.0	1.0	  1	 1	 0.0
4  1     26	1	2	4	2.0	1	4.0	1.0	  2	 2	 12000.0
4  2     26	1	2	4	2.0	1	4.0	1.0	  2	 2	 12000.0
4  3     26	1	2	4	2.0	1	4.0	1.0	  2	 2	 12000.0
5  1     65	1	2	4	1.0	1	4.0	23.0  2	 1	 324900.0
5  2     65	1	2	4	1.0	1	4.0	23.0  2	 1	 324900.0
5  3     65	1	2	4	1.0	1	4.0	23.0  2	 1	 324900.0

I have the following code but I don't like it because I have to use assign() twice and also I'm not sure how to calculate the id column...I just put placeholder code there. I seem to get the type column correct though.

My Attempt:

(df
     .dropna()
     .assign(id =  lambda x: range(1, len(x) + 1)
            )
      .pipe(lambda x: x.loc[x.index.repeat(3)])
      .assign(id = lambda x: np.r_[:len(x)] % 3 + 1,
              type = lambda x: np.r_[:len(x)] % 3 + 1))

答案1

得分: 3

一个交叉连接将完成这项工作:

pd.merge(
    df.rename_axis("id").reset_index(),
    pd.DataFrame({"type": [1, 2, 3]}),
    how="cross",
)

# 如果您使用的是 pandas 1.5 或更新版本
pd.merge(
    df.reset_index(names="id"),
    pd.DataFrame({"type": [1, 2, 3]}),
    how="cross",
)
英文:

A cross join will do the job:

pd.merge(
    df.rename_axis("id").reset_index(),
    pd.DataFrame({"type": [1, 2, 3]}),
    how="cross",
)

# If you are using pandas 1.5 or later
pd.merge(
    df.reset_index(names="id"),
    pd.DataFrame({"type": [1, 2, 3]}),
    how="cross",
)

答案2

得分: 1

使用 concatgroupbycumcount 来稍微复杂一些的方法:

df3 = pd.concat([df, df, df]).sort_index().reset_index(names='id')
df3['type'] = df3.groupby('id').cumcount() + 1

输出:

    id  c1  c2  c3  c4   c5  c6   c7    c8  c9  c10       c11  type
0    0  56   1   2   4  1.0   1  4.0   1.0   2    2   18000.0     1
1    0  56   1   2   4  1.0   1  4.0   1.0   2    2   18000.0     2
2    0  56   1   2   4  1.0   1  4.0   1.0   2    2   18000.0     3
3    1  52   2   2   5  3.0   1  4.0   1.0   1    1       0.0     1
4    1  52   2   2   5  3.0   1  4.0   1.0   1    1       0.0     2
5    1  52   2   2   5  3.0   1  4.0   1.0   1    1       0.0     3
6    2  82   2   2   5  4.0   2  4.0   1.0   1    1       0.0     1
7    2  82   2   2   5  4.0   2  4.0   1.0   1    1       0.0     2
8    2  82   2   2   5  4.0   2  4.0   1.0   1    1       0.0     3
9    3  26   1   2   4  2.0   1  4.0   1.0   2    2   12000.0     1
10   3  26   1   2   4  2.0   1  4.0   1.0   2    2   12000.0     2
11   3  26   1   2   4  2.0   1  4.0   1.0   2    2   12000.0     3
12   4  65   1   2   4  1.0   1  4.0  23.0   2    1  324900.0     1
13   4  65   1   2   4  1.0   1  4.0  23.0   2    1  324900.0     2
14   4  65   1   2   4  1.0   1  4.0  23.0   2    1  324900.0     3
英文:

A slightly more complicated way to do this using concat and groupby with cumcount:

df3 = pd.concat([df, df, df]).sort_index().reset_index(names='id')
df3['type'] = df3.groupby('id').cumcount() + 1

Output:

    id  c1  c2  c3  c4   c5  c6   c7    c8  c9  c10       c11  type
0    0  56   1   2   4  1.0   1  4.0   1.0   2    2   18000.0     1
1    0  56   1   2   4  1.0   1  4.0   1.0   2    2   18000.0     2
2    0  56   1   2   4  1.0   1  4.0   1.0   2    2   18000.0     3
3    1  52   2   2   5  3.0   1  4.0   1.0   1    1       0.0     1
4    1  52   2   2   5  3.0   1  4.0   1.0   1    1       0.0     2
5    1  52   2   2   5  3.0   1  4.0   1.0   1    1       0.0     3
6    2  82   2   2   5  4.0   2  4.0   1.0   1    1       0.0     1
7    2  82   2   2   5  4.0   2  4.0   1.0   1    1       0.0     2
8    2  82   2   2   5  4.0   2  4.0   1.0   1    1       0.0     3
9    3  26   1   2   4  2.0   1  4.0   1.0   2    2   12000.0     1
10   3  26   1   2   4  2.0   1  4.0   1.0   2    2   12000.0     2
11   3  26   1   2   4  2.0   1  4.0   1.0   2    2   12000.0     3
12   4  65   1   2   4  1.0   1  4.0  23.0   2    1  324900.0     1
13   4  65   1   2   4  1.0   1  4.0  23.0   2    1  324900.0     2
14   4  65   1   2   4  1.0   1  4.0  23.0   2    1  324900.0     3

huangapple
  • 本文由 发表于 2023年2月27日 10:54:55
  • 转载请务必保留本文链接:https://go.coder-hub.com/75576409.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定