February 27, 2023 10:54:55

Duplicating rows and creating an ID column and a repeating column in Python

Question

I have the following input dataframe:

Input Dataframe:

c1  c2  c3  c4  c5   c6  c7   c8    c9  c10  c11
56  1   2   4   1.0  1   4.0  1.0   2   2    18000.0
52  2   2   5   3.0  1   4.0  1.0   1   1    0.0
82  2   2   5   4.0  2   4.0  1.0   1   1    0.0
26  1   2   4   2.0  1   4.0  1.0   2   2    12000.0
65  1   2   4   1.0  1   4.0  23.0  2   1    324900.0

In the input dataframe, I want to duplicate each row 3 times.
The calculated id column is the same number repeated three times; it is the row number of the original record.
The calculated type column goes 1, 2, 3 for each original record.

How can I do this in Python?

Expected Output:

id  type  c1  c2  c3  c4  c5   c6  c7   c8    c9  c10  c11
1   1     56  1   2   4   1.0  1   4.0  1.0   2   2    18000.0
1   2     56  1   2   4   1.0  1   4.0  1.0   2   2    18000.0
1   3     56  1   2   4   1.0  1   4.0  1.0   2   2    18000.0
2   1     52  2   2   5   3.0  1   4.0  1.0   1   1    0.0
2   2     52  2   2   5   3.0  1   4.0  1.0   1   1    0.0
2   3     52  2   2   5   3.0  1   4.0  1.0   1   1    0.0
3   1     82  2   2   5   4.0  2   4.0  1.0   1   1    0.0
3   2     82  2   2   5   4.0  2   4.0  1.0   1   1    0.0
3   3     82  2   2   5   4.0  2   4.0  1.0   1   1    0.0
4   1     26  1   2   4   2.0  1   4.0  1.0   2   2    12000.0
4   2     26  1   2   4   2.0  1   4.0  1.0   2   2    12000.0
4   3     26  1   2   4   2.0  1   4.0  1.0   2   2    12000.0
5   1     65  1   2   4   1.0  1   4.0  23.0  2   1    324900.0
5   2     65  1   2   4   1.0  1   4.0  23.0  2   1    324900.0
5   3     65  1   2   4   1.0  1   4.0  23.0  2   1    324900.0

I have the following code, but I don't like it because I have to use assign() twice. I'm also not sure how to calculate the id column; I just put placeholder code there. I seem to get the type column correct, though.

My Attempt:

import numpy as np

(df
    .dropna()
    .assign(id=lambda x: range(1, len(x) + 1))
    .pipe(lambda x: x.loc[x.index.repeat(3)])
    # placeholder: this overwrites id with 1, 2, 3, ... (not the row number)
    .assign(id=lambda x: np.r_[:len(x)] % 3 + 1,
            type=lambda x: np.r_[:len(x)] % 3 + 1))
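For reference, both columns can be derived from each row's position after the repeat: position p comes from original row p // 3 and is copy p % 3 of it, so id = p // 3 + 1 and type = p % 3 + 1. The following is a minimal sketch of the pipeline above with a single assign(), assuming the five-row input shown earlier; np.arange(len(x)) stands in for the equivalent np.r_[:len(x)], and the final column reorder is an extra step added only to match the expected layout.

```python
import numpy as np
import pandas as pd

# Sample input: the five rows from the question (columns c1..c11).
df = pd.DataFrame(
    [
        [56, 1, 2, 4, 1.0, 1, 4.0, 1.0, 2, 2, 18000.0],
        [52, 2, 2, 5, 3.0, 1, 4.0, 1.0, 1, 1, 0.0],
        [82, 2, 2, 5, 4.0, 2, 4.0, 1.0, 1, 1, 0.0],
        [26, 1, 2, 4, 2.0, 1, 4.0, 1.0, 2, 2, 12000.0],
        [65, 1, 2, 4, 1.0, 1, 4.0, 23.0, 2, 1, 324900.0],
    ],
    columns=[f"c{i}" for i in range(1, 12)],
)

n = 3  # copies of each original row

out = (
    df.dropna()
    # Repeat every row label n times; .loc returns the copies next to each other.
    .pipe(lambda x: x.loc[x.index.repeat(n)])
    .reset_index(drop=True)
    # Position p comes from original row p // n and is copy p % n of it.
    .assign(id=lambda x: np.arange(len(x)) // n + 1,
            type=lambda x: np.arange(len(x)) % n + 1)
    # Put id and type first, as in the expected output.
    .pipe(lambda x: x[["id", "type", *df.columns]])
)
print(out)
```

Because both columns are computed after reset_index(drop=True), id stays a consecutive 1-based row number even if dropna() removes some rows.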

Answer 1

Score: 3

A cross join will do the job:

import pandas as pd

pd.merge(
    df.rename_axis("id").reset_index(),
    pd.DataFrame({"type": [1, 2, 3]}),
    how="cross",
)

# If you are using pandas 1.5 or later
pd.merge(
    df.reset_index(names="id"),
    pd.DataFrame({"type": [1, 2, 3]}),
    how="cross",
)
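Both versions of the cross join take id from the original index (so it starts at 0) and place type as the last column. If the exact layout of the expected output matters, here is a minimal sketch that shifts id to start at 1 and moves type next to it. It assumes the same five-row input and pandas 1.2 or later, the version that added how="cross".

```python
import pandas as pd

# Sample input: the five rows from the question (columns c1..c11).
df = pd.DataFrame(
    [
        [56, 1, 2, 4, 1.0, 1, 4.0, 1.0, 2, 2, 18000.0],
        [52, 2, 2, 5, 3.0, 1, 4.0, 1.0, 1, 1, 0.0],
        [82, 2, 2, 5, 4.0, 2, 4.0, 1.0, 1, 1, 0.0],
        [26, 1, 2, 4, 2.0, 1, 4.0, 1.0, 2, 2, 12000.0],
        [65, 1, 2, 4, 1.0, 1, 4.0, 23.0, 2, 1, 324900.0],
    ],
    columns=[f"c{i}" for i in range(1, 12)],
)

# Left side: the original rows with a 1-based id column in front.
left = df.reset_index(drop=True).rename_axis("id").reset_index()
left["id"] += 1

# Cross join: each left row is paired with every type, keeping the left order.
out = pd.merge(left, pd.DataFrame({"type": [1, 2, 3]}), how="cross")

# Reorder so id and type come first, matching the expected output.
out = out[["id", "type", *df.columns]]
print(out)
```

Changing the list passed to the "type" frame changes both the number of copies and their labels, without touching the rest of the code.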

Answer 2

Score: 1

A slightly more complicated way to do this using concat and groupby with cumcount:

import pandas as pd

df3 = pd.concat([df, df, df]).sort_index().reset_index(names='id')
df3['type'] = df3.groupby('id').cumcount() + 1

Output:

    id  c1  c2  c3  c4   c5  c6   c7    c8  c9  c10       c11  type
0    0  56   1   2   4  1.0   1  4.0   1.0   2    2   18000.0     1
1    0  56   1   2   4  1.0   1  4.0   1.0   2    2   18000.0     2
2    0  56   1   2   4  1.0   1  4.0   1.0   2    2   18000.0     3
3    1  52   2   2   5  3.0   1  4.0   1.0   1    1       0.0     1
4    1  52   2   2   5  3.0   1  4.0   1.0   1    1       0.0     2
5    1  52   2   2   5  3.0   1  4.0   1.0   1    1       0.0     3
6    2  82   2   2   5  4.0   2  4.0   1.0   1    1       0.0     1
7    2  82   2   2   5  4.0   2  4.0   1.0   1    1       0.0     2
8    2  82   2   2   5  4.0   2  4.0   1.0   1    1       0.0     3
9    3  26   1   2   4  2.0   1  4.0   1.0   2    2   12000.0     1
10   3  26   1   2   4  2.0   1  4.0   1.0   2    2   12000.0     2
11   3  26   1   2   4  2.0   1  4.0   1.0   2    2   12000.0     3
12   4  65   1   2   4  1.0   1  4.0  23.0   2    1  324900.0     1
13   4  65   1   2   4  1.0   1  4.0  23.0   2    1  324900.0     2
14   4  65   1   2   4  1.0   1  4.0  23.0   2    1  324900.0     3
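The output above also numbers id from 0 and puts type last. Below is a minimal sketch of the same concat/cumcount idea, generalized to n copies ([df] * n instead of [df, df, df]), with a 1-based id and type inserted right after it. It assumes pandas 1.5 or later for reset_index(names=...) and a default RangeIndex on df.

```python
import pandas as pd

# Sample input: the five rows from the question (columns c1..c11).
df = pd.DataFrame(
    [
        [56, 1, 2, 4, 1.0, 1, 4.0, 1.0, 2, 2, 18000.0],
        [52, 2, 2, 5, 3.0, 1, 4.0, 1.0, 1, 1, 0.0],
        [82, 2, 2, 5, 4.0, 2, 4.0, 1.0, 1, 1, 0.0],
        [26, 1, 2, 4, 2.0, 1, 4.0, 1.0, 2, 2, 12000.0],
        [65, 1, 2, 4, 1.0, 1, 4.0, 23.0, 2, 1, 324900.0],
    ],
    columns=[f"c{i}" for i in range(1, 12)],
)

n = 3  # copies of each original row

# Stack n copies, then sort by the original index so each row's copies are adjacent.
out = pd.concat([df] * n).sort_index().reset_index(names="id")
out["id"] += 1  # 1-based row number, as in the expected output

# cumcount numbers the rows inside each id group 0, 1, ..., n - 1.
out.insert(1, "type", out.groupby("id").cumcount() + 1)
print(out)
```

The copies of a row are identical, so the order in which sort_index() places them within a group does not affect the values; cumcount only needs the group boundaries.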
