June 27, 2023, 18:27:25

Combining multiple columns with similar names in Jupyter notebook from an xls import

Question

I'm importing survey data from an Excel sheet in which several columns share the same title. When I read the sheet into a DataFrame, the duplicates are renamed value.1, value.2, value.3, etc.

I've been trying to reshape the DataFrame from wide to long format, but the renamed columns aren't merged under one variable name, so each one shows up as a separate variable.

In addition, I have to list every column name in the conversion, which is tedious because there are now 40+ variants.

import pandas as pd

# create a dataframe with x, x.1, y, and y.1 columns
df = pd.DataFrame({'x': [1, 2, 3, 4, 5], 'x.1': [1, 2, 3, 4, 5], 'y': [1, 2, 3, 4, 5], 'y.1': [1, 2, 3, 4, 5]})
# add an id column
df['id'] = df.index
# change from wide to long format using melt
df = pd.melt(df, id_vars=['id'], value_vars=['x', 'x.1', 'y', 'y.1'], var_name='variable', value_name='value')
print(df)

This gives me the following output:

    id variable  value
0    0        x      1
1    1        x      2
2    2        x      3
3    3        x      4
4    4        x      5
5    0      x.1      1
6    1      x.1      2
7    2      x.1      3
8    3      x.1      4
9    4      x.1      5
10   0        y      1
11   1        y      2
12   2        y      3
13   3        y      4
14   4        y      5
15   0      y.1      1
16   1      y.1      2
17   2      y.1      3
18   3      y.1      4
19   4      y.1      5

But what I want is to gather the values like this:

    id variable  value
0    0        x      1
1    1        x      2
2    2        x      3
3    3        x      4
4    4        x      5
5    0        x      1
6    1        x      2
7    2        x      3
8    3        x      4
9    4        x      5
10   0        y      1
11   1        y      2
12   2        y      3
13   3        y      4
14   4        y      5
15   0        y      1
16   1        y      2
17   2        y      3
18   3        y      4
19   4        y      5

How can I do this? Ideally in a way that doesn't require passing a very long list of columns when I melt the DataFrame.

Answer 1

Score: 1

As mentioned in the comments, this is easy if you pre-process the column names with str.replace to remove the numeric suffix (.1, .2, ...):

out = (df
   .set_axis(df.columns.str.replace(r'\.\d+', '', regex=True), axis=1)
   .melt(['id'],
         var_name='variable', value_name='value', # not needed
        )
)

Note that you don't need to specify value_vars when it would cover every column not listed in id_vars, because melt uses all remaining columns by default. var_name='variable' and value_name='value' are also the defaults, so they can be omitted as well.
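To make the point about defaults concrete, here is a minimal sketch that rebuilds the wide frame from the question and compares the fully spelled-out call with the shortened one. The names `renamed`, `explicit`, and `implicit` are only illustrative, and the `$` anchor in the pattern is an optional tightening added here (it limits the match to a trailing suffix), not part of the answer's original regex.

```python
import pandas as pd

# Wide frame as in the question; x.1 and y.1 stand in for the renamed duplicates.
df = pd.DataFrame({
    'x': [1, 2, 3, 4, 5], 'x.1': [1, 2, 3, 4, 5],
    'y': [1, 2, 3, 4, 5], 'y.1': [1, 2, 3, 4, 5],
})
df['id'] = df.index

# Strip a trailing ".<digits>" suffix so the duplicates share one name again.
# Assumption: the "$" anchor is added here so only a suffix at the very end is removed.
renamed = df.set_axis(df.columns.str.replace(r'\.\d+$', '', regex=True), axis=1)

# Fully spelled-out call.
explicit = renamed.melt(id_vars=['id'], var_name='variable', value_name='value')

# Shortened call relying on melt's defaults for value_vars, var_name and value_name.
implicit = renamed.melt(['id'])

print(explicit.equals(implicit))
```

With 40+ duplicated survey columns, the shortened call stays the same length, since none of the value columns has to be listed.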

Output:

    id variable  value
0    0        x      1
1    1        x      2
2    2        x      3
3    3        x      4
4    4        x      5
5    0        x      1
6    1        x      2
7    2        x      3
8    3        x      4
9    4        x      5
10   0        y      1
11   1        y      2
12   2        y      3
13   3        y      4
14   4        y      5
15   0        y      1
16   1        y      2
17   2        y      3
18   3        y      4
19   4        y      5
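For the import step itself, the following is a rough end-to-end sketch. It reads an in-memory CSV with `pd.read_csv` as a stand-in for the Excel file, on the assumption that `pd.read_excel` renames duplicate headers in the same way (x, x.1, ...), as the question describes. The sample data, the `csv_text`, `wide`, and `long_df` names, and the `$` anchor in the pattern are all illustrative choices rather than part of the original answer.

```python
import io

import pandas as pd

# Hypothetical survey export in which the headers "x" and "y" each appear twice.
csv_text = "id,x,x,y,y\n0,1,6,11,16\n1,2,7,12,17\n"

# Duplicate headers are renamed on import: id, x, x.1, y, y.1
wide = pd.read_csv(io.StringIO(csv_text))
print(list(wide.columns))

# Remove the trailing numeric suffix, then melt every column except "id".
long_df = (wide
   .set_axis(wide.columns.str.replace(r'\.\d+$', '', regex=True), axis=1)
   .melt(['id'])
)
print(long_df)
```

Values from both "x" columns end up under the single variable name x, and likewise for y.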

