February 14, 2023, 01:42:51

How can I create an empty dataframe with combined column names?

Question

I am trying to create a dataframe from an .xlsx file in which each cell holds several strings joined by a separator, and to split every cell so that each string ends up in its own cell.
For example, I have a dataframe as follows:
column_name1    column_name2
A;B;C           D;E
F;G;H           I;J
My intention is to create 5 columns: "column_name1_1", "column_name1_2", "column_name1_3", "column_name2_1", "column_name2_2". Can the column names be generated automatically?
After the dataframe is created, I want to put "A" in the first column, "B" in the second column, and so on. "F" would also go in the first column, under "A", and "G" would go in the second column, under "B".

Is there any way to achieve this result? It would also be useful for me not to create the name of the columns, but to distribute the information in the way I stated above.

I have created this simple code that separates the letters into lists:

character = ";"  # separator between the values in a cell
for headers in df.columns:
    for cells in df[headers]:
        cells = str(cells)
        sublist = cells.split(character)
        print(sublist)

I am using pandas for the first time and this is my first post. Any advice is welcome. Thank you all very much!

Answer 1

Score: 0

You can achieve this using Pandas.

Here you go!

import pandas as pd

# Load the .xlsx file into a Pandas dataframe
df = pd.read_excel("file.xlsx")

# Split each column on ";" and collect the resulting blocks of columns
parts = []
for header in df.columns:
    # expand=True returns one column per split element and keeps the row order
    split_cols = df[header].astype(str).str.split(";", expand=True)
    # Name the new columns after the original header and the element position
    split_cols.columns = [f"{header}_{i + 1}" for i in range(split_cols.shape[1])]
    parts.append(split_cols)

# Join the blocks side by side into one dataframe
split_df = pd.concat(parts, axis=1)

# Save the split_df dataframe to a new .xlsx file
split_df.to_excel("split_file.xlsx", index=False)

This code splits the values in an .xlsx file into a new dataframe, with each value placed in its own column. The new columns are named after the original column name and the position of the value in the list. The new dataframe is then saved to a new .xlsx file named "split_file.xlsx".
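
To try the splitting step without an Excel file, here is a minimal sketch that rebuilds the example from the question in memory. The sample data and the helper name `split_columns` are assumptions made for illustration; the helper relies on `Series.str.split(..., expand=True)` and `pd.concat`, and it is one possible approach rather than the only one.

```python
import pandas as pd

# Hypothetical in-memory stand-in for the spreadsheet described in the question
df = pd.DataFrame({
    "column_name1": ["A;B;C", "F;G;H"],
    "column_name2": ["D;E", "I;J"],
})


def split_columns(frame, sep=";"):
    """Split every column of `frame` on `sep` into numbered columns."""
    parts = []
    for header in frame.columns:
        # expand=True yields one column per element; rows with fewer
        # elements get missing values in the trailing columns
        pieces = frame[header].astype(str).str.split(sep, expand=True)
        # Build names such as column_name1_1, column_name1_2, ...
        pieces.columns = [f"{header}_{i + 1}" for i in range(pieces.shape[1])]
        parts.append(pieces)
    # Place the per-column blocks side by side
    return pd.concat(parts, axis=1)


split_df = split_columns(df)
print(split_df)
```

With this sample data, "A" and "F" end up in `column_name1_1`, "B" and "G" in `column_name1_2`, and so on. If the column names are not needed in the output file, passing `header=False` to `to_excel` writes only the values.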
