

How can I create an empty dataframe with combined column names?




I am trying to create a dataframe from an .xlsx file and split the string held in each cell into several strings, each placed in its own cell.
For example, I have a dataframe as follows:
column_name1    column_name2
A;B;C           D;E
F;G;H           I;J
I would like five columns to be created: "column_name1_1", "column_name1_2", "column_name1_3", "column_name2_1", "column_name2_2". Can the column names be generated automatically?
Once the dataframe is created, I want "A" to go in the first column, "B" in the second column, and so on. "F" would also go in the first column, below "A", and "G" would go in the second column, below "B".

Is there any way to achieve this result? It would also be fine not to generate the column names, as long as the data is distributed in the way I described above.

I have created this simple code that separates the letters into lists:

character = ";"  # separator used inside each cell
for headers in df.columns:
    for cells in df[headers]:
        cells = str(cells)
        sublist = cells.split(character)

I am using pandas for the first time and this is my first post. Any advice is welcome. Thank you all very much!


Score: 0






You can achieve this using Pandas.

Here you go!

import pandas as pd

# Load the .xlsx file into a Pandas dataframe
df = pd.read_excel("file.xlsx")

# Collect one block of split columns per original column
parts = []

# Loop through the columns
for header in df.columns:
    # Split every cell on ";" and spread the pieces across new columns
    pieces = df[header].astype(str).str.split(";", expand=True)
    # Name the new columns after the original column and the 1-based position
    pieces.columns = [f"{header}_{i + 1}" for i in range(pieces.shape[1])]
    parts.append(pieces)

# Combine the blocks side by side, keeping the original row order
split_df = pd.concat(parts, axis=1)

# Save the split_df dataframe to a new .xlsx file
split_df.to_excel("split_file.xlsx", index=False)

This code splits each cell of the .xlsx file on ";" and places each resulting value in its own column of a new dataframe. The new columns are named after the original column name and the position of the value in the list. The new dataframe is then saved to a new .xlsx file named "split_file.xlsx".
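
To check the intended layout without an Excel file, here is a minimal sketch that runs on the sample data from the question. It assumes every cell is a plain string such as "A;B;C" with ";" as the separator; `split_columns` is a hypothetical helper name chosen for illustration, not part of pandas.

```python
import pandas as pd

# Sample data copied from the question; a real sheet would come from pd.read_excel.
df = pd.DataFrame({
    "column_name1": ["A;B;C", "F;G;H"],
    "column_name2": ["D;E", "I;J"],
})


def split_columns(frame: pd.DataFrame, sep: str = ";") -> pd.DataFrame:
    """Split every cell on `sep` and name the pieces <column>_<position>."""
    parts = []
    for header in frame.columns:
        # expand=True returns one new column per piece of the split string
        pieces = frame[header].astype(str).str.split(sep, expand=True)
        pieces.columns = [f"{header}_{i + 1}" for i in range(pieces.shape[1])]
        parts.append(pieces)
    # Place the per-column blocks side by side, row order unchanged
    return pd.concat(parts, axis=1)


split_df = split_columns(df)
print(split_df)
```

With this input, "A" and "F" should end up in "column_name1_1", "B" and "G" in "column_name1_2", and so on. If some cells hold fewer values than others in the same column, the shorter rows would be padded with missing values rather than raising an error.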

  • Posted on February 14, 2023, 01:42:51
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75439442.html



