Split values from columns and generate a sequence number
Question
I have two columns in a df. Each column holds multiple comma-separated values in a single row. I want to split each value into a new row in another table and generate a sequence number that restarts for every original row. The given data is:
<!-- language: none -->
x                     y
76.25, 345.65         87.12,96.45
78.12,35.1,98.27      85.23,65.2,56.63
The new df should look like this:
<!-- language: none -->
x                  76.25
y                  87.12
sequence number    1
x                  345.65
y                  96.45
sequence number    2
x                  78.12
y                  85.23
sequence number    1
x                  35.1
y                  65.21
sequence number    2
x                  98.27
y                  56.63
sequence number    3
All values are strings. I have no idea how to do this. Should I write a function, or is there a built-in DataFrame command for it? Any help is appreciated.
Answer 1
Score: 0
You can do it using iterrows() + concat():
import pandas as pd

df = pd.DataFrame({
    'x': ('76.25,345.65', '78.12,35.1,98.27'),
    'y': ('87.12,96.45', '85.23,65.2,56.63')
})

def get_parts():
    for _, row in df.iterrows():
        x = row['x'].split(',')
        y = row['y'].split(',')
        # len(x) must equal len(y), otherwise y[i] raises IndexError
        for i, _ in enumerate(x):
            yield 'x', x[i]
            yield 'y', y[i]
            # emit the 1-based sequence number after each x/y pair
            yield 'sequence number', i + 1

# wrap each (label, value) part in a one-element Series indexed by its label,
# then concatenate all of them into one object
new_df = pd.concat([
    pd.Series([p[1]], [p[0]])
    for p in get_parts()
])
Hope this helps.
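For comparison, here is a sketch of a different, vectorized route that is not part of the answer above: split every cell with str.split, then use DataFrame.explode and a per-row cumcount. It assumes pandas 1.3 or newer (needed to explode several columns at once) and that x and y always hold the same number of values in a row. The name long_df is made up for illustration. Note that it produces a wide layout with one x, y and sequence number column, rather than the stacked label/value layout shown in the question.

```python
import pandas as pd

df = pd.DataFrame({
    'x': ['76.25,345.65', '78.12,35.1,98.27'],
    'y': ['87.12,96.45', '85.23,65.2,56.63'],
})

# Turn every cell into a list of strings, then give each list element its own row.
# Exploding two columns together assumes pandas >= 1.3 and equal list lengths per row.
long_df = df.apply(lambda col: col.str.split(',')).explode(['x', 'y'])

# Remove stray spaces such as the one in "76.25, 345.65" from the question.
long_df['x'] = long_df['x'].str.strip()
long_df['y'] = long_df['y'].str.strip()

# explode keeps the original row label, so counting within each label
# restarts the numbering for every source row.
long_df['sequence number'] = long_df.groupby(level=0).cumcount() + 1
long_df = long_df.reset_index(drop=True)

print(long_df)
```

The values stay strings, as in the question. If the row lengths of x and y differ, explode raises a ValueError instead of silently misaligning the pairs.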