Split values from columns and generate a sequence number

Question


I have two columns in a df. Each column has multiple values in one row. I want to split each value into a new row in another table and generate a sequence number. The given data is:

x                    y
76.25, 345.65        87.12,96.45
78.12,35.1,98.27     85.23,65.2,56.63

The new df should look like this:

x                  76.25
y                  87.12
sequence number    1
x                  345.65
y                  96.45
sequence number    2
x                  78.12
y                  85.23
sequence number    1
x                  35.1
y                  65.21
sequence number    2
x                  98.27
y                  56.63
sequence number    3

All values are strings. I have no idea how to do this. Should I write a function, or is there a built-in DataFrame command for it? Any help is appreciated.

Answer 1

Score: 0


You can do it using iterrows() + concat():

import pandas as pd

df = pd.DataFrame({
    'x': ('76.25,345.65', '78.12,35.1,98.27'),
    'y': ('87.12,96.45', '85.23,65.2,56.63')
})


def get_parts():
    for _, row in df.iterrows():
        x = row['x'].split(',')
        y = row['y'].split(',')
        for i, _ in enumerate(x):
            # len(x) must equal len(y)
            yield 'x', x[i]
            yield 'y', y[i]
            # emit the sequence number after each split item
            yield 'sequence number', i + 1


# each part is a (label, value) pair: build a one-element Series
# with the label as its index, then concatenate all of them
new_df = pd.concat([
    pd.Series([p[1]], index=[p[0]])
    for p in get_parts()
])

Hope this helps.
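
For larger frames, a loop-free variant may be worth trying. The following is only a sketch, not part of the answer above: `split_with_sequence` is a hypothetical helper name, and it assumes the row index of `df` is unique and that every row holds the same number of items in `x` and `y`. It splits each cell with `str.split`, puts every item on its own row with `Series.explode`, and numbers the items within each source row with `groupby(...).cumcount()`.

```python
import pandas as pd

# Sample data from the question; note the optional space after a comma.
df = pd.DataFrame({
    "x": ["76.25, 345.65", "78.12,35.1,98.27"],
    "y": ["87.12,96.45", "85.23,65.2,56.63"],
})


def split_with_sequence(frame, columns=("x", "y")):
    """Hypothetical helper: one output row per comma-separated item."""
    exploded = {}
    for col in columns:
        # Drop spaces, split each cell into a list, then give each item its own row.
        cells = frame[col].str.replace(" ", "", regex=False).str.split(",")
        exploded[col] = cells.explode()

    first = exploded[columns[0]]
    for col in columns[1:]:
        # Same requirement as the loop version: equal item counts per row.
        if not exploded[col].index.equals(first.index):
            raise ValueError(f"{col!r} and {columns[0]!r} differ in item count")

    wide = pd.DataFrame({col: s.to_numpy() for col, s in exploded.items()})
    # The exploded index repeats the source row label (assumed unique),
    # so cumcount restarts at 0 for every source row.
    wide["sequence number"] = (first.groupby(level=0).cumcount() + 1).to_numpy()
    return wide


wide = split_with_sequence(df)

# Flatten row by row into the stacked label/value layout from the question.
stacked = pd.Series(wide.to_numpy().ravel(), index=list(wide.columns) * len(wide))
print(stacked)
```

The intermediate `wide` frame (one row per item, with columns `x`, `y` and `sequence number`) is usually easier to work with than the stacked Series, so the last step can be skipped if the stacked layout is not strictly required.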

huangapple
  • Posted on January 6, 2020, 18:29:14
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59610445.html