连接列,并逐行列出它们。

huangapple go评论64阅读模式
英文:

Join the columns and list them one by one in row

问题

order content
a c1
a c2
a c3
a b1
a b2
a b3
b c4
b c5
b c6
b b4
b b5
b b6
英文:

I have Pandas dataset with orders and different types of packaging inside this order separated by comma inside the cell.

order container box
a c1,c2,c3 b1,b2,b3
b c4,c5,c6 b4,b5,b6

Need to get table with two columns: "order" and "content" with all values from both container and box.

I could only merge the container and box - but do not know how to list them row by row.

Needed table is:

order content
a c1
a c2
a c3
a b1
a b2
a b3
b c4
b c5
b c6
b b4
b b5
b b6

答案1

得分: 1

以下是翻译好的部分:

你可以使用 stacksplit + explode 然后转换为 DataFrame:

out = (df.set_index('order').stack()  # 将其他列设置在一边并堆叠
         .str.split(',').explode()  # 将值展开为多行
         # 清理
         .reset_index('order', name='content').reset_index(drop=True)
      )
print(out)

输出结果:

   order content
0      a      c1
1      a      c2
2      a      c3
3      a      b1
4      a      b2
5      a      b3
6      b      c4
7      b      c5
8      b      c6
9      b      b4
10     b      b5
11     b      b6

使用 melt 的另一种方法:

(df.melt('order', value_name='content')
   .assign(content=lambda d: d['content'].str.split(','))
   .explode('content').drop(columns='variable')
)
英文:

You can stack, split+explode and convert to DataFrame:

out = (df.set_index('order').stack() # set other columns aside and stack
         .str.split(',').explode() # expand values to multiple rows
          # cleanup
         .reset_index('order', name='content').reset_index(drop=True)
      )
print(out)

Output:

   order content
0      a      c1
1      a      c2
2      a      c3
3      a      b1
4      a      b2
5      a      b3
6      b      c4
7      b      c5
8      b      c6
9      b      b4
10     b      b5
11     b      b6

Alternative with melt:

(df.melt('order', value_name='content')
   .assign(content=lambda d: d['content'].str.split(','))
   .explode('content').drop(columns='variable')
)

答案2

得分: 1

你还可以使用pd.DataFrame.meltpd.DataFrame.set_indexpd.DataFrame.pipepd.DataFrame.sort_index来完成相同的操作:

(df
 .melt(id_vars='order', value_vars=['container', 'box'])
 .set_index('order')
 .pipe(lambda x: x['value'].str.split(',').explode())
 .sort_index()
)

或者更简洁的方法:

(df
 .melt(id_vars='order', value_vars=['container', 'box'])
 .set_index('order')['value'].str.split(',').explode()
 .sort_index())
英文:

You could also use pd.DataFrame.melt, pd.DataFrame.set_index, pd.DataFrame.pipe, pd.DataFrame.sort_index:

(df
 .melt(id_vars='order', value_vars=['container', 'box'])
 .set_index('order')
 .pipe(lambda x: x['value'].str.split(',').explode())
 .sort_index()
)

order
a    c1
a    c2
a    c3
a    b1
a    b2
a    b3
b    c4
b    c5
b    c6
b    b4
b    b5
b    b6

or even a more concise approach::

(df
 .melt(id_vars='order', value_vars=['container', 'box'])
 .set_index('order')['value'].str.split(',').explode()
 .sort_index())

答案3

得分: 0

以下是翻译好的部分:

import pandas as pd
df = pd.DataFrame({"order":["a","b"],"container":["c1,c2,c3","c4,c5,c6"],"box":["b1,b2,b3","b4,b5,b6"]})
df2 = pd.concat([df.order,df.container.str.split(",",expand=True),df.box.str.split(",",expand=True)],axis=1)
df2 = df2.melt("order")[["order","value"]]
print(df2)

输出:

       order value
0      a    c1
1      b    c4
2      a    c2
3      b    c5
4      a    c3
5      b    c6
6      a    b1
7      b    b4
8      a    b2
9      b    b5
10     a    b3
11     b    b6

解释:使用 .str.splitexpand=True 将逗号分隔的值分成单独的列,然后使用 .concat 创建包含这些列的 DataFrame,接着使用 .melt 在 "order" 列上获取所需的结果,最后选择所需的列。

英文:

I would do it following way

import pandas as pd
df = pd.DataFrame({"order":["a","b"],"container":["c1,c2,c3","c4,c5,c6"],"box":["b1,b2,b3","b4,b5,b6"]})
df2 = pd.concat([df.order,df.container.str.split(",",expand=True),df.box.str.split(",",expand=True)],axis=1)
df2 = df2.melt("order")[["order","value"]]
print(df2)

output

   order value
0      a    c1
1      b    c4
2      a    c2
3      b    c5
4      a    c3
5      b    c6
6      a    b1
7      b    b4
8      a    b2
9      b    b5
10     a    b3
11     b    b6

Explanation: use .str.split with expand=True to get ,-sheared values into separate columns, then use .concat to create DataFrame from them, then use .melt at order to get desired result and then select required columns.

答案4

得分: 0

df1.set_index("order").agg(','.join, axis=1).map(lambda x: x.split(',')).explode().rename("content")

输出:

order
a    c1
a    c2
a    c3
a    b1
a    b2
a    b3
b    c4
b    c5
b    c6
b    b4
b    b5
b    b6
英文:
df1.set_index("order").agg(','.join,axis=1).map(lambda x:x.split(",")).explode().rename("content")

out

order
a    c1
a    c2
a    c3
a    b1
a    b2
a    b3
b    c4
b    c5
b    c6
b    b4
b    b5
b    b6

huangapple
  • 本文由 发表于 2023年2月6日 02:59:20
  • 转载请务必保留本文链接:https://go.coder-hub.com/75354759.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定