连接列,并逐行列出它们。

huangapple go评论91阅读模式
英文:

Join the columns and list them one by one in row

问题

order content
a c1
a c2
a c3
a b1
a b2
a b3
b c4
b c5
b c6
b b4
b b5
b b6
英文:

I have Pandas dataset with orders and different types of packaging inside this order separated by comma inside the cell.

order container box
a c1,c2,c3 b1,b2,b3
b c4,c5,c6 b4,b5,b6

Need to get table with two columns: "order" and "content" with all values from both container and box.

I could only merge the container and box - but do not know how to list them row by row.

Needed table is:

order content
a c1
a c2
a c3
a b1
a b2
a b3
b c4
b c5
b c6
b b4
b b5
b b6

答案1

得分: 1

以下是翻译好的部分:

你可以使用 stacksplit + explode 然后转换为 DataFrame:

  1. out = (df.set_index('order').stack() # 将其他列设置在一边并堆叠
  2. .str.split(',').explode() # 将值展开为多行
  3. # 清理
  4. .reset_index('order', name='content').reset_index(drop=True)
  5. )
  6. print(out)

输出结果:

  1. order content
  2. 0 a c1
  3. 1 a c2
  4. 2 a c3
  5. 3 a b1
  6. 4 a b2
  7. 5 a b3
  8. 6 b c4
  9. 7 b c5
  10. 8 b c6
  11. 9 b b4
  12. 10 b b5
  13. 11 b b6

使用 melt 的另一种方法:

  1. (df.melt('order', value_name='content')
  2. .assign(content=lambda d: d['content'].str.split(','))
  3. .explode('content').drop(columns='variable')
  4. )
英文:

You can stack, split+explode and convert to DataFrame:

  1. out = (df.set_index('order').stack() # set other columns aside and stack
  2. .str.split(',').explode() # expand values to multiple rows
  3. # cleanup
  4. .reset_index('order', name='content').reset_index(drop=True)
  5. )
  6. print(out)

Output:

  1. order content
  2. 0 a c1
  3. 1 a c2
  4. 2 a c3
  5. 3 a b1
  6. 4 a b2
  7. 5 a b3
  8. 6 b c4
  9. 7 b c5
  10. 8 b c6
  11. 9 b b4
  12. 10 b b5
  13. 11 b b6

Alternative with melt:

  1. (df.melt('order', value_name='content')
  2. .assign(content=lambda d: d['content'].str.split(','))
  3. .explode('content').drop(columns='variable')
  4. )

答案2

得分: 1

你还可以使用pd.DataFrame.meltpd.DataFrame.set_indexpd.DataFrame.pipepd.DataFrame.sort_index来完成相同的操作:

  1. (df
  2. .melt(id_vars='order', value_vars=['container', 'box'])
  3. .set_index('order')
  4. .pipe(lambda x: x['value'].str.split(',').explode())
  5. .sort_index()
  6. )

或者更简洁的方法:

  1. (df
  2. .melt(id_vars='order', value_vars=['container', 'box'])
  3. .set_index('order')['value'].str.split(',').explode()
  4. .sort_index())
英文:

You could also use pd.DataFrame.melt, pd.DataFrame.set_index, pd.DataFrame.pipe, pd.DataFrame.sort_index:

  1. (df
  2. .melt(id_vars='order', value_vars=['container', 'box'])
  3. .set_index('order')
  4. .pipe(lambda x: x['value'].str.split(',').explode())
  5. .sort_index()
  6. )
  7. order
  8. a c1
  9. a c2
  10. a c3
  11. a b1
  12. a b2
  13. a b3
  14. b c4
  15. b c5
  16. b c6
  17. b b4
  18. b b5
  19. b b6

or even a more concise approach::

  1. (df
  2. .melt(id_vars='order', value_vars=['container', 'box'])
  3. .set_index('order')['value'].str.split(',').explode()
  4. .sort_index())

答案3

得分: 0

以下是翻译好的部分:

  1. import pandas as pd
  2. df = pd.DataFrame({"order":["a","b"],"container":["c1,c2,c3","c4,c5,c6"],"box":["b1,b2,b3","b4,b5,b6"]})
  3. df2 = pd.concat([df.order,df.container.str.split(",",expand=True),df.box.str.split(",",expand=True)],axis=1)
  4. df2 = df2.melt("order")[["order","value"]]
  5. print(df2)

输出:

  1. order value
  2. 0 a c1
  3. 1 b c4
  4. 2 a c2
  5. 3 b c5
  6. 4 a c3
  7. 5 b c6
  8. 6 a b1
  9. 7 b b4
  10. 8 a b2
  11. 9 b b5
  12. 10 a b3
  13. 11 b b6

解释:使用 .str.splitexpand=True 将逗号分隔的值分成单独的列,然后使用 .concat 创建包含这些列的 DataFrame,接着使用 .melt 在 "order" 列上获取所需的结果,最后选择所需的列。

英文:

I would do it following way

  1. import pandas as pd
  2. df = pd.DataFrame({"order":["a","b"],"container":["c1,c2,c3","c4,c5,c6"],"box":["b1,b2,b3","b4,b5,b6"]})
  3. df2 = pd.concat([df.order,df.container.str.split(",",expand=True),df.box.str.split(",",expand=True)],axis=1)
  4. df2 = df2.melt("order")[["order","value"]]
  5. print(df2)

output

  1. order value
  2. 0 a c1
  3. 1 b c4
  4. 2 a c2
  5. 3 b c5
  6. 4 a c3
  7. 5 b c6
  8. 6 a b1
  9. 7 b b4
  10. 8 a b2
  11. 9 b b5
  12. 10 a b3
  13. 11 b b6

Explanation: use .str.split with expand=True to get ,-sheared values into separate columns, then use .concat to create DataFrame from them, then use .melt at order to get desired result and then select required columns.

答案4

得分: 0

df1.set_index("order").agg(','.join, axis=1).map(lambda x: x.split(',')).explode().rename("content")

输出:

  1. order
  2. a c1
  3. a c2
  4. a c3
  5. a b1
  6. a b2
  7. a b3
  8. b c4
  9. b c5
  10. b c6
  11. b b4
  12. b b5
  13. b b6
英文:
  1. df1.set_index("order").agg(','.join,axis=1).map(lambda x:x.split(",")).explode().rename("content")

out

  1. order
  2. a c1
  3. a c2
  4. a c3
  5. a b1
  6. a b2
  7. a b3
  8. b c4
  9. b c5
  10. b c6
  11. b b4
  12. b b5
  13. b b6

huangapple
  • 本文由 发表于 2023年2月6日 02:59:20
  • 转载请务必保留本文链接:https://go.coder-hub.com/75354759.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定