Posted February 6, 2023, 02:59:20

Join the columns and list their values one per row

Question

I have a pandas DataFrame of orders. Each order holds different types of packaging, stored as comma-separated values inside a single cell.

order	container	box
a	c1,c2,c3	b1,b2,b3
b	c4,c5,c6	b4,b5,b6

I need a table with two columns, "order" and "content", that holds every value from both container and box, one value per row.

I was only able to merge the container and box columns; I do not know how to list the values row by row.

The needed table is:

order	content
a	c1
a	c2
a	c3
a	b1
a	b2
a	b3
b	c4
b	c5
b	c6
b	b4
b	b5
b	b6
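
For anyone who wants to reproduce the problem, here is a minimal sketch that builds the input table and the desired result. The name `df` matches what most of the answers use; `expected` is a hypothetical helper name introduced here only so a solution can be compared against the target table.

```python
import pandas as pd

# Input table from the question (the answers refer to it as `df`).
df = pd.DataFrame({
    "order": ["a", "b"],
    "container": ["c1,c2,c3", "c4,c5,c6"],
    "box": ["b1,b2,b3", "b4,b5,b6"],
})

# Desired result, typed out by hand from the question's target table.
expected = pd.DataFrame({
    "order": ["a"] * 6 + ["b"] * 6,
    "content": ["c1", "c2", "c3", "b1", "b2", "b3",
                "c4", "c5", "c6", "b4", "b5", "b6"],
})

print(df)
print(expected)
```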

Answer 1

Score: 1

You can stack, then split and explode, and convert the result back to a DataFrame:

out = (df.set_index('order').stack()  # set other columns aside and stack
         .str.split(',').explode()  # expand values to multiple rows
         # cleanup
         .reset_index('order', name='content').reset_index(drop=True)
      )
print(out)

Output:

   order content
0      a      c1
1      a      c2
2      a      c3
3      a      b1
4      a      b2
5      a      b3
6      b      c4
7      b      c5
8      b      c6
9      b      b4
10     b      b5
11     b      b6

Alternative with melt:

(df.melt('order', value_name='content')
   .assign(content=lambda d: d['content'].str.split(','))
   .explode('content').drop(columns='variable')
)
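
One thing worth noting about the melt variant: melt emits all container rows first and then all box rows, and explode repeats the index labels, so the rows do not come out grouped by order as in the stack version. The following is a sketch of one way to get the grouped layout, assuming a stable sort on "order" is acceptable; the sort step is an addition here, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "order": ["a", "b"],
    "container": ["c1,c2,c3", "c4,c5,c6"],
    "box": ["b1,b2,b3", "b4,b5,b6"],
})

out = (
    df.melt("order", value_name="content")   # one row per (order, source column)
      .assign(content=lambda d: d["content"].str.split(","))
      .explode("content")                    # one row per item
      .drop(columns="variable")
      # mergesort is stable, so each order keeps container items before box items;
      # ignore_index=True replaces the repeated labels with 0..n-1
      .sort_values("order", kind="mergesort", ignore_index=True)
)
print(out)
```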

Answer 2

Score: 1

You could also combine pd.DataFrame.melt, pd.DataFrame.set_index, pd.DataFrame.pipe, and pd.Series.sort_index:

(df
 .melt(id_vars='order', value_vars=['container', 'box'])
 .set_index('order')
 .pipe(lambda x: x['value'].str.split(',').explode())
 .sort_index()
)

Output:

order
a    c1
a    c2
a    c3
a    b1
a    b2
a    b3
b    c4
b    c5
b    c6
b    b4
b    b5
b    b6

Or, more concisely:

(df
 .melt(id_vars='order', value_vars=['container', 'box'])
 .set_index('order')['value'].str.split(',').explode()
 .sort_index())
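
Both variants return a Series named "value" indexed by order, not the two-column table the question asks for. Also, sort_index uses quicksort by default, which is not guaranteed to keep ties in their original order. A hedged sketch of how this could be tightened up, assuming a stable sort and a final reset_index are wanted (both are additions, not part of the original answer):

```python
import pandas as pd

df = pd.DataFrame({
    "order": ["a", "b"],
    "container": ["c1,c2,c3", "c4,c5,c6"],
    "box": ["b1,b2,b3", "b4,b5,b6"],
})

out = (
    df.melt(id_vars="order", value_vars=["container", "box"])
      .set_index("order")["value"]
      .str.split(",")
      .explode()
      .sort_index(kind="mergesort")  # stable: keeps container items before box items
      .rename("content")             # name the Series after the target column
      .reset_index()                 # turn the "order" index back into a column
)
print(out)
```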

Answer 3

Score: 0

I would do it the following way:

import pandas as pd
df = pd.DataFrame({"order": ["a", "b"], "container": ["c1,c2,c3", "c4,c5,c6"], "box": ["b1,b2,b3", "b4,b5,b6"]})
df2 = pd.concat([df.order, df.container.str.split(",", expand=True), df.box.str.split(",", expand=True)], axis=1)
df2 = df2.melt("order")[["order", "value"]]
print(df2)

Output:

   order value
0      a    c1
1      b    c4
2      a    c2
3      b    c5
4      a    c3
5      b    c6
6      a    b1
7      b    b4
8      a    b2
9      b    b5
10     a    b3
11     b    b6

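Explanation: use .str.split with expand=True to spread the comma-separated values into separate columns, use pd.concat to build a DataFrame from them, use .melt with order as the id column to get one value per row, and finally select the required columns. Note that the rows come out interleaved by position (a, b, a, b, ...) rather than grouped by order.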
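
A caveat with expand=True is that orders with fewer items get padded with missing values, which then show up as extra rows after melt. The sketch below is one possible way to handle that, on a made-up ragged input (the `ragged` frame and its values are hypothetical, not from the question). It also prefixes the split columns so their names stay unique, drops the padding, and applies a stable sort to group rows by order.

```python
import pandas as pd

# Hypothetical ragged input: orders hold different numbers of items.
ragged = pd.DataFrame({
    "order": ["a", "b"],
    "container": ["c1,c2", "c4"],
    "box": ["b1", "b4,b5"],
})

wide = pd.concat(
    [
        ragged["order"],
        # add_prefix keeps column names unique once the pieces are concatenated
        ragged["container"].str.split(",", expand=True).add_prefix("container_"),
        ragged["box"].str.split(",", expand=True).add_prefix("box_"),
    ],
    axis=1,
)

out = (
    wide.melt("order")[["order", "value"]]
        .dropna(subset=["value"])  # remove the padding introduced by expand=True
        # stable sort groups rows by order without reshuffling items within an order
        .sort_values("order", kind="mergesort", ignore_index=True)
)
print(out)
```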

Answer 4

Score: 0

df1.set_index("order").agg(','.join, axis=1).map(lambda x: x.split(",")).explode().rename("content")

Output:

order
a    c1
a    c2
a    c3
a    b1
a    b2
a    b3
b    c4
b    c5
b    c6
b    b4
b    b5
b    b6
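
The one-liner above assumes the question's table is stored in `df1` and returns a Series indexed by order. As a rough sketch of the same join-then-split idea written against `df` and finished with reset_index to produce the two-column table (using .str.split in place of the map/lambda is a substitution made here, not the answerer's code):

```python
import pandas as pd

df = pd.DataFrame({
    "order": ["a", "b"],
    "container": ["c1,c2,c3", "c4,c5,c6"],
    "box": ["b1,b2,b3", "b4,b5,b6"],
})

out = (
    df.set_index("order")
      .agg(",".join, axis=1)  # join each row's cells into one comma-separated string
      .str.split(",")         # split that string into a list of items
      .explode()              # one row per item, index label repeated
      .rename("content")
      .reset_index()          # bring "order" back as a regular column
)
print(out)
```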
