Posted: March 1, 2023, 13:23:28

Pandas split corresponding rows based on separator in two columns duplicating everything else

Question

I have an Excel sheet:

Col1    Col2                          Col3            Col4
John    English\nMaths                34\n33          Pass
Sam     Science                       40              Pass
Jack    English\nHistory\nGeography   89\n07\n98      Pass

I need to convert it to:

Col1    Col2      Col3    Col4
John    English   34      Pass
John    Maths     33      Pass
Sam     Science   40      Pass
Jack    English   89      Pass
Jack    History   07      Pass     
Jack    Geography 98      Pass

The Excel sheet uses \n as the separator within the Col2 and Col3 columns, and the entries in the two columns correspond by position. I need to pull each subject into a new row with its corresponding marks and copy the contents of all other columns as they are.

I tried:

split_cols = ['Col2', 'Col3']
# loop over the columns and split them
separator = '\n'
for col in split_cols:
    df[[f'{col}_Split1', f'{col}_Split2']] = df[col].str.split(separator, n=1, expand=True).fillna('')
# create two new dataframes with the desired columns
df1 = df[['Col1', 'Col2_Split1', 'Col3_Split1', 'Col4']].rename(columns={'Col2_Split1': 'D', 'Col3_Split1': 'C'})
df2 = df[['Col1', 'Col2_Split2', 'Col3_Split2', 'Col4']].rename(columns={'Col2_Split2': 'D', 'Col3_Split2': 'C'})
# concatenate the two dataframes
final_df = pd.concat([df1, df2], ignore_index=True)
# print the final dataframe
print(final_df)

Answer 1

Score: 3

You can explode on multiple columns (this requires pandas >= 1.3) after splitting each string into a list:

# First pass: split each string into a list
out = (df.assign(Col2=df['Col2'].str.split('\n'),
                 Col3=df['Col3'].str.split('\n')))
# Fix unbalanced lists: pad the shorter list in a row with 0
def pad(sr):
    n = max(sr.str.len())
    sr['Col2'] = np.pad(sr['Col2'], (0, n-len(sr['Col2'])))
    sr['Col3'] = np.pad(sr['Col3'], (0, n-len(sr['Col3'])))
    return sr
m = out['Col2'].str.len() != out['Col3'].str.len()
out.loc[m, ['Col2', 'Col3']] = out.loc[m, ['Col2', 'Col3']].apply(pad, axis=1)
# Second pass: explode both columns together
out = out.explode(['Col2', 'Col3'], ignore_index=True)
print(out)
# Output
   Col1       Col2 Col3    Col4
0  John    English   34    Pass
1  John      Maths   33    Pass
2   Sam    Science   40    Pass
3  Jack    English   89    Pass
4  Jack    History   07    Pass
5  Jack  Geography   98    Pass
6  Ryan      Maths   12  Failed
7  Ryan    Science   10  Failed
8  Ryan    History    0  Failed
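
The padding step above fills a missing mark with 0, which can be mistaken for a real score. As a hedged alternative (not part of the original answer), the sketch below pads the shorter list with a placeholder such as None so the gap stays visible as a missing value. The helper name split_rows and its fill parameter are made up for illustration, and the sketch assumes every cell in the listed columns is a string.

```python
import pandas as pd


def split_rows(df, cols, sep="\n", fill=None):
    """Split each column in `cols` on `sep` and emit one row per aligned piece.

    Hypothetical helper: shorter lists in a row are padded with `fill`
    so that all listed columns have the same number of elements.
    """
    out = df.copy()
    split = {c: out[c].str.split(sep) for c in cols}
    # Longest list length per row across the listed columns.
    n = pd.concat([s.str.len() for s in split.values()], axis=1).max(axis=1)
    for c in cols:
        padded = [lst + [fill] * (k - len(lst)) for lst, k in zip(split[c], n)]
        out[c] = pd.Series(padded, index=out.index)
    # Multi-column explode needs pandas >= 1.3.
    return out.explode(cols, ignore_index=True)


df = pd.DataFrame({
    "Col1": ["John", "Sam", "Jack", "Ryan"],
    "Col2": ["English\nMaths", "Science",
             "English\nHistory\nGeography", "Maths\nScience\nHistory"],
    "Col3": ["34\n33", "40", "89\n07\n98", "12\n10"],
    "Col4": ["Pass", "Pass", "Pass", "Failed"],
})
result = split_rows(df, ["Col2", "Col3"])
print(result)
```

With this variant, Ryan's History row carries a missing value in Col3 instead of 0, so it can be filtered with isna() or filled later with whatever default suits the data.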

Input dataframe:

import pandas as pd
import numpy as np
data = {'Col1': ['John', 'Sam', 'Jack', 'Ryan'],
        'Col2': ['English\nMaths', 'Science', 'English\nHistory\nGeography', 'Maths\nScience\nHistory'],
        'Col3': ['34\n33', '40', '89\n07\n98', '12\n10'],
        'Col4': ['Pass', 'Pass', 'Pass', 'Failed']}
df = pd.DataFrame(data)
print(df)
# Output
   Col1                         Col2        Col3    Col4
0  John               English\nMaths      34\n33    Pass
1   Sam                      Science          40    Pass
2  Jack  English\nHistory\nGeography  89\n07\n98    Pass
3  Ryan      Maths\nScience\nHistory      12\n10  Failed

Answer 2

Score: 1

EDITED.

You can achieve your goals using .str.split + .explode methods.

import pandas
df = pandas.DataFrame([
  ["John", "English\nMaths", "34\n33", "Pass"],
  ["Sam", "Science", "40", "Pass"],
  ["Jack", "English\nHistory\nGeography", "89\n07\n98", "Pass"],
])
df[1] = df[1].str.split("\n")
df[2] = df[2].str.split("\n")
df = df.explode([1, 2])
print(df)
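
Exploding two columns together relies on every row having the same number of pieces in both columns. As far as I know, DataFrame.explode raises a ValueError when the element counts differ. A minimal sketch for spotting such rows before exploding follows. The column names and the sample "Ryan" row are assumptions taken from the other answer's test data, not from this one.

```python
import pandas as pd

df = pd.DataFrame({
    "Col1": ["John", "Ryan"],
    "Col2": ["English\nMaths", "Maths\nScience\nHistory"],
    "Col3": ["34\n33", "12\n10"],
})

# Count the pieces each cell would split into.
n_subjects = df["Col2"].str.split("\n").str.len()
n_marks = df["Col3"].str.split("\n").str.len()

# Rows where the counts differ cannot be exploded together as-is.
mismatched = df[n_subjects != n_marks]
print(mismatched["Col1"].tolist())
```

Rows flagged this way can be corrected in the sheet or padded before calling explode.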
