June 8, 2023, 17:45:57

Split a string into many columns with values from the same string in Python

Question

Starting from the column "code", I need to use Python to create one column per name that appears in the strings, with each column holding the number associated with that name (see the example below):

id	code	C	HD	HT	S
1	74C + 24HD	74	24	0	0
2	23C + 14HT + 3S	23	0	14	3
3	0	0	0	0	0

Thank you!

Answer 1

Score: 2

I'm assuming your initial dataframe df looks like this:

   id             code
0   1       74C + 24HD
1   2  23C + 14HT + 3S
2   3                0
3   4    0.40C + 32.3P

If that's the case, you could try .str.extractall in combination with .pivot. Use a pattern that matches each number-plus-column-name combination and captures the two parts as separate groups. .extractall then puts the parts into separate columns, one row per match, and .pivot turns the values of the column that holds the names into column headers.

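import pandas as pd

new_cols_df = (
    # Capture each number and the uppercase name that follows it
    df["code"].str.extractall(r"(\d+(?:\.\d*)?)(?P<column>[A-Z]+)").droplevel(1)
    .pivot(columns="column")  # one column per captured name
    .droplevel(0, axis=1).rename_axis(None, axis=1)
)
# Rows without any match (such as "0") are missing here, so fill them with 0
df = pd.concat([df, new_cols_df], axis=1).fillna(0)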

to get

   id             code     C  HD  HT     P  S
0   1       74C + 24HD    74  24   0     0  0
1   2  23C + 14HT + 3S    23   0  14     0  3
2   3                0     0   0   0     0  0
3   4    0.40C + 32.3P  0.40   0   0  32.3  0

In case you need numeric values in the new columns:

df[df.columns[2:]] = df[df.columns[2:]].astype("float")
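
For a version that can be run from scratch, here is a minimal sketch that rebuilds the sample frame assumed above and walks through the same extractall/pivot idea step by step. The names pairs, wide and result, and the group name value, are illustrative choices and not part of the answer. The sketch also converts to float before pivoting and uses join instead of concat, which is only one possible arrangement.

```python
import pandas as pd

# Sample data reconstructed from the frame assumed in this answer
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "code": ["74C + 24HD", "23C + 14HT + 3S", "0", "0.40C + 32.3P"],
})

# Step 1: one row per (original row, match); both groups are named for readability
pairs = df["code"].str.extractall(r"(?P<value>\d+(?:\.\d*)?)(?P<column>[A-Z]+)")

# Step 2: convert the captured text to numbers before reshaping
pairs["value"] = pairs["value"].astype(float)

# Step 3: drop the match counter and turn the captured names into columns
wide = (
    pairs.droplevel("match")
    .pivot(columns="column", values="value")
    .rename_axis(None, axis=1)
)

# Step 4: align on the original index; rows without matches become 0
result = df.join(wide).fillna(0)
print(result)
```

Note that pivot raises an error when the same name occurs twice within one row (for example "3C + 4C"), so this sketch assumes each name appears at most once per string.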

Answer 2

Score: 1

We could use str.extract here along with np.where:

import numpy as np
import pandas as pd

for name in ["C", "HD", "HT", "S"]:
    # expand=False returns a Series; rows without a match are NaN
    num = pd.to_numeric(df["code"].str.extract(rf"\b(\d+){name}\b", expand=False))
    # Convert to int only after np.where has replaced the NaN values with 0
    df[name] = np.where(num.notna(), num, 0).astype(int)
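
The snippet above spells out the suffixes C, HD, HT and S by hand. If the names are not known in advance, one possible variation is to collect them from the data first. The following is a rough sketch under that assumption: the sample frame is rebuilt from the question, the names suffixes, out and found are illustrative, and fillna is used here in place of np.where.

```python
import pandas as pd

# Sample data taken from the question
df = pd.DataFrame({
    "id": [1, 2, 3],
    "code": ["74C + 24HD", "23C + 14HT + 3S", "0"],
})

# Collect every uppercase name that directly follows a run of digits
suffixes = sorted(
    df["code"].str.findall(r"\d+([A-Z]+)\b").explode().dropna().unique()
)

out = df.copy()
for name in suffixes:
    # expand=False yields a Series; rows without this suffix are NaN
    found = out["code"].str.extract(rf"\b(\d+){name}\b", expand=False)
    out[name] = pd.to_numeric(found).fillna(0).astype(int)

print(out)
```

Like the original snippet, this only handles whole numbers; decimal values such as 0.40C would need a wider pattern and a float conversion.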
