Split string into many columns with value in the same string (Python)

Question

Starting from the column "code", I need (using Python) to create one column per label found in the string, with each column holding the number attached to that label (see the example below).

id             code   C  HD  HT  S
 1       74C + 24HD  74  24   0  0
 2  23C + 14HT + 3S  23   0  14  3
 3                0   0   0   0  0

Thank you!

Answer 1

Score: 2

I'm assuming your initial dataframe df looks like this:

   id             code
0   1       74C + 24HD
1   2  23C + 14HT + 3S
2   3                0
3   4    0.40C + 32.3P

If that's the case, you could combine .str.extractall with .pivot. Use a pattern that matches each number/column-name pair and captures the two parts in separate groups. .extractall then puts each group into its own column, with one row per match. The column that holds the column-name part can then be spread into column headers with .pivot.

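import pandas as pd

# Capture each number together with the label that follows it, then
# turn the labels into column headers (indexed by the original row).
new_cols_df = (
    df["code"].str.extractall(r"(\d+(?:\.\d*)?)(?P<column>[A-Z]+)").droplevel(1)
    .pivot(columns="column")
    .droplevel(0, axis=1).rename_axis(None, axis=1)
)
# Rows without any match are missing from new_cols_df; fillna(0) covers them.
df = pd.concat([df, new_cols_df], axis=1).fillna(0)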

The result is:

   id             code     C  HD  HT     P  S
0   1       74C + 24HD    74  24   0     0  0
1   2  23C + 14HT + 3S    23   0  14     0  3
2   3                0     0   0   0     0  0
3   4    0.40C + 32.3P  0.40   0   0  32.3  0

In case you need numeric values in the new columns:

df[df.columns[2:]] = df[df.columns[2:]].astype("float")
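
For anyone who wants to run this end to end, here is a self-contained sketch of the same extractall + pivot idea. It rebuilds the sample frame from the example above (an assumption about what the real data looks like), and it names the number group "value", which is my own choice and not part of the original pattern. Converting to float before pivoting is a small variation that avoids the separate astype step.

```python
import pandas as pd

# Sample frame rebuilt from the example above (assumed shape of the real data).
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "code": ["74C + 24HD", "23C + 14HT + 3S", "0", "0.40C + 32.3P"],
})

# One row per (number, label) match. The index has two levels:
# the original row label and the match number within that row.
# The group name "value" is a hypothetical name chosen for this sketch.
matches = df["code"].str.extractall(r"(?P<value>\d+(?:\.\d*)?)(?P<column>[A-Z]+)")

# Convert the captured text to numbers before reshaping.
matches["value"] = matches["value"].astype(float)

# Drop the match-number level, then spread the labels into columns.
wide = (
    matches.droplevel(1)
    .pivot(columns="column", values="value")
    .rename_axis(None, axis=1)
)

# The row whose code is "0" has no matches, so it is absent from `wide`;
# the concat aligns on the index and fillna(0) fills the gaps.
result = pd.concat([df, wide], axis=1).fillna(0)
print(result)
```

Passing values="value" to .pivot keeps the column index flat, so the extra .droplevel(0, axis=1) from the snippet above is not needed in this variant.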

Answer 2

Score: 1

We could use str.extract here along with np.where:


import numpy as np
import pandas as pd

# Extract the integer in front of each label; np.where puts 0 where nothing matched, then cast to int.
for label in ["C", "HD", "HT", "S"]:
    nums = pd.to_numeric(df["code"].str.extract(rf"\b(\d+){label}\b", expand=False))
    df[label] = np.where(nums.notna(), nums, 0).astype(int)
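
The labels here are hardcoded. If they are not known in advance, one option might be to collect them from the data first. This is only a sketch under the assumption that a label is a run of uppercase letters directly after an integer; the sample frame and the `labels` variable are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question (assumed shape of the real data).
df = pd.DataFrame({
    "id": [1, 2, 3],
    "code": ["74C + 24HD", "23C + 14HT + 3S", "0"],
})

# Collect every distinct label that follows a number anywhere in the column.
# Rows without a match yield an empty list, which explode() turns into NaN.
found = df["code"].str.findall(r"\d+([A-Z]+)\b").explode().dropna()
labels = sorted(set(found))

for label in labels:
    # Number in front of this label, or NaN where the label is absent.
    nums = pd.to_numeric(df["code"].str.extract(rf"\b(\d+){label}\b", expand=False))
    # Fill the gaps with 0 first; only then is the cast to int safe.
    df[label] = np.where(nums.notna(), nums, 0).astype(int)

print(df)
```

Like the code above, this only handles whole numbers; decimal values such as 0.40C would need a wider pattern and a float cast instead of int.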

huangapple
  • Published on June 8, 2023, 17:45:57
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76430575.html