Better way to duplicate rows based on two columns, merging those columns into a single column

Question

I have the following pandas DataFrame, shown here as CSV:

,resultset_id,resultsetrevision_id,injection_id,injection_acqmethod_id,injection_damethod_id
0,8c502f71-9965-43c9-b3be-e7988a2fc89e,023c8953-565e-4953-991a-a842e0444e67,5cff24fc-f1b8-43b1-98a5-39fc41c27a33,f85b0a52-52a8-4e8d-93c3-54be11c7f8c3,
1,8c502f71-9965-43c9-b3be-e7988a2fc89e,023c8953-565e-4953-991a-a842e0444e67,6c005f00-8654-4ebc-8e42-c92bd4a5fa64,53b34ff9-fec2-472d-a4d0-61e6029d586a,cd4cbbd9-5f23-4146-a499-9c90e3c73383

For a later merge, I would rather have four rows instead of two, with the values of injection_acqmethod_id and injection_damethod_id both placed in a single method_id column, as follows:

,resultset_id,resultsetrevision_id,injection_id,injection_acqmethod_id,injection_damethod_id,method_id
0,8c502f71-9965-43c9-b3be-e7988a2fc89e,023c8953-565e-4953-991a-a842e0444e67,5cff24fc-f1b8-43b1-98a5-39fc41c27a33,f85b0a52-52a8-4e8d-93c3-54be11c7f8c3,,f85b0a52-52a8-4e8d-93c3-54be11c7f8c3
0,8c502f71-9965-43c9-b3be-e7988a2fc89e,023c8953-565e-4953-991a-a842e0444e67,5cff24fc-f1b8-43b1-98a5-39fc41c27a33,f85b0a52-52a8-4e8d-93c3-54be11c7f8c3,,None
1,8c502f71-9965-43c9-b3be-e7988a2fc89e,023c8953-565e-4953-991a-a842e0444e67,6c005f00-8654-4ebc-8e42-c92bd4a5fa64,53b34ff9-fec2-472d-a4d0-61e6029d586a,cd4cbbd9-5f23-4146-a499-9c90e3c73383,53b34ff9-fec2-472d-a4d0-61e6029d586a
1,8c502f71-9965-43c9-b3be-e7988a2fc89e,023c8953-565e-4953-991a-a842e0444e67,6c005f00-8654-4ebc-8e42-c92bd4a5fa64,53b34ff9-fec2-472d-a4d0-61e6029d586a,cd4cbbd9-5f23-4146-a499-9c90e3c73383,cd4cbbd9-5f23-4146-a499-9c90e3c73383

I'm using the following code...

_injections["method_id"] = (_injections.injection_acqmethod_id.astype(str) + "," + _injections.injection_damethod_id.astype(str)).str.split(",")
_injections = _injections.explode("method_id")

Merging the columns into a list and then exploding seems like unnecessary work. Is there a more Pythonic, faster, or more concise way to do this?

Answer 1

Score: 1

One idea with reshape (note that, depending on the pandas version, stack() may drop rows whose value is missing):

cols = ['injection_acqmethod_id', 'injection_damethod_id']

out = (df.assign(**{f'_{x}_': df[x] for x in cols})
        .set_index(list(df.columns))
        .stack()
        .droplevel(-1)
        .reset_index(name='method_id'))
print(out)

Another idea with numpy ravel:

cols = ['injection_acqmethod_id', 'injection_damethod_id']
out = df.loc[df.index.repeat(len(cols))].assign(method_id=np.ravel(df[cols].to_numpy()))
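
A minimal, self-contained sketch of the ravel idea on a cut-down frame. The shortened IDs (inj-0, acq-0, da-1) are made up for illustration and are not the question's data, and the sketch assumes the frame has a unique index, since df.loc with a repeated index relies on that.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's frame, with shortened IDs.
df = pd.DataFrame({
    "injection_id": ["inj-0", "inj-1"],
    "injection_acqmethod_id": ["acq-0", "acq-1"],
    "injection_damethod_id": [None, "da-1"],
})
cols = ["injection_acqmethod_id", "injection_damethod_id"]

# Repeat every row once per source column; the index becomes 0, 0, 1, 1.
repeated = df.loc[df.index.repeat(len(cols))]

# Row-major flattening gives (acq, da) for row 0, then (acq, da) for row 1,
# which lines up with the repeated rows above.
out = repeated.assign(method_id=np.ravel(df[cols].to_numpy()))
print(out[["injection_id", "method_id"]])
```

The missing injection_damethod_id stays a missing value in method_id rather than becoming a string, which may matter for the later merge.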

Or with concat:

cols = ['injection_acqmethod_id', 'injection_damethod_id']

out = (pd.concat([df.assign(method_id=df[x]) for x in cols])
         .sort_index(kind='stable', ignore_index=True))
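
The concat idea can be tried the same way. This is only a sketch on a made-up two-row frame (the shortened IDs are hypothetical), showing why the stable sort matters: it keeps the acq row ahead of the da row for each original record.

```python
import pandas as pd

# Hypothetical stand-in for the question's frame, with shortened IDs.
df = pd.DataFrame({
    "injection_id": ["inj-0", "inj-1"],
    "injection_acqmethod_id": ["acq-0", "acq-1"],
    "injection_damethod_id": [None, "da-1"],
})
cols = ["injection_acqmethod_id", "injection_damethod_id"]

# One copy of the frame per source column, with that column copied into method_id.
parts = [df.assign(method_id=df[c]) for c in cols]

# After concat the index is 0, 1, 0, 1. A stable sort groups the copies of each
# original row together without reordering them, then the index is renumbered.
out = pd.concat(parts).sort_index(kind="stable", ignore_index=True)
print(out[["injection_id", "method_id"]])
```

Unlike the ravel version, this does not need a unique index to build the copies, although rows sharing an index label would be grouped together by the sort.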

huangapple
  • Posted on July 6, 2023 at 17:07:32
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76627219.html