July 6, 2023, 17:07:32

Better way to duplicate rows based on two columns, merging those columns into a single column

Question

I have the following pandas DataFrame, shown here as CSV:

,resultset_id,resultsetrevision_id,injection_id,injection_acqmethod_id,injection_damethod_id
0,8c502f71-9965-43c9-b3be-e7988a2fc89e,023c8953-565e-4953-991a-a842e0444e67,5cff24fc-f1b8-43b1-98a5-39fc41c27a33,f85b0a52-52a8-4e8d-93c3-54be11c7f8c3,
1,8c502f71-9965-43c9-b3be-e7988a2fc89e,023c8953-565e-4953-991a-a842e0444e67,6c005f00-8654-4ebc-8e42-c92bd4a5fa64,53b34ff9-fec2-472d-a4d0-61e6029d586a,cd4cbbd9-5f23-4146-a499-9c90e3c73383

To prepare for a later merge, I would prefer four rows instead of two, with the values of injection_acqmethod_id and injection_damethod_id both placed in a single method_id column, as follows:

,resultset_id,resultsetrevision_id,injection_id,injection_acqmethod_id,injection_damethod_id,method_id
0,8c502f71-9965-43c9-b3be-e7988a2fc89e,023c8953-565e-4953-991a-a842e0444e67,5cff24fc-f1b8-43b1-98a5-39fc41c27a33,f85b0a52-52a8-4e8d-93c3-54be11c7f8c3,,f85b0a52-52a8-4e8d-93c3-54be11c7f8c3
0,8c502f71-9965-43c9-b3be-e7988a2fc89e,023c8953-565e-4953-991a-a842e0444e67,5cff24fc-f1b8-43b1-98a5-39fc41c27a33,f85b0a52-52a8-4e8d-93c3-54be11c7f8c3,,None
1,8c502f71-9965-43c9-b3be-e7988a2fc89e,023c8953-565e-4953-991a-a842e0444e67,6c005f00-8654-4ebc-8e42-c92bd4a5fa64,53b34ff9-fec2-472d-a4d0-61e6029d586a,cd4cbbd9-5f23-4146-a499-9c90e3c73383,53b34ff9-fec2-472d-a4d0-61e6029d586a
1,8c502f71-9965-43c9-b3be-e7988a2fc89e,023c8953-565e-4953-991a-a842e0444e67,6c005f00-8654-4ebc-8e42-c92bd4a5fa64,53b34ff9-fec2-472d-a4d0-61e6029d586a,cd4cbbd9-5f23-4146-a499-9c90e3c73383,cd4cbbd9-5f23-4146-a499-9c90e3c73383

I'm using the following code...

_injections["method_id"] = (_injections.injection_acqmethod_id.astype(str) + "," + _injections.injection_damethod_id.astype(str)).str.split(",")
_injections = _injections.explode("method_id")

Merging the columns into a list and then exploding it seems like unnecessary work. Is there a more Pythonic, faster, or more concise way to do this?

Answer 1

Score: 1

One idea with reshape:

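cols = ['injection_acqmethod_id', 'injection_damethod_id']
# Copy the two source columns under temporary names, move every original
# column into the index, then stack the temporary columns into one Series.
# Caveat: where stack() drops missing values by default, rows whose
# method_id would be missing do not appear in the result.
out = (df.assign(**{f'_{x}_': df[x] for x in cols})
        .set_index(list(df.columns))
        .stack()
        .droplevel(-1)   # drop the level holding the temporary column names
        .reset_index(name='method_id'))
print(out)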
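
To see how the reshape idea treats the missing injection_damethod_id, here is a minimal, self-contained sketch. The short IDs (inj-0, acq-0, da-1) are hypothetical stand-ins for the UUIDs in the question, and the frame keeps only three of the original columns. How many rows come back depends on whether the installed pandas drops missing values in stack(), so the sketch prints the count rather than assuming it.

```python
import pandas as pd

# Hypothetical, shortened stand-ins for the UUIDs in the question.
df = pd.DataFrame({
    "injection_id": ["inj-0", "inj-1"],
    "injection_acqmethod_id": ["acq-0", "acq-1"],
    "injection_damethod_id": [None, "da-1"],  # row 0 has no DA method
})

cols = ["injection_acqmethod_id", "injection_damethod_id"]

out = (
    df.assign(**{f"_{x}_": df[x] for x in cols})  # temporary copies to stack
      .set_index(list(df.columns))                # keep the original columns
      .stack()                                    # one entry per temporary column
      .droplevel(-1)                              # discard temporary column names
      .reset_index(name="method_id")
)

# 4 rows if missing values are kept, 3 if stack() dropped the missing one.
print(len(out))
print(out["method_id"].tolist())
```

If the row with the missing method_id has to survive for the later merge, the ravel or concat ideas may be the safer choice, since neither of them filters missing values.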

Another idea with numpy ravel:

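import numpy as np
cols = ['injection_acqmethod_id', 'injection_damethod_id']
# Repeat each row once per column, then fill method_id with those columns' values flattened row by row.
out = df.loc[df.index.repeat(len(cols))].assign(method_id=np.ravel(df[cols].to_numpy()))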
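
The ravel idea works because both steps order their output the same way. A minimal sketch, assuming hypothetical short IDs in place of the question's UUIDs and a frame with a unique index:

```python
import pandas as pd

# Hypothetical, shortened stand-ins for the UUIDs in the question.
df = pd.DataFrame({
    "injection_id": ["inj-0", "inj-1"],
    "injection_acqmethod_id": ["acq-0", "acq-1"],
    "injection_damethod_id": [None, "da-1"],  # row 0 has no DA method
})

cols = ["injection_acqmethod_id", "injection_damethod_id"]

# Repeat every row once per source column: the index becomes 0, 0, 1, 1.
repeated = df.loc[df.index.repeat(len(cols))]

# Row-major flattening gives row 0's two values, then row 1's two values,
# which lines up with the repeated rows above.
flat = df[cols].to_numpy().ravel()

out = repeated.assign(method_id=flat)
print(out[["injection_id", "method_id"]])
```

The original index labels are kept, so each pair of output rows shares a label. Appending .reset_index(drop=True) would renumber them if unique labels are needed.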

Or with concat:

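import pandas as pd
cols = ['injection_acqmethod_id', 'injection_damethod_id']
# One copy of df per source column; the stable sort keeps the acq copy ahead of the da copy for each row.
out = (pd.concat([df.assign(method_id=df[x]) for x in cols])
         .sort_index(kind='stable', ignore_index=True))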
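
The concat idea relies on the stable sort to interleave the two copies. A minimal sketch, again using hypothetical short IDs rather than the question's UUIDs:

```python
import pandas as pd

# Hypothetical, shortened stand-ins for the UUIDs in the question.
df = pd.DataFrame({
    "injection_id": ["inj-0", "inj-1"],
    "injection_acqmethod_id": ["acq-0", "acq-1"],
    "injection_damethod_id": [None, "da-1"],  # row 0 has no DA method
})

cols = ["injection_acqmethod_id", "injection_damethod_id"]

# One full copy of df per source column, each with its own method_id.
# After concat the index reads 0, 1, 0, 1.
stacked = pd.concat([df.assign(method_id=df[x]) for x in cols])

# A stable sort groups equal index labels without reordering them, so each
# row's acq copy stays ahead of its da copy. ignore_index renumbers 0..3.
out = stacked.sort_index(kind="stable", ignore_index=True)
print(out[["injection_id", "method_id"]])
```

Without kind="stable", the relative order of the two copies of a row is not guaranteed, which matters only if downstream code depends on that order.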
