February 6, 2023 20:23:34 · go

Reshape dataframe from long to wide

Question

My DataFrame:

import pandas as pd

d = {'project_id': [19, 20, 19, 20, 19, 20],
     'task_id': [11, 22, 11, 22, 11, 22],
     'task': ['task_1', 'task_1', 'task_1', 'task_1', 'task_1', 'task_1'],
     'username': ['tom', 'jery', 'tom', 'jery', 'tom', 'jery'],
     'image_id': [101, 202, 303, 404, 505, 606],
     'frame': [0, 0, 9, 8, 11, 11],
     'label': ['foo', 'foo', 'bar', 'xyz', 'bar', 'bar']}
df = pd.DataFrame(data=d)

My DataFrame is in long format and contains some duplication; only image_id is unique. I am trying to reshape it to wide format by username, using pd.pivot and pd.merge.

My code:

pd.pivot(df, index=['task', 'frame', 'image_id'], columns='username', values='label')

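My output:

What I expected (or want to reach):

So, as you can see, I don't actually need image_id in the output. I just need a summary of which label each user applied on each frame.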
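A minimal sketch of why the call above produces extra rows, assuming pandas 1.1 or later (needed for a list-valued `index` in `pd.pivot`): because `image_id` is unique, putting it in the index gives every source row its own output row, each holding one user's label and a missing value for the other user. In this sample each (`task`, `frame`, `username`) combination happens to be unique, so simply dropping `image_id` from the index works; if the real data repeats a combination, `pd.pivot` raises a `ValueError` and an aggregating approach is needed. The names `with_image`, `has_dupes` and `wide` are illustrative only.

```python
import pandas as pd

# Sample data from the question.
d = {'project_id': [19, 20, 19, 20, 19, 20],
     'task_id': [11, 22, 11, 22, 11, 22],
     'task': ['task_1'] * 6,
     'username': ['tom', 'jery', 'tom', 'jery', 'tom', 'jery'],
     'image_id': [101, 202, 303, 404, 505, 606],
     'frame': [0, 0, 9, 8, 11, 11],
     'label': ['foo', 'foo', 'bar', 'xyz', 'bar', 'bar']}
df = pd.DataFrame(data=d)

# image_id is unique, so including it in the index yields one row per image,
# each holding one user's label and NaN for the other user.
with_image = pd.pivot(df, index=['task', 'frame', 'image_id'],
                      columns='username', values='label')
print(with_image.shape)  # (6, 2)

# pd.pivot needs every (index, columns) combination to be unique.
has_dupes = df.duplicated(subset=['task', 'frame', 'username']).any()
print(has_dupes)  # False for this sample

# Safe here because has_dupes is False; with duplicates this raises ValueError.
wide = pd.pivot(df, index=['task', 'frame'], columns='username', values='label')
print(wide)
```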

Answer 1

Score: 1

You can add a groupby.first after the pivot:

(pd.pivot(df, index=['task','frame','image_id'],
          columns='username', values='label')
   .groupby(level=['task','frame']).first()
)
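A small sketch of what the `groupby(level=...).first()` step does, run on the question's sample data: the intermediate pivot still has one row per `image_id`, and `first()` merges the rows that share a (`task`, `frame`) pair by taking the first non-null label in each user column. If a user had labeled the same frame more than once, `first()` would keep only the first non-null one in each group. The names `step1` and `step2` are illustrative only.

```python
import pandas as pd

# Sample data from the question (only the columns the pivot uses).
df = pd.DataFrame({
    'task': ['task_1'] * 6,
    'username': ['tom', 'jery', 'tom', 'jery', 'tom', 'jery'],
    'image_id': [101, 202, 303, 404, 505, 606],
    'frame': [0, 0, 9, 8, 11, 11],
    'label': ['foo', 'foo', 'bar', 'xyz', 'bar', 'bar'],
})

# Step 1: one row per (task, frame, image_id); each row has a single label.
step1 = pd.pivot(df, index=['task', 'frame', 'image_id'],
                 columns='username', values='label')

# Step 2: collapse the image_id level. first() takes the first non-null
# value per column within each (task, frame) group.
step2 = step1.groupby(level=['task', 'frame']).first()

print(step1.shape, step2.shape)  # (6, 2) (4, 2)
# Same number of labels before and after: nothing was dropped in this sample.
print(step1.notna().sum().sum(), step2.notna().sum().sum())  # 6 6
```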

Or use pivot_table with aggfunc='first':

pd.pivot_table(df, index=['task','frame'],
               columns='username', values='label',
               aggfunc='first')

Output:

username      jery   tom
task   frame            
task_1 0       foo   foo
       8       xyz  None
       9      None   bar
       11      bar   bar
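With `aggfunc='first'`, `pivot_table` also tolerates repeated (`task`, `frame`, `username`) combinations, which `pd.pivot` does not. The sketch below uses a hypothetical frame `dup` in which tom labels frame 0 twice (the `'baz'` label is invented for illustration). It shows `pd.pivot` raising `ValueError`, `aggfunc='first'` keeping only the first label, and a custom `aggfunc` that joins every label instead; the joining lambda is an assumption of this sketch, not part of the answer.

```python
import pandas as pd

# Hypothetical data: tom labels frame 0 twice ('baz' is invented).
dup = pd.DataFrame({
    'task': ['task_1', 'task_1', 'task_1'],
    'username': ['tom', 'tom', 'jery'],
    'frame': [0, 0, 0],
    'label': ['foo', 'baz', 'foo'],
})

# pd.pivot cannot place two labels in the same cell, so it raises ValueError.
pivot_failed = False
try:
    pd.pivot(dup, index=['task', 'frame'], columns='username', values='label')
except ValueError as exc:
    pivot_failed = True
    print('pd.pivot failed:', exc)

# aggfunc='first' keeps the first label per cell ('foo' for tom).
first_wide = pd.pivot_table(dup, index=['task', 'frame'], columns='username',
                            values='label', aggfunc='first')
print(first_wide)

# Alternative: join every label so none is dropped ('foo, baz' for tom).
all_wide = pd.pivot_table(dup, index=['task', 'frame'], columns='username',
                          values='label', aggfunc=lambda s: ', '.join(s))
print(all_wide)
```

Whether to keep the first label or join all of them depends on whether repeated labels in the real data are noise or meaningful.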
