2023年6月15日 02:35:32go评论98阅读模式

英文:

Pandas - Combine two rows into a single row and create new columns

问题

我有一个包含5列和数千行的数据框。我正在尝试基于日期时间、Mac、来源和关联（一个关联一个取消关联）将两行特定行合并为一行... 数据框如下：

数据框已排序，如下所示：

df.sort_values(['datetime', 'Association'], ascending=[True, False])

id	datetime	Mac	Source	Association
32310	2023-06-07 07:39:48	00:00:00:00:00:1e	B1-31	associated
36653	2023-06-07 07:40:02	00:00:00:00:00:1e	B1-31	disassociated
32313	2023-06-07 07:40:02	00:00:00:00:00:1e	B2-21	associated
32358	2023-06-07 07:40:19	00:00:00:00:00:1e	B2-21	disassociated
32341	2023-06-07 07:40:26	00:00:00:00:00:1e	PI-12	disassociated
37577	2023-06-07 07:40:26	00:00:00:00:00:1e	PS-19	associated
32955	2023-06-07 07:40:31	00:00:00:00:00:1e	PS-19	disassociated
32364	2023-06-07 07:40:31	00:00:00:00:00:1f	B2-21	associated
32046	2023-06-07 07:41:04	00:00:00:00:00:1f	B2-21	disassociated
32405	2023-06-07 07:41:04	00:00:00:00:00:1f	B1-31	associated
32431	2023-06-07 07:41:14	00:00:00:00:00:1f	B1-31	disassociated
33915	2023-06-07 07:41:14	00:00:00:00:00:1f	B3-24	associated
32458	2023-06-07 07:41:33	00:00:00:00:00:1f	B3-24	disassociated

我想要的输出如下：

id	datetime associated	datetime disassociated	Mac	Source
1	2023-06-07 07:39:48	2023-06-07 07:40:02	00:00:00:00:00:1e	B1-31
2	2023-06-07 07:40:02	2023-06-07 07:40:19	00:00:00:00:00:1e	B2-21
3	0000-00-00 00:00:00	2023-06-07 07:40:26	00:00:00:00:00:1e	PI-12
4	2023-06-07 07:40:26	2023-06-07 07:40:31	00:00:00:00:00:1e	PS-19
5	2023-06-07 07:40:31	2023-06-07 07:41:04	00:00:00:00:00:1f	B2-21
6	2023-06-07 07:41:04	2023-06-07 07:41:14	00:00:00:00:00:1f	B1-31
7	2023-06-07 07:41:14	2023-06-07 07:41:33	00:00:00:00:00:1f	B3-24

有什么建议吗？提前感谢。

英文:

I have a data frame that has 5 columns and thousands rows. I am trying to merge two specific rows into one single row based on datetime, Mac, Source and Association (one associated with one disassociated).... the data frame is like this

The dataframe is sorted, like that

df.sort_values([&#39;datetime&#39;, &#39;Association&#39;],ascending=[True,False])

id	datetime	Mac	Source	Association
32310	2023-06-07 07:39:48	00:00:00:00:00:1e	B1-31	associated
36653	2023-06-07 07:40:02	00:00:00:00:00:1e	B1-31	disassociated
32313	2023-06-07 07:40:02	00:00:00:00:00:1e	B2-21	associated
32358	2023-06-07 07:40:19	00:00:00:00:00:1e	B2-21	disassociated
32341	2023-06-07 07:40:26	00:00:00:00:00:1e	PI-12	disassociated
37577	2023-06-07 07:40:26	00:00:00:00:00:1e	PS-19	associated
32955	2023-06-07 07:40:31	00:00:00:00:00:1e	PS-19	disassociated
32364	2023-06-07 07:40:31	00:00:00:00:00:1f	B2-21	associated
32046	2023-06-07 07:41:04	00:00:00:00:00:1f	B2-21	disassociated
32405	2023-06-07 07:41:04	00:00:00:00:00:1f	B1-31	associated
32431	2023-06-07 07:41:14	00:00:00:00:00:1f	B1-31	disassociated
33915	2023-06-07 07:41:14	00:00:00:00:00:1f	B3-24	associated
32458	2023-06-07 07:41:33	00:00:00:00:00:1f	B3-24	disassociated

What I want the output as

id	datetime associated	datetime disassociated	Mac	Source
1	2023-06-07 07:39:48	2023-06-07 07:40:02	00:00:00:00:00:1e	B1-31
2	2023-06-07 07:40:02	2023-06-07 07:40:19	00:00:00:00:00:1e	B2-21
3	0000-00-00 00:00:00	2023-06-07 07:40:26	00:00:00:00:00:1e	PI-12
4	2023-06-07 07:40:26	2023-06-07 07:40:31	00:00:00:00:00:1e	PS-19
5	2023-06-07 07:40:31	2023-06-07 07:41:04	00:00:00:00:00:1f	B2-21
6	2023-06-07 07:41:04	2023-06-07 07:41:14	00:00:00:00:00:1f	B1-31
7	2023-06-07 07:41:14	2023-06-07 07:41:33	00:00:00:00:00:1f	B3-24

Any suggestions? Thanks in advance

答案1

得分: 1

This output needs a bit more processing but accomplishes the hard part of reshaping the data:
df.pivot(index=['Source', 'Mac'], columns=['Association'], values=['datetime'])

        datetime
Association associated disassociated
Source  Mac     
B1-31   00:00:00:00:00:1e  2023-06-14 07:39:48  2023-06-14 07:40:02
        00:00:00:00:00:1f  2023-06-14 07:41:04  2023-06-14 07:41:14
B2-21   00:00:00:00:00:1e  2023-06-14 07:40:02  2023-06-14 07:40:19
        00:00:00:00:00:1f  2023-06-14 07:40:31  2023-06-14 07:41:04
B3-24   00:00:00:00:00:1f  2023-06-14 07:41:14  2023-06-14 07:41:33
PI-12   00:00:00:00:00:1e  NaT  2023-06-14 07:40:26
PS-19   00:00:00:00:00:1e  2023-06-14 07:40:26  2023-06-14 07:40:31

英文:

This output needs a bit more processing but accomplishes the hard part of reshaping the data:
df.pivot(index=['Source', 'Mac'], columns=['Association'], values=['datetime'])

		datetime
        Association	        associated	        disassociated
Source	Mac		
B1-31	00:00:00:00:00:1e	2023-06-14 07:39:48	2023-06-14 07:40:02
        00:00:00:00:00:1f	2023-06-14 07:41:04	2023-06-14 07:41:14
B2-21	00:00:00:00:00:1e	2023-06-14 07:40:02	2023-06-14 07:40:19
        00:00:00:00:00:1f	2023-06-14 07:40:31	2023-06-14 07:41:04
B3-24	00:00:00:00:00:1f	2023-06-14 07:41:14	2023-06-14 07:41:33
PI-12	00:00:00:00:00:1e	NaT	                2023-06-14 07:40:26
PS-19	00:00:00:00:00:1e	2023-06-14 07:40:26	2023-06-14 07:40:31
</details>
# 答案2
**得分**: 1
以下是翻译好的代码部分：
```python
df2 = df.pivot(
    index=["Mac", "Source"],
    columns="Association",
    values="datetime",
).reset_index()
df2.columns.name = None
df2 = df2.rename(
    columns={
        "associated": "datetime associated",
        "disassociated": "datetime disassociated",
    }
)
df2['datetime associated'] = df2['datetime associated'].astype('datetime64')
df2['datetime disassociated'] = df2['datetime disassociated'].astype('datetime64')
df2 = df2[['datetime associated', 'datetime disassociated', 'Mac', 'Source']]

英文:

Just a continuation of spinodal's solution:

df2 = df.pivot(
    index=[&quot;Mac&quot;, &quot;Source&quot;],
    columns=&quot;Association&quot;,
    values=&quot;datetime&quot;,
).reset_index()
df2.columns.name = None
df2 = df2.rename(
    columns={
        &quot;associated&quot;: &quot;datetime associated&quot;,
        &quot;disassociated&quot;: &quot;datetime disassociated&quot;,
    }
)
df2[&#39;datetime associated&#39;] = df2[&#39;datetime associated&#39;].astype(&#39;datetime64&#39;)
df2[&#39;datetime disassociated&#39;] = df2[&#39;datetime disassociated&#39;].astype(&#39;datetime64&#39;)
df2 = df2[[&#39;datetime associated&#39;, &#39;datetime disassociated&#39;, &#39;Mac&#39;, &#39;Source&#39;]]

答案3

得分: 0

Pivot 是一个好方法：

df = data.pivot(index=('Mac', 'Source'), columns='Association', values='datetime').reset_index()

英文:

Pivot is the way to go:

df = data.pivot(index=('Mac', 'Source'), columns='Association', values='datetime').reset_index()

答案4

得分: 0

这是你可以尝试的一种方法：

    import pandas as pd
    import numpy as np
    
    data = pd.DataFrame({
        'id': [32310, 36653, 32313, 32358, 32341, 37577, 32955, 32364, 32046, 32405, 32431, 33915, 32458],
        'datetime': ['2023-06-07 07:39:48', '2023-06-07 07:40:02', '2023-06-07 07:40:02', '2023-06-07 07:40:19',
                     '2023-06-07 07:40:26', '2023-06-07 07:40:26', '2023-06-07 07:40:31', '2023-06-07 07:40:31',
                     '2023-06-07 07:41:04', '2023-06-07 07:41:04', '2023-06-07 07:41:14', '2023-06-07 07:41:14',
                     '2023-06-07 07:41:33'],
        'Mac': ['00:00:00:00:00:1e', '00:00:00:00:00:1e', '00:00:00:00:00:1e', '00:00:00:00:00:1e',
                '00:00:00:00:00:1e', '00:00:00:00:00:1e', '00:00:00:00:00:1e', '00:00:00:00:00:1f',
                '00:00:00:00:00:1f', '00:00:00:00:00:1f', '00:00:00:00:00:1f', '00:00:00:00:00:1f',
                '00:00:00:00:00:1f'],
        'Source': ['B1-31', 'B1-31', 'B2-21', 'B2-21', 'PI-12', 'PS-19', 'PS-19', 'B2-21', 'B2-21',
                   'B1-31', 'B1-31', 'B3-24', 'B3-24'],
        'Association': ['associated', 'disassociated', 'associated', 'disassociated', 'disassociated',
                        'associated', 'disassociated', 'associated', 'disassociated', 'associated',
                        'disassociated', 'associated', 'disassociated']
    })
    
    # Sort the dataframe by datetime and Association
    data = data.sort_values(['datetime', 'Association'], ascending=[True, False])
    
    _condition_1 = (data.Source == data.Source.shift(-1))
    _condition_2 = (data.Association == "associated") & (data.Association.shift(-1) == "disassociated")
    new_df = data.copy()
    new_df["datetime associated"] = np.where((_condition_1 & _condition_2),data.datetime,np.nan)
    new_df["datetime diassociated"] = np.where((_condition_1 & _condition_2),data.datetime.shift(-1),np.nan)
    new_df = new_df.dropna()
    new_df.drop(['datetime'],axis=1, inplace=True)
    display(new_df)

告诉我是否回答了你的问题。

英文:

Here is an approach you could try :

import pandas as pd
import numpy as np
data = pd.DataFrame({
&#39;id&#39;: [32310, 36653, 32313, 32358, 32341, 37577, 32955, 32364, 32046, 32405, 32431, 33915, 32458],
&#39;datetime&#39;: [&#39;2023-06-07 07:39:48&#39;, &#39;2023-06-07 07:40:02&#39;, &#39;2023-06-07 07:40:02&#39;, &#39;2023-06-07 07:40:19&#39;,
&#39;2023-06-07 07:40:26&#39;, &#39;2023-06-07 07:40:26&#39;, &#39;2023-06-07 07:40:31&#39;, &#39;2023-06-07 07:40:31&#39;,
&#39;2023-06-07 07:41:04&#39;, &#39;2023-06-07 07:41:04&#39;, &#39;2023-06-07 07:41:14&#39;, &#39;2023-06-07 07:41:14&#39;,
&#39;2023-06-07 07:41:33&#39;],
&#39;Mac&#39;: [&#39;00:00:00:00:00:1e&#39;, &#39;00:00:00:00:00:1e&#39;, &#39;00:00:00:00:00:1e&#39;, &#39;00:00:00:00:00:1e&#39;,
&#39;00:00:00:00:00:1e&#39;, &#39;00:00:00:00:00:1e&#39;, &#39;00:00:00:00:00:1e&#39;, &#39;00:00:00:00:00:1f&#39;,
&#39;00:00:00:00:00:1f&#39;, &#39;00:00:00:00:00:1f&#39;, &#39;00:00:00:00:00:1f&#39;, &#39;00:00:00:00:00:1f&#39;,
&#39;00:00:00:00:00:1f&#39;],
&#39;Source&#39;: [&#39;B1-31&#39;, &#39;B1-31&#39;, &#39;B2-21&#39;, &#39;B2-21&#39;, &#39;PI-12&#39;, &#39;PS-19&#39;, &#39;PS-19&#39;, &#39;B2-21&#39;, &#39;B2-21&#39;,
&#39;B1-31&#39;, &#39;B1-31&#39;, &#39;B3-24&#39;, &#39;B3-24&#39;],
&#39;Association&#39;: [&#39;associated&#39;, &#39;disassociated&#39;, &#39;associated&#39;, &#39;disassociated&#39;, &#39;disassociated&#39;,
&#39;associated&#39;, &#39;disassociated&#39;, &#39;associated&#39;, &#39;disassociated&#39;, &#39;associated&#39;,
&#39;disassociated&#39;, &#39;associated&#39;, &#39;disassociated&#39;]
})
# Sort the dataframe by datetime and Association
data = data.sort_values([&#39;datetime&#39;, &#39;Association&#39;], ascending=[True, False])
_condition_1 = (data.Source == data.Source.shift(-1))
_condition_2 = (data.Association == &quot;associated&quot;) &amp; (data.Association.shift(-1) == &quot;disassociated&quot;)
new_df = data.copy()
new_df[&quot;datetime associated&quot;] = np.where((_condition_1 &amp; _condition_2),data.datetime,np.nan)
new_df[&quot;datetime diassociated&quot;] = np.where((_condition_1 &amp; _condition_2),data.datetime.shift(-1),np.nan)
new_df = new_df.dropna()
new_df.drop([&#39;datetime&#39;],axis=1, inplace=True)
display(new_df)

Tell me if it answers your question

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

Pandas – 将两行合并为一行并创建新列

问题

答案1

答案3

答案4

检测CSV文件中的多个标题

奇怪的行为与Python中的randint

pandas将一组API函数应用于多个数据框。

关于缺失的分页元素，需要一些爬取指导。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。