Pandas – 将两行合并为一行并创建新列

huangapple go评论65阅读模式
英文:

Pandas - Combine two rows into a single row and create new columns

问题

我有一个包含5列和数千行的数据框。我正在尝试基于日期时间、Mac、来源和关联(一个关联一个取消关联)将两行特定行合并为一行... 数据框如下:

数据框已排序,如下所示:

df.sort_values(['datetime', 'Association'], ascending=[True, False])
id datetime Mac Source Association
32310 2023-06-07 07:39:48 00:00:00:00:00:1e B1-31 associated
36653 2023-06-07 07:40:02 00:00:00:00:00:1e B1-31 disassociated
32313 2023-06-07 07:40:02 00:00:00:00:00:1e B2-21 associated
32358 2023-06-07 07:40:19 00:00:00:00:00:1e B2-21 disassociated
32341 2023-06-07 07:40:26 00:00:00:00:00:1e PI-12 disassociated
37577 2023-06-07 07:40:26 00:00:00:00:00:1e PS-19 associated
32955 2023-06-07 07:40:31 00:00:00:00:00:1e PS-19 disassociated
32364 2023-06-07 07:40:31 00:00:00:00:00:1f B2-21 associated
32046 2023-06-07 07:41:04 00:00:00:00:00:1f B2-21 disassociated
32405 2023-06-07 07:41:04 00:00:00:00:00:1f B1-31 associated
32431 2023-06-07 07:41:14 00:00:00:00:00:1f B1-31 disassociated
33915 2023-06-07 07:41:14 00:00:00:00:00:1f B3-24 associated
32458 2023-06-07 07:41:33 00:00:00:00:00:1f B3-24 disassociated

我想要的输出如下:

id datetime associated datetime disassociated Mac Source
1 2023-06-07 07:39:48 2023-06-07 07:40:02 00:00:00:00:00:1e B1-31
2 2023-06-07 07:40:02 2023-06-07 07:40:19 00:00:00:00:00:1e B2-21
3 0000-00-00 00:00:00 2023-06-07 07:40:26 00:00:00:00:00:1e PI-12
4 2023-06-07 07:40:26 2023-06-07 07:40:31 00:00:00:00:00:1e PS-19
5 2023-06-07 07:40:31 2023-06-07 07:41:04 00:00:00:00:00:1f B2-21
6 2023-06-07 07:41:04 2023-06-07 07:41:14 00:00:00:00:00:1f B1-31
7 2023-06-07 07:41:14 2023-06-07 07:41:33 00:00:00:00:00:1f B3-24

有什么建议吗?提前感谢。

英文:

I have a data frame that has 5 columns and thousands rows. I am trying to merge two specific rows into one single row based on datetime, Mac, Source and Association (one associated with one disassociated).... the data frame is like this

The dataframe is sorted, like that

df.sort_values(['datetime', 'Association'],ascending=[True,False])
id datetime Mac Source Association
32310 2023-06-07 07:39:48 00:00:00:00:00:1e B1-31 associated
36653 2023-06-07 07:40:02 00:00:00:00:00:1e B1-31 disassociated
32313 2023-06-07 07:40:02 00:00:00:00:00:1e B2-21 associated
32358 2023-06-07 07:40:19 00:00:00:00:00:1e B2-21 disassociated
32341 2023-06-07 07:40:26 00:00:00:00:00:1e PI-12 disassociated
37577 2023-06-07 07:40:26 00:00:00:00:00:1e PS-19 associated
32955 2023-06-07 07:40:31 00:00:00:00:00:1e PS-19 disassociated
32364 2023-06-07 07:40:31 00:00:00:00:00:1f B2-21 associated
32046 2023-06-07 07:41:04 00:00:00:00:00:1f B2-21 disassociated
32405 2023-06-07 07:41:04 00:00:00:00:00:1f B1-31 associated
32431 2023-06-07 07:41:14 00:00:00:00:00:1f B1-31 disassociated
33915 2023-06-07 07:41:14 00:00:00:00:00:1f B3-24 associated
32458 2023-06-07 07:41:33 00:00:00:00:00:1f B3-24 disassociated

What I want the output as

id datetime associated datetime disassociated Mac Source
1 2023-06-07 07:39:48 2023-06-07 07:40:02 00:00:00:00:00:1e B1-31
2 2023-06-07 07:40:02 2023-06-07 07:40:19 00:00:00:00:00:1e B2-21
3 0000-00-00 00:00:00 2023-06-07 07:40:26 00:00:00:00:00:1e PI-12
4 2023-06-07 07:40:26 2023-06-07 07:40:31 00:00:00:00:00:1e PS-19
5 2023-06-07 07:40:31 2023-06-07 07:41:04 00:00:00:00:00:1f B2-21
6 2023-06-07 07:41:04 2023-06-07 07:41:14 00:00:00:00:00:1f B1-31
7 2023-06-07 07:41:14 2023-06-07 07:41:33 00:00:00:00:00:1f B3-24

Any suggestions? Thanks in advance

答案1

得分: 1

This output needs a bit more processing but accomplishes the hard part of reshaping the data:
df.pivot(index=['Source', 'Mac'], columns=['Association'], values=['datetime'])

        datetime
Association associated disassociated
Source  Mac     
B1-31   00:00:00:00:00:1e  2023-06-14 07:39:48  2023-06-14 07:40:02
        00:00:00:00:00:1f  2023-06-14 07:41:04  2023-06-14 07:41:14
B2-21   00:00:00:00:00:1e  2023-06-14 07:40:02  2023-06-14 07:40:19
        00:00:00:00:00:1f  2023-06-14 07:40:31  2023-06-14 07:41:04
B3-24   00:00:00:00:00:1f  2023-06-14 07:41:14  2023-06-14 07:41:33
PI-12   00:00:00:00:00:1e  NaT  2023-06-14 07:40:26
PS-19   00:00:00:00:00:1e  2023-06-14 07:40:26  2023-06-14 07:40:31
英文:

This output needs a bit more processing but accomplishes the hard part of reshaping the data:
df.pivot(index=['Source', 'Mac'], columns=['Association'], values=['datetime'])

		datetime
        Association	        associated	        disassociated
Source	Mac		
B1-31	00:00:00:00:00:1e	2023-06-14 07:39:48	2023-06-14 07:40:02
        00:00:00:00:00:1f	2023-06-14 07:41:04	2023-06-14 07:41:14
B2-21	00:00:00:00:00:1e	2023-06-14 07:40:02	2023-06-14 07:40:19
        00:00:00:00:00:1f	2023-06-14 07:40:31	2023-06-14 07:41:04
B3-24	00:00:00:00:00:1f	2023-06-14 07:41:14	2023-06-14 07:41:33
PI-12	00:00:00:00:00:1e	NaT	                2023-06-14 07:40:26
PS-19	00:00:00:00:00:1e	2023-06-14 07:40:26	2023-06-14 07:40:31

</details>



# 答案2
**得分**: 1

以下是翻译好的代码部分:

```python
df2 = df.pivot(
    index=["Mac", "Source"],
    columns="Association",
    values="datetime",
).reset_index()

df2.columns.name = None

df2 = df2.rename(
    columns={
        "associated": "datetime associated",
        "disassociated": "datetime disassociated",
    }
)

df2['datetime associated'] = df2['datetime associated'].astype('datetime64')
df2['datetime disassociated'] = df2['datetime disassociated'].astype('datetime64')

df2 = df2[['datetime associated', 'datetime disassociated', 'Mac', 'Source']]
英文:

Just a continuation of spinodal's solution:

df2 = df.pivot(
    index=[&quot;Mac&quot;, &quot;Source&quot;],
    columns=&quot;Association&quot;,
    values=&quot;datetime&quot;,
).reset_index()

df2.columns.name = None

df2 = df2.rename(
    columns={
        &quot;associated&quot;: &quot;datetime associated&quot;,
        &quot;disassociated&quot;: &quot;datetime disassociated&quot;,
    }
)

df2[&#39;datetime associated&#39;] = df2[&#39;datetime associated&#39;].astype(&#39;datetime64&#39;)
df2[&#39;datetime disassociated&#39;] = df2[&#39;datetime disassociated&#39;].astype(&#39;datetime64&#39;)

df2 = df2[[&#39;datetime associated&#39;, &#39;datetime disassociated&#39;, &#39;Mac&#39;, &#39;Source&#39;]]

答案3

得分: 0

Pivot 是一个好方法:

df = data.pivot(index=('Mac', 'Source'), columns='Association', values='datetime').reset_index()

英文:

Pivot is the way to go:

df = data.pivot(index=(&#39;Mac&#39;, &#39;Source&#39;), columns=&#39;Association&#39;, values=&#39;datetime&#39;).reset_index()

答案4

得分: 0

这是你可以尝试的一种方法:

    import pandas as pd
    import numpy as np
    
    data = pd.DataFrame({
        'id': [32310, 36653, 32313, 32358, 32341, 37577, 32955, 32364, 32046, 32405, 32431, 33915, 32458],
        'datetime': ['2023-06-07 07:39:48', '2023-06-07 07:40:02', '2023-06-07 07:40:02', '2023-06-07 07:40:19',
                     '2023-06-07 07:40:26', '2023-06-07 07:40:26', '2023-06-07 07:40:31', '2023-06-07 07:40:31',
                     '2023-06-07 07:41:04', '2023-06-07 07:41:04', '2023-06-07 07:41:14', '2023-06-07 07:41:14',
                     '2023-06-07 07:41:33'],
        'Mac': ['00:00:00:00:00:1e', '00:00:00:00:00:1e', '00:00:00:00:00:1e', '00:00:00:00:00:1e',
                '00:00:00:00:00:1e', '00:00:00:00:00:1e', '00:00:00:00:00:1e', '00:00:00:00:00:1f',
                '00:00:00:00:00:1f', '00:00:00:00:00:1f', '00:00:00:00:00:1f', '00:00:00:00:00:1f',
                '00:00:00:00:00:1f'],
        'Source': ['B1-31', 'B1-31', 'B2-21', 'B2-21', 'PI-12', 'PS-19', 'PS-19', 'B2-21', 'B2-21',
                   'B1-31', 'B1-31', 'B3-24', 'B3-24'],
        'Association': ['associated', 'disassociated', 'associated', 'disassociated', 'disassociated',
                        'associated', 'disassociated', 'associated', 'disassociated', 'associated',
                        'disassociated', 'associated', 'disassociated']
    })
    
    # Sort the dataframe by datetime and Association
    data = data.sort_values(['datetime', 'Association'], ascending=[True, False])
    
    _condition_1 = (data.Source == data.Source.shift(-1))
    _condition_2 = (data.Association == "associated") & (data.Association.shift(-1) == "disassociated")
    new_df = data.copy()
    new_df["datetime associated"] = np.where((_condition_1 & _condition_2),data.datetime,np.nan)
    new_df["datetime diassociated"] = np.where((_condition_1 & _condition_2),data.datetime.shift(-1),np.nan)
    new_df = new_df.dropna()
    new_df.drop(['datetime'],axis=1, inplace=True)
    display(new_df)

告诉我是否回答了你的问题。

英文:

Here is an approach you could try :

import pandas as pd
import numpy as np
data = pd.DataFrame({
&#39;id&#39;: [32310, 36653, 32313, 32358, 32341, 37577, 32955, 32364, 32046, 32405, 32431, 33915, 32458],
&#39;datetime&#39;: [&#39;2023-06-07 07:39:48&#39;, &#39;2023-06-07 07:40:02&#39;, &#39;2023-06-07 07:40:02&#39;, &#39;2023-06-07 07:40:19&#39;,
&#39;2023-06-07 07:40:26&#39;, &#39;2023-06-07 07:40:26&#39;, &#39;2023-06-07 07:40:31&#39;, &#39;2023-06-07 07:40:31&#39;,
&#39;2023-06-07 07:41:04&#39;, &#39;2023-06-07 07:41:04&#39;, &#39;2023-06-07 07:41:14&#39;, &#39;2023-06-07 07:41:14&#39;,
&#39;2023-06-07 07:41:33&#39;],
&#39;Mac&#39;: [&#39;00:00:00:00:00:1e&#39;, &#39;00:00:00:00:00:1e&#39;, &#39;00:00:00:00:00:1e&#39;, &#39;00:00:00:00:00:1e&#39;,
&#39;00:00:00:00:00:1e&#39;, &#39;00:00:00:00:00:1e&#39;, &#39;00:00:00:00:00:1e&#39;, &#39;00:00:00:00:00:1f&#39;,
&#39;00:00:00:00:00:1f&#39;, &#39;00:00:00:00:00:1f&#39;, &#39;00:00:00:00:00:1f&#39;, &#39;00:00:00:00:00:1f&#39;,
&#39;00:00:00:00:00:1f&#39;],
&#39;Source&#39;: [&#39;B1-31&#39;, &#39;B1-31&#39;, &#39;B2-21&#39;, &#39;B2-21&#39;, &#39;PI-12&#39;, &#39;PS-19&#39;, &#39;PS-19&#39;, &#39;B2-21&#39;, &#39;B2-21&#39;,
&#39;B1-31&#39;, &#39;B1-31&#39;, &#39;B3-24&#39;, &#39;B3-24&#39;],
&#39;Association&#39;: [&#39;associated&#39;, &#39;disassociated&#39;, &#39;associated&#39;, &#39;disassociated&#39;, &#39;disassociated&#39;,
&#39;associated&#39;, &#39;disassociated&#39;, &#39;associated&#39;, &#39;disassociated&#39;, &#39;associated&#39;,
&#39;disassociated&#39;, &#39;associated&#39;, &#39;disassociated&#39;]
})
# Sort the dataframe by datetime and Association
data = data.sort_values([&#39;datetime&#39;, &#39;Association&#39;], ascending=[True, False])
_condition_1 = (data.Source == data.Source.shift(-1))
_condition_2 = (data.Association == &quot;associated&quot;) &amp; (data.Association.shift(-1) == &quot;disassociated&quot;)
new_df = data.copy()
new_df[&quot;datetime associated&quot;] = np.where((_condition_1 &amp; _condition_2),data.datetime,np.nan)
new_df[&quot;datetime diassociated&quot;] = np.where((_condition_1 &amp; _condition_2),data.datetime.shift(-1),np.nan)
new_df = new_df.dropna()
new_df.drop([&#39;datetime&#39;],axis=1, inplace=True)
display(new_df)

Tell me if it answers your question

huangapple
  • 本文由 发表于 2023年6月15日 02:35:32
  • 转载请务必保留本文链接:https://go.coder-hub.com/76476626.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定