Pandas - Combine two rows into a single row and create new columns
Question
I have a DataFrame with 5 columns and thousands of rows. I am trying to merge pairs of rows into a single row based on datetime, Mac, Source and Association (one associated row together with one disassociated row).
The DataFrame is sorted like this:
df = df.sort_values(['datetime', 'Association'], ascending=[True, False])
id | datetime | Mac | Source | Association |
---|---|---|---|---|
32310 | 2023-06-07 07:39:48 | 00:00:00:00:00:1e | B1-31 | associated |
36653 | 2023-06-07 07:40:02 | 00:00:00:00:00:1e | B1-31 | disassociated |
32313 | 2023-06-07 07:40:02 | 00:00:00:00:00:1e | B2-21 | associated |
32358 | 2023-06-07 07:40:19 | 00:00:00:00:00:1e | B2-21 | disassociated |
32341 | 2023-06-07 07:40:26 | 00:00:00:00:00:1e | PI-12 | disassociated |
37577 | 2023-06-07 07:40:26 | 00:00:00:00:00:1e | PS-19 | associated |
32955 | 2023-06-07 07:40:31 | 00:00:00:00:00:1e | PS-19 | disassociated |
32364 | 2023-06-07 07:40:31 | 00:00:00:00:00:1f | B2-21 | associated |
32046 | 2023-06-07 07:41:04 | 00:00:00:00:00:1f | B2-21 | disassociated |
32405 | 2023-06-07 07:41:04 | 00:00:00:00:00:1f | B1-31 | associated |
32431 | 2023-06-07 07:41:14 | 00:00:00:00:00:1f | B1-31 | disassociated |
33915 | 2023-06-07 07:41:14 | 00:00:00:00:00:1f | B3-24 | associated |
32458 | 2023-06-07 07:41:33 | 00:00:00:00:00:1f | B3-24 | disassociated |
The output I want is:
id | datetime associated | datetime disassociated | Mac | Source |
---|---|---|---|---|
1 | 2023-06-07 07:39:48 | 2023-06-07 07:40:02 | 00:00:00:00:00:1e | B1-31 |
2 | 2023-06-07 07:40:02 | 2023-06-07 07:40:19 | 00:00:00:00:00:1e | B2-21 |
3 | 0000-00-00 00:00:00 | 2023-06-07 07:40:26 | 00:00:00:00:00:1e | PI-12 |
4 | 2023-06-07 07:40:26 | 2023-06-07 07:40:31 | 00:00:00:00:00:1e | PS-19 |
5 | 2023-06-07 07:40:31 | 2023-06-07 07:41:04 | 00:00:00:00:00:1f | B2-21 |
6 | 2023-06-07 07:41:04 | 2023-06-07 07:41:14 | 00:00:00:00:00:1f | B1-31 |
7 | 2023-06-07 07:41:14 | 2023-06-07 07:41:33 | 00:00:00:00:00:1f | B3-24 |
Any suggestions? Thanks in advance.
Answer 1
Score: 1
This output needs a bit more processing, but it accomplishes the hard part of reshaping the data:
df.pivot(index=['Source', 'Mac'], columns=['Association'], values=['datetime'])
                                    datetime
Association                       associated       disassociated
Source Mac
B1-31  00:00:00:00:00:1e 2023-06-07 07:39:48 2023-06-07 07:40:02
       00:00:00:00:00:1f 2023-06-07 07:41:04 2023-06-07 07:41:14
B2-21  00:00:00:00:00:1e 2023-06-07 07:40:02 2023-06-07 07:40:19
       00:00:00:00:00:1f 2023-06-07 07:40:31 2023-06-07 07:41:04
B3-24  00:00:00:00:00:1f 2023-06-07 07:41:14 2023-06-07 07:41:33
PI-12  00:00:00:00:00:1e                 NaT 2023-06-07 07:40:26
PS-19  00:00:00:00:00:1e 2023-06-07 07:40:26 2023-06-07 07:40:31
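A pivot on (Source, Mac) only works while each (Source, Mac, Association) combination occurs once, as it does in the sample. With thousands of rows a device will probably return to the same source, and `pivot` then raises a `ValueError` about duplicate entries. One possible workaround, sketched below on hypothetical data, is to number the visits with `groupby(...).cumcount()` and add that counter to the pivot index. The `visit` column and the `sessions` name are illustrative, and the sketch assumes that events strictly alternate within each (Mac, Source) group; a missing association or disassociation would misalign the later pairs.

```python
import pandas as pd

# Hypothetical sample: the same Mac connects to B1-31 twice, so a plain
# pivot on (Mac, Source) would hit duplicate index entries.
df = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2023-06-07 07:39:48", "2023-06-07 07:40:02",
        "2023-06-07 07:40:02", "2023-06-07 07:40:19",
        "2023-06-07 07:40:19", "2023-06-07 07:40:45",
    ]),
    "Mac": ["00:00:00:00:00:1e"] * 6,
    "Source": ["B1-31", "B1-31", "B2-21", "B2-21", "B1-31", "B1-31"],
    "Association": ["associated", "disassociated"] * 3,
})

# Same ordering as in the question.
df = df.sort_values(["datetime", "Association"], ascending=[True, False])

# Number each event within its (Mac, Source, Association) group:
# 0 for the first visit, 1 for the second, and so on.
df["visit"] = df.groupby(["Mac", "Source", "Association"]).cumcount()

# The visit counter makes every pivot index entry unique.
sessions = df.pivot(
    index=["Mac", "Source", "visit"],
    columns="Association",
    values="datetime",
).reset_index()
sessions.columns.name = None

print(sessions)
```

Each row of `sessions` is then one visit, with the `associated` and `disassociated` timestamps side by side.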
Answer 2
Score: 1
Just a continuation of spinodal's solution:
```python
import pandas as pd

# Reshape so that each (Mac, Source) pair becomes one row,
# with one column per Association value.
df2 = df.pivot(
    index=["Mac", "Source"],
    columns="Association",
    values="datetime",
).reset_index()
df2.columns.name = None  # drop the leftover "Association" label on the column axis
df2 = df2.rename(
    columns={
        "associated": "datetime associated",
        "disassociated": "datetime disassociated",
    }
)
# Unit-less astype('datetime64') is rejected by pandas 2.x; pd.to_datetime works across versions.
df2["datetime associated"] = pd.to_datetime(df2["datetime associated"])
df2["datetime disassociated"] = pd.to_datetime(df2["datetime disassociated"])
df2 = df2[["datetime associated", "datetime disassociated", "Mac", "Source"]]
```
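The pivoted frame still lacks the running `id` and the row order shown in the question. A minimal sketch of those last steps follows, using a small hypothetical stand-in for `df2`; the names `out` and `fmt` are illustrative. It assumes that sorting by the disassociation time gives the wanted order, which matches the sample output. Note that `0000-00-00 00:00:00` is not a valid timestamp, so the placeholder can only be produced by converting the column to strings; keeping `NaT` is usually easier to work with.

```python
import pandas as pd

# Small stand-in for the pivoted result, deliberately out of order.
df2 = pd.DataFrame({
    "datetime associated": pd.to_datetime(
        ["2023-06-07 07:40:26", None, "2023-06-07 07:39:48"]
    ),
    "datetime disassociated": pd.to_datetime(
        ["2023-06-07 07:40:31", "2023-06-07 07:40:26", "2023-06-07 07:40:02"]
    ),
    "Mac": ["00:00:00:00:00:1e"] * 3,
    "Source": ["PS-19", "PI-12", "B1-31"],
})

# Order the sessions chronologically by disassociation time,
# which is present on every row of this sample.
out = df2.sort_values("datetime disassociated").reset_index(drop=True)

# Running id starting at 1, placed as the first column.
out.insert(0, "id", list(range(1, len(out) + 1)))

# Optional: render the associated time as text and use the placeholder
# from the question where no association was recorded.
fmt = "%Y-%m-%d %H:%M:%S"
out["datetime associated"] = (
    out["datetime associated"].dt.strftime(fmt).fillna("0000-00-00 00:00:00")
)

print(out)
```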
Answer 3
Score: 0
Pivot is the way to go:
df = data.pivot(index=['Mac', 'Source'], columns='Association', values='datetime').reset_index()
Answer 4
Score: 0
Here is an approach you could try:
import pandas as pd
import numpy as np

data = pd.DataFrame({
    'id': [32310, 36653, 32313, 32358, 32341, 37577, 32955, 32364, 32046, 32405, 32431, 33915, 32458],
    'datetime': ['2023-06-07 07:39:48', '2023-06-07 07:40:02', '2023-06-07 07:40:02', '2023-06-07 07:40:19',
                 '2023-06-07 07:40:26', '2023-06-07 07:40:26', '2023-06-07 07:40:31', '2023-06-07 07:40:31',
                 '2023-06-07 07:41:04', '2023-06-07 07:41:04', '2023-06-07 07:41:14', '2023-06-07 07:41:14',
                 '2023-06-07 07:41:33'],
    'Mac': ['00:00:00:00:00:1e', '00:00:00:00:00:1e', '00:00:00:00:00:1e', '00:00:00:00:00:1e',
            '00:00:00:00:00:1e', '00:00:00:00:00:1e', '00:00:00:00:00:1e', '00:00:00:00:00:1f',
            '00:00:00:00:00:1f', '00:00:00:00:00:1f', '00:00:00:00:00:1f', '00:00:00:00:00:1f',
            '00:00:00:00:00:1f'],
    'Source': ['B1-31', 'B1-31', 'B2-21', 'B2-21', 'PI-12', 'PS-19', 'PS-19', 'B2-21', 'B2-21',
               'B1-31', 'B1-31', 'B3-24', 'B3-24'],
    'Association': ['associated', 'disassociated', 'associated', 'disassociated', 'disassociated',
                    'associated', 'disassociated', 'associated', 'disassociated', 'associated',
                    'disassociated', 'associated', 'disassociated']
})

# Sort the dataframe by datetime and Association
data = data.sort_values(['datetime', 'Association'], ascending=[True, False])

# A row starts a pair when the next row has the same Source and Mac ...
_condition_1 = (data.Source == data.Source.shift(-1)) & (data.Mac == data.Mac.shift(-1))
# ... and the row is "associated" while the next row is "disassociated"
_condition_2 = (data.Association == "associated") & (data.Association.shift(-1) == "disassociated")

new_df = data.copy()
new_df["datetime associated"] = np.where((_condition_1 & _condition_2), data.datetime, np.nan)
new_df["datetime disassociated"] = np.where((_condition_1 & _condition_2), data.datetime.shift(-1), np.nan)

# Rows that do not start a pair are dropped, including the lone PI-12 disassociation
new_df = new_df.dropna(subset=["datetime associated", "datetime disassociated"])
new_df = new_df.drop(columns=['datetime'])
print(new_df)
Tell me if it answers your question.
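The `shift(-1)` comparison above relies on the two rows of a pair being adjacent after the global sort, which may not hold once events from several devices interleave. A possible variation, sketched here on hypothetical data, shifts within each (Mac, Source) group instead and keeps disassociations that have no preceding association. The names `events`, `closing` and `result` are illustrative, and associations that never get a disassociation are not reported by this sketch.

```python
import pandas as pd

# Hypothetical sample: events of two devices interleave, and PI-12 only
# has a disassociation.
events = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2023-06-07 07:39:48", "2023-06-07 07:39:55",
        "2023-06-07 07:40:02", "2023-06-07 07:40:10",
        "2023-06-07 07:40:26",
    ]),
    "Mac": ["00:00:00:00:00:1e", "00:00:00:00:00:1f",
            "00:00:00:00:00:1e", "00:00:00:00:00:1f",
            "00:00:00:00:00:1e"],
    "Source": ["B1-31", "B2-21", "B1-31", "B2-21", "PI-12"],
    "Association": ["associated", "associated", "disassociated",
                    "disassociated", "disassociated"],
})
events = events.sort_values(["datetime", "Association"], ascending=[True, False])

# Look at the previous event of the same (Mac, Source) group,
# regardless of what other devices did in between.
grouped = events.groupby(["Mac", "Source"])
prev_state = grouped["Association"].shift()
prev_time = grouped["datetime"].shift()

# One output row per disassociation; the associated time is filled only
# when the previous event in the group was an association (NaT otherwise).
closing = events[events["Association"] == "disassociated"].copy()
closing["datetime associated"] = prev_time.where(prev_state == "associated")
closing = closing.rename(columns={"datetime": "datetime disassociated"})

result = closing[
    ["datetime associated", "datetime disassociated", "Mac", "Source"]
].reset_index(drop=True)

print(result)
```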