Re-arrange data by pairs recursively
Question
I have a dataframe that contains nested ACQ/REL pairs, as below:
import pandas as pd
data = [
['2023-06-05 16:51:27.561','ACQ','location'],
['2023-06-05 16:51:27.564','ACQ','location'],
['2023-06-05 16:51:27.567','ACQ','location'],
['2023-06-05 16:51:27.571','REL','location'],
['2023-06-05 16:51:27.573','REL','location'],
['2023-06-05 16:51:27.587','REL','location'],
['2023-06-05 16:51:28.559','ACQ','location'],
['2023-06-05 16:51:28.561','ACQ','location'],
['2023-06-05 16:51:28.563','ACQ','location'],
['2023-06-05 16:51:28.566','REL','location'],
['2023-06-05 16:51:28.569','REL','location'],
['2023-06-05 16:51:28.575','REL','location']
]
df = pd.DataFrame(data,columns=['ts','action','name'])
I would like to reorganize it by ACQ/REL pairs, treating each outermost ACQ/REL pair as a group, so that the output dataframe looks like this:
0 2023-06-05 16:51:27.561 ACQ location
5 2023-06-05 16:51:27.587 REL location
1 2023-06-05 16:51:27.564 ACQ location
4 2023-06-05 16:51:27.573 REL location
2 2023-06-05 16:51:27.567 ACQ location
3 2023-06-05 16:51:27.571 REL location
6 2023-06-05 16:51:28.559 ACQ location
11 2023-06-05 16:51:28.575 REL location
7 2023-06-05 16:51:28.561 ACQ location
10 2023-06-05 16:51:28.569 REL location
8 2023-06-05 16:51:28.563 ACQ location
9 2023-06-05 16:51:28.566 REL location
The current example has 3 pairs per group, but the number is not always the same. What is the proper way to get this result?
Answer 1
Score: 1
If it is always guaranteed that there are the same number of ACQ and REL, and that they are respectively in correct order (i.e., the n-th ACQ should always be paired with the n-th REL), then the solution is to split the original list into two first, by its second element, then loop to output one from each list.
Code:
import pandas as pd
data = [
['2023-06-05 16:51:27.561', 'ACQ', 'location'],
['2023-06-05 16:51:27.564', 'ACQ', 'location'],
['2023-06-05 16:51:27.567', 'ACQ', 'location'],
['2023-06-05 16:51:27.571', 'REL', 'location'],
['2023-06-05 16:51:27.573', 'REL', 'location'],
['2023-06-05 16:51:27.587', 'REL', 'location'],
['2023-06-05 16:51:28.559', 'ACQ', 'location'],
['2023-06-05 16:51:28.561', 'ACQ', 'location'],
['2023-06-05 16:51:28.563', 'ACQ', 'location'],
['2023-06-05 16:51:28.566', 'REL', 'location'],
['2023-06-05 16:51:28.569', 'REL', 'location'],
['2023-06-05 16:51:28.575', 'REL', 'location']
]
df = pd.DataFrame(data, columns=['ts', 'action', 'name'])
# print(df)
data_acq = []
data_rel = []
# Split the rows into two lists by their action
for index, row in df.iterrows():
    if row['action'] == 'ACQ':
        data_acq.append(row)
    elif row['action'] == 'REL':
        data_rel.append(row)
assert len(data_acq) == len(data_rel)
# Interleave the n-th ACQ with the n-th REL, then build the result once
rows = []
for acq, rel in zip(data_acq, data_rel):
    rows.extend([acq, rel])
df_new = pd.DataFrame(rows)
print(df_new)
Output:
ts action name
0 2023-06-05 16:51:27.561 ACQ location
3 2023-06-05 16:51:27.571 REL location
1 2023-06-05 16:51:27.564 ACQ location
4 2023-06-05 16:51:27.573 REL location
2 2023-06-05 16:51:27.567 ACQ location
5 2023-06-05 16:51:27.587 REL location
6 2023-06-05 16:51:28.559 ACQ location
9 2023-06-05 16:51:28.566 REL location
7 2023-06-05 16:51:28.561 ACQ location
10 2023-06-05 16:51:28.569 REL location
8 2023-06-05 16:51:28.563 ACQ location
11 2023-06-05 16:51:28.575 REL location
Answer 2
Score: 1
Try this:
df.sort_values('ts').assign(sortkey=df.groupby('action').cumcount()).sort_values(['sortkey','action'])
Output:
ts action name sortkey
0 2023-06-05 16:51:27.561 ACQ location 0
3 2023-06-05 16:51:27.571 REL location 0
1 2023-06-05 16:51:27.564 ACQ location 1
4 2023-06-05 16:51:27.573 REL location 1
2 2023-06-05 16:51:27.567 ACQ location 2
5 2023-06-05 16:51:27.587 REL location 2
6 2023-06-05 16:51:28.559 ACQ location 3
9 2023-06-05 16:51:28.566 REL location 3
7 2023-06-05 16:51:28.561 ACQ location 4
10 2023-06-05 16:51:28.569 REL location 4
8 2023-06-05 16:51:28.563 ACQ location 5
11 2023-06-05 16:51:28.575 REL location 5
Answer 3
Score: 1
Using a list as a stack, you can compute offsets for the REL that indicate how far back the corresponding ACQ is located (i.e. the ACQ's index). Then sort the indices according to the adjusted positions (ACQ remaining at their original position and REL offset back to the ACQ's position) to get the indexes in the new order:
acq = list() # stack of ACQ positions
iREL = enumerate(x=="REL" for _,x,_ in data) # identify REL indices
offsets = (acq.pop() if rel else acq.append(i) or i for i,rel in iREL)
order = [i for i,_ in sorted(enumerate(offsets),key=lambda x:x[::-1])]
data = [data[i] for i in order] # reorder data : df.reindex(order)
print(*data,sep="\n")
['2023-06-05 16:51:27.561', 'ACQ', 'location']
['2023-06-05 16:51:27.587', 'REL', 'location']
['2023-06-05 16:51:27.564', 'ACQ', 'location']
['2023-06-05 16:51:27.573', 'REL', 'location']
['2023-06-05 16:51:27.567', 'ACQ', 'location']
['2023-06-05 16:51:27.571', 'REL', 'location']
['2023-06-05 16:51:28.559', 'ACQ', 'location']
['2023-06-05 16:51:28.575', 'REL', 'location']
['2023-06-05 16:51:28.561', 'ACQ', 'location']
['2023-06-05 16:51:28.569', 'REL', 'location']
['2023-06-05 16:51:28.563', 'ACQ', 'location']
['2023-06-05 16:51:28.566', 'REL', 'location']
If the data is already in a dataframe, the order list can be used to reorder it with the reindex() method.
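The stack idea in this answer can also be written as an explicit loop, which may be easier to follow and to debug. This is a minimal sketch, assuming the rows are in chronological order and every REL closes the most recently opened ACQ; `nested_pair_order` is a hypothetical helper name, not part of pandas.

```python
def nested_pair_order(actions):
    """Return row positions reordered so each ACQ is followed by its matching REL.

    Assumption: the input is well nested, i.e. every REL closes the most
    recently opened ACQ that has not been closed yet.
    """
    stack = []  # positions of ACQ rows still waiting for their REL
    keys = []   # (position of the owning ACQ, own position) for every row
    for pos, action in enumerate(actions):
        if action == "ACQ":
            stack.append(pos)
            keys.append((pos, pos))
        elif action == "REL":
            if not stack:
                raise ValueError(f"REL at position {pos} has no open ACQ")
            keys.append((stack.pop(), pos))
        else:
            raise ValueError(f"unexpected action {action!r} at position {pos}")
    if stack:
        raise ValueError(f"ACQ at positions {stack} never released")
    # Sorting by the owning ACQ's position puts each REL right after its ACQ
    return [pos for _, pos in sorted(keys)]


# Same ACQ/REL sequence as in the question
actions = ["ACQ"] * 3 + ["REL"] * 3 + ["ACQ"] * 3 + ["REL"] * 3
print(nested_pair_order(actions))
# [0, 5, 1, 4, 2, 3, 6, 11, 7, 10, 8, 9]
```

For a dataframe, passing df['action'] to the function and selecting rows with df.iloc[order] should give the layout asked for in the question. The explicit checks make unbalanced input fail with a clear message rather than an IndexError from an empty stack.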