re-arrange data by pairs recursively

Question

I have a dataframe that contains recursively nested ACQ/REL pairs, as below:

import pandas as pd

data = [
    ['2023-06-05 16:51:27.561','ACQ','location'],    
    ['2023-06-05 16:51:27.564','ACQ','location'],
    ['2023-06-05 16:51:27.567','ACQ','location'],
    ['2023-06-05 16:51:27.571','REL','location'],
    ['2023-06-05 16:51:27.573','REL','location'],
    ['2023-06-05 16:51:27.587','REL','location'],
    ['2023-06-05 16:51:28.559','ACQ','location'],
    ['2023-06-05 16:51:28.561','ACQ','location'],
    ['2023-06-05 16:51:28.563','ACQ','location'],
    ['2023-06-05 16:51:28.566','REL','location'],
    ['2023-06-05 16:51:28.569','REL','location'],
    ['2023-06-05 16:51:28.575','REL','location']
]

df = pd.DataFrame(data,columns=['ts','action','name'])

I would like to reorganize it by ACQ/REL pairs, treating each outermost ACQ/REL pair as a group, so that the output dataframe looks like this:

0   2023-06-05 16:51:27.561    ACQ  location
5   2023-06-05 16:51:27.587    REL  location
1   2023-06-05 16:51:27.564    ACQ  location
4   2023-06-05 16:51:27.573    REL  location
2   2023-06-05 16:51:27.567    ACQ  location
3   2023-06-05 16:51:27.571    REL  location
6   2023-06-05 16:51:28.559    ACQ  location
11  2023-06-05 16:51:28.575    REL  location
7   2023-06-05 16:51:28.561    ACQ  location
10  2023-06-05 16:51:28.569    REL  location
8   2023-06-05 16:51:28.563    ACQ  location
9   2023-06-05 16:51:28.566    REL  location

The current example has 3 pairs per group, but that number is not constant. What is the proper way to get such a result?

Answer 1

Score: 1

If it is always guaranteed that there are the same number of ACQ and REL, and that they are respectively in correct order (i.e., the n-th ACQ should always be paired with the n-th REL), then the solution is to split the original list into two first, by its second element, then loop to output one from each list.

Code:

import pandas as pd

data = [
    ['2023-06-05 16:51:27.561', 'ACQ', 'location'],
    ['2023-06-05 16:51:27.564', 'ACQ', 'location'],
    ['2023-06-05 16:51:27.567', 'ACQ', 'location'],
    ['2023-06-05 16:51:27.571', 'REL', 'location'],
    ['2023-06-05 16:51:27.573', 'REL', 'location'],
    ['2023-06-05 16:51:27.587', 'REL', 'location'],
    ['2023-06-05 16:51:28.559', 'ACQ', 'location'],
    ['2023-06-05 16:51:28.561', 'ACQ', 'location'],
    ['2023-06-05 16:51:28.563', 'ACQ', 'location'],
    ['2023-06-05 16:51:28.566', 'REL', 'location'],
    ['2023-06-05 16:51:28.569', 'REL', 'location'],
    ['2023-06-05 16:51:28.575', 'REL', 'location']
]

df = pd.DataFrame(data, columns=['ts', 'action', 'name'])
# print(df)

data_acq = []
data_rel = []

# Split the rows by action, keeping their original order within each list.
for index, row in df.iterrows():
    if row['action'] == 'ACQ':
        data_acq.append(row)
    elif row['action'] == 'REL':
        data_rel.append(row)
assert len(data_acq) == len(data_rel)

# Interleave the lists so the n-th ACQ is followed by the n-th REL,
# then build the result once instead of concatenating inside the loop.
rows = []
for acq, rel in zip(data_acq, data_rel):
    rows.extend([acq, rel])
df_new = pd.DataFrame(rows, columns=['ts', 'action', 'name'])
print(df_new)

Output:

                         ts action      name
0   2023-06-05 16:51:27.561    ACQ  location
3   2023-06-05 16:51:27.571    REL  location
1   2023-06-05 16:51:27.564    ACQ  location
4   2023-06-05 16:51:27.573    REL  location
2   2023-06-05 16:51:27.567    ACQ  location
5   2023-06-05 16:51:27.587    REL  location
6   2023-06-05 16:51:28.559    ACQ  location
9   2023-06-05 16:51:28.566    REL  location
7   2023-06-05 16:51:28.561    ACQ  location
10  2023-06-05 16:51:28.569    REL  location
8   2023-06-05 16:51:28.563    ACQ  location
11  2023-06-05 16:51:28.575    REL  location

Answer 2

Score: 1

Try this:

df.sort_values('ts').assign(sortkey=df.groupby('action').cumcount()).sort_values(['sortkey','action'])

Output:

                         ts action      name  sortkey
0   2023-06-05 16:51:27.561    ACQ  location        0
3   2023-06-05 16:51:27.571    REL  location        0
1   2023-06-05 16:51:27.564    ACQ  location        1
4   2023-06-05 16:51:27.573    REL  location        1
2   2023-06-05 16:51:27.567    ACQ  location        2
5   2023-06-05 16:51:27.587    REL  location        2
6   2023-06-05 16:51:28.559    ACQ  location        3
9   2023-06-05 16:51:28.566    REL  location        3
7   2023-06-05 16:51:28.561    ACQ  location        4
10  2023-06-05 16:51:28.569    REL  location        4
8   2023-06-05 16:51:28.563    ACQ  location        5
11  2023-06-05 16:51:28.575    REL  location        5
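To make the one-liner less opaque, here is a minimal pure-Python sketch of what its sort key appears to do: cumcount() numbers the rows within each action, and sorting by that number and then by action interleaves the n-th ACQ with the n-th REL. The names actions, seen, sortkey and order are illustrative only and do not come from the answer. Note that, as the output above shows, this pairs rows 0/3, 1/4, 2/5 rather than the nested 0/5, 1/4, 2/3 layout requested in the question.

```python
from collections import defaultdict

# One group from the question, reduced to the action column.
actions = ["ACQ", "ACQ", "ACQ", "REL", "REL", "REL"]

# Running count per action, mirroring groupby('action').cumcount().
seen = defaultdict(int)
sortkey = []
for action in actions:
    sortkey.append(seen[action])
    seen[action] += 1

# Sort positions by (sortkey, action); "ACQ" sorts before "REL".
order = sorted(range(len(actions)), key=lambda i: (sortkey[i], actions[i]))

print(sortkey)  # [0, 1, 2, 0, 1, 2]
print(order)    # [0, 3, 1, 4, 2, 5]
```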

Answer 3

Score: 1

Using a list as a stack, you can compute offsets for the REL that indicate how far back the corresponding ACQ is located (i.e. the ACQ's index). Then sort the indices according to the adjusted positions (ACQ remaining at their original position and REL offset back to the ACQ's position) to get the indexes in the new order:

acq     = list()                                # stack of ACQ positions
iREL    = enumerate(x=="REL" for _,x,_ in data) # identify REL indices

offsets = (acq.pop() if rel else acq.append(i) or i for i,rel in iREL)
order   = (i for i,_ in sorted(enumerate(offsets),key=lambda x:x[::-1]))

data    = [data[i] for i in order]   # reorder data : df.reindex(order)

print(*data,sep="\n")

['2023-06-05 16:51:27.561', 'ACQ', 'location']
['2023-06-05 16:51:27.587', 'REL', 'location']
['2023-06-05 16:51:27.564', 'ACQ', 'location']
['2023-06-05 16:51:27.573', 'REL', 'location']
['2023-06-05 16:51:27.567', 'ACQ', 'location']
['2023-06-05 16:51:27.571', 'REL', 'location']
['2023-06-05 16:51:28.559', 'ACQ', 'location']
['2023-06-05 16:51:28.575', 'REL', 'location']
['2023-06-05 16:51:28.561', 'ACQ', 'location']
['2023-06-05 16:51:28.569', 'REL', 'location']
['2023-06-05 16:51:28.563', 'ACQ', 'location']
['2023-06-05 16:51:28.566', 'REL', 'location']

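If the data is already in a dataframe, the same ordering can be applied to it: collect order into a list (it is a generator in the code above, so it can only be consumed once) and pass that list to the reindex() method, for example df.reindex(list(order)), assuming the dataframe has the default integer index.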
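The stack idea can also be written as a small function with explicit checks for unbalanced input. This is only a sketch of the same technique, not the answerer's code: the name nested_pair_order and the error messages are assumptions introduced here for illustration.

```python
def nested_pair_order(actions):
    """Return positions reordered so each ACQ is followed by its matching REL."""
    stack = []  # positions of ACQ entries still waiting for a REL
    keys = []   # for every row: position of the ACQ that opens its pair
    for i, action in enumerate(actions):
        if action == "ACQ":
            stack.append(i)
            keys.append(i)
        elif action == "REL":
            if not stack:
                raise ValueError(f"REL at position {i} has no matching ACQ")
            keys.append(stack.pop())
        else:
            raise ValueError(f"unexpected action {action!r} at position {i}")
    if stack:
        raise ValueError(f"ACQ at positions {stack} never released")
    # Sort by (pair key, original position) so the ACQ precedes its REL.
    return sorted(range(len(actions)), key=lambda i: (keys[i], i))


# Action column of the question's data: two groups of three nested pairs.
actions = ["ACQ"] * 3 + ["REL"] * 3 + ["ACQ"] * 3 + ["REL"] * 3
order = nested_pair_order(actions)
print(order)  # [0, 5, 1, 4, 2, 3, 6, 11, 7, 10, 8, 9]
```

With the question's dataframe, something like df.iloc[nested_pair_order(df['action'])] should then give the requested layout, since the returned values are row positions rather than index labels.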

Posted by huangapple on 2023-06-06 06:35:45
Please keep this link when reposting: https://go.coder-hub.com/76410402.html