Re-arrange data by pairs recursively

Question

I have a DataFrame that contains nested ACQ/REL pairs, as shown below:

    import pandas as pd

    data = [
        ['2023-06-05 16:51:27.561', 'ACQ', 'location'],
        ['2023-06-05 16:51:27.564', 'ACQ', 'location'],
        ['2023-06-05 16:51:27.567', 'ACQ', 'location'],
        ['2023-06-05 16:51:27.571', 'REL', 'location'],
        ['2023-06-05 16:51:27.573', 'REL', 'location'],
        ['2023-06-05 16:51:27.587', 'REL', 'location'],
        ['2023-06-05 16:51:28.559', 'ACQ', 'location'],
        ['2023-06-05 16:51:28.561', 'ACQ', 'location'],
        ['2023-06-05 16:51:28.563', 'ACQ', 'location'],
        ['2023-06-05 16:51:28.566', 'REL', 'location'],
        ['2023-06-05 16:51:28.569', 'REL', 'location'],
        ['2023-06-05 16:51:28.575', 'REL', 'location']
    ]
    df = pd.DataFrame(data, columns=['ts', 'action', 'name'])

I would like to reorganize it by ACQ/REL pairs, treating each outermost ACQ/REL pair as a group, so that the output DataFrame looks like this:

    0   2023-06-05 16:51:27.561    ACQ  location
    5   2023-06-05 16:51:27.587    REL  location
    1   2023-06-05 16:51:27.564    ACQ  location
    4   2023-06-05 16:51:27.573    REL  location
    2   2023-06-05 16:51:27.567    ACQ  location
    3   2023-06-05 16:51:27.571    REL  location
    6   2023-06-05 16:51:28.559    ACQ  location
    11  2023-06-05 16:51:28.575    REL  location
    7   2023-06-05 16:51:28.561    ACQ  location
    10  2023-06-05 16:51:28.569    REL  location
    8   2023-06-05 16:51:28.563    ACQ  location
    9   2023-06-05 16:51:28.566    REL  location

The current example has 3 pairs per group, but the number is not always the same. What is the proper way to get this result?

Answer 1

Score: 1

If it is guaranteed that there are always equal numbers of ACQ and REL rows, and that each kind is already in the correct order (i.e., the n-th ACQ should always be paired with the n-th REL), then the solution is to first split the original rows into two lists by their second element (the action), then loop and output one row from each list in turn.

Code:

    import pandas as pd

    data = [
        ['2023-06-05 16:51:27.561', 'ACQ', 'location'],
        ['2023-06-05 16:51:27.564', 'ACQ', 'location'],
        ['2023-06-05 16:51:27.567', 'ACQ', 'location'],
        ['2023-06-05 16:51:27.571', 'REL', 'location'],
        ['2023-06-05 16:51:27.573', 'REL', 'location'],
        ['2023-06-05 16:51:27.587', 'REL', 'location'],
        ['2023-06-05 16:51:28.559', 'ACQ', 'location'],
        ['2023-06-05 16:51:28.561', 'ACQ', 'location'],
        ['2023-06-05 16:51:28.563', 'ACQ', 'location'],
        ['2023-06-05 16:51:28.566', 'REL', 'location'],
        ['2023-06-05 16:51:28.569', 'REL', 'location'],
        ['2023-06-05 16:51:28.575', 'REL', 'location']
    ]
    df = pd.DataFrame(data, columns=['ts', 'action', 'name'])
    # print(df)

    # Split the rows into two lists by action.
    data_acq = []
    data_rel = []
    for index, row in df.iterrows():
        if row['action'] == 'ACQ':
            data_acq.append(row)
        elif row['action'] == 'REL':
            data_rel.append(row)
    assert len(data_acq) == len(data_rel)

    # Interleave the lists: the n-th ACQ is followed by the n-th REL.
    df_new = pd.DataFrame([], columns=['ts', 'action', 'name'])
    for j in range(len(data_acq)):
        df_new = pd.concat([
            df_new,
            pd.DataFrame([
                data_acq[j], data_rel[j]
            ])
        ])
    print(df_new)

Output:

                             ts action      name
    0   2023-06-05 16:51:27.561    ACQ  location
    3   2023-06-05 16:51:27.571    REL  location
    1   2023-06-05 16:51:27.564    ACQ  location
    4   2023-06-05 16:51:27.573    REL  location
    2   2023-06-05 16:51:27.567    ACQ  location
    5   2023-06-05 16:51:27.587    REL  location
    6   2023-06-05 16:51:28.559    ACQ  location
    9   2023-06-05 16:51:28.566    REL  location
    7   2023-06-05 16:51:28.561    ACQ  location
    10  2023-06-05 16:51:28.569    REL  location
    8   2023-06-05 16:51:28.563    ACQ  location
    11  2023-06-05 16:51:28.575    REL  location

Answer 2

Score: 1

Try this:

    (
        df.sort_values('ts')
          .assign(sortkey=df.groupby('action').cumcount())
          .sort_values(['sortkey', 'action'])
    )

Output:

                             ts action      name  sortkey
    0   2023-06-05 16:51:27.561    ACQ  location        0
    3   2023-06-05 16:51:27.571    REL  location        0
    1   2023-06-05 16:51:27.564    ACQ  location        1
    4   2023-06-05 16:51:27.573    REL  location        1
    2   2023-06-05 16:51:27.567    ACQ  location        2
    5   2023-06-05 16:51:27.587    REL  location        2
    6   2023-06-05 16:51:28.559    ACQ  location        3
    9   2023-06-05 16:51:28.566    REL  location        3
    7   2023-06-05 16:51:28.561    ACQ  location        4
    10  2023-06-05 16:51:28.569    REL  location        4
    8   2023-06-05 16:51:28.563    ACQ  location        5
    11  2023-06-05 16:51:28.575    REL  location        5

Answer 3

Score: 1

Using a list as a stack, you can compute, for each REL, the index of its corresponding ACQ: every ACQ pushes its index onto the stack, and every REL pops the most recent one. Then sort the indices by these adjusted positions (each ACQ keeps its original position, and each REL is moved back to its ACQ's position, with ties broken by original index) to get the indices in the new order:

    acq = list()                                  # stack of ACQ positions
    iREL = enumerate(x=="REL" for _,x,_ in data)  # identify REL indices
    # ACQ: push its index and keep it; REL: pop the index of the matching ACQ
    offsets = (acq.pop() if rel else acq.append(i) or i for i,rel in iREL)
    # sort by (offset, original index), so each REL lands right after its ACQ
    order = (i for i,_ in sorted(enumerate(offsets),key=lambda x:x[::-1]))
    data = [data[i] for i in order]               # reorder data : df.reindex(order)
    print(*data,sep="\n")

Output:

    ['2023-06-05 16:51:27.561', 'ACQ', 'location']
    ['2023-06-05 16:51:27.587', 'REL', 'location']
    ['2023-06-05 16:51:27.564', 'ACQ', 'location']
    ['2023-06-05 16:51:27.573', 'REL', 'location']
    ['2023-06-05 16:51:27.567', 'ACQ', 'location']
    ['2023-06-05 16:51:27.571', 'REL', 'location']
    ['2023-06-05 16:51:28.559', 'ACQ', 'location']
    ['2023-06-05 16:51:28.575', 'REL', 'location']
    ['2023-06-05 16:51:28.561', 'ACQ', 'location']
    ['2023-06-05 16:51:28.569', 'REL', 'location']
    ['2023-06-05 16:51:28.563', 'ACQ', 'location']
    ['2023-06-05 16:51:28.566', 'REL', 'location']

If the data is already in a DataFrame, the order list can be used to reorder the DataFrame with the reindex() method.
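As a rough sketch of that DataFrame variant, the same stack idea can be written as an explicit loop over the action column. The helper name nested_pair_order is made up for this illustration, and the sketch assumes that every REL has an earlier unmatched ACQ (otherwise stack.pop() raises IndexError) and that the DataFrame still has its default 0..n-1 index.

```python
import pandas as pd

data = [
    ['2023-06-05 16:51:27.561', 'ACQ', 'location'],
    ['2023-06-05 16:51:27.564', 'ACQ', 'location'],
    ['2023-06-05 16:51:27.567', 'ACQ', 'location'],
    ['2023-06-05 16:51:27.571', 'REL', 'location'],
    ['2023-06-05 16:51:27.573', 'REL', 'location'],
    ['2023-06-05 16:51:27.587', 'REL', 'location'],
    ['2023-06-05 16:51:28.559', 'ACQ', 'location'],
    ['2023-06-05 16:51:28.561', 'ACQ', 'location'],
    ['2023-06-05 16:51:28.563', 'ACQ', 'location'],
    ['2023-06-05 16:51:28.566', 'REL', 'location'],
    ['2023-06-05 16:51:28.569', 'REL', 'location'],
    ['2023-06-05 16:51:28.575', 'REL', 'location'],
]
df = pd.DataFrame(data, columns=['ts', 'action', 'name'])


def nested_pair_order(actions):
    """Return row positions so that each ACQ is followed by its matching REL.

    Hypothetical helper for illustration; assumes balanced, properly
    nested ACQ/REL rows.
    """
    stack = []  # positions of ACQ rows still waiting for their REL
    keys = []   # one (position of the pair's ACQ, own position) per row
    for pos, action in enumerate(actions):
        if action == 'REL':
            # A REL closes the most recently opened ACQ.
            keys.append((stack.pop(), pos))
        else:
            stack.append(pos)
            keys.append((pos, pos))
    # Sorting by (ACQ position, own position) puts each REL right after its ACQ.
    return [pos for _, pos in sorted(keys)]


order = nested_pair_order(df['action'])
df_new = df.iloc[order]  # positional reordering
print(df_new)
```

Here df.iloc[order] reorders by position; with the default index, df.reindex(order) should give the same rows. The nesting depth does not need to be known in advance, since the stack simply grows and shrinks with the data.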

huangapple
  • Published on June 6, 2023, 06:35:45
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76410402.html