June 6, 2023 06:35:45

Re-arrange data by pairs recursively

Question

I have a dataframe that contains recursively nested ACQ/REL pairs, as shown below:

import pandas as pd
data = [
    ['2023-06-05 16:51:27.561','ACQ','location'],
    ['2023-06-05 16:51:27.564','ACQ','location'],
    ['2023-06-05 16:51:27.567','ACQ','location'],
    ['2023-06-05 16:51:27.571','REL','location'],
    ['2023-06-05 16:51:27.573','REL','location'],
    ['2023-06-05 16:51:27.587','REL','location'],
    ['2023-06-05 16:51:28.559','ACQ','location'],
    ['2023-06-05 16:51:28.561','ACQ','location'],
    ['2023-06-05 16:51:28.563','ACQ','location'],
    ['2023-06-05 16:51:28.566','REL','location'],
    ['2023-06-05 16:51:28.569','REL','location'],
    ['2023-06-05 16:51:28.575','REL','location']
]
df = pd.DataFrame(data,columns=['ts','action','name'])

I would like to reorganize it by ACQ/REL pairs, treating each outermost ACQ/REL pair as a group, so that the output dataframe looks like this:

0   2023-06-05 16:51:27.561    ACQ  location
5   2023-06-05 16:51:27.587    REL  location
1   2023-06-05 16:51:27.564    ACQ  location
4   2023-06-05 16:51:27.573    REL  location
2   2023-06-05 16:51:27.567    ACQ  location
3   2023-06-05 16:51:27.571    REL  location
6   2023-06-05 16:51:28.559    ACQ  location
11  2023-06-05 16:51:28.575    REL  location
7   2023-06-05 16:51:28.561    ACQ  location
10  2023-06-05 16:51:28.569    REL  location
8   2023-06-05 16:51:28.563    ACQ  location
9   2023-06-05 16:51:28.566    REL  location

The current example has 3 pairs per group, but that number is not always the same. What is the proper way to get this result?

Answer 1

Score: 1

If it is always guaranteed that there are the same number of ACQ and REL, and that they are respectively in correct order (i.e., the n-th ACQ should always be paired with the n-th REL), then the solution is to split the original list into two first, by its second element, then loop to output one from each list.

Code:

import pandas as pd
data = [
    ['2023-06-05 16:51:27.561', 'ACQ', 'location'],
    ['2023-06-05 16:51:27.564', 'ACQ', 'location'],
    ['2023-06-05 16:51:27.567', 'ACQ', 'location'],
    ['2023-06-05 16:51:27.571', 'REL', 'location'],
    ['2023-06-05 16:51:27.573', 'REL', 'location'],
    ['2023-06-05 16:51:27.587', 'REL', 'location'],
    ['2023-06-05 16:51:28.559', 'ACQ', 'location'],
    ['2023-06-05 16:51:28.561', 'ACQ', 'location'],
    ['2023-06-05 16:51:28.563', 'ACQ', 'location'],
    ['2023-06-05 16:51:28.566', 'REL', 'location'],
    ['2023-06-05 16:51:28.569', 'REL', 'location'],
    ['2023-06-05 16:51:28.575', 'REL', 'location']
]
df = pd.DataFrame(data, columns=['ts', 'action', 'name'])
# print(df)
data_acq = []
data_rel = []
for index, row in df.iterrows():
    if row['action'] == 'ACQ':
        data_acq.append(row)
    elif row['action'] == 'REL':
        data_rel.append(row)
assert len(data_acq) == len(data_rel)
df_new = pd.DataFrame([], columns=['ts', 'action', 'name'])
for j in range(len(data_acq)):
    df_new = pd.concat([
        df_new,
        pd.DataFrame([
            data_acq[j], data_rel[j]
        ])
    ])
print(df_new)

Output:

                         ts action      name
0   2023-06-05 16:51:27.561    ACQ  location
3   2023-06-05 16:51:27.571    REL  location
1   2023-06-05 16:51:27.564    ACQ  location
4   2023-06-05 16:51:27.573    REL  location
2   2023-06-05 16:51:27.567    ACQ  location
5   2023-06-05 16:51:27.587    REL  location
6   2023-06-05 16:51:28.559    ACQ  location
9   2023-06-05 16:51:28.566    REL  location
7   2023-06-05 16:51:28.561    ACQ  location
10  2023-06-05 16:51:28.569    REL  location
8   2023-06-05 16:51:28.563    ACQ  location
11  2023-06-05 16:51:28.575    REL  location
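
The same split-and-interleave idea can also be sketched on plain Python lists, which avoids calling pd.concat inside a loop. This is only an illustrative sketch: interleave_pairs is a hypothetical helper name, and it assumes each row is a [ts, action, name] list as in the question. Like the code above, it pairs the n-th ACQ with the n-th REL, so it gives the order printed in this answer rather than the nested order requested in the question.

```python
def interleave_pairs(rows):
    """Pair the n-th ACQ row with the n-th REL row (hypothetical helper)."""
    # Split the rows by their second element, keeping the original order.
    acq = [row for row in rows if row[1] == "ACQ"]
    rel = [row for row in rows if row[1] == "REL"]
    if len(acq) != len(rel):
        raise ValueError("ACQ and REL counts differ")
    # Emit one row from each list in turn: ACQ, REL, ACQ, REL, ...
    return [row for pair in zip(acq, rel) for row in pair]


rows = [
    ["2023-06-05 16:51:27.561", "ACQ", "location"],
    ["2023-06-05 16:51:27.564", "ACQ", "location"],
    ["2023-06-05 16:51:27.567", "ACQ", "location"],
    ["2023-06-05 16:51:27.571", "REL", "location"],
    ["2023-06-05 16:51:27.573", "REL", "location"],
    ["2023-06-05 16:51:27.587", "REL", "location"],
]
paired = interleave_pairs(rows)
print(*paired, sep="\n")
```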

Answer 2

Score: 1

Try this:

df.sort_values('ts').assign(sortkey=df.groupby('action').cumcount()).sort_values(['sortkey','action'])

Output:

                         ts action      name  sortkey
0   2023-06-05 16:51:27.561    ACQ  location        0
3   2023-06-05 16:51:27.571    REL  location        0
1   2023-06-05 16:51:27.564    ACQ  location        1
4   2023-06-05 16:51:27.573    REL  location        1
2   2023-06-05 16:51:27.567    ACQ  location        2
5   2023-06-05 16:51:27.587    REL  location        2
6   2023-06-05 16:51:28.559    ACQ  location        3
9   2023-06-05 16:51:28.566    REL  location        3
7   2023-06-05 16:51:28.561    ACQ  location        4
10  2023-06-05 16:51:28.569    REL  location        4
8   2023-06-05 16:51:28.563    ACQ  location        5
11  2023-06-05 16:51:28.575    REL  location        5
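
To make the one-liner easier to follow, here is a rough pure-Python sketch of what the sort key does: each row gets a running count of how many rows with the same action came before it, and the rows are then sorted by that count and by the action label. The helper name cumcount_order is hypothetical, and the sketch assumes the actions are already in timestamp order.

```python
from collections import defaultdict


def cumcount_order(actions):
    """Return row positions sorted by (running count per action, action)."""
    seen = defaultdict(int)  # how many times each action has appeared so far
    keyed = []
    for pos, action in enumerate(actions):
        keyed.append((seen[action], action, pos))
        seen[action] += 1
    # "ACQ" sorts before "REL", so each ACQ precedes the REL with the same count.
    return [pos for _, _, pos in sorted(keyed)]


actions = ["ACQ"] * 3 + ["REL"] * 3 + ["ACQ"] * 3 + ["REL"] * 3
order = cumcount_order(actions)
print(order)
```

The printed positions correspond to the index column of the output above: the n-th ACQ is paired with the n-th REL, not with its nested counterpart.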

Answer 3

Score: 1

Using a list as a stack, you can compute offsets for the REL that indicate how far back the corresponding ACQ is located (i.e. the ACQ's index). Then sort the indices according to the adjusted positions (ACQ remaining at their original position and REL offset back to the ACQ's position) to get the indexes in the new order:

acq     = list()                                # stack of ACQ positions
iREL    = enumerate(x=="REL" for _,x,_ in data) # identify REL indices
offsets = (acq.pop() if rel else acq.append(i) or i for i,rel in iREL)
order   = (i for i,_ in sorted(enumerate(offsets),key=lambda x:x[::-1]))
data    = [data[i] for i in order]   # reorder data : df.reindex(order)
print(*data,sep="\n")

Output:

['2023-06-05 16:51:27.561', 'ACQ', 'location']
['2023-06-05 16:51:27.587', 'REL', 'location']
['2023-06-05 16:51:27.564', 'ACQ', 'location']
['2023-06-05 16:51:27.573', 'REL', 'location']
['2023-06-05 16:51:27.567', 'ACQ', 'location']
['2023-06-05 16:51:27.571', 'REL', 'location']
['2023-06-05 16:51:28.559', 'ACQ', 'location']
['2023-06-05 16:51:28.575', 'REL', 'location']
['2023-06-05 16:51:28.561', 'ACQ', 'location']
['2023-06-05 16:51:28.569', 'REL', 'location']
['2023-06-05 16:51:28.563', 'ACQ', 'location']
['2023-06-05 16:51:28.566', 'REL', 'location']

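If the data is already in a dataframe, the same order could be used to reorder it with the reindex() method. Since order is a generator, it would need to be materialized with list(order) before anything else consumes it, and reindex() matches on index labels, so this relies on the dataframe having its default integer index.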
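
The stack technique can also be written as an explicit loop with basic validation, which may be easier to adapt. This is a minimal sketch, not the answer's exact code: nested_pair_order is a hypothetical helper name, and it assumes the only action labels are "ACQ" and "REL" and that they are properly nested.

```python
def nested_pair_order(actions):
    """Return row positions so that each ACQ is directly followed by its REL."""
    stack = []  # positions of ACQ rows still waiting for their REL
    keys = []   # (position of the owning ACQ, own position) for every row
    for pos, action in enumerate(actions):
        if action == "ACQ":
            stack.append(pos)
            keys.append((pos, pos))
        elif action == "REL":
            if not stack:
                raise ValueError(f"REL at position {pos} has no matching ACQ")
            # The most recent unmatched ACQ owns this REL.
            keys.append((stack.pop(), pos))
        else:
            raise ValueError(f"unexpected action {action!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched ACQ at positions {stack}")
    # Sorting by the owning ACQ's position moves each REL right after its ACQ.
    return [pos for _, pos in sorted(keys)]


actions = ["ACQ"] * 3 + ["REL"] * 3 + ["ACQ"] * 3 + ["REL"] * 3
order = nested_pair_order(actions)
print(order)
```

With a dataframe, one option would be positional selection such as df.iloc[nested_pair_order(df['action'])], which does not depend on the index labels.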
