Sort pandas dataframe columns on second row order

Question

I need to sort a dataframe based on the order of the second row. For example:

import pandas as pd

data = {'1a': ['C', 3, 1], '2b': ['B', 2, 3], '3c': ['A', 5, 2]}
df = pd.DataFrame(data)
df

Output:

  1a 2b 3c
0  C  B  A
1  3  2  5
2  1  3  2

Desired output:

  3c 2b 1a
0  A  B  C
1  5  2  3
2  2  3  1

So the columns have been ordered based on the values in the row at index zero: A, B, C.

I have tried many sorting options without success.

A quick way to accomplish this would be helpful, but granular control to both order the elements and move a specific column to the first position would be even better. For example, move "C" to the first column.

Something like: make a list, sort it, move one element to the front, then reorder the dataframe on that list.

mylist = ['B', 'A', 'C']
mylist.sort()
mylist.insert(0, mylist.pop(mylist.index('C')))

Then sorting the dataframe on ['C', 'A', 'B'] would output:

  1a 3c 2b
0  C  A  B
1  3  5  2
2  1  2  3

Answer 1

Score: 2

You can try with:

df = df[df.iloc[0].sort_values().index]

Returning:

  3c 2b 1a
0  A  B  C
1  5  2  3
2  2  3  1

In this case you take the first row, sort its values, and use the index of the sorted result (the column labels) to reorder the columns. This gives you a lot of flexibility: you can even sort by multiple rows or columns in a similar way.
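
To connect this with the custom ordering asked for in the question, here is a minimal sketch. It assumes the values in the first row are unique, and the names `first_row`, `order`, and `label_for` are only illustrative:

```python
import pandas as pd

data = {'1a': ['C', 3, 1], '2b': ['B', 2, 3], '3c': ['A', 5, 2]}
df = pd.DataFrame(data)

# The first row as a Series: index = column labels, values = 'C', 'B', 'A'
first_row = df.iloc[0]

# Plain sort: reorder the columns by the sorted first-row values
sorted_df = df[first_row.sort_values().index]
print(sorted_df.columns.tolist())  # ['3c', '2b', '1a']

# Custom order: sort the values, then move 'C' to the front
order = sorted(first_row.tolist())
order.insert(0, order.pop(order.index('C')))

# Map each first-row value back to its column label (assumes unique values)
label_for = {value: label for label, value in first_row.items()}
custom_df = df[[label_for[value] for value in order]]
print(custom_df.columns.tolist())  # ['1a', '3c', '2b']
```

If the first row can contain repeated values, the value-to-label mapping is no longer one-to-one, and working with column positions is the safer route.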

Answer 2

Score: 1

If you want to move a specific column to the first position, you can modify the code as in this example:

import pandas as pd

data = {"1a": ["C", 3, 1],
        "2b": ["B", 2, 3],
        "3c": ["A", 5, 2]}
df = pd.DataFrame(data)
print(df)
#   1a 2b 3c
# 0  C  B  A
# 1  3  2  5
# 2  1  3  2

# Get the second row and convert it to a list
second_row = df.iloc[1, :].tolist()
print(second_row)
# [3, 2, 5]

# Define the column you want to move to the first position
target_column = "1a"

# Sort the column positions by the values in the second row
sorted_columns = sorted(range(len(second_row)), key=lambda k: second_row[k])
print(sorted_columns)
# [1, 0, 2]

# Move the target column to the first position
sorted_columns.remove(df.columns.get_loc(target_column))
sorted_columns.insert(0, df.columns.get_loc(target_column))

# Reorder the columns of the DataFrame based on the sorted list
df = df.iloc[:, sorted_columns]

print(df)
#   1a 2b 3c
# 0  C  B  A
# 1  3  2  5
# 2  1  3  2
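
The same steps can be wrapped in a small helper that works with column labels instead of positions. This is only a sketch: the function name `reorder_columns` and its parameters `row` and `first` are hypothetical, not part of pandas.

```python
import pandas as pd

def reorder_columns(df, row=0, first=None):
    """Return df with columns sorted by the values in positional row `row`.

    If `first` is given, that column label is moved to the front afterwards.
    """
    # Sorting the row yields a Series whose index holds the column labels
    ordered = df.iloc[row].sort_values(kind="mergesort").index.tolist()
    if first is not None:
        ordered.remove(first)
        ordered.insert(0, first)
    return df[ordered]

data = {"1a": ["C", 3, 1], "2b": ["B", 2, 3], "3c": ["A", 5, 2]}
df = pd.DataFrame(data)

by_second_row = reorder_columns(df, row=1)
print(by_second_row.columns.tolist())  # ['2b', '1a', '3c']

c_first = reorder_columns(df, row=0, first="1a")
print(c_first.columns.tolist())  # ['1a', '3c', '2b']
```

The stable `mergesort` is used so that columns with equal values keep their original relative order.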

Answer 3

Score: 0

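With the help of pyjedy and Stingher, I was able to resolve this issue. One of the problems was due to my input. The input consisted of lists instead of dictionaries, so I needed to transform it. As a result, the DataFrame had integer indexes both for the rows and across the top for the columns, so selecting a column by a value in the list meant looking up its position first.

import pandas as pd

def search_list_for_pattern(lst, pattern):
    # Return the position of the first item that contains the pattern
    for idx, item in enumerate(lst):
        if pattern in item:
            return idx
    raise ValueError(f"pattern {pattern!r} not found")

data = [['1a', 'B', 2, 3], ['2b', 'C', 3, 1], ['3c', 'A', 5, 2]]
df = pd.DataFrame(data).transpose()
print(df)
#     0   1   2
# 0  1a  2b  3c
# 1   B   C   A
# 2   2   3   5
# 3   3   1   2

# Get the second row and convert it to a list
second_row = df.iloc[1, :].tolist()
print(second_row)
# ['B', 'C', 'A']

# Find the index of the column you want to move to the first position
target_column = search_list_for_pattern(second_row, "C")
print(target_column)
# 1

# Sort the column positions by the values in the second row
sorted_columns = sorted(range(len(second_row)), key=lambda k: second_row[k])
print(sorted_columns)
# [2, 0, 1]

# Move the target column to the first position
sorted_columns.remove(df.columns.get_loc(target_column))
sorted_columns.insert(0, df.columns.get_loc(target_column))

# Reorder the columns of the DataFrame based on the sorted list
df = df.iloc[:, sorted_columns]
print(df)
#     1   2   0
# 0  2b  3c  1a
# 1   C   A   B
# 2   3   5   2
# 3   1   2   3

df.to_excel('ordered.xlsx', sheet_name='Sheet1', index=False, header=False)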

[![enter image description here][1]][1]


[1]: https://i.stack.imgur.com/kcgzy.png
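
As an alternative to transposing, the list input can be turned into a DataFrame with real column labels, which lets the label-based approach from Answer 1 apply directly. This is a sketch that assumes the first element of each inner list is meant to be the column label and that the first-row values are unique:

```python
import pandas as pd

data = [['1a', 'B', 2, 3], ['2b', 'C', 3, 1], ['3c', 'A', 5, 2]]

# Use the first element of each inner list as the column label
df = pd.DataFrame({row[0]: row[1:] for row in data})

# Column labels ordered by the first row of values (A, B, C)
first_row = df.iloc[0]
ordered = first_row.sort_values().index.tolist()

# Find the label of the column whose first-row value is 'C' and move it first
target = first_row[first_row == 'C'].index[0]
ordered.remove(target)
ordered.insert(0, target)

result = df[ordered]
print(result.columns.tolist())  # ['2b', '3c', '1a']
```

Writing `result` to Excel with `index=False` and the default header would then put the labels in the top row, matching the layout produced above.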



