根据多个条件更改Pandas数据框列中的值

huangapple go评论83阅读模式
英文:

Change value in a Pandas dataframe column based on several conditions

问题

我理解你的问题,你想要处理一个长的Pandas数据框,根据一些条件来更新'number'列的值。你的想法是检查每个文件的'number'是否与前一个文件相同,如果是,并且文件名不同且评论不以'Replacing:'开头,那么'number'和随后的所有'number'都应增加一。以下是可能的代码实现:

import pandas as pd

# 你的数据框
df_test = pd.DataFrame(data=None, columns=['file', 'comment', 'number'])
df_test.file = ['file_1', 'file_1', 'file_1_v2', 'file_2', 'file_2', 'file_3', 'file_4', 'file_4_v2', 'file_5']
df_test.comment = ['none: 2', 'old', 'Replacing: file_1', 'v1', 'v2', 'none', 'old', 'Replacing: file_4', 'none']
df_test.number = [12, 12, 12, 13, 13, 13, 14, 14, 15]

# 初始化一个变量来跟踪当前的'number'
current_number = None

# 遍历数据框的行
for index, row in df_test.iterrows():
    # 如果当前'number'为None或与前一个不同,更新current_number
    if current_number is None or current_number != row['number']:
        current_number = row['number']
    else:
        # 如果'number'相同,检查文件名和评论以进行适当的增加
        if row['file'] != df_test.at[index - 1, 'file'] and not row['comment'].startswith('Replacing:'):
            current_number += 1
    # 更新'number'列的值
    df_test.at[index, 'number'] = current_number

# 打印结果
print(df_test['number'].tolist())

这段代码会按照你的描述来更新'number'列的值,得到期望的结果:[12, 12, 12, 13, 13, 14, 15, 15, 16]。希望这对你有帮助!

英文:

What I have is a long Pandas dataframe in Python that contains three columns named 'file', 'comment', and 'number'. A simple example is:

import pandas as pd
df_test = pd.DataFrame(data = None, columns = ['file','comment','number'])
df_test.file = ['file_1', 'file_1', 'file_1_v2', 'file_2', 'file_2', 'file_3', 'file_4', 'file_4_v2', 'file_5']
df_test.comment = ['none: 2', 'old', 'Replacing: file_1', 'v1', 'v2', 'none', 'old', 'Replacing: file_4', 'none']
df_test.number = [12, 12, 12, 13, 13, 13, 14, 14, 15]

Each file should have a unique number associated with it, but it currently has numerous errors where many unique files have been given the same number. There are also files which has the same name but are different versions that should have the number and files which have different names but the comment will show that they are supposed to have the same number as well.

In the example, files that have the same name or has a comment that starts with the string 'Replacing: ' should not have the number changed, but if the file has a different name but the same number as a previous file, I want the number of that file and every subsequent number to increase by one, meaning the end result here should be:

[12, 12, 12, 13, 13, 14, 15, 15, 16]

My idea was to check if each file has the same number as the previous in the list, and if it does, and the name of the file is not the same, and the comment does not start with the string 'Replacing: ', the value of the number and all following numbers will increase by one, but I am not sure how to write this code. Any help is really appreciated, thanks!

答案1

得分: 2

你可以 extract 文件名,并使用 fillna,然后 factorize,最后添加 min

df_test['number'] = pd.factorize(df_test['comment']
                                 .str.extract('Replacing: (.*)', expand=False)
                                 .fillna(df_test['file'])
                                )[0]+df_test['number'].min()

输出结果如下:

        file            comment  number
0     file_1            none: 2      12
1     file_1                old      12
2  file_1_v2  Replacing: file_1      12
3     file_2                 v1      13
4     file_2                 v2      13
5     file_3               none      14
6     file_4                old      15
7  file_4_v2  Replacing: file_4      15
8     file_5               none      16
英文:

You can extract the file name, and fillna, then factorize and add the min:

df_test['number'] = pd.factorize(df_test['comment']
                                 .str.extract('Replacing: (.*)', expand=False)
                                 .fillna(df_test['file'])
                                )[0]+df_test['number'].min()

Output:

        file            comment  number
0     file_1            none: 2      12
1     file_1                old      12
2  file_1_v2  Replacing: file_1      12
3     file_2                 v1      13
4     file_2                 v2      13
5     file_3               none      14
6     file_4                old      15
7  file_4_v2  Replacing: file_4      15
8     file_5               none      16

答案2

得分: 1

这是您提供的代码的翻译部分:

import pandas as pd

# 您的数据
df_test = pd.DataFrame(data=None, columns=['file', 'comment', 'number'])
df_test.file = ['file_1', 'file_1', 'file_1_v2', 'file_2', 'file_2', 'file_3', 'file_4', 'file_4_v2', 'file_5']
df_test.comment = ['none: 2', 'old', 'Replacing: file_1', 'v1', 'v2', 'none', 'old', 'Replacing: file_4', 'none']
df_test.number = [12, 12, 12, 13, 13, 13, 14, 14, 15]

# 对DataFrame进行排序
df_test.sort_values(by='number', inplace=True)

# 初始化位移计数器
shift = 0

def compare_files(row, prev_row) -> bool:
    if row['comment'].startswith('Replacing:'):
        return True
    return row['file'] == prev_row['file']

for i in range(1, len(df_test)):
    # 应用位移
    df_test.loc[i, 'number'] += shift

    # 检查文件是否相同
    is_same_file = compare_files(df_test.loc[i], df_test.loc[i - 1])

    # 检查数字是否相同
    is_same_number = df_test.loc[i, 'number'] == df_test.loc[i - 1, 'number']

    # 如果文件相同且数字相同,增加数字
    if not is_same_file and is_same_number:
        df_test.loc[i, 'number'] += 1
        shift += 1

print(df_test)

结果:

        file            comment  number
0     file_1            none: 2      12
1     file_1                old      12
2  file_1_v2  Replacing: file_1      12
3     file_2                 v1      13
4     file_2                 v2      13
5     file_3               none      14
6     file_4                old      15
7  file_4_v2  Replacing: file_4      15
8     file_5               none      16
英文:
import pandas as pd

# Your data
df_test = pd.DataFrame(data=None, columns=['file', 'comment', 'number'])
df_test.file = ['file_1', 'file_1', 'file_1_v2', 'file_2', 'file_2', 'file_3', 'file_4', 'file_4_v2', 'file_5']
df_test.comment = ['none: 2', 'old', 'Replacing: file_1', 'v1', 'v2', 'none', 'old', 'Replacing: file_4', 'none']
df_test.number = [12, 12, 12, 13, 13, 13, 14, 14, 15]

# Sort the DataFrame
df_test.sort_values(by='number', inplace=True)

# Initialize the shift counter
shift = 0


def compare_files(row, prev_row) -> bool:
    if row['comment'].startswith('Replacing:'):
        return True
    return row['file'] == prev_row['file']


for i in range(1, len(df_test)):
    # Apply the shift
    df_test.loc[i, 'number'] += shift

    # Check if the file is the same
    is_same_file = compare_files(df_test.loc[i], df_test.loc[i - 1])

    # Check if the number is the same
    is_same_number = df_test.loc[i, 'number'] == df_test.loc[i - 1, 'number']

    # If the file is the same and the number is the same, increment the number
    if not is_same_file and is_same_number:
        df_test.loc[i, 'number'] += 1
        shift += 1

print(df_test)

Result:

        file            comment  number
0     file_1            none: 2      12
1     file_1                old      12
2  file_1_v2  Replacing: file_1      12
3     file_2                 v1      13
4     file_2                 v2      13
5     file_3               none      14
6     file_4                old      15
7  file_4_v2  Replacing: file_4      15
8     file_5               none      16

huangapple
  • 本文由 发表于 2023年5月11日 18:22:27
  • 转载请务必保留本文链接:https://go.coder-hub.com/76226579.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定