Change value in a Pandas dataframe column based on several conditions

Question

What I have is a long Pandas dataframe in Python that contains three columns named 'file', 'comment', and 'number'. A simple example is:

    import pandas as pd

    df_test = pd.DataFrame(data=None, columns=['file', 'comment', 'number'])
    df_test.file = ['file_1', 'file_1', 'file_1_v2', 'file_2', 'file_2', 'file_3', 'file_4', 'file_4_v2', 'file_5']
    df_test.comment = ['none: 2', 'old', 'Replacing: file_1', 'v1', 'v2', 'none', 'old', 'Replacing: file_4', 'none']
    df_test.number = [12, 12, 12, 13, 13, 13, 14, 14, 15]

Each file should have a unique number associated with it, but the data currently contains many errors where distinct files have been given the same number. There are also files with the same name that are different versions and should share a number, and files with different names whose comment shows that they are supposed to share a number as well.

In the example, a file that has the same name as the previous file, or a comment that starts with the string 'Replacing: ', should keep its number. If a file has a different name but the same number as the previous file, I want the number of that file and every subsequent number to increase by one, so the end result here should be:

[12, 12, 12, 13, 13, 14, 15, 15, 16]

My idea was to check whether each file has the same number as the previous one in the list. If it does, the file name is different, and the comment does not start with the string 'Replacing: ', then that number and all following numbers should increase by one. I am not sure how to write this code. Any help is really appreciated, thanks!
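
The idea described above can be sketched without an explicit loop by comparing each row with the previous one and accumulating the corrections. This is a minimal sketch, assuming the rows are already in the intended order; the names same_number, new_name, replacing, needs_bump, and fixed are illustrative and do not come from the original post:

```python
import pandas as pd

df_test = pd.DataFrame({
    "file": ["file_1", "file_1", "file_1_v2", "file_2", "file_2",
             "file_3", "file_4", "file_4_v2", "file_5"],
    "comment": ["none: 2", "old", "Replacing: file_1", "v1", "v2",
                "none", "old", "Replacing: file_4", "none"],
    "number": [12, 12, 12, 13, 13, 13, 14, 14, 15],
})

# Compare every row with the row directly above it.
same_number = df_test["number"].eq(df_test["number"].shift())
new_name = df_test["file"].ne(df_test["file"].shift())
replacing = df_test["comment"].str.startswith("Replacing: ")

# A row needs a bump when it reuses the previous number for a new file
# and is not marked as a replacement.
needs_bump = same_number & new_name & ~replacing

# Each bump also applies to every later row, hence the cumulative sum.
fixed = df_test["number"] + needs_bump.cumsum()
print(fixed.tolist())  # [12, 12, 12, 13, 13, 14, 15, 15, 16]
```

Because every bump shifts the current row and all later rows by the same amount, comparing the original numbers gives the same decisions as comparing the already corrected ones.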

Answer 1

Score: 2

You can extract the replaced file name from the comment with str.extract, fall back to the 'file' column with fillna, then factorize the result and add the minimum number:

    df_test['number'] = pd.factorize(
        df_test['comment']
        .str.extract('Replacing: (.*)', expand=False)
        .fillna(df_test['file'])
    )[0] + df_test['number'].min()

Output:

            file            comment  number
    0     file_1            none: 2      12
    1     file_1                old      12
    2  file_1_v2  Replacing: file_1      12
    3     file_2                 v1      13
    4     file_2                 v2      13
    5     file_3               none      14
    6     file_4                old      15
    7  file_4_v2  Replacing: file_4      15
    8     file_5               none      16
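
To make the chain above easier to follow, the same steps can be run one at a time and the intermediate values printed. This is only an illustrative breakdown of the one-liner; the names replaced, key, codes, and uniques are introduced here and are not part of the answer:

```python
import pandas as pd

df_test = pd.DataFrame({
    "file": ["file_1", "file_1", "file_1_v2", "file_2", "file_2",
             "file_3", "file_4", "file_4_v2", "file_5"],
    "comment": ["none: 2", "old", "Replacing: file_1", "v1", "v2",
                "none", "old", "Replacing: file_4", "none"],
    "number": [12, 12, 12, 13, 13, 13, 14, 14, 15],
})

# Step 1: pull the replaced file name out of the comment (NaN if no match).
replaced = df_test["comment"].str.extract(r"Replacing: (.*)", expand=False)

# Step 2: where there is no replacement, use the row's own file name.
key = replaced.fillna(df_test["file"])
print(key.tolist())
# ['file_1', 'file_1', 'file_1', 'file_2', 'file_2',
#  'file_3', 'file_4', 'file_4', 'file_5']

# Step 3: factorize assigns 0, 1, 2, ... in order of first appearance.
codes, uniques = pd.factorize(key)
print(codes.tolist())  # [0, 0, 0, 1, 1, 2, 3, 3, 4]

# Step 4: offset the codes so numbering starts at the smallest original number.
df_test["number"] = codes + df_test["number"].min()
print(df_test["number"].tolist())  # [12, 12, 12, 13, 13, 14, 15, 15, 16]
```

Note that this approach uses only the minimum of the original 'number' column; all other original numbers are discarded and rebuilt from the file names.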

Answer 2

Score: 1

    import pandas as pd

    # Your data
    df_test = pd.DataFrame(data=None, columns=['file', 'comment', 'number'])
    df_test.file = ['file_1', 'file_1', 'file_1_v2', 'file_2', 'file_2', 'file_3', 'file_4', 'file_4_v2', 'file_5']
    df_test.comment = ['none: 2', 'old', 'Replacing: file_1', 'v1', 'v2', 'none', 'old', 'Replacing: file_4', 'none']
    df_test.number = [12, 12, 12, 13, 13, 13, 14, 14, 15]

    # Sort the DataFrame by number. A stable sort keeps tied rows in their
    # original order, and resetting the index keeps the labels 0..n-1 in
    # line with the row positions used by .loc below.
    df_test = df_test.sort_values(by='number', kind='stable').reset_index(drop=True)

    # Initialize the shift counter
    shift = 0


    def compare_files(row, prev_row) -> bool:
        """Return True if row refers to the same file as prev_row."""
        if row['comment'].startswith('Replacing:'):
            return True
        return row['file'] == prev_row['file']


    for i in range(1, len(df_test)):
        # Apply the accumulated shift
        df_test.loc[i, 'number'] += shift
        # Check if the file is the same
        is_same_file = compare_files(df_test.loc[i], df_test.loc[i - 1])
        # Check if the number is the same
        is_same_number = df_test.loc[i, 'number'] == df_test.loc[i - 1, 'number']
        # If the file is different but the number is the same, increment the number
        if not is_same_file and is_same_number:
            df_test.loc[i, 'number'] += 1
            shift += 1

    print(df_test)

Result:

            file            comment  number
    0     file_1            none: 2      12
    1     file_1                old      12
    2  file_1_v2  Replacing: file_1      12
    3     file_2                 v1      13
    4     file_2                 v2      13
    5     file_3               none      14
    6     file_4                old      15
    7  file_4_v2  Replacing: file_4      15
    8     file_5               none      16
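
The two answers agree on the example data, but they are not equivalent in general: the loop keeps any gaps that already exist in the original numbers, while the factorize approach renumbers consecutively from the minimum. The sketch below illustrates this on a made-up three-row frame. The helper names renumber_loop and renumber_factorize and the frame gappy are hypothetical, and renumber_loop assumes the rows are already sorted by number:

```python
import pandas as pd


def renumber_loop(df):
    """Hypothetical helper: the loop from Answer 2 on plain lists."""
    numbers = df["number"].tolist()
    files = df["file"].tolist()
    comments = df["comment"].tolist()
    shift = 0
    for i in range(1, len(numbers)):
        numbers[i] += shift  # carry earlier corrections forward
        same_file = (comments[i].startswith("Replacing:")
                     or files[i] == files[i - 1])
        if not same_file and numbers[i] == numbers[i - 1]:
            numbers[i] += 1
            shift += 1
    return numbers


def renumber_factorize(df):
    """Hypothetical helper: the one-liner from Answer 1."""
    key = (df["comment"]
           .str.extract(r"Replacing: (.*)", expand=False)
           .fillna(df["file"]))
    return (pd.factorize(key)[0] + df["number"].min()).tolist()


# Made-up data with a gap between 1 and 5 in the original numbering.
gappy = pd.DataFrame({
    "file": ["a", "b", "c"],
    "comment": ["none", "none", "none"],
    "number": [1, 1, 5],
})

print(renumber_loop(gappy))       # [1, 2, 6]
print(renumber_factorize(gappy))  # [1, 2, 3]
```

Which behavior is preferable depends on whether the existing numbers carry meaning beyond uniqueness; the question does not say, so treat this as a point to check against the real data.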

huangapple
  • Posted on May 11, 2023, 18:22:27
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76226579.html