Add a value to each cell in a Pandas dataframe column if another column contains a certain string


Question

A loop-based way to do this, using the example dataframe from the question below:

    import pandas as pd

    # Example dataframe from the question
    df_test = pd.DataFrame({
        'file': ['file_1', 'file_1_v2', 'file_2', 'file_3', 'file_4', 'file_4_v2', 'file_5'],
        'comment': ['old', 'Replacing: file_1', 'none', 'new: 3', 'maybe', 'Replacing: file_4', 'none'],
        'number': ['12', '12', '13', '13', '14', '14', '15'],
    })

    # Convert the 'number' column from strings to integers
    df_test['number'] = df_test['number'].astype(int)

    # Walk the rows in order, tracking how far later numbers must be shifted
    offset = 0
    previous = None  # updated number of the previous row
    for index, row in df_test.iterrows():
        is_replacement = row['comment'].startswith('Replacing: ')
        # A repeated number on a non-replacement row shifts it and all later rows up by one
        if not is_replacement and row['number'] + offset == previous:
            offset += 1
        previous = row['number'] + offset
        df_test.at[index, 'number'] = previous

    # Convert the 'number' column back to strings if needed
    # df_test['number'] = df_test['number'].astype(str)

    print(df_test['number'].tolist())

This updates the 'number' column as requested; the printed result is [12, 12, 13, 14, 15, 15, 16].


I have a very long and complicated Pandas dataframe in Python consisting of many columns, but an example would be something like:

    import pandas as pd

    df_test = pd.DataFrame(data=None, columns=['file', 'comment', 'number'])
    df_test.file = ['file_1', 'file_1_v2', 'file_2', 'file_3', 'file_4', 'file_4_v2', 'file_5']
    df_test.comment = ['old', 'Replacing: file_1', 'none', 'new: 3', 'maybe', 'Replacing: file_4', 'none']
    df_test.number = ['12', '12', '13', '13', '14', '14', '15']

The dataframe holds the names of several files, each with an associated comment and number. A file whose comment starts with 'Replacing: ' should have the same value in 'number' as the file it replaces, but no other files should share a number, as 'file_2' and 'file_3' currently do.

What I want to do is increase the 'number' value whenever a duplicate of that value is found, for both the duplicate and all files after it, unless the 'comment' cell starts with the string 'Replacing: '. The 'number' column should end up looking like:

    [12, 12, 13, 14, 15, 15, 16]

I figured it might work with a for loop and an if statement, but I'm really not sure. Any help would be appreciated, thanks!
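The rule described in the question can also be written without an explicit loop. The following is only a sketch under one reading of that rule: a row shifts itself and every later row up by one when its original number equals the previous row's original number and its comment does not start with 'Replacing: '. The names `numbers`, `is_replacement`, and `bump` are illustrative and do not come from the original post.

```python
import pandas as pd

# Example dataframe from the question
df_test = pd.DataFrame({
    'file': ['file_1', 'file_1_v2', 'file_2', 'file_3', 'file_4', 'file_4_v2', 'file_5'],
    'comment': ['old', 'Replacing: file_1', 'none', 'new: 3', 'maybe', 'Replacing: file_4', 'none'],
    'number': ['12', '12', '13', '13', '14', '14', '15'],
})

numbers = df_test['number'].astype(int)
is_replacement = df_test['comment'].str.startswith('Replacing: ')

# True where a row repeats the previous row's number without being a replacement
bump = numbers.eq(numbers.shift()) & ~is_replacement

# Each such row shifts itself and all later rows up by one
df_test['number'] = numbers + bump.cumsum()

print(df_test['number'].tolist())  # [12, 12, 13, 14, 15, 15, 16]
```

Because the shift is added to the existing values, any gaps already present in the original 'number' column are preserved rather than renumbered.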

Answer 1

Score: 1

    (~df_test['comment'].str.startswith('Replacing:')).cumsum().add(11)

Output:

    0    12
    1    12
    2    13
    3    14
    4    15
    5    15
    6    16
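The constant 11 in this one-liner is the first row's number minus one, so it only fits this particular example. A hedged variant could read the starting value from the data instead. This sketch assumes that the first row is not a replacement and that every non-replacement row should receive the next consecutive number, so the original 'number' values after the first row are ignored. The names `is_new` and `start` are illustrative only.

```python
import pandas as pd

# Example dataframe from the question
df_test = pd.DataFrame({
    'file': ['file_1', 'file_1_v2', 'file_2', 'file_3', 'file_4', 'file_4_v2', 'file_5'],
    'comment': ['old', 'Replacing: file_1', 'none', 'new: 3', 'maybe', 'Replacing: file_4', 'none'],
    'number': ['12', '12', '13', '13', '14', '14', '15'],
})

# True for rows that introduce a new file rather than replace an earlier one
is_new = ~df_test['comment'].str.startswith('Replacing: ')

# Take the starting value from the first row instead of hard-coding 11
start = int(df_test['number'].iloc[0])

# The running count of new files begins at 1, so subtract 1 from the start
df_test['number'] = is_new.cumsum() + (start - 1)

print(df_test['number'].tolist())  # [12, 12, 13, 14, 15, 15, 16]
```

If the original numbering contains gaps that must be kept, this renumbering would close them, so an approach that adds an offset to the existing values may be a better fit.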

huangapple
  • Posted on May 11, 2023 at 16:24:04
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76225556.html