Add a value to each cell in a Pandas dataframe column if another column contains a certain string
Question
One way to do this is a plain loop that carries a running number forward:
import pandas as pd

# Your existing dataframe
df_test = pd.DataFrame(data=None, columns=['file', 'comment', 'number'])
df_test.file = ['file_1', 'file_1_v2', 'file_2', 'file_3', 'file_4', 'file_4_v2', 'file_5']
df_test.comment = ['old', 'Replacing: file_1', 'none', 'new: 3', 'maybe', 'Replacing: file_4', 'none']
df_test.number = ['12', '12', '13', '13', '14', '14', '15']

# Convert 'number' column to integers
df_test['number'] = df_test['number'].astype(int)

# Iterate through the dataframe and update 'number' accordingly
current_number = None
for index, row in df_test.iterrows():
    if current_number is None:
        # The first row keeps its original number
        current_number = row['number']
    elif not row['comment'].startswith('Replacing: '):
        # Every file that is not a replacement gets the next number
        current_number += 1
    # A replacement reuses the number of the file before it
    df_test.at[index, 'number'] = current_number

# Convert 'number' column back to strings if needed
# df_test['number'] = df_test['number'].astype(str)

# Resulting dataframe
print(df_test['number'].tolist())
This code updates the 'number' column as described, and the result should be [12, 12, 13, 14, 15, 15, 16].
Original question:
I have a very long and complicated Pandas dataframe in Python consisting of many columns, but an example would be something like:
df_test = pd.DataFrame(data = None, columns = ['file','comment','number'])
df_test.file = ['file_1', 'file_1_v2', 'file_2', 'file_3', 'file_4', 'file_4_v2', 'file_5']
df_test.comment = ['old', 'Replacing: file_1', 'none', 'new: 3', 'maybe', 'Replacing: file_4', 'none']
df_test.number = ['12', '12', '13', '13', '14', '14', '15']
This shows that the dataframe contains the names of several files, each with a comment and a number associated with it. A file whose comment starts with 'Replacing: ' should have the same value in 'number' as the file it replaces, but no other files should share a number, as 'file_2' and 'file_3' currently do.
What I want to do is increase the 'number' value whenever a duplicate of that value is found, for both the duplicate and all files after it, unless the 'comment' cell starts with the string 'Replacing: '. This means that the 'number' column should end up looking like:
[12, 12, 13, 14, 15, 15, 16]
I figured it might work with a for loop and an if statement, but I'm really not sure. Any help would be appreciated, thanks!
Answer 1
Score: 1
(~df_test['comment'].str.startswith('Replacing:')).cumsum().add(11)
Output:
0 12
1 12
2 13
3 14
4 15
5 15
6 16
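The one-liner above hardcodes the offset 11, which only fits this sample data, and it does not write the result back to the dataframe. A minimal sketch of a more general form is below. It assumes the first row is never a replacement and that its 'number' is the value to start counting from; the names starts_new and first_number are introduced here for illustration only.

```python
import pandas as pd

df_test = pd.DataFrame({
    'file': ['file_1', 'file_1_v2', 'file_2', 'file_3', 'file_4', 'file_4_v2', 'file_5'],
    'comment': ['old', 'Replacing: file_1', 'none', 'new: 3', 'maybe', 'Replacing: file_4', 'none'],
    'number': ['12', '12', '13', '13', '14', '14', '15'],
})

# True for every row that is NOT a replacement, i.e. every row that starts a new number
starts_new = ~df_test['comment'].str.startswith('Replacing: ')

# Assumption: the first row's 'number' is the value to start counting from
first_number = int(df_test['number'].iloc[0])

# cumsum() counts how many new numbers have started so far (1 on the first row),
# so shifting by first_number - 1 lets the first row keep its original value
df_test['number'] = starts_new.cumsum() + (first_number - 1)

result = df_test['number'].tolist()
print(result)  # [12, 12, 13, 14, 15, 15, 16]
```

Because the boolean mask is computed for the whole column at once, this avoids the row-by-row loop, which may matter on a very long dataframe.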