Add a value to each cell in a Pandas dataframe column if another column contains a certain string


Question


I have a very long and complicated Pandas dataframe in Python consisting of many columns, but an example would be something like:

import pandas as pd

df_test = pd.DataFrame({
    'file': ['file_1', 'file_1_v2', 'file_2', 'file_3', 'file_4', 'file_4_v2', 'file_5'],
    'comment': ['old', 'Replacing: file_1', 'none', 'new: 3', 'maybe', 'Replacing: file_4', 'none'],
    'number': ['12', '12', '13', '13', '14', '14', '15'],
})

This shows that the dataframe contains the names of several files, each of which has a comment and a number associated with it. A file whose comment starts with 'Replacing: ' should have the same 'number' value as the file it replaces, but no other files should share a number, as 'file_2' and 'file_3' currently do.
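The rule stated above, that a 'Replacing: <name>' row shares the number of the file it names, can also be expressed as an explicit lookup instead of relying on row order. This is only a sketch under assumptions not stated in the question: every non-replacement row gets the next consecutive number starting from the first row's value, the first row is not itself a replacement, and every file named after 'Replacing: ' appears in the frame as a non-replacement row. The helper `renumber_by_lookup` and the constant `PREFIX` are hypothetical names.

```python
import pandas as pd

PREFIX = 'Replacing: '


def renumber_by_lookup(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: number original files consecutively and give each
    'Replacing: <file>' row the number of the file it names."""
    is_replacement = df['comment'].str.startswith(PREFIX, na=False)
    originals = df.loc[~is_replacement]

    # Consecutive numbers for the original files, starting at the first row's value
    # (assumes the first row is not a replacement).
    start = int(df['number'].iloc[0])
    original_numbers = pd.Series(range(start, start + len(originals)), index=originals.index)
    number_by_file = dict(zip(originals['file'], original_numbers))

    # Strip the prefix to get the referenced file name, then look up its number.
    # A name that is not found would map to NaN (assumed not to happen here).
    referenced = df.loc[is_replacement, 'comment'].str[len(PREFIX):]
    replacement_numbers = referenced.map(number_by_file)

    # Recombine both parts in the original row order.
    return pd.concat([original_numbers, replacement_numbers]).sort_index()


df_test = pd.DataFrame({
    'file': ['file_1', 'file_1_v2', 'file_2', 'file_3', 'file_4', 'file_4_v2', 'file_5'],
    'comment': ['old', 'Replacing: file_1', 'none', 'new: 3', 'maybe', 'Replacing: file_4', 'none'],
    'number': ['12', '12', '13', '13', '14', '14', '15'],
})
df_test['number'] = renumber_by_lookup(df_test)
print(df_test['number'].tolist())
```

Because the replacement rows are matched by name, this variant would still work if a 'Replacing: ...' row were not directly below the file it replaces.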

Whenever a 'number' value is duplicated by a row whose 'comment' does not start with 'Replacing: ', I want to increase 'number' by one for that row and for every row after it. The 'number' column should then end up as:

[12, 12, 13, 14, 15, 15, 16]

I figured this might work with a for loop and an if statement, but I'm really not sure how, so any help would be appreciated. Thanks!
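A loop along the lines the question suggests could look roughly like this. It is a sketch that assumes the 'number' column is sorted in ascending order and that replacement rows already carry the correct (unshifted) number of the file they replace; `bump_duplicates` is a hypothetical helper name. It walks the rows once and keeps a running offset, so shifting "this row and all rows after it" does not require rewriting the rest of the column each time.

```python
import pandas as pd

PREFIX = 'Replacing: '


def bump_duplicates(df):
    """Hypothetical helper: walk the rows in order; whenever a row that is not a
    replacement would repeat (or fall below) the previous row's number, move it
    and every later row up so the numbers stay unique."""
    result = []
    offset = 0       # cumulative shift applied to every row from here on
    previous = None  # number assigned to the previous row
    for raw, comment in zip(df['number'], df['comment']):
        value = int(raw) + offset
        is_replacement = str(comment).startswith(PREFIX)
        if not is_replacement and previous is not None and value <= previous:
            # Collision: shift this row, and via `offset` all later rows, upward.
            offset += previous + 1 - value
            value = previous + 1
        result.append(value)
        previous = value
    return result


df_test = pd.DataFrame({
    'file': ['file_1', 'file_1_v2', 'file_2', 'file_3', 'file_4', 'file_4_v2', 'file_5'],
    'comment': ['old', 'Replacing: file_1', 'none', 'new: 3', 'maybe', 'Replacing: file_4', 'none'],
    'number': ['12', '12', '13', '13', '14', '14', '15'],
})
df_test['number'] = bump_duplicates(df_test)
print(df_test['number'].tolist())
```

Unlike a pure running count, this keeps the original numbers wherever they do not collide, so any existing gaps in the numbering would survive.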

Answer 1

Score: 1

(~df_test['comment'].str.startswith('Replacing:')).cumsum().add(11)

Output:

    0    12
    1    12
    2    13
    3    14
    4    15
    5    15
    6    16
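The one-liner works because `~df_test['comment'].str.startswith('Replacing:')` is True for every row that should receive a new number, and the cumulative sum of those booleans counts the new files seen so far (1, 1, 2, 3, 4, 4, 5 for the example). Adding 11 shifts that count so it starts at 12. A slightly more general sketch derives the offset from the first row's 'number' instead of hard-coding 11; it assumes the first row is not a replacement, and `na=False` is an added choice so that a missing comment counts as a new file.

```python
import pandas as pd

df_test = pd.DataFrame({
    'file': ['file_1', 'file_1_v2', 'file_2', 'file_3', 'file_4', 'file_4_v2', 'file_5'],
    'comment': ['old', 'Replacing: file_1', 'none', 'new: 3', 'maybe', 'Replacing: file_4', 'none'],
    'number': ['12', '12', '13', '13', '14', '14', '15'],
})

# True for rows that should receive a new number.
is_new_file = ~df_test['comment'].str.startswith('Replacing:', na=False)

# Running count of new files seen so far: 1, 1, 2, 3, 4, 4, 5 for this data.
running_count = is_new_file.astype(int).cumsum()

# Shift the count so the first row keeps its original number (12 here).
start = int(df_test['number'].iloc[0])
df_test['number'] = running_count + (start - 1)

print(df_test['number'].tolist())
```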

huangapple
  • Published on May 11, 2023, 16:24:04
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76225556.html