英文:
Change value in a Pandas dataframe column based on several conditions
问题
我理解你的问题,你想要处理一个长的Pandas数据框,根据一些条件来更新'number'列的值。你的想法是检查每个文件的'number'是否与前一个文件相同,如果是,并且文件名不同且评论不以'Replacing:'开头,那么'number'和随后的所有'number'都应增加一。以下是可能的代码实现:
import pandas as pd
# 你的数据框
df_test = pd.DataFrame(data=None, columns=['file', 'comment', 'number'])
df_test.file = ['file_1', 'file_1', 'file_1_v2', 'file_2', 'file_2', 'file_3', 'file_4', 'file_4_v2', 'file_5']
df_test.comment = ['none: 2', 'old', 'Replacing: file_1', 'v1', 'v2', 'none', 'old', 'Replacing: file_4', 'none']
df_test.number = [12, 12, 12, 13, 13, 13, 14, 14, 15]
# 初始化一个变量来跟踪当前的'number'
current_number = None
# 遍历数据框的行
for index, row in df_test.iterrows():
# 如果当前'number'为None或与前一个不同,更新current_number
if current_number is None or current_number != row['number']:
current_number = row['number']
else:
# 如果'number'相同,检查文件名和评论以进行适当的增加
if row['file'] != df_test.at[index - 1, 'file'] and not row['comment'].startswith('Replacing:'):
current_number += 1
# 更新'number'列的值
df_test.at[index, 'number'] = current_number
# 打印结果
print(df_test['number'].tolist())
这段代码会按照你的描述来更新'number'列的值,得到期望的结果:[12, 12, 12, 13, 13, 14, 15, 15, 16]。希望这对你有帮助!
英文:
What I have is a long Pandas dataframe in Python that contains three columns named 'file', 'comment', and 'number'. A simple example is:
import pandas as pd
df_test = pd.DataFrame(data = None, columns = ['file','comment','number'])
df_test.file = ['file_1', 'file_1', 'file_1_v2', 'file_2', 'file_2', 'file_3', 'file_4', 'file_4_v2', 'file_5']
df_test.comment = ['none: 2', 'old', 'Replacing: file_1', 'v1', 'v2', 'none', 'old', 'Replacing: file_4', 'none']
df_test.number = [12, 12, 12, 13, 13, 13, 14, 14, 15]
Each file should have a unique number associated with it, but it currently has numerous errors where many unique files have been given the same number. There are also files which has the same name but are different versions that should have the number and files which have different names but the comment will show that they are supposed to have the same number as well.
In the example, files that have the same name or has a comment that starts with the string 'Replacing: ' should not have the number changed, but if the file has a different name but the same number as a previous file, I want the number of that file and every subsequent number to increase by one, meaning the end result here should be:
[12, 12, 12, 13, 13, 14, 15, 15, 16]
My idea was to check if each file has the same number as the previous in the list, and if it does, and the name of the file is not the same, and the comment does not start with the string 'Replacing: ', the value of the number and all following numbers will increase by one, but I am not sure how to write this code. Any help is really appreciated, thanks!
答案1
得分: 2
你可以 extract
文件名,并使用 fillna
,然后 factorize
,最后添加 min
:
df_test['number'] = pd.factorize(df_test['comment']
.str.extract('Replacing: (.*)', expand=False)
.fillna(df_test['file'])
)[0]+df_test['number'].min()
输出结果如下:
file comment number
0 file_1 none: 2 12
1 file_1 old 12
2 file_1_v2 Replacing: file_1 12
3 file_2 v1 13
4 file_2 v2 13
5 file_3 none 14
6 file_4 old 15
7 file_4_v2 Replacing: file_4 15
8 file_5 none 16
英文:
You can extract
the file name, and fillna
, then factorize
and add the min
:
df_test['number'] = pd.factorize(df_test['comment']
.str.extract('Replacing: (.*)', expand=False)
.fillna(df_test['file'])
)[0]+df_test['number'].min()
Output:
file comment number
0 file_1 none: 2 12
1 file_1 old 12
2 file_1_v2 Replacing: file_1 12
3 file_2 v1 13
4 file_2 v2 13
5 file_3 none 14
6 file_4 old 15
7 file_4_v2 Replacing: file_4 15
8 file_5 none 16
答案2
得分: 1
这是您提供的代码的翻译部分:
import pandas as pd
# 您的数据
df_test = pd.DataFrame(data=None, columns=['file', 'comment', 'number'])
df_test.file = ['file_1', 'file_1', 'file_1_v2', 'file_2', 'file_2', 'file_3', 'file_4', 'file_4_v2', 'file_5']
df_test.comment = ['none: 2', 'old', 'Replacing: file_1', 'v1', 'v2', 'none', 'old', 'Replacing: file_4', 'none']
df_test.number = [12, 12, 12, 13, 13, 13, 14, 14, 15]
# 对DataFrame进行排序
df_test.sort_values(by='number', inplace=True)
# 初始化位移计数器
shift = 0
def compare_files(row, prev_row) -> bool:
if row['comment'].startswith('Replacing:'):
return True
return row['file'] == prev_row['file']
for i in range(1, len(df_test)):
# 应用位移
df_test.loc[i, 'number'] += shift
# 检查文件是否相同
is_same_file = compare_files(df_test.loc[i], df_test.loc[i - 1])
# 检查数字是否相同
is_same_number = df_test.loc[i, 'number'] == df_test.loc[i - 1, 'number']
# 如果文件相同且数字相同,增加数字
if not is_same_file and is_same_number:
df_test.loc[i, 'number'] += 1
shift += 1
print(df_test)
结果:
file comment number
0 file_1 none: 2 12
1 file_1 old 12
2 file_1_v2 Replacing: file_1 12
3 file_2 v1 13
4 file_2 v2 13
5 file_3 none 14
6 file_4 old 15
7 file_4_v2 Replacing: file_4 15
8 file_5 none 16
英文:
import pandas as pd
# Your data
df_test = pd.DataFrame(data=None, columns=['file', 'comment', 'number'])
df_test.file = ['file_1', 'file_1', 'file_1_v2', 'file_2', 'file_2', 'file_3', 'file_4', 'file_4_v2', 'file_5']
df_test.comment = ['none: 2', 'old', 'Replacing: file_1', 'v1', 'v2', 'none', 'old', 'Replacing: file_4', 'none']
df_test.number = [12, 12, 12, 13, 13, 13, 14, 14, 15]
# Sort the DataFrame
df_test.sort_values(by='number', inplace=True)
# Initialize the shift counter
shift = 0
def compare_files(row, prev_row) -> bool:
if row['comment'].startswith('Replacing:'):
return True
return row['file'] == prev_row['file']
for i in range(1, len(df_test)):
# Apply the shift
df_test.loc[i, 'number'] += shift
# Check if the file is the same
is_same_file = compare_files(df_test.loc[i], df_test.loc[i - 1])
# Check if the number is the same
is_same_number = df_test.loc[i, 'number'] == df_test.loc[i - 1, 'number']
# If the file is the same and the number is the same, increment the number
if not is_same_file and is_same_number:
df_test.loc[i, 'number'] += 1
shift += 1
print(df_test)
Result:
file comment number
0 file_1 none: 2 12
1 file_1 old 12
2 file_1_v2 Replacing: file_1 12
3 file_2 v1 13
4 file_2 v2 13
5 file_3 none 14
6 file_4 old 15
7 file_4_v2 Replacing: file_4 15
8 file_5 none 16
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论