2023年5月11日 18:22:27go评论114阅读模式

英文:

Change value in a Pandas dataframe column based on several conditions

问题

我理解你的问题，你想要处理一个长的Pandas数据框，根据一些条件来更新'number'列的值。你的想法是检查每个文件的'number'是否与前一个文件相同，如果是，并且文件名不同且评论不以'Replacing:'开头，那么'number'和随后的所有'number'都应增加一。以下是可能的代码实现：

import pandas as pd
# 你的数据框
df_test = pd.DataFrame(data=None, columns=['file', 'comment', 'number'])
df_test.file = ['file_1', 'file_1', 'file_1_v2', 'file_2', 'file_2', 'file_3', 'file_4', 'file_4_v2', 'file_5']
df_test.comment = ['none: 2', 'old', 'Replacing: file_1', 'v1', 'v2', 'none', 'old', 'Replacing: file_4', 'none']
df_test.number = [12, 12, 12, 13, 13, 13, 14, 14, 15]
# 初始化一个变量来跟踪当前的'number'
current_number = None
# 遍历数据框的行
for index, row in df_test.iterrows():
    # 如果当前'number'为None或与前一个不同，更新current_number
    if current_number is None or current_number != row['number']:
        current_number = row['number']
    else:
        # 如果'number'相同，检查文件名和评论以进行适当的增加
        if row['file'] != df_test.at[index - 1, 'file'] and not row['comment'].startswith('Replacing:'):
            current_number += 1
    # 更新'number'列的值
    df_test.at[index, 'number'] = current_number
# 打印结果
print(df_test['number'].tolist())

这段代码会按照你的描述来更新'number'列的值，得到期望的结果：[12, 12, 12, 13, 13, 14, 15, 15, 16]。希望这对你有帮助！

英文:

What I have is a long Pandas dataframe in Python that contains three columns named 'file', 'comment', and 'number'. A simple example is:

import pandas as pd
df_test = pd.DataFrame(data = None, columns = [&#39;file&#39;,&#39;comment&#39;,&#39;number&#39;])
df_test.file = [&#39;file_1&#39;, &#39;file_1&#39;, &#39;file_1_v2&#39;, &#39;file_2&#39;, &#39;file_2&#39;, &#39;file_3&#39;, &#39;file_4&#39;, &#39;file_4_v2&#39;, &#39;file_5&#39;]
df_test.comment = [&#39;none: 2&#39;, &#39;old&#39;, &#39;Replacing: file_1&#39;, &#39;v1&#39;, &#39;v2&#39;, &#39;none&#39;, &#39;old&#39;, &#39;Replacing: file_4&#39;, &#39;none&#39;]
df_test.number = [12, 12, 12, 13, 13, 13, 14, 14, 15]

Each file should have a unique number associated with it, but it currently has numerous errors where many unique files have been given the same number. There are also files which has the same name but are different versions that should have the number and files which have different names but the comment will show that they are supposed to have the same number as well.

In the example, files that have the same name or has a comment that starts with the string 'Replacing: ' should not have the number changed, but if the file has a different name but the same number as a previous file, I want the number of that file and every subsequent number to increase by one, meaning the end result here should be:

[12, 12, 12, 13, 13, 14, 15, 15, 16]

My idea was to check if each file has the same number as the previous in the list, and if it does, and the name of the file is not the same, and the comment does not start with the string 'Replacing: ', the value of the number and all following numbers will increase by one, but I am not sure how to write this code. Any help is really appreciated, thanks!

答案1

得分: 2

你可以 extract 文件名，并使用 fillna，然后 factorize，最后添加 min：

df_test['number'] = pd.factorize(df_test['comment']
                                 .str.extract('Replacing: (.*)', expand=False)
                                 .fillna(df_test['file'])
                                )[0]+df_test['number'].min()

输出结果如下：

        file            comment  number
0     file_1            none: 2      12
1     file_1                old      12
2  file_1_v2  Replacing: file_1      12
3     file_2                 v1      13
4     file_2                 v2      13
5     file_3               none      14
6     file_4                old      15
7  file_4_v2  Replacing: file_4      15
8     file_5               none      16

英文:

You can extract the file name, and fillna, then factorize and add the min:

df_test[&#39;number&#39;] = pd.factorize(df_test[&#39;comment&#39;]
                                 .str.extract(&#39;Replacing: (.*)&#39;, expand=False)
                                 .fillna(df_test[&#39;file&#39;])
                                )[0]+df_test[&#39;number&#39;].min()

Output:

        file            comment  number
0     file_1            none: 2      12
1     file_1                old      12
2  file_1_v2  Replacing: file_1      12
3     file_2                 v1      13
4     file_2                 v2      13
5     file_3               none      14
6     file_4                old      15
7  file_4_v2  Replacing: file_4      15
8     file_5               none      16

答案2

得分: 1

这是您提供的代码的翻译部分：

import pandas as pd
# 您的数据
df_test = pd.DataFrame(data=None, columns=['file', 'comment', 'number'])
df_test.file = ['file_1', 'file_1', 'file_1_v2', 'file_2', 'file_2', 'file_3', 'file_4', 'file_4_v2', 'file_5']
df_test.comment = ['none: 2', 'old', 'Replacing: file_1', 'v1', 'v2', 'none', 'old', 'Replacing: file_4', 'none']
df_test.number = [12, 12, 12, 13, 13, 13, 14, 14, 15]
# 对DataFrame进行排序
df_test.sort_values(by='number', inplace=True)
# 初始化位移计数器
shift = 0
def compare_files(row, prev_row) -> bool:
    if row['comment'].startswith('Replacing:'):
        return True
    return row['file'] == prev_row['file']
for i in range(1, len(df_test)):
    # 应用位移
    df_test.loc[i, 'number'] += shift
    # 检查文件是否相同
    is_same_file = compare_files(df_test.loc[i], df_test.loc[i - 1])
    # 检查数字是否相同
    is_same_number = df_test.loc[i, 'number'] == df_test.loc[i - 1, 'number']
    # 如果文件相同且数字相同，增加数字
    if not is_same_file and is_same_number:
        df_test.loc[i, 'number'] += 1
        shift += 1
print(df_test)

结果：

        file            comment  number
0     file_1            none: 2      12
1     file_1                old      12
2  file_1_v2  Replacing: file_1      12
3     file_2                 v1      13
4     file_2                 v2      13
5     file_3               none      14
6     file_4                old      15
7  file_4_v2  Replacing: file_4      15
8     file_5               none      16

英文:

import pandas as pd
# Your data
df_test = pd.DataFrame(data=None, columns=[&#39;file&#39;, &#39;comment&#39;, &#39;number&#39;])
df_test.file = [&#39;file_1&#39;, &#39;file_1&#39;, &#39;file_1_v2&#39;, &#39;file_2&#39;, &#39;file_2&#39;, &#39;file_3&#39;, &#39;file_4&#39;, &#39;file_4_v2&#39;, &#39;file_5&#39;]
df_test.comment = [&#39;none: 2&#39;, &#39;old&#39;, &#39;Replacing: file_1&#39;, &#39;v1&#39;, &#39;v2&#39;, &#39;none&#39;, &#39;old&#39;, &#39;Replacing: file_4&#39;, &#39;none&#39;]
df_test.number = [12, 12, 12, 13, 13, 13, 14, 14, 15]
# Sort the DataFrame
df_test.sort_values(by=&#39;number&#39;, inplace=True)
# Initialize the shift counter
shift = 0
def compare_files(row, prev_row) -&gt; bool:
    if row[&#39;comment&#39;].startswith(&#39;Replacing:&#39;):
        return True
    return row[&#39;file&#39;] == prev_row[&#39;file&#39;]
for i in range(1, len(df_test)):
    # Apply the shift
    df_test.loc[i, &#39;number&#39;] += shift
    # Check if the file is the same
    is_same_file = compare_files(df_test.loc[i], df_test.loc[i - 1])
    # Check if the number is the same
    is_same_number = df_test.loc[i, &#39;number&#39;] == df_test.loc[i - 1, &#39;number&#39;]
    # If the file is the same and the number is the same, increment the number
    if not is_same_file and is_same_number:
        df_test.loc[i, &#39;number&#39;] += 1
        shift += 1
print(df_test)

Result:

        file            comment  number
0     file_1            none: 2      12
1     file_1                old      12
2  file_1_v2  Replacing: file_1      12
3     file_2                 v1      13
4     file_2                 v2      13
5     file_3               none      14
6     file_4                old      15
7  file_4_v2  Replacing: file_4      15
8     file_5               none      16

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

根据多个条件更改Pandas数据框列中的值

问题

答案1

答案2

Jupyter Notebook导出时省略了Markdown。

如何融化数据框并列出列下的单词？

使用唯一的分组从路径中移除文件。

在尝试在Django中创建订单时出现错误。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。