2023年5月11日 16:24:04go评论92阅读模式

英文:

Add a value to each cell in a Pandas dataframe column if another column contains a certain string

问题

I can help you with that. Here's the translated code snippet for your task:

import pandas as pd
# Your existing dataframe
df_test = pd.DataFrame(data=None, columns=['file', 'comment', 'number'])
df_test.file = ['file_1', 'file_1_v2', 'file_2', 'file_3', 'file_4', 'file_4_v2', 'file_5']
df_test.comment = ['old', 'Replacing: file_1', 'none', 'new: 3', 'maybe', 'Replacing: file_4', 'none']
df_test.number = ['12', '12', '13', '13', '14', '14', '15']
# Convert 'number' column to integers
df_test['number'] = df_test['number'].astype(int)
# Iterate through the dataframe and update 'number' accordingly
current_number = None
for index, row in df_test.iterrows():
    if row['comment'].startswith('Replacing: '):
        if current_number is not None:
            current_number += 1
            df_test.at[index, 'number'] = current_number
    else:
        current_number = row['number']
# Convert 'number' column back to strings if needed
# df_test['number'] = df_test['number'].astype(str)
# Resulting dataframe
print(df_test['number'].tolist())

This code will update the 'number' column as described in your request, and the result should be [12, 12, 13, 14, 15, 15, 16].

英文:

I have a very long and complicated Pandas dataframe in Python consisting of many columns, but an example would be something like:

df_test = pd.DataFrame(data = None, columns = [&#39;file&#39;,&#39;comment&#39;,&#39;number&#39;])
df_test.file = [&#39;file_1&#39;, &#39;file_1_v2&#39;, &#39;file_2&#39;, &#39;file_3&#39;, &#39;file_4&#39;, &#39;file_4_v2&#39;, &#39;file_5&#39;]
df_test.comment = [&#39;old&#39;, &#39;Replacing: file_1&#39;, &#39;none&#39;, &#39;new: 3&#39;, &#39;maybe&#39;, &#39;Replacing: file_4&#39;, &#39;none&#39;]
df_test.number = [&#39;12&#39;, &#39;12&#39;, &#39;13&#39;, &#39;13&#39;, &#39;14&#39;, &#39;14&#39;, &#39;15&#39;]

What this shows is that the dataframe contains the names of several files which each has a comment and a number associated with them. Here, the files which has a comment that starts with 'Replacing: ' should have the same value in 'number' as the file it is replacing, but other files should not have the same number, as you can see that 'file_2' and 'file_3' has.

What I want to do is to increase the 'number' value whenever a duplicate of that value is found for both the duplicate and all files after that, as long as the 'comment' cell does not start with the string 'Replacing: '. This means that the 'number' column should end up looking like:

[12, 12, 13, 14, 15, 15, 16]

I figured it might work with a for- and if-loop, but I'm really not sure and any help would be appreciated, thanks!

答案1

得分: 1

(~df_test['comment'].str.startswith('Replacing:')).cumsum().add(11)
输出：
    0    12
    1    12
    2    13
    3    14
    4    15
    5    15
    6    16

英文:

(~df_test[&#39;comment&#39;].str.startswith(&#39;Replacing:&#39;)).cumsum().add(11)

output:

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

Add a value to each cell in a Pandas dataframe column if another column contains a certain string

问题

答案1

Polars Series.to_numpy() 不返回 ndarray。

继承/子类化自 `tuple`，具有正确的索引和切片功能。

使用条件对一列进行汇总，并返回一个新行，其中包含汇总后的值。

Anaconda Navigator无法启动Jupyter Notebook。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。