May 25, 2023 05:54:34


Trying to fill a cell with None and/or none using pandas, but it returns a string

Question

I'm trying to use Python 3.11 and pandas 2.0.1 to automate a process at work whose output needs to be an .xlsx file, but I'm facing some issues. I developed the following function:

import pandas as pd

def add_nan_rows(df, column):
    new_rows = []
    previous_value = None

    for _, row in df.iterrows():
        current_value = row[column]

        # When the value in `column` changes, insert an "empty" row first
        if previous_value is not None and current_value != previous_value:
            x = []
            for i in range(0, len(df.columns)):
                x.append(None)
            new_row = pd.DataFrame([pd.Series(x)], columns=df.columns)
            new_rows.append(dict(new_row))

        new_rows.append(dict(row))
        previous_value = current_value

    new_df = pd.DataFrame(new_rows, columns=df.columns)
    return new_df

It isn't the most elegant, but it kinda works, except that instead of the empty cells I expected from the None values, it returned a string.

I also tried replacing None with np.nan or np.NaN, and every variant returned the following string:

0   NaN
Name: NUM_DOCUMENTO, dtype: float64

I also tried not using a list and putting the values directly into the DataFrame, but it made no difference.

Can someone explain to me what I am doing wrong, please?
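On the "what am I doing wrong" part: the text in those cells appears to come from `dict(new_row)`. `new_row` is a DataFrame, and `dict()` of a DataFrame maps each column name to the entire column (a one-element Series), not to a scalar. Every cell of the inserted row therefore holds a Series object, and pandas' Excel writer falls back to `str()` for values it does not recognise, which yields the `0 NaN / Name: NUM_DOCUMENTO, dtype: float64` text. That would also explain why swapping None for np.nan changed nothing: the container is the problem, not the value. A small reproduction, assuming a hypothetical two-row frame:

```python
import pandas as pd

# Hypothetical two-row frame reusing the column name from the question
df = pd.DataFrame({"NUM_DOCUMENTO": [1, 2], "DUMMY_COL": ["foo", "bar"]})

# Same construction as in the question: a one-row DataFrame of missing values
blank = pd.DataFrame([pd.Series([None] * len(df.columns))], columns=df.columns)

# dict() on a DataFrame maps each column name to the whole column (a Series),
# whereas dict() on a row Series maps each column name to a scalar
print(type(dict(blank)["NUM_DOCUMENTO"]))   # a pandas Series
print(type(dict(df.iloc[0])["DUMMY_COL"]))  # a plain str

rows = [dict(df.iloc[0]), dict(blank), dict(df.iloc[1])]
out = pd.DataFrame(rows, columns=df.columns)

# The middle row now stores Series objects as its cell values
cell = out.iat[1, 0]
print(isinstance(cell, pd.Series))
print(str(cell))  # the text that ends up in the Excel cell
```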

Answer 1

Score: 1


IIUC, you can fix/adjust your custom function this way:

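import pandas as pd

def add_nan_rows(df, column):
    new_rows = []
    previous_value = None

    for _, row in df.iterrows():
        current_value = row[column]

        # When the value changes, insert a row of empty strings first;
        # name='' gives that row a blank index label too
        if previous_value is not None and current_value != previous_value:
            new_rows.append(pd.Series([''] * len(df.columns), index=df.columns, name=''))

        # Append the row itself as a Series (not dict(...)), so cells stay scalars
        new_rows.append(row)
        previous_value = current_value

    # A list of Series becomes one row per Series, indexed by each Series' name
    new_df = pd.DataFrame(new_rows, columns=df.columns)

    return new_df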
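If you want genuinely missing cells rather than empty strings (closer to the None the question started from), here is a sketch of the same loop that appends `dict.fromkeys(df.columns)`, a dict mapping every column to `None`, instead of a whole DataFrame. This is an alternative to the approach above, not part of it:

```python
import pandas as pd

def add_nan_rows(df, column):
    new_rows = []
    previous_value = None

    for _, row in df.iterrows():
        current_value = row[column]

        if previous_value is not None and current_value != previous_value:
            # A plain {column: None} dict: one scalar per cell, no nested Series
            new_rows.append(dict.fromkeys(df.columns))

        new_rows.append(row.to_dict())
        previous_value = current_value

    # None values become missing values; the result gets a fresh RangeIndex
    return pd.DataFrame(new_rows, columns=df.columns)


df = pd.DataFrame({
    "NUM_DOCUMENTO": [1, 1, 2, 2, 3, 3],
    "DUMMY_COL": ["foo", "foo", "bar", "baz", "baz", "qux"],
})
out = add_nan_rows(df, "DUMMY_COL")
print(out)
```

Writing this with `out.to_excel("out.xlsx", index=False)` (assuming an Excel engine such as openpyxl is installed) should leave the separator rows as empty cells, since `na_rep` defaults to an empty string. One trade-off: numeric columns may come back as float64, because NaN is a float.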

Another variant with groupby and concat:

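import pandas as pd

def add_nan_rows(df, column):
    new_df = (
        # sort=False keeps the groups in order of first appearance
        df.groupby(column, sort=False, group_keys=False)
            # append one row of empty strings (index label '') after each group
            .apply(lambda g: pd.concat([g, pd.DataFrame('', columns=df.columns, index=[''])]))
            # drop the separator row after the last group
            .iloc[:-1]
    )
    return new_df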
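One caveat with the groupby variant, as far as I can tell: `groupby(column)` gathers all rows that share a value, even non-adjacent ones, whereas the loop only inserts a separator when the value changes between consecutive rows. For data like foo, bar, foo the two would differ. If that matters, here is a sketch that groups by consecutive runs instead, using the common `shift`/`cumsum` idiom (`add_blank_rows_by_run` is a hypothetical name):

```python
import pandas as pd

def add_blank_rows_by_run(df, column):
    """Insert a row of empty strings between runs of consecutive equal values."""
    if df.empty:
        return df.copy()

    # Run id: increases by 1 whenever the value differs from the previous row
    run_id = df[column].ne(df[column].shift()).cumsum()

    pieces = []
    for _, group in df.groupby(run_id, sort=False):
        pieces.append(group)
        # Separator row of empty strings with a blank index label
        pieces.append(pd.DataFrame("", columns=df.columns, index=[""]))

    # Drop the separator that follows the last run
    return pd.concat(pieces[:-1])


df = pd.DataFrame({
    "NUM_DOCUMENTO": [1, 2, 3],
    "DUMMY_COL": ["foo", "bar", "foo"],
})
print(add_blank_rows_by_run(df, "DUMMY_COL"))
```

Here rows 0 and 2 stay separate, matching the loop version, while `groupby("DUMMY_COL")` would put them in the same group.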

Test/Output:

out = add_nan_rows(df, 'DUMMY_COL')

print(out)

  NUM_DOCUMENTO DUMMY_COL
0             1       foo
1             1       foo

2             2       bar

3             2       baz
4             3       baz

5             3       qux

Excel view: [image]

Input used:

import pandas as pd

df = pd.DataFrame({
    'NUM_DOCUMENTO': [1, 1, 2, 2, 3, 3],
    'DUMMY_COL': ['foo', 'foo', 'bar', 'baz', 'baz', 'qux'],
})
