Trying to fill a cell with None and/or NaN using pandas, but a string is returned

Question

I'm using Python 3.11 and pandas 2.0.1 to automate a process at work whose output needs to be an .xlsx file, but I'm running into an issue. I wrote the following function:

def add_nan_rows(df, column):
    new_rows = []
    previous_value = None

    for _, row in df.iterrows():
        current_value = row[column]

        if previous_value is not None and current_value != previous_value:
            x = []
            for i in range(0, len(df.columns)):
                x.append(None)
            new_row = pd.DataFrame([pd.Series(x)], columns=df.columns)
            new_rows.append(dict(new_row))

        new_rows.append(dict(row))
        previous_value = current_value
        
    new_df = pd.DataFrame(new_rows, columns = df.columns)
    return new_df

It isn't the most elegant, but it mostly works, except for one thing: instead of the empty cells we expected from the None values, each cell in the inserted rows contains a string.

I also tried replacing None with np.nan or np.NaN, and each attempt produced the following string:

0   NaN
Name: NUM_DOCUMENTO, dtype: float64

I also tried putting the values directly into the DataFrame instead of building a list first, but it made no difference.

Can someone explain to me what I am doing wrong, please?

Answer 1

Score: 1

IIUC, you can fix your custom function this way:

def add_nan_rows(df, column):
    new_rows = []
    previous_value = None

    for _, row in df.iterrows():
        current_value = row[column]

        if previous_value is not None and current_value != previous_value:
            new_rows.append(pd.Series([''] * len(df.columns), index=df.columns, name=''))

        new_rows.append(row)
        previous_value = current_value

    new_df = pd.DataFrame(new_rows, columns=df.columns)

    return new_df
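
As for why the original version ends up with a string in each cell, a likely cause is `dict(new_row)`. Calling `dict()` on a DataFrame maps each column label to the entire column (a Series), so every cell of the inserted row holds a Series object, which is then shown as its text representation (the `0   NaN ... dtype: float64` text from the question). Calling `dict()` on a single row maps each label to a scalar instead. A minimal sketch of that behaviour, where `blank` is a simplified stand-in for the question's `new_row` and the other variable names are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({"NUM_DOCUMENTO": [1, 2], "DUMMY_COL": ["foo", "bar"]})

# One-row frame of missing values (simplified stand-in for the question's new_row).
blank = pd.DataFrame([[None] * len(df.columns)], columns=df.columns)

# dict() on a DataFrame: each column label maps to the whole column (a Series).
as_dict = dict(blank)
value_types = {key: type(value).__name__ for key, value in as_dict.items()}

# dict() on a single row (a Series): each label maps to a scalar.
row_dict = dict(blank.iloc[0])
row_values_are_series = [isinstance(v, pd.Series) for v in row_dict.values()]

print(value_types)            # {'NUM_DOCUMENTO': 'Series', 'DUMMY_COL': 'Series'}
print(row_values_are_series)  # [False, False]
```

This is why appending a Series (one value per column) rather than `dict(new_row)` avoids the problem.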

Another variant with groupby and concat:

def add_nan_rows(df, column):
    new_df = (
        df.groupby(column, sort=False, group_keys=False)
            .apply(lambda g: pd.concat([g, pd.DataFrame('', columns=df.columns, index=[''])]))
            .iloc[:-1]
    )
    return new_df
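
One caveat with grouping on the column itself: groupby gathers all rows that share a value, so if a value reappears later in non-consecutive rows, the result will differ from the loop version, which inserts a separator at every change between neighbouring rows. The following is a rough sketch, not part of the original answer, that keeps the loop's "consecutive runs" behaviour without `iterrows`; `add_blank_rows` and `run_id` are names introduced here purely for illustration.

```python
import pandas as pd

def add_blank_rows(df, column):
    """Insert a blank row wherever `column` changes between consecutive rows."""
    if df.empty:
        return df.copy()
    # Label each run of consecutive equal values: 1, 1, 2, 3, 3, ...
    run_id = df[column].ne(df[column].shift()).cumsum().rename("run_id")
    # Separator row of empty strings with a blank index label.
    blank = pd.DataFrame("", columns=df.columns, index=[""])
    pieces = []
    for _, run in df.groupby(run_id, sort=False):
        pieces.extend([run, blank])
    # Drop the trailing separator before concatenating.
    return pd.concat(pieces[:-1])

df = pd.DataFrame({
    "NUM_DOCUMENTO": [1, 1, 2, 2, 3, 3],
    "DUMMY_COL": ["foo", "foo", "bar", "baz", "baz", "qux"],
})
out = add_blank_rows(df, "DUMMY_COL")
print(out)
```

For the sample input used in this answer, every value forms a single consecutive run, so both approaches should place the separators in the same positions.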

Test/Output:

out = add_nan_rows(df, 'DUMMY_COL')

print(out)

  NUM_DOCUMENTO DUMMY_COL
0             1       foo
1             1       foo

2             2       bar

3             2       baz
4             3       baz

5             3       qux

Excel view: [screenshot not included]

Input used:

import pandas as pd

df = pd.DataFrame({
    'NUM_DOCUMENTO': [1, 1, 2, 2, 3, 3],
    'DUMMY_COL': ['foo', 'foo', 'bar', 'baz', 'baz', 'qux'],
})
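
If the separator rows should hold genuinely missing values rather than empty strings, the same loop can append a row of NaN instead. This is a hedged sketch rather than the answer's method: `add_missing_rows` is a hypothetical name, and the trade-off is that numeric columns may no longer keep an integer dtype once NaN is mixed in. `DataFrame.to_excel` writes missing values as empty cells by default (`na_rep=''`), so these rows should appear blank in the .xlsx file.

```python
import numpy as np
import pandas as pd

def add_missing_rows(df, column):
    """Insert an all-NaN row wherever `column` changes between consecutive rows."""
    rows = []
    previous_value = None
    for _, row in df.iterrows():
        current_value = row[column]
        if previous_value is not None and current_value != previous_value:
            # Scalar NaN broadcast over all columns; name="" gives a blank index label.
            rows.append(pd.Series(np.nan, index=df.columns, name=""))
        rows.append(row)
        previous_value = current_value
    return pd.DataFrame(rows, columns=df.columns)

df = pd.DataFrame({
    "NUM_DOCUMENTO": [1, 1, 2, 2, 3, 3],
    "DUMMY_COL": ["foo", "foo", "bar", "baz", "baz", "qux"],
})
out = add_missing_rows(df, "DUMMY_COL")

# Count the rows in which every column is missing.
n_blank = int(out.isna().all(axis=1).sum())
print(n_blank)

# Writing the file needs an Excel engine such as openpyxl; the file name is hypothetical.
# out.to_excel("output.xlsx", index=False)
```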
huangapple
  • Published on May 25, 2023, 05:54:34
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76327639.html