Trying to fill a cell with pandas with None and/or none but return a string
Question
I'm using Python 3.11 and pandas 2.0.1 to automate a process at work whose output must be an .xlsx file, but I'm running into some issues. I wrote the following function:

    def add_nan_rows(df, column):
        new_rows = []
        previous_value = None
        for _, row in df.iterrows():
            current_value = row[column]
            if previous_value is not None and current_value != previous_value:
                x = []
                for i in range(0, len(df.columns)):
                    x.append(None)
                new_row = pd.DataFrame([pd.Series(x)], columns=df.columns)
                new_rows.append(dict(new_row))
            new_rows.append(dict(row))
            previous_value = current_value
        new_df = pd.DataFrame(new_rows, columns = df.columns)
        return new_df

It isn't the most elegant, but it mostly works, except that instead of the empty cells we expected from the None values, each inserted cell contains a string.
I also tried replacing None with np.nan or np.NaN, and every attempt produced the following string in the cell:

    0   NaN
    Name: NUM_DOCUMENTO, dtype: float64

I also tried putting the values directly in the DataFrame instead of building a list first, but it made no difference.
Can someone please explain what I am doing wrong?
Answer 1
Score: 1
IIUC, you can fix your custom function this way:

    def add_nan_rows(df, column):
        new_rows = []
        previous_value = None
        for _, row in df.iterrows():
            current_value = row[column]
            if previous_value is not None and current_value != previous_value:
                # Separator row: one empty string per column, with an empty index label.
                new_rows.append(pd.Series([''] * len(df.columns), index=df.columns, name=''))
            new_rows.append(row)
            previous_value = current_value
        new_df = pd.DataFrame(new_rows, columns=df.columns)
        return new_df

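As for why the original function put a string in the cell: the likely culprit is `dict(new_row)`, where `new_row` is a DataFrame rather than a row Series. The sketch below uses a small made-up frame (its values are assumptions for illustration, not the asker's data) to show that `dict()` on a DataFrame maps each column label to a whole column, so every "cell" of the separator row ends up holding a Series object, which is what then gets rendered as text.

```python
import pandas as pd

# Hypothetical two-row frame, only for illustration.
df = pd.DataFrame({"NUM_DOCUMENTO": [1, 2], "DUMMY_COL": ["foo", "bar"]})

# A one-row DataFrame of missing values, built the same way as in the question.
new_row = pd.DataFrame([pd.Series([None, None])], columns=df.columns)

# dict() on a DataFrame maps each column label to the whole column (a Series),
# not to a scalar cell value.
as_dict = dict(new_row)
cell = as_dict["NUM_DOCUMENTO"]
print(type(cell).__name__)

# dict() on a row Series maps each label to a scalar, which is what was intended.
blank = dict(pd.Series([None] * len(df.columns), index=df.columns))
print(blank)
```

Appending the row Series itself, as the fixed function above does, avoids the problem because each label then maps to a single value.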
Another variant with groupby and concat:
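
    def add_nan_rows(df, column):
        new_df = (
            df.groupby(column, sort=False, group_keys=False)
                # Append one row of empty strings after each group.
                .apply(lambda g: pd.concat([g, pd.DataFrame('', columns=df.columns, index=[''])]))
                # Drop the trailing separator after the last group.
                .iloc[:-1]
        )
        return new_df
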
Test/Output:

    out = add_nan_rows(df, 'DUMMY_COL')
    print(out)

      NUM_DOCUMENTO DUMMY_COL
    0             1       foo
    1             1       foo
                             
    2             2       bar
                             
    3             2       baz
    4             3       baz
                             
    5             3       qux

Input used:

    import pandas as pd

    df = pd.DataFrame({
        'NUM_DOCUMENTO': [1, 1, 2, 2, 3, 3],
        'DUMMY_COL': ['foo', 'foo', 'bar', 'baz', 'baz', 'qux']}
    )

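If real missing values are preferred over empty strings, one possible alternative is to leave gaps in the index and let `reindex` fill them with NaN. This is only a sketch, not part of the answer above: the function name `add_missing_rows` is hypothetical, the frame is assumed to be non-empty, and separators are placed between consecutive runs of equal values (as in the loop version). Note that `DataFrame.to_excel` writes missing values as empty cells by default (`na_rep=''`), and that the integer column becomes float64 once NaN is introduced.

```python
import numpy as np
import pandas as pd

def add_missing_rows(df, column):
    """Insert an all-NaN row wherever the value in `column` changes (hypothetical helper)."""
    df = df.reset_index(drop=True)
    # True at each row whose value differs from the previous row.
    changed = df[column].ne(df[column].shift())
    changed.iloc[0] = False  # the first row never needs a separator before it
    # Push each row down by the number of changes seen so far, leaving gaps.
    new_pos = np.arange(len(df)) + changed.cumsum().to_numpy()
    out = df.set_axis(new_pos, axis=0)
    # Reindexing over the full range fills the gaps with NaN rows.
    return out.reindex(range(new_pos[-1] + 1))

df_demo = pd.DataFrame({
    "NUM_DOCUMENTO": [1, 1, 2, 2, 3, 3],
    "DUMMY_COL": ["foo", "foo", "bar", "baz", "baz", "qux"],
})
out_demo = add_missing_rows(df_demo, "DUMMY_COL")
print(out_demo)
```

With the sample input this yields nine rows, with the all-NaN separators at positions 2, 4 and 7.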