How to add the same value to all cells within a dataframe column (pandas in Python)
Question
I have an EXCEL table that I want to transfer into a dataframe matching our project's standard with 22 different columns. The original EXCEL table, however, only has 13 columns, so I am trying to add the missing ones to the dataframe I have read from the file.
However, this has caused several challenges:
- When assigning an empty list [] to the dataframe, I get the notification that the size of the added columns does not match the original dataframe, which has circa 9000 rows.
- When assigning np.nan to the dataframe, creating the joint dataframe with all required columns works perfectly:
f_unique.loc[:, "additional_info"] = np.nan
But having np.nan in my data causes issues later in my script when I flatten the cell data, as all other cells contain lists.
So I have tried to replace np.nan by a list containing the string "n/a":
grouped_df = grouped_df.replace(np.nan, ["n/a"])
However, this gives me the following error:
TypeError: Invalid "to_replace" type: 'float'
Is there a way in which I can assign 9000 x ["n/a"] to each new column in my dataframe directly? That would most likely solve the issue.
Answer 1
Score: 1
Use DataFrame.reindex with Index.union, and pass the list as fill_value for the new columns:
import pandas as pd
df = pd.DataFrame({'a': range(3)})
new_cols = ['additional_info', 'new']
# fill the new columns with empty lists
out = df.reindex(df.columns.union(new_cols, sort=False), axis=1, fill_value=[])
print(out)
   a additional_info new
0  0              []  []
1  1              []  []
2  2              []  []
# reindex the original df again: columns that already exist are not refilled
out = df.reindex(df.columns.union(new_cols, sort=False), axis=1, fill_value=["n/a"])
print(out)
   a additional_info    new
0  0           [n/a]  [n/a]
1  1           [n/a]  [n/a]
2  2           [n/a]  [n/a]
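One thing to watch with a list as fill_value is that the cells may all end up referring to the same list object, so changing one in place could change the others. If that matters, a possible alternative is to build one fresh list per row. This is only a sketch on the toy frame from above, not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({"a": range(3)})
new_cols = ["additional_info", "new"]  # same example names as above

for col in new_cols:
    # Build one fresh list per row so that no two cells share an object.
    df[col] = pd.Series([["n/a"] for _ in range(len(df))], index=df.index)

# Mutating one cell in place leaves the other rows untouched.
df.at[0, "new"].append("x")
print(df)
```

Because the lists are created from len(df), the same loop should work for the 9000-row frame without hardcoding its size.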
Answer 2
Score: 1
Just reindex:
out = df.reindex(columns=list_of_cols)
If you want a list as default value (which you should really avoid):
out = df.reindex(columns=list_of_cols, fill_value=['n/a'])
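If lists as default values are to be avoided, one option is to leave the new columns as NaN and only turn cells into lists at the point where the data is flattened. The sketch below assumes that "flattening" just needs every cell to be a list; as_list is a hypothetical helper, and the column names are made up for illustration:

```python
import pandas as pd

def as_list(value):
    """Return list cells unchanged; map missing scalars to ["n/a"]."""
    if isinstance(value, list):
        return value
    if pd.isna(value):
        return ["n/a"]
    return [value]

df = pd.DataFrame({"a": [["x"], ["y", "z"]]})
out = df.reindex(columns=["a", "additional_info"])  # new column holds NaN

# Normalise only when the list form is actually needed.
flat = out["additional_info"].map(as_list)
print(flat.tolist())
```

This keeps the dataframe itself free of placeholder lists while still giving the later step uniform input.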
Answer 3
Score: 1
In the end, reindexing (see the answers above) combined with assigning a list of repeated values to a dataframe column worked best for my use case:
df2 = f_unique.reindex(columns=column_names, fill_value="n/a")
# populate some of the empty columns with data
df2.loc[:, "event_end"] = df2["event_start"]
df2.loc[:, "event_type"] = ["Funktionsausübung"] * 31414
Adding a list multiple times is certainly not the most elegant solution, but it did the trick.
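The hardcoded row count can probably be avoided. A scalar assigned to a column is broadcast to every row, and when each cell really has to be a list, the lists can be built from len(df2). The frame below is a small hypothetical stand-in for f_unique and column_names, which are not shown in the thread:

```python
import pandas as pd

# Hypothetical stand-ins for f_unique and column_names from the answer.
f_unique = pd.DataFrame({"event_start": ["1900", "1910", "1920"]})
column_names = ["event_start", "event_end", "event_type"]

df2 = f_unique.reindex(columns=column_names, fill_value="n/a")
df2["event_end"] = df2["event_start"]

# A scalar is broadcast to every row, so no row count is needed.
df2["event_type"] = "Funktionsausübung"

# If each cell must be a list, build one per row from the actual length.
df2["event_type_list"] = pd.Series(
    [["Funktionsausübung"] for _ in range(len(df2))], index=df2.index
)
print(df2)
```

Either form keeps working if the number of rows in the input file changes.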