How to add the same value to all cells within a dataframe column (pandas in Python)

Question

I have an Excel table that I want to transfer into a dataframe matching our project's standard of 22 columns. The original Excel table, however, only has 13 columns, so I am trying to add the missing ones to the dataframe I have read from the file.

However, this has caused several challenges:

  1. When assigning an empty list [] to the dataframe, I get the notification that the size of the added columns does not match the original dataframe, which has circa 9000 rows.

  2. When assigning np.nan to the dataframe, creating the joint dataframe with all required columns works perfectly:

f_unique.loc[:, "additional_info"] = np.nan

But having np.nan in my data causes issues later in my script when I flatten the cell data as all other cells contain lists.

So I have tried to replace np.nan with a list containing the string "n/a":

grouped_df = grouped_df.replace(np.nan, ["n/a"])

However, this gives me the following error:

TypeError: Invalid "to_replace" type: 'float'

Is there a way in which I can assign 9000 x ["n/a"] to each new column in my dataframe directly? That would most likely solve the issue.

Answer 1

Score: 1

Use DataFrame.reindex with Index.union, passing a list as fill_value for the new columns:

import pandas as pd

df = pd.DataFrame({'a':range(3)})

new_cols = ['additional_info','new']

df = df.reindex(df.columns.union(new_cols, sort=False), axis=1, fill_value=[])
print (df)
   a additional_info new
0  0              []  []
1  1              []  []
2  2              []  []

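# Rebuild the original frame first; otherwise the new columns already exist and fill_value is ignored
df = pd.DataFrame({'a':range(3)})
df = df.reindex(df.columns.union(new_cols, sort=False), axis=1, fill_value=["n/a"])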
print (df)
   a additional_info    new
0  0           [n/a]  [n/a]
1  1           [n/a]  [n/a]
2  2           [n/a]  [n/a]
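
One thing worth checking when a list is passed as fill_value: pandas may place the same list object in every new cell, so an in-place change to one cell could show up in all of them. A minimal sketch that instead builds an independent list per row, assuming a small stand-in frame (`df` and `new_cols` here are illustrative, not the asker's data):

```python
import pandas as pd

# Stand-in frame; the real one would come from the Excel file.
df = pd.DataFrame({"a": range(3)})
new_cols = ["additional_info", "new"]

for col in new_cols:
    # The comprehension creates a fresh ["n/a"] list for every row,
    # so no two cells share the same list object.
    df[col] = pd.Series([["n/a"] for _ in range(len(df))], index=df.index)

print(df)
```

Sizing the comprehension with len(df) also means the row count never has to be typed by hand.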

Answer 2

Score: 1

Just reindex:

out = df.reindex(columns=list_of_cols)

If you want a list as the default value (which you should really avoid, since list-valued cells force object dtype and rule out most vectorized operations):

out = df.reindex(columns=list_of_cols, fill_value=['n/a'])
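
If the new columns are first created with a plain reindex, they hold NaN, and the question reports that replace raised a TypeError when asked to swap NaN for a list. A sketch of one way around that, mapping each cell through a small helper; `as_list` and the example column names are hypothetical and not taken from the original thread:

```python
import pandas as pd

# Stand-in frame whose existing cells already contain lists.
df = pd.DataFrame({"name": [["Anna"], ["Bert"], ["Cleo"]]})
list_of_cols = ["name", "additional_info"]

# Columns missing from df are created and filled with NaN.
out = df.reindex(columns=list_of_cols)

def as_list(value):
    """Keep existing lists; turn anything else (here: NaN) into ["n/a"]."""
    return value if isinstance(value, list) else ["n/a"]

# Only touch the columns that reindex added.
new_cols = [c for c in list_of_cols if c not in df.columns]
for col in new_cols:
    out[col] = out[col].map(as_list)

print(out)
```

Because the helper returns a new list on each call, every cell ends up with its own list, which should keep a later flattening step uniform across columns.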

Answer 3

Score: 1

In the end, reindexing (see the answers above) combined with assigning a repeated list to a dataframe column worked best for my use case:

df2 = f_unique.reindex(columns=column_names, fill_value="n/a")

# populate some of the empty columns with data

df2.loc[:, "event_end"] = df2["event_start"]
df2.loc[:, "event_type"] = ["Funktionsausübung"] * 31414

Adding a list multiple times is certainly not the most elegant solution, but it did the trick.
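
The hard-coded row count can be avoided. A sketch, assuming a small stand-in for `df2` (the sample values are made up): assigning a scalar lets pandas broadcast it to every row, and len(df2) sizes a repeated list when an explicit list is preferred.

```python
import pandas as pd

# Stand-in for the reindexed frame from the answer above.
df2 = pd.DataFrame({"event_start": ["1901", "1905", "1910"]})
df2["event_end"] = df2["event_start"]

# A scalar is broadcast to every row; no length is needed.
df2["event_type"] = "Funktionsausübung"

# Same result with an explicit list sized from the frame itself.
df2["event_type_from_list"] = ["Funktionsausübung"] * len(df2)

print(df2)
```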

huangapple
  • Posted on June 26, 2023, 16:34:20
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76554931.html