June 26, 2023, 16:34:20

How to add the same value to all cells within a dataframe column (pandas in Python)

Question

I have an Excel table that I want to load into a dataframe matching our project's standard of 22 columns. The original Excel table, however, has only 13 columns, so I am trying to add the missing ones to the dataframe I read from the file.

However, this has caused several challenges:

When assigning an empty list [] to the dataframe, I get an error saying that the length of the added column does not match the original dataframe, which has roughly 9000 rows.
When assigning np.nan to the dataframe, creating the combined dataframe with all required columns works perfectly:

f_unique.loc[:, "additional_info"] = np.nan

But having np.nan in my data causes issues later in my script when I flatten the cell data as all other cells contain lists.

So I have tried to replace np.nan by a list containing the string "n/a":

grouped_df = grouped_df.replace(np.nan, ["n/a"])

However, this gives me the following error:

TypeError: Invalid "to_replace" type: 'float'

Is there a way in which I can assign 9000 x ["n/a"] to each new column in my dataframe directly? That would most likely solve the issue.
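
A note on the error above: the call pairs a scalar to_replace (np.nan) with a list-like value, and that combination appears to be what the TypeError reports. A minimal sketch of one way around it, assuming a small stand-in frame: the column name tags and the helper nan_to_list are hypothetical, and only additional_info comes from the question. The replacement is built cell by cell, so every row gets its own list.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the frame read from Excel.
df = pd.DataFrame({"tags": [["a"], ["b", "c"], ["d"]]})
df["additional_info"] = np.nan  # a scalar NaN is broadcast to every row


def nan_to_list(value):
    """Return ['n/a'] for a NaN cell, otherwise the cell unchanged."""
    # pd.isna() on a list returns an array, so check for float first.
    if isinstance(value, float) and pd.isna(value):
        return ["n/a"]
    return value


for col in df.columns:
    df[col] = df[col].apply(nan_to_list)

print(df)
```

Cells that already hold lists pass through untouched, so the same loop can run over the whole frame.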

Answer 1

Score: 1

Use DataFrame.reindex with Index.union, and pass the list to use for the new columns as fill_value:

import pandas as pd

df = pd.DataFrame({'a': range(3)})
new_cols = ['additional_info', 'new']

# Add the missing columns and fill every new cell with an empty list
df1 = df.reindex(df.columns.union(new_cols, sort=False), axis=1, fill_value=[])
print(df1)
   a additional_info new
0  0              []  []
1  1              []  []
2  2              []  []

# Start again from the original df: columns that already exist are not refilled
df2 = df.reindex(df.columns.union(new_cols, sort=False), axis=1, fill_value=['n/a'])
print(df2)
   a additional_info    new
0  0           [n/a]  [n/a]
1  1           [n/a]  [n/a]
2  2           [n/a]  [n/a]
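
If the new cells will later be modified in place, it may be safer to give each row its own list object, since it is not obvious from the reindex call whether every cell receives the same list. A sketch using plain column assignment (the frame and the column names are illustrative only):

```python
import pandas as pd

df = pd.DataFrame({"a": range(3)})
new_cols = ["additional_info", "new"]

for col in new_cols:
    # The comprehension creates a fresh list for every row.
    df[col] = [["n/a"] for _ in range(len(df))]

# Changing one cell leaves the other cells untouched.
df.at[0, "new"].append("edited")
print(df)
```

Using len(df) also avoids hard-coding the row count.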

Answer 2

Score: 1

Just reindex:

out = df.reindex(columns=list_of_cols)

If you want a list as the default value (which you should really avoid):

out = df.reindex(columns=list_of_cols, fill_value=['n/a'])
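
One way to follow that advice, sketched under the assumption that the list cells are later flattened with DataFrame.explode (the question does not say how the flattening is done): fill the new columns with a plain string and explode only the columns that actually hold lists. The frame, list_of_cols, and the column names name and tags are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"name": ["x", "y"], "tags": [["a", "b"], ["c"]]})
list_of_cols = ["name", "tags", "additional_info"]

# New columns get a plain string instead of a list.
out = df.reindex(columns=list_of_cols, fill_value="n/a")

# explode() gives each list element its own row and repeats the scalar columns.
flat = out.explode("tags", ignore_index=True)
print(flat)
```

Scalar cells are carried along unchanged, so the placeholder column needs no special handling.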

Answer 3

Score: 1

In the end, reindexing (see the answers above) combined with assigning the same list, repeated once per row, to a dataframe column worked best for my use case:

df2 = f_unique.reindex(columns=column_names, fill_value="n/a")
# populate some of the empty columns with data
df2.loc[:, "event_end"] = df2["event_start"]
df2.loc[:, "event_type"] = ["Funktionsausübung"] * 31414

Repeating a list like this is certainly not the most elegant solution, but it did the trick.
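
The hard-coded row count can be avoided, because assigning a scalar to a column broadcasts it to every row. A short sketch with a made-up frame: only the names f_unique, column_names, event_start, event_end, and event_type come from the snippet above, and the data is invented.

```python
import pandas as pd

# Hypothetical stand-in for f_unique.
f_unique = pd.DataFrame({"event_start": ["1901", "1950", "1987"]})
column_names = ["event_start", "event_end", "event_type"]

df2 = f_unique.reindex(columns=column_names, fill_value="n/a")
df2["event_end"] = df2["event_start"]
# A scalar is repeated for every row, whatever the length of the frame.
df2["event_type"] = "Funktionsausübung"
print(df2)
```

This keeps working if the number of rows in the input file changes.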
