2023-07-12 23:01:47

Adding multiple rows to newly created columns in a pandas dataframe

Question

I'm using pandas to store the results of a machine learning model, and I have a dataframe that stores the input data. I want to extend that dataframe with the two outputs that the model returns, but I don't know how to do it.

I've tried something like this:

import pandas as pd
df = pd.DataFrame({'col1':[1,2,3,4,5], 'col2':[1,2,3,4,5]})
df[['col3', 'col4']] = [[1,2,3,4,5],[1,2,3,4,5]]

But it throws an error:

Exception has occurred: ValueError
Columns must be same length as key
  File "D:\InSilicoOP-FUAM\In Silico OP\src\pruebas.py", line 20, in <module>
    df[['col3', 'col4']] = [[1,2,3,4,5],[1,2,3,4,5]]
ValueError: Columns must be same length as key

I've also tried

df['col3', 'col4'] = [1,2,3,4,5],[1,2,3,4,5]

and

df['col3', 'col4'] = [[1,2,3,4,5],[1,2,3,4,5]]

Both raise ValueError: Length of values (2) does not match length of index (5).

I know I can assign each column separately, like so:

df['col3'] = [1,2,3,4,5]

But then I'd have to separate the results from the model (which is a big problem on its own...).

Is there a way to assign multiple columns at once?

Answer 1

Score: 0

Assuming your results are stored in a list of lists, you could convert that to a pandas DataFrame and join it to the original:

>>> results = [[1,2,3,4,5],[1,2,3,4,5]]
>>> df.join(pd.DataFrame(results, index=["col_3","col_4"]).T)
   col1  col2  col_3  col_4
0     1     1      1      1
1     2     2      2      2
2     3     3      3      3
3     4     4      4      4
4     5     5      5      5

Answer 2

Score: 0

In your specific example you could use the loc property. Make sure the values you insert have the right shape, though (one inner list per row):

df.loc[:,["col3", "col4"]] = [[1,1],[2,2],[3,3],[4,4],[5,5]]

Alternatively, you can use numpy's transpose to get the right shape from the array in your example:

import numpy as np
df.loc[:,["col3", "col4"]] = np.transpose([[1,2,3,4,5],[1,2,3,4,5]])
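The shape requirement can be checked directly. The sketch below assumes a reasonably recent pandas version (the error messages in the question suggest one) and uses distinct values in the second list so the columns are easy to tell apart. A list of two lists of five values is read as two rows of five, while the key names two columns; transposing gives five rows of two, which the original `df[['col3', 'col4']] = ...` syntax then accepts.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [1, 2, 3, 4, 5], "col2": [1, 2, 3, 4, 5]})

# Illustrative model outputs: one inner list per output column.
results = [[1, 2, 3, 4, 5], [10, 20, 30, 40, 50]]

arr = np.asarray(results)
print(arr.shape)    # (2, 5): two rows of five values
print(arr.T.shape)  # (5, 2): five rows, one value per new column

# The last dimension now matches the number of column names in the key.
df[["col3", "col4"]] = arr.T
print(df)
```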

Answer 3

Score: 0

You could write the outputs for col3 and col4 to another dataframe and then join the two. I'm not sure whether this still counts as separating your results from the model, though.

df = pd.DataFrame({'col1':[1,2,3,4,5], 'col2':[1,2,3,4,5]})
df2 = pd.DataFrame({'col3':[1,2,3,4,5],'col4':[1,2,3,4,5]})
df3 = df.join(df2)
print(df3)
   col1  col2  col3  col4
0     1     1     1     1
1     2     2     2     2
2     3     3     3     3
3     4     4     4     4
4     5     5     5     5
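One caveat with the join approach: `join` aligns rows by index label, not by position. The example above works because both frames carry the default 0..4 index. The sketch below uses a made-up input frame whose index starts at 10 to show what can go wrong, and how passing `index=df.index` when building the second frame keeps the rows lined up.

```python
import pandas as pd

# Input frame with a non-default index (illustrative values).
df = pd.DataFrame({"col1": [1, 2, 3]}, index=[10, 11, 12])
outputs = {"col3": [7, 8, 9], "col4": [4, 5, 6]}

# The second frame gets the default index 0..2, so no labels match
# and the joined columns come back as NaN.
misaligned = df.join(pd.DataFrame(outputs))

# Reusing the input frame's index makes the labels match row for row.
aligned = df.join(pd.DataFrame(outputs, index=df.index))

print(misaligned)
print(aligned)
```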

Answer 4

Score: 0

You can unpack the values from the results list and assign them to the columns col3 and col4:

import pandas as pd
df = pd.DataFrame({'col1':[1,2,3,4,5], 'col2':[1,2,3,4,5]})
results = [[1,2,3,4,5],[1,2,3,4,5]]
df['col3'], df['col4'] = results
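If the number of outputs may change, the same idea can be written without naming each column on the left-hand side. This is only a sketch: `run_model` is a hypothetical stand-in for the real model, assumed to return one sequence per output, each as long as the input frame. Pairing the column names with the outputs and passing them to `DataFrame.assign` returns a new frame and leaves the original untouched.

```python
import pandas as pd


def run_model(frame):
    """Hypothetical stand-in for the model: returns two outputs per row."""
    return (frame["col1"] * 2).tolist(), (frame["col2"] + 1).tolist()


df = pd.DataFrame({"col1": [1, 2, 3, 4, 5], "col2": [1, 2, 3, 4, 5]})

names = ["col3", "col4"]   # one name per model output
outputs = run_model(df)    # tuple of two lists, five values each

# Pair each name with its output and add them all in one call.
extended = df.assign(**dict(zip(names, outputs)))
print(extended)
```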
