Adding multiple rows to newly created columns in a pandas dataframe

Question

I'm using pandas to store the results of a machine learning model, and I have a dataframe that stores the input data. I want to extend that dataframe with the two outputs that the model returns, but I don't know how to do it.

I've tried something like this:

import pandas as pd
df = pd.DataFrame({'col1':[1,2,3,4,5], 'col2':[1,2,3,4,5]})
df[['col3', 'col4']] = [[1,2,3,4,5],[1,2,3,4,5]]

But it throws an error:

Exception has occurred: ValueError
Columns must be same length as key
  File "D:\InSilicoOP-FUAM\In Silico OP\src\pruebas.py", line 20, in <module>
    df[['col3', 'col4']] = [[1,2,3,4,5],[1,2,3,4,5]]
ValueError: Columns must be same length as key

I've also tried these two variants:

df['col3', 'col4'] = [1,2,3,4,5],[1,2,3,4,5]

df['col3', 'col4'] = [[1,2,3,4,5],[1,2,3,4,5]]

Both raise ValueError: Length of values (2) does not match length of index (5).

I know I can assign each column separately, like so:

df['col3'] = [1,2,3,4,5]

But then I'd have to separate the results from the model (which is a big problem on its own...).

Is there a way to assign multiple columns at once?

Answer 1

Score: 0

Assuming your results are stored in a list of lists, you could convert them to a pandas DataFrame and join it to the original:

results = [[1,2,3,4,5],[1,2,3,4,5]]

>>> df.join(pd.DataFrame(results, index=["col_3","col_4"]).T)

   col1  col2  col_3  col_4
0     1     1      1      1
1     2     2      2      2
2     3     3      3      3
3     4     4      4      4
4     5     5      5      5
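
One thing to keep in mind with this approach is that `join` aligns rows on the index, which matters when the input dataframe does not use the default 0..n-1 index. The following is a minimal sketch under that assumption: the string index, the values, and the name `new_cols` are made up for illustration. Passing `index=df.index` when building the second dataframe keeps the rows lined up.

```python
import pandas as pd

# Illustrative data: the string index is an assumption, not part of the question.
df = pd.DataFrame({"col1": [1, 2, 3], "col2": [4, 5, 6]}, index=["a", "b", "c"])
results = [[10, 20, 30], [40, 50, 60]]  # one inner list per output column

# Build the new columns from a dict so each inner list becomes one column,
# and reuse df's index so join() matches rows by label.
new_cols = pd.DataFrame(dict(zip(["col3", "col4"], results)), index=df.index)
joined = df.join(new_cols)
print(joined)
```

Without `index=df.index`, the second dataframe would get a 0..n-1 index, and joining it to a dataframe indexed by `"a"`, `"b"`, `"c"` would leave the new columns filled with NaN.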

Answer 2

Score: 0

In your specific example you could use the loc property. Make sure the values you insert have the right shape, though, with one inner list per row:

df.loc[:,["col3", "col4"]] = [[1,1],[2,2],[3,3],[4,4],[5,5]]

Alternatively, you can use numpy's transpose to get the right shape from the nested list in your example:

import numpy as np
df.loc[:,["col3", "col4"]] = np.transpose([[1,2,3,4,5],[1,2,3,4,5]])
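
The shape requirement is also what the "Columns must be same length as key" error in the question is about: a list of two lists of five values is read as 2 rows by 5 columns, while two new columns on a five-row dataframe need 5 rows by 2 columns. Here is a small sketch that checks the shapes before assigning. It uses plain `df[[...]]` assignment rather than `loc`, and the second list of values is changed only to make the two columns distinguishable.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [1, 2, 3, 4, 5], "col2": [1, 2, 3, 4, 5]})
results = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]  # illustrative model outputs

arr = np.array(results)
print(arr.shape)    # (2, 5): two rows of five values
print(arr.T.shape)  # (5, 2): five rows, one column per output

# After transposing, the array has one column per new column name.
df[["col3", "col4"]] = arr.T
print(df)
```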

Answer 3

Score: 0

You could write the outputs for col3 and col4 to another dataframe and then join the two. I'm not sure whether this still counts as separating your results out from the model.

df = pd.DataFrame({'col1':[1,2,3,4,5], 'col2':[1,2,3,4,5]})
df2 = pd.DataFrame({'col3':[1,2,3,4,5],'col4':[1,2,3,4,5]})
df3 = df.join(df2)
print(df3)
   col1  col2  col3  col4
0     1     1     1     1
1     2     2     2     2
2     3     3     3     3
3     4     4     4     4
4     5     5     5     5

Answer 4

Score: 0

You can unpack the values from the results list and assign them to the columns col3 and col4:

import pandas as pd

df = pd.DataFrame({'col1':[1,2,3,4,5], 'col2':[1,2,3,4,5]})

results = [[1,2,3,4,5],[1,2,3,4,5]]

df['col3'], df['col4'] = results
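
If the number of outputs is not fixed at two, the same idea could be written with `DataFrame.assign`, pairing column names with result lists. This is only a sketch: `names` is an assumed list of column names, one per inner list. Note that `assign` returns a new dataframe rather than modifying `df` in place.

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2, 3, 4, 5], "col2": [1, 2, 3, 4, 5]})
results = [[1, 2, 3, 4, 5], [1, 2, 3, 4, 5]]
names = ["col3", "col4"]  # assumed names, one per inner list

# zip pairs each name with its list; ** passes the pairs as keyword arguments.
df = df.assign(**dict(zip(names, results)))
print(df)
```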

huangapple
  • Posted on 2023-07-12 23:01:47
  • Please keep this link when reposting: https://go.coder-hub.com/76671987.html