Cannot set a DataFrame with multiple columns to the single column total_servings

Question

I am a beginner getting familiar with pandas.
I get an error when I try to create a new column this way:

drinks['total_servings'] = drinks.loc[:, 'beer_servings':'wine_servings'].apply(calculate, axis=1)

Below is my code, and I get the following error for line number 9:

"Cannot set a DataFrame with multiple columns to the single column total_servings"

Any help or suggestions would be appreciated.

import pandas as pd
drinks = pd.read_csv('drinks.csv')

def calculate(drinks):
    return drinks['beer_servings'] + drinks['spirit_servings'] + drinks['wine_servings']

print(drinks)
drinks['total_servings'] = drinks.loc[:, 'beer_servings':'wine_servings'].apply(calculate, axis=1)

drinks['beer_sales'] = drinks['beer_servings'].apply(lambda x: x*2)
drinks['spirit_sales'] = drinks['spirit_servings'].apply(lambda x: x*4)
drinks['wine_sales'] = drinks['wine_servings'].apply(lambda x: x*6)
drinks

Answer 1

Score: 3

In your code, when the function calculate is called through apply with axis=1, each row of the DataFrame is passed to it as an argument. Here, the result is a DataFrame with multiple columns, but you are trying to assign it to a single column, which is not possible. You can try updating your code to this:

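def calculate(each_row):
    # each_row is a Series holding the values of one row of drinks
    return each_row['beer_servings'] + each_row['spirit_servings'] + each_row['wine_servings']

# axis=1 calls calculate once per row, giving one total per row
drinks['total_servings'] = drinks.apply(calculate, axis=1)
# Each sales column scales one servings column by a fixed factor
drinks['beer_sales'] = drinks['beer_servings'].apply(lambda x: x*2)
drinks['spirit_sales'] = drinks['spirit_servings'].apply(lambda x: x*4)
drinks['wine_sales'] = drinks['wine_servings'].apply(lambda x: x*6)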

print(drinks)   
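
As a possible alternative to the row-wise apply above, the same columns can be computed with vectorized operations. This is only a minimal sketch: the small DataFrame is a hypothetical stand-in for drinks.csv (its rows and the serving_cols name are made up for illustration), while the column names and the factors 2, 4 and 6 come from the question.

```python
import pandas as pd

# Hypothetical stand-in for drinks.csv; the values are invented for illustration.
drinks = pd.DataFrame({
    "country": ["X", "Y", "Z"],
    "beer_servings": [0, 89, 25],
    "spirit_servings": [0, 132, 0],
    "wine_servings": [0, 54, 14],
})

# Assumed helper name: the three columns that make up the total.
serving_cols = ["beer_servings", "spirit_servings", "wine_servings"]

# sum(axis=1) adds across the selected columns and returns one value per row,
# so the result is a Series that fits into a single new column.
drinks["total_servings"] = drinks[serving_cols].sum(axis=1)

# Multiplying a column by a scalar is element-wise, so no lambda is needed.
drinks["beer_sales"] = drinks["beer_servings"] * 2
drinks["spirit_sales"] = drinks["spirit_servings"] * 4
drinks["wine_sales"] = drinks["wine_servings"] * 6

print(drinks)
```

Because no Python function is called per row, this form usually runs faster on large frames, and it always yields a Series for the total.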

Answer 2

Score: 0

I suppose the cause is a wrong argument name inside the calculate function. The parameter is named drink, but drinks is used to compute the sum of the columns.

The difference matters because drink is a Series that represents a single row, so the sum of its elements is a scalar. drinks, on the other hand, is a DataFrame, and the sum of its columns is a Series.

The sample code below shows that the approach works when the function uses its own argument.

import pandas as pd

df = pd.DataFrame({
    "A":[1,1,1,1,1], 
    "B":[2,2,2,2,2], 
    "C":[3,3,3,3,3]
})

def calculate(to_calc_df):
    return to_calc_df["A"] + to_calc_df["B"] +  to_calc_df["C"]
    
df["total"] = df.loc[:, "A":"C"].apply(calculate, axis=1)

print(df)

Result

   A  B  C  total
0  1  2  3      6
1  1  2  3      6
2  1  2  3      6
3  1  2  3      6
4  1  2  3      6
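
To make the difference between the row Series and the whole DataFrame concrete, here is a small sketch that reproduces the failure. The function names calculate_wrong and calculate_right are made up for illustration, and the exact wording of the error message may differ between pandas versions, so the code only catches ValueError.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 1], "B": [2, 2, 2], "C": [3, 3, 3]})

def calculate_wrong(row):
    # Bug: ignores `row` and reads the global DataFrame `df`,
    # so every call returns a whole Series instead of a scalar.
    return df["A"] + df["B"] + df["C"]

def calculate_right(row):
    # Uses only the row that apply passes in, so it returns a scalar.
    return row["A"] + row["B"] + row["C"]

wrong = df.apply(calculate_wrong, axis=1)  # Series per row -> DataFrame
right = df.apply(calculate_right, axis=1)  # scalar per row -> Series

print(type(wrong).__name__, wrong.shape)
print(type(right).__name__, right.shape)

# Assigning a multi-column DataFrame to one column is rejected by pandas.
try:
    df["total"] = wrong
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# The Series result fits into a single column.
df["total"] = right
print(df)
```

Checking the type of the apply result before assigning it is a quick way to tell which of the two cases you are in.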

Posted by huangapple on 2023-02-19 22:14:30
When reposting, please keep the link to this article: https://go.coder-hub.com/75500734.html