Vectorize addition of multiple pandas columns
# Question
I have a sample dataframe:

```
   val1  val2  val3  val4  total
0     1     2     3     4     50
1     5     6     4     8     65
2     2     3     6     5     74
```
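I would like to create some new columns based on simple addition and division of columns. I get a proportion/ratio each time, so I set up a quick function to do this:

```python
def vectorize(df, value_cols, total_col):
    return df[value_cols] / df[total_col]
```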
Now I would like to get the proportion of `val1` relative to `total`:

```python
total = 'total'
values = 'val1'
df['result'] = vectorize(df, values, total)
```

```
   val1  val2  val3  val4  total    result
0     1     2     3     4     50  0.020000
1     5     6     4     8     65  0.076923
2     2     3     6     5     74  0.027027
```

That looks great.
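But when I want to add multiple columns together before dividing by `total`, I run into problems. For example, I try to find the proportion of `val1 + val2` relative to `total`:

```python
total = 'total'
values = ['val1', 'val2']
df['result2'] = vectorize(df, values, total)
```

```
ValueError: Wrong number of items passed 5, placement implies 1
```

This doesn't work because my column names are now in a list: `vectorize` returns a dataframe of `NaN`s instead of a single column, and assigning that to `df['result2']` raises the error above.

Is there an easy way around this?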
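To see where the `5` in the error comes from: dividing a `DataFrame` by a `Series` aligns the `Series` index (the row labels `0, 1, 2`) against the `DataFrame`'s *columns* (`'val1'`, `'val2'`). No labels match, so the result has five all-`NaN` columns, and five columns cannot be stored in the single column `result2`. A minimal sketch, rebuilding the question's data from plain lists:

```python
import pandas as pd

# Same values as the question's reproducible df, built from plain lists.
df = pd.DataFrame({
    'val1': [1, 5, 2],
    'val2': [2, 6, 3],
    'val3': [3, 4, 6],
    'val4': [4, 8, 5],
    'total': [50, 65, 74],
})

# DataFrame / Series matches the Series index against the DataFrame columns,
# taking the union of both label sets: 'val1', 'val2', 0, 1, 2.
misaligned = df[['val1', 'val2']] / df['total']

print(misaligned.shape)               # 3 rows, 5 columns
print(misaligned.isna().all().all())  # no label appears on both sides
```

Dividing row by row with `df[['val1', 'val2']].div(df['total'], axis=0)` would avoid the `NaN`s, but it still yields two columns; the columns have to be summed first.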
I tried adding `sum()` to the function, but I don't get the correct results:

```python
return df[value_cols].sum() / df[total_col]
```
I also tried using `enumerate` to add the columns together before dividing by `total`:

```python
for i, col in enumerate(value_cols):
    sums = df[col] += df[col]
return sums / df[total_col]
```

But none of this works.
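The `enumerate` attempt cannot run as written: `sums = df[col] += df[col]` is a `SyntaxError`, because augmented assignment (`+=`) is a statement rather than an expression. Even split into two statements, it would double each column in place instead of adding different columns together. A minimal sketch of the loop idea that does work, using a hypothetical helper name `vectorize_loop` and a running total that starts at zero (`enumerate` is dropped because the index `i` was never used):

```python
import pandas as pd

df = pd.DataFrame({'val1': [1, 5, 2], 'val2': [2, 6, 3], 'total': [50, 65, 74]})

def vectorize_loop(df, value_cols, total_col):
    # Hypothetical helper; accept a single column name as well as a list.
    if isinstance(value_cols, str):
        value_cols = [value_cols]
    sums = 0
    for col in value_cols:
        # 0 + Series -> Series; Series + Series adds row by row.
        sums = sums + df[col]
    return sums / df[total_col]

single = vectorize_loop(df, 'val1', 'total')
combined = vectorize_loop(df, ['val1', 'val2'], 'total')
print(combined.round(6).tolist())
```

This works, but `df[cols].sum(axis=1)` performs the same row-wise addition in a single vectorized call.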
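I am basically trying to automate this in the function:

```python
(df['val1'] + df['val2']) / df['total']
```

but also allow for single columns, i.e. just `val1` alone. My real-world dataframe has hundreds of columns, and I would like to add one or multiple columns together like this. I could type it all out by hand, but I wanted to speed it up by writing a vectorized function.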
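The parentheses in that expression matter: in Python, `/` binds tighter than `+`, so `df['val1'] + df['val2'] / df['total']` computes `val1 + (val2 / total)`, not the proportion of `val1 + val2`. A quick check on the question's numbers:

```python
import pandas as pd

df = pd.DataFrame({'val1': [1, 5, 2], 'val2': [2, 6, 3], 'total': [50, 65, 74]})

# Without parentheses: val1 + (val2 / total)
without_parens = df['val1'] + df['val2'] / df['total']
# With parentheses: (val1 + val2) / total, the intended proportion
with_parens = (df['val1'] + df['val2']) / df['total']

print(without_parens.round(6).tolist())
print(with_parens.round(6).tolist())
```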
My df for reproducibility:

```python
import pandas as pd

idx = pd.RangeIndex(start=0, stop=3, step=1)
df = pd.DataFrame({
    'val1': pd.Series([1, 5, 2], dtype='int64', index=idx),
    'val2': pd.Series([2, 6, 3], dtype='int64', index=idx),
    'val3': pd.Series([3, 4, 6], dtype='int64', index=idx),
    'val4': pd.Series([4, 8, 5], dtype='int64', index=idx),
    'total': pd.Series([50, 65, 74], dtype='int64', index=idx),
}, index=idx)
```
# Answer 1
**Score**: 1
Here's one way to do it:

```python
def vectorize(df, value_cols, total_col):
    # for multiple columns: sum across each row first
    if isinstance(value_cols, list):
        return df[value_cols].sum(axis=1) / df[total_col]
    # for a single column
    return df[value_cols] / df[total_col]
```
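A usage sketch of this function on the question's data, alongside a hypothetical variant (`vectorize_any`, not part of the answer) that wraps a lone column name in a list. With that normalization, one `sum(axis=1)` path covers strings, lists, and tuples, whereas the `isinstance(value_cols, list)` check sends a tuple such as `('val1', 'val2')` down the single-column branch.

```python
import pandas as pd

df = pd.DataFrame({'val1': [1, 5, 2], 'val2': [2, 6, 3], 'total': [50, 65, 74]})

def vectorize(df, value_cols, total_col):
    # Answer 1's function: sum across columns only when given a list
    if isinstance(value_cols, list):
        return df[value_cols].sum(axis=1) / df[total_col]
    return df[value_cols] / df[total_col]

def vectorize_any(df, value_cols, total_col):
    # Hypothetical variant: wrap a lone column name in a list so that one
    # code path handles a string, list, tuple, or pandas Index.
    cols = [value_cols] if isinstance(value_cols, str) else list(value_cols)
    return df[cols].sum(axis=1) / df[total_col]

df['result'] = vectorize(df, 'val1', 'total')
df['result2'] = vectorize(df, ['val1', 'val2'], 'total')
df['result3'] = vectorize_any(df, ('val1', 'val2'), 'total')
print(df)
```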
# Answer 2
**Score**: 1
Just tweak your function:

```python
def vectorize(df, value_cols, total_col):
    if isinstance(value_cols, list):
        # add up each row's values, then divide element-wise by the total
        return df[value_cols].apply(sum, axis=1).div(df[total_col])
    else:
        return df[value_cols].div(df[total_col])
```

Output:

```
   val1  val2  val3  val4  total    result   result2
0     1     2     3     4     50  0.020000  0.060000
1     5     6     4     8     65  0.076923  0.169231
2     2     3     6     5     74  0.027027  0.067568
```
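This differs from Answer 1 only in how the row sums are computed: `apply(sum, axis=1)` calls Python's built-in `sum` once per row, while `sum(axis=1)` is pandas' own vectorized reduction. A small sketch checking that both give the same ratios:

```python
import pandas as pd

df = pd.DataFrame({'val1': [1, 5, 2], 'val2': [2, 6, 3], 'total': [50, 65, 74]})
cols = ['val1', 'val2']

# Python-level loop: builtin sum() is called on each row Series.
via_apply = df[cols].apply(sum, axis=1).div(df['total'])
# Vectorized reduction across the columns of each row.
via_sum = df[cols].sum(axis=1).div(df['total'])

print(via_apply.tolist() == via_sum.tolist())  # True
```

On a frame with many rows, the vectorized `sum(axis=1)` is likely to be noticeably faster, so it is probably the better default.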
# Answer 3
**Score**: 0
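Using `.sum()` was the right thing to do, but you need to specify the axis along which to add. By default (`axis=0`) it sums down each column and returns one total per column; `axis=1` sums across the columns within each row, which is what you need:

```python
return df[value_cols].sum(axis=1) / df[total_col]
```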
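A short sketch of the difference on the question's data. It also shows why the plain `sum()` attempt produced no usable values: the per-column totals are labelled `'val1'` and `'val2'`, so dividing them by `df['total']` (labelled `0, 1, 2`) matches nothing and gives `NaN` everywhere.

```python
import pandas as pd

df = pd.DataFrame({'val1': [1, 5, 2], 'val2': [2, 6, 3], 'total': [50, 65, 74]})
cols = ['val1', 'val2']

down_columns = df[cols].sum()       # axis=0 (default): one total per column
across_rows = df[cols].sum(axis=1)  # axis=1: one total per row

print(down_columns.tolist())  # [8, 11]
print(across_rows.tolist())   # [3, 11, 5]

# Per-column totals share no labels with the row index, so every entry is NaN.
print((down_columns / df['total']).isna().all())  # True
# Per-row totals line up with 'total' row by row.
print((across_rows / df['total']).round(6).tolist())
```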
# Answer 4
**Score**: 0
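Since you need to perform many divisions, possibly summing columns first, I would slightly modify the function. Pass it a `DataFrame`, your denominator column, and then a list of lists in which each sublist names the columns to add together before dividing:

```python
import pandas as pd

def sum_then_divide(df, total_col, numer_col_list):
    """
    df : pd.DataFrame
    total_col : str, denominator column
    numer_col_list : list of lists
        Sum all columns in each sublist before dividing
    """
    # One summed column per sublist, named e.g. 'val1+val2'
    u = pd.concat([df[cols].sum(axis=1).rename('+'.join(cols)) for cols in numer_col_list], axis=1)
    # axis=0 aligns the denominator with the rows
    return u.divide(df[total_col], axis=0)

sum_then_divide(df, 'total', [['val1'], ['val1', 'val2'], ['val1', 'val3', 'val4']])
#        val1  val1+val2  val1+val3+val4
# 0  0.020000   0.060000        0.160000
# 1  0.076923   0.169231        0.261538
# 2  0.027027   0.067568        0.175676
```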
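The function returns a new frame rather than modifying `df`. A minimal sketch of attaching the ratios back to the original columns; the `'_ratio'` suffix is an arbitrary choice made here. Some suffix is needed because a single-column sublist such as `['val1']` produces a result column named `'val1'`, and `DataFrame.join` raises a `ValueError` on overlapping column names when no suffix is given.

```python
import pandas as pd

df = pd.DataFrame({
    'val1': [1, 5, 2], 'val2': [2, 6, 3], 'val3': [3, 4, 6],
    'val4': [4, 8, 5], 'total': [50, 65, 74],
})

def sum_then_divide(df, total_col, numer_col_list):
    # Answer 4's function: one output column per sublist, named 'a+b+...'
    u = pd.concat([df[cols].sum(axis=1).rename('+'.join(cols))
                   for cols in numer_col_list], axis=1)
    return u.divide(df[total_col], axis=0)

ratios = sum_then_divide(df, 'total', [['val1'], ['val1', 'val2']])

# add_suffix renames every ratio column, so 'val1' becomes 'val1_ratio'
# and no longer clashes with the original 'val1' column.
df = df.join(ratios.add_suffix('_ratio'))
print(df.columns.tolist())
```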