January 3, 2020 23:31:24

Vectorize addition of multiple pandas columns

Question

I have a sample dataframe:

```python
   val1  val2  val3  val4  total
0     1     2     3     4     50
1     5     6     4     8     65
2     2     3     6     5     74
```

I would like to create some new columns based on simple addition and division of existing columns. I will be computing a proportion/ratio each time, so I set up a quick function to do this:

```python
def vectorize(df, value_cols, total_col):
    return df[value_cols] / df[total_col]
```

Now I would like to get the proportion of `val1` in relation to `total`:

```python
total = 'total'
values = 'val1'

df['result'] = vectorize(df, values, total)
```

```
   val1  val2  val3  val4  total    result
0     1     2     3     4     50  0.020000
1     5     6     4     8     65  0.076923
2     2     3     6     5     74  0.027027
```

That looks great.

But when I want to add multiple columns together before dividing by `total`, I run into problems. For example, I try to find the proportion of `val1 + val2` in relation to `total`:

```python
total = 'total'
values = ['val1', 'val2']

df['result2'] = vectorize(df, values, total)
```

```
ValueError: Wrong number of items passed 5, placement implies 1
```

This doesn't work because my column names are now in a list. It's returning a dataframe of `NaN`s.

Is there an easy way around this?

I tried adding `sum()` to the function, but I'm not getting the correct results:

```python
return df[value_cols].sum() / df[total_col]
```

I also tried using `enumerate` to add the columns together before dividing by `total`:

```python
for i, col in enumerate(value_cols):
    sums = df[col] += df[col]
    return sums / df[total_col]
```

But none of this is working.

I am basically trying to automate this in the function:

```python
(df['val1'] + df['val2']) / df['total']
```

But it should also allow for a single column, i.e. just `val1` alone. My real-world dataframe has hundreds of columns, and I would like to add one or multiple columns together like this. I could type it all out by hand, but I wanted to try to speed it up by creating a vectorized function.

My df for reproducibility:

```python
import pandas as pd

df = pd.DataFrame({
    'val1': [1, 5, 2],
    'val2': [2, 6, 3],
    'val3': [3, 4, 6],
    'val4': [4, 8, 5],
    'total': [50, 65, 74],
})
```
Answer 1

Score: 1

Here's one way to do it:

```python
def vectorize(df, value_cols, total_col):
    # for multiple columns: sum across each row, then divide
    if isinstance(value_cols, list):
        return df[value_cols].sum(axis=1) / df[total_col]
    # for a single column
    return df[value_cols] / df[total_col]
```
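
The `isinstance(value_cols, list)` check only recognizes a literal list, so a tuple or a pandas `Index` of column names would take the single-column branch and not produce the intended ratio. A minimal sketch of one way around that, assuming it is acceptable to wrap a lone column name in a list first; the name `proportion` is hypothetical and not from the answer:

```python
import pandas as pd

def proportion(df, value_cols, total_col):
    # Assumption: a bare string names one column; anything else is an
    # iterable of column names (list, tuple, pd.Index, ...).
    if isinstance(value_cols, str):
        value_cols = [value_cols]
    # Sum the selected columns across each row, then divide element-wise.
    return df[list(value_cols)].sum(axis=1) / df[total_col]

df = pd.DataFrame({
    'val1': [1, 5, 2],
    'val2': [2, 6, 3],
    'total': [50, 65, 74],
})

single = proportion(df, 'val1', 'total')          # val1 / total
pair = proportion(df, ('val1', 'val2'), 'total')  # (val1 + val2) / total
```

With this shape there is only one code path, so a single column and several columns are handled the same way.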

Answer 2

Score: 1

Just tweak your function:

```python
def vectorize(df, value_cols, total_col):
    if isinstance(value_cols, list):
        # sum the listed columns row by row, then divide by the total
        return df[value_cols].apply(sum, axis=1).div(df[total_col])
    else:
        return df[value_cols].div(df[total_col])
```

Output:

```
   val1  val2  val3  val4  total    result   result2
0     1     2     3     4     50  0.020000  0.060000
1     5     6     4     8     65  0.076923  0.169231
2     2     3     6     5     74  0.027027  0.067568
```
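
`apply(sum, axis=1)` passes each row to the function, which is typically slower on wide or long frames than the built-in reduction `DataFrame.sum(axis=1)`. A small check, using the question's sample data, that the two agree here. This is only a sketch: the results can differ when `NaN` is present, because `DataFrame.sum` skips `NaN` by default while Python's `sum` does not.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'val1': [1, 5, 2],
    'val2': [2, 6, 3],
    'total': [50, 65, 74],
})
cols = ['val1', 'val2']

# Row-by-row sum via apply, as in the answer above.
via_apply = df[cols].apply(sum, axis=1).div(df['total'])
# The same reduction done by pandas in a single call.
via_sum = df[cols].sum(axis=1).div(df['total'])

same = np.allclose(via_apply, via_sum)
```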

Answer 3

Score: 0

Using `.sum()` was the right idea, but you need to specify the axis along which to add. By default (`axis=0`) it sums down the rows, producing one total per column. You want `axis=1`, which sums across the columns and produces one total per row. This is what you need:

```python
return df[value_cols].sum(axis=1) / df[total_col]
```
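
To see what the axis changes, the sketch below compares both reductions on the question's sample data. It also reproduces the shape behind the question's error: dividing a DataFrame by a Series matches the Series index against the DataFrame's column labels, so nothing lines up. The variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'val1': [1, 5, 2],
    'val2': [2, 6, 3],
    'total': [50, 65, 74],
})
cols = ['val1', 'val2']

col_totals = df[cols].sum()        # axis=0: indexed by 'val1', 'val2'
row_totals = df[cols].sum(axis=1)  # axis=1: indexed by 0, 1, 2

# Labels 'val1'/'val2' never match 0/1/2, so every entry is NaN.
misaligned = col_totals / df['total']

# DataFrame / Series aligns the Series index with the column labels,
# giving columns val1, val2, 0, 1, 2 (five in total), all NaN.
frame_ratio = df[cols] / df['total']

# Row totals share the index of df['total'], so this divides row by row.
ratio = row_totals / df['total']
```

The five all-`NaN` columns of `frame_ratio` are consistent with the "Wrong number of items passed 5" message in the question.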

Answer 4

Score: 0

Since you need to perform many divisions, with the possibility of summing columns beforehand, I would slightly modify the function. Pass it a `DataFrame`, your denominator column, and a list of lists in which each sublist names the columns to add together.

```python
import pandas as pd

def sum_then_divide(df, total_col, numer_col_list):
    """
    df : pd.DataFrame
    total_col : str, denominator
    numer_col_list: list of lists
        Sum all columns in each sublist before dividing
    """
    u = pd.concat([df[cols].sum(axis=1).rename('+'.join(cols)) for cols in numer_col_list], axis=1)
    return u.divide(df[total_col], axis=0)

sum_then_divide(df, 'total', [['val1'], ['val1', 'val2'], ['val1', 'val3', 'val4']])
#       val1  val1+val2  val1+val3+val4
#0  0.020000   0.060000        0.160000
#1  0.076923   0.169231        0.261538
#2  0.027027   0.067568        0.175676
```
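
Because `sum_then_divide` returns a DataFrame that shares the input's index, its columns can be attached back onto the original frame. A short sketch, with the function repeated so the block runs on its own; the `'/total'` suffix is an assumption chosen here to keep the new column names from colliding with existing ones such as `val1`:

```python
import pandas as pd

def sum_then_divide(df, total_col, numer_col_list):
    # Sum each group of columns row-wise, label it "a+b", then divide.
    u = pd.concat(
        [df[cols].sum(axis=1).rename('+'.join(cols)) for cols in numer_col_list],
        axis=1,
    )
    return u.divide(df[total_col], axis=0)

df = pd.DataFrame({
    'val1': [1, 5, 2],
    'val2': [2, 6, 3],
    'val3': [3, 4, 6],
    'val4': [4, 8, 5],
    'total': [50, 65, 74],
})

ratios = sum_then_divide(df, 'total', [['val1', 'val2'], ['val3', 'val4']])
# Suffix the ratio columns, then join on the shared row index.
out = df.join(ratios.add_suffix('/total'))
```

`DataFrame.join` raises on overlapping column names unless suffixes are given, which is why the ratio columns are renamed before joining.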
