June 12, 2023 16:20:26

Conditional math operations with columns in a pandas dataframe

Question

I have a bunch of columns in my dataframe with different values, as seen in this sample:

Especies  Especies_0  Especies_1  Especies_2  Especies_3
2.20        3.44        1.90        1.24        0.00
2.20        3.04        2.55        0.00        0.00
1.88        2.19        0.00        0.00        0.00
2.20        3.44        2.28        2.55        0.00
3.44        2.20        0.00        0.00        0.00
2.20        2.58        0.00        0.00        0.00
1.88        2.19        0.00        0.00        0.00
3.44        1.91        3.04        1.83        3.98
3.44        2.20        0.00        0.00        0.00
2.20        2.55        1.90        0.00        0.00
1.88        2.20        0.00        0.00        0.00

The operation I want to perform is:

avg(abs(max - col) for col in cols)

where max is the maximum value across the columns in each row (for example, in the first row max is 3.44, and cols are the remaining values in that row), abs is the absolute value function, and avg means taking the average.

For example, for the first row, the operation would be: ((3.44-2.20)+(3.44-1.90)+(3.44-1.24))/3 = 1.66

and for the 5th row, with values (3.44, 2.20, 0.00, 0.00, 0.00), the result would be: (3.44 - 2.20) / 1 = 1.24

This is simple enough, but there's a catch: I don't want to consider the column holding the max value, or any columns containing 0.0. Note that the column holding the max value changes from row to row, and so does the number of columns containing 0.0.
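To double-check the two worked examples against the rule just stated, here is a minimal plain-Python sketch. `row_score` is a hypothetical helper name; like the question's own function, it drops every value equal to the maximum (so tied maxima are all excluded), not just one column.

```python
# Hypothetical helper following the stated rule:
# drop the row maximum (and any ties with it) and every 0.0, then average (max - value).
def row_score(values):
    top = max(values)
    kept = [v for v in values if v != 0.0 and v != top]
    return sum(top - v for v in kept) / len(kept)

print(round(row_score([2.20, 3.44, 1.90, 1.24, 0.00]), 2))  # first row -> 1.66
print(round(row_score([3.44, 2.20, 0.00, 0.00, 0.00]), 2))  # fifth row -> 1.24
```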

I have managed to do it with single scalar values; I even wrote a function that does it:

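def ele_diff(esp0, esp1, esp2, esp3, esp4):
    species = [esp0, esp1, esp2, esp3, esp4]
    top = max(species)
    # keep only values that are neither 0.0 nor the row maximum
    diff = [top - spec for spec in species if spec != 0.0 and spec != top]
    # avoid ZeroDivisionError when nothing is left (e.g. only one non-zero value)
    return sum(diff) / len(diff) if diff else float('nan')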

But I'm not able to apply my function to the dataframe. I've tried df.apply() and df.applymap(), but neither seems to work with my function: applymap only takes one input and produces one output, while apply does not feed the function each row separately, so the function raises a ValueError because the truth value of a Series is ambiguous.
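With the default `axis=0`, `DataFrame.apply` passes whole columns, which is what triggers the ambiguous-truth-value error. A minimal sketch, assuming `df` holds only the five species columns in the order shown: `axis=1` passes one row at a time, and `*row` unpacks that row's values into the function's five parameters. The `ele_diff` below is a copy of the question's function, rewritten with `max` and a guard for rows with nothing left to average.

```python
import pandas as pd

def ele_diff(esp0, esp1, esp2, esp3, esp4):
    species = [esp0, esp1, esp2, esp3, esp4]
    top = max(species)
    diff = [top - spec for spec in species if spec != 0.0 and spec != top]
    return sum(diff) / len(diff) if diff else float("nan")

# Two rows from the sample data (the first and the fifth)
df = pd.DataFrame(
    [[2.20, 3.44, 1.90, 1.24, 0.00],
     [3.44, 2.20, 0.00, 0.00, 0.00]],
    columns=["Especies", "Especies_0", "Especies_1", "Especies_2", "Especies_3"],
)

# axis=1: the lambda receives one row (a Series) per call;
# *row unpacks its five values into ele_diff's positional parameters.
df["new"] = df.apply(lambda row: ele_diff(*row), axis=1)
print(df["new"].round(2).tolist())  # [1.66, 1.24]
```

If the frame has other columns, select the five species columns first, otherwise `*row` passes too many arguments. This still calls Python once per row, so it is slower than a vectorized approach on large frames.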

I've also tried to do it directly on the dataframe, but because the logic is fairly complex, I haven't been able to come up with a solution.

The main problem I've faced seems to be checking that the values I'm going to subtract are neither 0.0 nor the maximum.

Answer 1

Score: 0

First, keep only the needed columns with DataFrame.filter (those whose names start with Especies), replace the row maximum and the 0 values with missing values, subtract the maximum, take the absolute value, and finally compute the row average:

df1 = df.filter(regex='^Especies')

# or, if necessary, specify the column names in a list
#cols = ['Especies', 'Especies_0', 'Especies_1', 'Especies_2', 'Especies_3']
#df1 = df[cols]

max1 = df1.max(axis=1)
mask = df1.ne(max1, axis=0) & df1.ne(0)
df['new'] = df1.where(mask).sub(max1, axis=0).abs().mean(axis=1)
print(df)
    Especies  Especies_0  Especies_1  Especies_2  Especies_3       new
0       2.20        3.44        1.90        1.24        0.00  1.660000
1       2.20        3.04        2.55        0.00        0.00  0.665000
2       1.88        2.19        0.00        0.00        0.00  0.310000
3       2.20        3.44        2.28        2.55        0.00  1.096667
4       3.44        2.20        0.00        0.00        0.00  1.240000
5       2.20        2.58        0.00        0.00        0.00  0.380000
6       1.88        2.19        0.00        0.00        0.00  0.310000
7       3.44        1.91        3.04        1.83        3.98  1.425000
8       3.44        2.20        0.00        0.00        0.00  1.240000
9       2.20        2.55        1.90        0.00        0.00  0.500000
10      1.88        2.20        0.00        0.00        0.00  0.320000
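The snippet above assumes `df` already exists. A minimal, self-contained sketch, assuming the question's sample table typed in by hand, rebuilds it, runs the same vectorized chain, and cross-checks the result against a slower row-by-row computation of the same rule. `row_rule` is a hypothetical helper name, not part of the answer.

```python
import numpy as np
import pandas as pd

# The question's sample data
df = pd.DataFrame(
    {
        "Especies":   [2.20, 2.20, 1.88, 2.20, 3.44, 2.20, 1.88, 3.44, 3.44, 2.20, 1.88],
        "Especies_0": [3.44, 3.04, 2.19, 3.44, 2.20, 2.58, 2.19, 1.91, 2.20, 2.55, 2.20],
        "Especies_1": [1.90, 2.55, 0.00, 2.28, 0.00, 0.00, 0.00, 3.04, 0.00, 1.90, 0.00],
        "Especies_2": [1.24, 0.00, 0.00, 2.55, 0.00, 0.00, 0.00, 1.83, 0.00, 0.00, 0.00],
        "Especies_3": [0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 3.98, 0.00, 0.00, 0.00],
    }
)

# Vectorized version, as in the answer
df1 = df.filter(regex="^Especies")
max1 = df1.max(axis=1)
mask = df1.ne(max1, axis=0) & df1.ne(0)  # True where a cell is neither the max nor 0.0
df["new"] = df1.where(mask).sub(max1, axis=0).abs().mean(axis=1)

# Hypothetical row-by-row reference: drop 0.0 and the row maximum, average the gaps
def row_rule(row):
    top = row.max()
    kept = row[(row != 0) & (row != top)]
    return (top - kept).mean()

check = df1.apply(row_rule, axis=1)
print(np.allclose(df["new"], check))  # True
print(df["new"].round(3).tolist())
```

The row-by-row function is only a reference for testing; the `where`/`sub`/`mean` chain avoids calling Python once per row.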

Details:

print (df1.where(mask))
    Especies  Especies_0  Especies_1  Especies_2  Especies_3
0       2.20         NaN        1.90        1.24         NaN
1       2.20         NaN        2.55         NaN         NaN
2       1.88         NaN         NaN         NaN         NaN
3       2.20         NaN        2.28        2.55         NaN
4        NaN        2.20         NaN         NaN         NaN
5       2.20         NaN         NaN         NaN         NaN
6       1.88         NaN         NaN         NaN         NaN
7       3.44        1.91        3.04        1.83         NaN
8        NaN        2.20         NaN         NaN         NaN
9       2.20         NaN        1.90         NaN         NaN
10      1.88         NaN         NaN         NaN         NaN

print (df1.where(mask).sub(max1, axis=0))
    Especies  Especies_0  Especies_1  Especies_2  Especies_3
0      -1.24         NaN       -1.54       -2.20         NaN
1      -0.84         NaN       -0.49         NaN         NaN
2      -0.31         NaN         NaN         NaN         NaN
3      -1.24         NaN       -1.16       -0.89         NaN
4        NaN       -1.24         NaN         NaN         NaN
5      -0.38         NaN         NaN         NaN         NaN
6      -0.31         NaN         NaN         NaN         NaN
7      -0.54       -2.07       -0.94       -2.15         NaN
8        NaN       -1.24         NaN         NaN         NaN
9      -0.35         NaN       -0.65         NaN         NaN
10     -0.32         NaN         NaN         NaN         NaN
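The question doesn't say what to return for a row where nothing survives the mask. A small sketch with made-up rows (not from the question's data) shows what the same chain does in two edge cases: `mean(axis=1)` skips NaN by default, so an all-NaN row comes out as NaN, and `ne(max1, axis=0)` drops every cell tied with the maximum, matching the `spec != species[-1]` test in the question's function. Filling with 0 afterwards is only one possible choice, assumed here for illustration.

```python
import pandas as pd

# Made-up rows to probe edge cases (not from the question's data)
df1 = pd.DataFrame(
    [[2.50, 0.00, 0.00],   # only the maximum is non-zero
     [3.00, 3.00, 1.00]],  # the maximum appears twice
    columns=["Especies", "Especies_0", "Especies_1"],
)

max1 = df1.max(axis=1)
mask = df1.ne(max1, axis=0) & df1.ne(0)
result = df1.where(mask).sub(max1, axis=0).abs().mean(axis=1)
print(result.tolist())  # [nan, 2.0]

# If a default is preferred over NaN for rows with nothing left, fill it explicitly
print(result.fillna(0).tolist())  # [0.0, 2.0]
```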
