2023年3月7日 04:30:17go评论169阅读模式

英文:

calculating values in python using formula for a dataframe

问题

我有一个包含以下数据的Excel表格，这里我们需要根据其他列的值计算2列，但我得到了NaN值。

在Excel中，我使用以下公式：

f_organic = D2 = C2*(1+E1)
这意味着第x周的f_organic = 第x周的p_organic * (1 + (x-1)周的pct_diff)

而pct_diff是这样给出的：

pct_diff = D2/C2-1
pct_diff = 第2行的f_organic / 第2行的p_organic - 1

但当我尝试通过Python来实现时，我失败了，得到了空值。

我的代码如下：

# 填充f_organic和pct_diff列中的空值
for i in range(1, len(df)):
    df.at[i, 'pct_diff'] = df.at[i, 'f_organic'] / df.at[i, 'p_organic'] - 1
    df.at[i, 'f_organic'] = df.at[i, 'p_organic'] * (1 + df.at[i-1, 'pct_diff'])
    
# 打印最终的数据框
df

欢迎任何帮助或建议

英文:

I have an excel sheet with the data below, here we need to calculate 2 columns based on other column values but I am getting NaN values

import pandas as pd
# Define the initial data
data = {&#39;statMonthName&#39;: [&#39;Mar&#39;, &#39;Mar&#39;, &#39;Mar&#39;, &#39;Mar&#39;, &#39;Apr&#39;, &#39;Apr&#39;, &#39;Apr&#39;, &#39;Apr&#39;, &#39;Apr&#39;],
        &#39;statWeek&#39;: [1, 2, 3, 4, 5, 6, 7, 8, 9],
        &#39;p_organic&#39;: [3646049.56, 3867696.284, 4051128.056, 4095508.5, 2778538.164, 2789640.51, 2736064.373, 3105200.772, 3112694.166],
        &#39;f_organic&#39;: [2289567, None, None, None, None, None, None, None, None],
        &#39;pct_diff&#39;: [None, None, None, None, None, None, None, None, None]}
# Create a Pandas dataframe to store the data
df = pd.DataFrame(data)
# Print the dataframe
print(df)

in excel I am using the formula

f_organic = D2 =C2*(1+E1)
which means f_organic of x week  = p_organic of x week * (1+ pct_diff of (x-1) week)

and pct_diff is given as

pct_diff = =D2/C2-1
pct_diff = f_organic row2/ p_organic row 2 - 1

when I am trying to achieve this via python, I am failing and getting only null values

my code is below

# fill null values in f_organic and pct_diff columns
for i in range(1, len(df)):
    df.at[i, &#39;pct_diff&#39;] = df.at[i, &#39;f_organic&#39;] / df.at[i, &#39;p_organic&#39;] - 1
    df.at[i, &#39;f_organic&#39;] = df.at[i, &#39;p_organic&#39;] * (1 + df.at[i-1, &#39;pct_diff&#39;])
    
# print final dataframe
df

any help or suggestion is appreciated

答案1

得分: 1

你的DataFrame：

  statMonthName  statWeek    p_organic  f_organic pct_diff
0           Mar         1  3646049.560  2289567.0     None
1           Mar         2  3867696.284        NaN     None
2           Mar         3  4051128.056        NaN     None
3           Mar         4  4095508.500        NaN     None
4           Apr         5  2778538.164        NaN     None
5           Apr         6  2789640.510        NaN     None
6           Apr         7  2736064.373        NaN     None
7           Apr         8  3105200.772        NaN     None
8           Apr         9  3112694.166        NaN     None

你的公式（数组表示法）：

f_organic[i] = p_organic[i] * (1 + pct_diff[i-1])
pct_diff[i]  = (f_organic[i] / p_organic[i]) - 1

正确填充缺失值

从你的代码尝试填充缺失值来看，问题不在于公式，而在于for循环的构建方式。

因为它从索引i=1（第二行）开始，它会错过pct_diff[0]（第一行）的值。因此，所有计算都涉及到空值，结果只会是NaN。

解决方案是首先在循环之外添加一行，用于填充pct_diff[0]的值，然后重新排列循环中的代码以反映这一点：

# 填充第一行的缺失值
df.at[0, 'pct_diff'] = df.at[0, 'f_organic'] / df.at[0, 'p_organic'] - 1
# 填充剩余的f_organic和pct_diff值
for i in range(1, len(df)):
    # f_organic[i] = p_organic[i] * (1 + pct_diff[i-1])
    df.at[i, 'f_organic'] = df.at[i, 'p_organic'] * (1 + df.at[i-1, 'pct_diff'])
    
    # pct_diff[i] = (f_organic[i] / p_organic[i]) - 1
    df.at[i, 'pct_diff'] = df.at[i, 'f_organic'] / df.at[i, 'p_organic'] - 1
print(df)

输出：

  statMonthName  statWeek    p_organic     f_organic  pct_diff
0           Mar         1  3646049.560  2.289567e+06 -0.372042
1           Mar         2  3867696.284  2.428752e+06 -0.372042
2           Mar         3  4051128.056  2.543939e+06 -0.372042
3           Mar         4  4095508.500  2.571808e+06 -0.372042
4           Apr         5  2778538.164  1.744806e+06 -0.372042
5           Apr         6  2789640.510  1.751778e+06 -0.372042
6           Apr         7  2736064.373  1.718134e+06 -0.372042
7           Apr         8  3105200.772  1.949937e+06 -0.372042
8           Apr         9  3112694.166  1.954642e+06 -0.372042

英文:

Your DataFrame:

  statMonthName  statWeek    p_organic  f_organic pct_diff
0           Mar         1  3646049.560  2289567.0     None
1           Mar         2  3867696.284        NaN     None
2           Mar         3  4051128.056        NaN     None
3           Mar         4  4095508.500        NaN     None
4           Apr         5  2778538.164        NaN     None
5           Apr         6  2789640.510        NaN     None
6           Apr         7  2736064.373        NaN     None
7           Apr         8  3105200.772        NaN     None
8           Apr         9  3112694.166        NaN     None

Your formula (in array notation):

f_organic[i] = p_organic[i] * (1 + pct_diff[i-1])
pct_diff[i]  = (f_organic[i] / p_organic[i]) - 1

Correctly filling in the missing values

From your code that tries to fill in the missing values, the problem lies not in the formula, but in the way the for loop was constructed.

Since it starts with index i=1 (the second row), it misses the value of pct_diff[0] (the first row). So all the calculations will involve null values, resulting only in NaNs.

A solution would be to first add a line outside the loop that fills in pct_diff[0], and then reorder the code in the loop to reflect this:

# Fill in the missing value of pct_diff on the first row.
df.at[0, &#39;pct_diff&#39;] = df.at[0, &#39;f_organic&#39;] / df.at[0, &#39;p_organic&#39;] - 1
# Fill in the remaining values of f_organic and pct_diff.
for i in range(1, len(df)):
    # f_organic[i] = p_organic[i] * (1 + pct_diff[i-1])
    df.at[i, &#39;f_organic&#39;] = df.at[i, &#39;p_organic&#39;] * (1 + df.at[i-1, &#39;pct_diff&#39;])
    
    # pct_diff[i] = (f_organic[i] / p_organic[i]) - 1
    df.at[i, &#39;pct_diff&#39;] = df.at[i, &#39;f_organic&#39;] / df.at[i, &#39;p_organic&#39;] - 1
print(df)

Output:

  statMonthName  statWeek    p_organic     f_organic  pct_diff
0           Mar         1  3646049.560  2.289567e+06 -0.372042
1           Mar         2  3867696.284  2.428752e+06 -0.372042
2           Mar         3  4051128.056  2.543939e+06 -0.372042
3           Mar         4  4095508.500  2.571808e+06 -0.372042
4           Apr         5  2778538.164  1.744806e+06 -0.372042
5           Apr         6  2789640.510  1.751778e+06 -0.372042
6           Apr         7  2736064.373  1.718134e+06 -0.372042
7           Apr         8  3105200.772  1.949937e+06 -0.372042
8           Apr         9  3112694.166  1.954642e+06 -0.372042

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

在Python中使用DataFrame计算数值的公式。

问题

答案1

正确填充缺失值

Correctly filling in the missing values

ModuleNotFoundError: No module named 'XYZ', but when I prtint sys.path I can see the absolute path to folder I want to import

当连接两个数据框时，会添加一个额外的行。

如何使用Python中的Playwright查找包含特定单词的多个ID？

使用单词距离校正列内的拼写错误

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。