在Python中使用DataFrame计算数值的公式。

huangapple go评论147阅读模式
英文:

calculating values in python using formula for a dataframe

问题

我有一个包含以下数据的Excel表格,这里我们需要根据其他列的值计算2列,但我得到了NaN值。

在Excel中,我使用以下公式:

f_organic = D2 = C2*(1+E1)
这意味着第x周的f_organic = 第x周的p_organic * (1 + (x-1)周的pct_diff)

而pct_diff是这样给出的:

pct_diff = D2/C2-1
pct_diff = 第2行的f_organic / 第2行的p_organic - 1

但当我尝试通过Python来实现时,我失败了,得到了空值。

我的代码如下:

# 填充f_organic和pct_diff列中的空值
for i in range(1, len(df)):
    df.at[i, 'pct_diff'] = df.at[i, 'f_organic'] / df.at[i, 'p_organic'] - 1
    df.at[i, 'f_organic'] = df.at[i, 'p_organic'] * (1 + df.at[i-1, 'pct_diff'])
    
# 打印最终的数据框
df

欢迎任何帮助或建议 在Python中使用DataFrame计算数值的公式。

英文:

I have an excel sheet with the data below, here we need to calculate 2 columns based on other column values but I am getting NaN values

import pandas as pd

# Define the initial data
data = {'statMonthName': ['Mar', 'Mar', 'Mar', 'Mar', 'Apr', 'Apr', 'Apr', 'Apr', 'Apr'],
        'statWeek': [1, 2, 3, 4, 5, 6, 7, 8, 9],
        'p_organic': [3646049.56, 3867696.284, 4051128.056, 4095508.5, 2778538.164, 2789640.51, 2736064.373, 3105200.772, 3112694.166],
        'f_organic': [2289567, None, None, None, None, None, None, None, None],
        'pct_diff': [None, None, None, None, None, None, None, None, None]}

# Create a Pandas dataframe to store the data
df = pd.DataFrame(data)

# Print the dataframe
print(df)

in excel I am using the formula

f_organic = D2 =C2*(1+E1)
which means f_organic of x week  = p_organic of x week * (1+ pct_diff of (x-1) week)

and pct_diff is given as

pct_diff = =D2/C2-1
pct_diff = f_organic row2/ p_organic row 2 - 1

when I am trying to achieve this via python, I am failing and getting only null values

my code is below

# fill null values in f_organic and pct_diff columns

for i in range(1, len(df)):
    df.at[i, 'pct_diff'] = df.at[i, 'f_organic'] / df.at[i, 'p_organic'] - 1
    df.at[i, 'f_organic'] = df.at[i, 'p_organic'] * (1 + df.at[i-1, 'pct_diff'])
    

# print final dataframe
df

any help or suggestion is appreciated 在Python中使用DataFrame计算数值的公式。

答案1

得分: 1

你的DataFrame:

  statMonthName  statWeek    p_organic  f_organic pct_diff
0           Mar         1  3646049.560  2289567.0     None
1           Mar         2  3867696.284        NaN     None
2           Mar         3  4051128.056        NaN     None
3           Mar         4  4095508.500        NaN     None
4           Apr         5  2778538.164        NaN     None
5           Apr         6  2789640.510        NaN     None
6           Apr         7  2736064.373        NaN     None
7           Apr         8  3105200.772        NaN     None
8           Apr         9  3112694.166        NaN     None

你的公式(数组表示法):

f_organic[i] = p_organic[i] * (1 + pct_diff[i-1])

pct_diff[i]  = (f_organic[i] / p_organic[i]) - 1

正确填充缺失值

从你的代码尝试填充缺失值来看,问题不在于公式,而在于for循环的构建方式。

因为它从索引i=1(第二行)开始,它会错过pct_diff[0](第一行)的值。因此,所有计算都涉及到空值,结果只会是NaN

解决方案是首先在循环之外添加一行,用于填充pct_diff[0]的值,然后重新排列循环中的代码以反映这一点:

# 填充第一行的缺失值
df.at[0, 'pct_diff'] = df.at[0, 'f_organic'] / df.at[0, 'p_organic'] - 1

# 填充剩余的f_organic和pct_diff值
for i in range(1, len(df)):

    # f_organic[i] = p_organic[i] * (1 + pct_diff[i-1])
    df.at[i, 'f_organic'] = df.at[i, 'p_organic'] * (1 + df.at[i-1, 'pct_diff'])
    
    # pct_diff[i] = (f_organic[i] / p_organic[i]) - 1
    df.at[i, 'pct_diff'] = df.at[i, 'f_organic'] / df.at[i, 'p_organic'] - 1


print(df)

输出:

  statMonthName  statWeek    p_organic     f_organic  pct_diff
0           Mar         1  3646049.560  2.289567e+06 -0.372042
1           Mar         2  3867696.284  2.428752e+06 -0.372042
2           Mar         3  4051128.056  2.543939e+06 -0.372042
3           Mar         4  4095508.500  2.571808e+06 -0.372042
4           Apr         5  2778538.164  1.744806e+06 -0.372042
5           Apr         6  2789640.510  1.751778e+06 -0.372042
6           Apr         7  2736064.373  1.718134e+06 -0.372042
7           Apr         8  3105200.772  1.949937e+06 -0.372042
8           Apr         9  3112694.166  1.954642e+06 -0.372042
英文:

Your DataFrame:

  statMonthName  statWeek    p_organic  f_organic pct_diff
0           Mar         1  3646049.560  2289567.0     None
1           Mar         2  3867696.284        NaN     None
2           Mar         3  4051128.056        NaN     None
3           Mar         4  4095508.500        NaN     None
4           Apr         5  2778538.164        NaN     None
5           Apr         6  2789640.510        NaN     None
6           Apr         7  2736064.373        NaN     None
7           Apr         8  3105200.772        NaN     None
8           Apr         9  3112694.166        NaN     None

Your formula (in array notation):

f_organic[i] = p_organic[i] * (1 + pct_diff[i-1])

pct_diff[i]  = (f_organic[i] / p_organic[i]) - 1

Correctly filling in the missing values

From your code that tries to fill in the missing values, the problem lies not in the formula, but in the way the for loop was constructed.

Since it starts with index i=1 (the second row), it misses the value of pct_diff[0] (the first row). So all the calculations will involve null values, resulting only in NaNs.

A solution would be to first add a line outside the loop that fills in pct_diff[0], and then reorder the code in the loop to reflect this:

# Fill in the missing value of pct_diff on the first row.
df.at[0, 'pct_diff'] = df.at[0, 'f_organic'] / df.at[0, 'p_organic'] - 1

# Fill in the remaining values of f_organic and pct_diff.
for i in range(1, len(df)):

    # f_organic[i] = p_organic[i] * (1 + pct_diff[i-1])
    df.at[i, 'f_organic'] = df.at[i, 'p_organic'] * (1 + df.at[i-1, 'pct_diff'])
    
    # pct_diff[i] = (f_organic[i] / p_organic[i]) - 1
    df.at[i, 'pct_diff'] = df.at[i, 'f_organic'] / df.at[i, 'p_organic'] - 1


print(df)

Output:

  statMonthName  statWeek    p_organic     f_organic  pct_diff
0           Mar         1  3646049.560  2.289567e+06 -0.372042
1           Mar         2  3867696.284  2.428752e+06 -0.372042
2           Mar         3  4051128.056  2.543939e+06 -0.372042
3           Mar         4  4095508.500  2.571808e+06 -0.372042
4           Apr         5  2778538.164  1.744806e+06 -0.372042
5           Apr         6  2789640.510  1.751778e+06 -0.372042
6           Apr         7  2736064.373  1.718134e+06 -0.372042
7           Apr         8  3105200.772  1.949937e+06 -0.372042
8           Apr         9  3112694.166  1.954642e+06 -0.372042

huangapple
  • 本文由 发表于 2023年3月7日 04:30:17
  • 转载请务必保留本文链接:https://go.coder-hub.com/75655544.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定