Conditional shifting pandas column

Question

I want to conditionally shift a pandas column: the price values from the rows where i > 6 should be moved up by three rows into a new column. Below is what I am doing, and it is not working:

```python
import pandas as pd
import numpy as np

# Creating data for 'i' and 'price' columns
n = 10  # Number of entries
i_values = list(range(1, n+1))
price_values = [10.99, 19.99, 5.99, 8.49, 12.99, 15.99, 9.99, 14.99, 6.99, 11.99]

# Creating DataFrame
data = {'i': i_values,
        'price': price_values}
df = pd.DataFrame(data)
df['price_new'] = df.loc[df.i > 6, 'price'].shift(-3)
```

Expected output:

```python
import numpy as np
import pandas as pd

n = 10  # Number of entries
i_values = list(range(1, n+1))
price_values = [10.99, 19.99, 5.99, 8.49, 12.99, 15.99, 9.99, 14.99, 6.99, 11.99]
new_price_values = [np.nan, np.nan, np.nan, 9.99, 14.99, 6.99, 11.99, np.nan, np.nan, np.nan]

# Creating DataFrame
data = {'i': i_values,
        'price': price_values,
        'new_price': new_price_values}
df = pd.DataFrame(data)
```

Answer 1

Score: 1

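Apply the shift first, then select the cells you want to keep. It looks like you are trying to do both at once and getting the indices wrong in the process: if you select only the rows with i > 6 and then shift, the values have no room to move up within that slice, so the selection has to start three rows earlier (i > 3).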

The one-liner you are looking for:

```python
shift_from = 6
shift_by = -3
# Select from i > 3 onward so the shifted values still land inside the slice
df['price_new'] = df.loc[df.i > (shift_from + shift_by), 'price'].shift(shift_by)
```

This produces exactly your expected output.
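A short sketch to check that claim: it rebuilds the question's frame (using `np.nan`, since the `np.NaN` alias was removed in NumPy 2.0), runs the one-liner, and compares the result with the expected list using `pd.testing.assert_series_equal`, which treats NaNs in matching positions as equal.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question.
df = pd.DataFrame({
    'i': range(1, 11),
    'price': [10.99, 19.99, 5.99, 8.49, 12.99, 15.99, 9.99, 14.99, 6.99, 11.99],
})

shift_from = 6
shift_by = -3

# The one-liner: keep rows with i > 3, then shift that slice up by 3 positions.
df['price_new'] = df.loc[df.i > (shift_from + shift_by), 'price'].shift(shift_by)

# Expected column from the question.
expected = pd.Series([np.nan, np.nan, np.nan, 9.99, 14.99, 6.99, 11.99,
                      np.nan, np.nan, np.nan])

# Raises AssertionError if values, NaN positions, dtype or index differ
# (the Series names 'price_new' vs None are ignored).
pd.testing.assert_series_equal(df['price_new'], expected, check_names=False)
print("matches expected output")
```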

Decomposed into two steps for clarity, using a throwaway intermediate column:

1) Apply shift

```python
df['price_shift'] = df['price'].shift(shift_by)
df
```

```
    i  price  price_shift
0   1  10.99         8.49
1   2  19.99        12.99
2   3   5.99        15.99
3   4   8.49         9.99
4   5  12.99        14.99
5   6  15.99         6.99
6   7   9.99        11.99
7   8  14.99          NaN
8   9   6.99          NaN
9  10  11.99          NaN
```

2) Select cells

```python
df['price_new'] = df.loc[df.i > (shift_from + shift_by), 'price_shift']
df
```

```
    i  price  price_shift  price_new
0   1  10.99         8.49        NaN
1   2  19.99        12.99        NaN
2   3   5.99        15.99        NaN
3   4   8.49         9.99       9.99
4   5  12.99        14.99      14.99
5   6  15.99         6.99       6.99
6   7   9.99        11.99      11.99
7   8  14.99          NaN        NaN
8   9   6.99          NaN        NaN
9  10  11.99          NaN        NaN
```
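The intermediate column is only there for readability. As a sketch, the same two steps can be chained so nothing extra is added to `df`: shift the full column, then use `Series.where` to keep only the rows with i > shift_from + shift_by, which sets every other row to NaN just as the `.loc` assignment does.

```python
import pandas as pd

df = pd.DataFrame({
    'i': range(1, 11),
    'price': [10.99, 19.99, 5.99, 8.49, 12.99, 15.99, 9.99, 14.99, 6.99, 11.99],
})

shift_from = 6
shift_by = -3

# Step 1: shift the whole column up by 3.
# Step 2: keep only rows with i > 3; where() turns all other rows into NaN.
df['price_new'] = df['price'].shift(shift_by).where(df.i > (shift_from + shift_by))
print(df)
```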

Answer 2

Score: 0

Here's one approach:

```python
df['new_price'] = df['price'].where(df.index >= 6, np.nan).shift(-3)
```
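`df.index >= 6` selects the same rows as `i > 6` only because the frame has the default 0-based index, where the row at position k has i = k + 1. The sketch below is a variation rather than part of the answer: it masks on the `i` column instead and runs on a frame whose index was arbitrarily set to 100..109 to show the difference. It also drops the explicit NaN argument, since `where` fills with NaN by default.

```python
import pandas as pd

prices = [10.99, 19.99, 5.99, 8.49, 12.99, 15.99, 9.99, 14.99, 6.99, 11.99]

# Same data, but with an index that does not start at 0 (illustrative choice).
df = pd.DataFrame({'i': range(1, 11), 'price': prices}, index=range(100, 110))

# Mask on the 'i' column rather than on index labels; where() fills with NaN by default.
df['new_price'] = df['price'].where(df['i'] > 6).shift(-3)

# The index-based mask now keeps every row, because every label is >= 6.
index_based = df['price'].where(df.index >= 6).shift(-3)

print(df)
print(index_based.notna().sum())
```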

The problem with df.loc[df.i>6, 'price'].shift(-3) is that it first selects only the last four rows (those where i is greater than 6, i.e. index labels 6 to 9):

```
>>> df.loc[df.i>6, 'price']
6     9.99
7    14.99
8     6.99
9    11.99
```

and then shifts within those four rows only, so just one value (11.99, moved to index 6) survives and the rest become NaN:

```
>>> df.loc[df.i>6, 'price'].shift(-3)
6    11.99
7      NaN
8      NaN
9      NaN
```
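Because `shift` moves values by position inside whatever Series it is called on, a four-row slice shifted by 3 keeps only one value. A different technique, not taken from any of the answers here, is to move the labels instead of the values: take the four selected prices, subtract 3 from their index labels, and let column assignment align them. This minimal sketch assumes the default integer index, so that "three rows up" means "label minus 3".

```python
import pandas as pd

df = pd.DataFrame({
    'i': range(1, 11),
    'price': [10.99, 19.99, 5.99, 8.49, 12.99, 15.99, 9.99, 14.99, 6.99, 11.99],
})

# The four rows whose prices should move (labels 6..9).
selected = df.loc[df['i'] > 6, 'price']

# Relabel them three rows earlier (labels 3..6) instead of shifting the values.
moved = pd.Series(selected.to_numpy(), index=selected.index - 3)

# Column assignment aligns on index labels; rows with no matching label become NaN.
df['price_new'] = moved
print(df)
```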

Answer 3

Score: 0

Here is another approach.

```python
import pandas as pd
import numpy as np

# Creating data for 'i' and 'price' columns
n = 10  # Number of entries
i_values = list(range(1, n+1))
price_values = [10.99, 19.99, 5.99, 8.49, 12.99, 15.99, 9.99, 14.99, 6.99, 11.99]

# Creating DataFrame
data = {'i': i_values,
        'price': price_values}
df = pd.DataFrame(data)
df['price_new'] = df.loc[df.i > 6, 'price']
df['price_new'] = df['price_new'].shift(-3)
```

So, first create the new column (price_new) holding only the prices where i > 6 (NaN everywhere else), and then shift the whole column.
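If this pattern comes up repeatedly, the mask-then-shift steps can be wrapped in a small function. `conditional_shift` below is a hypothetical helper written for illustration, not a pandas API; it uses `Series.where`, which gives the same NaN-filled column as assigning the `.loc` selection to `price_new`.

```python
import pandas as pd


def conditional_shift(series: pd.Series, mask: pd.Series, periods: int) -> pd.Series:
    """Hypothetical helper: keep values where mask is True (NaN elsewhere),
    then shift the whole result by `periods` positions."""
    return series.where(mask).shift(periods)


df = pd.DataFrame({
    'i': range(1, 11),
    'price': [10.99, 19.99, 5.99, 8.49, 12.99, 15.99, 9.99, 14.99, 6.99, 11.99],
})

# Move the prices from rows with i > 6 up by three rows.
df['price_new'] = conditional_shift(df['price'], df['i'] > 6, -3)
print(df)
```

The condition and offset are independent parameters, so, for example, `conditional_shift(df['price'], df['i'] <= 2, 2)` would move the first two prices down by two rows.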

huangapple
  • This article was published on 2023-07-18 14:46:48
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76710147.html