Python / Pandas: Shift entities of a row to the right (end)

Question


I have the following data frame (number of "Date" columns can vary):

      Customer  Date1  Date2  Date3  Date4
    0        A     10   40.0    NaN   60.0
    1        B     20   50.0    NaN    NaN
    2        C     30    NaN    NaN    NaN

If a row has a NaN in the last column (again, the number of columns can vary), I want to shift that row's values to the right, toward the end of the data frame, so that it looks like this:

      Customer  Date1  Date2  Date3  Date4
    0        A     10   40.0    NaN   60.0
    1        B    NaN    NaN     20   50.0
    2        C    NaN    NaN    NaN     30

All the values which remain empty can be set to NaN.

How can I do that in Python?

I tried this code, but it didn't work:

    import numpy as np
    import pandas as pd

    data = {
        'Customer': ['A', 'B', 'C'],
        'Date1': [10, 20, 30],
        'Date2': [40, 50, np.nan],
        'Date3': [np.nan, np.nan, np.nan],
        'Date4': [60, np.nan, np.nan]
    }
    df = pd.DataFrame(data)

    for i in range(1, len(df.columns)):
        df.iloc[:, i] = df.iloc[:, i-1].shift(fill_value=np.nan)

    print(df)
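
A likely reason the attempt above does not do what is wanted: `shift()` called on a column moves values down the rows (along the index), not across the columns, and the loop overwrites each column with a shifted copy of the column to its left. The small sketch below only illustrates that difference; the two-column frame is a made-up reduction of the example data, not part of the original question.

```python
import numpy as np
import pandas as pd

# Reduced, hypothetical version of the question's frame.
df = pd.DataFrame({
    "Customer": ["A", "B", "C"],
    "Date1": [10.0, 20.0, 30.0],
    "Date2": [40.0, 50.0, np.nan],
})

# Shifting a column moves its values down the rows (along the index).
col_shifted = df["Date1"].shift()

# Shifting one row, taken as a Series over the date columns,
# moves its values across the columns instead.
row = df.loc[2, ["Date1", "Date2"]].astype(float)
row_shifted = row.shift(1)

print(col_shifted)
print(row_shifted)
```

So a row-wise shift needs either a per-row Series (as in the first answer) or an operation along `axis=1`.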

Answer 1

Score: 1

If no row has only NaN values in its date columns (for such a row the `while` loop below would never terminate), you could use:

    for i in range(len(df)):
        # Keep shifting row i one step to the right until its last cell is filled.
        while np.isnan(df.iloc[i, -1]):
            df.iloc[i, 1:] = df.iloc[i, 1:].shift(periods=1, fill_value=np.nan)
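
As a variation on the loop above, here is a minimal sketch that computes each row's shift in one pass with NumPy instead of shifting repeatedly. It is not the answerer's code: the helper name `shift_rows_right` and the `id_cols` parameter are made up for illustration, and it assumes all non-identifier columns are numeric. Rows whose date columns are all NaN are left unchanged rather than looping forever, and the date columns are converted to float up front so that NaN can be stored in a column that started out as integers.

```python
import numpy as np
import pandas as pd


def shift_rows_right(df, id_cols=("Customer",)):
    """Shift each row right by its number of trailing NaNs (hypothetical helper)."""
    value_cols = [c for c in df.columns if c not in id_cols]
    values = df[value_cols].to_numpy(dtype=float)  # float so NaN fits everywhere
    n_rows, n_cols = values.shape

    # Trailing NaNs per row = position of the first non-NaN when scanning
    # from the right; all-NaN rows get a shift of 0.
    not_nan_reversed = ~np.isnan(values[:, ::-1])
    trailing = np.where(not_nan_reversed.any(axis=1),
                        not_nan_reversed.argmax(axis=1), 0)

    # Source column for every target cell; a negative source means "fill with NaN".
    src = np.arange(n_cols) - trailing[:, None]
    gathered = values[np.arange(n_rows)[:, None], np.clip(src, 0, None)]
    shifted = np.where(src >= 0, gathered, np.nan)

    out = df.copy()
    for j, col in enumerate(value_cols):
        out[col] = shifted[:, j]  # replaces the column, so the dtype becomes float
    return out


df = pd.DataFrame({
    "Customer": ["A", "B", "C"],
    "Date1": [10, 20, 30],
    "Date2": [40, 50, np.nan],
    "Date3": [np.nan, np.nan, np.nan],
    "Date4": [60, np.nan, np.nan],
})
out = shift_rows_right(df)
print(out)
```

Like the loop, this keeps NaNs that sit between values (row A stays `10, 40, NaN, 60`); it only removes the trailing ones.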


Answer 2

Score: 0

You can temporarily set the non-target columns as index (or drop them), push the non-NaN values to the right by sorting, and update only the rows that match a given mask (here, NaN in the last column):

    out = (df
       .set_index('Customer', append=True)
       .pipe(lambda d: d.mask(d.iloc[:, -1].isna(),
                              d.transform(lambda x: sorted(x, key=pd.notnull), axis=1)
                             )
            )
       .reset_index('Customer')
    )

Alternative:

    other_cols = ['Customer']
    out = df.drop(columns=other_cols)
    m = out.iloc[:, -1].isna()
    out.loc[m, :] = out.loc[m, :].transform(lambda x: sorted(x, key=pd.notnull), axis=1)
    out = df[other_cols].join(out)[df.columns]

NB: there are several ways to shift the non-NaN values; this is one of them. Non-sorting methods are possible if this becomes a bottleneck.

Output:

      Customer  Date1  Date2  Date3  Date4
    0        A   10.0   40.0    NaN   60.0
    1        B    NaN    NaN   20.0   50.0
    2        C    NaN    NaN    NaN   30.0
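
The note above mentions that non-sorting methods are possible. One way this might look, as a rough sketch rather than the answerer's own code: count the non-NaN cells per row, build a right-aligned boolean mask from those counts, and copy the values across with boolean indexing. The helper name `push_right` is made up for illustration, and the date columns are assumed to be numeric.

```python
import numpy as np
import pandas as pd


def push_right(values):
    """Move the non-NaN entries of each row to the right, keeping their order."""
    values = np.asarray(values, dtype=float)
    present = ~np.isnan(values)
    counts = present.sum(axis=1)
    n_cols = values.shape[1]
    # Right-aligned mask: the last `counts[i]` cells of row i are True.
    target = np.arange(n_cols) >= (n_cols - counts)[:, None]
    out = np.full(values.shape, np.nan)
    # Boolean indexing walks both arrays row by row, so order is preserved.
    out[target] = values[present]
    return out


df = pd.DataFrame({
    "Customer": ["A", "B", "C"],
    "Date1": [10, 20, 30],
    "Date2": [40, 50, np.nan],
    "Date3": [np.nan, np.nan, np.nan],
    "Date4": [60, np.nan, np.nan],
})

date_cols = [c for c in df.columns if c != "Customer"]
vals = df[date_cols].to_numpy(dtype=float, copy=True)
m = np.isnan(vals[:, -1])      # only touch rows whose last column is NaN
vals[m] = push_right(vals[m])

out = df.copy()
for j, col in enumerate(date_cols):
    out[col] = vals[:, j]
print(out)
```

As with the sorting version, the mask `m` matters: without it, row A would be compacted to `NaN, 10, 40, 60` even though its last column is already filled.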

huangapple
  • Posted on July 17, 2023, 19:43:34
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76704112.html