“pd.wide_to_long” 使用后,某些列的总和不正确。

huangapple go评论64阅读模式
英文:

Sum of certain columns is wrong after using pd.wide_to_long?

问题

Here's the translated code portion you requested:

df['sum-Md', 'sum-Lo', 'sum-Up'] = (
    pd.wide_to_long(
        df, stubnames=["Medium", 'Lower', 'Upper'],
        i=["Date",], j="zone",
        sep="-", suffix='\w+'
    )
    .query("year>=2022", engine="python")
    .groupby("Date")
    .Medium
    .sum()
    .array
)

Please note that the translated code assumes you have the necessary Python libraries and modules imported and that the DataFrame df is defined as per your provided DataFrame structure.

英文:

My problem is: sum of columns which starts with Medium (5+7) not 12 as well the remaining columns which starts Lower and Upper using pd.wide_to_long and only shows the first sum i.e sum-Md.

I have the following dataframe:

   Date  Medium-Ab  Lower-B.c   Upper-Dd  Medium-Fb  Lower-Gc Upper-H.I  year
09/2022          5          3         10          7         4        12  2022
10/2022          8          4         12          9         6        14  2022
11/2022          9          6         14         10         9        16  2022
12/2022         15         14         20          5         4        18  2022
01/2023         17         13         25         13         8        12  2023 
    ...        ...        ...        ...        ...       ...       ...  ...
12/2023         16         11         24         16        12        19  2023
01/2024         27         23         35         33        28        42  2023 
    ...        ...        ...        ...        ...       ...       ...   ...
12/2024         10         11         14         16        12        19  2023
    ...        ...        ...        ...        ...       ...       ...  ...
12/2032        ...        ...        ...        ...       ...       ...  ...

What I want is:

   Date  Medium-Ab  Lower-B.c  Upper-Dd  Medium-Fb  Lower-Gc  Upper-H.I  year  sum-Md sum-Lo  sum-Up
09/2022          5          3        10          7         4         12  2022      12      7      22 
10/2022          8          4        12          9         6         14  2022     ...    ...     ...
11/2022          9          6        14         10         9         16  2022     ...    ...     ...
    ...        ...        ...       ...        ...       ...        ...   ...     ...    ...     ...
11/2022        ...        ...       ...        ...       ...        ...   ...     ...    ...     ...

My try is:

df['sum-Md','sum-Lo','sum-Up'] = (
   pd.wide_to_long(
       df, stubnames=["Medium", 'Lower', 'Upper'],
       i=["Date",], j="zone",
       sep="-", suffix='\w+'
   )
   .query("year>=2022", engine="python")
   .groupby("Date")
   .Medium
   .sum()
   .array
)

答案1

得分: 0

If I understand correctly, then the most straightforward and explicit way to do that would be:

df['sum-Md'] = df['Medium-Ab'] + df['Medium-Fb']
df['sum-Lo'] = df['Lower-B.c'] + df['Lower-Gc']
df['sum-Up'] = df['Upper-Dd'] + df['Upper-H.I']
# Part in comment
df['Medium-Ab'] = df['Medium-Ab'] / df['sum-Md']
df['Medium-Fb'] = df['Medium-Fb'] / df['sum-Md']
df['Lower-B.c'] = df['Lower-B.c'] / df['sum-Lo']
df['Lower-Gc'] = df['Lower-Gc'] / df['sum-Lo']
df['Upper-Dd'] = df['Upper-Dd'] / df['sum-Up']
df['Upper-H.I'] = df['Upper-H.I'] / df['sum-Up']

If the dataframe has more columns of the same sort, you could try more programmatic approaches like:

for start, short in ('Medium', 'Md'), ('Lower', 'Lo'), ('Upper', 'Up'):
    col, cols = f'sum-{short}', [c for c in df.columns if c.startswith(start)]
    df[col] = df[cols].sum(axis=1)
    # Part in comment
    df[cols] = df[cols].div(df[col].values, axis=0)

or

df[['sum-Md','sum-Lo','sum-Up']] = (
    df.drop(columns=['Date', 'year'])
    .groupby(lambda c: c.split('-')[0], axis=1, sort=False)
    .sum()
)
# Part in comment
for start, short in ('Medium', 'Md'), ('Lower', 'Lo'), ('Upper', 'Up'):
    base = f'sum-{short}'
    for col in df.filter(regex=f'^{start}').columns:
        df[col] = df[col] / df[base]
英文:

If I understand

> Yes I want sum up Medium-Ab and Medium-Fb in a new column its name sum-Md as well as Lower columns in a new column its name sum-Lo and Upper columns in a new column sum-Up then append the new three columns to the original dataframe.

correctly, then the most straight-forward and explicit way to do that would be:

df['sum-Md'] = df['Medium-Ab'] + df['Medium-Fb']
df['sum-Lo'] = df['Lower-B.c'] + df['Lower-Gc']
df['sum-Up'] = df['Upper-Dd'] + df['Upper-H.I']
# Part in comment
df['Medium-Ab'] = df['Medium-Ab'] / df['sum-Md']
df['Medium-Fb'] = df['Medium-Fb'] / df['sum-Md']
df['Lower-B.c'] = df['Lower-B.c'] / df['sum-Lo']
df['Lower-Gc'] = df['Lower-Gc'] / df['sum-Lo']
df['Upper-Dd'] = df['Upper-Dd'] / df['sum-Up']
df['Upper-H.I'] = df['Upper-H.I'] / df['sum-Up']

If the dataframe has actually more columns of the sort then you could try more programatically approaches like

for start, short in ('Medium', 'Md'), ('Lower', 'Lo'), ('Upper', 'Up'):
    col, cols = f'sum-{short}', [c for c in df.columns if c.startswith(start)]
    df[col] = df[cols].sum(axis=1)
    # Part in comment
    df[cols] = df[cols].div(df[col].values, axis=0)

or

df[['sum-Md','sum-Lo','sum-Up']] = (
    df.drop(columns=['Date', 'year'])
    .groupby(lambda c: c.split('-')[0], axis=1, sort=False)
    .sum()
)
# Part in comment
for start, short in ('Medium', 'Md'), ('Lower', 'Lo'), ('Upper', 'Up'):
    base = f'sum-{short}'
    for col in df.filter(regex=f'^{start}').columns:
        df[col] = df[col] / df[base]

huangapple
  • 本文由 发表于 2023年5月22日 17:51:22
  • 转载请务必保留本文链接:https://go.coder-hub.com/76304949.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定