英文:
Sum of certain columns is wrong after using pd.wide_to_long?
问题
Here's the translated code portion you requested:
df['sum-Md', 'sum-Lo', 'sum-Up'] = (
pd.wide_to_long(
df, stubnames=["Medium", 'Lower', 'Upper'],
i=["Date",], j="zone",
sep="-", suffix='\w+'
)
.query("year>=2022", engine="python")
.groupby("Date")
.Medium
.sum()
.array
)
Please note that the translated code assumes you have the necessary Python libraries and modules imported and that the DataFrame df
is defined as per your provided DataFrame structure.
英文:
My problem is: sum of columns which starts with Medium (5+7) not 12 as well the remaining columns which starts Lower and Upper using pd.wide_to_long
and only shows the first sum i.e sum-Md.
I have the following dataframe:
Date Medium-Ab Lower-B.c Upper-Dd Medium-Fb Lower-Gc Upper-H.I year
09/2022 5 3 10 7 4 12 2022
10/2022 8 4 12 9 6 14 2022
11/2022 9 6 14 10 9 16 2022
12/2022 15 14 20 5 4 18 2022
01/2023 17 13 25 13 8 12 2023
... ... ... ... ... ... ... ...
12/2023 16 11 24 16 12 19 2023
01/2024 27 23 35 33 28 42 2023
... ... ... ... ... ... ... ...
12/2024 10 11 14 16 12 19 2023
... ... ... ... ... ... ... ...
12/2032 ... ... ... ... ... ... ...
What I want is:
Date Medium-Ab Lower-B.c Upper-Dd Medium-Fb Lower-Gc Upper-H.I year sum-Md sum-Lo sum-Up
09/2022 5 3 10 7 4 12 2022 12 7 22
10/2022 8 4 12 9 6 14 2022 ... ... ...
11/2022 9 6 14 10 9 16 2022 ... ... ...
... ... ... ... ... ... ... ... ... ... ...
11/2022 ... ... ... ... ... ... ... ... ... ...
My try is:
df['sum-Md','sum-Lo','sum-Up'] = (
pd.wide_to_long(
df, stubnames=["Medium", 'Lower', 'Upper'],
i=["Date",], j="zone",
sep="-", suffix='\w+'
)
.query("year>=2022", engine="python")
.groupby("Date")
.Medium
.sum()
.array
)
答案1
得分: 0
If I understand correctly, then the most straightforward and explicit way to do that would be:
df['sum-Md'] = df['Medium-Ab'] + df['Medium-Fb']
df['sum-Lo'] = df['Lower-B.c'] + df['Lower-Gc']
df['sum-Up'] = df['Upper-Dd'] + df['Upper-H.I']
# Part in comment
df['Medium-Ab'] = df['Medium-Ab'] / df['sum-Md']
df['Medium-Fb'] = df['Medium-Fb'] / df['sum-Md']
df['Lower-B.c'] = df['Lower-B.c'] / df['sum-Lo']
df['Lower-Gc'] = df['Lower-Gc'] / df['sum-Lo']
df['Upper-Dd'] = df['Upper-Dd'] / df['sum-Up']
df['Upper-H.I'] = df['Upper-H.I'] / df['sum-Up']
If the dataframe has more columns of the same sort, you could try more programmatic approaches like:
for start, short in ('Medium', 'Md'), ('Lower', 'Lo'), ('Upper', 'Up'):
col, cols = f'sum-{short}', [c for c in df.columns if c.startswith(start)]
df[col] = df[cols].sum(axis=1)
# Part in comment
df[cols] = df[cols].div(df[col].values, axis=0)
or
df[['sum-Md','sum-Lo','sum-Up']] = (
df.drop(columns=['Date', 'year'])
.groupby(lambda c: c.split('-')[0], axis=1, sort=False)
.sum()
)
# Part in comment
for start, short in ('Medium', 'Md'), ('Lower', 'Lo'), ('Upper', 'Up'):
base = f'sum-{short}'
for col in df.filter(regex=f'^{start}').columns:
df[col] = df[col] / df[base]
英文:
If I understand
> Yes I want sum up Medium-Ab and Medium-Fb in a new column its name sum-Md as well as Lower columns in a new column its name sum-Lo and Upper columns in a new column sum-Up then append the new three columns to the original dataframe.
correctly, then the most straight-forward and explicit way to do that would be:
df['sum-Md'] = df['Medium-Ab'] + df['Medium-Fb']
df['sum-Lo'] = df['Lower-B.c'] + df['Lower-Gc']
df['sum-Up'] = df['Upper-Dd'] + df['Upper-H.I']
# Part in comment
df['Medium-Ab'] = df['Medium-Ab'] / df['sum-Md']
df['Medium-Fb'] = df['Medium-Fb'] / df['sum-Md']
df['Lower-B.c'] = df['Lower-B.c'] / df['sum-Lo']
df['Lower-Gc'] = df['Lower-Gc'] / df['sum-Lo']
df['Upper-Dd'] = df['Upper-Dd'] / df['sum-Up']
df['Upper-H.I'] = df['Upper-H.I'] / df['sum-Up']
If the dataframe has actually more columns of the sort then you could try more programatically approaches like
for start, short in ('Medium', 'Md'), ('Lower', 'Lo'), ('Upper', 'Up'):
col, cols = f'sum-{short}', [c for c in df.columns if c.startswith(start)]
df[col] = df[cols].sum(axis=1)
# Part in comment
df[cols] = df[cols].div(df[col].values, axis=0)
or
df[['sum-Md','sum-Lo','sum-Up']] = (
df.drop(columns=['Date', 'year'])
.groupby(lambda c: c.split('-')[0], axis=1, sort=False)
.sum()
)
# Part in comment
for start, short in ('Medium', 'Md'), ('Lower', 'Lo'), ('Upper', 'Up'):
base = f'sum-{short}'
for col in df.filter(regex=f'^{start}').columns:
df[col] = df[col] / df[base]
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论