transforming data frame with python

Question

Suppose I have the following data frame in pandas:
[image: dataframe]

and I want to transform it into the following:
[image: transformeddataframe]

How can I do that?

I have tried transpose, pd.wide_to_long, and pd.melt, but they all throw errors. I am new to this and would appreciate any help!

Answer 1

Score: 1

The following code builds the output dataframe from the initial pictured dataframe (the first few lines only recreate your starting dataframe).

import pandas as pd
# recreating your starting dataframe
df = pd.read_csv(r"C:\...\bmi.csv", index_col=0)
df = df.iloc[2:, :]  # drop first 2 rows to match your initial picture


# Change column names to just the year
df.columns = pd.Series(df.columns).str.split(".", n=1, expand=True)[0]
# and add the first row
df.columns = pd.MultiIndex.from_tuples(list(zip(df.columns,
                                                df.iloc[0].str.lstrip())))
# then remove first row
df = df.iloc[1:]

# stack the first level of the MultiIndex column (year), and sort the index
df = df.stack(level=0).sort_index(level=[0, 1], ascending=[True, False])

# in each column...
for col in df.columns:
    # ...extract the float at the start of the string (and convert to float)
    df[col] = df[col].str.extract(r'(\d+\.\d+)', expand=False).astype(float)

Let me know if you have any questions.
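
For anyone without the original file, here is a minimal sketch of the same stack-then-extract idea on a made-up miniature table. The cell format ("value [low-high]"), the numbers, and the level names "Year" and "Sex" are assumptions for illustration, not taken from bmi.csv.

```python
import pandas as pd

# Hypothetical miniature of the pictured table: two-level columns (Year, Sex),
# one row per country, cells assumed to look like "23.0 [21.6-24.5]".
columns = pd.MultiIndex.from_product(
    [["2016", "2015"], ["Both sexes", "Female"]], names=["Year", "Sex"]
)
raw = pd.DataFrame(
    [
        ["23.0 [21.6-24.5]", "23.7 [21.5-26.0]", "22.9 [21.5-24.4]", "23.6 [21.4-25.9]"],
        ["26.7 [25.9-27.5]", "26.6 [25.5-27.8]", "26.6 [25.8-27.4]", "26.5 [25.4-27.7]"],
    ],
    index=pd.Index(["Afghanistan", "Albania"], name="Country"),
    columns=columns,
)

# Move the Year level from the columns into the row index, then sort by
# country (ascending) and year (descending).
long_df = (
    raw.stack(level="Year")
       .sort_index(level=["Country", "Year"], ascending=[True, False])
)

# Keep only the leading decimal number of each cell and convert it to float.
long_df = long_df.apply(
    lambda col: col.str.extract(r"(\d+\.\d+)", expand=False).astype(float)
)

print(long_df.reset_index())
```

Naming the level ("Year") instead of using level=0 is only a readability choice here; it selects the same level as in the code above.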

Answer 2

Score: 0

A possible solution:

import pandas as pd

df = (
    # file rows 0 and 3 become a two-level column header
    pd.read_csv("bmi.csv", index_col=0, header=[0,3], na_values="No data")
        # raw string avoids Python's invalid-escape warning for "\s"
        .replace(r"\s+.+", "", regex=True).stack(0, dropna=False)
        .astype(float).reset_index(level=0, names="Country").pipe(
            lambda x: x.set_axis(x.index.str.split(".").str[0]))
        .groupby([pd.Grouper(level=0), "Country"], sort=False).first()
        .reset_index(names=["Year", "Country"]).rename_axis(columns=None)
        .sort_values(by=["Country", "Year"], ascending=[True, False])
        .reset_index(drop=True)
)

Output:

print(df)

      Year      Country   Both sexes   Female   Male
0     2016  Afghanistan         23.0     23.7   22.3
1     2015  Afghanistan         22.9     23.6   22.3
2     2014  Afghanistan         22.8     23.5   22.2
3     2013  Afghanistan         22.8     23.4   22.1
4     2012  Afghanistan         22.7     23.3   22.0
...    ...          ...          ...      ...    ...
8143  1979     Zimbabwe         22.0     23.6   20.3
8144  1978     Zimbabwe         21.9     23.6   20.2
8145  1977     Zimbabwe         21.9     23.5   20.2
8146  1976     Zimbabwe         21.8     23.5   20.1
8147  1975     Zimbabwe         21.8     23.5   20.0

[8148 rows x 5 columns]
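
The cleaning step in the chain above, the regex replace, can be tried on its own. This is only a sketch on made-up cells: the "value [low-high]" format and the numbers are assumptions, not rows from bmi.csv.

```python
import pandas as pd

# Hypothetical cells in the assumed "value [low-high]" format.
cells = pd.DataFrame(
    {
        "Female": ["23.7 [21.5-26.0]", "23.6 [21.4-25.9]"],
        "Male": ["22.3 [20.4-24.3]", "22.3 [20.3-24.2]"],
    }
)

# Delete everything from the first whitespace onward, leaving the leading
# number as text, then convert every column to float.
cleaned = cells.replace(r"\s+.+", "", regex=True).astype(float)

print(cleaned)
```

Because the pattern removes everything after the first whitespace, a text cell such as "No data" would be cut down to "No" and then fail the float conversion, which is presumably why the chain marks it as missing with na_values at read time.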

huangapple
  • Published on June 9, 2023 at 03:49:32
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76435260.html