June 9, 2023 03:49:32

Transforming a data frame with Python

Question

Let's assume that I have the following data frame in pandas:

[image: dataframe]

and I want to transform it into the following:

[image: transformed dataframe]

How can I do it?

I have tried transpose, pd.wide_to_long, and pd.melt, but they all throw errors. I am new to this and would appreciate some help.

Answer 1

Score: 1

You can use the following code to recreate the output dataframe from the initial pictured dataframe (the first few lines just recreate your dataframe).

import pandas as pd
# recreating your starting dataframe
df = pd.read_csv(r"C:\...\bmi.csv", index_col=0)
df = df.iloc[2:, :]  # drop first 2 rows to match your initial picture

# Change column names to just the year
df.columns = pd.Series(df.columns).str.split(".", n=1, expand=True)[0]
# and add the first row as a second column level
df.columns = pd.MultiIndex.from_tuples(list(zip(df.columns,
                                                df.iloc[0].str.lstrip())))
# then remove first row
df = df.iloc[1:]

# stack the first level of the MultiIndex column (year), and sort the index
df = df.stack(level=0).sort_index(level=[0, 1], ascending=[True, False])

# in each column...
for col in df.columns:
    # ...extract the float at the start of the string (and convert to float)
    df[col] = df[col].str.extract(r'(\d+\.\d+)', expand=False).astype(float)

Let me know if you have any questions.
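
For readers who do not have bmi.csv at hand, the same stack-then-extract idea can be tried on a tiny made-up frame. This is only a sketch: the country names, the numbers, and the assumed cell format (a decimal followed by a bracketed range) are invented for illustration and are not taken from the original file.

```python
import pandas as pd

# Hypothetical miniature of the wide table: two made-up countries, two years,
# and cells that hold a number followed by a bracketed range (assumed format).
columns = pd.MultiIndex.from_product(
    [["2016", "2015"], ["Both sexes", "Female", "Male"]]
)
wide = pd.DataFrame(
    [
        ["23.0 [20.1-25.9]", "23.7 [20.0-27.4]", "22.3 [19.0-25.6]",
         "22.9 [20.2-25.6]", "23.6 [20.1-27.1]", "22.3 [19.1-25.5]"],
        ["25.1 [22.0-28.2]", "25.9 [22.1-29.7]", "24.2 [21.0-27.4]",
         "25.0 [22.1-27.9]", "25.8 [22.2-29.4]", "24.1 [21.1-27.1]"],
    ],
    index=pd.Index(["Country A", "Country B"], name="Country"),
    columns=columns,
)

# Move the year level out of the columns and into the row index.
tidy = wide.stack(level=0)
tidy.index = tidy.index.set_names(["Country", "Year"])

# Countries ascending, years descending within each country.
tidy = tidy.sort_index(level=[0, 1], ascending=[True, False])

# Keep only the leading decimal number of each cell and convert it to float.
for col in tidy.columns:
    tidy[col] = tidy[col].str.extract(r"(\d+\.\d+)", expand=False).astype(float)

print(tidy)
```

Each (Country, Year) pair ends up as one row with one column per sex, which is the shape asked for in the question. Recent pandas 2.x releases may print a FutureWarning about the stack implementation here; the result is unaffected.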

Answer 2

Score: 0

A possible solution:

import pandas as pd

# Note: reset_index(names=...) requires pandas 1.5 or later
df = (
    pd.read_csv("bmi.csv", index_col=0, header=[0,3], na_values="No data")
        .replace(r"\s+.+", "", regex=True).stack(0, dropna=False)
        .astype(float).reset_index(level=0, names="Country").pipe(
            lambda x: x.set_axis(x.index.str.split(".").str[0]))
        .groupby([pd.Grouper(level=0), "Country"], sort=False).first()
        .reset_index(names=["Year", "Country"]).rename_axis(columns=None)
        .sort_values(by=["Country", "Year"], ascending=[True, False])
        .reset_index(drop=True)
)

Output:

print(df)

      Year      Country   Both sexes   Female   Male
0     2016  Afghanistan         23.0     23.7   22.3
1     2015  Afghanistan         22.9     23.6   22.3
2     2014  Afghanistan         22.8     23.5   22.2
3     2013  Afghanistan         22.8     23.4   22.1
4     2012  Afghanistan         22.7     23.3   22.0
...    ...          ...          ...      ...    ...
8143  1979     Zimbabwe         22.0     23.6   20.3
8144  1978     Zimbabwe         21.9     23.6   20.2
8145  1977     Zimbabwe         21.9     23.5   20.2
8146  1976     Zimbabwe         21.8     23.5   20.1
8147  1975     Zimbabwe         21.8     23.5   20.0

[8148 rows x 5 columns]
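
The cleaning step in the chain above (drop everything after the first whitespace, then cast to float) can be checked in isolation. The sketch below uses a made-up two-row frame with an assumed cell format; since there is no CSV to read, "No data" is turned into NaN by hand instead of through na_values.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: made-up countries and values, assumed cell format.
raw = pd.DataFrame(
    {
        "Female": ["23.7 [20.0-27.4]", "No data"],
        "Male": ["22.3 [19.0-25.6]", "22.2 [19.1-25.3]"],
    },
    index=pd.Index(["Country A", "Country B"], name="Country"),
)

cleaned = (
    # Mark missing cells first: the regex below would otherwise
    # turn "No data" into "No", which cannot be cast to float.
    raw.replace("No data", np.nan)
    # Strip the first run of whitespace and everything after it.
    .replace(r"\s+.+", "", regex=True)
    .astype(float)
)

print(cleaned)
```

The order of the two replace calls matters for the reason given in the comment; in the full solution the same effect comes from passing na_values="No data" to read_csv before the regex runs.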
