2023-05-29 05:35:13

How can I transform a DataFrame using pivot and melt in Python Pandas?

Question

I have a DataFrame:

         date     price num_floors    house
1  2023-01-01  94.30076          3        A
2  2023-01-01  95.58771          2        B
3  2023-01-02 102.78559          1        C
4  2023-01-03  93.29053          3        D

I want to reshape it so that each column holds the prices and num_floors of all houses for a given date. Within a column, the first two rows refer to the first house, the next two rows to the second house, and so on. Entries with no data are filled with the missing value NaN, like this:

  2023-01-01    2023-01-02  2023-01-03
1   94.30076     102.78559    93.29053
2          3             1           3
3   95.58771            NA          NA
4          2            NA          NA

I managed to do this in R:

df_trans <- df %>%
    pivot_longer(-date) %>%
    mutate(index=row_number(), .by = date) %>%
    pivot_wider(id_cols = index, names_from = date, values_from = value) %>%
    select(-index)

but when I try the same in Python, it does not work the way I want:

df_trans = df.melt(id_vars='date')
df_trans['n'] = df_trans.groupby('date').cumcount() + 1
df_trans = df_trans.pivot(index='n', columns='date', values='value')
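
A likely reason the Python attempt differs from the R result: melt stacks the value columns one after another (all price rows, then all num_floors rows, then house), so cumcount numbers every price of a date before any floor count instead of interleaving them per house. The following is a minimal sketch of one way to keep the melt/pivot approach, assuming the sample data from the question. It drops house, keeps the original row labels with ignore_index=False, and sorts them with a stable sort before numbering. The names long and wide are illustrative only.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "date": ["2023-01-01", "2023-01-01", "2023-01-02", "2023-01-03"],
    "price": [94.30076, 95.58771, 102.78559, 93.29053],
    "num_floors": [3, 2, 1, 3],
    "house": ["A", "B", "C", "D"],
})

# Melt without the house column, keeping the original row labels.
long = df.drop(columns="house").melt(id_vars="date", ignore_index=False)

# A stable sort on the row labels interleaves price and num_floors per house.
long = long.sort_index(kind="mergesort")

# Number the entries within each date, then spread the dates into columns.
long["n"] = long.groupby("date").cumcount()
wide = long.pivot(index="n", columns="date", values="value")
wide = wide.rename_axis(index=None, columns=None)

print(wide)
```

Because price and num_floors end up in one column, the floor counts are shown as floats in this sketch.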

Answer 1

Score: 1

Try:

df = df.drop(columns='house')
# number the houses within each date: 0, 1, ...
df['tmp'] = df.groupby('date').cumcount()
# stack price/num_floors into rows, then spread the dates into columns
df = df.set_index(['date', 'tmp']).stack().unstack('date').reset_index(drop=True)
df.columns.name = None

print(df)

Prints:

   2023-01-01  2023-01-02  2023-01-03
0    94.30076   102.78559    93.29053
1     3.00000     1.00000     3.00000
2    95.58771         NaN         NaN
3     2.00000         NaN         NaN

Answer 2

Score: 0

pd.concat([j.set_index("date")[["price", "num_floors"]].T
    for i, j in df.groupby(df.groupby("date").cumcount())])

In parts:

Number the rows within each date with a cumulative count, and group the DataFrame by that count
For each group, set date as the index, keep price and num_floors, and transpose
Concatenate the transposed groups
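
The three steps above can be sketched as a self-contained script, assuming the sample data from the question. The names position, pieces and result are illustrative. The final reset_index and rename_axis calls are an addition to the one-liner, used here only to replace the repeated price/num_floors row labels with 0..3 and to drop the "date" column label.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "date": ["2023-01-01", "2023-01-01", "2023-01-02", "2023-01-03"],
    "price": [94.30076, 95.58771, 102.78559, 93.29053],
    "num_floors": [3, 2, 1, 3],
    "house": ["A", "B", "C", "D"],
})

# Step 1: position of each house within its date (0 for the first, 1 for the second).
position = df.groupby("date").cumcount()

# Step 2: for each position, build a 2-row frame (price, num_floors) with dates as columns.
pieces = [
    group.set_index("date")[["price", "num_floors"]].T
    for _, group in df.groupby(position)
]

# Step 3: stack the pieces; dates missing from a piece are filled with NaN.
result = pd.concat(pieces).reset_index(drop=True).rename_axis(None, axis=1)

print(result)
```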

Answer 3

Score: 0

Another possible solution:

import numpy as np
import pandas as pd

# dropna(ignore_index=True) needs pandas 2.0 or later
(pd.DataFrame(np.vstack([np.vstack([[x, y], [x, z]]) for x, y, z in
    zip(df['date'], df['price'], df['num_floors'])]))
 .pivot(columns=0, values=1).rename_axis(None, axis=1)
 .apply(lambda x: x.dropna(ignore_index=True)))

Alternatively:

(df.assign(
    price = [[x,y] for x,y in zip(df['price'], df['num_floors'])])
 .pivot(columns='date', values='price')
 .apply(lambda x: x.explode(ignore_index=True))
 .rename_axis(None, axis=1)
 .apply(lambda x: x.dropna(ignore_index=True)))

Output:

  2023-01-01 2023-01-02 2023-01-03
0   94.30076  102.78559   93.29053
1          3          1          3
2   95.58771        NaN        NaN
3          2        NaN        NaN
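
For readers who want to run the second variant end to end, here is a rough sketch, assuming the sample data from the question. Two things differ from the answer and are my own substitutions: dropna().reset_index(drop=True) stands in for dropna(ignore_index=True) so the sketch does not depend on that keyword, and a final pd.to_numeric pass converts the object-dtype result to floats. The names pairs, wide and numeric are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "date": ["2023-01-01", "2023-01-01", "2023-01-02", "2023-01-03"],
    "price": [94.30076, 95.58771, 102.78559, 93.29053],
    "num_floors": [3, 2, 1, 3],
    "house": ["A", "B", "C", "D"],
})

# One [price, num_floors] pair per house.
pairs = [[p, f] for p, f in zip(df["price"], df["num_floors"])]

wide = (
    df.assign(price=pairs)
    .pivot(columns="date", values="price")              # one column per date, cells hold pairs
    .apply(lambda col: col.explode(ignore_index=True))  # unpack each pair into two rows
    .rename_axis(None, axis=1)
    # push the values of each column to the top
    .apply(lambda col: col.dropna().reset_index(drop=True))
)

# The exploded values are Python objects; convert each column to a numeric dtype.
numeric = wide.apply(pd.to_numeric)

print(numeric)
```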
