How can I transform a DataFrame using pivot and melt in Python Pandas?

Question

I have a DataFrame:

         date     price num_floors    house
1  2023-01-01  94.30076          3        A
2  2023-01-01  95.58771          2        B
3  2023-01-02 102.78559          1        C
4  2023-01-03  93.29053          3        D

I want to reshape it so that each column holds the price and num_floors values of all houses for a given date. Within a column, the first two rows belong to the first house, the next two rows to the second house, and so on. Entries with no data are filled with the missing value NaN, like this:

  2023-01-01    2023-01-02  2023-01-03
1   94.30076     102.78559    93.29053
2          3             1           3         
3   95.58771            NA          NA
4          2            NA          NA

I managed to do this in R:

df_trans <- df %>%
    pivot_longer(-date) %>%
    mutate(index=row_number(), .by = date) %>%
    pivot_wider(id_cols = index, names_from = date, values_from = value) %>%
    select(-index)

But when I try the same in Python, it does not produce the result I want:

df_trans = df.melt(id_vars='date')
df_trans['n'] = df_trans.groupby('date').cumcount() + 1
df_trans = df_trans.pivot(index='n', columns='date', values='value')
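
The difference from the R version most likely comes from row order: melt stacks the value columns one after another (all price rows, then all num_floors rows, then house), while pivot_longer walks the frame row by row. A minimal sketch of one way to restore the row-by-row order, assuming the sample data above and assuming the house column should be dropped; the names long and wide are illustrative only:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "date": ["2023-01-01", "2023-01-01", "2023-01-02", "2023-01-03"],
    "price": [94.30076, 95.58771, 102.78559, 93.29053],
    "num_floors": [3, 2, 1, 3],
    "house": ["A", "B", "C", "D"],
})

long = (
    df.drop(columns="house")                     # assumption: house labels are not wanted
      .melt(id_vars="date", ignore_index=False)  # keep each value's original row label
      # Stable sort on the original row label, so every house's price is
      # followed directly by its num_floors; then discard the labels.
      .sort_index(kind="mergesort", ignore_index=True)
)

# Running position of each value within its date (1-based, as in the question)
long["n"] = long.groupby("date").cumcount() + 1

wide = (
    long.pivot(index="n", columns="date", values="value")
        .rename_axis(None, axis=1)
)
print(wide)
```

Because price and num_floors end up in a single value column, the floor counts are stored as floats.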

Answer 1

Score: 1

Try:

df = df.drop(columns='house')
df['tmp'] = df.groupby('date').cumcount()
df = df.set_index(['date', 'tmp']).stack().unstack('date').reset_index(drop=True)
df.columns.name = None

print(df)

Prints:

   2023-01-01  2023-01-02  2023-01-03
0    94.30076   102.78559    93.29053
1     3.00000     1.00000     3.00000
2    95.58771         NaN         NaN
3     2.00000         NaN         NaN
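
To see what each link in this chain does, it can help to run it one step at a time. The sketch below rebuilds the sample frame from the question and names the intermediate results; stacked and wide are illustrative names, not part of the answer:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "date": ["2023-01-01", "2023-01-01", "2023-01-02", "2023-01-03"],
    "price": [94.30076, 95.58771, 102.78559, 93.29053],
    "num_floors": [3, 2, 1, 3],
    "house": ["A", "B", "C", "D"],
})

out = df.drop(columns="house")

# Position of each house within its date: 0 for the first, 1 for the second, ...
out["tmp"] = out.groupby("date").cumcount()

# stack() turns the two value columns into one long Series indexed by
# (date, tmp, column name); num_floors is upcast to float along the way.
stacked = out.set_index(["date", "tmp"]).stack()

# unstack("date") moves the dates into the columns; the remaining row
# index is (tmp, column name), i.e. price then num_floors per house.
wide = stacked.unstack("date")

# Drop the helper index and the leftover "date" label on the columns.
wide = wide.reset_index(drop=True)
wide.columns.name = None
print(wide)
```

The upcast in stack() is why the floor counts print as 3.00000 rather than 3.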

Answer 2

Score: 0

pd.concat([j.set_index("date")[["price", "num_floors"]].T \
    for i, j in df.groupby(df.groupby("date").cumcount())])

Step by step:

  1. Compute the cumulative count within each date, and group the DataFrame by that count
  2. For each group, set date as the index, keep price and num_floors, and transpose
  3. Concatenate the transposed groups
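
The three steps can be sketched separately as follows, assuming the sample data from the question; position, pieces and result are illustrative names:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "date": ["2023-01-01", "2023-01-01", "2023-01-02", "2023-01-03"],
    "price": [94.30076, 95.58771, 102.78559, 93.29053],
    "num_floors": [3, 2, 1, 3],
    "house": ["A", "B", "C", "D"],
})

# Step 1: position of each house within its date (0, 1, 0, 0 here)
position = df.groupby("date").cumcount()

# Step 2: one small transposed frame per position, with dates as columns
# and the rows "price" and "num_floors"
pieces = [
    group.set_index("date")[["price", "num_floors"]].T
    for _, group in df.groupby(position)
]

# Step 3: stack the pieces on top of each other; dates missing from a
# piece are filled with NaN
result = pd.concat(pieces)
print(result)
```

The result keeps price and num_floors as row labels. Chaining .reset_index(drop=True).rename_axis(None, axis=1) onto result gives the plain 0..3 index shown in the question.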

Answer 3

Score: 0

Another possible solution:

(pd.DataFrame(np.vstack([np.vstack([[x, y], [x, z]]) for x, y, z in
    zip(df['date'], df['price'], df['num_floors'])]))
 .pivot(columns=0, values=1).rename_axis(None, axis=1)
 .apply(lambda x: x.dropna(ignore_index=True)))

Alternatively,

(df.assign(
    price = [[x,y] for x,y in zip(df['price'], df['num_floors'])])
 .pivot(columns='date', values='price')
 .apply(lambda x: x.explode(ignore_index=True))
 .rename_axis(None, axis=1)
 .apply(lambda x: x.dropna(ignore_index=True)))

Output:

  2023-01-01 2023-01-02 2023-01-03
0   94.30076  102.78559   93.29053
1          3          1          3
2   95.58771        NaN        NaN
3          2        NaN        NaN
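
Note that the ignore_index argument of Series.dropna was added in pandas 2.0. A minimal sketch of the second variant that avoids it by resetting the index explicitly, assuming the sample data from the question; pairs and wide are illustrative names:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "date": ["2023-01-01", "2023-01-01", "2023-01-02", "2023-01-03"],
    "price": [94.30076, 95.58771, 102.78559, 93.29053],
    "num_floors": [3, 2, 1, 3],
    "house": ["A", "B", "C", "D"],
})

# Pack each house's price and num_floors into one list per row
pairs = df.assign(
    price=[[p, f] for p, f in zip(df["price"], df["num_floors"])]
)

wide = (
    pairs.pivot(columns="date", values="price")   # one list (or NaN) per cell
         # Unpack the lists so each value gets its own row
         .apply(lambda col: col.explode(ignore_index=True))
         .rename_axis(None, axis=1)
         # Push the values of every column to the top; same effect as
         # dropna(ignore_index=True), without requiring pandas 2.0
         .apply(lambda col: col.dropna().reset_index(drop=True))
)
print(wide)
```

The columns stay of object dtype here, so the floor counts remain integers instead of being converted to floats.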

huangapple
  • Published on May 29, 2023, 05:35:13
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76353691.html