How can we melt a dataframe and list words under columns?

Question

I have a dataframe that looks like this.

import pandas as pd

data = {'clean_words':['good','evening','how','are','you','how','can','i','help'],
        'start_time':[1900,2100,2500,2750,2900,1500,1650,1770,1800],
        'end_time':[2100,2500,2750,2900,3000,1650,1770,1800,1950],
        'transaction':[1,1,1,1,1,2,2,2,2]}

df = pd.DataFrame(data)
df

If I try a basic melt, like so...

df_melted = df.pivot_table(index='clean_words', columns='transaction')
df_melted.tail()

I get this...

[Image: output of df_melted.tail()]

What I really want is the transaction numbers as columns, with the words listed down the rows. So, if transaction 1 were a column, these words would be listed in rows under it:

`'good','evening','how','are','you'`

Under transaction 2, these words would be listed in rows:

`'how','can','i','help'`

How can I do that? The start_time and end_time are kind of superfluous here.

Answer 1

Score: 1

Is this the format you want?

>>> pd.DataFrame({'1': ['good', 'evening', 'how', 'are', 'you'], '2': ['how', 'can', 'i', 'help', None]})
         1     2
0     good   how
1  evening   can
2      how     i
3      are  help
4      you  None

I haven't done that before, but you could pivot your data and collect a list of words under each transaction column.

>>> df.pivot_table(columns='transaction', values='clean_words', aggfunc=list)
transaction                               1                    2
clean_words  [good, evening, how, are, you]  [how, can, i, help]

Or group by transaction and collect a list of words.

>>> df.groupby('transaction', as_index=False).agg(clean_words=pd.NamedAgg(column='clean_words', aggfunc=list))
   transaction                     clean_words
0            1  [good, evening, how, are, you]
1            2             [how, can, i, help]
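If the goal is the wide layout shown at the top of this answer (one column per transaction, one word per row), one possible approach is to number the words within each transaction and pivot on that number. This is only a sketch: the helper column name `position` is made up for illustration, and the timing columns are left out because the question calls them superfluous.

```python
import pandas as pd

# Same words and transaction ids as in the question; start_time/end_time omitted.
data = {'clean_words': ['good', 'evening', 'how', 'are', 'you', 'how', 'can', 'i', 'help'],
        'transaction': [1, 1, 1, 1, 1, 2, 2, 2, 2]}
df = pd.DataFrame(data)

# Number each word within its transaction (0, 1, 2, ...) to use as the row label.
# 'position' is a hypothetical helper column, not part of the original data.
numbered = df.assign(position=df.groupby('transaction').cumcount())

# One column per transaction, one row per position within the transaction.
wide = numbered.pivot(index='position', columns='transaction', values='clean_words')
print(wide)
```

Transaction 2 has one word fewer than transaction 1, so its last row is filled with a missing value (NaN).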

Answer 2

Score: 1

import pandas as pd
import numpy as np

data = {'clean_words':['good','evening','how','are','you','how','can','i','help'],
        'start_time':[1900,2100,2500,2750,2900,1500,1650,1770,1800],
        'end_time':[2100,2500,2750,2900,3000,1650,1770,1800,1950],
        'transaction':[1,1,1,1,1,2,2,2,2]}

df = pd.DataFrame(data)

df_melted = df.groupby('transaction')['clean_words'].apply(np.array).reset_index()

print(df_melted)

   transaction                     clean_words
0            1  [good, evening, how, are, you]
1            2             [how, can, i, help]
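The result above still keeps each transaction's words in a single cell. A possible follow-up, sketched here rather than taken from the answer, is to spread those words down the rows by turning each group into its own Series and letting pandas align them by position. The names `word_lists` and `words_by_transaction` are illustrative only.

```python
import pandas as pd

# Same words and transaction ids as in the question; timing columns omitted.
data = {'clean_words': ['good', 'evening', 'how', 'are', 'you', 'how', 'can', 'i', 'help'],
        'transaction': [1, 1, 1, 1, 1, 2, 2, 2, 2]}
df = pd.DataFrame(data)

# Collect the words of each transaction into a plain Python list.
word_lists = df.groupby('transaction')['clean_words'].agg(list)

# Build one Series per transaction; the DataFrame constructor aligns them on
# their 0..n-1 index and pads the shorter column with NaN.
words_by_transaction = pd.DataFrame(
    {transaction: pd.Series(words) for transaction, words in word_lists.items()}
)
print(words_by_transaction)
```

The columns are labeled with the transaction numbers 1 and 2, and the words are listed in their original order under each.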

huangapple
  • Posted on March 31, 2023, 04:48:02
  • Please keep this link when reposting: https://go.coder-hub.com/75892874-2.html