Transpose columns and create list in new column

Question

This is a hard question to fit into a title, so let me explain:

Let's say I have a DataFrame like this:

id   'state': 'texas'  'phone_type': 'iphone'  'email_domain': 'gmail'
111  1                 0                       1
222  0                 1                       1
123  0                 1                       0
234  1                 0                       0
432  0                 0                       1

Code for the DataFrame:

import pandas as pd

df_test = pd.DataFrame(columns=['id',
                                "'state': 'texas'",
                                "'phone_type': 'iphone'",
                                "'email_domain': 'gmail'"],
                       data=[[111, 1, 0, 1],
                             [222, 0, 1, 1],
                             [123, 0, 1, 0],
                             [234, 1, 0, 0],
                             [432, 0, 0, 1]])

How can I take the columns, transpose them into rows of a new DataFrame, and put the ids whose value equals 1 into a list in a new column? I would also like one more column with the count of those ids, like this:

attr                     ids_list         count_ids
'state': 'texas'         [111, 234]       2
'phone_type': 'iphone'   [222, 123]       2
'email_domain': 'gmail'  [111, 222, 432]  3

Answer 1

Score: 2

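You could do:

from collections import defaultdict
import pandas as pd

new_df = defaultdict(list)
# Skip the first column ('id') and walk over the attribute columns
for c in df_test.columns[1:]:
    # ids of the rows where this attribute is set (any non-zero value counts)
    ids = df_test['id'][df_test[c].values.astype(bool)]
    new_df['attr'].append(c)
    new_df['ids_list'].append(ids.values.tolist())
    new_df['count_ids'].append(len(ids))

# Each dict key becomes a column of the resulting DataFrame
new_df = pd.DataFrame(new_df)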

Note that this assumes "id" is the first column in your original dataframe.
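
The position assumption in Answer 1 can be avoided by selecting the attribute columns by name instead of by position. This is a minimal sketch, assuming the id column is literally named `'id'`; `attr_cols` and `rows` are illustrative names, not part of the original answer:

```python
import pandas as pd

df_test = pd.DataFrame(
    columns=['id', "'state': 'texas'", "'phone_type': 'iphone'", "'email_domain': 'gmail'"],
    data=[[111, 1, 0, 1], [222, 0, 1, 1], [123, 0, 1, 0], [234, 1, 0, 0], [432, 0, 0, 1]],
)

# Every column except 'id', wherever 'id' happens to sit in the frame
attr_cols = df_test.columns.drop('id')

rows = []
for c in attr_cols:
    # ids whose flag in column c is exactly 1
    ids = df_test.loc[df_test[c].eq(1), 'id'].tolist()
    rows.append({'attr': c, 'ids_list': ids, 'count_ids': len(ids)})

new_df = pd.DataFrame(rows)
print(new_df)
```

Unlike `.astype(bool)`, `eq(1)` keeps only values that are exactly 1, which matches the wording of the question; for pure 0/1 data both give the same result.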

Answer 2

Score: 2

The pandas way (with pd.melt):

res = pd.melt(df_test, id_vars=['id'], var_name='attr')\
    .pipe(lambda df: df[df['value'].eq(1)])\
    .groupby('attr')['id'].agg([list, 'size'])\
    .rename(columns={'list': 'ids', 'size': 'count'}).reset_index()
print(res)

                      attr             ids  count
0  'email_domain': 'gmail'  [111, 222, 432]     3
1   'phone_type': 'iphone'       [222, 123]     2
2         'state': 'texas'       [111, 234]     2
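
To get the exact column names and row order from the question, named aggregation can stand in for the `agg([list, 'size'])` plus `rename` step, and `sort=False` keeps the attributes in their original column order rather than sorting them alphabetically. This is a sketch under those assumptions (named aggregation on a `SeriesGroupBy` needs pandas 0.25 or later):

```python
import pandas as pd

df_test = pd.DataFrame(
    columns=['id', "'state': 'texas'", "'phone_type': 'iphone'", "'email_domain': 'gmail'"],
    data=[[111, 1, 0, 1], [222, 0, 1, 1], [123, 0, 1, 0], [234, 1, 0, 0], [432, 0, 0, 1]],
)

res = (
    # Long format: one row per (id, attr) pair with its 0/1 flag in 'value'
    pd.melt(df_test, id_vars=['id'], var_name='attr')
    # Keep only the pairs flagged with 1
    .loc[lambda d: d['value'].eq(1)]
    # One output row per attr, in order of first appearance
    .groupby('attr', sort=False)['id']
    # Output column name = aggregation to apply
    .agg(ids_list=list, count_ids='size')
    .reset_index()
)
print(res)
```

With `sort=False` the rows come out as state, phone_type, email_domain; without it you get the alphabetical order shown in the output above.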

Answer 3

Score: 1

Your question sounds like it would be much easier to solve with a (default)dict holding the data instead of a DataFrame. If you still wish to do it in pandas, you could use this:

results_df = pd.DataFrame()
results_df['attr'] = [
    "'state': 'texas'",
    "'phone_type': 'iphone'",
    "'email_domain': 'gmail'",
]
results_df['ids_list'] = [df_test[df_test[col] == 1]['id'].tolist()
                          for col in results_df['attr']]
results_df['count_ids'] = results_df['ids_list'].apply(len)

print(results_df)

                      attr         ids_list  count_ids
0         'state': 'texas'       [111, 234]          2
1   'phone_type': 'iphone'       [222, 123]          2
2  'email_domain': 'gmail'  [111, 222, 432]          3
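
The code above hard-codes the attribute names. A variant that derives them from the columns, and literally transposes the frame as the title suggests, might look like this. It is a sketch assuming the id column is named `'id'`; `t` is an illustrative name:

```python
import pandas as pd

df_test = pd.DataFrame(
    columns=['id', "'state': 'texas'", "'phone_type': 'iphone'", "'email_domain': 'gmail'"],
    data=[[111, 1, 0, 1], [222, 0, 1, 1], [123, 0, 1, 0], [234, 1, 0, 0], [432, 0, 0, 1]],
)

# ids become the index; after .T each attribute is a row and each id a column
t = df_test.set_index('id').T

results_df = pd.DataFrame({
    'attr': t.index,
    # For each attribute row, keep the id labels whose value equals 1
    'ids_list': [row[row.eq(1)].index.tolist() for _, row in t.iterrows()],
})
results_df['count_ids'] = results_df['ids_list'].apply(len)
print(results_df)
```

`iterrows` is slow on large frames, but here it loops once per attribute rather than once per id, so it stays cheap as long as there are few attribute columns.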

huangapple
  • Published on March 7, 2023 at 05:38:50
  • If you repost this, please keep the link to this article: https://go.coder-hub.com/75656086.html