transpose columns and create list in new column
Question
This is hard to fit into a title, so let me explain.
Say I have a DataFrame like this:
id   'state': 'texas'  'phone_type': 'iphone'  'email_domain': 'gmail'
111  1                 0                       1
222  0                 1                       1
123  0                 1                       0
234  1                 0                       0
432  0                 0                       1
Code to create the DataFrame:
import pandas as pd

df_test = pd.DataFrame(
    columns=[
        'id',
        "'state': 'texas'",
        "'phone_type': 'iphone'",
        "'email_domain': 'gmail'",
    ],
    data=[
        [111, 1, 0, 1],
        [222, 0, 1, 1],
        [123, 0, 1, 0],
        [234, 1, 0, 0],
        [432, 0, 0, 1],
    ],
)
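How can I transpose these columns into rows of a new DataFrame, with the ids whose value equals 1 collected into a list in a new column? I would also like one more column holding the count of ids, like this: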
attr                     ids_list         count_ids
'state': 'texas'         [111, 234]       2
'phone_type': 'iphone'   [222, 123]       2
'email_domain': 'gmail'  [111, 222, 432]  3
Answer 1
Score: 2
You could do this:
from collections import defaultdict
import pandas as pd

new_df = defaultdict(list)
for c in df_test.columns[1:]:  # skip the leading 'id' column
    # ids of the rows where this attribute column is nonzero (here, 1)
    ids = df_test['id'][df_test[c].values.astype(bool)]
    new_df['attr'].append(c)
    new_df['ids_list'].append(ids.values.tolist())
    new_df['count_ids'].append(len(ids))
new_df = pd.DataFrame(new_df)
Note that this assumes "id" is the first column in your original DataFrame.
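If the position of "id" is not guaranteed, one possible variation (a sketch, not part of the answer above) is to move "id" into the index first, so that every remaining column is treated as an attribute. The name flags is only illustrative, and the sketch assumes every non-id column holds 0/1 values.

```python
import pandas as pd

df_test = pd.DataFrame(
    columns=["id", "'state': 'texas'", "'phone_type': 'iphone'", "'email_domain': 'gmail'"],
    data=[[111, 1, 0, 1], [222, 0, 1, 1], [123, 0, 1, 0], [234, 1, 0, 0], [432, 0, 0, 1]],
)

# Assumption: every column other than "id" is a 0/1 flag.
# With "id" as the index, its position among the columns no longer matters.
flags = df_test.set_index("id").astype(bool)

new_df = pd.DataFrame({
    "attr": list(flags.columns),
    # For each attribute, keep the ids whose flag is True.
    "ids_list": [flags.index[flags[c].to_numpy()].tolist() for c in flags.columns],
})
new_df["count_ids"] = new_df["ids_list"].apply(len)
print(new_df)
```

This keeps the attributes in their original column order and produces the same three columns the question asks for.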
Answer 2
Score: 2
The pandas way (with pd.melt):
res = (
    pd.melt(df_test, id_vars=['id'], var_name='attr')
    .pipe(lambda df: df[df['value'].eq(1)])
    .groupby('attr')['id'].agg([list, 'size'])
    .rename(columns={'list': 'ids', 'size': 'count'})
    .reset_index()
)
print(res)
                      attr              ids  count
0  'email_domain': 'gmail'  [111, 222, 432]      3
1   'phone_type': 'iphone'       [222, 123]      2
2         'state': 'texas'       [111, 234]      2
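The rows above come out sorted alphabetically by attr because groupby sorts its keys by default, and the column names differ from the ones in the question. A possible variation, offered as a sketch rather than as part of the original answer, passes sort=False to keep the attributes in their original column order and uses named aggregation to produce ids_list and count_ids directly. It assumes a pandas version that supports named aggregation (0.25 or later).

```python
import pandas as pd

df_test = pd.DataFrame(
    columns=["id", "'state': 'texas'", "'phone_type': 'iphone'", "'email_domain': 'gmail'"],
    data=[[111, 1, 0, 1], [222, 0, 1, 1], [123, 0, 1, 0], [234, 1, 0, 0], [432, 0, 0, 1]],
)

res = (
    pd.melt(df_test, id_vars="id", var_name="attr")  # long format: id, attr, value
    .query("value == 1")                             # keep only the set flags
    .groupby("attr", sort=False)["id"]               # sort=False keeps first-seen order
    .agg(ids_list=list, count_ids="size")            # named aggregation sets the column names
    .reset_index()
)
print(res)
```

As with the original answer, an attribute that has no rows equal to 1 is dropped by the filter and will not appear in the result.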
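Answer 3
Score: 1
Your question sounds like it would be much easier to handle with a (default)dict than with a DataFrame. If you do wish to do it in pandas, however, you could use this: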
results_df = pd.DataFrame()
results_df['attr'] = [
    "'state': 'texas'",
    "'phone_type': 'iphone'",
    "'email_domain': 'gmail'",
]
results_df['ids_list'] = [
    df_test[df_test[col] == 1]['id'].tolist()
    for col in results_df['attr']
]
results_df['count_ids'] = results_df['ids_list'].apply(len)
print(results_df)
                      attr         ids_list  count_ids
0         'state': 'texas'       [111, 234]          2
1   'phone_type': 'iphone'       [222, 123]          2
2  'email_domain': 'gmail'  [111, 222, 432]          3
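For comparison, the (default)dict approach mentioned at the start of this answer might look roughly like the following. This is only a sketch working on plain Python lists; the names columns, data, ids_by_attr and counts are illustrative and do not come from the question.

```python
from collections import defaultdict

columns = ["id", "'state': 'texas'", "'phone_type': 'iphone'", "'email_domain': 'gmail'"]
data = [[111, 1, 0, 1], [222, 0, 1, 1], [123, 0, 1, 0], [234, 1, 0, 0], [432, 0, 0, 1]]

ids_by_attr = defaultdict(list)
for row in data:
    record = dict(zip(columns, row))   # pair each value with its column name
    row_id = record.pop("id")          # what remains are the attribute flags
    for attr, flag in record.items():
        if flag == 1:
            ids_by_attr[attr].append(row_id)

# Count of ids per attribute, derived from the lists.
counts = {attr: len(ids) for attr, ids in ids_by_attr.items()}

for attr, ids in ids_by_attr.items():
    print(attr, ids, counts[attr])
```

One thing to keep in mind with this version: an attribute that is never set to 1 gets no key in ids_by_attr, so it would need to be added explicitly if empty lists are wanted.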