March 7, 2023, 05:38:50

Transpose columns and create list in new column

Question

This question is hard to fit into a title, so let me explain.

Say I have a DataFrame like this:

id   'state': 'texas'  'phone_type': 'iphone'  'email_domain': 'gmail'
111  1                0                        1
222  0                1                        1
123  0                1                        0
234  1                0                        0
432  0                0                        1

Code to create the DataFrame:

import pandas as pd

df_test = pd.DataFrame(columns=['id',
                                '\'state\': \'texas\'',
                                '\'phone_type\': \'iphone\'',
                                '\'email_domain\': \'gmail\''
                               ],
                       data=[
                           [111, 1, 0, 1],
                           [222, 0, 1, 1],
                           [123, 0, 1, 0],
                           [234, 1, 0, 0],
                           [432, 0, 0, 1]
                       ])

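How can I take these columns, transpose them into rows of a new DataFrame, and put the ids whose value equals 1 into a list in a new column? I would also like one more column with the count of ids, like this: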

attr                     ids_list           count_ids
'state': 'texas'         [111, 234]         2
'phone_type': 'iphone'   [222, 123]         2
'email_domain': 'gmail'  [111, 222, 432]    3

Answer 1

Score: 2

You could do:

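    from collections import defaultdict
    import pandas as pd

    # Collect the output columns as lists, one entry per attribute column
    new_df = defaultdict(list)
    for c in df_test.columns[1:]:  # skip the leading 'id' column
        # Boolean mask: keep the ids whose flag in column c is 1
        ids = df_test['id'][(df_test[c].values).astype(bool)]
        new_df['attr'].append(c)
        new_df['ids_list'].append(ids.values.tolist())
        new_df['count_ids'].append(len(ids))

    # Turn the dict of lists into the final DataFrame
    new_df = pd.DataFrame(new_df)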

Note that this assumes "id" is the first column in your original DataFrame.
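If the position of "id" is not guaranteed, one option is to pick the attribute columns by name instead of by position. The following is only a sketch under that assumption: it rebuilds the question's `df_test`, and the helper name `ids_per_attr` is hypothetical, not something from the original answer.

```python
import pandas as pd

df_test = pd.DataFrame(
    columns=["id", "'state': 'texas'", "'phone_type': 'iphone'", "'email_domain': 'gmail'"],
    data=[[111, 1, 0, 1], [222, 0, 1, 1], [123, 0, 1, 0], [234, 1, 0, 0], [432, 0, 0, 1]],
)


def ids_per_attr(df, id_col="id"):
    """Hypothetical helper: one row per flag column, listing the ids whose flag is 1."""
    rows = []
    # Index.drop removes the id column by name, wherever it sits in the frame
    for col in df.columns.drop(id_col):
        ids = df.loc[df[col].eq(1), id_col].tolist()
        rows.append({"attr": col, "ids_list": ids, "count_ids": len(ids)})
    return pd.DataFrame(rows)


# Move 'id' to the last position to show that column order no longer matters
shuffled = df_test[[c for c in df_test.columns if c != "id"] + ["id"]]
new_df = ids_per_attr(shuffled)
print(new_df)
```

Selecting by name costs one extra argument but removes the dependency on where "id" happens to be.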


Answer 2

Score: 2

The pandas way (with pd.melt):

res = pd.melt(df_test, id_vars=['id'], var_name='attr')\
    .pipe(lambda df: df[df['value'].eq(1)])\
    .groupby('attr')['id'].agg([list, 'size'])\
    .rename(columns={'list': 'ids', 'size': 'count'}).reset_index()
print(res)

                      attr              ids  count
0  'email_domain': 'gmail'  [111, 222, 432]      3
1   'phone_type': 'iphone'       [222, 123]      2
2         'state': 'texas'       [111, 234]      2
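Note that `groupby` sorts the group keys by default, which is why the rows above come out alphabetically rather than in the original column order, and the column names differ from the ones asked for. A possible variant, assuming the question's names `ids_list` and `count_ids` are wanted, passes `sort=False` and uses named aggregation. It is a sketch that rebuilds the question's `df_test`, not a replacement for the answer above.

```python
import pandas as pd

df_test = pd.DataFrame(
    columns=["id", "'state': 'texas'", "'phone_type': 'iphone'", "'email_domain': 'gmail'"],
    data=[[111, 1, 0, 1], [222, 0, 1, 1], [123, 0, 1, 0], [234, 1, 0, 0], [432, 0, 0, 1]],
)

res = (
    df_test.melt(id_vars=["id"], var_name="attr")  # long format: id, attr, value
    .query("value == 1")                           # keep only the set flags
    .groupby("attr", sort=False)["id"]             # sort=False keeps first-seen order
    .agg(ids_list=list, count_ids="size")          # named aggregation sets the column names
    .reset_index()
)
print(res)
```

Because `melt` stacks the columns in their original order, the first-seen order of `attr` matches the column order of `df_test`.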


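Answer 3

Score: 1

Your question sounds like it would be much easier to solve with a (default)dict holding the data instead of a DataFrame. If you nevertheless wish to do it in pandas, you could use this: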

results_df = pd.DataFrame()
results_df['attr'] = [
    "'state': 'texas'",
    "'phone_type': 'iphone'",
    "'email_domain': 'gmail'",
]
results_df['ids_list'] = [df_test[df_test[col] == 1]['id'].tolist()
                          for col in results_df['attr']]
results_df['count_ids'] = results_df['ids_list'].apply(len)

print(results_df)

                      attr         ids_list  count_ids
0         'state': 'texas'       [111, 234]          2
1   'phone_type': 'iphone'       [222, 123]          2
2  'email_domain': 'gmail'  [111, 222, 432]          3
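The dict-based approach mentioned at the start of this answer could look roughly like the sketch below. It assumes the same `df_test` as in the question; `ids_by_attr` is a hypothetical name, and the attribute names are read from the columns instead of being typed out by hand.

```python
import pandas as pd

df_test = pd.DataFrame(
    columns=["id", "'state': 'texas'", "'phone_type': 'iphone'", "'email_domain': 'gmail'"],
    data=[[111, 1, 0, 1], [222, 0, 1, 1], [123, 0, 1, 0], [234, 1, 0, 0], [432, 0, 0, 1]],
)

# Map each flag column to the list of ids whose flag equals 1
ids_by_attr = {
    col: df_test.loc[df_test[col] == 1, "id"].tolist()
    for col in df_test.columns
    if col != "id"
}

# The count is simply the length of each list, so it needs no separate storage
for attr, ids in ids_by_attr.items():
    print(attr, ids, len(ids))
```

A plain dict keeps insertion order, so the attributes come out in the same order as the columns of `df_test`.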

