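How do I convert a pandas dataframe, grouped by a column's values, into a Huggingface Dataset?

Question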

I have the following data frame `df`:

    import pandas as pd
    from datasets import Dataset

    data = [[1, 'Jack', 'A'], [1, 'Jamie', 'A'], [1, 'Mo', 'B'], [1, 'Tammy', 'A'], [2, 'JJ', 'A'], [2, 'Perry', 'C']]
    df = pd.DataFrame(data, columns=['id', 'name', 'class'])
    > df
      id   name class
    0   1   Jack     A
    1   1  Jamie     A
    2   1     Mo     B
    3   1  Tammy     A
    4   2     JJ     A
    5   2  Perry     C

I would like to convert it to a Dataset object with one row per `id`. The desired output is

    > myDataset
    Dataset({
        features: ['id', 'name', 'class'],
        num_rows: 2
    })

where

    > myDataset[0:2]
    {'id': ['1', '2'], 'name': [['Jack', 'Jamie', 'Mo', 'Tammy'], ['JJ', 'Perry']], 'class': [['A', 'A', 'B', 'A'], ['A', 'C']]}

Based on the documentation, I tried the following, but it gave me a Dataset with 6 rows instead of one with 2 rows grouped by the `id` column.

    myDataset = Dataset.from_pandas(df) 
    > myDataset
    Dataset({
        features: ['id', 'name', 'class'],
        num_rows: 6
    })
    > myDataset[0:2]
    {'id': [1, 1], 'name': ['Jack', 'Jamie'], 'class': ['A', 'A']}

Answer 1

Score: 0

You can aggregate the original dataframe by `id` before converting it:

    myDataset = Dataset.from_pandas(df.groupby('id', as_index=False).agg(list))
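To make the aggregation step easier to inspect, here is a minimal sketch that rebuilds the `df` from the question and looks at the grouped frame before it is handed to `Dataset.from_pandas`. The helper name `to_dataset`, the cast of `id` to strings (to match the `'1'`, `'2'` shown in the desired output), and the use of `preserve_index=False` are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

data = [[1, 'Jack', 'A'], [1, 'Jamie', 'A'], [1, 'Mo', 'B'],
        [1, 'Tammy', 'A'], [2, 'JJ', 'A'], [2, 'Perry', 'C']]
df = pd.DataFrame(data, columns=['id', 'name', 'class'])

# Collapse to one row per id; every other column becomes a Python list.
grouped = df.groupby('id', as_index=False).agg(list)

# Assumption: the desired output in the question shows ids as strings.
# Drop this line to keep them as integers.
grouped['id'] = grouped['id'].astype(str)

# Column-oriented view of the grouped frame, the same layout that
# slicing a Dataset returns.
as_columns = grouped.to_dict('list')
print(as_columns)


def to_dataset(frame):
    """Hypothetical helper: wrap the grouped frame in a Dataset."""
    # Imported lazily so the pandas part above runs without `datasets`.
    from datasets import Dataset
    # preserve_index=False keeps the pandas index out of the features.
    return Dataset.from_pandas(frame, preserve_index=False)
```

Calling `to_dataset(grouped)` should then give a Dataset with 2 rows in which `name` and `class` are list-valued features.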
