February 18, 2023, 02:35:00

Create datasets based on authors from another dataset

Question

I have a dataset in the following format:

       text          author        title 
     -------------------------------------
dt =   text0         author0       title0
       text1         author1       title1
         .             .              .
         .             .              .
         .             .              .

I would like to create separate datasets, each containing only the texts of one author. For example, the dataset named dt1 contains the texts of author1, dt2 contains the texts of author2, etc.

I would be grateful if you could help me with this using Python.

Update:

dt =
     text                                author    title
     --------------------------------------------------------
0    I would like to go to the beach     George    Beach
1    I was in park few days ago          Nick      Park
2    I would like to go in uni           Peter     University
3    I have be in the airport at 8       Maria     Airport

Answer 1

Score: 1

Please try this; it is what I understand you need.

import pandas as pd
data = {
    'text': ['text0', 'text1', 'text2'],
    'author': ['author0', 'author1', 'author1'],
    'title': ['Comunicación', 'Administración', 'Ventas']
}
df = pd.DataFrame(data)
df1 = df[df["author"] == "author0"]
df2 = df[df["author"] == "author1"]
print(df1)
print(df2)

Update:

import pandas as pd
data = {
    'text': ['text0', 'text1', 'text2'],
    'author': ['author0', 'author1', 'author1'],
    'title': ['Comunicación', 'Administración', 'Ventas']
}
df = pd.DataFrame(data)
# Manual filtering, one variable per author (as in the first version)
df1 = df[df["author"] == "author0"]
df2 = df[df["author"] == "author1"]
# Same filtering in a loop, so it works for any number of authors
list_author = df['author'].unique().tolist()
for x in list_author:
    # Keep only the rows written by author x
    a = df[df["author"] == x]
    print(a)
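
The loop above prints each per-author DataFrame but does not keep it. A minimal sketch of one way to keep them, assuming pandas is installed: group by the author column and store one DataFrame per author in a dictionary, instead of creating variables such as dt1, dt2 by hand. The sample rows are taken from the question's update, and the name `frames_by_author` is only illustrative.

```python
import pandas as pd

# Sample rows copied from the update in the question.
dt = pd.DataFrame({
    "text": [
        "I would like to go to the beach",
        "I was in park few days ago",
        "I would like to go in uni",
        "I have be in the airport at 8",
    ],
    "author": ["George", "Nick", "Peter", "Maria"],
    "title": ["Beach", "Park", "University", "Airport"],
})

# Iterating over a groupby yields (author, sub-DataFrame) pairs.
# frames_by_author is a hypothetical name chosen for this sketch.
frames_by_author = {author: group for author, group in dt.groupby("author")}

# Each value is a DataFrame holding only that author's rows.
for author, frame in frames_by_author.items():
    print(author)
    print(frame)
```

A single author's dataset can then be looked up by name, for example `frames_by_author["George"]`, which may be easier to handle than numbered variables when the number of authors is not known in advance.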
