Question

I have a dataframe that looks like this:

df =

id   tweet
1     a
1     b
1     b
1     a
2     d
2     a
2     a
3     b
3     b
4     a
4     b

Now I want to group by id and count the tweets for each one:

df.groupby("id").count()

This gives me:

df =

id   count
1     4
2     3
3     2
4     2

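However, I'd like to subsample the data so that users with fewer than n tweets are kept in the dataframe unchanged, and users with more than n tweets have their tweets randomly subsampled down to n. How do I do this? I have tried the following, but they only return n samples for the entire row: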

n = 3
print(data.groupby("user_id").apply(lambda x: x.sample(min(n, len(x)), replace=False)).reset_index(drop=True))
print(data.groupby('user_id').sample(n, random_state=1))

Answer 1

Score: 1

Shuffle the rows, then take the first N rows of each group with groupby().head() (N is the per-id cap, n in the question):

df.sample(frac=1).groupby('id').head(N)
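
As a minimal sketch of the shuffle-then-head approach, the snippet below rebuilds the question's dataframe and caps each id at n tweets. The random_state value and the final sort_index() call are assumptions added here for reproducibility and readability; they are not part of the original answer.

```python
import pandas as pd

# Sample data mirroring the dataframe in the question.
df = pd.DataFrame({
    "id":    [1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4],
    "tweet": ["a", "b", "b", "a", "d", "a", "a", "b", "b", "a", "b"],
})

n = 3  # maximum number of tweets to keep per id

# Shuffle every row, then keep the first n rows of each group.
# Groups with fewer than n rows are kept whole, because head(n)
# returns at most n rows and never raises on small groups.
# random_state=0 is an assumption, added only for reproducibility.
subsampled = df.sample(frac=1, random_state=0).groupby("id").head(n)

# Optional: restore the original row order after shuffling.
subsampled = subsampled.sort_index()

# Number of tweets kept per id.
print(subsampled.groupby("id").size())
```

With n = 3, ids 1 and 2 end up with 3 tweets each, while ids 3 and 4 keep both of theirs. Because the shuffle happens before grouping, which rows survive for the larger groups is random.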
