How to create a smaller dataframe from an existing dataframe with the same numbers per label

Question

I have a dataframe with 50k rows and two columns, item and labels. I want to reduce the number of rows while ending up with the same number of rows for every label.
The result should look like this:

  • Label "notebook": 1000 rows
  • Label "ballpoint": 1000 rows
  • Label "pencil": 1000 rows
  • Label "eraser": 1000 rows
  • Label "pencil sharpener": 1000 rows

So the 50k rows would be reduced to 5000 rows, with the same number of rows for each label.

Answer 1

Score: 1

You need to perform stratified sampling, which simply means splitting your data into groups and then sampling from each group.

The sampling can be proportionate (the same fraction from each group) or disproportionate (a fixed number from each group, regardless of group size). Since you want 1000 rows for each label, use disproportionate sampling. Sample code is below:

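import pandas as pd

data = {
    "item": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "label": ['A', 'B', 'A', 'C', 'B', 'B', 'A', 'C', 'A', 'B'],
}
df = pd.DataFrame(data)

# Sample two rows for each label. sample() returns a new DataFrame and
# leaves df unchanged, so the result has to be assigned before printing.
sampled = df.groupby("label").sample(n=2).reset_index(drop=True)
print(sampled)

One possible output (the sampled rows vary from run to run):

   item label
0     3     A
1     7     A
2     6     B
3     5     B
4     4     C
5     8     C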
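Applied to the data described in the question, the same call might look like the sketch below. The 50,000-row frame here is synthetic, with made-up label proportions, and only stands in for the real data. The column name "labels" follows the question rather than the toy example above, and random_state is an optional addition that makes the draw repeatable. The second call shows proportionate sampling with frac for comparison.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the asker's data: 50,000 rows and five labels.
# The label proportions are invented for illustration only.
rng = np.random.default_rng(0)
label_names = ["notebook", "ballpoint", "pencil", "eraser", "pencil sharpener"]
df = pd.DataFrame({
    "item": np.arange(50_000),
    "labels": rng.choice(label_names, size=50_000, p=[0.4, 0.25, 0.15, 0.12, 0.08]),
})

# Disproportionate sampling: the same fixed number of rows from every label.
balanced = df.groupby("labels").sample(n=1000, random_state=42)

# Proportionate sampling: the same fraction of every label,
# so the original label ratios are preserved.
proportional = df.groupby("labels").sample(frac=0.1, random_state=42)

print(balanced["labels"].value_counts())
print(len(balanced))
```

Note that sample(n=1000) raises a ValueError if any label has fewer than 1000 rows. In that case, either pass replace=True to allow duplicated rows, or pick an n no larger than the smallest group.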

huangapple
  • Posted on June 29, 2023 at 05:14:01
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76576744.html