Forming n-grams for a DataFrame (numbers) in Python

Question

I found many posts explaining n-grams for words. However, I need to build n-grams (n = 2 or 3) from a DataFrame of integers with r rows and m columns. For example, consider the 3 x 5 DataFrame below:

df =
1, 2, 3, 4, 5
6, 7, 8, 9, 10
11, 12, 13, 14, 15

I need to form bigrams and trigrams from df.

I tried this code, but it does not work properly:

for i in range(df.shape[0]):
	row = list(str(df.iloc[i,:]))
	print("row:  ", row)
	bigrams = [b for l in row for b in zip(l.split(" ")[:-1], l.split(" ")[1:])]
	print(bigrams)

If the input is df = [10, 20, 30, 40, 50, 60, ...], the expected output is:

Bigrams:
(10, 20) (20, 30) (30, 40) (40, 50) ...

Trigrams:
(10, 20, 30) (20, 30, 40) (30, 40, 50) ...

Answer 1

Score: 2

Use ngrams from nltk.util, applying it to each row of df.values:

from nltk.util import ngrams

# for bigrams
for a in df.values:
    print(list(ngrams(a, n=2)))

[(1, 2), (2, 3), (3, 4), (4, 5)]
[(6, 7), (7, 8), (8, 9), (9, 10)]
[(11, 12), (12, 13), (13, 14), (14, 15)]

For trigrams, set n=3:

[(1, 2, 3), (2, 3, 4), (3, 4, 5)]
[(6, 7, 8), (7, 8, 9), (8, 9, 10)]
[(11, 12, 13), (12, 13, 14), (13, 14, 15)]
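
If you would rather not depend on nltk, the same sliding-window idea can be sketched with plain Python by zipping a row against shifted copies of itself. This is only a minimal sketch: the helper name ngrams_per_row is made up for illustration, and it assumes the rows have already been converted to plain lists (for a DataFrame, df.values.tolist() would give that).

```python
def ngrams_per_row(rows, n):
    """Return, for each row, the list of n-grams (tuples of n consecutive items).

    Hypothetical helper for illustration; not part of nltk or pandas.
    """
    result = []
    for row in rows:
        row = list(row)
        # Zip n shifted views of the row: row[0:], row[1:], ..., row[n-1:].
        # zip stops at the shortest view, so a row shorter than n yields [].
        result.append(list(zip(*(row[i:] for i in range(n)))))
    return result


# Same values as the 3 x 5 example in the question.
# With pandas, df.values.tolist() would produce this list of lists.
rows = [
    [1, 2, 3, 4, 5],
    [6, 7, 8, 9, 10],
    [11, 12, 13, 14, 15],
]

bigrams = ngrams_per_row(rows, 2)
trigrams = ngrams_per_row(rows, 3)

for grams in bigrams:
    print(grams)
# [(1, 2), (2, 3), (3, 4), (4, 5)]
# [(6, 7), (7, 8), (8, 9), (9, 10)]
# [(11, 12), (12, 13), (13, 14), (14, 15)]
```

For the flat input from the question, wrap the list as a single row: ngrams_per_row([[10, 20, 30, 40]], 2) gives [[(10, 20), (20, 30), (30, 40)]].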

Posted by huangapple on 2023-07-17 17:04:09
When reposting, please keep the link to this article: https://go.coder-hub.com/76702921.html