Forming n-grams from a dataframe of numbers in Python
Question
I found many posts explaining n-grams for words. However, I need to build n-grams (n = 2 or 3) from a dataframe that holds an n x m grid of integers. For example, consider the 3 x 5 dataframe below:
df =
1, 2, 3, 4, 5
6, 7, 8, 9, 10
11, 12, 13, 14, 15
I need to compute bigrams and trigrams on df.
I tried this code, but it does not work properly:
for i in range(df.shape[0]):
    row = list(str(df.iloc[i,:]))
    print("row: ", row)
    bigrams = [b for l in row for b in zip(l.split(" ")[:-1], l.split(" ")[1:])]
    print(bigrams)
If the input is df = [10,20,30,40,50,60,...], the expected output is:
Bigram:
(10,20)(20,30)(30,40)(40,50)...
Trigram:
(10,20,30)(20,30,40)(30,40,50)...
Answer 1
Score: 2
Use nltk.ngrams:
from nltk.util import ngrams
# for bigrams: each row of df.values is treated as one sequence
for a in df.values:
    print(list(ngrams(a, n=2)))
[(1, 2), (2, 3), (3, 4), (4, 5)]
[(6, 7), (7, 8), (8, 9), (9, 10)]
[(11, 12), (12, 13), (13, 14), (14, 15)]
For trigrams, set n=3:
[(1, 2, 3), (2, 3, 4), (3, 4, 5)]
[(6, 7, 8), (7, 8, 9), (8, 9, 10)]
[(11, 12, 13), (12, 13, 14), (13, 14, 15)]
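If installing nltk is not an option, the same sliding-window idea can be sketched in plain Python by zipping shifted copies of each row. This is only a minimal sketch: the helper name ngrams_from_row is made up for illustration, and the list rows stands in for what df.values.tolist() would return for the example dataframe.

```python
def ngrams_from_row(values, n):
    """Return the consecutive n-tuples of a sequence (hypothetical helper)."""
    items = list(values)
    # Zip n shifted copies: items[0:], items[1:], ..., items[n-1:].
    # zip stops at the shortest copy, so only full windows are produced.
    return list(zip(*(items[i:] for i in range(n))))


# Assumption: rows mirrors df.values.tolist() for the 3 x 5 example.
rows = [
    [1, 2, 3, 4, 5],
    [6, 7, 8, 9, 10],
    [11, 12, 13, 14, 15],
]

bigrams_per_row = [ngrams_from_row(row, 2) for row in rows]
trigrams_per_row = [ngrams_from_row(row, 3) for row in rows]

# The flat input from the question works the same way.
flat = [10, 20, 30, 40, 50, 60]
flat_bigrams = ngrams_from_row(flat, 2)
flat_trigrams = ngrams_from_row(flat, 3)

for row_bigrams in bigrams_per_row:
    print(row_bigrams)
```

A sequence shorter than n simply yields an empty list, since no full window fits.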