Two Letter Bigram in Pandas Dataframe

Question

I'm having trouble finding a way to get every two-letter combination (letter bigram) from each string in a DataFrame column. Everything I have found deals with bigrams of words rather than letters. The expected output is shown in the example below.


I have tried the three approaches below.

df['bigram'] = list(zip(df['string'], df['string'][1:]))

Generated this error

ValueError: Length of values (15570) does not match length of index (15571)

df['bigram'] = list(ngrams(df['string'], n=2))

Generated this error

ValueError: Length of values (15570) does not match length of index (15571)

df['bigram']=re.findall(r'[a-zA-z]{2}', df['string'])

Generated this error

TypeError: expected string or bytes-like object

Example:

string  output
hello   he, el, ll, lo
world   wo, or, rl, ld

Answer 1

Score: 0

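You need to loop over the strings and build the bigrams for each row separately, rather than passing the whole column at once:

import pandas as pd
from nltk import ngrams

df = pd.DataFrame({'string': ['abc', 'abcdef']})

# ngrams() on a single string yields tuples of adjacent characters
df['bigram'] = df['string'].apply(lambda x: list(ngrams(x, n=2)))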

Output:

   string                                    bigram
0     abc                          [(a, b), (b, c)]
1  abcdef  [(a, b), (b, c), (c, d), (d, e), (e, f)]
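
If nltk is not available, the same per-row idea can be sketched in plain Python by zipping each string with itself shifted by one character. The helper name `letter_bigrams` and the sample data are illustrative assumptions, not part of the original answer:

```python
import pandas as pd

# Sample data taken from the question's example
df = pd.DataFrame({'string': ['hello', 'world']})

def letter_bigrams(s):
    """Return the adjacent two-letter pairs of a single string (hypothetical helper)."""
    # zip stops at the shorter input, so the last character is only ever a second letter
    return [a + b for a, b in zip(s, s[1:])]

# Apply the helper row by row instead of zipping the whole column
df['bigram'] = df['string'].apply(letter_bigrams)
print(df)
```

Zipping inside the function, one string at a time, avoids the length mismatch that occurs when the column itself is zipped against a shifted copy of the column.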

If you want a string:

df['bigram'] = [', '.join([x[i:i+2] for i in range(len(x)-1)])
                for x in df['string']]

Output:

   string              bigram
0     abc              ab, bc
1  abcdef  ab, bc, cd, de, ef
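
For a real column, a slightly more defensive version of the string variant may be useful. The sketch below assumes the column might contain missing values or very short strings (the question does not say so), and the helper name `bigram_string` is hypothetical:

```python
import pandas as pd

def bigram_string(value, sep=', '):
    """Join the overlapping two-letter slices of one string (hypothetical helper)."""
    # Assumption: non-string entries such as None/NaN are passed through unchanged
    if not isinstance(value, str):
        return value
    # len(value) - 1 start positions, so the final pair is included
    return sep.join(value[i:i + 2] for i in range(len(value) - 1))

df = pd.DataFrame({'string': ['hello', 'world', None, 'a']})
df['bigram'] = df['string'].map(bigram_string)
print(df)
```

A string with fewer than two characters produces an empty string here, and a missing value stays missing. Whether that is the desired behavior depends on the data.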

huangapple
  • Published on March 21, 2023, 02:22:24
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75793969.html