How to generate all anagrams of a given word in Python

Question

I want to write a Python function that returns all possible anagrams of a given word. Only valid English words count as anagrams.

So far, however, I have only managed to generate all permutations of the word.

import itertools

def anagrams(word):
    letters = list(word)
    perms = itertools.permutations(letters)
    return [''.join(p) for p in perms]

How can I efficiently check and filter for valid English words now?

Answer 1

Score: 3

First, you need an English word list in Python. I usually use nltk, although there may be better packages. You can download its word corpus with:

import nltk
nltk.download('words')

A slight adjustment of your code then yields what you want:

from nltk.corpus import words
import itertools

# words.words() returns a list; converting it to a set makes membership tests fast
word_set = set(words.words())

def anagrams1(word):
    letters = list(word.lower())
    perms = itertools.permutations(letters)
    word_lst = [''.join(p) for p in perms]
    ana_lst = set(w for w in word_lst if w in word_set)
    return ana_lst

For example,

>>> anagrams1('sink')
{'inks', 'sink', 'skin'}

Edit, thanks to Kelly Bundy: a far better runtime can be achieved with a different algorithm, namely checking for every dictionary word whether it is an anagram of the input.

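def anagrams2(word):
    # Two words are anagrams exactly when their sorted letters match.
    # Lowercase the input so the behavior is consistent with anagrams1.
    word_sorted = sorted(word.lower())
    ana_lst = set(w for w in words.words() if sorted(w) == word_sorted)
    return ana_lst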
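
To check that the two approaches agree without downloading the nltk corpus, here is a minimal self-contained sketch. The small `SAMPLE_WORDS` set is a made-up stand-in for the real dictionary, and the function names `by_permutations` and `by_sorted_letters` are illustrative only, not part of the answer above.

```python
import itertools

# Hypothetical stand-in for set(words.words()); not the real nltk corpus.
SAMPLE_WORDS = {"sink", "skin", "inks", "kins", "link", "silk"}

def by_permutations(word, vocabulary):
    """Generate every permutation and keep those found in the vocabulary."""
    candidates = ("".join(p) for p in itertools.permutations(word.lower()))
    return {c for c in candidates if c in vocabulary}

def by_sorted_letters(word, vocabulary):
    """Scan the vocabulary once and compare sorted letters."""
    key = sorted(word.lower())
    return {w for w in vocabulary if sorted(w.lower()) == key}

result_a = by_permutations("Sink", SAMPLE_WORDS)
result_b = by_sorted_letters("Sink", SAMPLE_WORDS)
print(sorted(result_a))      # ['inks', 'kins', 'sink', 'skin']
print(result_a == result_b)  # True
```

The permutation version does work proportional to n! for an n-letter word (24 candidates for 4 letters, but 3,628,800 for 10 letters), whereas the scan does work proportional to the size of the vocabulary regardless of word length.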

Answer 2

Score: 0

Your method is inefficient for long words, because a word of n letters has up to n! permutations. It is better to precompute all anagrams:

  • Get an English dictionary and process it as follows:

    • For every word, sort its letters and use the sorted string as a key in a Python dictionary; append the word to the list stored under that key;
  • For a given query word, sort its letters and look up the result in the Python dictionary.

E.g. the allowed words are "one", "two", "three", "neo". Store as

"eno": ["one", "neo"]
"otw": ["two"]
"eehrt": ["three"]

Now the query "eon" sorts to "eno", so its anagrams are "one" and "neo".

Note that the preprocessed dictionary is only about twice as large as the initial set of words, since each word is stored once as a value and contributes at most one key. The preprocessing time also stays reasonable, because every word has to be read into the set or dictionary anyway, and sorting the letters can be done fairly efficiently with a histogram (counting) sort.
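
The precomputation described above could be sketched as follows. This is only one possible way to write it: the helper names `build_anagram_index` and `lookup_anagrams` are assumptions made for illustration, and the word list is the four-word example from this answer rather than a real dictionary.

```python
from collections import defaultdict

def build_anagram_index(vocabulary):
    """Map each sorted-letter key to the list of words that share it."""
    index = defaultdict(list)
    for w in vocabulary:
        key = "".join(sorted(w.lower()))
        index[key].append(w)
    return index

def lookup_anagrams(word, index):
    """Return the words stored under the query's sorted-letter key."""
    key = "".join(sorted(word.lower()))
    # .get avoids creating an empty entry for unknown keys.
    return index.get(key, [])

# The four allowed words from the example above.
index = build_anagram_index(["one", "two", "three", "neo"])
print(lookup_anagrams("eon", index))  # ['one', 'neo']
print(lookup_anagrams("xyz", index))  # []
```

Building the index costs one pass over the dictionary; after that, each query only sorts the query word and performs a single dictionary lookup.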

Answer 3

Score: 0

As I said in my comment, I'd just do one pass through the dictionary and check whether each word has the same letters as the target:

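from collections import Counter

TARGET_WORD = "trace"
target_pattern = Counter(TARGET_WORD.lower())

# dictionary.txt is assumed to hold one word per line.
with open("dictionary.txt", "rt") as f:
    for word in f.read().splitlines():
        # Equal Counters mean the same letters with the same multiplicities.
        if Counter(word.lower()) == target_pattern:
            print(word)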

If you need to process multiple words, you can preprocess the dictionary:

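# Compute the letter counts of every dictionary word once.
with open("dictionary.txt", "rt") as f:
    dictionary = [Counter(x.lower()) for x in f.read().splitlines()]

word_list = ["trace", "sink"]  # placeholder: your list of target words

for word in word_list:
    # Prints each target whose letters match at least one dictionary word.
    if Counter(word.lower()) in dictionary:
        print(word)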

(Edit: My first preprocessing program tried to make the preprocessed dictionary a set, but I tested the code and Counters aren't hashable, as @KellyBundy caught right away. A set would probably be more efficient, so you could use tuple(sorted(Counter(x).items())) for the entries and the comparison, but I'll leave that as an exercise for the reader.)
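
One way the exercise mentioned in the edit note might be approached is sketched below. It is not the answer author's code: the helper name `letter_key` is hypothetical, and a small in-memory list stands in for dictionary.txt so that the example runs on its own.

```python
from collections import Counter

def letter_key(word):
    """Hashable stand-in for Counter(word): sorted (letter, count) pairs."""
    return tuple(sorted(Counter(word.lower()).items()))

# Hypothetical in-memory word list used in place of dictionary.txt.
dictionary_words = ["trace", "crate", "react", "caret", "sink", "skin"]

# Tuples are hashable, so the keys can be stored in a set.
dictionary_keys = {letter_key(w) for w in dictionary_words}

word_list = ["Cater", "python"]  # example target words
for word in word_list:
    # Set membership is a hash lookup instead of a scan over a list.
    if letter_key(word) in dictionary_keys:
        print(word)
```

With the sample data, only "Cater" is printed, because it shares its letters with "trace", "crate", "react" and "caret", while "python" matches nothing in the list.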

huangapple
  • Posted on April 19, 2023 at 21:31:11
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76055155.html