April 19, 2023 21:31:11


How to generate all anagrams of a given word in Python

Question

I want to write a Python function that returns all possible anagrams of a given word. Only English words count as valid anagrams.

So far, however, I have only managed to generate all permutations of the word.

import itertools

def anagrams(word):
    letters = list(word)
    perms = itertools.permutations(letters)
    return [''.join(p) for p in perms]

How can I now efficiently check these results and keep only valid English words?

Answer 1

Score: 3

First you need an English dictionary in Python. I usually use nltk, although there may be better packages. Once nltk is installed, you can download its word list with:

import nltk
nltk.download('words')

and then a slight adjustment of your code yields what you want:

from nltk.corpus import words
import itertools

# words.words() returns a list; a set makes each membership test O(1)
word_set = set(words.words())

def anagrams1(word):
    letters = list(word.lower())
    perms = itertools.permutations(letters)
    word_lst = (''.join(p) for p in perms)
    # Keep only real words; the set also drops duplicates caused by repeated letters
    ana_lst = {w for w in word_lst if w in word_set}
    return ana_lst

For example,

>>> anagrams1('sink')
{'inks', 'sink', 'skin'}

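Edit (thanks to Kelly Bundy): a far better runtime for longer words comes from reversing the algorithm, that is, checking every dictionary word to see whether it is an anagram of the input. A word of n letters has n! permutations, whereas this approach makes a single pass over the dictionary.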

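def anagrams2(word):
    word_sorted = sorted(word.lower())
    # One pass over the dictionary; the cheap length check skips most words before sorting
    ana_lst = {w for w in word_set
               if len(w) == len(word_sorted) and sorted(w) == word_sorted}
    return ana_lst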
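To compare the two approaches without downloading the nltk corpus, here is a minimal self-contained sketch. The small `WORD_SET` is a hypothetical stand-in for `set(words.words())`, and the names `anagrams_by_permutation` and `anagrams_by_scan` are illustrative only.

```python
import itertools

# Hypothetical stand-in for set(words.words()); a real word list is far larger
WORD_SET = {"sink", "skin", "inks", "trace", "crate", "react", "cater", "caret", "one", "neo"}

def anagrams_by_permutation(word, word_set=WORD_SET):
    # Generates all n! orderings, so the cost explodes as the word gets longer
    perms = (''.join(p) for p in itertools.permutations(word.lower()))
    return {w for w in perms if w in word_set}

def anagrams_by_scan(word, word_set=WORD_SET):
    # One pass over the dictionary: cost grows with dictionary size, not with n!
    target = sorted(word.lower())
    return {w for w in word_set if len(w) == len(target) and sorted(w) == target}

print(anagrams_by_permutation("sink") == anagrams_by_scan("sink"))  # True
print(sorted(anagrams_by_scan("trace")))  # ['caret', 'cater', 'crate', 'react', 'trace']
```

Both functions return the same set; only the amount of work differs, which matters once the input has more than about eight or nine letters.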

Answer 2

Score: 0

Your method is quite inefficient for long words, since a word of n letters yields n! permutations. It is better to precompute all anagrams:

Get an English dictionary and process it as follows:
- For every word, sort its letters and use the sorted string as a key in a Python dictionary, appending the word to the list of words stored under that key.

Then, for a given query word, sort its letters and look the result up in the Python dictionary.

For example, if the allowed words are "one", "two", "three", and "neo", store them as:

"eno": ["one", "neo"]
"otw": ["two"]
"eehrt": ["three"]

Now the query "eon" sorts to "eno", so its anagrams are "one" and "neo".
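A minimal Python sketch of this precomputation, assuming the word list fits in memory; `build_index` and `find_anagrams` are hypothetical helper names, and the four example words stand in for a real English dictionary:

```python
from collections import defaultdict

def build_index(words):
    # Map each sorted-letter key to every word spelled with exactly those letters
    index = defaultdict(list)
    for w in words:
        index["".join(sorted(w.lower()))].append(w)
    return index

def find_anagrams(index, query):
    # Sort the query's letters and fetch the matching bucket (empty list if none);
    # .get() does not trigger the defaultdict factory, so no empty keys are added
    return index.get("".join(sorted(query.lower())), [])

index = build_index(["one", "two", "three", "neo"])
print(dict(index))  # {'eno': ['one', 'neo'], 'otw': ['two'], 'eehrt': ['three']}
print(find_anagrams(index, "eon"))  # ['one', 'neo']
```

After the one-time build, each query costs a sort of the query's letters plus a single dictionary lookup.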

Note that the preprocessed dictionary is only about twice as large as the initial word set, and the preprocessing time remains reasonable, since every word has to be inserted into the set or dictionary anyway (the letters can be sorted efficiently with a histogram sort).
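A histogram sort here is a counting sort over the alphabet. A minimal sketch, assuming the words contain only the lowercase ASCII letters a to z (`counting_sort_key` is a hypothetical name); it produces the same key as `"".join(sorted(w))` in linear time:

```python
import string

def counting_sort_key(word):
    # Histogram of letter frequencies; assumes only lowercase ASCII letters a-z
    counts = [0] * 26
    for ch in word:
        counts[ord(ch) - ord("a")] += 1
    # Emit each letter as many times as it occurred, in alphabetical order
    return "".join(letter * n for letter, n in zip(string.ascii_lowercase, counts))

print(counting_sort_key("three"))  # eehrt
```

Uppercase, accented, or other characters fall outside the 26 slots and would need separate handling (for example, lowercasing or filtering first).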

Answer 3

Score: 0

As I said in my comment, I'd just make one pass through the dictionary and check whether each word has the same letters as the target:

from collections import Counter

TARGET_WORD = "trace"
target_pattern = Counter(TARGET_WORD.lower())

# dictionary.txt holds one word per line; "with" closes the file afterwards
with open("dictionary.txt", "rt") as f:
    for word in f.read().splitlines():
        if Counter(word.lower()) == target_pattern:
            print(word)

If you need to process multiple words, you can preprocess the dictionary once:

# Keep each dictionary word next to its letter counts so the matches can be printed
with open("dictionary.txt", "rt") as f:
    dictionary = [(x, Counter(x.lower())) for x in f.read().splitlines()]

for word in word_list:  # word_list is your list of target words.
    target = Counter(word.lower())
    print(word, [x for x, counts in dictionary if counts == target])

(Edit: My first preprocessing version tried to store the preprocessed dictionary in a set, but Counters aren't hashable, as @KellyBundy caught right away and testing confirmed. A set would probably be more efficient, so you could use tuple(sorted(Counter(x).items())) for both the entries and the comparison, but I'll leave that as an exercise for the reader.)
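One possible take on that exercise, as a sketch: use the hashable key `tuple(sorted(Counter(w).items()))` and group the dictionary in a Python dict (rather than a plain set) so the matching words can be returned, with one lookup per target word. The in-memory `DICTIONARY_WORDS` list is a hypothetical stand-in for the lines of dictionary.txt, and `letter_key` is an illustrative name.

```python
from collections import Counter, defaultdict

# Hypothetical stand-in for the lines of dictionary.txt
DICTIONARY_WORDS = ["trace", "crate", "react", "cater", "caret", "table", "bleat"]

def letter_key(word):
    # Counters aren't hashable, but a sorted tuple of (letter, count) pairs is
    return tuple(sorted(Counter(word.lower()).items()))

# Preprocess once: group dictionary words by their letter counts
groups = defaultdict(list)
for entry in DICTIONARY_WORDS:
    groups[letter_key(entry)].append(entry)

word_list = ["trace", "bleat", "xyz"]  # hypothetical target words
for word in word_list:
    # A single dict lookup replaces the linear scan over the dictionary
    print(word, groups.get(letter_key(word), []))
```

This is essentially the sorted-letters index from Answer 2 with a different key; `"".join(sorted(word))` would serve equally well as the key.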
