算法来确定单词的超集?

huangapple go评论71阅读模式
英文:

Algorithm to determine word supersets?

问题

我玩一个基于单词的瓷砖游戏,其中玩家尝试制作越来越长的单词,有效操作包括添加一个或多个字母以及重新排列已经放置的字母。

例如,HISSHIVHIVESCHIVESANCHOVIES

我想创建一个程序,输出从输入单词可以制作的潜在单词。简洁地说,它会找到所有包含输入单词中的所有字母至少一次的单词,无论顺序如何。有在线的Scrabble求解器具有类似但不完全相同的功能。

我首先尝试的方法是创建一个英语语言的Trie(或DAFSA?)数据结构。
然后,对数据结构的顶层TrieNode条目中的每一个,调用类似于以下伪代码的函数:

results: List[str] = []

def Search(
  trieNode: TrieNode,
  prerequisites: MultiSet[Character],
  letterStack: Stack[Character]):
    # 为这个单词设置上下文
    letterStack.push(trieNode.letterValue)
    removed: Character? = prerequisites.removeIfPresentOrNil(trieNode.letterValue)

    # 如果有效则添加单词
    if trieNode.isWord and prerequisites.isEmpty():
       results.add(letterStack.join())
    
    # 递归处理可能的后续情况
    for TrieNode n in trieNode.children:
       Search(n, prerequisites, letterStack)

    # 在弹出堆栈帧之前清理上下文
    letterStack.pop()
    prerequisites.addIfNotNil(removed)

这个方法,即深度优先搜索整个英语语言,是否是解决这个问题的最佳方式?如果这个问题已经有了最优解决方案,我不想重复造轮子。

英文:

I play a word based tile game in which one tries to make words of increasingly long lengths where valid operations include adding one or more letters and rearranging what is already in place.

For example HISSHIVHIVESCHIVESANCHOVIES.

I wanted to create a program that outputs potential words you could make from the input word. Concisely, it finds all words that contain all the letters at least once from the input word in any order. There are online Scrabble solvers that have a similar but not identical functionality.

The first approach I had was to create a Trie (or DAFSA?) of the English language.
Then for each of the top level TrieNode entries of the data structure, invoke a function similar to this pseudocode

results : List<String> = []

fn Search(
  trieNode : TrieNode,
  prerequisites : MultiSet<Character>,
  letterStack : Stack<Character>):
    # setup context for this word
    letterStack.push(trieNode.letterValue)
    let removed : Character? = prerequisites.removeIfPresentOrNil(trieNode.letterValue)

    # add word if valid
    if trieNode.isWord and prerequisites.isEmpty():
       results.add(letterStack.join())
    
    # recurse on down-the-line possibilities
    for TrieNode n in trieNode.children:
       Search(n, prerequisites, wordStack)

    # teardown context before stack frame is popped
    letterStack.pop()
    prerequisites.addIfNotNil(removed)

Is this methodology, a depth first search across the entire English language, the best way to go about this problem? I don't want to reinvent the wheel if this is already a solved problem with an optimal solution.

答案1

得分: 1

伪代码

我不确定我是否理解了您的伪代码。以防万一,我将写出我心中的解决方案,使用树来表示这个问题(这可能与您的伪代码相同):

  1. 对于每个单词,您创建一个由单词中的字母排序而成的键(例如,“hello”变为“ehllo”)。
  2. 您创建一个字典:键 --> 具有该键的单词列表。
  3. 您使用这些键创建一棵树。
  4. 当用户提供一个单词时,您获取该单词的键,移动到相应的节点。结果 = 所有子节点中的所有单词。

效率

使用这个算法,我认为您实际上可以证明它在时间复杂性方面是最优的,因为:

  • 移动到树中的正确节点需要O(1)的时间(因为英语中最长的单词有29个字符,树不深)。
  • 从这个节点获取所有子节点中的所有单词的成本为O(N_WORD_SOLUTION)。
  • 因此,总时间复杂性为O(N_WORD_SOLUTION)。

知道您要输出一个包含N_WORD_SOLUTION个单词的列表,您不可能有比O(N_WORD_SOLUTION)更好的解决方案。

注意事项

  • 这个算法在不考虑构建树的时间的情况下是最优的。但树应该在构建一次后重复使用。
  • 我认为最好避免递归调用,您可以在没有递归的情况下编写相同的逻辑。
英文:

Pseudo code

I am not sure I understood your pseudo code. Just in case, i will write what I have in mind using a tree for this problem (it might be what you wrote in your pseudo code):

  1. for each word you create a key made of letters of word sorted (for instance "hello" gives "ehllo")
  2. you create a dictionary: key --> list of words with this key
  3. you create a tree using the keys
  4. when a user gives a word, you get the key of the word, you move in the tree to corresponding node. result = All words in children nodes

Efficiency

With this algorithm, I think you can actually demonstrate that it is optimal in terms of time complexity, because:

  • moving to the correct node in the tree takes O(1) (because the longest word in english has 29 characters, the tree is not deep)
  • From this node, getting all words in all children nodes costs O(N_WORD_SOLUTION)
  • So total time complexity is O(N_WORD_SOLUTION)

Knowing that you want to output a list that contains N_WORD_SOLUTION words, you can not have a better solution than O(N_WORD_SOLUTION)

Notes

  • This algorithm is optimal without taking into account the time to build the tree. But the tree should be build once and then reuse for any call.
  • I think it would be better to avoid recursive calls, you can write the same logic without recursion.

huangapple
  • 本文由 发表于 2023年6月29日 12:43:33
  • 转载请务必保留本文链接:https://go.coder-hub.com/76578108.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定