How to find Relationships between Objects

Question

For People With A Similar Question (written after finding a solution):

This problem, as you might notice from the answers below, has a lot of different solutions. I chose Evan's only because it was the easiest one for me to implement in my own code. However, from what I tried, every other answer also worked. @SalvadorDali linked this Kaggle page, which was definitely interesting, and I recommend reading it if you are interested. Prolog was also brought up as a possible solution; I'm unfamiliar with it, but if you already know it, it's probably worth considering. Also, if you just want code to use, there are working JavaScript and Python examples below. However, each one takes a different approach to the solution and I'm not sure which is most efficient (feel free to test it yourself).

For further approaches/reading:

http://en.wikipedia.org/wiki/Breadth-first_search

https://stackoverflow.com/questions/8966488/prolog-and-ancestor-relationship?lq=1

https://www.kaggle.com/c/word2vec-nlp-tutorial/details/part-2-word-vectors


Sorry for the confusing title. I can't figure out a way to properly word my question, so any better ideas are welcome.

Because I'm having such a difficult time describing my question, I'll try to explain my goal and code as much as needed:

Note: my code here is Go, but I'd be happy with answers in other languages as well. If you have any questions, I'll try to answer as quickly as possible.

Basically, I have an array of "Word" objects that look like this:

type Word struct{
     text     string
     synonyms []string
}

This is an example of 4 words within the array:

  []Word{
      {text: "cat", synonyms: []string{"feline", "kitten", "mouser"}},
      {text: "kitten", synonyms: []string{"kitty", "kit"}},
      {text: "kit", synonyms: []string{"pack", "bag", "gear"}},
      {text: "computer", synonyms: []string{"electronics", "PC", "abacus"}},
   }

My challenge is writing a method to test for a relationship between 2 words. Of course, testing between 2 words like "cat" and "kitten" would be easy with the example above. I could just check the list of synonyms for "cat" and test whether it contains "kitten", with code like this:

func areWordsRelated(word1 Word, word2 Word) bool {
    for _, elem := range word1.synonyms{
         if elem == word2.text{
             return true
         }
    }
    return false
}

However, I can't figure out how to test for a more distant relationship.

For example:

areWordsRelated("cat","pack") //should return true 
//because "cat" is related to "kitten" which is related to "pack"
areWordsRelated("cat", "computer") //should return false

I tried to do it recursively, but none of my attempts seem to work. Any example code (my code is in Go, but Python, Java, or JavaScript are also fine), pseudocode, or just explanations would be really great.

Answer 1

Score: 3

A Python solution:

class Word:

   # Dictionary of Words, keyed by name.
   word_dict = {}

   def __init__(self, name, synonyms):
      self.name = name
      self.synonyms = synonyms

      # Update the dictionary.
      Word.word_dict[name] = self
      for s in synonyms:
         if not s in Word.word_dict:
            Word.word_dict[s] = Word(s, [])

   def isAncestor(self, other):
      # True if other is reachable by following synonym lists from self.
      if other in self.synonyms:
         return True
      for s in self.synonyms:
         if Word.word_dict[s].isAncestor(other):
            return True
      return False

def areWordsRelated(word1, word2):
   if not word1 in Word.word_dict or not word2 in Word.word_dict:
      return False
   # Check both directions, since the synonym lists are one-way.
   return Word.word_dict[word1].isAncestor(word2) or Word.word_dict[word2].isAncestor(word1)

words = []
words.append(Word("cat", ["feline", "kitten", "mouser"]))
words.append(Word("kitten", ["kitty", "kit"]))
words.append(Word("kit", ["pack", "bag", "gear"]))
words.append(Word("computer", ["electronics", "PC", "abacus"]))

print(areWordsRelated("cat", "kit"))
print(areWordsRelated("kit", "cat"))
print(areWordsRelated("cat", "computer"))
print(areWordsRelated("dog", "computer"))

Output:

True
True
False
False

Answer 2

Score: 3

If you give me some feedback on this I can edit it, because it doesn't do exactly what you asked, but it is the gist. I'll edit with a technical explanation of what has to be changed to meet your exact example.

package main

import "fmt"

func main() {
	words := []Word{
		{text: "cat", synonyms: []string{"feline", "kitten", "mouser"}},
		{text: "kitten", synonyms: []string{"kitty", "kit"}},
		{text: "kit", synonyms: []string{"pack", "bag", "gear"}},
		{text: "computer", synonyms: []string{"electronics", "PC", "abacus"}},
	}

	fmt.Println(areWordsRelated(words, words[0], words[2]))
	fmt.Println(areWordsRelated(words, words[0], words[3]))
}

type Word struct {
	text     string
	synonyms []string
}

func areWordsRelated(words []Word, word1, word2 Word) bool {
	for _, elem := range word1.synonyms {
		if elem == word2.text {
			return true
		} else {
			for _, word := range words {
				if word.text == elem {
					if areWordsRelated(words, word, word2) {
						return true
					}
				}
			}
		}
	}
	return false
}

EDIT: This doesn't do quite what you asked, because it doesn't make the connection between "pack" and "cat": "pack" is not represented by an actual Word object, and I defined the method to receive word2 as an object (just working off your example). I could instead make that a string, so it can check for "pack" in the synonyms array of "kit" before returning, but the idea is the same nonetheless. Here's a high-level explanation of the algorithm.

Iterate over the synonyms. If a synonym isn't a match, find its Word object in the original collection and call the function again with it as the first argument. This recursively exhausts every path until it finds a match, or until there are no paths left, in which case you end up outside the loop and return false. The code above runs in the Go Playground and correctly prints true and then false. Notice that the recursive call is made within an if, to protect against returning false prematurely (it is also a performance enhancement, because we return as soon as true is found rather than continuing to recurse down the other paths).

https://play.golang.org/p/gCeY0SthU1
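The string-based variant mentioned in the EDIT could be sketched roughly as follows. This is only an illustration, not the code behind the playground link: the function name `related` and the `visited` map are assumptions added here, the latter so that the recursion still terminates if the synonym lists ever contain a cycle (for example if "kitten" listed "cat").

```go
package main

import "fmt"

// Word mirrors the struct from the question.
type Word struct {
	text     string
	synonyms []string
}

// related reports whether target can be reached from start by following
// synonym lists. Both arguments are plain strings, so a target such as
// "pack" does not need its own Word entry.
// visited is a hypothetical addition: it records the words already
// expanded so that a cycle (a -> b -> a) cannot recurse forever.
func related(words []Word, start, target string, visited map[string]bool) bool {
	if visited[start] {
		return false
	}
	visited[start] = true
	for _, w := range words {
		if w.text != start {
			continue
		}
		for _, syn := range w.synonyms {
			if syn == target || related(words, syn, target, visited) {
				return true
			}
		}
	}
	return false
}

func main() {
	words := []Word{
		{text: "cat", synonyms: []string{"feline", "kitten", "mouser"}},
		{text: "kitten", synonyms: []string{"kitty", "kit"}},
		{text: "kit", synonyms: []string{"pack", "bag", "gear"}},
		{text: "computer", synonyms: []string{"electronics", "PC", "abacus"}},
	}
	fmt.Println(related(words, "cat", "pack", map[string]bool{}))     // true
	fmt.Println(related(words, "cat", "computer", map[string]bool{})) // false
}
```

A fresh map is passed for each top-level query; reusing one across queries would make later searches skip words that were already expanded.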

Answer 3

Score: 3

First of all, it is not clear how you define a relationship here. If "cat" has the synonyms ["feline", "kitten", "mouser"], does that mean that "mouser" has the synonym "cat"?

Based on my understanding, the answer is no. So here is a solution in Python:

G = {
    "cat": ["feline", "kitten", "mouser"],
    "kitten": ["kitty", "kit"],
    "kit": ["pack", "bag", "gear"],
    "computer": ["electronics", "PC", "abacus"]
}

def areWordsRelated(G, w1, w2):
    if w1 == w2:
        return True

    frontier = [w1]
    checked = set()
    while len(frontier):
        el = frontier.pop()
        if el in G:
            neighbors = G[el]
            for i in neighbors:
                if i == w2:
                    return True
                if i not in checked:
                    frontier.append(i)
                    checked.add(i)

    return False

areWordsRelated(G, "cat", "pack") #true
areWordsRelated(G, "cat", "computer") #false

So what are we doing here? First, you have your graph, which is just a dictionary (a map in Go) that shows your relationships (I basically took your slice).

Our algorithm grows like a mold, maintaining a set of checked elements and a current frontier. If the frontier is empty, there is nothing left to explore and the elements are not connected. We extract one element at a time from the frontier and check all of its neighbors. If any of them is the element we are looking for, then there is a connection. Otherwise, we check whether we have already seen that element, and if not, we add it to the frontier and to the set of checked elements.
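Since the question is written in Go, here is a rough port of the same frontier/checked idea. Treat it as a sketch: the names `reachable` and `graph` are made up for this example, and a `map[string]bool` stands in for the Python set. Popping from the end of the slice visits words in depth-first order; taking `frontier[0]` instead would give the breadth-first order from the Wikipedia link, with the same true/false result.

```go
package main

import "fmt"

// reachable reports whether w2 can be reached from w1 by following the
// adjacency lists in graph (word -> synonyms).
func reachable(graph map[string][]string, w1, w2 string) bool {
	if w1 == w2 {
		return true
	}
	frontier := []string{w1}
	checked := map[string]bool{w1: true}
	for len(frontier) > 0 {
		// Pop the most recently added word.
		el := frontier[len(frontier)-1]
		frontier = frontier[:len(frontier)-1]
		// Looking up a missing key yields a nil slice, so words
		// without an entry simply have no neighbors.
		for _, n := range graph[el] {
			if n == w2 {
				return true
			}
			if !checked[n] {
				checked[n] = true
				frontier = append(frontier, n)
			}
		}
	}
	return false
}

func main() {
	graph := map[string][]string{
		"cat":      {"feline", "kitten", "mouser"},
		"kitten":   {"kitty", "kit"},
		"kit":      {"pack", "bag", "gear"},
		"computer": {"electronics", "PC", "abacus"},
	}
	fmt.Println(reachable(graph, "cat", "pack"))     // true
	fmt.Println(reachable(graph, "cat", "computer")) // false
}
```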

Notice that if your relationship works in a slightly different way, all you need to do is modify the graph.
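As one example of such a modification, suppose synonyms should count in both directions, so that "mouser" also has the synonym "cat". A possible sketch in Go is to build a second graph with every edge mirrored and run the unchanged search on that. The helper name `symmetric` is an assumption made for this illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// symmetric returns a new graph in which every edge a -> b of the input
// also has the reverse edge b -> a. Duplicate edges are dropped.
func symmetric(graph map[string][]string) map[string][]string {
	seen := map[string]map[string]bool{}
	add := func(from, to string) {
		if seen[from] == nil {
			seen[from] = map[string]bool{}
		}
		seen[from][to] = true
	}
	for word, syns := range graph {
		for _, s := range syns {
			add(word, s)
			add(s, word)
		}
	}
	out := make(map[string][]string, len(seen))
	for word, set := range seen {
		for s := range set {
			out[word] = append(out[word], s)
		}
		// Map iteration order is random, so sort for stable output.
		sort.Strings(out[word])
	}
	return out
}

func main() {
	graph := map[string][]string{
		"cat":    {"feline", "kitten", "mouser"},
		"kitten": {"kitty", "kit"},
	}
	sym := symmetric(graph)
	fmt.Println(sym["mouser"]) // [cat]
	fmt.Println(sym["kitten"]) // [cat kit kitty]
}
```

With the mirrored graph, every word in a connected group can reach every other one, so "kitty" and "mouser" would be reported as related as well. Whether that is desirable depends on how the relationship is meant to work.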


One last word: if you are looking for a more standard way to find synonyms, take a look at the word-to-vector (word2vec) algorithm and a nice implementation in Python. This will allow you to find really complicated relationships, such as finding that California and Golden Gate are related, even without having this relationship specified.

Answer 4

Score: 2

Here's a sample recursive algorithm written in JavaScript, with some jQuery thrown in to make searching the array easier. It could probably be optimized, but it should give you something to start with.

$(function() {
  var words = [{
    text: "cat",
    synonyms: ["feline", "kitten", "mouser"]
  }, {
    text: "kitten",
    synonyms: ["kitty", "kit"]
  }, {
    text: "kit",
    synonyms: ["pack", "bag", "gear"]
  }, {
    text: "computer",
    synonyms: ["electronics", "PC", "abacus"]
  }];

  console.log(areWordsRelated('cat', 'pack', words));
  console.log(areWordsRelated('cat', 'rack', words));
});

function areWordsRelated(parentWord, childWord, list) {
  var parentWordItems = $.grep(list, function(element) {
    return element.text === parentWord;
  });

  if (parentWordItems.length === 0) {
    return false;
  } else {
    var parentWordItem = parentWordItems[0];
    var remainingItems = $.grep(list, function(element) {
      return element.text !== parentWord;
    });
    if (parentWordItem.synonyms.indexOf(childWord) >= 0) {
      return true;
    } else {
      for (var i = 0; i < parentWordItem.synonyms.length; i++) {
        var synonym = parentWordItem.synonyms[i];
        if (areWordsRelated(synonym, childWord, remainingItems)) {
          return true;
        }
      }
      return false;
    }
  }
}

<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js"></script>

Answer 5

Score: 2

You're looking at a 2nd-degree relationship (as opposed to the 'easy' 1st-degree example you already know how to find), meaning you have to do one of two things:

(1) The storage-heavy solution requires maintaining a separate list of 2nd-degree relationships and then simply searching within that (longer) list. This requires maintaining (potentially MUCH) more data about word relationships. For example, if you have 10,000 words, and each has roughly 10 synonyms, that's 100,000 first-degree relationships stored. Each of those fans out to roughly 10 more, so you'd have something like a million 2nd-degree relationships, and the count keeps multiplying with every further degree. So of course that gets unwieldy quickly.

In that case, each entry looks like this:
{text: "cat" synonyms: ["feline", "kitten", "mouser"] seconds:["pack",...]}
... and you simply write a separate function that will check for relationships in EITHER 'synonyms' or 'seconds'.
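A rough Go sketch of this storage-heavy option is below. The `seconds` field comes from the entry shown above; the helper names `fillSeconds` and `relatedWithin2` are assumptions made for the example, and the precomputation simply collects the synonyms of each synonym.

```go
package main

import "fmt"

// Word is the struct from the question plus a precomputed seconds list.
type Word struct {
	text     string
	synonyms []string
	seconds  []string // 2nd-degree relationships
}

// fillSeconds fills in seconds for every word by collecting the
// synonyms of each of its synonyms.
func fillSeconds(words []Word) {
	index := make(map[string][]string, len(words))
	for _, w := range words {
		index[w.text] = w.synonyms
	}
	for i := range words {
		for _, syn := range words[i].synonyms {
			// index[syn] is nil for synonyms without their own entry.
			words[i].seconds = append(words[i].seconds, index[syn]...)
		}
	}
}

// relatedWithin2 checks for target in either synonyms or seconds.
func relatedWithin2(w Word, target string) bool {
	for _, s := range w.synonyms {
		if s == target {
			return true
		}
	}
	for _, s := range w.seconds {
		if s == target {
			return true
		}
	}
	return false
}

func main() {
	words := []Word{
		{text: "cat", synonyms: []string{"feline", "kitten", "mouser"}},
		{text: "kitten", synonyms: []string{"kitty", "kit"}},
		{text: "kit", synonyms: []string{"pack", "bag", "gear"}},
	}
	fillSeconds(words)
	fmt.Println(words[0].seconds)                 // [kitty kit]
	fmt.Println(relatedWithin2(words[0], "kit"))  // true
	fmt.Println(relatedWithin2(words[0], "pack")) // false
}
```

Note that with the sample data from the question, "pack" is three steps away from "cat" (cat, kitten, kit, pack), so a list that stops at the 2nd degree does not find it. Covering that case would need a third list or one of the unbounded searches from the other answers.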

(2) The programmatic solution would be to still store only the 1st-degree relationships and then use a nested loop.

In this case:

// This checks for a 1st-degree relationship.
func areWordsRelated1(word1 Word, word2 Word) bool {
    for _, elem := range word1.synonyms {
        if elem == word2.text {
            return true
        }
    }
    return false
}

// This checks for a 2nd-degree relationship by checking the 1st degree
// and then, if that fails, trying the 1st-degree function on the
// children of word1 before giving up and returning false.
// Synonyms are plain strings, so each child has to be looked up in the
// full list; the words parameter is added here for that lookup.
func areWordsRelated2(words []Word, word1 Word, word2 Word) bool {
    if areWordsRelated1(word1, word2) {
        return true
    }
    for _, elem1 := range word1.synonyms {
        for _, child := range words {
            if child.text == elem1 && areWordsRelated1(child, word2) {
                return true
            }
        }
    }
    return false
}

NOTE: I noticed that in your sample data, "cat" was related to "kitten", but "kitten" was not conversely related to "cat".

huangapple
  • Posted on June 10, 2015 at 03:33:19
  • When reposting, please keep the link to this article: https://go.coder-hub.com/30741231.html