NLTagger tags every word as OtherWord and name scheme as Other

Question

I tried Apple's own example:

import NaturalLanguage

let text = "The American Red Cross was established in Washington, D.C., by Clara Barton."

let tagger = NLTagger(tagSchemes: [.nameType])
tagger.string = text

let options: NLTagger.Options = [.omitPunctuation, .omitWhitespace, .joinNames]
let tags: [NLTag] = [.personalName, .placeName, .organizationName]

tagger.enumerateTags(in: text.startIndex..<text.endIndex, unit: .word, scheme: .nameType, options: options) { tag, tokenRange in
    // Get the most likely tag, and print it if it's a named entity.
    if let tag = tag, tags.contains(tag) {
        print("\(text[tokenRange]): \(tag.rawValue)")
    }

    // Get multiple possible tags with their associated confidence scores.
    let (hypotheses, _) = tagger.tagHypotheses(at: tokenRange.lowerBound, unit: .word, scheme: .nameType, maximumCount: 1)
    print(hypotheses)

    return true
}

But it returns all name tags as Other. I also tried another example of tagging the sentence with lexical class, and it also tags every word as OtherWord:

let text = "The American Red Cross was established in Washington, D.C., by Clara Barton."

let tagger = NLTagger(tagSchemes: [.lexicalClass])
tagger.string = text

let options: NLTagger.Options = [.omitPunctuation, .omitWhitespace, .joinNames]

print("language", tagger.dominantLanguage?.rawValue ?? "unknown")

tagger.enumerateTags(in: text.startIndex..<text.endIndex, unit: .word, scheme: .lexicalClass, options: options) { tag, tokenRange in
    // Print the lexical class assigned to each token.
    if let tag = tag {
        print("\(text[tokenRange]): \(tag.rawValue)")
    }

    return true
}

I also tried the answer to another question, which sets the language orthography explicitly, but it didn't help:

//tagger.setOrthography(NSOrthography(dominantScript: "Latn", languageMap: ["Latn": ["en"]]), range: text.startIndex..<text.endIndex)
tagger.setOrthography(NSOrthography.defaultOrthography(forLanguage: "en-US"), range: text.startIndex..<text.endIndex)

Does anybody have a clue why this happens?

By the way, my Xcode version is the latest one as of today, 14.3.

Answer 1

Score: 1

This seems to be a regression in Xcode 14.3. I downloaded Xcode 14.2, and NLTagger works correctly there for both .nameType and .lexicalClass tagging.
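
To compare two Xcode versions quickly, a small check along the following lines may help. It is only a sketch: the helper names looksUntagged and lexicalTags are made up for illustration, and the assumption that the fallback tags print as "Other" and "OtherWord" is taken from the output described in the question. Whether NLTagger.availableTagSchemes actually reflects the regression is not confirmed here; it is printed purely as extra diagnostic information.

```swift
#if canImport(NaturalLanguage)
import NaturalLanguage
#endif

/// Hypothetical helper: returns true when every tag is one of the fallback
/// values ("Other" or "OtherWord"), which is the symptom from the question.
/// An empty list is also treated as "nothing was tagged".
func looksUntagged(_ rawTags: [String]) -> Bool {
    let fallbacks: Set<String> = ["Other", "OtherWord"]
    return rawTags.isEmpty || rawTags.allSatisfy { fallbacks.contains($0) }
}

#if canImport(NaturalLanguage)
/// Hypothetical helper: collects the raw lexical-class tag of each word in `text`.
func lexicalTags(in text: String) -> [String] {
    let tagger = NLTagger(tagSchemes: [.lexicalClass])
    tagger.string = text
    var result: [String] = []
    tagger.enumerateTags(in: text.startIndex..<text.endIndex,
                         unit: .word,
                         scheme: .lexicalClass,
                         options: [.omitPunctuation, .omitWhitespace]) { tag, _ in
        if let tag = tag { result.append(tag.rawValue) }
        return true
    }
    return result
}

let sample = "The American Red Cross was established in Washington, D.C., by Clara Barton."

// Tag schemes the framework reports as available for English word tagging.
let schemes = NLTagger.availableTagSchemes(for: .word, language: .english)
print("available schemes:", schemes.map { $0.rawValue })

// True when no word received a tag other than the fallback values.
print("tagger looks broken:", looksUntagged(lexicalTags(in: sample)))
#endif
```

Running the same file under each Xcode version and comparing the last printed line should show whether the tagger is affected in that environment.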

This regression in Xcode 14.3 also affects NLEmbedding. For example, the following code finds word neighbors correctly in Xcode 14.2, but in Xcode 14.3 the embedding comes back as nil:

if let embedding = NLEmbedding.wordEmbedding(for: .english) {
  print("found embedding")
  print("embeddings for family: \(embedding.neighbors(for: "family", maximumCount: 3))")
  print("embeddings for science: \(embedding.neighbors(for: "science", maximumCount: 3))")
} else {
  print("no embedding found")
}

Posted by huangapple on May 8, 2023 at 00:22:35
Original link: https://go.coder-hub.com/76195038.html