NLTagger tags every word as OtherWord and every name type as Other
Question
I tried Apple's own example:
import NaturalLanguage
let text = "The American Red Cross was established in Washington, D.C., by Clara Barton."
let tagger = NLTagger(tagSchemes: [.nameType])
tagger.string = text
let options: NLTagger.Options = [.omitPunctuation, .omitWhitespace, .joinNames]
let tags: [NLTag] = [.personalName, .placeName, .organizationName]
tagger.enumerateTags(in: text.startIndex..<text.endIndex, unit: .word, scheme: .nameType, options: options) { tag, tokenRange in
    // Get the most likely tag, and print it if it's a named entity.
    if let tag = tag, tags.contains(tag) {
        print("\(text[tokenRange]): \(tag.rawValue)")
    }
    // Get multiple possible tags with their associated confidence scores.
    let (hypotheses, _) = tagger.tagHypotheses(at: tokenRange.lowerBound, unit: .word, scheme: .nameType, maximumCount: 1)
    print(hypotheses)
    return true
}
But it returns every name tag as Other. I also tried another example that tags the sentence by lexical class, and it tags every word as OtherWord:
let text = "The American Red Cross was established in Washington, D.C., by Clara Barton."
let tagger = NLTagger(tagSchemes: [.lexicalClass])
tagger.string = text
let options: NLTagger.Options = [.omitPunctuation, .omitWhitespace, .joinNames]
// dominantLanguage is optional, so unwrap it before printing.
print("language:", tagger.dominantLanguage?.rawValue ?? "undetermined")
tagger.enumerateTags(in: text.startIndex..<text.endIndex, unit: .word, scheme: .lexicalClass, options: options) { tag, tokenRange in
    // Print the lexical class (noun, verb, ...) assigned to each word.
    if let tag = tag {
        print("\(text[tokenRange]): \(tag.rawValue)")
    }
    return true
}
I tried the answer to this question, which sets the language orthography explicitly, but it didn't help:
//tagger.setOrthography(NSOrthography(dominantScript: "Latn", languageMap: ["Latn": ["en"]]), range: text.startIndex..<text.endIndex)
tagger.setOrthography(NSOrthography.defaultOrthography(forLanguage: "en-US"), range: text.startIndex..<text.endIndex)
Does anybody have a clue why this happens?
By the way, I'm on Xcode 14.3, the latest version as of today.
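One way to rule out language detection as the cause is to look at what the tagger detected and then force English through NLTagger's own setLanguage(_:range:), rather than going through NSOrthography. The following is only a minimal sketch, assuming the same sample sentence as above; the "undetermined" and "none" fallback strings are placeholders chosen for illustration.

```swift
#if canImport(NaturalLanguage)
import NaturalLanguage

let text = "The American Red Cross was established in Washington, D.C., by Clara Barton."
let tagger = NLTagger(tagSchemes: [.lexicalClass])
tagger.string = text

// What the tagger detected on its own (nil if it could not decide).
print("detected language:", tagger.dominantLanguage?.rawValue ?? "undetermined")

// Force English for the whole string instead of relying on detection.
tagger.setLanguage(.english, range: text.startIndex..<text.endIndex)

// Tag only the first word as a quick spot check.
let (firstTag, _) = tagger.tag(at: text.startIndex, unit: .word, scheme: .lexicalClass)
print("first word tag:", firstTag?.rawValue ?? "none")
#endif
```

If the detected language is already English and the first word still comes back as OtherWord after forcing the language, the problem is probably not in language detection.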
Answer 1
Score: 1
This seems to be a regression in Xcode 14.3. I downloaded Xcode 14.2, and there NLTagger works correctly for both .nameType and .lexicalClass tagging.
The regression in Xcode 14.3 also affects NLEmbedding. For example, the following code finds word neighbors correctly in 14.2, but in Xcode 14.3 the embedding comes back nil:
if let embedding = NLEmbedding.wordEmbedding(for: .english) {
    print("found embedding")
    print("embeddings for family: \(embedding.neighbors(for: "family", maximumCount: 3))")
    print("embeddings for science: \(embedding.neighbors(for: "science", maximumCount: 3))")
} else {
    print("no embedding found")
}
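To check quickly whether a given Xcode or runtime is affected, something like the following sketch may help. It lists the tag schemes the system reports as available for English and tests whether any word in the sample sentence gets a tag other than the catch-all Other / OtherWord. The helper name taggerProducesRealTags is made up for this example and is not part of the NaturalLanguage framework. For the .nameType check to mean anything, the sample must contain at least one named entity.

```swift
#if canImport(NaturalLanguage)
import NaturalLanguage

/// Hypothetical helper: returns true if at least one word in `text` receives
/// a tag other than the catch-all `.other` / `.otherWord` for `scheme`.
func taggerProducesRealTags(for text: String, scheme: NLTagScheme) -> Bool {
    let tagger = NLTagger(tagSchemes: [scheme])
    tagger.string = text
    let options: NLTagger.Options = [.omitPunctuation, .omitWhitespace]
    var foundRealTag = false
    tagger.enumerateTags(in: text.startIndex..<text.endIndex, unit: .word, scheme: scheme, options: options) { tag, _ in
        if let tag = tag, tag != .other, tag != .otherWord {
            foundRealTag = true
            return false // One real tag is enough, so stop enumerating.
        }
        return true
    }
    return foundRealTag
}

let sample = "The American Red Cross was established in Washington, D.C., by Clara Barton."

// Schemes the system reports as available for English word tagging.
let schemes = NLTagger.availableTagSchemes(for: .word, language: .english)
print("available schemes:", schemes.map(\.rawValue))

print("nameType works:", taggerProducesRealTags(for: sample, scheme: .nameType))
print("lexicalClass works:", taggerProducesRealTags(for: sample, scheme: .lexicalClass))
#endif
```

If both checks print false on 14.3 but true on 14.2 with the same code, that points at the environment rather than at how the tagger is being called.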