NLTagger tags every word as OtherWord and every name as Other
Question
I tried Apple's own example:
import NaturalLanguage
let text = "The American Red Cross was established in Washington, D.C., by Clara Barton."
let tagger = NLTagger(tagSchemes: [.nameType])
tagger.string = text
let options: NLTagger.Options = [.omitPunctuation, .omitWhitespace, .joinNames]
let tags: [NLTag] = [.personalName, .placeName, .organizationName]
tagger.enumerateTags(in: text.startIndex..<text.endIndex, unit: .word, scheme: .nameType, options: options) { tag, tokenRange in
    // Get the most likely tag, and print it if it's a named entity.
    if let tag = tag, tags.contains(tag) {
        print("\(text[tokenRange]): \(tag.rawValue)")
    }
    // Get multiple possible tags with their associated confidence scores.
    let (hypotheses, _) = tagger.tagHypotheses(at: tokenRange.lowerBound, unit: .word, scheme: .nameType, maximumCount: 1)
    print(hypotheses)
    return true
}
But it tags every name as Other. I also tried another example that tags the sentence by lexical class, and it likewise tags every word as OtherWord:
let text = "The American Red Cross was established in Washington, D.C., by Clara Barton."
let tagger = NLTagger(tagSchemes: [.lexicalClass])
tagger.string = text
let options: NLTagger.Options = [.omitPunctuation, .omitWhitespace, .joinNames]
// dominantLanguage is optional, so unwrap it before printing.
print("language", tagger.dominantLanguage?.rawValue ?? "undetermined")
tagger.enumerateTags(in: text.startIndex..<text.endIndex, unit: .word, scheme: .lexicalClass, options: options) { tag, tokenRange in
    // Print each token together with its most likely lexical class.
    if let tag = tag {
        print("\(text[tokenRange]): \(tag.rawValue)")
    }
    return true
}
I tried the answer to this question, which sets the language orthography explicitly, but it didn't help:
//tagger.setOrthography(NSOrthography(dominantScript: "Latn", languageMap: ["Latn": ["en"]]), range: text.startIndex..<text.endIndex)
tagger.setOrthography(NSOrthography.defaultOrthography(forLanguage: "en-US"), range: text.startIndex..<text.endIndex)
Does anybody have a clue why this happens?
By the way, my Xcode version is 14.3, the latest as of today.
Answer 1
Score: 1
This seems to be a regression in Xcode 14.3. I downloaded Xcode 14.2, and there NLTagger works correctly for both .nameType and .lexicalClass tagging.
The regression in Xcode 14.3 also affects NLEmbedding. For example, the following code finds word neighbors correctly in 14.2, but in Xcode 14.3 the embedding comes back nil:
if let embedding = NLEmbedding.wordEmbedding(for: .english) {
    print("found embedding")
    print("embeddings for family: \(embedding.neighbors(for: "family", maximumCount: 3))")
    print("embeddings for science: \(embedding.neighbors(for: "science", maximumCount: 3))")
} else {
    print("no embedding found")
}
}
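To check quickly whether a given setup is affected, something like the sketch below might help. It assumes the regression shows up as English models being unavailable, which this thread does not confirm, so treat a clean result as inconclusive. `missingSchemes` is a hypothetical helper written for this example; `NLTagger.availableTagSchemes(for:language:)` and `NLEmbedding.wordEmbedding(for:)` are the framework calls.

```swift
import NaturalLanguage

// Hypothetical helper (not part of NaturalLanguage): returns the schemes
// from `wanted` that do not appear in `available`.
func missingSchemes(wanted: [NLTagScheme], available: [NLTagScheme]) -> [NLTagScheme] {
    wanted.filter { !available.contains($0) }
}

// Ask the framework which word-level schemes it reports for English
// in the current environment (device, simulator, or playground).
let available = NLTagger.availableTagSchemes(for: .word, language: .english)
let missing = missingSchemes(wanted: [.nameType, .lexicalClass], available: available)

if missing.isEmpty {
    print("Both schemes are reported as available for English.")
} else {
    print("Schemes not available here: \(missing.map(\.rawValue))")
}

// The word-embedding check from the answer, reduced to a single flag.
let hasEmbedding = NLEmbedding.wordEmbedding(for: .english) != nil
print("English word embedding available: \(hasEmbedding)")
```

Running the same snippet under Xcode 14.2 and 14.3 would show whether the two environments report different results.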