Question

I tried Apple's own example:

import NaturalLanguage

let text = "The American Red Cross was established in Washington, D.C., by Clara Barton."

let tagger = NLTagger(tagSchemes: [.nameType])
tagger.string = text

let options: NLTagger.Options = [.omitPunctuation, .omitWhitespace, .joinNames]
let tags: [NLTag] = [.personalName, .placeName, .organizationName]

tagger.enumerateTags(in: text.startIndex..<text.endIndex, unit: .word, scheme: .nameType, options: options) { tag, tokenRange in
    // Get the most likely tag, and print it if it's a named entity.
    if let tag = tag, tags.contains(tag) {
        print("\(text[tokenRange]): \(tag.rawValue)")
    }

    // Get multiple possible tags with their associated confidence scores.
    let (hypotheses, _) = tagger.tagHypotheses(at: tokenRange.lowerBound, unit: .word, scheme: .nameType, maximumCount: 1)
    print(hypotheses)

    return true
}

But every name tag comes back as Other. I also tried another example that tags the sentence by lexical class, and it tags every word as OtherWord:

let text = "The American Red Cross was established in Washington, D.C., by Clara Barton."

let tagger = NLTagger(tagSchemes: [.lexicalClass])
tagger.string = text

let options: NLTagger.Options = [.omitPunctuation, .omitWhitespace, .joinNames]

// dominantLanguage is optional, so unwrap it before printing.
print("language", tagger.dominantLanguage?.rawValue ?? "undetermined")

tagger.enumerateTags(in: text.startIndex..<text.endIndex, unit: .word, scheme: .lexicalClass, options: options) { tag, tokenRange in
    // Print the lexical class assigned to each token.
    if let tag = tag {
        print("\(text[tokenRange]): \(tag.rawValue)")
    }

    return true
}

I tried the answer to this question, which sets the language orthography explicitly, but it didn't help:

//tagger.setOrthography(NSOrthography(dominantScript: "Latn", languageMap: ["Latn": ["en"]]), range: text.startIndex..<text.endIndex)
tagger.setOrthography(NSOrthography.defaultOrthography(forLanguage: "en-US"), range: text.startIndex..<text.endIndex)

Does anybody have a clue why this happens?

By the way, my Xcode version is the latest one as of today, 14.3.

Answer 1

Score: 1

This seems to be a regression in Xcode 14.3. I downloaded Xcode 14.2, and there NLTagger works correctly for both .nameType and .lexicalClass tagging.

This regression in Xcode 14.3 also affects NLEmbedding. For example, the following code finds word neighbors correctly in 14.2, but in Xcode 14.3 the embedding itself comes back as nil:

if let embedding = NLEmbedding.wordEmbedding(for: .english) {
  print("found embedding")
  print("embeddings for family: \(embedding.neighbors(for: "family", maximumCount: 3))")
  print("embeddings for science: \(embedding.neighbors(for: "science", maximumCount: 3))")
} else {
  print("no embedding found")
}
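
If you want to check whether your own setup is affected, one possible diagnostic is to ask NLTagger which tag schemes it currently reports as available for English. This is only a sketch: `schemeAvailability` is a hypothetical helper, not part of the original post, and a missing scheme here would not by itself prove that the Xcode 14.3 regression is the cause.

```swift
import NaturalLanguage

// Hypothetical helper (an assumption for illustration): reports whether each
// requested tag scheme is listed as available for word-level tagging.
func schemeAvailability(for schemes: [NLTagScheme],
                        language: NLLanguage = .english) -> [String: Bool] {
    // Ask the framework which schemes it can serve for this unit and language.
    let available = NLTagger.availableTagSchemes(for: .word, language: language)
    var report: [String: Bool] = [:]
    for scheme in schemes {
        report[scheme.rawValue] = available.contains(scheme)
    }
    return report
}

let report = schemeAvailability(for: [.nameType, .lexicalClass])
for (scheme, isAvailable) in report.sorted(by: { $0.key < $1.key }) {
    print("\(scheme): \(isAvailable ? "available" : "missing")")
}
```

Running the same snippet under Xcode 14.2 and 14.3 (or in a playground versus on a device) may help narrow down where the tagger models stop being found.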
