IllegalArgumentException: PTBLexer: Invalid options key in constructor: asciiQuotes (Stanford NLP)

Question

I'm trying to run the Hello World example of the Stanford POS tagger API in Java on French sentences (I used the same .jar from Python and it worked well).
Here is my code:

import edu.stanford.nlp.tagger.maxent.MaxentTagger;

public class TextPreprocessor {

    // Load the French UD model shipped with the standalone tagger distribution
    private static MaxentTagger tagger = new MaxentTagger("../stanford-tagger-4.1.0/stanford-postagger-full-2020-08-06/models/french-ud.tagger");

    public static void main(String[] args) {

        String taggedString = tagger.tagString("Salut à tous, je suis coincé");
        System.out.println(taggedString);
    }
}

But I get the following exception:

Loading POS tagger from C:/Users/_Nprime496_/Downloads/Compressed/stanford-tagger-4.1.0/stanford-postagger-full-2020-08-06/models/french-ud.tagger ... done [0.3 sec].
Exception in thread "main" java.lang.IllegalArgumentException: PTBLexer: Invalid options key in constructor: asciiQuotes
	at edu.stanford.nlp.process.PTBLexer.<init>(PTBLexer.java)
	at edu.stanford.nlp.process.PTBTokenizer.<init>(PTBTokenizer.java:285)
	at edu.stanford.nlp.process.PTBTokenizer$PTBTokenizerFactory.getTokenizer(PTBTokenizer.java:698)
	at edu.stanford.nlp.process.DocumentPreprocessor$PlainTextIterator.<init>(DocumentPreprocessor.java:271)
	at edu.stanford.nlp.process.DocumentPreprocessor.iterator(DocumentPreprocessor.java:226)
	at edu.stanford.nlp.tagger.maxent.MaxentTagger.tokenizeText(MaxentTagger.java:1148)
	at edu.stanford.nlp.tagger.maxent.MaxentTagger$TaggerWrapper.apply(MaxentTagger.java:1332)
	at edu.stanford.nlp.tagger.maxent.MaxentTagger.tagString(MaxentTagger.java:999)
	at modules.generation.preprocessing.TextPreprocessor.main(TextPreprocessor.java:19)

Can you help me?

Answer 1

Score: 1

You can use this code and the full CoreNLP package:

package edu.stanford.nlp.examples;

import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.util.*;

import java.util.*;


public class PipelineExample {

  public static String text = "Paris est la capitale de la France.";

  public static void main(String[] args) {
    // set up pipeline properties
    Properties props = StringUtils.argsToProperties("-props", "french");
    // set the list of annotators to run
    props.setProperty("annotators", "tokenize,ssplit,mwt,pos");
    // build pipeline
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    // create a document object
    CoreDocument document = pipeline.processToCoreDocument(text);
    // display tokens
    for (CoreLabel tok : document.tokens()) {
      System.out.println(String.format("%s\t%s", tok.word(), tok.tag()));
    }
    }
  }

}

You can download CoreNLP here: https://stanfordnlp.github.io/CoreNLP/

Make sure to download the latest French models.

I am not sure why your example with the standalone tagger does not work. What jars were you using?
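
To help answer the question about which jars are in use, here is a minimal sketch that prints where the JVM actually loaded the tokenizer and tagger classes from. It rests on an assumption that is not confirmed in this thread: that the error may come from an older CoreNLP jar on the classpath whose PTBLexer does not recognize the asciiQuotes option requested by the 4.1.0 model. The class name JarLocator and the helper locationOf are hypothetical names chosen for illustration; only standard JDK calls are used.

```java
import java.security.CodeSource;

public class JarLocator {

  // Returns the jar or directory a class was loaded from,
  // or a placeholder when the JVM does not expose a code source.
  static String locationOf(Class<?> cls) {
    CodeSource src = cls.getProtectionDomain().getCodeSource();
    if (src == null || src.getLocation() == null) {
      return "(bootstrap or unknown)";
    }
    return src.getLocation().toString();
  }

  public static void main(String[] args) {
    // Class names taken from the stack trace in the question
    String[] names = {
      "edu.stanford.nlp.process.PTBLexer",
      "edu.stanford.nlp.tagger.maxent.MaxentTagger"
    };
    for (String name : names) {
      try {
        // Load without running static initializers; we only need the origin
        Class<?> cls = Class.forName(name, false, JarLocator.class.getClassLoader());
        System.out.println(name + " -> " + locationOf(cls));
      } catch (ClassNotFoundException | LinkageError e) {
        System.out.println(name + " -> not loadable from the classpath");
      }
    }
  }
}
```

Run it with the same classpath as the failing program. If the two classes resolve to different jars, or to a jar older than the tagger distribution, that mismatch is a likely place to start looking.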

Posted by huangapple on 2020-09-15 00:44:08
Source: https://go.coder-hub.com/63888551.html
