Question

I don't know whether the guide is outdated or whether I'm doing something wrong.
I just started using Nutch. I've integrated it with Solr and crawled and indexed some websites from the terminal.
Now I'm trying to use both from a Java application, so I've been following the tutorial here:
https://cwiki.apache.org/confluence/display/NUTCH/RunNutchInEclipse#RunNutchInEclipse-RunningNutchinEclipse

I installed Subclipse, IvyDE and m2e through Eclipse, and I downloaded Ant, so I should have all the prerequisites.
The m2e link in the tutorial is broken, so I found it somewhere else. It also turns out that Eclipse already included it on installation.

When I run 'ant eclipse' in the terminal, I get a huge list of error messages.
Because of the post length limit, I put the entire error output in a pastebin, linked here.

I'm really not sure what I'm doing wrong.
The directions aren't that complicated, so I really don't know where I'm messing up.

In case it's relevant, here is the nutch-site.xml that we needed to modify:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

<property>
   <name>plugin.folders</name>
   <value>/home/user/trunk/build/plugins</value>
</property>

<!-- HTTP properties -->

<property>
  <name>http.agent.name</name>
  <value>MarketDataCrawler</value>
  <description>HTTP 'User-Agent' request header. MUST NOT be empty - 
  please set this to a single word uniquely related to your organization.

  NOTE: You should also check other related properties:

    http.robots.agents
    http.agent.description
    http.agent.url
    http.agent.email
    http.agent.version

  and set their values appropriately.

  </description>
</property>

<property>
  <name>http.robots.agents</name>
  <value></value>
  <description>Any other agents, apart from 'http.agent.name', that the robots
  parser would look for in robots.txt. Multiple agents can be provided using 
  comma as a delimiter. eg. mybot,foo-spider,bar-crawler
  
  The ordering of agents does NOT matter and the robots parser would make 
  decision based on the agent which matches first to the robots rules.  
  Also, there is NO need to add a wildcard (ie. "*") to this string as the 
  robots parser would smartly take care of a no-match situation. 
    
  If no value is specified, by default HTTP agent (ie. 'http.agent.name') 
  would be used for user agent matching by the robots parser. 
  </description>
</property>

</configuration>

Many of the errors involve Ivy, so I wonder whether the Ivy version that Nutch uses is incompatible with the IvyDE plugin installed in Eclipse.

Answer 1

Score: 0

As the log output indicates:

[ivy:resolve] 	SERVER ERROR: HTTPS Required url=http://repo1.maven.org/maven2/org/slf4j/slf4j-api/1.6.1/slf4j-api-1.6.1.pom
[ivy:resolve] 	SERVER ERROR: HTTPS Required url=http://repo1.maven.org/maven2/org/slf4j/slf4j-api/1.6.1/slf4j-api-1.6.1.jar
[ivy:resolve] 	SERVER ERROR: HTTPS Required url=http://repo1.maven.org/maven2/org/slf4j/slf4j-log4j12/1.6.1/slf4j-log4j12-1.6.1.pom

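Maven Central no longer accepts plain http requests, which is why each download fails with "HTTPS Required". You should use updated repository URLs in ivy/ivy.xml. One option is to change each URL in ivy.xml from http to https.

I think you are using an old version; otherwise this issue should already have been fixed.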
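
The http-to-https change can be sketched as a small shell helper. This is only an illustration under assumptions: the function name `fix_repo_urls` is made up for this example, it assumes the repository URLs sit in XML files directly under the ivy/ directory of the checkout (the answer names ivy/ivy.xml; in some checkouts they may be in another file there, such as ivy/ivysettings.xml), and it only rewrites the `repo1.maven.org` host that appears in the log.

```shell
# Hypothetical helper: rewrite plain-http Maven Central URLs to https
# in every XML file directly inside the given directory.
fix_repo_urls() {
  for f in "$1"/*.xml; do
    [ -f "$f" ] || continue                  # skip if the glob matched nothing
    if grep -q 'http://repo1\.maven\.org' "$f"; then
      cp "$f" "$f.bak"                       # keep a backup next to the original
      sed 's#http://repo1\.maven\.org#https://repo1.maven.org#g' "$f.bak" > "$f"
      echo "updated $f"
    fi
  done
}
```

From the root of the Nutch checkout, this would be called as `fix_repo_urls ivy`, followed by another run of `ant eclipse`. Files that do not mention the http URL are left untouched, and each modified file keeps a `.bak` copy so the change is easy to revert.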

Integrating Nutch 1.17 with Eclipse (Ubuntu 18.04)
