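Question

I am trying to extract the unique ID (`Package Id`) for each of the 147 data packages listed on the Environmental Data Initiative (EDI) website for the Andrews LTER site. However, I can't figure out which selector to pass to `rvest::html_nodes()` to reach the Package Id. Any ideas?

What I've been trying: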

```R
# Load required libraries
library(rvest)
library(dplyr)

# Define the URL of the website
url <- "http://portal.edirepository.org:80/nis/simpleSearch?defType=edismax&q=*:*&fq=-scope:ecotrends&fq=-scope:lter-landsat*&fq=scope:(knb-lter-and)&fl=id,packageid,title,author,organization,pubdate,coordinates&debug=false"

# Read the HTML content from the website
page <- read_html(url)

# Extract the relevant information
packageIds <- page %>%
  html_nodes("td[class='Package Id']") %>%
  html_text() # results in an empty character vector
```


Answer 1

Score: 1

You could try something like this. It was a bit tricky, since I needed to append `&start=0&rows=150` to the original query in order to load the full table.
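The two extra parameters can also be attached programmatically instead of editing the URL by hand. The following is a minimal sketch in base R, assuming the same query parameters as the search URL in this thread plus `start` and `rows`; the `params` and `query` names are illustrative only, and the values are not percent-encoded, mirroring the URL as it is used here.

```R
# Base endpoint of the EDI portal search used in this thread
base_url <- "https://portal.edirepository.org/nis/simpleSearch"

# Query parameters as a named character vector; repeated names (fq) are allowed.
# start/rows are the two parameters appended to load the full table.
params <- c(
  defType = "edismax",
  q       = "*:*",
  fq      = "-scope:ecotrends",
  fq      = "-scope:lter-landsat*",
  fq      = "scope:(knb-lter-and)",
  fl      = "id,packageid,title,author,organization,pubdate,coordinates",
  debug   = "false",
  start   = "0",
  rows    = "150"
)

# Join name=value pairs with "&" and attach them to the endpoint
query <- paste(names(params), params, sep = "=", collapse = "&")
url <- paste0(base_url, "?", query)
url
```

Keeping the parameters in one vector makes it easy to raise `rows` if the number of data packages grows beyond 150.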

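You can then use `html_table()` to return the contents, which in this case is a list of tables. Pick the list element that holds the actual results table and select the `Package Id` column.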

    # Load required libraries
    library(rvest)
    library(dplyr)

    # Define the URL of the website (note the added start and rows parameters)
    url <- "https://portal.edirepository.org/nis/simpleSearch?defType=edismax&q=*:*&fq=-scope:ecotrends&fq=-scope:lter-landsat*&fq=scope:(knb-lter-and)&fl=id,packageid,title,author,organization,pubdate,coordinates&debug=false&start=0&rows=150"

    # Read the HTML content from the website
    page <- read_html(url)

    # Extract the relevant information: the fourth table on the page holds the results
    page %>%
      html_table() %>%
      .[[4]] %>%
      select(`Package Id  ▵▿`) %>%
      rename(package_id = `Package Id  ▵▿`)

    # A tibble: 147 × 1
       package_id
       <chr>
     1 knb-lter-and.2719.6
     2 knb-lter-and.2720.8
     3 knb-lter-and.2721.6
     4 knb-lter-and.2722.6
     5 knb-lter-and.2725.6
     6 knb-lter-and.2726.6
     7 knb-lter-and.4528.10
     8 knb-lter-and.4541.3
     9 knb-lter-and.4544.4
    10 knb-lter-and.4547.5
    # … with 137 more rows
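Hard-coding the list position (`.[[4]]`) and the exact header text, sort arrows included, may break if the page layout changes. A possible alternative, sketched here on a small hypothetical HTML snippet rather than the live page, picks the table by looking for a column whose name starts with `Package Id`. The markup and the `has_id`, `results`, and `id_col` names are assumptions for illustration; the real page may be structured differently.

```R
library(rvest)

# Hypothetical stand-in for the search page: a layout table followed by
# a results table whose first header carries sort arrows after the label.
html <- '
<html><head><meta charset="utf-8"></head><body>
<table><tr><th>Menu</th></tr><tr><td>Home</td></tr></table>
<table>
  <tr><th>Package Id  \u25b5\u25bf</th><th>Title</th></tr>
  <tr><td>knb-lter-and.2719.6</td><td>Example title A</td></tr>
  <tr><td>knb-lter-and.2720.8</td><td>Example title B</td></tr>
</table>
</body></html>'

page <- read_html(html)
tables <- html_table(page)  # a list with one tibble per <table>

# Flag the tables that have a column whose name starts with "Package Id"
has_id <- vapply(
  tables,
  function(t) any(startsWith(names(t), "Package Id")),
  logical(1)
)

# Take the first matching table, then the matching column
results <- tables[[which(has_id)[1]]]
id_col <- names(results)[startsWith(names(results), "Package Id")][1]
package_ids <- results[[id_col]]
package_ids
```

Matching on the start of the column name avoids typing the arrow characters, and searching the list avoids depending on how many other tables the page happens to contain.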

