No error, but empty data frame when web scraping a real estate website with R

Question

I want to scrape content from immobilienscout24.de, following the instructions at https://smac-group.github.io/ds/section-web-scraping-in-r.html.

My code runs without an error, but all I retrieve is an empty data.frame ("No data available in table").

I tried examples from Stack Overflow, but I also end up with empty data frames. Why is that? Can someone please help me scrape the content from the website mentioned above?

I am interested in each property's address, number of rooms, price, etc.

Here is the code:

    library("xml2")
    library("rvest")
    library("magrittr")

    real_estate <- read_html(
      "https://www.immobilienscout24.de/Suche/radius/wohnung-kaufen?centerofsearchaddress=Wangels;23758;Testorf;;;&geocoordinates=54.24565;10.77587;10.0&sorting=2&enteredFrom=result_list"
    )

    flats <- real_estate %>%
      html_nodes(".result-list-entry__data") %>%
      html_text()

    flats_df <- data.frame(
      rooms = gsub(pattern = " room.*", "", flats) %>%
        as.numeric(),
      price = gsub(".*€ |.—.*", "", flats) %>%
        gsub(pattern = ",", replacement = "") %>%
        as.numeric()
    )

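I have tried some other code with a different page (same domain) and, again, I retrieve an empty data frame. The number of rows also makes no sense; there should be about 120.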

    pacman::p_load(rvest, dplyr)

    real <- data.frame()

    for (page in seq(from = 1, to = 6, by = 1)) {
      link <- paste0("https://www.immobilienscout24.de/Suche/de/baden-wuerttemberg/freudenstadt-kreis/wohnung-kaufen?sorting=2&pagenumber=", page)
      code <- read_html(link)
      Adresse <- code %>% html_node(".font-normal") %>% html_text()
      Preis <- code %>% html_node(".result-list-entry__primary-criterion:nth-child(1) .font-highlight") %>% html_text()
      Qm <- code %>% html_node(".result-list-entry__primary-criterion:nth-child(2) .font-highlight") %>% html_text()
      Zimmer <- code %>% html_node(".font-tabular .onlyLarge") %>% html_text()

      real = rbind(real, data.frame(
        Adresse = ifelse(length(Adresse) == 0, NA, Adresse),
        Preis = ifelse(length(Preis) == 0, NA, Preis),
        Qm = ifelse(length(Qm) == 0, NA, Qm),
        Zimmer = ifelse(length(Zimmer) == 0, NA, Zimmer)))

      write.csv(real, "DatensatzImmobilien.csv")
    }

Output:

      Adresse Preis   Qm Zimmer
    1    <NA>  <NA> <NA>   <NA>
    2    <NA>  <NA> <NA>   <NA>
    3    <NA>  <NA> <NA>   <NA>
    4    <NA>  <NA> <NA>   <NA>
    5    <NA>  <NA> <NA>   <NA>
    6    <NA>  <NA> <NA>   <NA>

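# Answer 1

**Score**: 1

To the best of my understanding, you need to use RSelenium or RDCOMClient, because you have to wait for the page to load. `read_html()` only downloads the initial HTML and does not run the page's JavaScript, which is presumably why your selectors match nothing. Here is an example using RDCOMClient (Windows only, since it drives Internet Explorer through COM):
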
```R
library(RDCOMClient)

url <- "https://www.immobilienscout24.de/Suche/radius/wohnung-kaufen?centerofsearchaddress=Wangels;23758;Testorf;;;&geocoordinates=54.24565;10.77587;10.0&sorting=2&enteredFrom=result_list"

# Open the page in Internet Explorer via COM
IEApp <- COMCreate("InternetExplorer.Application")
IEApp[['Visible']] <- TRUE
IEApp$Navigate(url)
Sys.sleep(5)  # give the page time to load

# Scroll down by one window height
doc <- IEApp$document()
doc$parentWindow()$execScript("window.scrollBy(0, window.innerHeight);", "javascript")

# Read the text of the result list and split it into lines
web_Obj <- doc$querySelector('#resultListItems')
info <- strsplit(web_Obj$innerText(), "\r\n")[[1]]
info[info != ""][1 : 49]
```

Output:

     [1] "1/11"
     [2] "NEU"
     [3] "NEUGroße Eigentumswohnung mit Gartennutzung"
     [4] "Schönwalde am Bungsberg, Ostholstein (Kreis)"
     [5] "169.000 Kaufpreis96 m²Wohnfläche4 Zi.4Zi."
     [6] "Balkon/Terrasse"
     [7] "Einbauküche"
     [8] "Garten"
     [9] "..."
    [10] "Herr Christian Ilgautz"
    [11] "Gläser Immobilien Neustadt"
    [12] "1/9"
    [13] "NEU"
    [14] "Nur hier gefunden"
    [15] "NEUKapitalanlage - Vermietete ETW in Oldenburg i H."
    [16] "Oldenburg in Holstein, Ostholstein (Kreis)"
    [17] "99.000 Kaufpreis79 m²Wohnfläche3 Zi.3Zi."
    [18] "Nur hier gefunden"
    [19] "Balkon/Terrasse"
    [20] "Einbauküche"
    [21] "..."
    [22] "Heike Steinwender"
    [23] "Steinwender Immobilien"
    [24] "1/14"
    [25] "Grundriss"
    [26] "Eigentumswohnung mit Blick ins Grüne. Nur 5 Minuten zur Ostsee."
    [27] "Blekendorf, Plön (Kreis)"
    [28] "159.000 Kaufpreis42 m²Wohnfläche2 Zi.2Zi."
    [29] "Balkon/Terrasse"
    [30] "Einbauküche"
    [31] "Oliver Bonow"
    [32] "Premium Immobilien Nord GmbH"
    [33] "1/9"
    [34] "360°-Ansicht"
    [35] "Großzügige Wohnung mit Balkon sucht neue Eigentümer!"
    [36] "Oldenburg in Holstein, Ostholstein (Kreis)"
    [37] "178.500 Kaufpreis78,74 m²Wohnfläche4 Zi.4Zi."
    [38] "Balkon/Terrasse"
    [39] "Keller"
    [40] "Herr Tobias Schirmer"
    [41] "Postbank Immobilien GmbH - FG Kiel"
    [42] "1/11"
    [43] "Stilsicher kernsanierte 4-Zimmer-Eigentumswohnung im Herzen von Schönwalde, unweit der Ostsee!"
    [44] "Schönwalde am Bungsberg, Ostholstein (Kreis)"
    [45] "269.000 Kaufpreis96 m²Wohnfläche4 Zi.4Zi."
    [46] "Balkon/Terrasse"
    [47] "Einbauküche"
    [48] "Garten"
    [49] "..."
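
The question asks for price, rooms and similar fields, while the output above is still raw text. As a rough sketch (not part of the original answer), the summary lines of the form `"169.000 Kaufpreis96 m²Wohnfläche4 Zi.4Zi."` could be split into numeric columns with base R. The helper names `parse_german_number()` and `parse_listing()` are made up for illustration, and the pattern assumes the site keeps exactly this text layout, which may well change:

```r
# Sample summary lines copied from the output above.
summary_lines <- c(
  "169.000 Kaufpreis96 m²Wohnfläche4 Zi.4Zi.",
  "99.000 Kaufpreis79 m²Wohnfläche3 Zi.3Zi.",
  "178.500 Kaufpreis78,74 m²Wohnfläche4 Zi.4Zi.",
  "Balkon/Terrasse"  # not a summary line; it should be ignored
)

# Hypothetical helper: convert German-formatted numbers ("178.500", "78,74").
parse_german_number <- function(x) {
  x <- gsub(".", "", x, fixed = TRUE)  # drop thousands separators
  x <- sub(",", ".", x, fixed = TRUE)  # decimal comma -> decimal point
  as.numeric(x)
}

# Hypothetical helper: extract price, living area and rooms from summary lines.
# Lines that do not match the assumed layout are skipped.
parse_listing <- function(lines) {
  pattern <- "^([0-9.,]+) Kaufpreis([0-9.,]+) m[^0-9]+([0-9.,]+) Zi\\."
  hits <- lines[grepl(pattern, lines)]
  m <- regmatches(hits, regexec(pattern, hits))
  data.frame(
    price_eur = parse_german_number(vapply(m, `[`, character(1), 2)),
    area_sqm  = parse_german_number(vapply(m, `[`, character(1), 3)),
    rooms     = parse_german_number(vapply(m, `[`, character(1), 4))
  )
}

listings <- parse_listing(summary_lines)
print(listings)
```

With the real data, `parse_listing(info)` would be called on the vector returned by the RDCOMClient code. The address sits on the line just before each summary line in the output above, so it could presumably be picked up by index, though that ordering is an observation from this one result page and not a guarantee.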

Posted by huangapple on 2023-05-25 22:49:32. Source link: https://go.coder-hub.com/76333601.html