No error, but empty data frame resulting from web scraping a real estate website with R


# Question

I want to scrape listings from immobilienscout24.de, following the instructions at https://smac-group.github.io/ds/section-web-scraping-in-r.html.

My code runs without an error, but all I get back is an empty data frame ("No data available in table").

I tried examples from Stack Overflow, but they also end up with empty data frames. Why is that? Can someone please help me scrape the content from the website mentioned above?

I am interested in each property's address, number of rooms, price, etc.

Here is the code:

```R
library("xml2")      # read_html()
library("rvest")     # html_nodes(), html_text()
library("magrittr")  # %>%

real_estate <- read_html(
  "https://www.immobilienscout24.de/Suche/radius/wohnung-kaufen?centerofsearchaddress=Wangels;23758;Testorf;;;&geocoordinates=54.24565;10.77587;10.0&sorting=2&enteredFrom=result_list"
)

# Text of every element with the class "result-list-entry__data"
flats <- real_estate %>%
  html_nodes(".result-list-entry__data") %>%
  html_text()

# Pull the number of rooms and the price out of each text snippet
flats_df <- data.frame(
  rooms = gsub(pattern = " room.*", "", flats) %>%
    as.numeric(),
  price = gsub(".*€ |.—.*", "", flats) %>%
    gsub(pattern = ",", replacement = "") %>%
    as.numeric()
)
```
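This is why the code finishes without an error: if `html_nodes()` finds no element with that class in the downloaded HTML, `html_text()` returns a zero-length character vector, and `gsub()`, `as.numeric()` and `data.frame()` all accept zero-length input silently. The minimal base-R sketch below reproduces that chain, assuming `character(0)` as a stand-in for the scraped text (no network access), and adds a hypothetical guard, `stop_if_empty()`, that turns the silent case into an explicit error.

```R
# Stand-in (assumption) for the result of
# real_estate %>% html_nodes(".result-list-entry__data") %>% html_text()
# when the selector matches nothing in the downloaded HTML.
flats <- character(0)

# Every step accepts a zero-length vector without complaint ...
rooms <- as.numeric(gsub(pattern = " room.*", "", flats))
price <- as.numeric(gsub(",", "", gsub(".*\u20ac |.\u2014.*", "", flats)))

# ... so the result is a valid data frame with 0 rows and no error.
flats_df <- data.frame(rooms = rooms, price = price)
print(dim(flats_df))  # 0 rows, 2 columns

# Hypothetical helper: fail loudly instead of producing an empty table.
stop_if_empty <- function(x, what) {
  if (length(x) == 0) {
    stop(sprintf("selector for %s matched nothing; is the content in the raw HTML?", what),
         call. = FALSE)
  }
  invisible(x)
}

guard <- tryCatch(stop_if_empty(flats, "listings"), error = conditionMessage)
print(guard)
```

Calling `stop_if_empty(flats, "listings")` right after `html_text()` makes an unmatched selector, or a page whose listings are not in the raw HTML, visible immediately.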

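I have also tried some other code on a different page of the same domain, and again I get an empty data frame. The number of rows also makes no sense: there should be about 120.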

```R
pacman::p_load(rvest, dplyr)

real <- data.frame()

for (page in seq(from = 1, to = 6, by = 1)) {
  link <- paste0("https://www.immobilienscout24.de/Suche/de/baden-wuerttemberg/freudenstadt-kreis/wohnung-kaufen?sorting=2&pagenumber=", page)
  code <- read_html(link)
  # html_node() returns only the first match on the page (NA text if none)
  Adresse <- code %>% html_node(".font-normal") %>% html_text()
  Preis <- code %>% html_node(".result-list-entry__primary-criterion:nth-child(1) .font-highlight") %>% html_text()
  Qm <- code %>% html_node(".result-list-entry__primary-criterion:nth-child(2) .font-highlight") %>% html_text()
  Zimmer <- code %>% html_node(".font-tabular .onlyLarge") %>% html_text()

  # Append one row for this page
  real <- rbind(real, data.frame(
    Adresse = ifelse(length(Adresse) == 0, NA, Adresse),
    Preis = ifelse(length(Preis) == 0, NA, Preis),
    Qm = ifelse(length(Qm) == 0, NA, Qm),
    Zimmer = ifelse(length(Zimmer) == 0, NA, Zimmer)
  ))
}

write.csv(real, "DatensatzImmobilien.csv")
```

Output:

```
  Adresse Preis   Qm Zimmer
1    <NA>  <NA> <NA>   <NA>
2    <NA>  <NA> <NA>   <NA>
3    <NA>  <NA> <NA>   <NA>
4    <NA>  <NA> <NA>   <NA>
5    <NA>  <NA> <NA>   <NA>
6    <NA>  <NA> <NA>   <NA>
```
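The six rows have a mechanical explanation: `html_node()` returns only the first match on each page (`html_nodes()` returns all of them), and when there is no match, `html_text()` gives `NA` rather than a zero-length vector, so the `length(...) == 0` check never fires and each of the six pages contributes exactly one all-`NA` row. There is a second trap that would bite even once matches appear: `ifelse()` returns a result as long as its *test*, so `ifelse(length(x) == 0, NA, x)` keeps only `x[1]`. The base-R sketch below demonstrates this with a few prices taken from the listings in the answer, used purely as illustrative input, and a hypothetical `keep_or_na()` replacement built on plain `if`/`else`.

```R
# Illustrative input standing in for the text of several matched nodes.
preis <- c("169.000 \u20ac", "99.000 \u20ac", "159.000 \u20ac")

# ifelse() returns a result as long as its *test* (length 1 here),
# so only the first price survives.
truncated <- ifelse(length(preis) == 0, NA, preis)
print(truncated)

# Hypothetical helper: a plain if/else returns the whole vector,
# or a single NA placeholder when nothing matched.
keep_or_na <- function(x) if (length(x) == 0) NA_character_ else x
kept <- keep_or_na(preis)
print(kept)
print(keep_or_na(character(0)))

# One row per matched value instead of one row per page.
page_df <- data.frame(Preis = kept, stringsAsFactors = FALSE)
print(nrow(page_df))
```

This only fixes the row count; if the listings are not in the HTML that `read_html()` downloads, every value is still `NA`.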



# Answer 1
**Score**: 1

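To the best of my understanding, you need to drive a real browser with RSelenium or RDCOMClient. `read_html()` only downloads and parses the HTML and never runs the page's JavaScript, so you have to let a browser finish loading the page before reading its content. Here is an example: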

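```R
# Windows only: RDCOMClient drives Internet Explorer through COM,
# so the page's JavaScript actually runs before the content is read.
library(RDCOMClient)
url <- "https://www.immobilienscout24.de/Suche/radius/wohnung-kaufen?centerofsearchaddress=Wangels;23758;Testorf;;;&geocoordinates=54.24565;10.77587;10.0&sorting=2&enteredFrom=result_list"
IEApp <- COMCreate("InternetExplorer.Application")
IEApp[['Visible']] <- TRUE
IEApp$Navigate(url)
Sys.sleep(5)  # give the browser time to load and render the result list

doc <- IEApp$document()
# Scroll down one screen, presumably so lazily loaded entries get rendered
doc$parentWindow()$execScript("window.scrollBy(0, window.innerHeight);", "javascript")
# Take the result-list container and split its visible text into lines
web_Obj <- doc$querySelector('#resultListItems')
info <- strsplit(web_Obj$innerText(), "\r\n")[[1]]
# Drop blank lines and show the first 49 entries
info[info != ""][1 : 49]
```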


```
 [1] "1/11"
 [2] "NEU"
 [3] "NEUGroße Eigentumswohnung mit Gartennutzung"
 [4] "Schönwalde am Bungsberg, Ostholstein (Kreis)"
 [5] "169.000 €Kaufpreis96 m²Wohnfläche4 Zi.4Zi."
 [6] "Balkon/Terrasse"
 [7] "Einbauküche"
 [8] "Garten"
 [9] "..."
[10] "Herr Christian Ilgautz"
[11] "Gläser Immobilien Neustadt"
[12] "1/9"
[13] "NEU"
[14] "Nur hier gefunden"
[15] "NEUKapitalanlage - Vermietete ETW in Oldenburg i H."
[16] "Oldenburg in Holstein, Ostholstein (Kreis)"
[17] "99.000 €Kaufpreis79 m²Wohnfläche3 Zi.3Zi."
[18] "Nur hier gefunden"
[19] "Balkon/Terrasse"
[20] "Einbauküche"
[21] "..."
[22] "Heike Steinwender"
[23] "Steinwender Immobilien"
[24] "1/14"
[25] "Grundriss"
[26] "Eigentumswohnung mit Blick ins Grüne. Nur 5 Minuten zur Ostsee."
[27] "Blekendorf, Plön (Kreis)"
[28] "159.000 €Kaufpreis42 m²Wohnfläche2 Zi.2Zi."
[29] "Balkon/Terrasse"
[30] "Einbauküche"
[31] "Oliver Bonow"
[32] "Premium Immobilien Nord GmbH"
[33] "1/9"
[34] "360°-Ansicht"
[35] "Großzügige Wohnung mit Balkon sucht neue Eigentümer!"
[36] "Oldenburg in Holstein, Ostholstein (Kreis)"
[37] "178.500 €Kaufpreis78,74 m²Wohnfläche4 Zi.4Zi."
[38] "Balkon/Terrasse"
[39] "Keller"
[40] "Herr Tobias Schirmer"
[41] "Postbank Immobilien GmbH - FG Kiel"
[42] "1/11"
[43] "Stilsicher kernsanierte 4-Zimmer-Eigentumswohnung im Herzen von Schönwalde, unweit der Ostsee!"
[44] "Schönwalde am Bungsberg, Ostholstein (Kreis)"
[45] "269.000 €Kaufpreis96 m²Wohnfläche4 Zi.4Zi."
[46] "Balkon/Terrasse"
[47] "Einbauküche"
[48] "Garten"
[49] "..."
```
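The `info` vector is flat text, but it follows a visible pattern: each listing has one line containing `Kaufpreis` in which price, living area and number of rooms run together, and the address sits on the line directly before it. The base-R sketch below turns that into the address/price/area/rooms table the question asks for. It assumes the layout seen in this one output, plus German number formatting (`.` groups thousands, `,` marks decimals). `parse_listings()` and `de_num()` are hypothetical helper names, and since the site's markup can change at any time, treat the regular expression as a starting point rather than a robust parser.

```R
# Lines [24]-[41] of the output above, written with \u escapes
# (\u20ac = euro sign, \u00b2 = superscript two) so the script does not
# depend on the source-file encoding.
info <- c(
  "1/14", "Grundriss",
  "Eigentumswohnung mit Blick ins Gr\u00fcne. Nur 5 Minuten zur Ostsee.",
  "Blekendorf, Pl\u00f6n (Kreis)",
  "159.000 \u20acKaufpreis42 m\u00b2Wohnfl\u00e4che2 Zi.2Zi.",
  "Balkon/Terrasse", "Einbauk\u00fcche", "Oliver Bonow",
  "Premium Immobilien Nord GmbH",
  "1/9", "360\u00b0-Ansicht",
  "Gro\u00dfz\u00fcgige Wohnung mit Balkon sucht neue Eigent\u00fcmer!",
  "Oldenburg in Holstein, Ostholstein (Kreis)",
  "178.500 \u20acKaufpreis78,74 m\u00b2Wohnfl\u00e4che4 Zi.4Zi.",
  "Balkon/Terrasse", "Keller", "Herr Tobias Schirmer",
  "Postbank Immobilien GmbH - FG Kiel"
)

# German number format: drop thousands dots, turn the decimal comma into a dot.
de_num <- function(x) {
  as.numeric(sub(",", ".", gsub(".", "", x, fixed = TRUE), fixed = TRUE))
}

# Assumption (from this one output): the criteria line contains "Kaufpreis"
# and the address is on the line directly before it.
parse_listings <- function(info) {
  crit_idx <- which(grepl("Kaufpreis", info, fixed = TRUE))
  crit_idx <- crit_idx[crit_idx > 1]  # need a preceding address line
  crit <- info[crit_idx]
  # capture groups: 1 = price, 2 = living area, 3 = rooms
  pattern <- "^([0-9.]+)[^0-9]*Kaufpreis([0-9.,]+)[^0-9]+Wohnfl[^0-9]*([0-9.,]+) Zi\\..*$"
  data.frame(
    address = info[crit_idx - 1],
    price   = de_num(sub(pattern, "\\1", crit)),
    area_m2 = de_num(sub(pattern, "\\2", crit)),
    rooms   = de_num(sub(pattern, "\\3", crit)),
    stringsAsFactors = FALSE
  )
}

listings <- parse_listings(info)
print(listings)
```

Passing the full `info[info != ""]` from the browser session to `parse_listings()` should give one row per listing, as long as every listing keeps this line order.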

huangapple
  • Posted on May 25, 2023 22:49:32
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76333601.html