No error, but empty dataframe resulting from webscraping real estate website with R
Question
I want to scrape content from immobilienscout24.de, following the instructions at https://smac-group.github.io/ds/section-web-scraping-in-r.html .
My code runs without an error, but all I retrieve is an empty data.frame ("No data available in table").
I tried examples from Stack Overflow, but I also end up with empty data frames. Why is that? Can someone please help me scrape the content from the website mentioned above?
I am interested in each property's address, number of rooms, price, etc.
Here is the code:
library("xml2")
real_estate <- read_html(
  "https://www.immobilienscout24.de/Suche/radius/wohnung-kaufen?centerofsearchaddress=Wangels;23758;Testorf;;;&geocoordinates=54.24565;10.77587;10.0&sorting=2&enteredFrom=result_list"
)
library("rvest")
library("magrittr")
flats <- real_estate %>%
  html_nodes(".result-list-entry__data") %>%
  html_text()
flats_df <- data.frame(
  rooms = gsub(pattern = " room.*", "", flats) %>%
    as.numeric(),
  price = gsub(".*€ |.—.*", "", flats) %>%
    gsub(pattern = ",", replacement = "") %>%
    as.numeric()
)
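I have tried some other code with a different page (same domain), and again I get an empty data frame. Also, the number of rows makes no sense; there should be about 120.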
pacman::p_load(rvest, dplyr)
real <- data.frame()
for (page in seq(from = 1, to = 6, by = 1)) {
  link <- paste0("https://www.immobilienscout24.de/Suche/de/baden-wuerttemberg/freudenstadt-kreis/wohnung-kaufen?sorting=2&pagenumber=", page)
  code <- read_html(link)
  Adresse <- code %>% html_node(".font-normal") %>% html_text()
  Preis <- code %>% html_node(".result-list-entry__primary-criterion:nth-child(1) .font-highlight") %>% html_text()
  Qm <- code %>% html_node(".result-list-entry__primary-criterion:nth-child(2) .font-highlight") %>% html_text()
  Zimmer <- code %>% html_node(".font-tabular .onlyLarge") %>% html_text()
  real <- rbind(real, data.frame(
    Adresse = ifelse(length(Adresse) == 0, NA, Adresse),
    Preis = ifelse(length(Preis) == 0, NA, Preis),
    Qm = ifelse(length(Qm) == 0, NA, Qm),
    Zimmer = ifelse(length(Zimmer) == 0, NA, Zimmer)))
  write.csv(real, "DatensatzImmobilien.csv")
}
Output:
  Adresse Preis   Qm Zimmer
1    <NA>  <NA> <NA>   <NA>
2    <NA>  <NA> <NA>   <NA>
3    <NA>  <NA> <NA>   <NA>
4    <NA>  <NA> <NA>   <NA>
5    <NA>  <NA> <NA>   <NA>
6    <NA>  <NA> <NA>   <NA>
# Answer 1
**Score**: 1
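As far as I can tell, you need to use RSelenium or RDCOMClient. `read_html()` only downloads the initial HTML and does not run JavaScript, so you have to drive a real browser and wait for the page to load. Here is an example: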
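```R
library(RDCOMClient)  # Windows only: drives Internet Explorer through COM

url <- "https://www.immobilienscout24.de/Suche/radius/wohnung-kaufen?centerofsearchaddress=Wangels;23758;Testorf;;;&geocoordinates=54.24565;10.77587;10.0&sorting=2&enteredFrom=result_list"

# Open a visible Internet Explorer window and load the search page
IEApp <- COMCreate("InternetExplorer.Application")
IEApp[['Visible']] <- TRUE
IEApp$Navigate(url)
Sys.sleep(5)  # crude fixed wait for the page to finish loading

# Scroll down by one window height
doc <- IEApp$document()
doc$parentWindow()$execScript("window.scrollBy(0, window.innerHeight);", "javascript")

# Take the visible text of the result list and split it into lines
web_Obj <- doc$querySelector('#resultListItems')
info <- strsplit(web_Obj$innerText(), "\r\n")[[1]]
info[info != ""][1 : 49]  # first 49 non-empty lines
```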
[1] "1/11"
[2] "NEU"
[3] "NEUGroße Eigentumswohnung mit Gartennutzung"
[4] "Schönwalde am Bungsberg, Ostholstein (Kreis)"
[5] "169.000 €Kaufpreis96 m²Wohnfläche4 Zi.4Zi."
[6] "Balkon/Terrasse"
[7] "Einbauküche"
[8] "Garten"
[9] "..."
[10] "Herr Christian Ilgautz"
[11] "Gläser Immobilien Neustadt"
[12] "1/9"
[13] "NEU"
[14] "Nur hier gefunden"
[15] "NEUKapitalanlage - Vermietete ETW in Oldenburg i H."
[16] "Oldenburg in Holstein, Ostholstein (Kreis)"
[17] "99.000 €Kaufpreis79 m²Wohnfläche3 Zi.3Zi."
[18] "Nur hier gefunden"
[19] "Balkon/Terrasse"
[20] "Einbauküche"
[21] "..."
[22] "Heike Steinwender"
[23] "Steinwender Immobilien"
[24] "1/14"
[25] "Grundriss"
[26] "Eigentumswohnung mit Blick ins Grüne. Nur 5 Minuten zur Ostsee."
[27] "Blekendorf, Plön (Kreis)"
[28] "159.000 €Kaufpreis42 m²Wohnfläche2 Zi.2Zi."
[29] "Balkon/Terrasse"
[30] "Einbauküche"
[31] "Oliver Bonow"
[32] "Premium Immobilien Nord GmbH"
[33] "1/9"
[34] "360°-Ansicht"
[35] "Großzügige Wohnung mit Balkon sucht neue Eigentümer!"
[36] "Oldenburg in Holstein, Ostholstein (Kreis)"
[37] "178.500 €Kaufpreis78,74 m²Wohnfläche4 Zi.4Zi."
[38] "Balkon/Terrasse"
[39] "Keller"
[40] "Herr Tobias Schirmer"
[41] "Postbank Immobilien GmbH - FG Kiel"
[42] "1/11"
[43] "Stilsicher kernsanierte 4-Zimmer-Eigentumswohnung im Herzen von Schönwalde, unweit der Ostsee!"
[44] "Schönwalde am Bungsberg, Ostholstein (Kreis)"
[45] "269.000 €Kaufpreis96 m²Wohnfläche4 Zi.4Zi."
[46] "Balkon/Terrasse"
[47] "Einbauküche"
[48] "Garten"
[49] "..."
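To get from these text lines to the price, area and room count the question asks for, one option is to pull the numbers out of the lines that contain "Kaufpreis". This is only a minimal sketch, assuming the line format shown in the output above; `parse_criteria` is a hypothetical helper name and is not part of RDCOMClient or the original answer.

```R
# Hypothetical helper: turn criteria lines such as
# "169.000 €Kaufpreis96 m²Wohnfläche4 Zi.4Zi." into a data frame.
parse_criteria <- function(info) {
  # Keep only the lines that carry price, area and room count
  lines <- info[grepl("Kaufpreis", info, fixed = TRUE)]

  # Extract every number; assumption: the first three are price, area, rooms
  nums <- regmatches(lines, gregexpr("[0-9]+([.,][0-9]+)*", lines))

  # German number format: "." groups thousands, "," is the decimal mark
  to_number <- function(s) {
    as.numeric(sub(",", ".", gsub(".", "", s, fixed = TRUE), fixed = TRUE))
  }

  data.frame(
    price_eur = to_number(vapply(nums, function(n) n[1], character(1))),
    area_m2   = to_number(vapply(nums, function(n) n[2], character(1))),
    rooms     = to_number(vapply(nums, function(n) n[3], character(1)))
  )
}

# Four lines copied from the output above
info <- c(
  "Schönwalde am Bungsberg, Ostholstein (Kreis)",
  "169.000 €Kaufpreis96 m²Wohnfläche4 Zi.4Zi.",
  "Oldenburg in Holstein, Ostholstein (Kreis)",
  "178.500 €Kaufpreis78,74 m²Wohnfläche4 Zi.4Zi."
)
flats_df <- parse_criteria(info)
print(flats_df)
```

The sketch treats every dot as a thousands separator, which matches the listings shown here but would misread a value written with a decimal point. Address lines could be matched to rows in a similar way, for example by taking the line directly before each "Kaufpreis" line.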