Push and Pop madness - Python not finding the target item
Question
I'm trying to scrape apartments.com with Selenium and write the results to a CSV. My current approach uses driver.find_elements() to locate every element with the .placard CSS class, but some annoying ads are mixed in, and they carry both the .placard and .reinforcement classes.
I need to get these out of the list, and my first idea was to pop() the offending elements. I can locate them with a presence_of_all_elements_located query, but my pop strategy is plainly not working.
Please see the code. Thank you
# Wait for the first page of listings to load
apartmentsTOBEDELETED = WebDriverWait(driver, 2).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".reinforcement")))
print(len(apartmentsTOBEDELETED))
#output is 1 .reinforcement ad
apartments = WebDriverWait(driver, 2).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".placard")))
print(len(apartments))
#output is 26, including one "apartment" (the ad) with the ".placard" and ".reinforcement" CSS tag
#And now try to remove any apartment with the ".reinforcement" placard from the original list
i = 0
while i < len(apartments):
    if apartments[i].find_elements(By.CSS_SELECTOR, ".reinforcement"):
        print("found the element TO BE DELETED")
        apartments.pop(i)
    else:
        i += 1
# Print the number of apartments in the filtered list
print(len(apartments))
# output is 26. No pop.
Answer 1
Score: 1
In the end, the solution was as suggested. I created three lists: one with all the "ads", one with all the "apartments", and a third for the "filtered apartments".
Each apartment in the apartments list was added to the list of filtered apartments, as long as its class attribute did not overlap with the class attribute of any item in the ad list.
The code looked like this:
```python
apartmentsTOBEDELETED = WebDriverWait(driver, 5).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".reinforcement")))
print(len(apartmentsTOBEDELETED))
print("found an item to remove")  # output is 1 .reinforcement ad - typically
# Capture all apartments
apartments = WebDriverWait(driver, 5).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".placard")))
# Create a new list of apartments that do not contain any reference to items in 'apartmentsTOBEDELETED'
filtered_apartments = []
for apartment in apartments:
    contains_deleted_item = False
    for deleted_item in apartmentsTOBEDELETED:
        if deleted_item.get_attribute('class') in apartment.get_attribute('class'):
            contains_deleted_item = True
            break
    if not contains_deleted_item:
        filtered_apartments.append(apartment)
# Print the number of apartments in the filtered list
print(len(filtered_apartments))
print("now we have this many apartments left")
apartments = filtered_apartments  # assign this to be the new list
```
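A likely reason the original pop() loop never fired: calling find_elements() on an element searches only that element's descendants, so an ad that carries .reinforcement on the same element as .placard is never matched. Below is a minimal sketch of a shorter filter that checks each element's own class tokens instead. `FakeElement`, `has_class`, and `drop_ads` are hypothetical names introduced here for illustration, and `FakeElement` only stands in for a Selenium WebElement so the example runs without a browser. The class strings are made up as well.

```python
from dataclasses import dataclass


@dataclass
class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement (illustration only)."""
    class_attr: str

    def get_attribute(self, name):
        # Mimics WebElement.get_attribute for the "class" attribute only.
        return self.class_attr if name == "class" else None


def has_class(element, class_name):
    """Return True if class_name is one of the element's own CSS classes."""
    classes = (element.get_attribute("class") or "").split()
    return class_name in classes


def drop_ads(elements, ad_class="reinforcement"):
    """Build a new list without the elements that carry ad_class."""
    return [el for el in elements if not has_class(el, ad_class)]


# Made-up class strings; the real page's attributes may differ.
placards = [
    FakeElement("placard"),
    FakeElement("placard reinforcement"),  # the ad
    FakeElement("placard placard-option"),
]
listings = drop_ads(placards)
print(len(listings))  # prints 2
```

Splitting the class attribute into tokens avoids the substring comparison used above, which could also match a longer class name that merely contains the ad's class string. With real WebElements, the same `drop_ads(apartments)` call should work unchanged, since it relies only on get_attribute("class").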