Push and Pop madness - Python not finding the target item

Question

I'm trying to scrape apartments.com with Selenium and write the results to a CSV. My current method is to use `driver.find_elements()` to locate every element with the `.placard` CSS class, but some annoying ads are mixed in that carry both the `.placard` and `.reinforcement` classes.

I need to remove these from the list, and my first idea was to `pop` the offending elements. I can locate them with a `presence_of_all_elements_located` query, but my pop strategy is simply not working.

Please see the code below. Thank you.

# Wait for the first page of listings to load
apartmentsTOBEDELETED = WebDriverWait(driver, 2).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".reinforcement")))
print(len(apartmentsTOBEDELETED))
# Output: 1 (the single .reinforcement ad)

apartments = WebDriverWait(driver, 2).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".placard")))
print(len(apartments))
# Output: 26, including one "apartment" (the ad) that carries both the .placard and .reinforcement classes

# Now try to remove any apartment with the .reinforcement class from the original list
i = 0
while i < len(apartments):
    if apartments[i].find_elements(By.CSS_SELECTOR, ".reinforcement"):
        print("found the element TO BE DELETED")
        apartments.pop(i)
    else:
        i += 1

# Print the number of apartments in the filtered list
print(len(apartments))
# Output: 26. Nothing was popped.

Answer 1

Score: 1

In the end, the solution was as suggested. I created three lists: one with all the "ads", one with all the "apartments", and a third for the "filtered apartments".

Each apartment in the apartments list was added to the filtered list as long as its class attribute did not overlap with the class attribute of any item in the ad list.

The code looked like this:
```python
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Assumes `driver` is an existing Selenium WebDriver already on the listings page.

# Capture the ads to exclude (typically the output is 1 .reinforcement ad)
apartmentsTOBEDELETED = WebDriverWait(driver, 5).until(
    EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".reinforcement"))
)
print(len(apartmentsTOBEDELETED))
print("found an item to remove")

# Capture all apartments
apartments = WebDriverWait(driver, 5).until(
    EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".placard"))
)

# Create a new list of apartments whose class attribute does not contain
# the class attribute of any item in 'apartmentsTOBEDELETED'
filtered_apartments = []
for apartment in apartments:
    contains_deleted_item = False
    for deleted_item in apartmentsTOBEDELETED:
        if deleted_item.get_attribute('class') in apartment.get_attribute('class'):
            contains_deleted_item = True
            break
    if not contains_deleted_item:
        filtered_apartments.append(apartment)

# Print the number of apartments in the filtered list
print(len(filtered_apartments))
print("now we have this many apartments left")
apartments = filtered_apartments  # assign this to be the new list
```
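A possible reason the original `pop` loop never fired: `find_elements()` called on an element searches that element's descendants, so a placard that carries `.reinforcement` itself is not matched. The sketch below is one way to filter on each element's own class list instead. It is not the author's code. `has_class`, `drop_ads`, and `FakeElement` are hypothetical names introduced here for illustration, and `FakeElement` only stands in for a Selenium `WebElement` so the example runs without a browser.

```python
def has_class(element, class_name):
    """Return True if class_name is one of the element's own CSS classes."""
    # Split into whole class tokens so "reinforcement-extra" does not match.
    classes = (element.get_attribute("class") or "").split()
    return class_name in classes


def drop_ads(elements, ad_class="reinforcement"):
    """Return a new list without the elements that carry ad_class themselves."""
    return [el for el in elements if not has_class(el, ad_class)]


class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement (demonstration only)."""

    def __init__(self, class_attr):
        self._class_attr = class_attr

    def get_attribute(self, name):
        return self._class_attr if name == "class" else None


if __name__ == "__main__":
    listings = [
        FakeElement("placard"),
        FakeElement("placard reinforcement"),  # the ad
        FakeElement("placard has-photo"),
    ]
    print(len(drop_ads(listings)))  # prints 2
```

With real Selenium elements, `drop_ads(apartments)` should work the same way, since it relies only on `get_attribute("class")`. Assuming the ad really has both classes on the same element, as the question describes, the CSS selector `.placard:not(.reinforcement)` might also skip the ads at lookup time.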

huangapple
  • Posted on May 7, 2023 at 13:12:00
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76192278.html