How do I append the updated data after clicking the load more button using Selenium (web scraping)?


Question

After running my code, I am not getting the updated data after clicking the "load more" button. I have tried many times and I am not sure what is wrong with my code. After the load more button is clicked, I should end up with a list of all the data. However, I am only getting the data that was appended to the list before the button was clicked.

import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.common import exceptions
from selenium.webdriver.chrome.options import Options


class RunChromeTests():
    def test(self):
        chrome_options = Options()
        chrome_options.add_argument("--disable-notifications")
        chrome_options.add_argument("-incognito")
        chrome_options.add_argument("--disable-popup-blocking")
        chrome_options.add_argument("--ignore-certificate-errors")
        chrome_options.add_argument("--disable-javascript")

        # Download the Chrome driver from https://chromedriver.chromium.org/downloads
        # and point chrome_path at its location on your computer
        chrome_path = r"path"
        driver = webdriver.Chrome(chrome_path, options=chrome_options)
        driver.maximize_window()
        driver.implicitly_wait(10)

        final = []

        # Enter the URL to scrape
        driver.get("https://www.burpple.com/search/sg?q=Newly+Opened&type=places")
        # Parse the page source once, before the load-more loop
        content = driver.page_source
        soup = BeautifulSoup(content)

        loadmore = driver.find_element_by_id("masonryViewMore-btn")
        j = 0
        final1 = []

        try:
            while loadmore.is_displayed():
                loadmore.click()
                time.sleep(2)
                # Collect every venue name span in the parsed soup
                lrec = soup.find_all("span", {"class": "searchVenue-header-name-name headingMedium"})
                newlist = lrec[j:]
                print(lrec)
                for rec in newlist:
                    name = rec.text
                    final1.append(name)
                print(final1)
                j = len(lrec) + 1
                time.sleep(5)
        except exceptions.StaleElementReferenceException:
            pass


chromed = RunChromeTests()
chromed.test()

Output:
['Kotuwa', 'Smoochie Creamery', "Evan's Kitch", 'Plus Coffee Joint', '800° Woodfired Pizza (KINEX)', "Sarah's Loft", 'nicher (Springleaf)', 'First Story Cafe', 'Tucela Gelato', 'Ri Ri Cha', 'Unatoto', 'Royal Palm (Meat & Dine)']

Correct output (the trailing ... signifies many more entries):
['Kotuwa', 'Smoochie Creamery', "Evan's Kitch", 'Plus Coffee Joint', '800° Woodfired Pizza (KINEX)', "Sarah's Loft", 'nicher (Springleaf)', 'First Story Cafe', 'Tucela Gelato', 'Ri Ri Cha', 'Unatoto', 'Royal Palm (Meat & Dine)', 'Flourish Bakehouse', 'Equate Coffee (Orchard Central)', 'TAG Espresso (Raffles City)', 'Arc-En-Ciel Patisserie', 'Enjoy Eating House & Bar (Stevens),', 'SAGE By Yasunori Doi (Orchard Plaza)', 'Hellu Coffee', 'Pestle & Mortar Society', ...]
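
For context (this note is not part of the original question): BeautifulSoup parses the string snapshot it is given, so a soup built from driver.page_source before the clicks cannot reflect anything loaded afterwards, which matches the output stopping at the first batch. A minimal illustration, independent of Selenium:

from bs4 import BeautifulSoup

# A frozen snapshot of some HTML, analogous to calling driver.page_source once
snapshot = "<div><span class='searchVenue-header-name-name headingMedium'>Kotuwa</span></div>"
soup = BeautifulSoup(snapshot, "html.parser")

# However much the live page changes later, this soup still only contains
# what was in the snapshot string, so find_all keeps returning the same spans.
print(len(soup.find_all("span", {"class": "searchVenue-header-name-name headingMedium"})))  # -> 1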


Answer 1

Score: 1


You need to get the page source every time you click the Load More button, and keep checking whether each name already exists in the list; if it does not, append it.

Use an infinite loop and check whether the Load More button exists: if it does, click it, otherwise break out of the loop.

Code:

final1 = []
driver.get("https://www.burpple.com/search/sg?q=Newly+Opened&type=places")
time.sleep(2)

while True:
    # Re-read and re-parse the page source after every click so new venues are picked up
    content = driver.page_source
    soup = BeautifulSoup(content)
    for item in soup.select("span.searchVenue-header-name-name.headingMedium"):
        if item.text not in final1:
            final1.append(item.text)
    try:
        # Click "load more" once it is clickable; when it is gone, the wait fails and the loop ends
        WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.ID, "masonryViewMore-btn"))).click()
        time.sleep(2)
    except:
        break
print("Total number of elements: {}".format(len(final1)))
print(final1)

You need to import the libraries below:

from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By

Output:

Total number of elements:  161
['Plus Coffee Joint', 'Kotuwa', 'Smoochie Creamery', "Evan's Kitch", '800° Woodfired Pizza (KINEX)', "Sarah's Loft", 'nicher (Springleaf)', 'First Story Cafe', 'Tucela Gelato', 'Ri Ri Cha', 'Unatoto', 'Royal Palm (Meat & Dine)', 'Flourish Bakehouse', 'TAG Espresso (Raffles City)', 'Equate Coffee (Orchard Central)', 'Arc-en-ciel Pâtisserie', 'Enjoy Eating House & Bar (Stevens)', 'SAGE by Yasunori Doi (Orchard Plaza)', 'Hellu Coffee', 'Pestle & Mortar Society', 'Omoté Soho (Velocity)', 'Le Matin Patisserie (ION Orchard)', 'KEK Keng Eng Kee Seafood (Tampines)', 'Mother-In-Law Egg Tart (Havelock)', 'Mono Izakaya', 'Daily Staples', "Ben's Tavern", 'FiftyFive Coffee Bar', 'Butcher’s Block', 'There Was No Coffee', 'Roemah Makan', 'Overscoop (Hougang)', 'Wong Fu Fu', 'Dim Sum Haus (Upper Weld)', 'Long Phung Vietnamese Restaurant (Chinatown)', 'Abundance (Jalan Besar)', 'Beans Factory', 'XiabuXiabu', 'B/W Bagelwich', 'Nakiryu', 'Fiamma (Capella Singapore)', 'Kei Kaisendon (Breadtalk IHQ)', 'cloud', 'Sweedy Patisserie', 'Fatty Patty (The Bedok Marketplace)', 'afterwords', 'NAN YANG DAO (Aljunied)', 'KREME', 'Pilot Kitchen', 'Peacock North Indian Cuisine (NEWest)', 'Café Natsu (Clemenceau)', 'Little Cart Noodle House', 'Anchovies & Peanuts (Golden Mile Food Centre)', 'Otto Pizza', 'Little French Fusion', 'Underdog Inn', 'Sum Dim Sum', 'Seng House', 'Lor Mak Mak (Changi Village Hawker Centre)', 'Guriru', 'The Last Scoop', 'Darkness Dessert', 'Hatsu', 'Dan Lao (Maxwell Food Centre)', 'Chong Qing Xiao Mu Deng Traditional Hot Pot (GR.iD)', 'Refuel J9 (Junction Nine)', 'Fu Xiao Xian', 'Good Chai People', 'Ginger.Lily', 'Paris Van Java', 'Hejio', 'LUNA (Joo Chiat)', 'The Better Scoop (Serangoon)', 'Beccarino Patisserie', 'Yi Qian Private Dining', 'HUSK Nasi Lemak (Bugis Cube)', 'Kong Cafe (Thomson Plaza)', 'Mooi Pâtisserie', 'Wanpo Tea Shop (Lazada One)', 'The Hainan Story Coffee House (NEX)', 'Hong Kong Zhai Dim Sum (Marina Square)', 'Beauty in The Pot (NEX)', 'GaiBang (Paya Lebar Square)', 'Sugaroses Cafe', 'SWIRLGO', 'Victoria Bakery', 'Dirty Cheesecake Bakery & Cafe', 'Kao Ge Yu (East Village)', 'Wonders', 'Parliament Bar', 'Mr. 
Bucket Chocolaterie (Dempsey)', 'Chapter 1', 'Qi Xiang Hotpot', 'The DEN - Kway Teow Kia & Bar', 'Yamakita - Tempura x Tapas', 'Onigarazu Don (Senja Hawker Centre)', 'Full Circle by J-man', 'Ah Lock Kitchen (Senja Hawker Centre)', "Winner's Fried Chicken", 'Jane Deer', 'Uncle Leong Seafood (Techview)', 'Hub & Spoke Cafe (Changi Airport T2)', 'Nani Bowl', 'Staple', 'Daniu Teochew Seafood Restaurant', '123 Zô Vietnamese BBQ Skewers and Hotpot', 'TwoBakeBoys (CT Hub 2)', 'Canopy (Changi City Point)', 'Capital Kitchen (Clarke Quay)', '8Bar Espresso', 'Boms & Buns', 'Lau Wang Claypot (Bugis+)', 'La Lola Churreria (Cross Street Exchange)', "Dawn's", 'Heng Heng Kueh', 'Fosters Cafe', 'MUGUNG', 'Sakutto Tempura&Oyster', 'Firewood Chicken & Bagel', 'Shikar', 'Ahara', '1-V:U', 'Cocokata', 'MANAM', 'Shake Shake In A Tub (IMM)', '929 Desserts & Bites', "Verandah @ Rael's (111 Somerset)", 'Kei Kaisendon (Tanglin Mall)', 'Lee Lai Jiak', 'Gelatology Lab', 'Matchaya (Jem)', 'Bag Me Up', 'PaPa Gelare by CoffeePlus', 'Chun Noodle Bar (Amoy Street Food Centre)', 'Butter Bean (VivoCity)', 'OMU NOMU Craft Sake & Raw Bar', 'The Dim Sum Place (The Centrepoint)', 'Yuugo Cafe', 'That Wine Place', 'The Whole Kitchen (CBD)', 'Colobaba (Century Square)', 'Boomeranz Nasi Ayam Power By Adimann (Northpoint City)', 'Hitoyoshi Izakaya', 'popotatoe', 'Jin Yu Man Tang Dessert Shop (South Bridge)', 'LILY•S Gourmet Bistro (Tanglin Mall)', 'Surrey Hills Deli', 'Main Street Commissary', 'Seroja', 'Luckmeow (Maxwell)', 'Old Shifu Charcoal Porridge', 'iSTEAKS Reserved', 'Rumours Beach Club', 'Willy Wankerz', 'Cupping Room Coffee Roasters (Robinson Centre)', 'Monte Risaia', 'Shinya Izakaya', 'OSTERIA BBR by Alain Ducasse', '+886 Bistro', 'Dabao Gelato', '99 Thai Taste']
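
A self-contained variant of the same approach, shown here for convenience (it is not part of the original answer): it adds the driver setup, passes an explicit parser to BeautifulSoup, and narrows the bare except to TimeoutException so unrelated errors are not silently swallowed. It assumes Selenium 4+, where webdriver.Chrome() can locate a compatible driver automatically.

import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome()  # Selenium 4+ resolves the driver automatically
driver.get("https://www.burpple.com/search/sg?q=Newly+Opened&type=places")
time.sleep(2)

final1 = []
while True:
    # Re-parse the page source on every iteration so newly loaded venues are included
    soup = BeautifulSoup(driver.page_source, "html.parser")
    for item in soup.select("span.searchVenue-header-name-name.headingMedium"):
        if item.text not in final1:
            final1.append(item.text)
    try:
        # Stop once the "load more" button can no longer be found or clicked
        WebDriverWait(driver, 10).until(
            EC.element_to_be_clickable((By.ID, "masonryViewMore-btn"))
        ).click()
        time.sleep(2)
    except TimeoutException:
        break

driver.quit()
print("Total number of elements: {}".format(len(final1)))
print(final1)

Here the WebDriverWait call doubles as the "does the button still exist?" check from the answer, so no separate existence test is needed.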
