How do I append the updated data after clicking the load more button using Selenium (web scraping)?


Question

After running my code, I am not getting the updated data after clicking the "load more" button. I have tried many times and I am not sure what is wrong with my code. After the load more button is clicked, I should end up with a list of all the data. However, I am only getting the data that was appended to the list before the button was clicked.

import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.common import exceptions
from selenium.webdriver.chrome.options import Options


class RunChromeTests():
    def test(self):
        chrome_options = Options()
        chrome_options.add_argument("--disable-notifications")
        chrome_options.add_argument("-incognito")
        chrome_options.add_argument("--disable-popup-blocking")
        chrome_options.add_argument("--ignore-certificate-errors")
        chrome_options.add_argument("--disable-javascript")

        # Download the Chrome driver from https://chromedriver.chromium.org/downloads
        # and point chrome_path at its location on your computer
        chrome_path = r"path"
        driver = webdriver.Chrome(chrome_path, options=chrome_options)
        driver.maximize_window()
        driver.implicitly_wait(10)

        final = []

        # Enter the URL to scrape
        driver.get("https://www.burpple.com/search/sg?q=Newly+Opened&type=places")
        # Parse the page source once, before the load-more loop
        content = driver.page_source
        soup = BeautifulSoup(content)

        loadmore = driver.find_element_by_id("masonryViewMore-btn")
        j = 0
        final1 = []

        try:
            while loadmore.is_displayed():
                loadmore.click()
                time.sleep(2)
                # Collect every venue name span in the parsed soup
                lrec = soup.find_all("span", {"class": "searchVenue-header-name-name headingMedium"})
                newlist = lrec[j:]
                print(lrec)
                for rec in newlist:
                    name = rec.text
                    final1.append(name)
                print(final1)
                j = len(lrec) + 1
                time.sleep(5)
        except exceptions.StaleElementReferenceException:
            pass


chromed = RunChromeTests()
chromed.test()

Output:
['Kotuwa', 'Smoochie Creamery', "Evan's Kitch", 'Plus Coffee Joint', '800° Woodfired Pizza (KINEX)', "Sarah's Loft", 'nicher (Springleaf)', 'First Story Cafe', 'Tucela Gelato', 'Ri Ri Cha', 'Unatoto', 'Royal Palm (Meat & Dine)']

Correct output (the trailing ... signifies many more entries):
['Kotuwa', 'Smoochie Creamery', "Evan's Kitch", 'Plus Coffee Joint', '800° Woodfired Pizza (KINEX)', "Sarah's Loft", 'nicher (Springleaf)', 'First Story Cafe', 'Tucela Gelato', 'Ri Ri Cha', 'Unatoto', 'Royal Palm (Meat & Dine)', 'Flourish Bakehouse', 'Equate Coffee (Orchard Central)', 'TAG Espresso (Raffles City)', 'Arc-En-Ciel Patisserie', 'Enjoy Eating House & Bar (Stevens),', 'SAGE By Yasunori Doi (Orchard Plaza)', 'Hellu Coffee', 'Pestle & Mortar Society', ...]
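
For context (this note is not part of the original question): BeautifulSoup parses the string snapshot it is given, so a soup built from driver.page_source before the clicks cannot reflect anything loaded afterwards, which matches the output stopping at the first batch. A minimal illustration, independent of Selenium:

from bs4 import BeautifulSoup

# A frozen snapshot of some HTML, analogous to calling driver.page_source once
snapshot = "<div><span class='searchVenue-header-name-name headingMedium'>Kotuwa</span></div>"
soup = BeautifulSoup(snapshot, "html.parser")

# However much the live page changes later, this soup still only contains
# what was in the snapshot string, so find_all keeps returning the same spans.
print(len(soup.find_all("span", {"class": "searchVenue-header-name-name headingMedium"})))  # -> 1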


Answer 1

Score: 1


You need to get the page source every time you click the Load More button, and keep checking whether each name already exists in the list; if it does not, append it.

Use an infinite loop and check whether the Load More button exists: if it does, click it, otherwise break out of the loop.

Code:

final1 = []
driver.get("https://www.burpple.com/search/sg?q=Newly+Opened&type=places")
time.sleep(2)

while True:
    # Re-read and re-parse the page source after every click so new venues are picked up
    content = driver.page_source
    soup = BeautifulSoup(content)
    for item in soup.select("span.searchVenue-header-name-name.headingMedium"):
        if item.text not in final1:
            final1.append(item.text)
    try:
        # Click "load more" once it is clickable; when it is gone, the wait fails and the loop ends
        WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.ID, "masonryViewMore-btn"))).click()
        time.sleep(2)
    except:
        break
print("Total number of elements: {}".format(len(final1)))
print(final1)

You need to import the libraries below:

from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By

Output:

Total number of elements:  161
['Plus Coffee Joint', 'Kotuwa', 'Smoochie Creamery', "Evan's Kitch", '800° Woodfired Pizza (KINEX)', "Sarah's Loft", 'nicher (Springleaf)', 'First Story Cafe', 'Tucela Gelato', 'Ri Ri Cha', 'Unatoto', 'Royal Palm (Meat & Dine)', 'Flourish Bakehouse', 'TAG Espresso (Raffles City)', 'Equate Coffee (Orchard Central)', 'Arc-en-ciel Pâtisserie', 'Enjoy Eating House & Bar (Stevens)', 'SAGE by Yasunori Doi (Orchard Plaza)', 'Hellu Coffee', 'Pestle & Mortar Society', 'Omoté Soho (Velocity)', 'Le Matin Patisserie (ION Orchard)', 'KEK Keng Eng Kee Seafood (Tampines)', 'Mother-In-Law Egg Tart (Havelock)', 'Mono Izakaya', 'Daily Staples', "Ben's Tavern", 'FiftyFive Coffee Bar', 'Butcher’s Block', 'There Was No Coffee', 'Roemah Makan', 'Overscoop (Hougang)', 'Wong Fu Fu', 'Dim Sum Haus (Upper Weld)', 'Long Phung Vietnamese Restaurant (Chinatown)', 'Abundance (Jalan Besar)', 'Beans Factory', 'XiabuXiabu', 'B/W Bagelwich', 'Nakiryu', 'Fiamma (Capella Singapore)', 'Kei Kaisendon (Breadtalk IHQ)', 'cloud', 'Sweedy Patisserie', 'Fatty Patty (The Bedok Marketplace)', 'afterwords', 'NAN YANG DAO (Aljunied)', 'KREME', 'Pilot Kitchen', 'Peacock North Indian Cuisine (NEWest)', 'Café Natsu (Clemenceau)', 'Little Cart Noodle House', 'Anchovies & Peanuts (Golden Mile Food Centre)', 'Otto Pizza', 'Little French Fusion', 'Underdog Inn', 'Sum Dim Sum', 'Seng House', 'Lor Mak Mak (Changi Village Hawker Centre)', 'Guriru', 'The Last Scoop', 'Darkness Dessert', 'Hatsu', 'Dan Lao (Maxwell Food Centre)', 'Chong Qing Xiao Mu Deng Traditional Hot Pot (GR.iD)', 'Refuel J9 (Junction Nine)', 'Fu Xiao Xian', 'Good Chai People', 'Ginger.Lily', 'Paris Van Java', 'Hejio', 'LUNA (Joo Chiat)', 'The Better Scoop (Serangoon)', 'Beccarino Patisserie', 'Yi Qian Private Dining', 'HUSK Nasi Lemak (Bugis Cube)', 'Kong Cafe (Thomson Plaza)', 'Mooi Pâtisserie', 'Wanpo Tea Shop (Lazada One)', 'The Hainan Story Coffee House (NEX)', 'Hong Kong Zhai Dim Sum (Marina Square)', 'Beauty in The Pot (NEX)', 'GaiBang (Paya Lebar Square)', 'Sugaroses Cafe', 'SWIRLGO', 'Victoria Bakery', 'Dirty Cheesecake Bakery & Cafe', 'Kao Ge Yu (East Village)', 'Wonders', 'Parliament Bar', 'Mr. 
Bucket Chocolaterie (Dempsey)', 'Chapter 1', 'Qi Xiang Hotpot', 'The DEN - Kway Teow Kia & Bar', 'Yamakita - Tempura x Tapas', 'Onigarazu Don (Senja Hawker Centre)', 'Full Circle by J-man', 'Ah Lock Kitchen (Senja Hawker Centre)', "Winner's Fried Chicken", 'Jane Deer', 'Uncle Leong Seafood (Techview)', 'Hub & Spoke Cafe (Changi Airport T2)', 'Nani Bowl', 'Staple', 'Daniu Teochew Seafood Restaurant', '123 Zô Vietnamese BBQ Skewers and Hotpot', 'TwoBakeBoys (CT Hub 2)', 'Canopy (Changi City Point)', 'Capital Kitchen (Clarke Quay)', '8Bar Espresso', 'Boms & Buns', 'Lau Wang Claypot (Bugis+)', 'La Lola Churreria (Cross Street Exchange)', "Dawn's", 'Heng Heng Kueh', 'Fosters Cafe', 'MUGUNG', 'Sakutto Tempura&Oyster', 'Firewood Chicken & Bagel', 'Shikar', 'Ahara', '1-V:U', 'Cocokata', 'MANAM', 'Shake Shake In A Tub (IMM)', '929 Desserts & Bites', "Verandah @ Rael's (111 Somerset)", 'Kei Kaisendon (Tanglin Mall)', 'Lee Lai Jiak', 'Gelatology Lab', 'Matchaya (Jem)', 'Bag Me Up', 'PaPa Gelare by CoffeePlus', 'Chun Noodle Bar (Amoy Street Food Centre)', 'Butter Bean (VivoCity)', 'OMU NOMU Craft Sake & Raw Bar', 'The Dim Sum Place (The Centrepoint)', 'Yuugo Cafe', 'That Wine Place', 'The Whole Kitchen (CBD)', 'Colobaba (Century Square)', 'Boomeranz Nasi Ayam Power By Adimann (Northpoint City)', 'Hitoyoshi Izakaya', 'popotatoe', 'Jin Yu Man Tang Dessert Shop (South Bridge)', 'LILY•S Gourmet Bistro (Tanglin Mall)', 'Surrey Hills Deli', 'Main Street Commissary', 'Seroja', 'Luckmeow (Maxwell)', 'Old Shifu Charcoal Porridge', 'iSTEAKS Reserved', 'Rumours Beach Club', 'Willy Wankerz', 'Cupping Room Coffee Roasters (Robinson Centre)', 'Monte Risaia', 'Shinya Izakaya', 'OSTERIA BBR by Alain Ducasse', '+886 Bistro', 'Dabao Gelato', '99 Thai Taste']
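
A self-contained variant of the same approach, shown here for convenience (it is not part of the original answer): it adds the driver setup, passes an explicit parser to BeautifulSoup, and narrows the bare except to TimeoutException so unrelated errors are not silently swallowed. It assumes Selenium 4+, where webdriver.Chrome() can locate a compatible driver automatically.

import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome()  # Selenium 4+ resolves the driver automatically
driver.get("https://www.burpple.com/search/sg?q=Newly+Opened&type=places")
time.sleep(2)

final1 = []
while True:
    # Re-parse the page source on every iteration so newly loaded venues are included
    soup = BeautifulSoup(driver.page_source, "html.parser")
    for item in soup.select("span.searchVenue-header-name-name.headingMedium"):
        if item.text not in final1:
            final1.append(item.text)
    try:
        # Stop once the "load more" button can no longer be found or clicked
        WebDriverWait(driver, 10).until(
            EC.element_to_be_clickable((By.ID, "masonryViewMore-btn"))
        ).click()
        time.sleep(2)
    except TimeoutException:
        break

driver.quit()
print("Total number of elements: {}".format(len(final1)))
print(final1)

Here the WebDriverWait call doubles as the "does the button still exist?" check from the answer, so no separate existence test is needed.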
