How to obtain hidden table data from a website using Selenium and C#?

Question

I'm trying to scrape the following website and extract the product table data using Selenium in C#, but when I parse the resulting HTML, I can't find the table. It seems the table is loaded by JavaScript/AJAX after the page loads. How can I extract the table and its number of rows?

URL: www.ifm.com/de/en/category/200_010_010_010

var options = new ChromeOptions()
{
    BinaryLocation = "C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe",
};
options.AddArguments(new List<string>() { "headless", "disable-gpu" });
string response = "";
options.AddArgument("no-sandbox");
using (var browser = new ChromeDriver(options))
{
    browser.Navigate().GoToUrl(url);
    WebDriverWait wait = new WebDriverWait(browser, TimeSpan.FromSeconds(20));
    ///
    /// *Both expressions below return null*
    //IWebElement rows_count = browser.FindElement(By.XPath("ifm-selector__matching-products"));
    //IWebElement next_button = browser.FindElement(By.XPath("ifm-pagination__cta normalize hover-link-2"));
    response = browser.PageSource;
}
HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
htmlDoc.LoadHtml(response);
var rows_count = htmlDoc.DocumentNode.SelectSingleNode("//div[@class='ifm-selector__results']//div[@class='ifm-selector__matching-products']//span");

Answer 1

Score: 0

You can wait for the element to become available in the DOM; see, for example, this answer on how to do that:

https://stackoverflow.com/a/74930503/4122889

You can use the following extension methods as they are, or inline the code they contain.

using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using OpenQA.Selenium.Support.UI;

internal static class WebDriverExtensions
{
    // Convenience overload so the call also resolves on a ChromeDriver-typed variable.
    public static IWebElement FindElement(this ChromeDriver driver, By by, TimeSpan timeout)
        => FindElement((IWebDriver)driver, by, timeout);

    // Polls the DOM until the element is found or the timeout elapses.
    public static IWebElement FindElement(this IWebDriver driver, By by, TimeSpan timeout, TimeSpan pollingInterval = default)
    {
        // NOTE Also see: https://www.selenium.dev/documentation/webdriver/waits/

        var webDriverWait = new WebDriverWait(driver, timeout);

        // Only override the polling interval when one is supplied. Assigning the
        // default (zero) TimeSpan would replace Selenium's own default interval,
        // which is half a second as of writing.
        if (pollingInterval > TimeSpan.Zero)
        {
            webDriverWait.PollingInterval = pollingInterval;
        }

        // We're polling the DOM, so a missing element is normal procedure and not an exception.
        webDriverWait.IgnoreExceptionTypes(typeof(NoSuchElementException));

        return webDriverWait
            .Until(drv => drv.FindElement(by));
    }
}
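
As a rough usage sketch (not part of the original answer), the extension could be called like this for the counter element from the question. The class name ifm-selector__matching-products is taken from the question and is assumed to still exist on the page; the names WaitUsageSketch, ReadMatchingProductsText and ExtractFirstInteger are hypothetical. Note that a class name has to go through By.CssSelector (or By.ClassName), not By.XPath.

```csharp
using System;
using System.Text.RegularExpressions;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

internal static class WaitUsageSketch
{
    // Hypothetical helper: opens the page headless and returns the text of the counter element.
    public static string ReadMatchingProductsText(string url)
    {
        var options = new ChromeOptions();
        options.AddArgument("--headless");

        using var driver = new ChromeDriver(options);
        driver.Navigate().GoToUrl(url);

        // Uses the FindElement(by, timeout) extension shown above, which polls
        // until the JavaScript-rendered element appears or 20 seconds pass.
        IWebElement counter = driver.FindElement(
            By.CssSelector(".ifm-selector__matching-products"),
            TimeSpan.FromSeconds(20));

        return counter.Text;
    }

    // Hypothetical helper: pulls the first run of ASCII digits out of a text.
    // The exact wording of the counter text is an assumption; a value with a
    // thousands separator (such as "1,234") would need extra handling.
    public static int? ExtractFirstInteger(string text)
    {
        Match match = Regex.Match(text, "[0-9]+");
        return match.Success ? int.Parse(match.Value) : (int?)null;
    }
}
```

If the element never appears, WebDriverWait throws a WebDriverTimeoutException rather than returning null, so the caller may want to catch that.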

Then I'd use ifm-result-item as a CSS class selector; that should give you a list of all matching HTML elements with their values:

<div class="ifm-result-item">
   <div class="ifm-result-item__product-info">
      <button class="ifm-result-item__toggle hide-md- normalize" aria-expanded="false" data-test="ifm-result-item-toggle">
         <svg viewBox="0 0 24 24" class="ifm-result-item__toggle-icon inline-icon" aria-hidden="true">
            <use href="#chevron-d" class="icon-svg--fat"></use>
         </svg>
      </button>
      <div class="ifm-result-item__product-info-inner">
         <a href="/de/en/product/IEW200" class="ifm-result-item__product-link-wrapper" data-test="ifm-result-item-link">
            <span class="ifm-result-item__image">
               <div class="ifm-product-thumbnail"><img srcset="https://media.ifm.com/CIP/mediadelivery/rendition/8a35b3315fc8554e841a2d35b48bfdae/-B140-FJPG/IEW200 2x" src="https://media.ifm.com/CIP/mediadelivery/rendition/8a35b3315fc8554e841a2d35b48bfdae/-B70-FJPG/IEW200" class="ifm-product-thumbnail__img" loading="lazy" style=""></div>
            </span>
            <div>
               <span class="ifm-result-item__product-link">IEW200</span>
               <div class="ifm-result-item__product-description hide-lg+">Inductive sensor</div>
            </div>
         </a>
         <div class="ifm-labeled-value-section ifm-result-item__product-info-details">
            <div class="ifm-labeled-value-section__entry">
               <div class="ifm-labeled-value-section__label hyphens">Dimensions</div>
               <!---->
               <div class="ifm-labeled-value-section__value hyphens">M8 x 1 / L = 40 mm</div>
            </div>
            <div class="ifm-labeled-value-section__entry">
               <div class="ifm-labeled-value-section__label hyphens">Sensing range</div>
               <!---->
               <div class="ifm-labeled-value-section__value hyphens">3 mm flush mountable</div>
            </div>
            <div class="ifm-labeled-value-section__entry">
               <div class="ifm-labeled-value-section__label hyphens">Output function</div>
               <!---->
               <div class="ifm-labeled-value-section__value hyphens">normally open</div>
            </div>
            <div class="ifm-labeled-value-section__entry">
               <div class="ifm-labeled-value-section__label hyphens">Output</div>
               <!---->
               <div class="ifm-labeled-value-section__value hyphens">DC PNP</div>
            </div>
            <div class="ifm-labeled-value-section__entry">
               <div class="ifm-labeled-value-section__label hyphens">Connection</div>
               <!---->
               <div class="ifm-labeled-value-section__value hyphens">M8 Connector</div>
            </div>
         </div>
      </div>
   </div>
   <div class="ifm-result-item__expandable-functions" style="display: none;">
      <hr class="ifm-result-item__separator hr">
      <div class="ifm-expandable-functions ifm-result-item__collapsed-details">
         <div class="ifm-expandable-functions__item">
            <div class="ifm-product-price">
               <div class="ifm-product-price__list-price ifm-list-price"><span class="ifm-list-price__label">List price:</span><span class="ifm-list-price__value" data-test="ifm-list-price">55,40 €</span></div>
               <div class="ifm-product-price__individual-price ifm-individual-price"><span class="ifm-individual-price__label">Your price:</span><button type="button" class="ifm-individual-price__show-price hover-link-2 normalize" data-test="ifm-show-price">Please log in</button></div>
            </div>
         </div>
         <div class="ifm-add-to-cart ifm-expandable-functions__item ifm-expandable-functions__cart-items">
            <label class="ifm-add-to-cart__input ifm-input-label">
               <div class="ifm-quantity-input" data-test="ifm-add-to-cart-input">
                  <div class="ifm-quantity-input__minus"><input type="button" class="normalize" data-field="quantity" value="-"></div>
                  <input step="1" min="1" max="9999" type="number" maxlength="4" name="quantity" class="normalize ifm-quantity-input__input-field">
                  <div class="ifm-quantity-input__plus"><input type="button" class="normalize" data-field="quantity" value="+"></div>
               </div>
            </label>
            <button class="ifm-add-to-cart__button ifm-button normalize" data-test="ifm-add-to-cart-button">Add to the shopping basket</button>
         </div>
         <div class="ifm-expandable-functions__shop-items">
            <button class="ifm-wishlist hover-link-2 normalize ifm-expandable-functions__shop-item" data-test="ifm-wishlist-button">
               <svg viewBox="0 0 24 24" aria-hidden="true" class="inline-icon">
                  <use href="#heart" class="icon-svg--thin"></use>
               </svg>
               <span class="hide-lg-">Save for later</span>
            </button>
            <button class="normalize ifm-compare-products hover-link-2 ifm-expandable-functions__shop-item hide-md-" data-test="ifm-compare-products-button">
               <svg viewBox="0 0 1792 1792" class="inline-icon">
                  <use href="#compress"></use>
               </svg>
               <span class="hide-lg-">Compare</span>
            </button>
         </div>
      </div>
   </div>
</div>
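
Once the page source has been captured after the wait, markup like the above could be turned into rows with HtmlAgilityPack, which the question already uses. The following is only a sketch under the assumption that the class names in the snippet stay as shown; ResultItemParser and its members are hypothetical names. It matches class tokens rather than the whole class attribute, because an exact test such as @class='...' fails as soon as an element carries more than one class.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

internal static class ResultItemParser
{
    // XPath 1.0 test for "the class attribute contains this exact token".
    private static string HasClass(string name) =>
        "contains(concat(' ', normalize-space(@class), ' '), ' " + name + " ')";

    // Returns one entry per product: its name plus its label/value pairs.
    public static List<(string Product, Dictionary<string, string> Details)> Parse(string pageSource)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(pageSource);

        var rows = new List<(string Product, Dictionary<string, string> Details)>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var items = doc.DocumentNode.SelectNodes("//div[" + HasClass("ifm-result-item") + "]");
        if (items == null) return rows;

        foreach (HtmlNode item in items)
        {
            var nameNode = item.SelectSingleNode(".//span[" + HasClass("ifm-result-item__product-link") + "]");
            var details = new Dictionary<string, string>();

            var entries = item.SelectNodes(".//div[" + HasClass("ifm-labeled-value-section__entry") + "]");
            if (entries != null)
            {
                foreach (HtmlNode entry in entries)
                {
                    var label = entry.SelectSingleNode(".//div[" + HasClass("ifm-labeled-value-section__label") + "]");
                    var value = entry.SelectSingleNode(".//div[" + HasClass("ifm-labeled-value-section__value") + "]");
                    if (label == null || value == null) continue;
                    details[Clean(label)] = Clean(value);
                }
            }

            rows.Add((nameNode == null ? "" : Clean(nameNode), details));
        }

        return rows;
    }

    // Decodes HTML entities and trims surrounding whitespace.
    private static string Clean(HtmlNode node) => HtmlEntity.DeEntitize(node.InnerText).Trim();
}
```

The number of rows on the current page is then simply the Count of the returned list; any paging would have to be handled separately.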

For this particular site/page, you could also just use the API that the page's JavaScript calls:
https://www.ifm.com/restservices/de/en/category/200_010_010_010/productsAndAttributes
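
A minimal sketch of calling that endpoint without a browser might look as follows. It assumes the endpoint answers a plain GET with JSON, which the answer does not state; since the response schema is not shown here, the sketch only lists the top-level property names so the structure can be inspected first. RestEndpointSketch and its members are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

internal static class RestEndpointSketch
{
    private const string Url =
        "https://www.ifm.com/restservices/de/en/category/200_010_010_010/productsAndAttributes";

    // Downloads the raw response body (assumed to be JSON).
    public static async Task<string> DownloadAsync()
    {
        using var client = new HttpClient();
        return await client.GetStringAsync(Url);
    }

    // Lists the property names of the root object, or nothing if the root is not an object.
    public static List<string> TopLevelPropertyNames(string json)
    {
        using JsonDocument doc = JsonDocument.Parse(json);
        var names = new List<string>();

        if (doc.RootElement.ValueKind == JsonValueKind.Object)
        {
            foreach (JsonProperty property in doc.RootElement.EnumerateObject())
            {
                names.Add(property.Name);
            }
        }

        return names;
    }
}
```

From there, the property that holds the product list could be mapped to a typed model once its shape is known.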

Posted by huangapple on April 17, 2023 at 19:32:18
Please keep this link when reposting: https://go.coder-hub.com/76034709.html