How can I get table data that is hidden (loaded by JavaScript) from a website using Selenium and C#?

Question

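I'm trying to scrape the following website and extract the product table data using Selenium in C#, but when I parse the resulting HTML, I can't find the table. It seems the table is loaded by JavaScript/AJAX after the page loads. How can I extract the table and its number of rows?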

URL: www.ifm.com/de/en/category/200_010_010_010

```csharp
using System;
using System.Collections.Generic;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using OpenQA.Selenium.Support.UI;

string url = "https://www.ifm.com/de/en/category/200_010_010_010";

var options = new ChromeOptions()
{
    BinaryLocation = "C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe",
};
options.AddArguments(new List<string>() { "headless", "disable-gpu" });
string response = "";
options.AddArgument("no-sandbox");
using (var browser = new ChromeDriver(options))
{
    browser.Navigate().GoToUrl(url);
    WebDriverWait wait = new WebDriverWait(browser, TimeSpan.FromSeconds(20));

    // *Both expressions below return null*
    //IWebElement rows_count = browser.FindElement(By.XPath("ifm-selector__matching-products"));
    //IWebElement next_button = browser.FindElement(By.XPath("ifm-pagination__cta normalize hover-link-2"));
    response = browser.PageSource;
}
HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
htmlDoc.LoadHtml(response);
var rows_count = htmlDoc.DocumentNode.SelectSingleNode("//div[@class='ifm-selector__results']//div[@class='ifm-selector__matching-products']//span");
```

Answer 1

Score: 0

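The table is rendered by JavaScript after the page loads, so wait until the element is present in the DOM before reading it (and before reading `PageSource`). See this answer for how to do that: https://stackoverflow.com/a/74930503/4122889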

You can use the following extension methods, or copy the code inside them:

```csharp
using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using OpenQA.Selenium.Support.UI;

internal static class WebDriverExtensions
{
    public static IWebElement FindElement(this ChromeDriver driver, By by, TimeSpan timeout)
        => FindElement((IWebDriver)driver, by, timeout);

    public static IWebElement FindElement(this IWebDriver driver, By by, TimeSpan timeout, TimeSpan pollingInterval = default)
    {
        // NOTE Also see: https://www.selenium.dev/documentation/webdriver/waits/
        var webDriverWait = new WebDriverWait(driver, timeout);

        // Only override the polling interval when one is passed; otherwise keep the
        // DefaultWait polling interval of Selenium (half a second as of writing).
        // Assigning default(TimeSpan) unconditionally would set it to zero and busy-poll.
        if (pollingInterval != default)
        {
            webDriverWait.PollingInterval = pollingInterval;
        }

        // We're polling the DOM, so a missing element is normal procedure, not an exception.
        webDriverWait.IgnoreExceptionTypes(typeof(NoSuchElementException));
        return webDriverWait.Until(drv => drv.FindElement(by));
    }
}
```

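Then use `ifm-result-item` as a CSS class selector, e.g. `By.CssSelector(".ifm-result-item")` or `By.ClassName("ifm-result-item")`. `FindElements` with that selector returns one element per product row, so its `Count` is the number of rows currently rendered, and each element contains that row's values. (`By.XPath` expects an XPath expression, not a bare class name, which is one reason the lookups in the question fail.) A single rendered row looks like this: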

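```html
<div class="ifm-result-item">
  <div class="ifm-result-item__product-info">
    <button class="ifm-result-item__toggle hide-md- normalize" aria-expanded="false" data-test="ifm-result-item-toggle">
      <svg viewBox="0 0 24 24" class="ifm-result-item__toggle-icon inline-icon" aria-hidden="true">
        <use href="#chevron-d" class="icon-svg--fat"></use>
      </svg>
    </button>
    <div class="ifm-result-item__product-info-inner">
      <a href="/de/en/product/IEW200" class="ifm-result-item__product-link-wrapper" data-test="ifm-result-item-link">
        <span class="ifm-result-item__image">
          <div class="ifm-product-thumbnail"><img srcset="https://media.ifm.com/CIP/mediadelivery/rendition/8a35b3315fc8554e841a2d35b48bfdae/-B140-FJPG/IEW200 2x" src="https://media.ifm.com/CIP/mediadelivery/rendition/8a35b3315fc8554e841a2d35b48bfdae/-B70-FJPG/IEW200" class="ifm-product-thumbnail__img" loading="lazy" style=""></div>
        </span>
        <div>
          <span class="ifm-result-item__product-link">IEW200</span>
          <div class="ifm-result-item__product-description hide-lg+">Inductive sensor</div>
        </div>
      </a>
      <div class="ifm-labeled-value-section ifm-result-item__product-info-details">
        <div class="ifm-labeled-value-section__entry">
          <div class="ifm-labeled-value-section__label hyphens">Dimensions</div>
          <div class="ifm-labeled-value-section__value hyphens">M8 x 1 / L = 40 mm</div>
        </div>
        <div class="ifm-labeled-value-section__entry">
          <div class="ifm-labeled-value-section__label hyphens">Sensing range</div>
          <div class="ifm-labeled-value-section__value hyphens">3 mm flush mountable</div>
        </div>
        <div class="ifm-labeled-value-section__entry">
          <div class="ifm-labeled-value-section__label hyphens">Output function</div>
          <div class="ifm-labeled-value-section__value hyphens">normally open</div>
        </div>
        <div class="ifm-labeled-value-section__entry">
          <div class="ifm-labeled-value-section__label hyphens">Output</div>
          <div class="ifm-labeled-value-section__value hyphens">DC PNP</div>
        </div>
        <div class="ifm-labeled-value-section__entry">
          <div class="ifm-labeled-value-section__label hyphens">Connection</div>
          <div class="ifm-labeled-value-section__value hyphens">M8 Connector</div>
        </div>
      </div>
    </div>
  </div>
  <!-- ... (other HTML content: price, add-to-cart, wishlist, compare) ... -->
</div>
```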
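Combining the wait with the question's HtmlAgilityPack approach could look roughly like the sketch below: it waits (up to 20 seconds, an arbitrary choice) until at least one `ifm-result-item` element exists, then parses `PageSource` and reads each row's product code and label/value pairs. The class names come from the rendered result markup (`ifm-result-item`, `ifm-result-item__product-link`, and the `ifm-labeled-value-section__entry` / `__label` / `__value` elements that hold pairs such as Dimensions). The `Product` record and the `ClassIs`, `Clean` and `ExtractProducts` helpers are hypothetical names, the Chrome options are trimmed (add `BinaryLocation` back if you need it), and pagination is not handled, so only the rows currently rendered are seen.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using OpenQA.Selenium.Support.UI;

// Hypothetical result type: a product code plus its labeled values (e.g. "Dimensions").
public record Product(string Code, Dictionary<string, string> Attributes);

public static class IfmScraper
{
    // XPath 1.0 predicate matching one whole token of a (possibly multi-class) class attribute.
    static string ClassIs(string cls) =>
        $"contains(concat(' ', normalize-space(@class), ' '), ' {cls} ')";

    // Decode entities such as &amp; and trim surrounding whitespace.
    static string Clean(HtmlNode node) => HtmlEntity.DeEntitize(node.InnerText).Trim();

    public static List<Product> ExtractProducts(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        var products = new List<Product>();

        // HtmlAgilityPack returns null, not an empty collection, when nothing matches.
        var items = doc.DocumentNode.SelectNodes($"//div[{ClassIs("ifm-result-item")}]");
        if (items == null) return products;

        foreach (var item in items)
        {
            var codeNode = item.SelectSingleNode($".//span[{ClassIs("ifm-result-item__product-link")}]");
            var attributes = new Dictionary<string, string>();

            var entries = item.SelectNodes($".//div[{ClassIs("ifm-labeled-value-section__entry")}]");
            if (entries != null)
            {
                foreach (var entry in entries)
                {
                    var label = entry.SelectSingleNode($"./div[{ClassIs("ifm-labeled-value-section__label")}]");
                    var value = entry.SelectSingleNode($"./div[{ClassIs("ifm-labeled-value-section__value")}]");
                    if (label != null && value != null)
                    {
                        attributes[Clean(label)] = Clean(value);
                    }
                }
            }

            products.Add(new Product(codeNode == null ? "" : Clean(codeNode), attributes));
        }
        return products;
    }

    public static void Main()
    {
        var options = new ChromeOptions();
        options.AddArguments("headless", "disable-gpu", "no-sandbox");

        using var browser = new ChromeDriver(options);
        browser.Navigate().GoToUrl("https://www.ifm.com/de/en/category/200_010_010_010");

        // Block until the script-rendered rows exist; this only guarantees that the first row is there.
        var wait = new WebDriverWait(browser, TimeSpan.FromSeconds(20));
        wait.Until(d => d.FindElements(By.CssSelector(".ifm-result-item")).Count > 0);

        var products = ExtractProducts(browser.PageSource);
        Console.WriteLine($"Rows: {products.Count}");
        foreach (var p in products)
        {
            Console.WriteLine($"{p.Code}: {string.Join("; ", p.Attributes)}");
        }
    }
}
```

Matching a whole class token with `contains(concat(' ', normalize-space(@class), ' '), ' name ')` instead of `@class='name'` keeps the XPath working when an element carries several classes, which is the case for most elements on this page.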

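For this particular site/page, you could also skip the browser and query the JavaScript API directly: https://www.ifm.com/restservices/de/en/category/200_010_010_010/productsAndAttributes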
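A minimal sketch of that route, assuming the endpoint answers a plain GET with JSON (not confirmed here; the site may change or restrict it). Because the response schema is not documented in this thread, the sketch only reports the top-level shape of the JSON instead of guessing field names; `IfmRestProbe` and `DescribeShape` are hypothetical names. Once you have seen the real structure, you can read the product rows with `JsonElement.GetProperty` or deserialize into your own types.

```csharp
using System;
using System.Linq;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

public static class IfmRestProbe
{
    // Endpoint named in the answer; its response format is an assumption to be checked.
    const string Url =
        "https://www.ifm.com/restservices/de/en/category/200_010_010_010/productsAndAttributes";

    // Summarize the top level of a JSON document without assuming any field names.
    public static string DescribeShape(JsonElement root) => root.ValueKind switch
    {
        JsonValueKind.Array => $"array with {root.GetArrayLength()} elements",
        JsonValueKind.Object => "object with properties: "
            + string.Join(", ", root.EnumerateObject().Select(p => p.Name)),
        _ => root.ValueKind.ToString()
    };

    public static async Task Main()
    {
        using var http = new HttpClient();
        // GetStringAsync throws HttpRequestException on a non-success status code.
        string json = await http.GetStringAsync(Url);
        using JsonDocument doc = JsonDocument.Parse(json);
        Console.WriteLine(DescribeShape(doc.RootElement));
    }
}
```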

huangapple
  • Posted on 17 April 2023 at 19:32:18
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76034709.html