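How to obtain hidden table data from a website using Selenium and C#?
Question
I'm trying to scrape the following website and extract the products' table data using Selenium in C#, but when I parse the HTML result, I can't find the table. It seems the table is loaded by JavaScript/AJAX after the page loads. How can I extract the table and its number of rows?
URL: www.ifm.com/de/en/category/200_010_010_010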
using System;
using System.Collections.Generic;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using OpenQA.Selenium.Support.UI;

string url = "https://www.ifm.com/de/en/category/200_010_010_010";
var options = new ChromeOptions()
{
    BinaryLocation = "C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe",
};
options.AddArguments(new List<string>() { "headless", "disable-gpu" });
string response = "";
options.AddArgument("no-sandbox");
using (var browser = new ChromeDriver(options))
{
    browser.Navigate().GoToUrl(url);
    // The wait is created but never used, so PageSource below is read right after navigation.
    WebDriverWait wait = new WebDriverWait(browser, TimeSpan.FromSeconds(20));
    // Both expressions below fail to find the elements:
    //IWebElement rows_count = browser.FindElement(By.XPath("ifm-selector__matching-products"));
    //IWebElement next_button = browser.FindElement(By.XPath("ifm-pagination__cta normalize hover-link-2"));
    response = browser.PageSource;
}
HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
htmlDoc.LoadHtml(response);
// This is null: the element is not in the captured page source.
var rows_count = htmlDoc.DocumentNode.SelectSingleNode("//div[@class='ifm-selector__results']//div[@class='ifm-selector__matching-products']//span");
Answer 1
Score: 0
You can wait for the element to be available in the DOM; see, for example, this answer on how to do that:
https://stackoverflow.com/a/74930503/4122889
You can use the following extension methods or copy the code inside them:
using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using OpenQA.Selenium.Support.UI;

internal static class WebDriverExtensions
{
    public static IWebElement FindElement(this ChromeDriver driver, By by, TimeSpan timeout)
        => FindElement((IWebDriver)driver, by, timeout);

    public static IWebElement FindElement(this IWebDriver driver, By by, TimeSpan timeout, TimeSpan pollingInterval = default)
    {
        // NOTE Also see: https://www.selenium.dev/documentation/webdriver/waits/
        var webDriverWait = new WebDriverWait(driver, timeout);

        // default(TimeSpan) is TimeSpan.Zero, which would poll in a tight loop, so only override
        // Selenium's DefaultWait polling interval (half a second as of writing) when one is passed.
        if (pollingInterval != default)
            webDriverWait.PollingInterval = pollingInterval;

        // We're polling the DOM, so a missing element is normal here and not an exception.
        webDriverWait.IgnoreExceptionTypes(typeof(NoSuchElementException));
        return webDriverWait
            .Until(drv => drv.FindElement(@by));
    }
}
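Applied to the question's code, a minimal sketch could look like the following. It assumes Selenium 4 for .NET and that every product row carries the ifm-result-item class discussed in this answer; instead of the extension method it waits with WebDriverWait directly, using FindElements (which returns an empty collection rather than throwing) as the wait condition. IfmScraper, BuildOptions and LoadAndCountRows are hypothetical names chosen for this example, and BinaryLocation from the question is left out (add it back if Chrome is not found automatically).

```csharp
using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using OpenQA.Selenium.Support.UI;

public static class IfmScraper
{
    // CSS selector for one product row (class taken from the page's HTML; may change with site updates).
    public const string ResultItemSelector = ".ifm-result-item";

    // Same Chrome switches the question uses.
    public static ChromeOptions BuildOptions()
    {
        var options = new ChromeOptions();
        options.AddArguments("headless", "disable-gpu", "no-sandbox");
        return options;
    }

    // Navigates to url, blocks until at least one product row exists in the DOM,
    // then returns how many rows are currently rendered.
    // Throws WebDriverTimeoutException if no row appears within timeout.
    public static int LoadAndCountRows(IWebDriver driver, string url, TimeSpan timeout)
    {
        driver.Navigate().GoToUrl(url);
        var wait = new WebDriverWait(driver, timeout);
        // A bool condition: Until keeps polling while it returns false.
        wait.Until(d => d.FindElements(By.CssSelector(ResultItemSelector)).Count > 0);
        return driver.FindElements(By.CssSelector(ResultItemSelector)).Count;
    }
}
```

Call LoadAndCountRows(browser, url, TimeSpan.FromSeconds(20)) inside the question's using block before reading browser.PageSource; the page source read afterwards contains the rendered rows. The count only covers the rows rendered on the current page, so if the results are paginated (the question's next_button suggests they are), each page has to be loaded and counted separately.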
Then I'd use ifm-result-item as a CSS class selector (.ifm-result-item), which should give you a list of all product elements with their values. Each element looks like this:
<div class="ifm-result-item">
<div class="ifm-result-item__product-info">
<button class="ifm-result-item__toggle hide-md- normalize" aria-expanded="false" data-test="ifm-result-item-toggle">
<svg viewBox="0 0 24 24" class="ifm-result-item__toggle-icon inline-icon" aria-hidden="true">
<use href="#chevron-d" class="icon-svg--fat"></use>
</svg>
</button>
<div class="ifm-result-item__product-info-inner">
<a href="/de/en/product/IEW200" class="ifm-result-item__product-link-wrapper" data-test="ifm-result-item-link">
<span class="ifm-result-item__image">
<div class="ifm-product-thumbnail"><img srcset="https://media.ifm.com/CIP/mediadelivery/rendition/8a35b3315fc8554e841a2d35b48bfdae/-B140-FJPG/IEW200 2x" src="https://media.ifm.com/CIP/mediadelivery/rendition/8a35b3315fc8554e841a2d35b48bfdae/-B70-FJPG/IEW200" class="ifm-product-thumbnail__img" loading="lazy" style=""></div>
</span>
<div>
<span class="ifm-result-item__product-link">IEW200</span>
<div class="ifm-result-item__product-description hide-lg+">Inductive sensor</div>
</div>
</a>
<div class="ifm-labeled-value-section ifm-result-item__product-info-details">
<div class="ifm-labeled-value-section__entry">
<div class="ifm-labeled-value-section__label hyphens">Dimensions</div>
<!---->
<div class="ifm-labeled-value-section__value hyphens">M8 x 1 / L = 40 mm</div>
</div>
<div class="ifm-labeled-value-section__entry">
<div class="ifm-labeled-value-section__label hyphens">Sensing range</div>
<!---->
<div class="ifm-labeled-value-section__value hyphens">3 mm flush mountable</div>
</div>
<div class="ifm-labeled-value-section__entry">
<div class="ifm-labeled-value-section__label hyphens">Output function</div>
<!---->
<div class="ifm-labeled-value-section__value hyphens">normally open</div>
</div>
<div class="ifm-labeled-value-section__entry">
<div class="ifm-labeled-value-section__label hyphens">Output</div>
<!---->
<div class="ifm-labeled-value-section__value hyphens">DC PNP</div>
</div>
<div class="ifm-labeled-value-section__entry">
<div class="ifm-labeled-value-section__label hyphens">Connection</div>
<!---->
<div class="ifm-labeled-value-section__value hyphens">M8 Connector</div>
</div>
</div>
</div>
</div>
<div class="ifm-result-item__expandable-functions" style="display: none;">
<hr class="ifm-result-item__separator hr">
<div class="ifm-expandable-functions ifm-result-item__collapsed-details">
<div class="ifm-expandable-functions__item">
<div class="ifm-product-price">
<div class="ifm-product-price__list-price ifm-list-price"><span class="ifm-list-price__label">List price:</span><span class="ifm-list-price__value" data-test="ifm-list-price">55,40 €</span></div>
<div class="ifm-product-price__individual-price ifm-individual-price"><span class="ifm-individual-price__label">Your price:</span><button type="button" class="ifm-individual-price__show-price hover-link-2 normalize" data-test="ifm-show-price">Please log in</button></div>
</div>
</div>
<div class="ifm-add-to-cart ifm-expandable-functions__item ifm-expandable-functions__cart-items">
<label class="ifm-add-to-cart__input ifm-input-label">
<div class="ifm-quantity-input" data-test="ifm-add-to-cart-input">
<div class="ifm-quantity-input__minus"><input type="button" class="normalize" data-field="quantity" value="-"></div>
<input step="1" min="1" max="9999" type="number" maxlength="4" name="quantity" class="normalize ifm-quantity-input__input-field">
<div class="ifm-quantity-input__plus"><input type="button" class="normalize" data-field="quantity" value="+"></div>
</div>
</label>
<button class="ifm-add-to-cart__button ifm-button normalize" data-test="ifm-add-to-cart-button">Add to the shopping basket</button>
</div>
<div class="ifm-expandable-functions__shop-items">
<button class="ifm-wishlist hover-link-2 normalize ifm-expandable-functions__shop-item" data-test="ifm-wishlist-button">
<svg viewBox="0 0 24 24" aria-hidden="true" class="inline-icon">
<use href="#heart" class="icon-svg--thin"></use>
</svg>
<span class="hide-lg-">Save for later</span>
</button>
<button class="normalize ifm-compare-products hover-link-2 ifm-expandable-functions__shop-item hide-md-" data-test="ifm-compare-products-button">
<svg viewBox="0 0 1792 1792" class="inline-icon">
<use href="#compress"></use>
</svg>
<span class="hide-lg-">Compare</span>
</button>
</div>
</div>
</div>
</div>
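Since the question already loads browser.PageSource into HtmlAgilityPack, the rows can also be parsed there once they have rendered. A sketch, assuming the class names in the HTML above stay stable; ResultItemParser and ProductRow are hypothetical names. It matches class tokens with the contains(concat(...)) idiom because @class='...' only matches when the attribute holds exactly that one value, which fails for elements such as the label divs that also carry the hyphens class.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

public static class ResultItemParser
{
    // Hypothetical container for one product row: its code and its label/value pairs.
    public record ProductRow(string Code, IReadOnlyDictionary<string, string> Attributes);

    // XPath predicate that is true when the class attribute contains the given class token.
    private static string HasClass(string cls) =>
        $"contains(concat(' ', normalize-space(@class), ' '), ' {cls} ')";

    public static List<ProductRow> Parse(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var rows = new List<ProductRow>();
        // SelectNodes returns null (not an empty collection) when nothing matches.
        var items = doc.DocumentNode.SelectNodes($"//div[{HasClass("ifm-result-item")}]");
        if (items == null) return rows;

        foreach (var item in items)
        {
            string code = item.SelectSingleNode($".//span[{HasClass("ifm-result-item__product-link")}]")
                              ?.InnerText.Trim() ?? "";

            var attributes = new Dictionary<string, string>();
            var entries = item.SelectNodes($".//div[{HasClass("ifm-labeled-value-section__entry")}]");
            if (entries != null)
            {
                foreach (var entry in entries)
                {
                    var label = entry.SelectSingleNode($"./div[{HasClass("ifm-labeled-value-section__label")}]");
                    var value = entry.SelectSingleNode($"./div[{HasClass("ifm-labeled-value-section__value")}]");
                    if (label != null && value != null)
                        attributes[label.InnerText.Trim()] = value.InnerText.Trim();
                }
            }
            rows.Add(new ProductRow(code, attributes));
        }
        return rows;
    }
}
```

With this, ResultItemParser.Parse(browser.PageSource).Count gives the number of rows on the page, and each row's Attributes holds pairs such as "Dimensions" mapped to "M8 x 1 / L = 40 mm".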
For this particular site/page, you could also just use the JavaScript (REST) API directly:
https://www.ifm.com/restservices/de/en/category/200_010_010_010/productsAndAttributes
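A minimal sketch of calling that endpoint with HttpClient and System.Text.Json. It assumes the endpoint answers a plain GET with JSON and needs no extra headers or cookies (not verified here); because the response schema isn't shown, the code only lists the top-level property names so you can explore the structure before mapping it to your own types. IfmApiProbe and TopLevelKeys are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

public static class IfmApiProbe
{
    // Returns the top-level property names if the JSON root is an object, otherwise an empty list.
    public static List<string> TopLevelKeys(string json)
    {
        using JsonDocument doc = JsonDocument.Parse(json);
        return doc.RootElement.ValueKind == JsonValueKind.Object
            ? doc.RootElement.EnumerateObject().Select(p => p.Name).ToList()
            : new List<string>();
    }

    public static async Task Main()
    {
        const string endpoint =
            "https://www.ifm.com/restservices/de/en/category/200_010_010_010/productsAndAttributes";
        using var http = new HttpClient();
        // GetStringAsync throws HttpRequestException on a non-success status code.
        string json = await http.GetStringAsync(endpoint);
        Console.WriteLine(string.Join(", ", TopLevelKeys(json)));
    }
}
```

This skips the browser entirely, which is usually faster and more robust than rendering the page, but it depends on an undocumented endpoint that may change without notice.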