Getting an invalid type of elements in Selenium

Question

I am working on Ubuntu 20.04 with the latest IntelliJ IDEA Community Edition, Firefox, and geckodriver.

I am trying to get some timetables from a web page and copy them to a .txt file (or a list, it doesn't matter).

This is what I am trying:

    WebDriver driver = new FirefoxDriver();
    driver.manage().timeouts().implicitlyWait(2000, TimeUnit.MILLISECONDS); // maximum wait time
    driver.get("http://telematics.oasa.gr/#main");
    driver.findElement(By.xpath("//option[contains(.,'021')]")).click(); // selecting the trip

    List<WebElement> oas = driver.findElements(By.xpath("//div/ul/li"));
    System.out.println(oas.size());
    System.out.println(oas);

Page link: http://telematics.oasa.gr/#lineDetails_1151_021%20:%20%CE%A0%CE%9B%CE%91%CE%A4%CE%95%CE%99%CE%91%20%CE%9A%CE%91%CE%9D%CE%99%CE%93%CE%93%CE%9F%CE%A3%20-%20%CE%93%CE%9A%CE%A5%CE%96H%20(%CE%9A%CE%A5%CE%9A%CE%9B%CE%99%CE%9A%CE%97)_9-86

Here is a fragment of the page's HTML:

    <li class="list-group-item scheduleEntryL"><button type="button" class="btn btn-info btn-circle" style="cursor:default;">07</button>&nbsp;&nbsp;&nbsp;07:10 &nbsp;&nbsp;&nbsp; 07:25 &nbsp;&nbsp;&nbsp; 07:40 &nbsp;&nbsp;&nbsp; 07:55 &nbsp;&nbsp;&nbsp; </li>

The output is (the same list entry is repeated 19 times):

    19

    [[[FirefoxDriver: firefox on LINUX (115ffdb6-1eb7-44c4-bebd-dee885674bab)] -> xpath: //div/ul/li], ...]

This means my list has 19 elements, but that is not what I want.

Summary:

1. The type of the elements I get is not right. The list should contain:

    [..., 07:00, 07:10, 07:25, ...]

2. It should contain 59 elements, because the page lists 59 departures, but some of them are on the same line. The page has 19 lines, so it is probably returning each line as one element, which is also not what I want.

Please help. I have checked similar posts on this site and they did not help.

Answer 1

Score: 1

You're using an XPath that returns the rows of the page, not the actual elements that contain the required values. Also, the text values sit between two nodes, so we need to use JavaScript to read them.

    // Requires java.util.List, java.util.LinkedList, and By, JavascriptExecutor, WebElement from org.openqa.selenium.
    // Extract the departure times without *
    List<WebElement> oas = driver.findElements(By.xpath("//li[@class='list-group-item scheduleEntryL']"));
    LinkedList<String> timeValues = new LinkedList<String>();
    String text = null;
    for (WebElement element : oas) {
        // The text sits between two nodes and Selenium's getText() did not work for it, so we use JavaScript.
        // The <li> is the parent element and the text node is its 2nd child, hence childNodes[1].
        text = ((JavascriptExecutor) driver).executeScript("return arguments[0].childNodes[1].textContent;", element).toString();
        // Skip text nodes that hold only three non-breaking spaces (U+00A0)
        if (!text.equalsIgnoreCase("\u00A0\u00A0\u00A0")) {
            // Remove the non-breaking-space padding between the values
            text = text.replaceAll(" \u00A0\u00A0\u00A0 ", " ").replaceAll("\u00A0\u00A0\u00A0", "");
            text = text.trim();
            // Split the string into individual values
            String[] split = text.split("\\s+");
            for (String str : split) {
                timeValues.add(str);
            }
        }
    }

    // Extract the departure times marked with *, e.g. 05:00*
    oas = driver.findElements(By.xpath("//span[@class='xtra']"));
    for (WebElement element : oas) {
        // Here the text node is the 1st child, hence childNodes[0]
        text = ((JavascriptExecutor) driver).executeScript("return arguments[0].childNodes[0].textContent;", element).toString();
        timeValues.add(text);
    }

    // Extract the departure times on the same line as a time with *, e.g. 05:10     05:30     05:50
    oas = driver.findElements(By.xpath("//li[@class='list-group-item scheduleEntryL']/span"));
    if (oas.size() > 0) {
        oas = driver.findElements(By.xpath("//li[@class='list-group-item scheduleEntryL']/span/.."));
        for (WebElement element : oas) {
            // Here the text node is the 4th child, hence childNodes[3]
            text = ((JavascriptExecutor) driver).executeScript("return arguments[0].childNodes[3].textContent;", element).toString();
            // Skip text nodes that hold only padding
            if (!text.equalsIgnoreCase(" \u00A0\u00A0\u00A0 ")) {
                // Remove the non-breaking-space padding between the values
                text = text.replaceAll(" \u00A0\u00A0\u00A0 ", " ").replaceAll("\u00A0\u00A0\u00A0", "");
                text = text.trim();
                // Split the string into individual values
                String[] split = text.split("\\s+");
                for (String str : split) {
                    timeValues.add(str);
                }
            }
        }
    }
    System.out.println(timeValues);

I ran this code on my end and got the following output:

[06:10, 06:25, 06:40, 06:55, 07:05, 07:15, 07:25, 07:35, 07:45, 07:55, 08:05, 08:20, 08:30, 08:40, 08:55, 09:05, 09:15, 09:30, 09:50, 10:10, 10:25, 10:45, 11:00, 11:20, 11:35, 11:55, 12:10, 12:30, 12:45, 13:20, 13:40, 13:55, 14:10, 14:25, 14:35, 14:45, 15:00, 15:10, 15:20, 15:35, 15:45, 15:55, 16:10, 16:20, 16:30, 16:45, 16:55, 17:05, 17:20, 17:35, 17:55, 18:15, 18:30, 18:45, 19:00, 19:20, 19:35, 19:55, 20:10, 20:25, 20:40, 21:00, 21:20, 21:40, 22:05, 22:30, 22:55, 05:00*, 23:20*, 05:10, 05:30, 05:50]

Please note that the values from the rows that contain a time marked with * are added at the end of the list, so that is where you will find them.
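If the starred rows at the end are a problem, the list can be put back into chronological order afterwards. The following is only a sketch, not part of the code above: the class name `DepartureSorter` and its methods are made up for illustration, and it assumes every value has the form `HH:mm` or `HH:mm*`.

```java
import java.time.LocalTime;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of Selenium or of the answer's code.
class DepartureSorter {

    // Parse "HH:mm" or "HH:mm*" into a LocalTime, ignoring the trailing marker.
    static LocalTime toTime(String value) {
        return LocalTime.parse(value.replace("*", ""));
    }

    // Return a new list ordered by time of day; the input list is left untouched.
    static List<String> sortChronologically(List<String> values) {
        List<String> sorted = new ArrayList<>(values);
        sorted.sort(Comparator.comparing(DepartureSorter::toTime));
        return sorted;
    }

    public static void main(String[] args) {
        // A shortened sample in the same order the scraping code produces.
        List<String> timeValues = Arrays.asList(
                "06:10", "06:25", "22:55", "05:00*", "23:20*", "05:10", "05:30", "05:50");
        System.out.println(sortChronologically(timeValues));
    }
}
```

The `*` marker is kept in the output strings; it is only stripped for the comparison.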

To extract the text between nodes, I referred to this SO post.
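Once the text of a row is available as a plain string, a regular expression is another way to pull out the individual times, instead of replacing the non-breaking spaces by hand. This is a minimal sketch under assumptions: the class `TimeExtractor` is hypothetical, and the sample row text is modelled on the HTML fragment in the question rather than read from the live page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Selenium or of the answer's code.
class TimeExtractor {

    // Two digits, a colon, two digits, optionally followed by '*'.
    private static final Pattern TIME = Pattern.compile("\\d{2}:\\d{2}\\*?");

    // Collect every HH:mm (or HH:mm*) token found in the row text, in order.
    static List<String> extractTimes(String rowText) {
        List<String> times = new ArrayList<>();
        Matcher matcher = TIME.matcher(rowText);
        while (matcher.find()) {
            times.add(matcher.group());
        }
        return times;
    }

    public static void main(String[] args) {
        // Assumed row text: the hour label from the button, then the times
        // separated by non-breaking spaces, as in the question's HTML fragment.
        String row = "07\u00A0\u00A0\u00A007:10 \u00A0\u00A0\u00A0 07:25 "
                + "\u00A0\u00A0\u00A0 07:40 \u00A0\u00A0\u00A0 07:55 \u00A0\u00A0\u00A0 ";
        System.out.println(extractTimes(row)); // prints [07:10, 07:25, 07:40, 07:55]
    }
}
```

Because the pattern requires a colon, the bare hour label (`07`) from the button is skipped, so the same method would work on the text of the whole `<li>` as well as on a single text node.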

Also, if you want to select a drop-down value, use Selenium's Select class:

    Select select = new Select(driver.findElement(By.id("lineSelect")));
    // select the drop-down value: 021 : ΠΛΑΤΕΙΑ ΚΑΝΙΓΓΟΣ - ΓΚΥΖH (ΚΥΚΛΙΚΗ)
    select.selectByValue("1151_54_9");

huangapple
  • Posted on September 9, 2020, at 21:34:09
  • When reposting, please keep the link to this article: https://go.coder-hub.com/63812840.html