HTML parsing

Posted: 10 Jan 2017 15:49
by fractals
I'm trying to find a way to read a webpage and display selected information from it using string variables. My attempts to parse the result of an HTTP GET request using evaluateXPathAsString failed. Later, I came across a post on this forum where Martin said that this function only works with proper XML and should not be used with HTML.
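
To illustrate Martin's point outside Automagic, here is a minimal Python sketch using only the standard library. The sample markup is hypothetical. It contains an unclosed `<br>`, which is legal HTML but not well-formed XML. An XML parser (`xml.etree.ElementTree`) rejects it, while the lenient `html.parser` module accepts it and can still pick out elements by class without substring hacks. The helper names `parses_as_xml`, `ClassTextCollector` and `texts_with_class` are made up for this example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical page: <br> is valid HTML but breaks XML well-formedness.
SAMPLE_HTML = '<html><body><p class="status">Online<br>since 9:00</p></body></html>'

# HTML void elements never get an end tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


def parses_as_xml(markup):
    """Return True if the markup is well-formed XML, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class ClassTextCollector(HTMLParser):
    """Collect the text of every element whose class list contains css_class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.depth = 0        # > 0 while inside a matching element
        self.current = []     # text pieces of the element being collected
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1   # nested element inside a match
        elif self.css_class in (dict(attrs).get("class") or "").split():
            self.depth = 1
            self.current = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            # Join pieces and collapse whitespace runs into single spaces.
            self.results.append(" ".join(" ".join(self.current).split()))

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)


def texts_with_class(markup, css_class):
    """Parse HTML leniently and return the text of each element with css_class."""
    parser = ClassTextCollector(css_class)
    parser.feed(markup)
    parser.close()
    return parser.results
```

Here `parses_as_xml(SAMPLE_HTML)` is `False`, yet `texts_with_class(SAMPLE_HTML, "status")` still returns `['Online since 9:00']`. Matching on parsed tags and attributes instead of raw string offsets keeps working when the surrounding markup shifts.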

I know that some people on this forum have successfully retrieved information from websites, but I couldn't find any specific information or suggestions on how to do it (apologies if I missed it). I'd rather avoid substring operations, because the HTML source I need to parse is dynamic and that approach could yield inconsistent results.

Does anyone have suggestions on how to do this? Thanks in advance.

Re: HTML parsing

Posted: 11 Jan 2017 12:08
by mbirth
I'm doing this in a flow with the evaluateXPathAsString() function you mentioned. However, the page I'm parsing is XHTML, so it is valid XML as well as HTML. The queries I'm using are:


DP_percent=evaluateXPathAsString(response, '//*[@class="progressBar"]/div/@style');
...
DP_text=trim(evaluateXPathAsString(response, 'string(//*[contains(@class, "barTextBelow")])'));
...
DP_info=evaluateXPathAsString(response, '(//body//p)[1]/text()[2]');
DP_expiry=evaluateXPathAsString(response, '//*[contains(@class, "expiryTime")][2]/text()');
And they all work as expected.
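
For comparison, here is a rough Python sketch of the same kind of queries against a well-formed XHTML-like snippet, using the standard `xml.etree.ElementTree` module. The page content is invented for illustration and, as a simplifying assumption, declares no XHTML namespace. ElementTree only supports a subset of XPath (no `contains()` and no trailing `@attr` step), so those parts are done in plain Python. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML-like page; no xmlns declared, to keep tag names simple.
PAGE = """<html><body>
<p>Plan</p>
<div class="progressBar"><div style="width: 42%"></div></div>
<span class="barTextBelow"> 4.2 GB of 10 GB </span>
<span class="expiryTime start">01.01.2017</span>
<span class="expiryTime end">31.01.2017</span>
</body></html>"""


def elements_with_class_containing(root, fragment):
    """Rough equivalent of //*[contains(@class, fragment)]: substring match."""
    return [e for e in root.iter() if fragment in e.get("class", "")]


def text_content(elem):
    """Rough equivalent of XPath string(elem): all descendant text joined."""
    return "".join(elem.itertext())


# Succeeds only because the snippet is well-formed XML.
root = ET.fromstring(PAGE)

# //*[@class="progressBar"]/div/@style  (attribute read via .get)
percent = root.find(".//*[@class='progressBar']/div").get("style")

# trim(string(//*[contains(@class, "barTextBelow")]))
text = text_content(elements_with_class_containing(root, "barTextBelow")[0]).strip()

# Second element whose class contains "expiryTime" (both matches are siblings here)
expiry = elements_with_class_containing(root, "expiryTime")[1].text
```

This yields `width: 42%`, `4.2 GB of 10 GB` and `31.01.2017`. A real XHTML page usually declares `xmlns="http://www.w3.org/1999/xhtml"`, in which case ElementTree reports tags as `{http://www.w3.org/1999/xhtml}div`, and name-based steps such as `/div` must be namespace-qualified. The `*` wildcard steps are unaffected.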