So you want to get the city, temperature and weather forecast, right?
That HTML cannot be parsed with XPath, but it can still be done with regex. The HTML also points to an alternative XML version.
But I don't know whether that link will change, so I will stick with your original link.
You already have the HTML. I assume you know how to use the action HTTP Request on that link and save the result in response. Now we process response to extract those 3 values.
Add a new action Script and put this in it:
match = findAll(response, '(?:<meta property="og:title" content=\\")(.*)(?:\\" \\/\>)');
content = substring(match[0],35,length(match[0])-4);
content = split(content," \\| ");
city = content[0];
temp = replace(content[1],"°"," \u00b0C");
forecast = content[2];
The first line finds the line of the page that contains the data.
The substring() call strips the tag (the first 35 characters and the last 4), leaving content with exactly the 3 values we need.
We then split it on the | symbol to get 3 elements.
Then assign each element to the corresponding variable. For the temperature, replace ° with a space, \u00b0 (the Unicode escape for the degree sign) and C, so 41° becomes 41 °C.
So the result will be stored in {city}, {temp} and {forecast}. Use them as you want.
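If you want to check the same steps outside Automagic, here is a minimal sketch in Python (not Automagic script). The function name parse_title is only made up for the example, and the sample line is the one from the page source quoted in this thread. It captures the text inside content="..." directly instead of cutting it out with substring.

```python
import re

def parse_title(html):
    """Return (city, temp, forecast) from the og:title meta tag, or None if absent."""
    # Match the tag, but capture only the text inside content="..."
    m = re.search(r'<meta property="og:title" content="(.*?)" />', html)
    if m is None:
        return None
    # The captured text looks like: Biskra, Algeria | 41° | Clear
    city, temp, forecast = m.group(1).split(" | ")
    # Turn "41°" into "41 °C" (\u00b0 is the degree sign)
    temp = temp.replace("\u00b0", " \u00b0C")
    return city, temp, forecast

sample = '<meta property="og:title" content="Biskra, Algeria | 41\u00b0 | Clear" />'
city, temp, forecast = parse_title(sample)
```

With that sample line, city, temp and forecast hold the same 3 values the Automagic script produces.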
================
@Martin : I am a little confused by Automagic's regex parsing, and I have wanted to ask about this for a while. When I use a (?:abc) pattern in a regex, it is supposed to match 'abc' but not capture anything inside the brackets.
Example :
If I use this in a regex tester
(?:<meta property="og:title" content=\\")(.*)(?:\\" \\/\>)
It shows me that it captures only the (.*), like this
<meta property="og:image" content="https://icons.wxug.com/i/c/k2/clear.png" />
<meta property="og:title" content="Biskra, Algeria | 41° | Clear" />
<meta name="apple-itunes-app" content="app-id=486154808, affiliate-data=at=1010lrYB&ct=website_wu" />
The tester underlines Biskra, Algeria | 41° | Clear, so I know I am using the correct regex. Testing at RegExr also shows the capture $1 properly: <meta .... was matched but not captured.
But when I use the function findAll(), it returns the whole line. The result for the match above is a list with a single element containing the whole line.
match[0] = <meta property="og:title" content="Biskra, Algeria | 41° | Clear" />
Even if I use capturing groups like this (without the non-capturing ?:)
(<meta property="og:title" content=\\")(.*)(\\" \\/\>)
I expect it to give 3 capturing groups.
match[0] = <meta property="og:title" content="
match[1] = Biskra, Algeria | 41° | Clear
match[2] = " />
But the result turns out to be the same: only a single match[0] containing the whole matching line. There is no match[1] or match[2].
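To show what I mean by whole match versus capture groups, here is a small sketch in Python's re module (not Automagic, so it says nothing about how findAll() is implemented there). The variable names are just for the example. Group 0 is always the entire matched text, while the numbered groups hold only what the capturing brackets caught.

```python
import re

html = '<meta property="og:title" content="Biskra, Algeria | 41\u00b0 | Clear" />'

# Pattern with non-capturing groups around the tag parts
pattern = r'(?:<meta property="og:title" content=")(.*)(?:" />)'
m = re.search(pattern, html)
whole = m.group(0)   # entire matched text, tag included
inner = m.group(1)   # only the (.*) capture

# Same pattern with three capturing groups
three = re.search(r'(<meta property="og:title" content=")(.*)(" />)', html)
parts = three.groups()  # tuple with the 3 captured pieces
```

Here whole is the complete meta line, which is the same thing I get in match[0], and inner and parts are the captures I expected to be able to read.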
Is it supposed to work like this in Automagic, or is it a kind of bug? It seems similar to the XPath implementation, where evaluation stops at the first match.
Sorry if I am hijacking the thread a bit, but since the regex is used here, I think it is better to ask here directly.