Character encoding in HTTP-Request
Posted: 27 Nov 2019 22:37
by Horschte
Hi guys,
I'm screen-scraping a web page using the HTTP-Request action. The problem is that the page is encoded in iso-8859-1. The response I get from the action shows some special characters as question marks. I suspect that Automagic uses UTF-8 by default, so the special characters get replaced.
How can I set the HTTP-Request to use iso-8859-1? I already tried the custom header option using "Accept-Charset:iso-8859-1" but it's not working.
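To illustrate the suspected effect outside of Automagic, here is a minimal Python sketch. The sample string is made up for the illustration, and this is not how Automagic itself decodes responses; it only shows that bytes written as iso-8859-1 turn into replacement characters (often rendered as question marks) when they are read as UTF-8.

```python
# Assumed sample text containing characters outside ASCII.
text = "Grüße aus München"

# What a server sending iso-8859-1 would put on the wire.
raw = text.encode("iso-8859-1")

# Decoding with the wrong charset: invalid byte sequences become U+FFFD.
wrong = raw.decode("utf-8", errors="replace")

# Decoding with the charset the page was actually written in.
right = raw.decode("iso-8859-1")

print(wrong)
print(right)
```

The ASCII letters survive in both cases; only the characters that iso-8859-1 encodes as single bytes above 0x7F are damaged.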
Any help?
Thank you very much.
Re: Character encoding in HTTP-Request
Posted: 28 Nov 2019 18:02
by Desmanto
I have never tried this. Do you have another website that uses iso-8859-1 that I can test with?
Also, maybe try the custom header Accept-Charset: utf-8, iso-8859-1;q=0.5
Re: Character encoding in HTTP-Request
Posted: 07 Dec 2019 16:19
by Horschte
The solution goes like this:
Instead of saving the response from the HTTP-Request to a variable, save it to a file. Then load that file using the action Init Variable Text File and set the Encoding to "iso-8859-1".
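The same idea can be sketched in Python, purely as an illustration and not as Automagic code: keep the response as raw bytes, write them to a file untouched, and choose the charset only when reading the file back. The sample bytes and the file name are assumptions made up for the example.

```python
import tempfile
from pathlib import Path

# Stand-in for the raw body an HTTP request would return (assumed sample data).
response_bytes = "Grüße aus München".encode("iso-8859-1")

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "response.html"  # hypothetical file name

    # Step 1: save the bytes as-is, so no charset guess is applied yet.
    path.write_bytes(response_bytes)

    # Step 2: load the file and state the encoding explicitly.
    text = path.read_text(encoding="iso-8859-1")

print(text)
```

This works because the file holds the original bytes, so the decoding decision is postponed until the correct charset is supplied.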
Credits for this solution go to Desmanto.
Re: Character encoding in HTTP-Request
Posted: 07 Dec 2019 17:11
by Desmanto
Nice to see the solution works.
@Martin: I wonder whether this might be a bug. What Horschte encountered is that when we make an HTTP request to a page that uses the iso-8859-1 charset, store the result in {response}, and view it in the debug dialog, some characters don't show up properly. It seems the debug dialog is forced to show it in UTF-8.
Saving it to a file keeps the charset correct. If we then init the file back using iso-8859-1, the debug dialog shows it properly.
Re: Character encoding in HTTP-Request
Posted: 09 Dec 2019 15:34
by Martin
This might indeed be a bug. The action should respect the encoding, but it likely does not do so correctly in all circumstances.
Is the URL publicly accessible so I can test it myself?
What device model and Android version are you using?
Thanks & Regards,
Martin
Re: Character encoding in HTTP-Request
Posted: 13 Dec 2019 21:38
by Martin
The server does not indicate the encoding, so Automagic falls back to UTF-8, which is not correct. However, I fear that not all files without a declared encoding are actually ISO-8859-1, so I will provide a new configuration option to optionally specify the encoding.
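As a rough illustration of that selection logic (a Python sketch, not Automagic's actual implementation): use an explicitly configured encoding if there is one, otherwise the charset declared in the Content-Type header, otherwise UTF-8. The function name pick_charset, its parameters, and the rule that the configured value takes precedence over the header are all assumptions made for this example.

```python
from email.message import Message


def pick_charset(content_type, override=None, default="utf-8"):
    """Choose a charset for decoding a response body (hypothetical helper)."""
    # Assumption: an explicitly configured encoding wins over everything else.
    if override:
        return override

    # Otherwise look for a charset parameter in the Content-Type header.
    if content_type:
        msg = Message()
        msg["Content-Type"] = content_type
        declared = msg.get_content_charset()
        if declared:
            return declared

    # The server declared nothing: fall back to the default.
    return default


print(pick_charset("text/html; charset=ISO-8859-1"))
print(pick_charset("text/html"))
print(pick_charset("text/html", override="iso-8859-1"))
```

With a header such as "text/html" that carries no charset parameter, only the override can produce iso-8859-1, which matches the case described in this thread.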
Regards,
Martin