
Unicode decimal code &#10; removal

Posted: 16 Jul 2020 17:21
by Micky Micky
Hello




I have a string that contains &#10; and I can't remove it. I've tried replace and replaceAll.

The string is inserted into an email and &#10; shows up as &#10;

Any help would be appreciated.

Many thanks

Micky

Re: Unicode decimal code &#10; removal

Posted: 18 Jul 2020 16:29
by Desmanto
The replace should work, try this.

Code: Select all

text = "hello &#10";
rep = replace(text, "&#10", ""); //rep = "hello "
Your problem is not with the Automagic script. &#10; is usually the XML or HTML encoding of a line feed ("\n" in Automagic). Try removing any newlines from the text and see if it still appears.
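
For anyone following along outside Automagic, here is a minimal Python sketch of the same idea. It is not Automagic script, and the sample text is made up. It shows that &#10; decodes to a line feed, and that the literal reference can be removed with a plain replace before any decoding happens.

```python
import html

# Hypothetical sample text; the real tweet content is not shown in this thread.
raw = "hello &#10;world"

# "&#10;" is the decimal character reference for code point 10 (line feed).
decoded = html.unescape(raw)

# Plain (non-regex) replacement of the literal reference text.
stripped = raw.replace("&#10;", "")

print(repr(decoded))
print(repr(stripped))
```

If the text has already been decoded somewhere along the way, the thing to remove is the newline itself rather than the reference text.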

It is quite hard to test without knowing where you get the problem. If you have steps to replicate it, I can test it here.

Re: Unicode decimal code &#10; removal

Posted: 18 Jul 2020 17:44
by Micky Micky
http://automagic4android.com/flow.php?i ... 8150b90adb

This scrapes a Twitter feed.
The variable tweet should contain it. It talks about visiting an office in Baker Street

Thanks

Micky

Re: Unicode decimal code &#10; removal

Posted: 19 Jul 2020 15:25
by Desmanto
You can use regex to parse it in one line. I just copied the portion that contains tweet-container into a regex tester, then replaced the tweet id with \d+ and the text with (?s:(.*?)), which captures the message including line feeds. The next line then replaces &#10; with blank and trims any white space.

Put this in your third element (the script that has start and end), then connect it directly to a debug dialog.

Code: Select all

find = findAll(response, '<tr class="tweet-container">\n  <td colspan="2" class="tweet-content">\n\n\n    <div class="tweet-text" data-id="\\d+">\n      <div class="dir-ltr" dir="ltr">(?s:(.*?))</div>\n    </div>\n\n\n  </td>\n</tr>', true);
tweet = trim(replaceAll(find[0][1], "&#10;", ""));
The regex is highly specific to this pattern. If they change the pattern, the script may fail, so it will need adapting again if Twitter changes their HTML.
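
To show what the lazy (?s:(.*?)) capture does, here is a rough Python sketch. It is not Automagic script, and the markup is a shortened, made-up stand-in for the real page. The scoped (?s:...) flag lets the dot match line feeds only inside the capture, and the lazy .*? stops at the first closing </div>.

```python
import re

# Hypothetical, shortened stand-in for the page source (assumption, not the real HTML).
response = (
    '<div class="tweet-text" data-id="123">\n'
    '      <div class="dir-ltr" dir="ltr">  First line&#10;\nsecond line\n</div>\n'
    '<div class="tweet-text" data-id="456">\n'
    '      <div class="dir-ltr" dir="ltr"> Another tweet</div>\n'
)

# \d+ stands in for the tweet id; (?s:(.*?)) lazily captures text across line feeds.
pattern = re.compile(
    r'data-id="\d+">\n\s*<div class="dir-ltr" dir="ltr">(?s:(.*?))</div>'
)

# Remove the literal &#10; reference, then trim surrounding white space.
tweets = [m.group(1).replace("&#10;", "").strip() for m in pattern.finditer(response)]

for t in tweets:
    print(repr(t))
```

The replace and trim steps mirror the second line of the script above. Only the first step (matching) depends on the exact layout of the page.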

Re: Unicode decimal code &#10; removal

Posted: 19 Jul 2020 16:06
by Micky Micky
Hello

find is an empty list. Am I right to say it doesn't require my use of indexOf etc?

I used the 'trim(replaceAll' part in my crude version and it removed the &#10;

So thank you for that.

Micky

Re: Unicode decimal code &#10; removal

Posted: 20 Jul 2020 06:33
by Desmanto
If it is blank, that means no match was found. You need to check where it stops matching. I just tried it here using your flow above and it still finds a match. A different URL might produce a different result.

You don't need indexOf(), substring() or any split(). Regex does it for us, as long as it matches properly.

Re: Unicode decimal code &#10; removal

Posted: 20 Jul 2020 19:41
by Micky Micky
Hello

I think it's time for me to learn regex. I have another flow that uses my crude method repeatedly. Using findAll is much better.

Funny that your replaceAll worked. I did try it before asking for help.

Thanks for all your help.

Micky

Re: Unicode decimal code &#10; removal

Posted: 20 Jul 2020 20:30
by Micky Micky

Code: Select all

find = findAll(response, 'data-id="\\d+">\n      <div class="dir-ltr" dir="ltr">(?s:(.*?))<', true);
This got all the tweets.

It seems that 'data-id' is unique to the tweets.

Still haven't learned regex though.

Micky

Re: Unicode decimal code &#10; removal

Posted: 22 Jul 2020 15:56
by Desmanto
Sometimes it is because of white space. You can check whether the whitespace is there by pasting it into the regex tester. Non-standard characters are usually revealed in the script.

Your regex is too wide; it will match too much. Maybe you missed copying some of my script, or maybe the result is different when you make the HTTP request from your device (or region). You need to copy the chunk of the Twitter page you want into the regex tester and replace the text as I did.
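
Whether a shorter pattern is safe depends on the page. As one illustration of how the choice of anchors changes what gets captured, here is a Python sketch (not Automagic script) with made-up markup in which one tweet contains a nested link tag. A pattern that ends at the first "<" cuts that tweet short, while one that ends at the closing </div> keeps it whole.

```python
import re

# Hypothetical markup (assumption): the second tweet contains a nested link tag.
response = (
    '<div class="tweet-text" data-id="1">\n'
    '      <div class="dir-ltr" dir="ltr">plain text</div>\n'
    '<div class="tweet-text" data-id="2">\n'
    '      <div class="dir-ltr" dir="ltr">see <a href="#">this</a> now</div>\n'
)

prefix = r'data-id="\d+">\n\s*<div class="dir-ltr" dir="ltr">'

# Ends at the first "<" of any kind, so nested tags truncate the capture.
short_matches = re.findall(prefix + r'(?s:(.*?))<', response)

# Ends only at the closing </div>, so nested tags stay inside the capture.
anchored_matches = re.findall(prefix + r'(?s:(.*?))</div>', response)

print(short_matches)
print(anchored_matches)
```

If the tweets you scrape never contain nested tags, both patterns give the same result, which may be why the shorter one works for some feeds.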

Re: Unicode decimal code &#10; removal

Posted: 22 Jul 2020 21:05
by Micky Micky
Hello,

It's working fine. It's definitely not too wide a scope.

findAll without regex has replaced loops in other flows. It's made a big difference.

Thanks for enlightening me!

Micky