Unicode decimal code &#10; removal

Micky Micky
Posts: 179
Joined: 16 Oct 2019 17:38

Unicode decimal code &#10; removal

Post by Micky Micky » 16 Jul 2020 17:21

Hello

I have a string that contains &#10;. I can't remove it. I've tried replace and replaceAll.

The string is inserted into an email, and &#10; shows up in the email.

Any help would be appreciated.

Many thanks

Micky
Crude but it works.

Desmanto
Posts: 2709
Joined: 21 Jul 2017 17:50

Re: Unicode decimal code &#10; removal

Post by Desmanto » 18 Jul 2020 16:29

The replace should work, try this.

Code:

text = "hello &#10;";
rep = replace(text, "&#10;", ""); //rep = "hello "

Your problem is not with Automagic script. &#10; is usually the XML or HTML encoding of a line feed ("\n" in Automagic). Try removing any newlines from the text and see if it still appears.

It is hard to test without knowing where the problem comes from. If you have the steps to reproduce it, I can test it out here.
Index of Automagic useful thread List of my other useful posts (and others')
Xiaomi Redmi Note 5 (whyred), AOSP Extended v6.7 build 20200310 Official, Android Pie 9.0, Rooted.


Re: Unicode decimal code &#10; removal

Post by Micky Micky » 18 Jul 2020 17:44

http://automagic4android.com/flow.php?i ... 8150b90adb

This flow scrapes a Twitter feed.
The variable tweet should contain it: it's in the tweet about visiting an office in Baker Street.

Thanks

Micky

Re: Unicode decimal code &#10; removal

Post by Desmanto » 19 Jul 2020 15:25

You can use regex to parse it in one line. I copied the portion containing tweet-container into a regex tester, then replaced the tweet id with \d+ and the text with (?s:(.*?)), which captures the message including line feeds. The next line then replaces the &#10; with blank and trims any surrounding whitespace.

Put this in your third element (the script that has start and end), then connect it directly to the debug dialog.

Code:

find = findAll(response, '<tr class="tweet-container">\n  <td colspan="2" class="tweet-content">\n\n\n    <div class="tweet-text" data-id="\\d+">\n      <div class="dir-ltr" dir="ltr">(?s:(.*?))</div>\n    </div>\n\n\n  </td>\n</tr>', true);
// find[0][1]: capture group 1 (the tweet text) of the first match
tweet = trim(replaceAll(find[0][1], "&#10;", ""));

The regex is highly specific to this HTML layout. If Twitter changes its HTML code, the script may fail and the regex will need to be adapted again.
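For comparison outside Automagic, a rough Python equivalent of the two lines above might look like this. The HTML is a shortened, made-up stand-in for the real tweet-container markup, so the exact spacing is an assumption. Python's re.findall returns just the capture group when the pattern has one, so there is no [0][1] indexing, and a raw string lets \d+ be written without doubling the backslash. Scoped flags such as (?s:...) need Python 3.6 or later.

```python
import re

# Made-up stand-in for one tweet-container block of the scraped page.
response = (
    '<tr class="tweet-container">\n'
    '  <td colspan="2" class="tweet-content">\n'
    '    <div class="tweet-text" data-id="1284150">\n'
    '      <div class="dir-ltr" dir="ltr">\n'
    '        Visiting the office in Baker Street&#10;\n'
    '      </div>\n'
    '    </div>\n'
    '  </td>\n'
    '</tr>\n'
)

# \d+ stands in for the tweet id; (?s:(.*?)) enables DOTALL only inside
# the group, so the lazy .*? can cross line breaks but stops at the first </div>.
pattern = (
    r'<div class="tweet-text" data-id="\d+">\n'
    r'      <div class="dir-ltr" dir="ltr">(?s:(.*?))</div>'
)

matches = re.findall(pattern, response)
# Drop the entity, then trim, like trim(replaceAll(..., "&#10;", "")).
tweet = matches[0].replace("&#10;", "").strip()
print(tweet)  # Visiting the office in Baker Street
```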

Re: Unicode decimal code &#10; removal

Post by Micky Micky » 19 Jul 2020 16:06

Hello

find is an empty list. Am I right that this approach doesn't need my indexOf() etc.?

I used the trim(replaceAll(...)) part in my crude version and it removed the &#10;.

So thank you for that.

Micky

Re: Unicode decimal code &#10; removal

Post by Desmanto » 20 Jul 2020 06:33

If it is blank, no match was found. You need to check where the pattern stops matching. I just tried your flow above here and it still finds a match. A different URL might produce a different result.
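A practical way to find where a long pattern stops matching is to test it piece by piece and report the first piece that breaks the match. A Python sketch, assuming you split the pattern into pieces that are each complete regexes (balanced parentheses); the helper first_failing_piece and the sample HTML are hypothetical.

```python
import re

def first_failing_piece(pieces, text):
    """Return the index of the first piece whose addition breaks the match,
    or None if the full pattern matches."""
    prefix = ""
    for i, piece in enumerate(pieces):
        prefix += piece
        if re.search(prefix, text) is None:
            return i
    return None

# Hypothetical page where the indentation differs from what the pattern expects.
text = '<div class="tweet-text" data-id="42">\n    <div class="dir-ltr" dir="ltr">hi</div>'
pieces = [
    r'<div class="tweet-text" data-id="\d+">',
    r'\n      ',  # expects 6 spaces after the newline; the text has only 4
    r'<div class="dir-ltr" dir="ltr">',
    r'(?s:(.*?))</div>',
]

print(first_failing_piece(pieces, text))  # 1
```

Here the report points at the whitespace piece, which is the kind of mismatch a different device or region could introduce.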

You don't need indexOf(), substring() or split(). The regex does that work for us, as long as it matches properly.
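To illustrate why the regex makes indexOf()/substring() unnecessary, here is a hedged Python comparison in which str.find stands in for indexOf() and slicing for substring(). The markers and the two-tweet fragment are made up; the point is that one findall call replaces the whole search-and-cut loop.

```python
import re

START = '<div class="dir-ltr" dir="ltr">'
END = '</div>'

def extract_with_find(text):
    """Loop with find() and slicing, like indexOf()/substring() in a flow."""
    results = []
    pos = 0
    while True:
        start = text.find(START, pos)
        if start == -1:
            break
        start += len(START)
        end = text.find(END, start)
        if end == -1:
            break
        results.append(text[start:end])
        pos = end + len(END)
    return results

def extract_with_regex(text):
    """One findall call; the capture group marks the part to keep."""
    return re.findall(re.escape(START) + r"(.*?)" + re.escape(END), text)

# Made-up page fragment with two tweets.
page = START + "First tweet" + END + "\n" + START + "Second tweet" + END

print(extract_with_find(page))   # ['First tweet', 'Second tweet']
print(extract_with_regex(page))  # ['First tweet', 'Second tweet']
```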

Re: Unicode decimal code &#10; removal

Post by Micky Micky » 20 Jul 2020 19:41

Hello

I think it's time for me to learn regex. I have another flow that uses my crude method repeatedly. Using findAll is much better.

Funny that your replaceAll worked. I did try it before asking for help.

Thanks for all your help.

Micky

Re: Unicode decimal code &#10; removal

Post by Micky Micky » 20 Jul 2020 20:30

Code:

find = findAll(response, 'data-id="\\d+">\n      <div class="dir-ltr" dir="ltr">(?s:(.*?))<', true);

This got all the tweets.

It seems that 'data-id' is unique to the tweets.
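The shorter pattern anchors on data-id and ends the lazy group at the first <. A Python sketch with a made-up fragment shows it collecting every tweet, and also its trade-off: if a tweet's text contains markup such as a link, the capture stops at that tag.

```python
import re

# Made-up fragment: two tweets, the second containing a link tag.
response = (
    '<div class="tweet-text" data-id="111">\n'
    '      <div class="dir-ltr" dir="ltr">Office visit in Baker Street&#10;</div>\n'
    '<div class="tweet-text" data-id="222">\n'
    '      <div class="dir-ltr" dir="ltr">See <a href="x">this</a></div>\n'
)

# Same shape as the Automagic pattern; a raw string avoids doubling \d.
pattern = r'data-id="\d+">\n      <div class="dir-ltr" dir="ltr">(?s:(.*?))<'

# Clean each capture the same way as trim(replaceAll(..., "&#10;", "")).
tweets = [t.replace("&#10;", "").strip() for t in re.findall(pattern, response)]
print(tweets)  # ['Office visit in Baker Street', 'See']
```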

Still haven't learned regex though.

Micky

Re: Unicode decimal code &#10; removal

Post by Desmanto » 22 Jul 2020 15:56

Sometimes a failed match is caused by whitespace. You can check whether the whitespace is there by pasting the text into the regex tester. Non-standard characters usually show up there.
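If a regex tester isn't at hand, another way to reveal invisible characters is to list every character that is neither a plain space nor visible ASCII, with its code point and Unicode name. A Python sketch; reveal() and the sample containing a no-break space and a tab are hypothetical.

```python
import unicodedata

def reveal(text):
    """List (index, code point, name) for every character that is not
    a plain space or a visible ASCII character."""
    found = []
    for i, ch in enumerate(text):
        if ch == " " or "!" <= ch <= "~":
            continue  # ordinary space or visible ASCII
        # Control characters such as tab have no Unicode name.
        found.append((i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")))
    return found

sample = 'dir="ltr">\u00a0Baker\tStreet'
for pos, code, name in reveal(sample):
    print(pos, code, name)
# 10 U+00A0 NO-BREAK SPACE
# 16 U+0009 <unnamed>
```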

Your regex is too broad; it will match too much. Maybe you missed copying part of my script, or maybe the HTTP request from your device (or region) returns a different result. You need to copy the chunk of the Twitter page you want into the regex tester and replace the variable parts as I did.

Re: Unicode decimal code &#10; removal

Post by Micky Micky » 22 Jul 2020 21:05

Hello,

It's working fine. It's definitely not too wide a scope.

findAll without regex has replaced loops in other flows. It's made a big difference.

Thanks for enlightening me!

Micky