Regular expressions

Regular expressions can be used to check whether a text contains another text or, more generally, whether a pattern matches the text.

Simple pattern examples:

`Test.*`	Matches a text that begins with Test, for example Test or Testabc, but not abcTest or 123Test123.
`Test.+`	Matches a text that begins with Test followed by at least one more character, for example TestX, but not Test.
`[a-c]{3}`	Matches a text that consists of exactly three characters, each of them a, b or c, for example aaa, abc or cba, but not xyz.
`\d{3}.*`	Matches a text that begins with three digits, for example 123abc or 123456, but not 12abc or abc123.
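
The examples in the table imply that a pattern has to cover the whole text, not just a part of it. The following is a minimal sketch in plain Java (not Automagic script) that reproduces the table with `java.util.regex`; the class name `PatternBasics` and the helper `fullMatch` are made up for illustration.

```java
import java.util.regex.Pattern;

final class PatternBasics {
    // True only if the pattern matches the whole text, not just a part of it.
    static boolean fullMatch(String text, String pattern) {
        return Pattern.matches(pattern, text);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("Testabc", "Test.*"));   // true
        System.out.println(fullMatch("abcTest", "Test.*"));   // false: does not begin with Test
        System.out.println(fullMatch("Test", "Test.+"));      // false: nothing follows Test
        System.out.println(fullMatch("cba", "[a-c]{3}"));     // true
        System.out.println(fullMatch("abc123", "\\d{3}.*"));  // false: does not begin with digits
    }
}
```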

Automagic supports regular expressions through the following functions in an Action Script or a Condition Expression:

Boolean matches(String s, String pattern)
Checks whether the regular expression pattern matches the string s.
Boolean matches(String s, String pattern, List groups)
Checks whether the regular expression pattern matches the string s and stores the captured groups in the existing list groups.
List findAll(String s, String pattern)
Returns a list of all matches of the regular expression pattern found in s.
List findAll(String s, String pattern, boolean returnGroups)
Returns a list of all matches of the regular expression pattern found in s. Optionally, each element is itself a list of the groups contained in that match. (see Regular expressions)
String replaceAll(String s, String regex, String replacement)
Returns a new string in which every substring that matches regex is replaced by replacement.
List split(String s, String pattern)
Splits the string s into a list of strings, using the regular expression pattern as the delimiter.

Examples for function `Boolean matches(String s, String pattern)`

Tests whether the text contains the word test:
result = matches("das ist ein test", ".*test.*")
Tests whether the text begins with three digits:
result = matches("1234567", "\\d{3}.*")
The regular expression itself needs only one backslash, but inside a string literal a backslash has to be escaped with an additional backslash.
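
The same doubling of backslashes applies to Java string literals. As a hedged illustration in plain Java (the class `BackslashEscaping` and its members are hypothetical names), the literal below is written with two backslashes but holds only one at run time:

```java
import java.util.regex.Pattern;

final class BackslashEscaping {
    // Written with two backslashes in source; at run time the string is \d{3}.*
    static final String THREE_DIGITS = "\\d{3}.*";

    static boolean startsWithThreeDigits(String text) {
        return Pattern.matches(THREE_DIGITS, text);
    }

    static boolean containsTest(String text) {
        // The leading and trailing .* let the word appear anywhere in the text.
        return Pattern.matches(".*test.*", text);
    }

    public static void main(String[] args) {
        System.out.println(THREE_DIGITS);                     // \d{3}.*
        System.out.println(THREE_DIGITS.length());            // 7
        System.out.println(startsWithThreeDigits("1234567")); // true
        System.out.println(containsTest("das ist ein test")); // true
    }
}
```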

Examples for function `Boolean matches(String s, String pattern, List groups)`

Tests whether the text contains a number and stores the number in the list groups:
groups = newList(); result = matches("Kontakt 123456 ruft an", "\\D*(\\d*).*", groups);
The list groups contains the two elements "Kontakt 123456 ruft an" and "123456".
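
How the two elements come about can be sketched in plain Java: group 0 is the complete match and group 1 is the text captured by the first pair of parentheses. This is an illustration with `java.util.regex`, not Automagic's actual implementation; `GroupCapture` and `matchGroups` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GroupCapture {
    // Returns group 0 (the whole match) followed by each capturing group,
    // or an empty list if the pattern does not match the whole text.
    static List<String> matchGroups(String text, String pattern) {
        List<String> groups = new ArrayList<>();
        Matcher m = Pattern.compile(pattern).matcher(text);
        if (m.matches()) {
            for (int i = 0; i <= m.groupCount(); i++) {
                groups.add(m.group(i));
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        // Prints [Kontakt 123456 ruft an, 123456]
        System.out.println(matchGroups("Kontakt 123456 ruft an", "\\D*(\\d*).*"));
    }
}
```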

Examples for function `List findAll(String s, String pattern)`

Returns a list of the values in s that the regular expression pattern matches:
result = findAll("Kontakt 123456 ruft um 8 Uhr an", "\\d+");
The list result contains the two elements "123456" and "8".
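
A find-all operation can be sketched in plain Java as a loop over `Matcher.find()`. The class `FindAllSketch` and its `findAll` helper are hypothetical and only mimic the function described above. The sketch uses `\d+`, because a pattern that can match zero characters, such as `\d*`, also reports empty matches at positions without digits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FindAllSketch {
    // Collects every non-overlapping match of pattern in text, left to right.
    static List<String> findAll(String text, String pattern) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(pattern).matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "Kontakt 123456 ruft um 8 Uhr an";
        System.out.println(findAll(text, "\\d+"));               // [123456, 8]
        // \d* may match zero digits, so empty strings appear in the result.
        System.out.println(findAll(text, "\\d*").contains(""));  // true
    }
}
```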

Examples for function `String replaceAll(String s, String regex, String replacement)`

Removes all digits from the text:
result = replaceAll("Kontakt 123456 ruft an", "\\d", "");
result contains "Kontakt  ruft an" (the two spaces that surrounded the number remain).
Replaces multiple consecutive spaces with a single space:
result = replaceAll("a  b   c", "\\s+", " ");
result contains "a b c".
Separates the characters of a text with a space:
result = replaceAll("1234567", "(.)", "$1 ");
result contains "1 2 3 4 5 6 7 ".
$1 refers to the text of the first capturing group in parentheses.
Inserts different separators:
result = replaceAll("123456", "(.)(.)", "$1/$2-");
result contains "1/2-3/4-5/6-".
$1 refers to the text of the first capturing group in parentheses, $2 refers to the text of the second capturing group in parentheses.
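
The replacement examples behave the same way with `String.replaceAll` in plain Java, which the following sketch uses as a stand-in (the class name `ReplaceAllSketch` and its method names are hypothetical). Because `$` introduces a group reference in the replacement, a literal dollar sign has to be protected, for example with `Matcher.quoteReplacement`.

```java
import java.util.regex.Matcher;

final class ReplaceAllSketch {
    static String removeDigits(String text) {
        return text.replaceAll("\\d", "");
    }

    static String collapseWhitespace(String text) {
        return text.replaceAll("\\s+", " ");
    }

    static String pairWithSeparators(String text) {
        // $1 and $2 refer to the first and second capturing group.
        return text.replaceAll("(.)(.)", "$1/$2-");
    }

    static String replaceLiterally(String text, String regex, String replacement) {
        // quoteReplacement makes $ and \ in the replacement literal.
        return text.replaceAll(regex, Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        System.out.println(removeDigits("Kontakt 123456 ruft an"));   // Kontakt  ruft an
        System.out.println(collapseWhitespace("a  b   c"));           // a b c
        System.out.println("1234567".replaceAll("(.)", "$1 "));       // 1 2 3 4 5 6 7 (with a trailing space)
        System.out.println(pairWithSeparators("123456"));             // 1/2-3/4-5/6-
        System.out.println(replaceLiterally("Preis", "Preis", "$5")); // $5
    }
}
```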

Examples for function `List split(String s, String pattern)`

Splits the text at the colons and creates a list:
result = split("Das:ist:ein:Test", ":");
The list result contains the four elements "Das", "ist", "ein" and "Test".
Splits the text into a list of words:
result = split("Das ist ein:Test", "\\W");
The list result contains the four elements "Das", "ist", "ein" and "Test".
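
Splitting can be tried out in plain Java with `String.split`, used here as a stand-in for the function above (`SplitSketch` and `splitToList` are hypothetical names). The last call shows one thing to watch for: with `\W`, two adjacent separators produce an empty element, which `\W+` avoids.

```java
import java.util.Arrays;
import java.util.List;

final class SplitSketch {
    // Splits text around every match of the pattern and returns the parts as a list.
    static List<String> splitToList(String text, String pattern) {
        return Arrays.asList(text.split(pattern));
    }

    public static void main(String[] args) {
        System.out.println(splitToList("Das:ist:ein:Test", ":"));    // [Das, ist, ein, Test]
        System.out.println(splitToList("Das ist ein:Test", "\\W"));  // [Das, ist, ein, Test]
        // Comma and space are two separators in a row, so an empty element appears.
        System.out.println(splitToList("Das, ist", "\\W").size());   // 3
        System.out.println(splitToList("Das, ist", "\\W+"));         // [Das, ist]
    }
}
```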

Android regular expressions

Automagic uses Android's built-in classes for regular expressions. See the Android API online documentation.

The following documentation is an excerpt covering the most important features of the regular expression syntax.

Escape sequences

\	Quote the following metacharacter (so `\.` matches a literal `.`).
\Q	Quote all following metacharacters until `\E`.
\E	Stop quoting metacharacters (started by `\Q`).
\\	A literal backslash.
\uhhhh	The Unicode character U+hhhh (in hex).
\xhh	The Unicode character U+00hh (in hex).
\cx	The ASCII control character ^x (so `\cH` would be ^H, U+0008).
\a	The ASCII bell character (U+0007).
\e	The ASCII ESC character (U+001b).
\f	The ASCII form feed character (U+000c).
\n	The ASCII newline character (U+000a).
\r	The ASCII carriage return character (U+000d).
\t	The ASCII tab character (U+0009).
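
A short plain-Java sketch of the quoting escapes follows (`EscapeSketch` and `matchesLiterally` are hypothetical names). It shows that an unquoted `+` acts as a quantifier, while `\Q...\E` or `Pattern.quote` treats the whole text literally.

```java
import java.util.regex.Pattern;

final class EscapeSketch {
    // Matches text against a literal string, with all metacharacters quoted.
    static boolean matchesLiterally(String text, String literal) {
        return Pattern.matches(Pattern.quote(literal), text);
    }

    public static void main(String[] args) {
        // Unquoted, 1+ means "one or more 1", so the text 1+1=2 does not match.
        System.out.println(Pattern.matches("1+1=2", "1+1=2"));          // false
        System.out.println(Pattern.matches("1+1=2", "11=2"));           // true
        // Quoted with \Q...\E, every character is literal.
        System.out.println(Pattern.matches("\\Q1+1=2\\E", "1+1=2"));    // true
        System.out.println(matchesLiterally("1+1=2", "1+1=2"));         // true
        // \x41 and \u0042 are the characters A and B; \t is a tab.
        System.out.println(Pattern.matches("\\x41\\u0042", "AB"));      // true
        System.out.println(Pattern.matches("a\\tb", "a\tb"));           // true
        // \. matches a literal dot only.
        System.out.println(Pattern.matches("a\\.b", "axb"));            // false
    }
}
```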

Character classes

It's possible to construct arbitrary character classes using set operations:

[abc]	Any one of `a`, `b`, or `c`. (Enumeration.)
[a-c]	Any one of `a`, `b`, or `c`. (Range.)
[^abc]	Any character except `a`, `b`, or `c`. (Negation.)
[[a-f][0-9]]	Any character in either range. (Union.)
[[a-z]&&[jkl]]	Any character in both ranges. (Intersection.)

Most of the time, the built-in character classes are more useful:

\d	Any digit character (see note below).
\D	Any non-digit character (see note below).
\s	Any whitespace character (see note below).
\S	Any non-whitespace character (see note below).
\w	Any word character (see note below).
\W	Any non-word character (see note below).
\p{NAME}	Any character in the class with the given NAME.
\P{NAME}	Any character not in the named class.

Note that these built-in classes don't just cover the traditional ASCII range. For example, \w is equivalent to the character class [\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}]. For more details see Unicode TR-18, and bear in mind that the set of characters in each class can vary between Unicode releases. If you actually want to match only ASCII characters, specify the explicit characters you want; if you mean 0-9 use [0-9] rather than \d, which would also include Gurmukhi digits and so forth.
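
The set operations and the ASCII-versus-Unicode point can be illustrated with a small plain-Java sketch (`CharClassSketch` and `isOne` are hypothetical names). It deliberately compares `[0-9]` with `\p{Nd}` instead of `\d`, on the assumption that the meaning of `\d` may differ between Android and a desktop Java runtime, where it is ASCII-only by default.

```java
import java.util.regex.Pattern;

final class CharClassSketch {
    // True if text consists of exactly one character from the given class.
    static boolean isOne(String characterClass, String text) {
        return Pattern.matches(characterClass, text);
    }

    public static void main(String[] args) {
        System.out.println(isOne("[^abc]", "d"));            // true  (negation)
        System.out.println(isOne("[[a-f][0-9]]", "7"));      // true  (union)
        System.out.println(isOne("[[a-f][0-9]]", "g"));      // false
        System.out.println(isOne("[[a-z]&&[jkl]]", "k"));    // true  (intersection)
        System.out.println(isOne("[[a-z]&&[jkl]]", "a"));    // false

        String gurmukhiThree = "\u0A69";                     // GURMUKHI DIGIT THREE
        System.out.println(isOne("\\p{Nd}", gurmukhiThree)); // true:  any Unicode decimal digit
        System.out.println(isOne("[0-9]", gurmukhiThree));   // false: ASCII digits only
    }
}
```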

Quantifiers

Quantifiers match some number of instances of the preceding regular expression.

*	Zero or more.
?	Zero or one.
+	One or more.
{n}	Exactly n.
{n,}	At least n.
{n,m}	At least n but not more than m.

Quantifiers are "greedy" by default, meaning that they will match the longest possible input sequence. There are also non-greedy quantifiers that match the shortest possible input sequence. They're the same as the greedy ones but with a trailing ?:

*?	Zero or more (non-greedy).
??	Zero or one (non-greedy).
+?	One or more (non-greedy).
{n}?	Exactly n (non-greedy).
{n,}?	At least n (non-greedy).
{n,m}?	At least n but not more than m (non-greedy).

Quantifiers allow backtracking by default. There are also possessive quantifiers to prevent backtracking. They're the same as the greedy ones but with a trailing +:

*+	Zero or more (possessive).
?+	Zero or one (possessive).
++	One or more (possessive).
{n}+	Exactly n (possessive).
{n,}+	At least n (possessive).
{n,m}+	At least n but not more than m (possessive).
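
The difference between the three quantifier families is easiest to see on one input. The following plain-Java sketch (`QuantifierSketch` and `firstMatch` are hypothetical names) searches the text `<a><b>` with a greedy, a non-greedy and a possessive variant of the same pattern.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class QuantifierSketch {
    // Returns the first substring matched by pattern, or null if there is none.
    static String firstMatch(String text, String pattern) {
        Matcher m = Pattern.compile(pattern).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "<a><b>";
        System.out.println(firstMatch(text, "<.+>"));      // <a><b>  greedy: longest match
        System.out.println(firstMatch(text, "<.+?>"));     // <a>     non-greedy: shortest match
        // Possessive .++ swallows the final > and never gives it back, so nothing matches.
        System.out.println(firstMatch(text, "<.++>"));     // null
        // A possessive quantifier works when it cannot overrun the closing character.
        System.out.println(firstMatch(text, "<[^>]++>"));  // <a>
        System.out.println(Pattern.matches("a*a", "aaa"));   // true:  a* backtracks by one a
        System.out.println(Pattern.matches("a*+a", "aaa"));  // false: a*+ keeps all three
    }
}
```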

Zero-width assertions

^	At beginning of line.
$	At end of line.
\A	At beginning of input.
\b	At word boundary.
\B	At non-word boundary.
\G	At end of previous match.
\z	At end of input.
\Z	At end of input, or before newline at end.
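
Zero-width assertions match a position rather than characters. As a hedged plain-Java sketch (`AnchorSketch` and `count` are hypothetical names), counting matches shows how `\b` limits a search to whole words and how `^` behaves with and without the `m` flag:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AnchorSketch {
    // Counts the non-overlapping matches of pattern in text.
    static int count(String text, String pattern) {
        Matcher m = Pattern.compile(pattern).matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(count("cat concat cats", "cat"));        // 3: also inside other words
        System.out.println(count("cat concat cats", "\\bcat\\b"));  // 1: whole word only

        String lines = "one\ntwo\nthree";
        System.out.println(count(lines, "^\\w+"));       // 1: ^ is the start of the input
        System.out.println(count(lines, "(?m)^\\w+"));   // 3: ^ is the start of every line
        System.out.println(count(lines, "(?m)\\A\\w+")); // 1: \A ignores the m flag
    }
}
```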

Look-around assertions

Look-around assertions assert that the subpattern does (positive) or doesn't (negative) match after (look-ahead) or before (look-behind) the current position, without including the matched text in the containing match. The maximum length of possible matches for look-behind patterns must not be unbounded.

(?=a)	Zero-width positive look-ahead.
(?!a)	Zero-width negative look-ahead.
(?<=a)	Zero-width positive look-behind.
(?<!a)	Zero-width negative look-behind.
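
To illustrate, the following plain-Java sketch (`LookAroundSketch` and `findAll` are hypothetical names) applies each of the four assertions to the sample sentence used earlier. In every case the surrounding text is checked but does not become part of the match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LookAroundSketch {
    // Collects every non-overlapping match of pattern in text.
    static List<String> findAll(String text, String pattern) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(pattern).matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "Kontakt 123456 ruft um 8 Uhr an";
        // Positive look-ahead: digits that are followed by " Uhr".
        System.out.println(findAll(text, "\\d+(?= Uhr)"));          // [8]
        // Negative look-ahead: whole numbers that are not followed by " Uhr".
        System.out.println(findAll(text, "\\b\\d+\\b(?! Uhr)"));    // [123456]
        // Positive look-behind: digits that are preceded by "um ".
        System.out.println(findAll(text, "(?<=um )\\d+"));          // [8]
        // Negative look-behind and look-ahead: a digit with no digit on either side.
        System.out.println(findAll(text, "(?<!\\d)\\d(?!\\d)"));    // [8]
    }
}
```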

Groups

(a)	A capturing group.
(?:a)	A non-capturing group.
(?>a)	An independent non-capturing group. (The first match of the subgroup is the only match tried.)
\n	The text already matched by capturing group n (a back-reference, where n is the group number, for example `\1`).

See `Matcher.group()` for details of how capturing groups are numbered and accessed.
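
Group numbering and back-references can be sketched in plain Java as follows (`BackReferenceSketch` and `firstRepeatedWord` are hypothetical names). Capturing groups are numbered by the position of their opening parenthesis, non-capturing groups are not counted, and `\1` matches the same text that group 1 captured.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BackReferenceSketch {
    // Finds the first word that is immediately repeated, or returns null.
    static String firstRepeatedWord(String text) {
        // \1 must match exactly the text captured by group 1.
        Matcher m = Pattern.compile("\\b(\\w+) \\1\\b").matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstRepeatedWord("das ist ist ein test"));  // ist
        System.out.println(firstRepeatedWord("das ist ein test"));      // null

        // Groups are numbered by their opening parenthesis, from left to right.
        Matcher nested = Pattern.compile("((a)(b))").matcher("ab");
        if (nested.matches()) {
            System.out.println(nested.group(1) + " " + nested.group(2) + " " + nested.group(3)); // ab a b
        }
        // (?:...) groups without capturing, so only (c) is counted.
        System.out.println(Pattern.compile("(?:ab)+(c)").matcher("").groupCount()); // 1
    }
}
```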

Operators

ab	Expression a followed by expression b.
a|b	Either expression a or expression b.

Flags

(?dimsux-dimsux:a)	Evaluates the expression a with the given flags enabled/disabled.
(?dimsux-dimsux)	Evaluates the rest of the pattern with the given flags enabled/disabled.

The flags are:

`i`	`CASE_INSENSITIVE`	case insensitive matching
`d`	`UNIX_LINES`	only accept `'\n'` as a line terminator
`m`	`MULTILINE`	allow `^` and `$` to match beginning/end of any line
`s`	`DOTALL`	allow `.` to match `'\n'` ("s" for "single line")
`u`	`UNICODE_CASE`	enable Unicode case folding
`x`	`COMMENTS`	allow whitespace and comments
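
Because the functions described earlier take the pattern as a plain string, inline flags are a convenient way to switch these options on. The following plain-Java sketch (`FlagSketch` is a hypothetical name) shows a few of them with `java.util.regex`:

```java
import java.util.regex.Pattern;

final class FlagSketch {
    static boolean fullMatch(String text, String pattern) {
        return Pattern.matches(pattern, text);
    }

    public static void main(String[] args) {
        // i: case insensitive matching for the rest of the pattern.
        System.out.println(fullMatch("TEST", "test"));        // false
        System.out.println(fullMatch("TEST", "(?i)test"));    // true
        // (?i:...) limits the flag to the enclosed expression.
        System.out.println(fullMatch("TEst", "(?i:te)st"));   // true
        System.out.println(fullMatch("TEST", "(?i:te)st"));   // false
        // s: lets . match a line terminator as well.
        System.out.println(fullMatch("a\nb", ".*"));          // false
        System.out.println(fullMatch("a\nb", "(?s).*"));      // true
        // x: whitespace and # comments in the pattern are ignored.
        System.out.println(fullMatch("123", "(?x) \\d{3}  # three digits")); // true
        // The same flag can be passed as a constant when compiling in Java.
        System.out.println(Pattern.compile("test", Pattern.CASE_INSENSITIVE)
                .matcher("TEST").matches());                  // true
    }
}
```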