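Regular expressions in 壹盾安全V3 use Google's RE2 syntax. This article excerpts the most commonly used constructs.
For the complete syntax, see the reference at the end of this article (it is reproduced in English).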
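Start and end
As in other regular expression flavors, ^ and $ denote the start and the end:
^abc – must start with abc
efg$ – must end with efg
^abc.+efg$ – the subject must start with abc and end with efg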
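The anchors above can be tried out with a short sketch. It uses Python's built-in `re` module as a stand-in, which is an assumption: `re` is a different engine from RE2, but it treats `^` and `$` the same way for single-line inputs without a trailing newline, which is all this example uses.

```python
import re

# The three patterns from the list above.
starts_with_abc = re.compile(r"^abc")
ends_with_efg = re.compile(r"efg$")
abc_to_efg = re.compile(r"^abc.+efg$")

print(bool(starts_with_abc.search("abcdef")))   # True
print(bool(starts_with_abc.search("xabcdef")))  # False: abc is not at the start
print(bool(ends_with_efg.search("123efg")))     # True
print(bool(abc_to_efg.search("abc123efg")))     # True
print(bool(abc_to_efg.search("abcefg")))        # False: .+ needs at least one character in between
```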
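Character sets
. – the dot matches any character; it matches a newline only when the s flag is set
[abc] – character set: matches any one of the characters a, b, c
[^abc] – negated character set: matches any one character other than a, b, c
\d – digit class, equivalent to [0-9]
\D – non-digit class, equivalent to [^0-9]
\w – word character class, equivalent to [0-9A-Za-z_]
\W – non-word character class, equivalent to [^0-9A-Za-z_]
\s – whitespace class, equivalent to [\t\n\f\r ]
\S – non-whitespace class, equivalent to [^\t\n\f\r ]
\b – word boundary
\B – not a word boundary
[[:alpha:]] – ASCII character class (here the letters, equivalent to [A-Za-z])
[[:^alpha:]] – negated ASCII character class (here equivalent to [^A-Za-z])
\pN – Unicode character class with a one-letter name (here N, numbers)
\p{Greek} – Unicode character class (here the Greek script)
\p{Han} – Chinese characters (the Han script)
\PN – negated Unicode character class with a one-letter name
\P{Greek} – negated Unicode character class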
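A few of these classes in action, as a rough sketch. Python's `re` module is used here as a stand-in for RE2 (an assumption). The `re.ASCII` flag is passed so that \d and \w are ASCII-only, as they are in RE2. The `[[:alpha:]]` and `\p{...}` forms are left out because Python's `re` does not support them. The sample strings are made up for illustration.

```python
import re

text = "Order 42 shipped to room_7B on 2024-01-15"

# \d+ and \w+ with ASCII-only semantics, matching the RE2 definitions.
digits = re.findall(r"\d+", text, flags=re.ASCII)
words = re.findall(r"\w+", text, flags=re.ASCII)

# [^abc] matches any single character other than a, b, c.
not_abc = re.findall(r"[^abc]", "aXbYc")

# \b only matches at a word boundary, so the "cat" inside "concat" is skipped.
whole_cats = re.findall(r"\bcat\b", "cat concat cat.")

print(digits)      # ['42', '7', '2024', '01', '15']
print(not_abc)     # ['X', 'Y']
print(whole_cats)  # ['cat', 'cat']
```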
Composition
xy – x and y are adjacent, with y immediately after x
x|y – x or y; x is preferred, so the match succeeds as soon as x matches
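The "prefer x" rule means the order of alternatives matters. A minimal sketch, again assuming Python's `re` module as a stand-in (it also tries alternatives left to right):

```python
import re

# The left alternative wins as soon as it matches, even if the right one is longer.
left_first = re.search(r"a|ab", "ab").group(0)
longer_first = re.search(r"ab|a", "ab").group(0)

print(left_first)    # a
print(longer_first)  # ab
```

When one alternative is a prefix of another, putting the longer one first is usually what is wanted.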
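Repetition
x* – zero or more x, as many as possible
x+ – one or more x, as many as possible
x? – zero or one x, preferring one
x{n,m} – n to m x, as many as possible
x{n,} – n or more x, as many as possible
x{n} – exactly n x
x*? – zero or more x, as few as possible
x+? – one or more x, as few as possible
x?? – zero or one x, preferring zero
x{n,m}? – n to m x, as few as possible
x{n,}? – n or more x, as few as possible
x{n}? – exactly n x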
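The difference between the greedy forms and their `?`-suffixed counterparts is easiest to see side by side. This is a small sketch using Python's `re` module as a stand-in for RE2 (an assumption; both behave the same for these quantifiers), with made-up input strings:

```python
import re

html = "<b>one</b><b>two</b>"

# Greedy: .* grabs as much as it can, so the match runs to the last </b>.
greedy = re.search(r"<b>.*</b>", html).group(0)
# Lazy: .*? grabs as little as it can, so the match stops at the first </b>.
lazy = re.search(r"<b>.*?</b>", html).group(0)

# The same contrast with a bounded repetition.
bounded_greedy = re.search(r"a{2,3}", "aaaa").group(0)
bounded_lazy = re.search(r"a{2,3}?", "aaaa").group(0)

print(greedy)          # <b>one</b><b>two</b>
print(lazy)            # <b>one</b>
print(bounded_greedy)  # aaa
print(bounded_lazy)    # aa
```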
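Grouping
Use parentheses to group.
(re) – capturing group; numbering starts at 1, and 0 denotes the entire match
- For example, (hello)(world) yields two groups, numbered 1 and 2
(?P<name>re) – named group
- For example, when (?P<myName>\w+) matches ZhangSan, the value of myName is ZhangSan
(?:re) – non-capturing group
- For example, (?:hello)(world) yields only one group, (world), numbered 1, because the hello group is skipped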
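The three grouping forms can be checked with the article's own examples. The sketch below assumes Python's `re` module as a stand-in for RE2; the `(re)`, `(?P<name>re)`, and `(?:re)` syntax is the same in both.

```python
import re

# (re): numbered groups; group 0 is the whole match.
m = re.search(r"(hello)(world)", "helloworld")
whole, first, second = m.group(0), m.group(1), m.group(2)

# (?P<name>re): the captured text is also available under its name.
named = re.search(r"(?P<myName>\w+)", "ZhangSan")
my_name = named.group("myName")

# (?:re): groups without capturing, so (world) becomes group 1.
skipped = re.search(r"(?:hello)(world)", "helloworld")
only_groups = skipped.groups()

print(whole, first, second)  # helloworld hello world
print(my_name)               # ZhangSan
print(only_groups)           # ('world',)
```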
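Flags
Other regular expression flavors call these modifiers.
i – case-insensitive matching
m – multi-line mode: ^ and $ also match at the start and end of each line, not only at the start and end of the whole text
s – lets the dot (.) also match \n
U – ungreedy mode: repetitions match as little as possible
Apply these flags with the (?FLAG) or (?FLAG:re) syntax; the parentheses here do not create a new group:
(?i)hello
(?i:hello)
Both expressions are case-insensitive, so HELLO, Hello, and hello all match.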
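A quick way to see the i, m, and s flags at work is the sketch below. It assumes Python's `re` module (3.6 or later) as a stand-in for RE2. The U flag is omitted because Python's `re` has no equivalent, and the multi-line sample text is made up for illustration.

```python
import re

# i: both spellings of the flag accept any letter case.
case_results = [
    bool(re.search(r"(?i)hello", s)) and bool(re.search(r"(?i:hello)", s))
    for s in ["HELLO", "Hello", "hello"]
]

# s: without it the dot stops at a newline, with it the dot matches \n too.
dot_plain = re.search(r"a.b", "a\nb")
dot_with_s = re.search(r"(?s)a.b", "a\nb")

# m: ^ and $ match at every line, not only at the ends of the whole text.
lines = "one\ntwo\nthree"
without_m = re.findall(r"^\w+$", lines)
with_m = re.findall(r"(?m)^\w+$", lines)

print(case_results)            # [True, True, True]
print(dot_plain, bool(dot_with_s))  # None True
print(without_m)               # []
print(with_m)                  # ['one', 'two', 'three']
```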
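Escape characters
A backslash escapes a character so that it is treated as a literal rather than as regular expression syntax. For example, to match a file extension:
\.(php|asp|jsp|py)
The dot (.) has a special meaning in regular expressions, so it must be escaped with a backslash.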
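The effect of the backslash can be seen by comparing the escaped and unescaped versions of the pattern. This is a sketch using Python's `re` module as a stand-in for RE2 (an assumption); the file names are hypothetical.

```python
import re

escaped = re.compile(r"\.(php|asp|jsp|py)")
unescaped = re.compile(r".(php|asp|jsp|py)")

# With the escape, a literal dot is required before the extension.
ext = escaped.search("index.php").group(1)
no_dot = escaped.search("xphp")

# Without the escape, the dot matches any character, so "xphp" slips through.
false_positive = unescaped.search("xphp")

print(ext)                   # php
print(no_dot)                # None
print(bool(false_positive))  # True
```

The pattern is not anchored, so it also matches the `.py` inside a name such as `script.pyc`; appending `$` would restrict it to names that end with the extension.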
Complete RE2 expression syntax
kinds of single-character expressions | examples |
---|---|
any character, possibly including newline (s=true) | . |
character class | [xyz] |
negated character class | [^xyz] |
Perl character class | \d |
negated Perl character class | \D |
ASCII character class | [[:alpha:]] |
negated ASCII character class | [[:^alpha:]] |
Unicode character class (one-letter name) | \pN |
Unicode character class | \p{Greek} |
negated Unicode character class (one-letter name) | \PN |
negated Unicode character class | \P{Greek} |
Composites | |
---|---|
xy | x followed by y |
x|y | x or y (prefer x) |
Repetitions | |
---|---|
x* | zero or more x, prefer more |
x+ | one or more x, prefer more |
x? | zero or one x, prefer one |
x{n,m} | n or n+1 or … or m x, prefer more |
x{n,} | n or more x, prefer more |
x{n} | exactly n x |
x*? | zero or more x, prefer fewer |
x+? | one or more x, prefer fewer |
x?? | zero or one x, prefer zero |
x{n,m}? | n or n+1 or … or m x, prefer fewer |
x{n,}? | n or more x, prefer fewer |
x{n}? | exactly n x |
x{} | (≡ x*) (NOT SUPPORTED) VIM |
x{-} | (≡ x*?) (NOT SUPPORTED) VIM |
x{-n} | (≡ x{n}?) (NOT SUPPORTED) VIM |
x= | (≡ x?) (NOT SUPPORTED) VIM |
Implementation restriction: The counting forms x{n,m}, x{n,}, and x{n} reject forms that create a minimum or maximum repetition count above 1000. Unlimited repetitions are not subject to this restriction.
Possessive repetitions | |
---|---|
x*+ | zero or more x, possessive (NOT SUPPORTED) |
x++ | one or more x, possessive (NOT SUPPORTED) |
x?+ | zero or one x, possessive (NOT SUPPORTED) |
x{n,m}+ | n or … or m x, possessive (NOT SUPPORTED) |
x{n,}+ | n or more x, possessive (NOT SUPPORTED) |
x{n}+ | exactly n x, possessive (NOT SUPPORTED) |
Grouping | |
---|---|
(re) | numbered capturing group (submatch) |
(?P<name>re) | named & numbered capturing group (submatch) |
(?<name>re) | named & numbered capturing group (submatch) |
(?'name're) | named & numbered capturing group (submatch) (NOT SUPPORTED) |
(?:re) | non-capturing group |
(?flags) | set flags within current group; non-capturing |
(?flags:re) | set flags during re; non-capturing |
(?#text) | comment (NOT SUPPORTED) |
(?|x|y|z) | branch numbering reset (NOT SUPPORTED) |
(?>re) | possessive match of re (NOT SUPPORTED) |
re@> | possessive match of re (NOT SUPPORTED) VIM |
%(re) | non-capturing group (NOT SUPPORTED) VIM |
Flags | |
---|---|
i | case-insensitive (default false) |
m | multi-line mode: ^ and $ match begin/end line in addition to begin/end text (default false) |
s | let . match \n (default false) |
U | ungreedy: swap meaning of x* and x*?, x+ and x+?, etc (default false) |
Flag syntax is xyz (set) or -xyz (clear) or xy-z (set xy, clear z).
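Setting and clearing flags can be combined in one expression. The sketch below assumes Python's `re` module (3.6 or later) as a stand-in for RE2; it accepts the same scoped `(?-i:re)` form. The pattern and inputs are made up for illustration.

```python
import re

# (?i) sets case-insensitivity; (?-i:...) clears it again inside that group only.
pattern = re.compile(r"(?i)user(?-i:ID)")

matches_upper = bool(pattern.search("USERID"))  # "ID" is upper case, as required
matches_lower = bool(pattern.search("userid"))  # "id" is lower case, so no match

print(matches_upper)  # True
print(matches_lower)  # False
```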
Empty strings | |
---|---|
^ | at beginning of text or line (m=true) |
$ | at end of text (like \z not \Z) or line (m=true) |
\A | at beginning of text |
\b | at ASCII word boundary (\w on one side and \W, \A, or \z on the other) |
\B | not at ASCII word boundary |
\g | at beginning of subtext being searched (NOT SUPPORTED) PCRE |
\G | at end of last match (NOT SUPPORTED) PERL |
\Z | at end of text, or before newline at end of text (NOT SUPPORTED) |
\z | at end of text |
(?=re) | before text matching re (NOT SUPPORTED) |
(?!re) | before text not matching re (NOT SUPPORTED) |
(?<=re) | after text matching re (NOT SUPPORTED) |
(?<!re) | after text not matching re (NOT SUPPORTED) |
re& | before text matching re (NOT SUPPORTED) VIM |
re@= | before text matching re (NOT SUPPORTED) VIM |
re@! | before text not matching re (NOT SUPPORTED) VIM |
re@<= | after text matching re (NOT SUPPORTED) VIM |
re@<! | after text not matching re (NOT SUPPORTED) VIM |
\zs | sets start of match (= \K) (NOT SUPPORTED) VIM |
\ze | sets end of match (NOT SUPPORTED) VIM |
\%^ | beginning of file (NOT SUPPORTED) VIM |
\%$ | end of file (NOT SUPPORTED) VIM |
\%V | on screen (NOT SUPPORTED) VIM |
\%# | cursor position (NOT SUPPORTED) VIM |
\%'m | mark m position (NOT SUPPORTED) VIM |
\%23l | in line 23 (NOT SUPPORTED) VIM |
\%23c | in column 23 (NOT SUPPORTED) VIM |
\%23v | in virtual column 23 (NOT SUPPORTED) VIM |
Escape sequences | |
---|---|
\a | bell (≡ \007) |
\f | form feed (≡ \014) |
\t | horizontal tab (≡ \011) |
\n | newline (≡ \012) |
\r | carriage return (≡ \015) |
\v | vertical tab character (≡ \013) |
\* | literal *, for any punctuation character * |
\123 | octal character code (up to three digits) |
\x7F | hex character code (exactly two digits) |
\x{10FFFF} | hex character code |
\C | match a single byte even in UTF-8 mode |
\Q...\E | literal text ... even if ... has punctuation |
\1 | backreference (NOT SUPPORTED) |
\b | backspace (NOT SUPPORTED) (use \010) |
\cK | control char ^K (NOT SUPPORTED) (use \001 etc) |
\e | escape (NOT SUPPORTED) (use \033) |
\g1 | backreference (NOT SUPPORTED) |
\g{1} | backreference (NOT SUPPORTED) |
\g{+1} | backreference (NOT SUPPORTED) |
\g{-1} | backreference (NOT SUPPORTED) |
\g{name} | named backreference (NOT SUPPORTED) |
\g<name> | subroutine call (NOT SUPPORTED) |
\g'name' | subroutine call (NOT SUPPORTED) |
\k<name> | named backreference (NOT SUPPORTED) |
\k'name' | named backreference (NOT SUPPORTED) |
\lX | lowercase X (NOT SUPPORTED) |
\ux | uppercase x (NOT SUPPORTED) |
\L...\E | lowercase text ... (NOT SUPPORTED) |
\K | reset beginning of $0 (NOT SUPPORTED) |
\N{name} | named Unicode character (NOT SUPPORTED) |
\R | line break (NOT SUPPORTED) |
\U...\E | upper case text ... (NOT SUPPORTED) |
\X | extended Unicode sequence (NOT SUPPORTED) |
\%d123 | decimal character 123 (NOT SUPPORTED) VIM |
\%xFF | hex character FF (NOT SUPPORTED) VIM |
\%o123 | octal character 123 (NOT SUPPORTED) VIM |
\%u1234 | Unicode character 0x1234 (NOT SUPPORTED) VIM |
\%U12345678 | Unicode character 0x12345678 (NOT SUPPORTED) VIM |
Character class elements | |
---|---|
x | single character |
A-Z | character range (inclusive) |
\d | Perl character class |
[:foo:] | ASCII character class foo |
\p{Foo} | Unicode character class Foo |
\pF | Unicode character class F (one-letter name) |
Named character classes as character class elements | |
---|---|
[\d] | digits (≡ \d) |
[^\d] | not digits (≡ \D) |
[\D] | not digits (≡ \D) |
[^\D] | not not digits (≡ \d) |
[[:name:]] | named ASCII class inside character class (≡ [:name:]) |
[^[:name:]] | named ASCII class inside negated character class (≡ [:^name:]) |
[\p{Name}] | named Unicode property inside character class (≡ \p{Name}) |
[^\p{Name}] | named Unicode property inside negated character class (≡ \P{Name}) |
Perl character classes (all ASCII-only) | |
---|---|
\d | digits (≡ [0-9]) |
\D | not digits (≡ [^0-9]) |
\s | whitespace (≡ [\t\n\f\r ]) |
\S | not whitespace (≡ [^\t\n\f\r ]) |
\w | word characters (≡ [0-9A-Za-z_]) |
\W | not word characters (≡ [^0-9A-Za-z_]) |
\h | horizontal space (NOT SUPPORTED) |
\H | not horizontal space (NOT SUPPORTED) |
\v | vertical space (NOT SUPPORTED) |
\V | not vertical space (NOT SUPPORTED) |
ASCII character classes | |
---|---|
[[:alnum:]] | alphanumeric (≡ [0-9A-Za-z]) |
[[:alpha:]] | alphabetic (≡ [A-Za-z]) |
[[:ascii:]] | ASCII (≡ [\x00-\x7F]) |
[[:blank:]] | blank (≡ [\t ]) |
[[:cntrl:]] | control (≡ [\x00-\x1F\x7F]) |
[[:digit:]] | digits (≡ [0-9]) |
[[:graph:]] | graphical (≡ [!-~] ≡ [A-Za-z0-9!"#$%&'()*+,\-./:;<=>?@[\\\]^_`{|}~]) |
[[:lower:]] | lower case (≡ [a-z]) |
[[:print:]] | printable (≡ [ -~] ≡ [ [:graph:]]) |
[[:punct:]] | punctuation (≡ [!-/:-@[-`{-~]) |
[[:space:]] | whitespace (≡ [\t\n\v\f\r ]) |
[[:upper:]] | upper case (≡ [A-Z]) |
[[:word:]] | word characters (≡ [0-9A-Za-z_]) |
[[:xdigit:]] | hex digit (≡ [0-9A-Fa-f]) |
Unicode character class names–general category | |
---|---|
C | other |
Cc | control |
Cf | format |
Cn | unassigned code points (NOT SUPPORTED) |
Co | private use |
Cs | surrogate |
L | letter |
LC | cased letter (NOT SUPPORTED) |
L& | cased letter (NOT SUPPORTED) |
Ll | lowercase letter |
Lm | modifier letter |
Lo | other letter |
Lt | titlecase letter |
Lu | uppercase letter |
M | mark |
Mc | spacing mark |
Me | enclosing mark |
Mn | non-spacing mark |
N | number |
Nd | decimal number |
Nl | letter number |
No | other number |
P | punctuation |
Pc | connector punctuation |
Pd | dash punctuation |
Pe | close punctuation |
Pf | final punctuation |
Pi | initial punctuation |
Po | other punctuation |
Ps | open punctuation |
S | symbol |
Sc | currency symbol |
Sk | modifier symbol |
Sm | math symbol |
So | other symbol |
Z | separator |
Zl | line separator |
Zp | paragraph separator |
Zs | space separator |
Unicode character class names–scripts |
---|
Adlam |
Ahom |
Anatolian_Hieroglyphs |
Arabic |
Armenian |
Avestan |
Balinese |
Bamum |
Bassa_Vah |
Batak |
Bengali |
Bhaiksuki |
Bopomofo |
Brahmi |
Braille |
Buginese |
Buhid |
Canadian_Aboriginal |
Carian |
Caucasian_Albanian |
Chakma |
Cham |
Cherokee |
Chorasmian |
Common |
Coptic |
Cuneiform |
Cypriot |
Cypro_Minoan |
Cyrillic |
Deseret |
Devanagari |
Dives_Akuru |
Dogra |
Duployan |
Egyptian_Hieroglyphs |
Elbasan |
Elymaic |
Ethiopic |
Georgian |
Glagolitic |
Gothic |
Grantha |
Greek |
Gujarati |
Gunjala_Gondi |
Gurmukhi |
Han |
Hangul |
Hanifi_Rohingya |
Hanunoo |
Hatran |
Hebrew |
Hiragana |
Imperial_Aramaic |
Inherited |
Inscriptional_Pahlavi |
Inscriptional_Parthian |
Javanese |
Kaithi |
Kannada |
Katakana |
Kawi |
Kayah_Li |
Kharoshthi |
Khitan_Small_Script |
Khmer |
Khojki |
Khudawadi |
Lao |
Latin |
Lepcha |
Limbu |
Linear_A |
Linear_B |
Lisu |
Lycian |
Lydian |
Mahajani |
Makasar |
Malayalam |
Mandaic |
Manichaean |
Marchen |
Masaram_Gondi |
Medefaidrin |
Meetei_Mayek |
Mende_Kikakui |
Meroitic_Cursive |
Meroitic_Hieroglyphs |
Miao |
Modi |
Mongolian |
Mro |
Multani |
Myanmar |
Nabataean |
Nag_Mundari |
Nandinagari |
New_Tai_Lue |
Newa |
Nko |
Nushu |
Nyiakeng_Puachue_Hmong |
Ogham |
Ol_Chiki |
Old_Hungarian |
Old_Italic |
Old_North_Arabian |
Old_Permic |
Old_Persian |
Old_Sogdian |
Old_South_Arabian |
Old_Turkic |
Old_Uyghur |
Oriya |
Osage |
Osmanya |
Pahawh_Hmong |
Palmyrene |
Pau_Cin_Hau |
Phags_Pa |
Phoenician |
Psalter_Pahlavi |
Rejang |
Runic |
Samaritan |
Saurashtra |
Sharada |
Shavian |
Siddham |
SignWriting |
Sinhala |
Sogdian |
Sora_Sompeng |
Soyombo |
Sundanese |
Syloti_Nagri |
Syriac |
Tagalog |
Tagbanwa |
Tai_Le |
Tai_Tham |
Tai_Viet |
Takri |
Tamil |
Tangsa |
Tangut |
Telugu |
Thaana |
Thai |
Tibetan |
Tifinagh |
Tirhuta |
Toto |
Ugaritic |
Vai |
Vithkuqi |
Wancho |
Warang_Citi |
Yezidi |
Yi |
Zanabazar_Square |
Vim character classes | |
---|---|
\i | identifier character (NOT SUPPORTED) VIM |
\I | \i except digits (NOT SUPPORTED) VIM |
\k | keyword character (NOT SUPPORTED) VIM |
\K | \k except digits (NOT SUPPORTED) VIM |
\f | file name character (NOT SUPPORTED) VIM |
\F | \f except digits (NOT SUPPORTED) VIM |
\p | printable character (NOT SUPPORTED) VIM |
\P | \p except digits (NOT SUPPORTED) VIM |
\s | whitespace character (≡ [ \t]) (NOT SUPPORTED) VIM |
\S | non-white space character (≡ [^ \t]) (NOT SUPPORTED) VIM |
\d | digits (≡ [0-9]) VIM |
\D | not \d VIM |
\x | hex digits (≡ [0-9A-Fa-f]) (NOT SUPPORTED) VIM |
\X | not \x (NOT SUPPORTED) VIM |
\o | octal digits (≡ [0-7]) (NOT SUPPORTED) VIM |
\O | not \o (NOT SUPPORTED) VIM |
\w | word character VIM |
\W | not \w VIM |
\h | head of word character (NOT SUPPORTED) VIM |
\H | not \h (NOT SUPPORTED) VIM |
\a | alphabetic (NOT SUPPORTED) VIM |
\A | not \a (NOT SUPPORTED) VIM |
\l | lowercase (NOT SUPPORTED) VIM |
\L | not lowercase (NOT SUPPORTED) VIM |
\u | uppercase (NOT SUPPORTED) VIM |
\U | not uppercase (NOT SUPPORTED) VIM |
\_x | \x plus newline, for any x (NOT SUPPORTED) VIM |
\c | ignore case (NOT SUPPORTED) VIM |
\C | match case (NOT SUPPORTED) VIM |
\m | magic (NOT SUPPORTED) VIM |
\M | nomagic (NOT SUPPORTED) VIM |
\v | verymagic (NOT SUPPORTED) VIM |
\V | verynomagic (NOT SUPPORTED) VIM |
\Z | ignore differences in Unicode combining characters (NOT SUPPORTED) VIM |
Magic | |
---|---|
(?{code}) | arbitrary Perl code (NOT SUPPORTED) PERL |
(??{code}) | postponed arbitrary Perl code (NOT SUPPORTED) PERL |
(?n) | recursive call to regexp capturing group n (NOT SUPPORTED) |
(?+n) | recursive call to relative group +n (NOT SUPPORTED) |
(?-n) | recursive call to relative group -n (NOT SUPPORTED) |
(?C) | PCRE callout (NOT SUPPORTED) PCRE |
(?R) | recursive call to entire regexp (≡ (?0)) (NOT SUPPORTED) |
(?&name) | recursive call to named group (NOT SUPPORTED) |
(?P=name) | named backreference (NOT SUPPORTED) |
(?P>name) | recursive call to named group (NOT SUPPORTED) |
(?(cond)true|false) | conditional branch (NOT SUPPORTED) |
(?(cond)true) | conditional branch (NOT SUPPORTED) |
(*ACCEPT) | make regexps more like Prolog (NOT SUPPORTED) |
(*COMMIT) | (NOT SUPPORTED) |
(*F) | (NOT SUPPORTED) |
(*FAIL) | (NOT SUPPORTED) |
(*MARK) | (NOT SUPPORTED) |
(*PRUNE) | (NOT SUPPORTED) |
(*SKIP) | (NOT SUPPORTED) |
(*THEN) | (NOT SUPPORTED) |
(*ANY) | set newline convention (NOT SUPPORTED) |
(*ANYCRLF) | (NOT SUPPORTED) |
(*CR) | (NOT SUPPORTED) |
(*CRLF) | (NOT SUPPORTED) |
(*LF) | (NOT SUPPORTED) |
(*BSR_ANYCRLF) | set \R convention (NOT SUPPORTED) PCRE |
(*BSR_UNICODE) | (NOT SUPPORTED) PCRE |
© Copyright notice
All articles here are written by 幻城; please avoid copying and pasting them directly.