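Regular Expression Syntax in 壹盾安全 V3 - 幻城云笔记

Regular expressions in 壹盾安全 V3 use Google's RE2 syntax. The sections below summarize the most commonly used constructs.

The complete syntax reference is reproduced, in English, at the end of this article.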

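Start and end

As in other regular expression flavors, ^ and $ mark the start and the end of the text:

^abc – must start with abc
efg$ – must end with efg
^abc.+efg$ – must start with abc and end with efg, with at least one character in between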
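
The three anchored patterns above can be tried out with a minimal sketch like the one below. It assumes Python's built-in `re` module as a stand-in: `re` is not RE2, but `^`, `$`, and `.+` behave the same way for these inputs. One known difference is that Python's `$` also matches just before a trailing newline, while RE2's `$` does not. The helper name `is_match` is made up for this illustration.

```python
import re

# Patterns taken from the list above.
STARTS_WITH_ABC = re.compile(r"^abc")
ENDS_WITH_EFG = re.compile(r"efg$")
STARTS_AND_ENDS = re.compile(r"^abc.+efg$")


def is_match(pattern, text):
    """Return True if the compiled pattern matches somewhere in text."""
    return pattern.search(text) is not None


print(is_match(STARTS_WITH_ABC, "abcdef"))   # True
print(is_match(STARTS_WITH_ABC, "xabc"))     # False: abc is not at the start
print(is_match(ENDS_WITH_EFG, "abcdefg"))    # True
print(is_match(STARTS_AND_ENDS, "abc-efg"))  # True
print(is_match(STARTS_AND_ENDS, "abcefg"))   # False: .+ needs at least one character
```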

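Character classes

. – matches any single character; a newline is matched only when the s flag is set
[abc] – character class: matches any one of a, b, or c
[^abc] – negated character class: matches any one character other than a, b, or c
\d – digits, equivalent to [0-9]
\D – non-digits, equivalent to [^0-9]
\w – word characters, equivalent to [0-9A-Za-z_]
\W – non-word characters, equivalent to [^0-9A-Za-z_]
\s – whitespace, equivalent to [\t\n\f\r ]
\S – non-whitespace, equivalent to [^\t\n\f\r ]
\b – at an ASCII word boundary (a zero-width assertion that consumes no characters)
\B – not at an ASCII word boundary
[[:alpha:]] – ASCII alphabetic class, equivalent to [A-Za-z]
[[:^alpha:]] – negated ASCII alphabetic class
\pN – Unicode character class with a one-letter name (here N, numbers)
\p{Greek} – Unicode character class (here the Greek script)
\p{Han} – the Han script, i.e. Chinese characters
\PN – negated Unicode character class with a one-letter name
\P{Greek} – negated Unicode character class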
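
A few of these classes are illustrated in the sketch below, again assuming Python's built-in `re` module as a stand-in for RE2. The `re.ASCII` flag is passed so that `\d`, `\w`, and `\b` stay ASCII-only, which is how RE2 defines them. Python's `re` does not support `\p{...}` classes, so those are left out. The function names are hypothetical.

```python
import re

# re.ASCII restricts \d, \w, \s and \b to ASCII, mirroring RE2's definitions.
DIGITS = re.compile(r"\d+", re.ASCII)
NOT_VOWEL = re.compile(r"[^aeiou]")
WHOLE_WORD_CAT = re.compile(r"\bcat\b", re.ASCII)


def find_digit_runs(text):
    """Return every run of consecutive ASCII digits in text."""
    return DIGITS.findall(text)


def first_non_vowel(text):
    """Return the first character that is not a lowercase vowel, or None."""
    m = NOT_VOWEL.search(text)
    return m.group() if m else None


def has_whole_word_cat(text):
    """Return True if 'cat' appears with a word boundary on both sides."""
    return WHOLE_WORD_CAT.search(text) is not None


print(find_digit_runs("order 42, item 7"))  # ['42', '7']
print(first_non_vowel("aeixo"))             # x
print(has_whole_word_cat("a cat sat"))      # True
print(has_whole_word_cat("concatenate"))    # False
```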

Composites

xy – x followed immediately by y
x|y – x or y; x is preferred when both could match
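
The "prefer x" rule means that the order of the alternatives matters. The sketch below shows this using Python's built-in `re` module, which is assumed here as a stand-in for RE2 because it follows the same leftmost-first rule for alternation. `first_match` is a hypothetical helper.

```python
import re


def first_match(pattern, text):
    """Return the text of the first match of pattern in text, or None."""
    m = re.search(pattern, text)
    return m.group() if m else None


# Both alternatives could match at position 0; the one listed first wins.
print(first_match(r"ab|abc", "abc"))  # ab
print(first_match(r"abc|ab", "abc"))  # abc
```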

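Repetition

x* – zero or more x, preferring as many as possible
x+ – one or more x, preferring as many as possible
x? – zero or one x, preferring one
x{n,m} – between n and m x, preferring as many as possible
x{n,} – n or more x, preferring as many as possible
x{n} – exactly n x
x*? – zero or more x, preferring as few as possible
x+? – one or more x, preferring as few as possible
x?? – zero or one x, preferring zero
x{n,m}? – between n and m x, preferring as few as possible
x{n,}? – n or more x, preferring as few as possible
x{n}? – exactly n x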
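
The difference between the greedy forms and their `?`-suffixed non-greedy counterparts is easiest to see side by side. This is a small sketch using Python's built-in `re` module, assumed as a stand-in for RE2 since the two agree on these quantifiers. `matched_text` is a made-up helper name.

```python
import re


def matched_text(pattern, text):
    """Return the text of the first match of pattern in text, or None."""
    m = re.search(pattern, text)
    return m.group() if m else None


print(matched_text(r"<.+>", "<a><b>"))    # <a><b>  (greedy: as much as possible)
print(matched_text(r"<.+?>", "<a><b>"))   # <a>     (non-greedy: as little as possible)
print(matched_text(r"x{2,3}", "xxxx"))    # xxx
print(matched_text(r"x{2,3}?", "xxxx"))   # xx
print(repr(matched_text(r"x*?", "xxx")))  # ''      (zero x is enough)
```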

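Grouping

Use parentheses to group:

(re) – capturing group; groups are numbered from 1, and 0 refers to the entire match
- For example, (hello)(world) yields two groups, numbered 1 and 2
(?P<name>re) – named capturing group
- For example, matching ZhangSan with (?P<myName>\w+) sets myName to ZhangSan
(?:re) – non-capturing group
- For example, (?:hello)(world) yields only one group, (world), numbered 1, because the hello group is not captured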
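
The three grouping examples above can be reproduced with the sketch below. It assumes Python's built-in `re` module as a stand-in for RE2; the `(re)`, `(?P<name>re)`, and `(?:re)` forms are written the same way in both. The variable names are chosen for this illustration only.

```python
import re

# Numbered groups: group 0 is the whole match, groups 1 and 2 are the parts.
m = re.search(r"(hello)(world)", "helloworld")
numbered = (m.group(0), m.group(1), m.group(2))

# Named group: the captured text is available under the name myName.
named = re.search(r"(?P<myName>\w+)", "ZhangSan").group("myName")

# Non-capturing group: hello is matched but not captured, so world is group 1.
non_capturing = re.search(r"(?:hello)(world)", "helloworld").groups()

print(numbered)       # ('helloworld', 'hello', 'world')
print(named)          # ZhangSan
print(non_capturing)  # ('world',)
```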

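Flags

Other regular expression flavors call these modifiers.

i – case-insensitive matching
m – multi-line mode: ^ and $ also match at the start and end of each line, not only of the whole text
s – let the dot (.) also match \n
U – ungreedy mode: swaps the meaning of greedy and non-greedy repetitions, so x* prefers fewer and x*? prefers more

Set these flags with the (?FLAG) or (?FLAG:re) syntax; the parentheses here do not create a new group:

(?i)hello
(?i:hello)

Both expressions are case-insensitive, so HELLO, Hello, and hello all match.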
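
The sketch below exercises the i, m, and s flags through the inline syntax. It assumes Python's built-in `re` module as a stand-in for RE2; the two agree on these three flags, but Python has no U flag, so that one is not shown. `full_match` is a hypothetical helper.

```python
import re


def full_match(pattern, text):
    """Return True if pattern matches the whole of text."""
    return re.fullmatch(pattern, text) is not None


# i: both forms make hello case-insensitive.
print(full_match(r"(?i)hello", "HELLO"))   # True
print(full_match(r"(?i:hello)", "Hello"))  # True
print(full_match(r"hello", "Hello"))       # False

# (?FLAG:re) applies the flag only inside the parentheses.
print(full_match(r"(?i:h)ello", "Hello"))  # True
print(full_match(r"(?i:h)ello", "HELLO"))  # False

# s: the dot matches a newline only when s is set.
print(full_match(r"a.b", "a\nb"))          # False
print(full_match(r"(?s)a.b", "a\nb"))      # True

# m: ^ and $ match at line boundaries as well.
print(re.findall(r"^\w+$", "one\ntwo"))      # []
print(re.findall(r"(?m)^\w+$", "one\ntwo"))  # ['one', 'two']
```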

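Escaping

A backslash escapes a character so that it is treated as a literal rather than as regular expression syntax. For example, to match a file extension:

\.(php|asp|jsp|py)

The dot (.) has a special meaning in regular expressions, so it must be escaped with a backslash here.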
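
To see why the backslash matters, the sketch below compares the escaped pattern with an unescaped variant. It assumes Python's built-in `re` module as a stand-in for RE2, and the file names are invented for the illustration.

```python
import re

# The pattern from the text, with the dot escaped.
SCRIPT_EXT = re.compile(r"\.(php|asp|jsp|py)")
# The same pattern without the backslash: the dot now matches any character.
UNESCAPED = re.compile(r".(php|asp|jsp|py)")


def has_script_ext(name):
    """Return True if name contains a literal dot followed by a listed extension."""
    return SCRIPT_EXT.search(name) is not None


print(has_script_ext("index.php"))              # True
print(has_script_ext("my_php"))                 # False: no literal dot
print(UNESCAPED.search("my_php") is not None)   # True: "_php" is accepted by mistake
```

Note that the pattern is not anchored, so a name such as index.php.bak also matches; append $ if the extension must be at the end.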

Complete RE2 expression syntax

kinds of single-character expressions	examples
any character, possibly including newline (s=true)	`.`
character class	`[xyz]`
negated character class	`[^xyz]`
Perl character class	`\d`
negated Perl character class	`\D`
ASCII character class	`[[:alpha:]]`
negated ASCII character class	`[[:^alpha:]]`
Unicode character class (one-letter name)	`\pN`
Unicode character class	`\p{Greek}`
negated Unicode character class (one-letter name)	`\PN`
negated Unicode character class	`\P{Greek}`

	Composites
`xy`	`x` followed by `y`
`x|y`	`x` or `y` (prefer `x`)

	Repetitions
`x*`	zero or more `x`, prefer more
`x+`	one or more `x`, prefer more
`x?`	zero or one `x`, prefer one
`x{n,m}`	`n` or `n`+1 or … or `m` `x`, prefer more
`x{n,}`	`n` or more `x`, prefer more
`x{n}`	exactly `n` `x`
`x*?`	zero or more `x`, prefer fewer
`x+?`	one or more `x`, prefer fewer
`x??`	zero or one `x`, prefer zero
`x{n,m}?`	`n` or `n`+1 or … or `m` `x`, prefer fewer
`x{n,}?`	`n` or more `x`, prefer fewer
`x{n}?`	exactly `n` `x`
`x{}`	(≡ `x*`) (NOT SUPPORTED) VIM
`x{-}`	(≡ `x*?`) (NOT SUPPORTED) VIM
`x{-n}`	(≡ `x{n}?`) (NOT SUPPORTED) VIM
`x=`	(≡ `x?`) (NOT SUPPORTED) VIM

Implementation restriction: The counting forms `x{n,m}`, `x{n,}`, and `x{n}` reject forms that create a minimum or maximum repetition count above 1000. Unlimited repetitions are not subject to this restriction.

	Possessive repetitions
`x*+`	zero or more `x`, possessive (NOT SUPPORTED)
`x++`	one or more `x`, possessive (NOT SUPPORTED)
`x?+`	zero or one `x`, possessive (NOT SUPPORTED)
`x{n,m}+`	`n` or … or `m` `x`, possessive (NOT SUPPORTED)
`x{n,}+`	`n` or more `x`, possessive (NOT SUPPORTED)
`x{n}+`	exactly `n` `x`, possessive (NOT SUPPORTED)

	Grouping
`(re)`	numbered capturing group (submatch)
`(?P<name>re)`	named & numbered capturing group (submatch)
`(?<name>re)`	named & numbered capturing group (submatch)
`(?'name're)`	named & numbered capturing group (submatch) (NOT SUPPORTED)
`(?:re)`	non-capturing group
`(?flags)`	set flags within current group; non-capturing
`(?flags:re)`	set flags during re; non-capturing
`(?#text)`	comment (NOT SUPPORTED)
`(?|x|y|z)`	branch numbering reset (NOT SUPPORTED)
`(?>re)`	possessive match of `re` (NOT SUPPORTED)
`re@>`	possessive match of `re` (NOT SUPPORTED) VIM
`%(re)`	non-capturing group (NOT SUPPORTED) VIM

	Flags
`i`	case-insensitive (default false)
`m`	multi-line mode: `^` and `$` match begin/end line in addition to begin/end text (default false)
`s`	let `.` match `\n` (default false)
`U`	ungreedy: swap meaning of `x*` and `x*?`, `x+` and `x+?`, etc (default false)

Flag syntax is `xyz` (set) or `-xyz` (clear) or `xy-z` (set `xy`, clear `z`).
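
As a rough illustration of setting and clearing a flag, the sketch below turns on i for the whole pattern and clears it again for one letter. It assumes Python's built-in `re` module as a stand-in for RE2; Python accepts the same `(?-i:re)` form for a scoped group, although it is stricter than RE2 about where an unscoped `(?i)` may appear (only at the very start). `full_match` and `PATTERN` are hypothetical names.

```python
import re

# (?i) sets the i flag for the whole pattern; (?-i:b) clears it for b only.
PATTERN = r"(?i)a(?-i:b)c"


def full_match(pattern, text):
    """Return True if pattern matches the whole of text."""
    return re.fullmatch(pattern, text) is not None


print(full_match(PATTERN, "AbC"))  # True: a and c ignore case, b is lowercase
print(full_match(PATTERN, "ABC"))  # False: b must be lowercase
```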

	Empty strings
`^`	at beginning of text or line (`m`=true)
`$`	at end of text (like `\z` not `\Z`) or line (`m`=true)
`\A`	at beginning of text
`\b`	at ASCII word boundary (`\w` on one side and `\W`, `\A`, or `\z` on the other)
`\B`	not at ASCII word boundary
`\g`	at beginning of subtext being searched (NOT SUPPORTED) PCRE
`\G`	at end of last match (NOT SUPPORTED) PERL
`\Z`	at end of text, or before newline at end of text (NOT SUPPORTED)
`\z`	at end of text
`(?=re)`	before text matching `re` (NOT SUPPORTED)
`(?!re)`	before text not matching `re` (NOT SUPPORTED)
`(?<=re)`	after text matching `re` (NOT SUPPORTED)
`(?<!re)`	after text not matching `re` (NOT SUPPORTED)
`re&`	before text matching `re` (NOT SUPPORTED) VIM
`re@=`	before text matching `re` (NOT SUPPORTED) VIM
`re@!`	before text not matching `re` (NOT SUPPORTED) VIM
`re@<=`	after text matching `re` (NOT SUPPORTED) VIM
`re@<!`	after text not matching `re` (NOT SUPPORTED) VIM
`\zs`	sets start of match (= \K) (NOT SUPPORTED) VIM
`\ze`	sets end of match (NOT SUPPORTED) VIM
`\%^`	beginning of file (NOT SUPPORTED) VIM
`\%$`	end of file (NOT SUPPORTED) VIM
`\%V`	on screen (NOT SUPPORTED) VIM
`\%#`	cursor position (NOT SUPPORTED) VIM
`\%'m`	mark `m` position (NOT SUPPORTED) VIM
`\%23l`	in line 23 (NOT SUPPORTED) VIM
`\%23c`	in column 23 (NOT SUPPORTED) VIM
`\%23v`	in virtual column 23 (NOT SUPPORTED) VIM

	Escape sequences
`\a`	bell (≡ `\007`)
`\f`	form feed (≡ `\014`)
`\t`	horizontal tab (≡ `\011`)
`\n`	newline (≡ `\012`)
`\r`	carriage return (≡ `\015`)
`\v`	vertical tab character (≡ `\013`)
`\*`	literal `*`, for any punctuation character `*`
`\123`	octal character code (up to three digits)
`\x7F`	hex character code (exactly two digits)
`\x{10FFFF}`	hex character code
`\C`	match a single byte even in UTF-8 mode
`\Q...\E`	literal text `...` even if `...` has punctuation
`\1`	backreference (NOT SUPPORTED)
`\b`	backspace (NOT SUPPORTED) (use `\010`)
`\cK`	control char ^K (NOT SUPPORTED) (use `\001` etc)
`\e`	escape (NOT SUPPORTED) (use `\033`)
`\g1`	backreference (NOT SUPPORTED)
`\g{1}`	backreference (NOT SUPPORTED)
`\g{+1}`	backreference (NOT SUPPORTED)
`\g{-1}`	backreference (NOT SUPPORTED)
`\g{name}`	named backreference (NOT SUPPORTED)
`\g<name>`	subroutine call (NOT SUPPORTED)
`\g'name'`	subroutine call (NOT SUPPORTED)
`\k<name>`	named backreference (NOT SUPPORTED)
`\k'name'`	named backreference (NOT SUPPORTED)
`\lX`	lowercase `X` (NOT SUPPORTED)
`\ux`	uppercase `x` (NOT SUPPORTED)
`\L...\E`	lowercase text `...` (NOT SUPPORTED)
`\K`	reset beginning of `$0` (NOT SUPPORTED)
`\N{name}`	named Unicode character (NOT SUPPORTED)
`\R`	line break (NOT SUPPORTED)
`\U...\E`	upper case text `...` (NOT SUPPORTED)
`\X`	extended Unicode sequence (NOT SUPPORTED)
`\%d123`	decimal character 123 (NOT SUPPORTED) VIM
`\%xFF`	hex character FF (NOT SUPPORTED) VIM
`\%o123`	octal character 123 (NOT SUPPORTED) VIM
`\%u1234`	Unicode character 0x1234 (NOT SUPPORTED) VIM
`\%U12345678`	Unicode character 0x12345678 (NOT SUPPORTED) VIM

	Character class elements
`x`	single character
`A-Z`	character range (inclusive)
`\d`	Perl character class
`[:foo:]`	ASCII character class `foo`
`\p{Foo}`	Unicode character class `Foo`
`\pF`	Unicode character class `F` (one-letter name)

	Named character classes as character class elements
`[\d]`	digits (≡ `\d`)
`[^\d]`	not digits (≡ `\D`)
`[\D]`	not digits (≡ `\D`)
`[^\D]`	not not digits (≡ `\d`)
`[[:name:]]`	named ASCII class inside character class (≡ `[:name:]`)
`[^[:name:]]`	named ASCII class inside negated character class (≡ `[:^name:]`)
`[\p{Name}]`	named Unicode property inside character class (≡ `\p{Name}`)
`[^\p{Name}]`	named Unicode property inside negated character class (≡ `\P{Name}`)

	Perl character classes (all ASCII-only)
`\d`	digits (≡ `[0-9]`)
`\D`	not digits (≡ `[^0-9]`)
`\s`	whitespace (≡ `[\t\n\f\r ]`)
`\S`	not whitespace (≡ `[^\t\n\f\r ]`)
`\w`	word characters (≡ `[0-9A-Za-z_]`)
`\W`	not word characters (≡ `[^0-9A-Za-z_]`)
`\h`	horizontal space (NOT SUPPORTED)
`\H`	not horizontal space (NOT SUPPORTED)
`\v`	vertical space (NOT SUPPORTED)
`\V`	not vertical space (NOT SUPPORTED)

	ASCII character classes
`[[:alnum:]]`	alphanumeric (≡ `[0-9A-Za-z]`)
`[[:alpha:]]`	alphabetic (≡ `[A-Za-z]`)
`[[:ascii:]]`	ASCII (≡ `[\x00-\x7F]`)
`[[:blank:]]`	blank (≡ `[\t ]`)
`[[:cntrl:]]`	control (≡ `[\x00-\x1F\x7F]`)
`[[:digit:]]`	digits (≡ `[0-9]`)
`[[:graph:]]`	graphical (≡ `[!-~]` ≡ `` [A-Za-z0-9!"#$%&'()*+,\-./:;<=>?@[\\\]^_`{|}~] ``)
`[[:lower:]]`	lower case (≡ `[a-z]`)
`[[:print:]]`	printable (≡ `[ -~]` ≡ `[ [:graph:]]`)
`[[:punct:]]`	punctuation (≡ `` [!-/:-@[-`{-~] ``)
`[[:space:]]`	whitespace (≡ `[\t\n\v\f\r ]`)
`[[:upper:]]`	upper case (≡ `[A-Z]`)
`[[:word:]]`	word characters (≡ `[0-9A-Za-z_]`)
`[[:xdigit:]]`	hex digit (≡ `[0-9A-Fa-f]`)

	Unicode character class names–general category
`C`	other
`Cc`	control
`Cf`	format
`Cn`	unassigned code points (NOT SUPPORTED)
`Co`	private use
`Cs`	surrogate
`L`	letter
`LC`	cased letter (NOT SUPPORTED)
`L&`	cased letter (NOT SUPPORTED)
`Ll`	lowercase letter
`Lm`	modifier letter
`Lo`	other letter
`Lt`	titlecase letter
`Lu`	uppercase letter
`M`	mark
`Mc`	spacing mark
`Me`	enclosing mark
`Mn`	non-spacing mark
`N`	number
`Nd`	decimal number
`Nl`	letter number
`No`	other number
`P`	punctuation
`Pc`	connector punctuation
`Pd`	dash punctuation
`Pe`	close punctuation
`Pf`	final punctuation
`Pi`	initial punctuation
`Po`	other punctuation
`Ps`	open punctuation
`S`	symbol
`Sc`	currency symbol
`Sk`	modifier symbol
`Sm`	math symbol
`So`	other symbol
`Z`	separator
`Zl`	line separator
`Zp`	paragraph separator
`Zs`	space separator

Unicode character class names–scripts
`Adlam`
`Ahom`
`Anatolian_Hieroglyphs`
`Arabic`
`Armenian`
`Avestan`
`Balinese`
`Bamum`
`Bassa_Vah`
`Batak`
`Bengali`
`Bhaiksuki`
`Bopomofo`
`Brahmi`
`Braille`
`Buginese`
`Buhid`
`Canadian_Aboriginal`
`Carian`
`Caucasian_Albanian`
`Chakma`
`Cham`
`Cherokee`
`Chorasmian`
`Common`
`Coptic`
`Cuneiform`
`Cypriot`
`Cypro_Minoan`
`Cyrillic`
`Deseret`
`Devanagari`
`Dives_Akuru`
`Dogra`
`Duployan`
`Egyptian_Hieroglyphs`
`Elbasan`
`Elymaic`
`Ethiopic`
`Georgian`
`Glagolitic`
`Gothic`
`Grantha`
`Greek`
`Gujarati`
`Gunjala_Gondi`
`Gurmukhi`
`Han`
`Hangul`
`Hanifi_Rohingya`
`Hanunoo`
`Hatran`
`Hebrew`
`Hiragana`
`Imperial_Aramaic`
`Inherited`
`Inscriptional_Pahlavi`
`Inscriptional_Parthian`
`Javanese`
`Kaithi`
`Kannada`
`Katakana`
`Kawi`
`Kayah_Li`
`Kharoshthi`
`Khitan_Small_Script`
`Khmer`
`Khojki`
`Khudawadi`
`Lao`
`Latin`
`Lepcha`
`Limbu`
`Linear_A`
`Linear_B`
`Lisu`
`Lycian`
`Lydian`
`Mahajani`
`Makasar`
`Malayalam`
`Mandaic`
`Manichaean`
`Marchen`
`Masaram_Gondi`
`Medefaidrin`
`Meetei_Mayek`
`Mende_Kikakui`
`Meroitic_Cursive`
`Meroitic_Hieroglyphs`
`Miao`
`Modi`
`Mongolian`
`Mro`
`Multani`
`Myanmar`
`Nabataean`
`Nag_Mundari`
`Nandinagari`
`New_Tai_Lue`
`Newa`
`Nko`
`Nushu`
`Nyiakeng_Puachue_Hmong`
`Ogham`
`Ol_Chiki`
`Old_Hungarian`
`Old_Italic`
`Old_North_Arabian`
`Old_Permic`
`Old_Persian`
`Old_Sogdian`
`Old_South_Arabian`
`Old_Turkic`
`Old_Uyghur`
`Oriya`
`Osage`
`Osmanya`
`Pahawh_Hmong`
`Palmyrene`
`Pau_Cin_Hau`
`Phags_Pa`
`Phoenician`
`Psalter_Pahlavi`
`Rejang`
`Runic`
`Samaritan`
`Saurashtra`
`Sharada`
`Shavian`
`Siddham`
`SignWriting`
`Sinhala`
`Sogdian`
`Sora_Sompeng`
`Soyombo`
`Sundanese`
`Syloti_Nagri`
`Syriac`
`Tagalog`
`Tagbanwa`
`Tai_Le`
`Tai_Tham`
`Tai_Viet`
`Takri`
`Tamil`
`Tangsa`
`Tangut`
`Telugu`
`Thaana`
`Thai`
`Tibetan`
`Tifinagh`
`Tirhuta`
`Toto`
`Ugaritic`
`Vai`
`Vithkuqi`
`Wancho`
`Warang_Citi`
`Yezidi`
`Yi`
`Zanabazar_Square`

	Vim character classes
`\i`	identifier character (NOT SUPPORTED) VIM
`\I`	`\i` except digits (NOT SUPPORTED) VIM
`\k`	keyword character (NOT SUPPORTED) VIM
`\K`	`\k` except digits (NOT SUPPORTED) VIM
`\f`	file name character (NOT SUPPORTED) VIM
`\F`	`\f` except digits (NOT SUPPORTED) VIM
`\p`	printable character (NOT SUPPORTED) VIM
`\P`	`\p` except digits (NOT SUPPORTED) VIM
`\s`	whitespace character (≡ `[ \t]`) (NOT SUPPORTED) VIM
`\S`	non-white space character (≡ `[^ \t]`) (NOT SUPPORTED) VIM
`\d`	digits (≡ `[0-9]`) VIM
`\D`	not `\d` VIM
`\x`	hex digits (≡ `[0-9A-Fa-f]`) (NOT SUPPORTED) VIM
`\X`	not `\x` (NOT SUPPORTED) VIM
`\o`	octal digits (≡ `[0-7]`) (NOT SUPPORTED) VIM
`\O`	not `\o` (NOT SUPPORTED) VIM
`\w`	word character VIM
`\W`	not `\w` VIM
`\h`	head of word character (NOT SUPPORTED) VIM
`\H`	not `\h` (NOT SUPPORTED) VIM
`\a`	alphabetic (NOT SUPPORTED) VIM
`\A`	not `\a` (NOT SUPPORTED) VIM
`\l`	lowercase (NOT SUPPORTED) VIM
`\L`	not lowercase (NOT SUPPORTED) VIM
`\u`	uppercase (NOT SUPPORTED) VIM
`\U`	not uppercase (NOT SUPPORTED) VIM
`\_x`	`\x` plus newline, for any `x` (NOT SUPPORTED) VIM
`\c`	ignore case (NOT SUPPORTED) VIM
`\C`	match case (NOT SUPPORTED) VIM
`\m`	magic (NOT SUPPORTED) VIM
`\M`	nomagic (NOT SUPPORTED) VIM
`\v`	verymagic (NOT SUPPORTED) VIM
`\V`	verynomagic (NOT SUPPORTED) VIM
`\Z`	ignore differences in Unicode combining characters (NOT SUPPORTED) VIM

	Magic
`(?{code})`	arbitrary Perl code (NOT SUPPORTED) PERL
`(??{code})`	postponed arbitrary Perl code (NOT SUPPORTED) PERL
`(?n)`	recursive call to regexp capturing group `n` (NOT SUPPORTED)
`(?+n)`	recursive call to relative group `+n` (NOT SUPPORTED)
`(?-n)`	recursive call to relative group `-n` (NOT SUPPORTED)
`(?C)`	PCRE callout (NOT SUPPORTED) PCRE
`(?R)`	recursive call to entire regexp (≡ `(?0)`) (NOT SUPPORTED)
`(?&name)`	recursive call to named group (NOT SUPPORTED)
`(?P=name)`	named backreference (NOT SUPPORTED)
`(?P>name)`	recursive call to named group (NOT SUPPORTED)
`(?(cond)true|false)`	conditional branch (NOT SUPPORTED)
`(?(cond)true)`	conditional branch (NOT SUPPORTED)
`(*ACCEPT)`	make regexps more like Prolog (NOT SUPPORTED)
`(*COMMIT)`	(NOT SUPPORTED)
`(*F)`	(NOT SUPPORTED)
`(*FAIL)`	(NOT SUPPORTED)
`(*MARK)`	(NOT SUPPORTED)
`(*PRUNE)`	(NOT SUPPORTED)
`(*SKIP)`	(NOT SUPPORTED)
`(*THEN)`	(NOT SUPPORTED)
`(*ANY)`	set newline convention (NOT SUPPORTED)
`(*ANYCRLF)`	(NOT SUPPORTED)
`(*CR)`	(NOT SUPPORTED)
`(*CRLF)`	(NOT SUPPORTED)
`(*LF)`	(NOT SUPPORTED)
`(*BSR_ANYCRLF)`	set \R convention (NOT SUPPORTED) PCRE
`(*BSR_UNICODE)`	(NOT SUPPORTED) PCRE

This article was written entirely by 幻城; please avoid copying and pasting it directly.
