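Regular Expressions
Statements and objects that support regular expressions:
1. The FIND and REPLACE statements;
2. Functions: count, count_xxx, contains, find, find_xxx, match, matches, replace, substring, substring_xxx;
3. Classes: CL_ABAP_REGEX, CL_ABAP_MATCHER;
Regular expression syntax rules
Single Character Patterns
Single ordinary characters: any single character such as A-Z or 0-9, as well as special characters that a backslash (\) escapes into ordinary characters;
Special characters: \, ., [, ], -, ^ act as operators; - and ^ are special only inside [];
Example:
"regex:A string:a result: no match
"regex:AB string:A result: no match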
IF cl_abap_matcher=>matches( pattern = 'A' text = 'A' ) = abap_true.
WRITE:/ '1.true'.
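ENDIF.
"Special operator characters: ., [, ], -, ^
". stands for any single character;
"\ A backslash turns a special character into an ordinary character;
"\ followed by certain characters denotes a set of characters (cannot be used inside []):
"1.\C: alphabetic characters;
"2.\d: digits;
"3.\D: non-digits;
"4.\l: lowercase letters;
"5.\L: non-lowercase characters;
"6.\s: blank characters;
"7.\S: non-blank characters;
"8.\u: uppercase letters;
"9.\U: non-uppercase characters;
"10.\w: alphanumeric characters and the underscore;
"11.\W: everything except alphanumeric characters and the underscore;
"[] defines a character set; the pattern matches if the character is any one member of the set;
"[^x] negates the set; the pattern matches if the character is not a member of the set;
"[x-x] defines a range within a set, such as A-Z, a-z, 0-9;
"Character classes defined by ABAP (written inside a set, e.g. [[:digit:]])
"1.[:alnum:] alphanumeric characters;
"2.[:alpha:] letters;
"3.[:digit:] digits;
"4.[:blank:] blank characters and horizontal tabs;
"5.[:cntrl:] all control characters;
"6.[:graph:] all displayable characters except blanks and horizontal tabs;
"7.[:lower:] lowercase letters;
"8.[:print:] all displayable characters (the union of [:graph:] and [:blank:]);
"9.[:punct:] all punctuation characters;
"10.[:space:] all blank characters, tabs, and carriage returns;
"11.[:unicode:] all characters with a code greater than 255 (Unicode systems only);
"12.[:upper:] all uppercase letters;
"13.[:word:] all alphanumeric characters including the underscore _;
"14.[:xdigit:] all hexadecimal digits ("0"-"9", "A"-"F", and "a"-"f");
"Examples:
"regex:\. string:. result: match
"regex:\C string:A result: match
"regex:.. string:AB result: match
"regex:[ABC] string:A result: match
"regex:[AB][CD] string:AD result: match
"regex:[^A-Z] string:1 result: match
"regex:[A-Z-] string:- result: match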
IF cl_abap_matcher=>matches( pattern = '[A-Z-]' text = 'A' ) = abap_true.
WRITE:/ '2.true'.
ENDIF.
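The ABAP-defined classes listed above are written inside a bracket set, so a pattern reads [[:digit:]] rather than [:digit:]. The following is a minimal sketch of that usage; the variable name lv_class_ok is illustrative and does not come from the original examples.

```abap
"Sketch: named character classes inside a set; lv_class_ok is an illustrative name
DATA lv_class_ok TYPE abap_bool.

"[[:digit:]]+ : one or more digits
lv_class_ok = cl_abap_matcher=>matches( pattern = '[[:digit:]]+' text = '2024' ).
IF lv_class_ok = abap_true.
  WRITE:/ 'digits only'.
ENDIF.

"[[:upper:]][[:lower:]]+ : one uppercase letter followed by lowercase letters
lv_class_ok = cl_abap_matcher=>matches( pattern = '[[:upper:]][[:lower:]]+' text = 'Hello' ).
IF lv_class_ok = abap_true.
  WRITE:/ 'capitalized word'.
ENDIF.
```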
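Character string patterns
Single character patterns are concatenated and combined into patterns that match whole character strings.
Special characters: {, }, *, +, ?, (, ), |, \
Example:
"regex:h[ae]llo string:hello result: match;
"regex:h[ae]llo string:hallo result: match;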
IF cl_abap_matcher=>matches( pattern = 'h[ae]llo' text = 'hello' ) = abap_true.
WRITE:/ '3.true'.
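ENDIF.
"Special characters: {, }, *, +, ?, (, ), |, \
"x{n}: x occurs exactly n times;
"x{n,m}: x occurs n to m times;
"x*: x occurs {0,} times;
"x+: x occurs {1,} times;
"x?: x occurs {0,1} times;
"a|b: matches a or b;
"(): capturing group
"(?:xxx): group that is not stored in a register (non-capturing)
"\1, \2 refer to the captured groups, numbered from left to right
"Special characters between \Q and \E are treated as ordinary characters
"Examples:
"regex:hi{2}o string:hiio result: match
"regex:hi{1,3}o string:hiiio result: match
"regex:hi?o string:ho result: match
"regex:hi*o string:ho result: match
"regex:hi+o string:hio result: match
"regex:.{0,4} matches any string of 0 to 4 characters
"regex:a|bb|c string:bb result: match
"regex:h(a|b)o string:hao result: match
"regex:(a|b)(?:ac) string:bac result: match
"regex:(").*\1 string:"hi" result: match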
IF cl_abap_matcher=>matches( pattern = '(a|b)(?:ac)' text = 'bac' ) = abap_true.
WRITE:/ '4.true'.
ENDIF.
IF cl_abap_matcher=>matches( pattern = '(").*\1' text = '"hi"' ) = abap_true.
WRITE:/ '5.true'.
ENDIF.
DATA:TEXT type STRING.
DATA:result_tab TYPE match_result_tab.
DATA:wa_result_tab TYPE match_result.
text = 'aaaaaabaaaaaaacaaaa'.
FIND ALL OCCURRENCES OF REGEX '(a+)(a)' IN text RESULTS result_tab.
WRITE:/ text.
LOOP AT result_tab INTO wa_result_tab.
WRITE:/ wa_result_tab-line,wa_result_tab-offset,wa_result_tab-length.
ENDLOOP.
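The FIND example with '(a+)(a)' depends on how the two groups share a run of a characters. As a small sketch, SUBMATCHES makes the split visible on a shorter input; the names lv_greedy, lv_sub1, and lv_sub2 are illustrative only.

```abap
"Sketch: lv_greedy, lv_sub1 and lv_sub2 are illustrative names
DATA lv_greedy TYPE string.
DATA lv_sub1 TYPE string.
DATA lv_sub2 TYPE string.

lv_greedy = `aaab`.
FIND REGEX '(a+)(a)' IN lv_greedy SUBMATCHES lv_sub1 lv_sub2.
IF sy-subrc = 0.
  "a+ is greedy: it takes 'aa' and leaves one 'a' for the second group
  WRITE:/ lv_sub1, lv_sub2.
ENDIF.
```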
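Search Pattern
Matching the start and end of lines, strings, and words
Example
"Special characters: ^, $, \, (, ), =, !
"Example 1: Start and end of a line
"^ and $ anchor the match at the start and end of each line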
text = |Line1\nLine2\nLine3|.
FIND ALL OCCURRENCES OF REGEX '^'
IN text RESULTS result_tab.
WRITE:/ text.
LOOP AT result_tab INTO wa_result_tab.
WRITE:/ wa_result_tab-line,wa_result_tab-offset,wa_result_tab-length.
ENDLOOP.
FIND ALL OCCURRENCES OF REGEX '$'
IN text RESULTS result_tab.
WRITE:/ text.
LOOP AT result_tab INTO wa_result_tab.
WRITE:/ wa_result_tab-line,wa_result_tab-offset,wa_result_tab-length.
ENDLOOP.
"Example 2: Start and end of a character string
"\A and \z anchor the match at the start and end of the whole character string
DATA:t_text(10) TYPE c.
DATA:t_text_tab LIKE TABLE OF text.
APPEND ' Smile' TO t_text_tab.
APPEND ' Smile' TO t_text_tab.
APPEND ' Smile' TO t_text_tab.
APPEND ' Smile' TO t_text_tab.
APPEND ' Smile' TO t_text_tab.
APPEND ' Smile' TO t_text_tab.
FIND ALL OCCURRENCES OF regex '\A(?:Smile)|(?:Smile)\z'
IN TABLE t_text_tab RESULTS result_tab.
WRITE:/ 'Smile matches'.
LOOP AT result_tab INTO wa_result_tab.
WRITE:/ wa_result_tab-line,wa_result_tab-offset,wa_result_tab-length.
ENDLOOP.
"Example 3
"\z matches only at the very end of the string; \Z ignores trailing line breaks
text = |... this is the end\n\n\n|.
FIND REGEX 'end\z' IN text.
IF sy-subrc <> 0.
WRITE / `There's no end.`.
ENDIF.
FIND REGEX 'end\Z' IN text.
IF sy-subrc = 0.
WRITE / `The end is near the end.`.
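ENDIF.
"Example 4: Start and End of Word
"\< and \> match the start and the end of a word
"\b matches at either the start or the end of a word
"Find words that start with s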
text = `Sometimes snow seems so soft.`.
FIND ALL OCCURRENCES OF regex '\<s'
IN text IGNORING CASE
RESULTS result_tab.
WRITE:/ 'starts with s',text.
LOOP AT result_tab INTO wa_result_tab.
WRITE:/ wa_result_tab-line,wa_result_tab-offset,wa_result_tab-length.
ENDLOOP.
FIND ALL OCCURRENCES OF regex 's\b'
IN text IGNORING CASE
RESULTS result_tab.
WRITE:/ 'ends with s',text.
LOOP AT result_tab INTO wa_result_tab.
WRITE:/ wa_result_tab-line,wa_result_tab-offset,wa_result_tab-length.
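ENDLOOP.
"Example 5: Preview Condition
"The text matched by the preview condition (lookahead) is not part of the occurrence
"(?=x): the match succeeds only if x follows
"(?!x): the match succeeds only if x does not follow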
text = `Shalalala!`.
FIND ALL OCCURRENCES OF REGEX '(?:la)(?=!)'
IN text RESULTS result_tab.
WRITE:/ text.
"This finds the last 'la'; the '!' is not part of the occurrence
LOOP AT result_tab INTO wa_result_tab.
WRITE:/ wa_result_tab-line,wa_result_tab-offset,wa_result_tab-length.
ENDLOOP.
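Example 5 only exercises the positive condition (?=x). The sketch below runs both conditions on the same text and counts the occurrences; lv_song, lv_cnt_pos, and lv_cnt_neg are illustrative names, not part of the original listing.

```abap
"Sketch: lv_song and the two counters are illustrative names
DATA lv_song TYPE string.
DATA lv_cnt_pos TYPE i.
DATA lv_cnt_neg TYPE i.

lv_song = `Shalalala!`.
"'la' that is followed by '!'
FIND ALL OCCURRENCES OF REGEX 'la(?=!)' IN lv_song MATCH COUNT lv_cnt_pos.
"'la' that is not followed by '!'
FIND ALL OCCURRENCES OF REGEX 'la(?!!)' IN lv_song MATCH COUNT lv_cnt_neg.
WRITE:/ lv_cnt_pos, lv_cnt_neg.
```

The text contains 'la' three times, and only the last one is followed by '!', so the two counters split the occurrences between them.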
"Example 6: Cut operator
DATA:s_text TYPE string.
DATA:moff TYPE i.
DATA:mlen TYPE i.
s_text = `xxaabbaaaaxx`.
"Without the cut operator
FIND REGEX 'a+b+|[ab]+' IN s_text
MATCH OFFSET moff
MATCH LENGTH mlen.
WRITE:/ s_text.
IF sy-subrc = 0.
WRITE:/ moff.
WRITE:/ mlen.
WRITE:/ s_text+moff(mlen).
ENDIF.
"The cut operator (?>...) keeps the first alternative that matches and prevents backtracking into it
FIND REGEX '(?>a+b+|[ab]+)' IN s_text
MATCH OFFSET moff
MATCH LENGTH mlen.
WRITE:/ s_text.
IF sy-subrc = 0.
WRITE:/ moff.
WRITE:/ mlen.
WRITE:/ s_text+moff(mlen).
ENDIF.
FIND REGEX '(?>a+|a)a' IN s_text
MATCH OFFSET moff
MATCH LENGTH mlen.
WRITE:/ s_text.
IF sy-subrc <> 0.
WRITE:/ moff.
WRITE:/ mlen.
WRITE:/ 'Nothing found'.
ENDIF.
Replace Patterns
Replacing text with REPLACE
Example:
"The REPLACE statement replaces text
"Special characters: $, &, `, '
"Example 1: Addressing the Full Occurrence
text = `Yeah!`.
REPLACE REGEX `\w+` IN text WITH `$0,$&`.
WRITE:/ text.
"Example 2: Addressing the Registers of Subgroups
"Swaps the groups and returns `CBA'n'ABC`
text = `ABC'n'CBA`.
REPLACE REGEX `(\w+)(\W\w\W)(\w+)` IN text WITH `$3$2$1`.
WRITE:/ text.
"Example 3: Addressing the Text Before the Occurrence
text = `ABC and BCD`.
REPLACE REGEX 'and' IN text WITH '$0 $`'.
"Result: ABC and ABC  BCD (two blanks before BCD, because $` ends with a blank)
WRITE:/ text.
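The examples above cover $0, $&, the subgroup registers, and $`. For completeness, here is a small sketch of $', which stands for the text after the occurrence; lv_after is an illustrative name, and the square brackets in the replacement are ordinary characters chosen only to make the inserted text visible.

```abap
"Sketch: lv_after is an illustrative name
DATA lv_after TYPE string.

lv_after = `ab-cd`.
"$' stands for the text after the occurrence, here cd
REPLACE REGEX `-` IN lv_after WITH `[$']`.
WRITE:/ lv_after.
```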
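Simplified Regular Expressions
Simplified regular expression syntax
Example:
"Only the class CL_ABAP_REGEX supports the simplified syntax (parameter simple_regex)
"The simplified syntax does not support +, |, (?=), (?!), (?:);
"{} => \{\}
"() => \(\)
"Example 1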
DATA:lo_regex TYPE REF TO cl_abap_regex.
DATA:t_res TYPE match_result_tab.
DATA:wa_res TYPE match_result.
"Without the simplified syntax, + means the preceding character occurs {1,} times
CREATE OBJECT lo_regex
EXPORTING
pattern = 'a+'
ignore_case = abap_true "ignore case
simple_regex = abap_false.
FIND ALL OCCURRENCES OF REGEX lo_regex IN 'aaa+bbb' RESULTS t_res.
LOOP AT t_res INTO wa_res.
WRITE:/ wa_res-line,wa_res-offset,wa_res-length.
ENDLOOP.
"With the simplified syntax, + is an ordinary + character
CREATE OBJECT lo_regex
EXPORTING
pattern = 'a+'
simple_regex = abap_true.
FIND ALL OCCURRENCES OF REGEX lo_regex IN 'aaa+bbb' RESULTS t_res.
LOOP AT t_res INTO wa_res.
WRITE:/ wa_res-line,wa_res-offset,wa_res-length.
ENDLOOP.
Special Characters in Regular Expressions
"Special characters in replacement patterns
"\ Escape character for special characters
"$0, $& Placeholder for the whole found location
"$1, $2, $3... Placeholder for the registration of subgroups
"$` Placeholder for the text before the found location
"$' Placeholder for the text after the found location
Using Regular Expressions
The FIND and REPLACE statements
"Using the FIND and REPLACE statements
"FIND
"Syntax: FIND [{FIRST OCCURRENCE}|{ALL OCCURRENCES} OF] pattern
" IN [section_of] dobj
" [IN {CHARACTER|BYTE} MODE]
" [find_options].
"pattern = {[SUBSTRING] substring} | {REGEX regex}
"Searches for a substring or for a match of regex
"section_of = SECTION [OFFSET off] [LENGTH len] OF
"Restricts the search to a section of dobj; off is the start offset and len is the length of the section
"find_options = [{RESPECTING|IGNORING} CASE]
" [MATCH COUNT mcnt]
" { {[MATCH OFFSET moff]
" [MATCH LENGTH mlen]}
" | [RESULTS result_tab|result_wa] }
" [SUBMATCHES s1 s2 ...]
"mcnt: number of occurrences found; with FIRST OCCURRENCE it is always 1 when something is found
"moff: offset of the last occurrence; with FIRST OCCURRENCE, the offset of the first occurrence
"mlen: length of the last occurrence; with FIRST OCCURRENCE, the length of the first occurrence
"submatches: the substrings matched by the groups
"Example 1
DATA:s1 TYPE string.
DATA:s2 TYPE string.
text = `Hey hey, my my, Rock and roll can never die`.
FIND REGEX `(\w+)\W+\1\W+(\w+)\W+\2` IN text
IGNORING CASE
MATCH OFFSET moff
MATCH LENGTH mlen
SUBMATCHES s1 s2.
WRITE:/ moff,mlen,s1,s2.
"REPLACE
"Syntax:
"1. REPLACE [{FIRST OCCURRENCE}|{ALL OCCURRENCES} OF] pattern
" IN [section_of] dobj WITH new
" [IN {CHARACTER|BYTE} MODE]
" [replace_options].
"replace_options = [{RESPECTING|IGNORING} CASE]
" [REPLACEMENT COUNT rcnt]
" {{[REPLACEMENT OFFSET roff][REPLACEMENT LENGTH rlen]}
" |[RESULTS result_tab|result_wa]}
"2. REPLACE SECTION [OFFSET off] [LENGTH len] OF dobj WITH new
" [IN {CHARACTER|BYTE} MODE].
text = 'hello1 world!22'.
REPLACE
ALL OCCURRENCES OF
REGEX '[0-9]'
IN SECTION OFFSET 0 LENGTH 10 OF text
WITH '!'.
WRITE:/ text.
"Replace a section given by its position
REPLACE SECTION OFFSET 10 LENGTH 5 OF text WITH '!'.
WRITE:/ text.
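The syntax summaries above list RESULTS and REPLACEMENT COUNT without showing them in use. The sketch below is one way to read a group out of a RESULTS structure and to count replacements; every lv_ and ls_ name is illustrative, and the sample strings are made up for the demonstration.

```abap
"Sketch: all lv_/ls_ names below are illustrative
DATA lv_range TYPE string.
DATA ls_hit TYPE match_result.
DATA ls_group TYPE submatch_result.
DATA lv_from TYPE string.
DATA lv_spaced TYPE string.
DATA lv_rcnt TYPE i.

"FIND ... RESULTS: read the first group from the result structure
lv_range = `range 10-25`.
FIND REGEX `(\d+)-(\d+)` IN lv_range RESULTS ls_hit.
IF sy-subrc = 0.
  READ TABLE ls_hit-submatches INDEX 1 INTO ls_group.
  lv_from = lv_range+ls_group-offset(ls_group-length).
  WRITE:/ lv_from.
ENDIF.

"REPLACE ... REPLACEMENT COUNT: count how many replacements were made
lv_spaced = `a   b    c`.
REPLACE ALL OCCURRENCES OF REGEX `\s+` IN lv_spaced WITH ` ` REPLACEMENT COUNT lv_rcnt.
WRITE:/ lv_spaced, lv_rcnt.
```

RESULTS returns offsets and lengths rather than text, which is why the group is cut out of the source string with an offset/length access.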
Using functions
Built-in functions that accept regular expressions include find, count, match, and others.
Example:
"find
"Returns the offset of the match
"Syntax:
"1.find( val = text {sub = substring}|{regex = regex}[case = case][off = off] [len = len] [occ = occ] )
"2.find_end( val = text regex = regex [case = case][off = off] [len = len] [occ = occ] )
"3.find_any_of( val = text sub = substring [off = off] [len = len] [occ = occ] )
"4.find_any_not_of( val = text sub = substring [off = off] [len = len] [occ = occ] )
"occ selects which occurrence is returned; a positive value searches from left to right, a negative value from right to left
"Example
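DATA:mocc TYPE I VALUE 1.
DATA:result TYPE I.
text = 'hello world world'.
"Search the whole string: reset the offset and length left over from earlier examples
moff = 0.
mlen = strlen( text ).
result = find( val = text sub = 'wo' case = abap_true off = moff len = mlen occ = mocc ).
WRITE:/ text,result,moff,mlen,mocc.
"count
"Returns the number of matches
"Syntax: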
"1.count( val = text {sub = substring}|{regex = regex} [case = case][off = off] [len = len] )
"2.count_any_of( val = text sub = substring [off = off] [len = len] )
"3.count_any_not_of( val = text sub = substring [off = off] [len = len] )
result = count( val = text sub = 'wo' case = abap_true off = moff len = mlen ).
WRITE:/ text,result,moff,mlen.
"match
"Returns the substring that matches
"Syntax:
"match( val = text regex = regex [case = case] [occ = occ] )
DATA:s_result TYPE string.
s_result = match( val = text regex = 'wor' case = abap_true occ = 1 ).
WRITE:/ s_result.
"contains
"Returns whether the string contains the substring or pattern (boolean)
"1.contains( val = text sub|start|end = substring [case = case][off = off] [len = len] [occ = occ] )
"2.contains( val = text regex = regex [case = case][off = off] [len = len] [occ = occ] )
"3.contains_any_of( val = text sub|start|end = substring [off = off] [len = len] [occ = occ] )
"4.contains_any_not_of( val = text sub|start|end = substring [off = off] [len = len] [occ = occ] )
"off: start offset of the search
"len: length of the search range, counted from off
"occ: required number of occurrences; returns false if the substring occurs fewer than occ times
"case: case sensitivity
text = 'abcdef egg help'.
IF contains( val = text sub = 'e' case = abap_true off = 0 len = 15 occ = 2 ).
WRITE:/ 'contains: match found'.
ENDIF.
"matches
"Returns whether the whole string matches the pattern (boolean)
"Syntax: matches( val = text regex = regex [case = case] [off = off] [len = len] ) ...
"Example:
text = '33340@334.com'.
"Validate an e-mail address
IF matches( val = text
regex = `\w+(\.\w+)*@(\w+\.)+((\l|\u){2,4})` ).
MESSAGE 'Format OK' TYPE 'S'.
ELSEIF matches(
val = text
regex = `[[:alnum:],!#\$%&'\*\+/=\?\^_``\{\|}~-]+` &
`(\.[[:alnum:],!#\$%&'\*\+/=\?\^_``\{\|}~-]+)*` &
`@[[:alnum:]-]+(\.[[:alnum:]-]+)*` &
`\.([[:alpha:]]{2,})` ).
MESSAGE 'Syntax OK but unusual' TYPE 'S' DISPLAY LIKE 'W'.
ELSE.
MESSAGE 'Wrong Format' TYPE 'S' DISPLAY LIKE 'E'.
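ENDIF.
"replace
"Replaces the section of the string given by off and len
"1.replace( val = text [off = off] [len = len] with = new )
"If off is given and len = 0, new is inserted at off;
"If len is given and off = 0, the first len characters are replaced;
"If off equals the length of the string and len = 0, new is appended to the string;
"Replaces the substrings that match sub or regex
"2.replace( val = text {sub = substring}|{regex = regex} with = new [case = case] [occ = occ] )
"occ selects which occurrence is replaced (0 replaces all occurrences)
"Example: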
text = 'hello world! welcome china!'.
text = replace( val = text off = 0 len = 5 with = 'hi' ).
WRITE:/ 'replace:',text.
"Only the first '!' is replaced here
text = replace( val = text sub = '!' with = '.' case = abap_true occ = 1 ).
WRITE:/ 'replace:',text.
"substring
"Returns a substring
"1.substring( val = text [off = off] [len = len] )
"2.substring_from( val = text {sub = substring}|{regex = regex}[case = case] [occ = occ] [len = len] )
"3.substring_after( val = text {sub = substring}|{regex = regex}[case = case] [occ = occ] [len = len] )
"4.substring_before( val = text {sub = substring}|{regex = regex}[case = case] [occ = occ] [len = len] )
"5.substring_to( val = text {sub = substring}|{regex = regex}[case = case] [occ = occ] [len = len] )
text = 'ABCDEFGHJKLMN'.
text = substring( val = text off = 0 len = 10 ).
WRITE:/ 'substring:',text.
"Returns ABCDE: the substring starting at the match; len limits the returned length
text = 'ABCDEFGHJKLMN'.
text = substring_from( val = text sub = 'ABCDEF' case = abap_true occ = 1 len = 5 ).
WRITE:/ 'substring:',text.
"Returns DEFGH: the len characters that follow the match
text = 'ABCDEFGHJKLMN'.
text = substring_after( val = text sub = 'ABC' case = abap_true occ = 1 len = 5 ).
WRITE:/ 'substring:',text.
"Returns DEFGH: the len characters that precede the match
text = 'ABCDEFGHJKLMN'.
text = substring_before( val = text sub = 'JKL' case = abap_true occ = 1 len = 5 ).
WRITE:/ 'substring:',text.
"Returns GHJKL: the len characters up to and including the match
text = 'ABCDEFGHJKLMN'.
text = substring_to( val = text sub = 'JKL' case = abap_true occ = 1 len = 5 ).
WRITE:/ 'substring:',text.
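The substring examples above all pass sub, although the syntax also allows regex. As a brief sketch of the regex form, the following splits an address at the @ sign; lv_mail, lv_user, and lv_domain are illustrative names and the address is a made-up sample.

```abap
"Sketch: lv_mail, lv_user and lv_domain are illustrative names
DATA lv_mail TYPE string.
DATA lv_user TYPE string.
DATA lv_domain TYPE string.

lv_mail = `user@example.com`.
"Everything before the first match of the regex
lv_user = substring_before( val = lv_mail regex = `@` ).
"Everything after the first match of the regex
lv_domain = substring_after( val = lv_mail regex = `@` ).
WRITE:/ lv_user, lv_domain.
```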
Using cl_abap_regex and cl_abap_matcher
The class cl_abap_regex creates a regular expression object; cl_abap_matcher uses it to match, find, and replace.
Example:
"CL_ABAP_REGEX
"CL_ABAP_MATCHER
DATA:lo_matcher TYPE REF TO cl_abap_matcher.
DATA:ls_match TYPE match_result.
DATA:lv_match TYPE C LENGTH 1.
"Call the static method matches of cl_abap_matcher directly
IF cl_abap_matcher=>matches( pattern = 'ABC.*' text = 'ABCDABCE' ) = abap_true.
"Get the matcher object created by the static call
lo_matcher = cl_abap_matcher=>get_object( ).
"Get the match result
ls_match = lo_matcher->get_match( ).
"Attributes of cl_abap_matcher
"text: the text being searched
"table: the table being searched
"regex: the regular expression object
WRITE:/ 'cl_abap_matcher:',lo_matcher->text,ls_match-offset,ls_match-length.
ENDIF.
"Create a matcher object, then match
lo_matcher = cl_abap_matcher=>create( pattern = 'A.*'
ignore_case = abap_true
text = 'ABC' ).
"Match result: 'X' on a match, blank otherwise
lv_match = lo_matcher->match( ).
WRITE:/ 'cl_abap_matcher:',lv_match.
"Create a cl_abap_regex object for the regular expression
"Create the matcher from the regex object
"DATA: lo_regex TYPE REF TO cl_abap_regex. (already declared above)
CREATE OBJECT lo_regex EXPORTING pattern = '^add.*' ignore_case = abap_true.
lo_matcher = lo_regex->create_matcher( text = 'addition' ).
lv_match = lo_matcher->match( ).
WRITE:/'cl_abap_matcher:',lv_match.
"Create the matcher object with its constructor
DATA:t_result_tab TYPE MATCH_RESULT_TAB.
DATA:s_result_tab TYPE MATCH_RESULT.
CREATE OBJECT lo_regex EXPORTING pattern = 'A'.
CREATE OBJECT lo_matcher EXPORTING REGEX = lo_regex TEXT = 'ABCDABCD'.
t_result_tab = lo_matcher->find_all( ).
LOOP AT t_result_tab INTO s_result_tab.
WRITE:/ 'find_all:',s_result_tab-offset,s_result_tab-length.
ENDLOOP.
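The matcher examples above read offsets and lengths only. When the pattern contains groups, the matched group text can also be read from the matcher; the sketch below assumes the get_submatch method of cl_abap_matcher for that, and lo_date_matcher, lv_year, and lv_month are illustrative names.

```abap
"Sketch: lo_date_matcher, lv_year and lv_month are illustrative names
DATA lo_date_matcher TYPE REF TO cl_abap_matcher.
DATA lv_year TYPE string.
DATA lv_month TYPE string.

lo_date_matcher = cl_abap_matcher=>create( pattern = `(\d{4})-(\d{2})`
                                           text    = `2024-05` ).
IF lo_date_matcher->match( ) = abap_true.
  "Submatches are numbered like the groups, starting at 1
  lv_year  = lo_date_matcher->get_submatch( 1 ).
  lv_month = lo_date_matcher->get_submatch( 2 ).
  WRITE:/ lv_year, lv_month.
ENDIF.
```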