Python Regular Expressions (2)

Date: 2019-11-06  Tags: python, regular expressions, 2

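Some predefined character sets can be written more compactly with escape codes. re recognizes three pairs of them, six in total: each pair is one letter in lowercase and uppercase, and the two cases have opposite meanings.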

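\d : a digit
\D : a non-digit
\w : a word character (letter, digit, or underscore)
\W : a non-word character (anything \w does not match)
\s : whitespace (tab, space, newline, etc.)
\S : non-whitespace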
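As a quick illustration of the six escape codes, here is a minimal sketch that runs each one over a made-up sample string. The string and the variable names are assumptions chosen for the demonstration, not part of the original post.

```python
import re

# Hypothetical sample text, chosen only for illustration.
sample = "ab 12_c\t!"

digits = re.findall(r'\d', sample)        # ['1', '2']
non_digits = re.findall(r'\D+', sample)   # ['ab ', '_c\t!']
word_chars = re.findall(r'\w+', sample)   # ['ab', '12_c']  (underscore counts as \w)
non_word = re.findall(r'\W+', sample)     # [' ', '\t!']
spaces = re.findall(r'\s', sample)        # [' ', '\t']
non_space = re.findall(r'\S+', sample)    # ['ab', '12_c', '!']
```

Each uppercase code matches exactly the characters its lowercase partner rejects, which is why the \d and \D results together cover the whole string.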

To specify where in the text a match must occur, use anchors, which are written much like escape codes.

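^ start of the string or of a line
$ end of the string or of a line
\A start of the string
\Z end of the string
\b empty string at the beginning or end of a word
\B empty string not at the beginning or end of a word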

import re
the_str = "This is some text -- with punctuation"
re.search(r'^\w+', the_str).group(0)       # 'This'
re.search(r'\A\w+', the_str).group(0)      # 'This'
re.search(r'\w+\S*$', the_str).group(0)    # 'punctuation'
re.search(r'\w+\S*\Z', the_str).group(0)   # 'punctuation'
re.search(r'\w*t\W*', the_str).group(0)    # 'text -- ' (includes the trailing space)
re.search(r'\bt\w+', the_str).group(0)     # 'text'
re.search(r'\Bt*\B', the_str).group(0)     # '' (t* may match zero characters, so this is an empty match, not a failure)
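The difference between ^ / $ and \A / \Z only shows up on multi-line text. The following is a minimal sketch using a made-up two-line string (an assumption for illustration) together with the standard re.MULTILINE flag.

```python
import re

# Hypothetical two-line sample text.
text = "first line\nsecond line"

# Without re.MULTILINE, ^ matches only at the start of the string.
starts_default = re.findall(r'^\w+', text)                    # ['first']

# With re.MULTILINE, ^ and $ also match at the start and end of each line.
starts_multiline = re.findall(r'^\w+', text, re.MULTILINE)    # ['first', 'second']
ends_multiline = re.findall(r'\w+$', text, re.MULTILINE)      # ['line', 'line']

# \A and \Z always refer to the whole string, even with re.MULTILINE.
starts_whole = re.findall(r'\A\w+', text, re.MULTILINE)       # ['first']
ends_whole = re.findall(r'\w+\Z', text, re.MULTILINE)         # ['line']
```

When the input is a single line, as in the example above, the two kinds of anchor behave the same.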

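Groups let you dissect a match. Put simply, parentheses () inside a regular expression split the pattern into groups, and the group() function retrieves what a given group matched. Group 0 is the text matched by the whole expression; groups are then numbered from 1, left to right, one per pair of parentheses. groups() returns the matches of all groups at once.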

import re  
the_str = "--aabb123bbaa"  
pattern = r'(\W+)([a-z]+)(\d+)(\D+)'  
match = re.search(pattern, the_str)    
match.groups()    # ('--', 'aabb', '123', 'bbaa') 
match.group(0)    # '--aabb123bbaa'  
match.group(1)    # '--'  
match.group(2)    # 'aabb'  
match.group(3)    # '123'  
match.group(4)    # 'bbaa'

Python extends the group syntax so that each group can be given a name and then retrieved by that name. Syntax: (?P<name>pattern). groupdict() returns a dictionary keyed by group name.

import re  
the_str = "--aabb123bbaa"  
pattern = r'(?P<not_al_and_num>\W+)(?P<al>[a-z]+)(?P<num>\d+)(?P<not_num>\D+)'  
match = re.search(pattern, the_str)    
match.groups()    # ('--', 'aabb', '123', 'bbaa')  
match.groupdict() # {'not_al_and_num': '--', 'al': 'aabb', 'num': '123', 'not_num': 'bbaa'}
match.group(0)                    # '--aabb123bbaa'  
match.group(1)                    # '--'  
match.group(2)                    # 'aabb'  
match.group(3)                    # '123'  
match.group(4)                    # 'bbaa'   
match.group('not_al_and_num')    # '--'
match.group('al')                 # 'aabb'  
match.group('num')                # '123'
match.group('not_num')            # 'bbaa'

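Note that the group() calls above only work when there is a match. If there is none, re.search() returns None, and calling group() on it raises an AttributeError. So whenever a match is not guaranteed but you need to output the result, you must check it first.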
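One way to perform that check is to test the result of re.search() before calling group(). This is a minimal sketch; the second pattern and the variable names are assumptions picked so that one search succeeds and one fails.

```python
import re

the_str = "--aabb123bbaa"

# re.search() returns a match object on success, so group() is safe here.
match = re.search(r'\d+', the_str)
digits = match.group(0) if match else None             # '123'

# No uppercase letters in the_str, so re.search() returns None.
no_match = re.search(r'[A-Z]+', the_str)
uppercase = no_match.group(0) if no_match else None    # None

# Calling group() without the check fails on a None result.
try:
    re.search(r'[A-Z]+', the_str).group(0)
    error_name = None
except AttributeError as exc:
    error_name = type(exc).__name__                    # 'AttributeError'
```

A match object is always truthy, so a plain `if match:` is enough to tell the two cases apart.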

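re supports a number of flags, passed through the optional flags argument that search(), compile(), and related functions accept. Flags enable special behavior such as ignoring case or searching across multiple lines.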

import re  
the_str = "this Text"  
re.findall(r'\bt\w+', the_str)   # ['this']  
re.findall(r'\bt\w+', the_str, re.IGNORECASE) # ['this', 'Text']
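Flags can also be combined with | and handed to compile(). The sketch below assumes a made-up two-line string and pairs re.IGNORECASE with re.MULTILINE; it is only meant to show the mechanics.

```python
import re

# Hypothetical two-line sample text.
text = "this Text\nThat thing"

# Combine flags with | and pass them through the flags argument of compile().
pattern = re.compile(r'^t\w+', flags=re.IGNORECASE | re.MULTILINE)
both_flags = pattern.findall(text)                        # ['this', 'That']

# With re.IGNORECASE alone, ^ only matches at the start of the string.
case_only = re.findall(r'^t\w+', text, re.IGNORECASE)     # ['this']
```

Compiling once is convenient when the same pattern and flags are reused across several calls.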

There are many more search options; see the documentation for details: http://docs.python.org/2/library/re.html#module-re