Summary of Commonly Used Python Regular Expression Functions

2020-01-04 16:54:10

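Source: Reposted

Contributor: Community member

This article summarizes the commonly used functions for Python regular expressions, with examples, for your reference. Details follow: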

re.match()

Function prototype:

match(pattern, string, flags=0)
Try to apply the pattern at the start of the string,
returning a match object, or None if no match was found.

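Purpose:

re.match tries to match a pattern starting at the beginning of the string. If the match succeeds, it returns a match object; otherwise it returns None.

Parameters:

pattern: the regular expression to match
string: the string to match against
flags: flags that control how the pattern is matched, such as case-insensitive or multi-line matching.

The result can be retrieved with the match object's group() or groups() method.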

group()

group(...)
    group([group1, ...]) -> str or tuple.
    Return subgroup(s) of the match by indices or names.
    For 0 returns the entire match.

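Returns the string captured by one or more groups; when several arguments are given, the result is a tuple. group1 may be a group number or a group name. Number 0 stands for the entire matched substring, and calling group() with no arguments is the same as group(0). A group that captured nothing returns None, and a group that captured several times returns its last capture.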

groups()

groups(...)
    groups([default=None]) -> tuple.
    Return a tuple containing all the subgroups of the match, from 1.
    The default argument is used for groups
    that did not participate in the match

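Returns the strings captured by all groups as a tuple, equivalent to calling group(1, 2, ..., last). Groups that captured nothing are replaced by the default value, which is None unless another default is passed.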
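
The behaviour of group() and groups() described above can be tried out with a short sketch. The date-like sample string and the group names year, month and day are made up for illustration and are not part of the original article; the code is written for Python 3.

```python
import re

# Hypothetical sample: the "-day" part is optional, so its groups may not participate.
m = re.match(r'(?P<year>\d{4})-(?P<month>\d{2})(-(?P<day>\d{2}))?', '2020-01')

whole = m.group()                       # same as m.group(0): the entire match
year_month = m.group('year', 2)         # a name and a number mixed, returned as a tuple
missing = m.group('day')                # the group captured nothing, so this is None
all_groups = m.groups()                 # groups 1..last, None for unmatched groups
with_default = m.groups(default='??')   # unmatched groups take the supplied default

# A group that captures repeatedly keeps only its last capture.
last_capture = re.match(r'(\d)+', '123').group(1)

print(whole, year_month, missing, all_groups, with_default, last_capture)
```

Note that the optional outer parentheses count as a group of their own, which is why groups() returns four items here.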

Example

import re

line = "This is the last one"
res = re.match(r'(.*) is (.*?) .*', line, re.M | re.I)
if res:
    print("res.group() : ", res.group())
    print("res.group(1) : ", res.group(1))
    print("res.group(2) : ", res.group(2))
    print("res.groups() : ", res.groups())
else:
    print("No match!!")

re.M|re.I: these two flags enable multi-line matching and case-insensitive matching at the same time.
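
To see what each flag changes, here is a minimal sketch that compares results with and without re.I and re.M. The two-line sample text and the variable names are assumptions chosen for illustration.

```python
import re

text = "first line\nSECOND line"   # hypothetical two-line sample

# Without re.I the lowercase pattern cannot match the uppercase word.
case_sensitive = re.search(r'second', text)
case_insensitive = re.search(r'second', text, re.I)

# Without re.M, ^ anchors only at the very start of the string;
# with re.M it also anchors right after each newline.
starts_default = re.findall(r'^\w+', text)
starts_multiline = re.findall(r'^\w+', text, re.M)

# Flags are combined with the bitwise OR operator.
both = re.findall(r'^[a-z]+', text, re.M | re.I)

print(case_sensitive, case_insensitive.group(), starts_default, starts_multiline, both)
```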

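Detailed examples:

>>> re.match(r'.*', '.*g3jl\nok').group()
'.*g3jl'

. (dot) matches any single character except a newline, and * (asterisk) matches the preceding item zero or more times. Together they match any number of characters other than a newline, which is why the match above stops just before \n.

Example 1:
>>> re.match(r'.*..', '..').group()
'..'
Example 2:
>>> re.match(r'.*g.', '.*g3jlok').group()
'.*g3'
Example 3:
>>> re.match(r'.*...', '..').group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'

Why do the first two examples produce a result? In the first, .* ends up matching zero characters and the trailing .. matches the two dots in the string. In the second, .* matches the two characters .* at the start of the string, g matches the g that follows, and the final dot matches 3.
Why does the third example fail? Even when .* matches zero characters, the three remaining dots need three characters and the string has only two, so the match fails.
These examples show that a pattern cannot match unless the minimum number of characters it requires is no greater than the length of the string. They also show how .* backtracks: it first consumes as many characters as it can, and if the rest of the pattern then fails, it gives back one character at a time and retries until the whole pattern matches or no option is left. The result is the longest stretch .* can take while the overall pattern still succeeds, which is what makes .* greedy.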
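
The greedy behaviour of .* is easier to see next to its non-greedy counterpart .*?. The following sketch uses made-up sample strings; .*? is standard re syntax, but it is not covered in the text above and is shown here only for contrast.

```python
import re

s = '<a><b><c>'   # hypothetical sample

# Greedy: .* takes as much as possible, then gives back only what is needed.
greedy = re.match(r'<.*>', s).group()

# Non-greedy: .*? takes as little as possible, growing only when needed.
lazy = re.match(r'<.*?>', s).group()

# Backtracking in action: (.*) first grabs 'abcd', then gives back
# characters one at a time until (..) can match the last two.
greedy_groups = re.match(r'(.*)(..)', 'abcd').groups()
lazy_groups = re.match(r'(.*?)(..)', 'abcd').groups()

print(greedy, lazy, greedy_groups, lazy_groups)
```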

Matching a Python identifier:

>>> re.match(r'^[a-zA-Z_]\w*', '_1name1').group()
'_1name1'
>>> re.match(r'^[a-zA-Z_]\w*', '_name1').group()
'_name1'
>>> re.match(r'^[a-zA-Z_]\w*', 'num').group()
'num'
>>> re.match(r'^[a-zA-Z_]\w*', '1num').group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
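
Because re.match only anchors at the start, an identifier pattern like the one above also accepts strings that merely begin with a valid identifier. One possible way to validate the whole string is re.fullmatch (available since Python 3.4). The helper name is_identifier is hypothetical, and this sketch does not reject reserved keywords such as class.

```python
import re

# Assumed simplification: a letter or underscore followed by word characters.
IDENTIFIER = re.compile(r'[A-Za-z_]\w*')

def is_identifier(name):
    """Return True only if the entire string looks like an identifier."""
    return IDENTIFIER.fullmatch(name) is not None

# re.match accepts a valid prefix even when the rest of the string is invalid.
prefix_only = re.match(r'^[a-zA-Z_]\w*', 'num-1').group()

print(is_identifier('_1name1'), is_identifier('1num'), is_identifier('num-1'), prefix_only)
```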

re.search()

Function prototype:

search(pattern, string, flags=0)
Scan through string looking for a match to the pattern,
returning a match object, or None if no match was found.

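Purpose:

Scans the entire string and returns a match object for the first location where the pattern matches; returns None if there is no match.

Parameters:

pattern: the regular expression to match
string: the string to match against
flags: flags that control how the pattern is matched, such as case-insensitive or multi-line matching.

As with re.match, use the group() and groups() methods to retrieve the result.

>>> re.search(r'[abc]\*\d{2}', '12a*23Gb*12ad').group()
'a*23'

As the result shows, re.search returns only the first successful match, 'a*23', even though the later 'b*12' also fits the pattern.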

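Difference between re.match and re.search

re.match only matches at the beginning of the string; if the beginning does not fit the pattern, the match fails and the function returns None. re.search scans through the whole string until it finds a match, and returns None only if there is no match anywhere.

>>> re.match(r'(.*)(are)', "Cats are smarter than dogs").group(2)
'are'
>>> re.search(r'(are)+', "Cats are smarter than dogs").group()
'are'

Both examples extract the same substring, 'are'. With re.match the pattern needs the leading (.*) to consume the text in front of it, while re.search finds it directly.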
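
The difference becomes clearer when the same bare pattern is passed to both functions. This is a small sketch reusing the sentence from the example above; the variable names are illustrative.

```python
import re

s = "Cats are smarter than dogs"

# re.match anchors at position 0, so a pattern that starts later fails.
match_result = re.match(r'are', s)

# re.search scans forward and finds the first occurrence anywhere.
search_span = re.search(r'are', s).span()

# Adding ^ to a search pattern makes it behave like match for this string.
anchored_search = re.search(r'^are', s)

# match succeeds when the pattern fits the very beginning.
match_start = re.match(r'Cats', s).group()

print(match_result, search_span, anchored_search, match_start)
```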

re.sub()

Python's re module provides re.sub() for replacing matches in a string. If the pattern matches nothing, the string is returned unchanged.

Function prototype:

sub(pattern, repl, string, count=0, flags=0)
    Return the string obtained by replacing the leftmost
    non-overlapping occurrences of the pattern in string by the
    replacement repl. repl can be either a string or a callable;
    if a string, backslash escapes in it are processed. If it is
    a callable, it's passed the match object and must return
    a replacement string to be used.

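Parameters:

pattern: the regular expression to match
repl: the replacement, either a string or a callable
string: the string in which replacements are made
count: the maximum number of replacements. 0 (the default) replaces every match, 1 replaces only the first, and so on; it must be a non-negative integer.
flags: flags that control how the pattern is matched, such as case-insensitive or multi-line matching.

Examples

Replace the last 4 digits of a phone number with 0:

>>> re.sub(r'\d{4}$', '0000', '13549876489')
'13549870000'

Strip the trailing comment from a line of code:

>>> re.sub('#.*$', '', 'num = 0 #a number')
'num = 0 '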
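
The prototype also allows repl to be a callable and count to limit the number of replacements. The sketch below illustrates both, plus a backreference in the replacement string; the helper name double and the sample strings are assumptions for illustration.

```python
import re

def double(match):
    """Hypothetical helper: receives the match object, returns the replacement text."""
    return str(int(match.group()) * 2)

# repl as a callable: each number is replaced by twice its value.
doubled = re.sub(r'\d+', double, 'a1 b20 c300')

# count=1 replaces only the leftmost match.
first_only = re.sub(r'\d+', '#', 'a1 b20 c300', count=1)

# Backslash escapes in a string repl refer to captured groups.
swapped = re.sub(r'(\w+)@(\w+)', r'\2@\1', 'user@host')

# No match: the input comes back unchanged.
unchanged = re.sub(r'\d+', '#', 'no digits')

print(doubled, first_only, swapped, unchanged)
```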

re.split()

Function prototype:

split(pattern, string, maxsplit=0, flags=0)
Split the source string by the occurrences of the pattern,
returning a list containing the resulting substrings.

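Purpose:

Splits the string at every substring matched by the given regular expression and returns the pieces as a list.

Parameters:

pattern: the regular expression to match
string: the string to split
maxsplit: the maximum number of splits; 0 (the default) means no limit
flags: flags that control how the pattern is matched, such as case-insensitive or multi-line matching.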
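
As a rough illustration of re.split, the sketch below splits on a mix of separators, limits the number of splits, and shows that a capturing group in the pattern keeps the separators in the result. The sample strings are made up for this example.

```python
import re

s = 'a, b;c  d'   # hypothetical input with mixed separators

# Split on any run of commas, semicolons or whitespace.
parts = re.split(r'[,;\s]+', s)

# maxsplit=2 performs at most two splits; the rest stays in the last item.
limited = re.split(r'[,;\s]+', s, maxsplit=2)

# When the pattern contains a capturing group, the separators are kept.
with_separators = re.split(r'(-)', '1-2-3')

print(parts, limited, with_separators)
```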

re.findall()

Function prototype:

findall(pattern, string, flags=0)
    Return a list of all non-overlapping matches in the string.
    If one or more groups are present in the pattern, return a
    list of groups; this will be a list of tuples if the pattern
    has more than one group.
    Empty matches are included in the result.

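Purpose:

Finds every match of the pattern in the string and returns them as a list. The list elements take one of the following forms:

When the pattern contains more than one pair of parentheses (), each element is a tuple of strings. The tuple has one string per pair of parentheses, ordered by where each opening parenthesis '(' appears, and each string is the text matched by the corresponding group.
When the pattern contains exactly one pair of parentheses, each element is a string holding the text matched by that group. (Note: this is only the text matched inside the parentheses, not the text matched by the whole pattern.)
When the pattern contains no parentheses, each element is the text matched by the whole pattern.

Parameters:

pattern: the regular expression to match
string: the string to search
flags: flags that control how the pattern is matched, such as case-insensitive or multi-line matching.

Examples:

1. Match every word that contains 'oo'

# The pattern has no parentheses
>>> re.findall(r'\w*oo\w*', 'woo this foo is too')
['woo', 'foo', 'too']

As the result shows, when the pattern has no parentheses, each string in the list is the text matched by the whole pattern.

2. Extract every run of digits from the string

# The pattern has exactly one pair of parentheses
>>> re.findall(r'.*?(\d+).*?', 'adsd12343.jl34d5645fd789')
['12343', '34', '5645', '789']

As the result shows, when the pattern has exactly one pair of parentheses, each element is a string holding the text matched by that group.

3. Extract every valid domain name from the string

# The pattern has several pairs of parentheses
>>> add = 'https://www.net.com.edu//action=?asdfsd and other https://www.baidu.com//a=b'
>>> re.findall(r'((w{3}\.)(\w+\.)+(com|edu|cn|net))', add)
[('www.net.com.edu', 'www.', 'com.', 'edu'), ('www.baidu.com', 'www.', 'baidu.', 'com')]

As the output shows, when the pattern contains several pairs of parentheses, each element of the returned list is a tuple holding the text matched by every group during one successful match.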
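
If only the full domain names are wanted, one option is to turn the inner parentheses into non-capturing groups, written (?:...), so that findall returns whole matches instead of tuples. Non-capturing groups are standard re syntax but are not discussed in the text above; this is a sketch of that variation on the same sample string.

```python
import re

add = 'https://www.net.com.edu//action=?asdfsd and other https://www.baidu.com//a=b'

# (?:...) groups without capturing, so the pattern has no capturing groups
# and findall returns the text matched by the whole pattern.
domains = re.findall(r'w{3}\.(?:\w+\.)+(?:com|edu|cn|net)', add)

print(domains)
```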

re.finditer()

Function prototype:

finditer(pattern, string, flags=0)
Return an iterator over all non-overlapping matches in the string. For each match, the iterator
returns a match object.
Empty matches are included in the result.

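Purpose:

Like re.findall(), it finds every substring that matches the pattern, but it returns an iterator rather than a list holding all the results. The iterator yields one match object per match, which saves memory and is useful when the number of matches is large. The difference is similar to that between range and xrange in Python 2.

Parameters:

pattern: the regular expression to match
string: the string to search
flags: flags that control how the pattern is matched, such as case-insensitive or multi-line matching.

For example, match every run of digits in a string:

>>> for i in re.finditer(r'\d+', 'one12two34three56four'):
...     print(i.group(), end=' ')
...
12 34 56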
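
Since finditer yields match objects rather than plain strings, each result also carries its position. The sketch below collects the text and span of every match and shows that the iterator produces matches one at a time; the variable names are illustrative.

```python
import re

s = 'one12two34three56four'

# Each item is a match object, so the matched text and its position are both available.
hits = [(m.group(), m.span()) for m in re.finditer(r'\d+', s)]

# The iterator is lazy: next() advances to one match at a time.
it = re.finditer(r'\d+', s)
first = next(it).group()
second = next(it).group()

print(hits, first, second)
```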

start()

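Returns the start position of the match. For example:

>>> re.search(r'\d+', 'asdf13df234').start()
4

Note that index positions are counted from 0.

end()

Returns the position just past the end of the match. For example:

>>> re.search(r'\d+', 'asdf13df234').end()
6

span()

Returns the matched range as a half-open interval (start inclusive, end exclusive). For example:

>>> re.search(r'\d+', 'asdf13df234').span()
(4, 6)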
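
Because the interval is half-open, start() and end() can be used directly as slice bounds. The following sketch checks that, and also shows that these methods accept a group number; the second pattern with two groups is an assumption added for illustration.

```python
import re

s = 'asdf13df234'
m = re.search(r'\d+', s)

# Slicing with start() and end() reproduces the matched text.
sliced = s[m.start():m.end()]

# start(), end() and span() also accept a group number.
m2 = re.search(r'([a-z]+)(\d+)', s)
letters_start = m2.start(1)
digits_span = m2.span(2)

print(sliced, letters_start, digits_span)
```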

re.compile()

Function prototype:

compile(pattern, flags=0)
Compile a regular expression pattern, returning a pattern object.

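Purpose:

Compiles a regular expression and returns the compiled pattern object.
Patterns that are used repeatedly can be compiled once into pattern objects and reused, which improves efficiency somewhat. For example:
A sentence contains five English words of varying length separated by spaces; match all five words.

>>> s = "this is  a python test"
>>> p = re.compile(r'\w+')  # compile the pattern and get its pattern object
>>> res = p.findall(s)  # use the pattern object to find matches
>>> print(res)
['this', 'is', 'a', 'python', 'test']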
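
A compiled pattern object offers the same operations as the module-level functions (match, search, findall, sub and so on) as methods, and flags are passed once at compile time. A minimal sketch, with variable names chosen for illustration:

```python
import re

s = "this is  a python test"

word = re.compile(r'\w+')                 # compiled once, reused below
words = word.findall(s)
first = word.match(s).group()

# Pattern methods take an optional start position as the second argument.
from_pos_5 = word.search(s, 5).group()

# Flags are given when compiling rather than on each call.
python_ci = re.compile(r'python', re.I)
found = python_ci.search('Learning PYTHON').group()

# The pattern object can also substitute.
upper_initials = word.sub(lambda m: m.group().capitalize(), s)

print(words, first, from_pos_5, found, upper_initials)
```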

We hope this article is helpful for your Python programming.
