Python Regular Expressions and Metacharacters Explained

2020-01-04 13:58:57


Source: Reposted

Contributed by: a reader

Regular Expressions

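Regular expressions are a powerful tool for manipulating strings. They form a domain-specific language (DSL) that, in Python as in most modern programming languages, is provided as a library.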

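They are mainly used for two tasks:

- Verifying that a string matches a pattern (for example, that a string has the format of an email address).
- Performing substitutions in a string (for example, changing all uppercase letters to lowercase).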
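A minimal sketch of both tasks. The pattern EMAIL_PATTERN and the two helper functions are hypothetical, and the email pattern is deliberately simplified; it is far from a complete or standards-compliant validator.

```python
import re

# Hypothetical, deliberately simplified email pattern (not RFC-compliant)
EMAIL_PATTERN = r"[\w.+-]+@[\w-]+\.[\w.-]+"

def looks_like_email(s):
    # fullmatch succeeds only if the whole string matches the pattern
    return re.fullmatch(EMAIL_PATTERN, s) is not None

def lowercase_capitals(s):
    # repl may be a function: it receives each match object
    # and returns the text to substitute for it
    return re.sub(r"[A-Z]", lambda m: m.group().lower(), s)

print(looks_like_email("alice@example.com"))  # True
print(looks_like_email("not an email"))       # False
print(lowercase_capitals("Hello PYTHON"))     # hello python
```

For plain lowercasing, str.lower() is simpler; the regex version only shows how a substitution is expressed.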

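Domain-specific languages are highly specialized mini programming languages.

Regular expressions are one example; SQL (for database manipulation) is another.

Private domain-specific languages are often used for specific industrial purposes.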

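In Python, regular expressions are accessed through the re module, which is part of the standard library.

Once you have defined a regular expression, you can use the re.match function to determine whether it matches at the beginning of a string. If it does, match returns a match object representing the match; if not, it returns None.

To avoid confusion when working with regular expressions, we prefix the pattern string with r, making it a raw string. In a raw string, backslashes are not treated as escape characters, which makes regular expressions easier to write.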

from re import match

msg = r"super"
if match(msg, "superman!"):
    print("You are True")
else:
    print("Occur an error! Foolish...")

Output:

>>>
You are True
>>>

The example above checks whether the pattern super matches the beginning of the string "superman!"; since it does, it prints You are True.
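Because re.match only tries the pattern at the start of the string, the same pattern fails when "super" appears later. A small sketch (the second test string is made up for illustration):

```python
from re import match

# "super" is at index 0, so re.match returns a match object
print(match(r"super", "superman!") is not None)       # True
# "super" appears, but not at index 0, so re.match returns None
print(match(r"super", "I am superman!") is None)      # True
```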

The pattern here is a simple word, but some characters have special meanings when used in a regular expression.

Other functions for matching patterns are re.search and re.findall.

re.search finds the first match anywhere in the string.
re.findall returns a list of all matches.
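A short sketch contrasting the three functions on the same string, using the plain pattern python:

```python
import re

text = "Hello python!Hello python!Hello python!"
pattern = r"python"

# match: tried only at position 0, and the string starts with "Hello"
print(re.match(pattern, text))          # None
# search: the first match anywhere in the string
print(re.search(pattern, text).span())  # (6, 12)
# findall: every non-overlapping match, as a list of strings
print(re.findall(pattern, text))        # ['python', 'python', 'python']
```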

import re

string = "Hello python!Hello python!Hello python!"
pattern = r".python."
print(re.match(pattern, string))
print(re.findall(pattern, string))

Output:

>>>
None
[' python!', ' python!', ' python!']
>>>

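From the example above, we can conclude:

match() only tries to match starting at the first character of the string; if the pattern does not match there, it returns None.
findall() scans the whole string and returns every (non-overlapping) match.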

The function re.finditer does the same thing as re.findall, but it returns an iterator of match objects instead of a list of strings.
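A minimal sketch of re.finditer on the same string. Because it yields match objects, each match's position is available as well as its text:

```python
import re

text = "Hello python!Hello python!Hello python!"

# finditer yields one match object per match, lazily
for m in re.finditer(r"python", text):
    print(m.group(), m.span())
# python (6, 12)
# python (19, 25)
# python (32, 38)
```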

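The search function returns a match object, which carries more detailed information about the match.

Its methods include group, which returns the matched string; start and end, which return the start and end positions of the first match; and span, which returns both positions as a tuple.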

import re

string = "Hello python!Hello python!Hello python!"
pattern = r".python."
match = re.search(pattern, string)
if match:
    print(match.group())
    print(match.start())
    print(match.end())
    print(match.span())

Output:

>>>
 python!
5
13
(5, 13)
>>>

Search and Replace

sub is one of the most important functions in the re module. Its signature is:

re.sub(pattern, repl, string, count=0, flags=0)

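pattern: the regular expression pattern string;
repl: the replacement (either a string or a function);
string: the string to be processed;
count: the maximum number of replacements to make; the default of 0 replaces all matches;
flags: flags that modify how the pattern matches, such as re.IGNORECASE.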

import re

string = "Hello python!Hello python!Hello python!"
pattern = r"python"
newstr = re.sub(pattern, "Java", string)
print(newstr)

Output:

>>>
Hello Java!Hello Java!Hello Java!
>>>
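A sketch of the optional parameters on the same string. count and flags are passed as keyword arguments here, which works in all recent Python 3 versions:

```python
import re

string = "Hello python!Hello python!Hello python!"

# count=1: replace only the first occurrence
print(re.sub(r"python", "Java", string, count=1))
# Hello Java!Hello python!Hello python!

# flags=re.IGNORECASE: the pattern "hello" also matches "Hello"
print(re.sub(r"hello", "Hi", string, flags=re.IGNORECASE))
# Hi python!Hi python!Hi python!

# repl as a function: called with each match object, returns the replacement
print(re.sub(r"python", lambda m: m.group().upper(), string))
# Hello PYTHON!Hello PYTHON!Hello PYTHON!
```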

Metacharacters

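Metacharacters are what make regular expressions more powerful than ordinary string methods. They let you write patterns that express ideas such as "one or more digits".
However, metacharacters cause a problem when you want to match one of them literally, such as $. You can escape a metacharacter by putting a backslash in front of it.

But this can lead to trouble of its own, because the backslash is also the escape character in ordinary Python strings. You may end up writing three or four backslashes in a row to get all the escaping right.

To avoid this, you can use a raw string: an ordinary string literal with an "r" prefix.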
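A small sketch of escaping, comparing normal and raw string literals. The sample strings price and path are made up for illustration; re.escape is the standard-library helper that escapes metacharacters for you:

```python
import re

price = "It costs $5."

# Unescaped, $ is the end-of-string anchor, so "$5" can never match here
print(re.search("$5", price))              # None
# In a raw string, \$ reaches the regex engine intact and matches a literal $
print(re.search(r"\$5", price).group())    # $5

path = "C:\\temp"  # the text C:\temp, containing one backslash
# To match one literal backslash, the regex itself must be \\
print(re.search("\\\\", path) is not None)  # True: four backslashes without r
print(re.search(r"\\", path) is not None)   # True: two backslashes with r

# re.escape escapes every metacharacter in a string
print(re.escape("$5.00"))                   # \$5\.00
```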

The dot metacharacter matches any single character except a newline.

import re

string1 = "Hello python!Hello python!Hello python!"
string2 = "pythan,1234587pythoi"
string3 = "hello"
pattern = r"pyth.n"
match1 = re.search(pattern, string1)
match2 = re.search(pattern, string2)
match3 = re.search(pattern, string3)
if match1:
    print(match1.group())
    print("match 1")
if match2:
    print(match2.group())  # text matched in string2
    print("match 2")
if match3:
    print(match3.group())
    print("match 3")

Output:

>>>
python
match 1
pythan
match 2
>>>
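A sketch of the newline exception, using a made-up string that contains "\n" between "pyth" and "n". Passing the re.DOTALL flag makes the dot match newlines too:

```python
import re

text = "pyth\nn python"

# By default "." does not match "\n", so the first candidate is skipped
print(re.search(r"pyth.n", text).group())                    # python
# With re.DOTALL, "." matches any character, including "\n"
print(repr(re.search(r"pyth.n", text, re.DOTALL).group()))   # 'pyth\nn'
```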

^ matches the start of the string, and $ matches the end.

import re

string1 = "python"
string2 = "pythan,1234587pythoi"
string3 = "hello"
pattern = r"^pyth.n$"
match1 = re.search(pattern, string1)
match2 = re.search(pattern, string2)
match3 = re.search(pattern, string3)
if match1:
    print(match1.group())
    print("match 1")
if match2:
    print(match2.group())  # text matched in string2
    print("match 2")
if match3:
    print(match3.group())
    print("match 3")

Output: