Compiling Regular Expressions in Python

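re.compile compiles a regular expression into a reusable Pattern object. The module-level functions such as re.findall also compile their pattern, but they look it up in the re module's internal cache on every call, so precompiling mainly saves that lookup and gives the pattern a name.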

Basic compilation

Python

import re

# Compile the regular expression
pattern = re.compile(r"\d+")

# Use the compiled pattern
text = "abc 123 def 456"
matches = pattern.findall(text)
print(matches)  # ['123', '456']

# Compared with calling re.findall directly
matches = re.findall(r"\d+", text)  # looks the pattern up in re's cache on every call

Specifying flags at compile time

Python

import re

# Pass a flag at compile time
pattern = re.compile(r"hello", flags=re.IGNORECASE)

text = "Hello HELLO hello"
matches = pattern.findall(text)
print(matches)  # ['Hello', 'HELLO', 'hello']

# Combine several flags with |
pattern = re.compile(r"^line", flags=re.MULTILINE | re.IGNORECASE)
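
The combined-flag pattern above is compiled but never run. The following is a minimal sketch of what it matches, using a made-up three-line sample string (not from the original text). It also shows the inline form `(?im)`, which should behave the same as passing the two flags.

```python
import re

# Hypothetical sample text, chosen only to illustrate the flags.
text = "Line one\nline two\nother line"

# MULTILINE lets ^ match at the start of every line;
# IGNORECASE lets "line" also match "Line".
pattern = re.compile(r"^line", flags=re.MULTILINE | re.IGNORECASE)
starts = pattern.findall(text)
print(starts)  # ['Line', 'line']

# The same two flags written inline at the start of the pattern.
inline = re.compile(r"(?im)^line")
print(inline.findall(text) == starts)  # True
```

The third line contains "line" but does not start with it, so it produces no match.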

Pattern object methods

Python

import re

pattern = re.compile(r"(\w+)@(\w+)\.(\w+)")

# Pattern objects offer the same matching methods as the re module functions
text = "user@example.com"
match = pattern.search(text)
print(match.groups())  # ('user', 'example', 'com')

matches = pattern.findall("a@b.c d@e.f")
print(matches)  # [('a', 'b', 'c'), ('d', 'e', 'f')]

# split, sub and the other methods work the same way
result = pattern.sub("EMAIL", "user@example.com")
print(result)  # EMAIL
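
One difference from the module-level functions is worth a sketch: the search methods on a Pattern object accept optional `pos` and `endpos` arguments that limit which part of the string is scanned. The example below reuses the `\d+` pattern and sample text from the first section; the index values are picked only for illustration.

```python
import re

pattern = re.compile(r"\d+")
text = "abc 123 def 456"

# Start scanning at index 8, i.e. skip "abc 123 ".
tail = pattern.findall(text, 8)
print(tail)  # ['456']

# Scan only text[0:7], i.e. "abc 123".
head = pattern.findall(text, 0, 7)
print(head)  # ['123']

# finditer yields Match objects, which also carry positions.
spans = [(m.group(), m.span()) for m in pattern.finditer(text)]
print(spans)  # [('123', (4, 7)), ('456', (12, 15))]
```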

Performance comparison

Python

import re
import time

text = "abc123def456" * 1000

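# Uncompiled: each call looks the pattern up in the re module's cache
start = time.perf_counter()
for _ in range(1000):
    re.findall(r"\d+", text)
print(f"Uncompiled: {time.perf_counter() - start:.4f}s")

# Compiled: the Pattern object is created once and reused
pattern = re.compile(r"\d+")
start = time.perf_counter()
for _ in range(1000):
    pattern.findall(text)
print(f"Compiled: {time.perf_counter() - start:.4f}s")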

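Because the re module caches recently compiled patterns, the gap in this benchmark is usually small: scanning the long text dominates the running time. Precompiling pays off most in these cases:

The same regular expression is used many times
The regular expression is complex, so recompiling it after it drops out of the cache is expensive
Matching runs inside a loop, where the per-call cache lookup adds up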
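
To see the per-call overhead more clearly, a short input can be timed with the standard `timeit` module, so that lookup cost rather than scanning dominates. This is only a rough sketch: the helper names `uncompiled` and `precompiled` and the iteration count are assumptions made for illustration, and the absolute numbers depend on the machine and Python version.

```python
import re
import timeit

# Short input so per-call overhead, not scanning, dominates.
text = "abc123def456"
pattern = re.compile(r"\d+")

def uncompiled():
    # Module-level function: cache lookup on every call.
    return re.findall(r"\d+", text)

def precompiled():
    # Method on an existing Pattern object: no lookup needed.
    return pattern.findall(text)

n = 100_000  # arbitrary iteration count for this sketch
t_uncompiled = timeit.timeit(uncompiled, number=n)
t_precompiled = timeit.timeit(precompiled, number=n)
print(f"re.findall:      {t_uncompiled:.4f}s")
print(f"pattern.findall: {t_precompiled:.4f}s")
```

Both functions return the same matches; only the timing differs.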

Pattern attributes

Python

import re

pattern = re.compile(r"(?P<name>\w+)")

print(pattern.pattern)  # (?P<name>\w+) (the original pattern string)
print(pattern.flags)    # combined flag value, as an integer

# Inspect the compiled object
print(repr(pattern))    # re.compile(...)
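
Besides `pattern` and `flags`, a Pattern object also exposes `groups` (the number of capturing groups) and `groupindex` (a mapping from group names to group numbers). The sketch below uses a slightly extended pattern, which is an illustrative assumption rather than part of the original example, and shows how an individual flag can be tested with a bitwise AND.

```python
import re

pattern = re.compile(r"(?P<name>\w+)@(\w+)", flags=re.IGNORECASE)

# Number of capturing groups in the pattern.
print(pattern.groups)            # 2

# Named groups mapped to their group numbers.
print(dict(pattern.groupindex))  # {'name': 1}

# Test individual flags with a bitwise AND.
print(bool(pattern.flags & re.IGNORECASE))  # True
print(bool(pattern.flags & re.MULTILINE))   # False
```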

Practical scenarios

Python

import re

# Define commonly used patterns
EMAIL_PATTERN = re.compile(r"\w+@\w+\.\w+")
PHONE_PATTERN = re.compile(r"1[3-9]\d{9}")
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")

# Validation functions
def validate_email(text):
    return EMAIL_PATTERN.fullmatch(text) is not None

def validate_phone(text):
    return PHONE_PATTERN.fullmatch(text) is not None

print(validate_email("user@example.com"))  # True
print(validate_phone("13812345678"))       # True
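
The validators above rely on `fullmatch`, which requires the whole string to match. The sketch below uses made-up inputs to contrast it with `match`, which only anchors at the start, and shows the same compiled pattern extracting numbers from longer text.

```python
import re

PHONE_PATTERN = re.compile(r"1[3-9]\d{9}")

# Hypothetical input with a trailing character that should be rejected.
candidate = "13812345678x"
starts_ok = PHONE_PATTERN.match(candidate) is not None
whole_ok = PHONE_PATTERN.fullmatch(candidate) is not None
print(starts_ok)  # True: the first 11 characters match
print(whole_ok)   # False: the trailing "x" is not allowed

# The same pattern can extract numbers embedded in free text.
message = "call 13812345678 or 15900001111"
found = PHONE_PATTERN.findall(message)
print(found)  # ['13812345678', '15900001111']
```

For validation, `fullmatch` is usually the safer choice; `search` or `findall` fit extraction.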

Reusing patterns

Python

import re

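# One pattern shared by several functions
pattern = re.compile(r"<(\w+)>(.*?)</\1>")

def extract_tags(text):
    # Group 1 holds the tag name
    return [m.group(1) for m in pattern.finditer(text)]

def remove_tags(text):
    # Keep only the inner content (group 2), dropping the tags
    return pattern.sub(r"\2", text)

text = "<div>content</div>"
print(extract_tags(text))  # ['div']
print(remove_tags(text))   # content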

When to compile

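Scenario	Recommendation
One-off use	Call the re function directly
Repeated use	Use compile
Matching in a loop	Compile once, outside the loop
Complex regex	Prefer compile
Module-level constants	Define with compile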

Key takeaways

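re.compile(pattern) compiles a regular expression into a Pattern object
A compiled Pattern object provides the re module's matching methods (search, match, fullmatch, findall, finditer, split, sub, subn)
Flags can be specified at compile time
Compile a regular expression that is used many times; the gain is modest because re caches recent patterns
Pattern objects have pattern and flags attributes
Defining compiled patterns at module level makes them easy to reuse
A single simple match needs no compile; for patterns used repeatedly or inside loops, compiling is recommended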
