Python Regular Expressions: Substitution and Splitting

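Regex substitution and splitting are core text-processing operations. Unlike str.replace() and str.split(), they match patterns rather than fixed strings.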

sub: basic replacement

Python

import re

text = "Hello World, hello Python"
# Replace every match (count defaults to 0, meaning no limit)
result = re.sub(r"hello", "Hi", text, flags=re.IGNORECASE)
print(result)  # Hi World, Hi Python

# Limit the number of replacements with count
result = re.sub(r"hello", "Hi", text, count=1, flags=re.IGNORECASE)
print(result)  # Hi World, hello Python
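A compiled pattern offers the same operation as a method. The sketch below assumes the pattern will be reused several times, which is the case where re.compile is worthwhile. It also shows that sub returns the input unchanged when nothing matches.

```python
import re

# Compile once and reuse; flags are supplied at compile time
pattern = re.compile(r"hello", re.IGNORECASE)
text = "Hello World, hello Python"

print(pattern.sub("Hi", text))           # Hi World, Hi Python
print(pattern.sub("Hi", text, count=1))  # Hi World, hello Python

# No match: the original string comes back unchanged
print(pattern.sub("Hi", "goodbye"))      # goodbye
```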

Replacing with groups

Python

import re

# Reorder the date fields: YYYY-MM-DD -> MM/DD/YYYY
text = "2024-05-19"
result = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\2/\3/\1", text)
print(result)  # 05/19/2024

# Named groups, referenced in the replacement with \g<name>
text = "John Smith"
result = re.sub(r"(?P<first>\w+)\s+(?P<last>\w+)", r"\g<last>, \g<first>", text)
print(result)  # Smith, John
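Group references have a few edge cases worth knowing. The small examples below use made-up input strings. They show \g<0> for the whole match, \g<1> followed by a digit to avoid ambiguity, and how to write a literal backslash.

```python
import re

# \g<0> inserts the whole match
print(re.sub(r"\d+", r"[\g<0>]", "a1 b22"))  # a[1] b[22]

# r"\10" would mean "group 10"; r"\g<1>0" means "group 1, then a literal 0"
print(re.sub(r"(\d)", r"\g<1>0", "x5"))  # x50

# A literal backslash in the replacement must itself be escaped
print(re.sub(r"/", r"\\", "a/b"))  # a\b
```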

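Replacing with a callback function

When repl is a function, sub calls it once for each match and passes a re.Match object. The function's return value (a string) becomes the replacement text.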

Python

import re

def uppercase(match):
    # Called once per match with a re.Match object; returns the replacement text
    return match.group(0).upper()

text = "hello world"
result = re.sub(r"\b\w+", uppercase, text)
print(result)  # HELLO WORLD

# Decide the replacement based on the matched text
def smart_replace(match):
    word = match.group(0)
    if len(word) > 5:
        return word.upper()
    return word

text = "hello beautiful world"
result = re.sub(r"\b\w+", smart_replace, text)
print(result)  # hello BEAUTIFUL world
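Short callbacks can be written as lambdas, and a dictionary can drive the replacement. In this sketch the mapping dict and the sample strings are hypothetical. Note that the callback must return a str, which is why the number is converted back with str().

```python
import re

# A lambda is enough for short logic; the callback must return a str
text = "price: 10, qty: 3"
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), text)
print(doubled)  # price: 20, qty: 6

# Dictionary-driven replacement: build one alternation from the keys
mapping = {"cat": "dog", "red": "blue"}  # hypothetical word map
pattern = re.compile(r"\b(" + "|".join(map(re.escape, mapping)) + r")\b")
swapped = pattern.sub(lambda m: mapping[m.group(1)], "a red cat")
print(swapped)  # a blue dog
```

re.escape keeps keys containing regex metacharacters from being misread as pattern syntax.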

subn: replacement with a count

Python

import re

text = "a1 a2 a3 a4 a5"
result, count = re.subn(r"a\d", "X", text)
print(result)  # X X X X X
print(count)   # 5 (number of replacements)
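The count returned by subn is useful for checking whether anything happened. This sketch reuses the sample text above. It shows subn together with count, and a check for zero replacements.

```python
import re

text = "a1 a2 a3 a4 a5"

# count works with subn too; the second item reports what actually happened
result, n = re.subn(r"a\d", "X", text, count=2)
print(result, n)  # X X a3 a4 a5 2

# n == 0 is a cheap way to detect that nothing matched
result, n = re.subn(r"z\d", "X", text)
if n == 0:
    print("no match")  # no match
```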

split: basic splitting

Python

import re

# Split on runs of whitespace (spaces, tabs, ...)
text = "hello   world\tpython"
words = re.split(r"\s+", text)
print(words)  # ['hello', 'world', 'python']

# Split on any one of several separators
text = "a,b;c d"
parts = re.split(r"[,; ]", text)
print(parts)  # ['a', 'b', 'c', 'd']
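re.split and str.split() treat surrounding whitespace differently, and a single-character class splits on every separator character. The sample strings below are illustrative.

```python
import re

text = "  hello   world  "
# Leading/trailing whitespace matches the pattern, so empty strings appear
print(re.split(r"\s+", text))  # ['', 'hello', 'world', '']
# str.split() with no argument drops them
print(text.split())            # ['hello', 'world']

# A "+" after the class treats consecutive separators as one
print(re.split(r"[,;\s]+", "a, b;;c"))  # ['a', 'b', 'c']
```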

split: keeping separators with a capturing group

Python

import re

# Plain split: separators are discarded
text = "a,b,c"
parts = re.split(r",", text)
print(parts)  # ['a', 'b', 'c']

# Capturing group: separators are kept in the result
parts = re.split(r"(,)", text)
print(parts)  # ['a', ',', 'b', ',', 'c']
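Keeping separators is handy for simple tokenizing. The arithmetic string below is only an illustration. Captured separators land at odd indexes, nothing is lost, and a non-capturing group (?:...) does not keep them.

```python
import re

expr = "1+2-3"
parts = re.split(r"([+-])", expr)
print(parts)  # ['1', '+', '2', '-', '3']

# Even indexes are operands, odd indexes are the captured operators
print(parts[0::2], parts[1::2])  # ['1', '2', '3'] ['+', '-']

# Because nothing is lost, joining restores the original string
print("".join(parts) == expr)  # True

# A non-capturing group (?:...) groups without keeping the separator
print(re.split(r"(?:\+|-)", expr))  # ['1', '2', '3']
```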

split: limiting the number of splits

Python

import re

text = "a,b,c,d,e"
parts = re.split(r",", text, maxsplit=2)
print(parts)  # ['a', 'b', 'c,d,e']
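A common use of maxsplit is peeling off a fixed number of leading fields while keeping the rest intact. The log-line format here is a hypothetical example, not a real logging standard.

```python
import re

line = "2024-05-19 INFO server started on port 8080"  # hypothetical log line
# Split off the first two fields; the message keeps its internal spaces
date, level, message = re.split(r"\s+", line, maxsplit=2)
print(date)     # 2024-05-19
print(level)    # INFO
print(message)  # server started on port 8080

# maxsplit=0 (the default) means "no limit"
print(len(re.split(r"\s+", line, maxsplit=0)))  # 7
```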

split: separators at the edges

Python

import re

# A leading or trailing separator produces an empty string
text = ",a,b,c,"
parts = re.split(r",", text)
print(parts)  # ['', 'a', 'b', 'c', '']

# Filter out the empty strings
parts = [p for p in re.split(r",", text) if p]
print(parts)  # ['a', 'b', 'c']
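Filtering removes every empty field, including legitimate empty values in the middle. If those matter, strip only the outer separators before splitting. The sample string is illustrative.

```python
import re

text = ",a,,b,"
# Filtering removes every empty field, including the one between ",,"
filtered = [p for p in re.split(r",", text) if p]
print(filtered)  # ['a', 'b']
# filter(None, ...) is an equivalent shorthand
print(list(filter(None, re.split(r",", text))))  # ['a', 'b']

# Stripping only the outer separators keeps interior empty fields
stripped = re.split(r",", text.strip(","))
print(stripped)  # ['a', '', 'b']
```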

More replacement examples

Python

import re

# Strip HTML tags (fine for simple snippets; arbitrary HTML needs a real parser)
text = "<p>Hello</p><div>World</div>"
result = re.sub(r"<[^>]+>", "", text)
print(result)  # HelloWorld

# Reformat URL query parameters
text = "key1=value1&key2=value2"
result = re.sub(r"(\w+)=(\w+)", r"\1: \2", text)
print(result)  # key1: value1&key2: value2

# Mask the middle digits of a phone number
text = "Phone: 13812345678"
result = re.sub(r"(\d{3})(\d{4})(\d{4})", r"\1****\3", text)
print(result)  # Phone: 138****5678
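The masking pattern above has no boundaries, so it will also match 11 digits inside a longer number. The sketch below assumes the target is an 11-digit mobile number. It adds (?<!\d) and (?!\d) lookarounds so the pattern only matches an isolated 11-digit run. mask_phone is a hypothetical helper name.

```python
import re

def mask_phone(text):
    # Hypothetical helper: (?<!\d) and (?!\d) require exactly 11 digits,
    # not 11 digits taken from inside a longer number
    return re.sub(r"(?<!\d)(\d{3})\d{4}(\d{4})(?!\d)", r"\1****\2", text)

print(mask_phone("Phone: 13812345678"))          # Phone: 138****5678
print(mask_phone("Order no. 1381234567890123"))  # Order no. 1381234567890123

# Without boundaries, part of the longer number gets masked
print(re.sub(r"(\d{3})(\d{4})(\d{4})", r"\1****\3", "1381234567890123"))  # 138****567890123
```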

More splitting examples

Python

import re

# Parse a simple key=value config string
config = "name=Alice;age=25;city=Beijing"
pairs = re.split(r";", config)
for pair in pairs:
    # maxsplit=1 keeps any "=" inside the value intact
    key, value = re.split(r"=", pair, maxsplit=1)
    print(f"{key}: {value}")

# Two-level split: first on ";", then on ":"
text = "a:b;c:d"
parts = re.split(r";", text)
for part in parts:
    sub_parts = re.split(r":", part)
    print(sub_parts)  # ['a', 'b'], then ['c', 'd']
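Real config strings often contain stray spaces or a trailing separator. This sketch assumes the same key=value;... format as above with some hypothetical messiness added. It builds a dict and shows the two-level split written as a comprehension.

```python
import re

config = "name=Alice; age=25 ;city=Beijing;"  # assumed messy input
settings = {}
# \s*;\s* also swallows spaces around the separator
for pair in re.split(r"\s*;\s*", config):
    if not pair:
        continue  # a trailing ";" leaves an empty entry
    key, value = re.split(r"\s*=\s*", pair, maxsplit=1)
    settings[key] = value
print(settings)  # {'name': 'Alice', 'age': '25', 'city': 'Beijing'}

# The two-level split can also be written as a comprehension
print([re.split(r":", p) for p in re.split(r";", "a:b;c:d")])  # [['a', 'b'], ['c', 'd']]
```

All values stay strings; converting "25" to int is left to the caller.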

Method comparison

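Method	Purpose	Returns
sub	Replace all matches (or the first count)	New string
subn	Replace and count	(new string, number of replacements)
split	Split on a pattern	List of strings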
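To see the return types from the table side by side, the three calls can be run on one illustrative string:

```python
import re

text = "a1 b2"
print(re.sub(r"\d", "", text))    # a b
print(re.subn(r"\d", "", text))   # ('a b', 2)
print(re.split(r"\s", text))      # ['a1', 'b2']
```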

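Key points

sub(pattern, repl, string) replaces matched text
\1, \2, ... or \g<name> reference groups in the replacement string
A callback function enables dynamic replacement logic
subn returns both the result and the number of replacements
split(pattern, string) splits on a pattern
Separators matched by a capturing group are kept in the result
maxsplit limits the number of splits
Substitution and splitting are core text-processing operations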
