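Python Regular Expressions: Substitution and Splitting
Regex-based substitution and splitting are core text-processing operations. Where str.replace and str.split work on fixed strings, re.sub, re.subn, and re.split work on patterns.
sub: basic substitution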
Python
import re
text = "Hello World, hello Python"
# Replace every match
result = re.sub(r"hello", "Hi", text, flags=re.IGNORECASE)
print(result) # Hi World, Hi Python
# Limit the number of replacements
result = re.sub(r"hello", "Hi", text, count=1, flags=re.IGNORECASE)
print(result) # Hi World, hello Python
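When the text to replace comes from data rather than from a hand-written pattern, characters such as "+" or "." would be read as regex syntax. The sketch below shows one way to handle that with `re.escape`. The helper name `replace_literal` is hypothetical and not part of the `re` module.

```python
import re

def replace_literal(text, needle, replacement):
    """Replace every literal occurrence of `needle`, ignoring case."""
    # re.escape makes metacharacters such as "+" or "." match literally.
    pattern = re.escape(needle)
    # A function as `repl` returns its result verbatim, so backslashes in
    # `replacement` are not interpreted as group references.
    return re.sub(pattern, lambda m: replacement, text, flags=re.IGNORECASE)

print(replace_literal("C++ and c++ are the same", "c++", "Cpp"))
# Cpp and Cpp are the same
```

Passing `flags` by keyword, as in the examples above, avoids accidentally supplying it in the `count` position.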
Substitution with groups
Python
import re
# Rearrange a date format
text = "2024-05-19"
result = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\2/\3/\1", text)
print(result) # 05/19/2024
# Use named groups
text = "John Smith"
result = re.sub(r"(?P<first>\w+)\s+(?P<last>\w+)", r"\g<last>, \g<first>", text)
print(result) # Smith, John
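A numbered reference becomes ambiguous when a digit has to follow it directly. The short sketch below uses the `\g<number>` form, which ends the group number explicitly. The sample strings are made up for illustration.

```python
import re

text = "item1 item2"
# r"\10" would be read as a reference to group 10, which does not exist here.
# \g<1> ends the group number explicitly, so the "0" is a literal zero.
result = re.sub(r"(\d)", r"\g<1>0", text)
print(result)  # item10 item20

# \g<0> stands for the whole match.
wrapped = re.sub(r"\w+", r"[\g<0>]", "a b")
print(wrapped)  # [a] [b]
```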
Substitution with a callback function
Python
import re
def uppercase(match):
    return match.group(0).upper()
text = "hello world"
result = re.sub(r"\b\w+", uppercase, text)
print(result) # HELLO WORLD
# Decide the replacement based on the matched content
def smart_replace(match):
    word = match.group(0)
    if len(word) > 5:
        return word.upper()
    return word
text = "hello beautiful world"
result = re.sub(r"\b\w+", smart_replace, text)
print(result) # hello BEAUTIFUL world
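A callback is also a convenient place for a dictionary lookup. The following sketch fills `{name}`-style placeholders from a dictionary. The placeholder syntax, the `values` data, and the `fill` function are assumptions chosen for this example.

```python
import re

# Hypothetical sample data for the placeholders.
values = {"name": "Alice", "city": "Beijing"}

def fill(match):
    key = match.group(1)
    # Leave unknown placeholders untouched instead of raising KeyError.
    return values.get(key, match.group(0))

template = "{name} lives in {city}, aged {age}"
result = re.sub(r"\{(\w+)\}", fill, template)
print(result)  # Alice lives in Beijing, aged {age}
```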
subn: counting replacements
Python
import re
text = "a1 a2 a3 a4 a5"
result, count = re.subn(r"a\d", "X", text)
print(result) # X X X X X
print(count) # 5 (number of replacements)
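The count returned by `subn` can serve as a stop condition when one pass may expose new matches. A minimal sketch, assuming the task is to delete empty "()" pairs until none remain; `strip_empty_parens` is a hypothetical helper.

```python
import re

def strip_empty_parens(text):
    """Remove "()" pairs repeatedly; return the text and the total removed."""
    total = 0
    while True:
        text, count = re.subn(r"\(\)", "", text)
        if count == 0:
            # Nothing changed in this pass, so the text is stable.
            return text, total
        total += count

print(strip_empty_parens("f((()))x()"))  # ('fx', 4)
```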
split: basic splitting
Python
import re
# Split on runs of whitespace
text = "hello world\tpython"
words = re.split(r"\s+", text)
print(words) # ['hello', 'world', 'python']
# Split on several different separators
text = "a,b;c d"
parts = re.split(r"[,; ]", text)
print(parts) # ['a', 'b', 'c', 'd']
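Real input often has stray spaces around the separators. One possible approach, sketched here with made-up input, is to let the pattern absorb that whitespace so the parts come back already trimmed.

```python
import re

text = "  a , b;c ;  d  "
# \s* on both sides consumes whitespace around each "," or ";".
# strip() handles whitespace at the very start and end of the string.
parts = re.split(r"\s*[,;]\s*", text.strip())
print(parts)  # ['a', 'b', 'c', 'd']
```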
split: capturing the separator
Python
import re
# Plain split: separators are dropped
text = "a,b,c"
parts = re.split(r",", text)
print(parts) # ['a', 'b', 'c']
# Split with a capturing group: separators are kept
parts = re.split(r"(,)", text)
print(parts) # ['a', ',', 'b', ',', 'c']
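Because any capturing group keeps its text in the result, a group that is only needed for alternation should be non-capturing. The sketch below contrasts the two forms on a small arithmetic string chosen for illustration.

```python
import re

text = "3+4*2"

# Capturing group: operators are kept, which gives a simple token list.
tokens = re.split(r"([+*])", text)
print(tokens)  # ['3', '+', '4', '*', '2']

# Non-capturing group (?:...): grouping only, operators are dropped.
operands = re.split(r"(?:\+|\*)", text)
print(operands)  # ['3', '4', '2']
```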
split: limiting the number of splits
Python
import re
text = "a,b,c,d,e"
parts = re.split(r",", text, maxsplit=2)
print(parts) # ['a', 'b', 'c,d,e']
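A common use of `maxsplit=1` is to separate a name from a value that may itself contain the separator. A minimal sketch, assuming a header-like line. The line content and the example.com URL are hypothetical.

```python
import re

line = "Location: https://example.com:8080/path"
# Only the first ":" (plus following whitespace) acts as a separator;
# the colons inside the URL stay in the value.
name, value = re.split(r":\s*", line, maxsplit=1)
print(name)   # Location
print(value)  # https://example.com:8080/path
```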
split: handling boundaries
Python
import re
# Separators at the start or end produce empty strings
text = ",a,b,c,"
parts = re.split(r",", text)
print(parts) # ['', 'a', 'b', 'c', '']
# Filter out the empty strings
parts = [p for p in re.split(r",", text) if p]
print(parts) # ['a', 'b', 'c']
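Two alternatives to filtering are sketched below. Matching the fields with `re.findall` drops every empty piece, while stripping the separator first removes only the pieces at the edges. Which one fits depends on whether interior empty fields carry meaning. The input is made up.

```python
import re

text = ",a,,b,c,"

# Match runs of non-separator characters instead of splitting.
fields = re.findall(r"[^,]+", text)
print(fields)  # ['a', 'b', 'c']

# Strip edge separators only; the empty field between "a" and "b" survives.
trimmed = re.split(r",", text.strip(","))
print(trimmed)  # ['a', '', 'b', 'c']
```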
Complex substitution examples
Python
import re
# Strip HTML tags
text = "<p>Hello</p><div>World</div>"
result = re.sub(r"<[^>]+>", "", text)
print(result) # HelloWorld
# Reformat URL query parameters
text = "key1=value1&key2=value2"
result = re.sub(r"(\w+)=(\w+)", r"\1: \2", text)
print(result) # key1: value1&key2: value2
# Mask a mobile phone number
text = "Phone: 13812345678"
result = re.sub(r"(\d{3})(\d{4})(\d{4})", r"\1****\3", text)
print(result) # Phone: 138****5678
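The masking pattern above matches any 11 consecutive digits, including the first 11 digits of a longer number. A possible refinement, sketched under the assumption that phone numbers are exactly 11 digits and not adjacent to other digits, adds lookarounds. `PHONE` and `mask_phone` are hypothetical names.

```python
import re

# (?<!\d) and (?!\d) require that no digit touches the 11-digit run.
PHONE = re.compile(r"(?<!\d)(\d{3})\d{4}(\d{4})(?!\d)")

def mask_phone(text):
    """Mask the middle four digits of standalone 11-digit numbers."""
    return PHONE.sub(r"\1****\2", text)

print(mask_phone("order 1234567890123456, tel 13812345678"))
# order 1234567890123456, tel 138****5678
```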
Complex splitting examples
Python
import re
# Parse a configuration string
config = "name=Alice;age=25;city=Beijing"
pairs = re.split(r";", config)
for pair in pairs:
    # maxsplit=1 keeps any later "=" inside the value
    key, value = re.split(r"=", pair, maxsplit=1)
    print(f"{key}: {value}")
# Multi-level separators
text = "a:b;c:d"
parts = re.split(r"[;]", text)
for part in parts:
    sub_parts = re.split(r"[:]", part)
    print(sub_parts)
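The configuration example can be taken one step further by collecting the pairs into a dictionary. This is a sketch, not a full config parser: `parse_config` is a hypothetical helper, and it assumes every non-empty pair contains an "=".

```python
import re

def parse_config(config):
    """Turn "k=v;k=v" text into a dict, tolerating spaces and a trailing ";"."""
    result = {}
    for pair in re.split(r"\s*;\s*", config.strip()):
        if not pair:
            continue  # empty piece produced by a leading or trailing ";"
        # maxsplit=1 keeps any further "=" inside the value.
        key, value = re.split(r"\s*=\s*", pair, maxsplit=1)
        result[key] = value
    return result

print(parse_config("name=Alice; age = 25 ;city=Beijing;"))
# {'name': 'Alice', 'age': '25', 'city': 'Beijing'}
```

All values come back as strings, so a caller that needs numbers would convert them separately.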
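Method comparison
| Method | Purpose | Return value |
|---|---|---|
| sub | Replace matches (all of them by default) | New string |
| subn | Replace and count | (new string, number of replacements) |
| split | Split by pattern | List |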
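The same three operations are also available as methods on a compiled pattern, which can be convenient when one pattern is reused. A brief sketch with a made-up separator pattern:

```python
import re

# Compile once, then reuse for substitution, counting, and splitting.
sep = re.compile(r"\s*,\s*")
text = "a , b,c"

replaced = sep.sub("|", text)
print(replaced)  # a|b|c

counted = sep.subn("|", text)
print(counted)  # ('a|b|c', 2)

pieces = sep.split(text)
print(pieces)  # ['a', 'b', 'c']

first_only = sep.split(text, maxsplit=1)
print(first_only)  # ['a', 'b,c']
```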
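Key takeaways
- sub(pattern, repl, string) replaces the matched content
- \1, \2, ... or \g<name> refer to groups inside the replacement
- A callback function implements dynamic replacement logic
- subn returns both the result and the number of replacements
- split(pattern, string) splits by pattern
- A separator wrapped in a capturing group is kept in the result
- maxsplit limits the number of splits
- Substitution and splitting are core text-processing operations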