re — Regular expression operations

Source code: Lib/re.py


This module provides regular expression matching operations similar to those found in Perl.

Both patterns and strings to be searched can be Unicode strings (str) as well as 8-bit strings (bytes). However, Unicode strings and 8-bit strings cannot be mixed: that is, you cannot match a Unicode string with a byte pattern or vice-versa; similarly, when asking for a substitution, the replacement string must be of the same type as both the pattern and the search string.
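The sketch below shows the type rule in practice. The sample strings are arbitrary. A str pattern searches a str, and a bytes pattern searches bytes. Passing a bytes pattern together with a str subject raises TypeError. The re.sub() call keeps pattern, replacement and subject all bytes.

```python
import re

# A str pattern searches a str; a bytes pattern searches bytes.
text_match = re.search(r"\d+", "order 42")
bytes_match = re.search(rb"\d+", b"order 42")
print(text_match.group())   # 42
print(bytes_match.group())  # b'42'

# Mixing the two types is rejected with a TypeError.
mixed_rejected = False
try:
    re.search(rb"\d+", "order 42")
except TypeError:
    mixed_rejected = True
print(mixed_rejected)  # True

# For re.sub(), pattern, replacement and subject are all bytes here.
replaced = re.sub(rb"\d+", b"N", b"order 42")
print(replaced)  # b'order N'
```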

Regular expressions use the backslash character ('\') to indicate special forms or to allow special characters to be used without invoking their special meaning. This collides with Python’s usage of the same character for the same purpose in string literals; for example, to match a literal backslash, one might have to write '\\\\' as the pattern string, because the regular expression must be \\, and each backslash must be expressed as \\ inside a regular Python string literal.
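The following minimal sketch shows the doubling at work with an ordinary (non-raw) string literal. The sample path is purely illustrative. The four-backslash literal becomes a two-character string, which is the \\ form the regular expression engine needs in order to match one literal backslash.

```python
import re

# The engine must see \\ to match one literal backslash, and each of those
# two backslashes must itself be escaped in an ordinary string literal.
pattern = "\\\\"
print(len(pattern))  # 2

path = "C:\\temp"  # this str contains a single backslash
match = re.search(pattern, path)
print(match.group())  # \
print(match.start())  # 2
```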

The solution is to use Python’s raw string notation for regular expression patterns; backslashes are not handled in any special way in a string literal prefixed with 'r'. So r"\n" is a two-character string containing '\' and 'n', while "\n" is a one-character string containing a newline. Usually patterns will be expressed in Python code using this raw string notation.
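Here is a short sketch of the raw-string rule. The sample text is arbitrary. The raw literal r"\n" keeps both characters, and r"\\" equals the four-backslash ordinary literal. A typical pattern such as r"\d+" reaches the engine unchanged.

```python
import re

# The raw literal keeps the backslash; the ordinary literal turns \n into a newline.
print(len(r"\n"), len("\n"))  # 2 1
print(r"\n" == "\\n")         # True

# The raw-string form is equivalent to the four-backslash ordinary literal.
print(r"\\" == "\\\\")        # True

# In practice, patterns are usually written as raw strings.
digits = re.findall(r"\d+", "room 101, floor 3")
print(digits)  # ['101', '3']
```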

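It is important to note that most regular expression operations are available both as module-level functions and as methods on compiled regular expressions. The functions are shortcuts that don’t require you to compile a regex object first, but they lack some fine-tuning parameters, such as the pos and endpos arguments accepted by the search methods of a compiled pattern.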
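The sketch below contrasts the two styles. The sample text is arbitrary. re.search() is the module-level shortcut. The object returned by re.compile() is reusable, and its search() and findall() methods take the extra pos and endpos positions, which restrict the part of the string that is examined.

```python
import re

text = "id=7 id=42 id=9"

# Module-level shortcut: no explicit compile step.
first = re.search(r"id=(\d+)", text)
print(first.group(1))  # 7

# Compiled pattern: reusable, and its methods accept pos/endpos.
id_pattern = re.compile(r"id=(\d+)")
later = id_pattern.search(text, 5)          # start searching at index 5
print(later.group(1))  # 42

in_prefix = id_pattern.findall(text, 0, 10)  # only look at text[0:10]
print(in_prefix)  # ['7', '42']
```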

See also

The third-party regex module, which has an API compatible with the standard library re module but offers additional functionality and more thorough Unicode support.
