From f3bf8d3110b852b8f338898c3237d16a74360cf3 Mon Sep 17 00:00:00 2001
From: Jay Berkenbilt
-#include <pcreposix.h>
-
-int regcomp(regex_t *preg, const char *pattern,
-int cflags);
-
-int regexec(regex_t *preg, const char *string,
-size_t nmatch, regmatch_t pmatch[], int eflags);
-
-size_t regerror(int errcode, const regex_t *preg,
-char *errbuf, size_t errbuf_size);
-
-void regfree(regex_t *preg);
-
-This set of functions provides a POSIX-style API to the PCRE regular expression
-package. See the
-pcreapi
-documentation for a description of the native API, which contains additional
-functionality.
-
-The functions described here are just wrapper functions that ultimately call
-the PCRE native API. Their prototypes are defined in the pcreposix.h
-header file, and on Unix systems the library itself is called
-pcreposix.a, so can be accessed by adding -lpcreposix to the
-command for linking an application which uses them. Because the POSIX functions
-call the native ones, it is also necessary to add \fR-lpcre\fR.
-
-I have implemented only those option bits that can be reasonably mapped to PCRE
-native options. In addition, the options REG_EXTENDED and REG_NOSUB are defined
-with the value zero. They have no effect, but since programs that are written
-to the POSIX interface often use them, this makes it easier to slot in PCRE as
-a replacement library. Other POSIX options are not even defined.
-
-When PCRE is called via these functions, it is only the API that is POSIX-like
-in style. The syntax and semantics of the regular expressions themselves are
-still those of Perl, subject to the setting of various PCRE options, as
-described below. "POSIX-like in style" means that the API approximates to the
-POSIX definition; it is not fully POSIX-compatible, and in multi-byte encoding
-domains it is probably even less compatible.
-
-The header for these functions is supplied as pcreposix.h to avoid any
-potential clash with other POSIX libraries. It can, of course, be renamed or
-aliased as regex.h, which is the "correct" name. It provides two
-structure types, regex_t for compiled internal forms, and
-regmatch_t for returning captured substrings. It also defines some
-constants whose names start with "REG_"; these are used for setting options and
-identifying error codes.
-
-The function regcomp() is called to compile a pattern into an
-internal form. The pattern is a C string terminated by a binary zero, and
-is passed in the argument pattern. The preg argument is a pointer
-to a regex_t structure which is used as a base for storing information about
-the compiled expression.
-
-The argument cflags is either zero, or contains one or more of the bits
-defined by the following macros:
-
-
-
-
-
SYNOPSIS OF POSIX API
-
DESCRIPTION
-
COMPILING A PATTERN
-
- REG_ICASE
-
-
-The PCRE_CASELESS option is set when the expression is passed for compilation -to the native function. -
--
- REG_NEWLINE -- -
-The PCRE_MULTILINE option is set when the expression is passed for compilation -to the native function. Note that this does not mimic the defined POSIX -behaviour for REG_NEWLINE (see the following section). -
--In the absence of these flags, no options are passed to the native function. -This means the the regex is compiled with PCRE default semantics. In -particular, the way it handles newline characters in the subject string is the -Perl way, not the POSIX way. Note that setting PCRE_MULTILINE has only -some of the effects specified for REG_NEWLINE. It does not affect the way -newlines are matched by . (they aren't) or by a negative class such as [^a] -(they are). -
--The yield of regcomp() is zero on success, and non-zero otherwise. The -preg structure is filled in on success, and one member of the structure -is public: re_nsub contains the number of capturing subpatterns in -the regular expression. Various error codes are defined in the header file. -
--This area is not simple, because POSIX and Perl take different views of things. -It is not possible to get PCRE to obey POSIX semantics, but then PCRE was never -intended to be a POSIX engine. The following table lists the different -possibilities for matching newline characters in PCRE: -
--
- Default Change with -- -
-
- . matches newline no PCRE_DOTALL - newline matches [^a] yes not changeable - $ matches \n at end yes PCRE_DOLLARENDONLY - $ matches \n in middle no PCRE_MULTILINE - ^ matches \n in middle no PCRE_MULTILINE -- -
-This is the equivalent table for POSIX: -
--
- Default Change with -- -
-
- . matches newline yes REG_NEWLINE - newline matches [^a] yes REG_NEWLINE - $ matches \n at end no REG_NEWLINE - $ matches \n in middle no REG_NEWLINE - ^ matches \n in middle no REG_NEWLINE -- -
-PCRE's behaviour is the same as Perl's, except that there is no equivalent for -PCRE_DOLLARENDONLY in Perl. In both PCRE and Perl, there is no way to stop -newline from matching [^a]. -
--The default POSIX newline handling can be obtained by setting PCRE_DOTALL and -PCRE_DOLLARENDONLY, but there is no way to make PCRE behave exactly as for the -REG_NEWLINE action. -
--The function regexec() is called to match a pre-compiled pattern -preg against a given string, which is terminated by a zero byte, -subject to the options in eflags. These can be: -
--
- REG_NOTBOL -- -
-The PCRE_NOTBOL option is set when calling the underlying PCRE matching -function. -
--
- REG_NOTEOL -- -
-The PCRE_NOTEOL option is set when calling the underlying PCRE matching -function. -
--The portion of the string that was matched, and also any captured substrings, -are returned via the pmatch argument, which points to an array of -nmatch structures of type regmatch_t, containing the members -rm_so and rm_eo. These contain the offset to the first character of -each substring and the offset to the first character after the end of each -substring, respectively. The 0th element of the vector relates to the entire -portion of string that was matched; subsequent elements relate to the -capturing subpatterns of the regular expression. Unused entries in the array -have both structure members set to -1. -
--A successful match yields a zero return; various error codes are defined in the -header file, of which REG_NOMATCH is the "expected" failure code. -
--The regerror() function maps a non-zero errorcode from either -regcomp() or regexec() to a printable message. If preg is not -NULL, the error should have arisen from the use of that structure. A message -terminated by a binary zero is placed in errbuf. The length of the -message, including the zero, is limited to errbuf_size. The yield of the -function is the size of buffer needed to hold the whole message. -
--Compiling a regular expression causes memory to be allocated and associated -with the preg structure. The function regfree() frees all such -memory, after which preg may no longer be used as a compiled expression. -
-
-Philip Hazel <ph10@cam.ac.uk>
-
-University Computing Service,
-
-Cambridge CB2 3QG, England.
-
-Last updated: 03 February 2003
-
-Copyright © 1997-2003 University of Cambridge.
--
cgit v1.2.3-54-g00ecf