From 582b500cd996c96054615870fd13d6ab0ea77428 Mon Sep 17 00:00:00 2001 From: Jay Berkenbilt Date: Sat, 10 Oct 2009 15:10:05 +0000 Subject: start integrating windows port git-svn-id: svn+q:///qpdf/trunk@757 71b93d88-0707-0410-a8cf-f5a4172ac649 --- external-libs/pcre/doc/pcreposix.3 | 194 +++++++++++++++++++++++++++++++++++++ 1 file changed, 194 insertions(+) create mode 100644 external-libs/pcre/doc/pcreposix.3 (limited to 'external-libs/pcre/doc/pcreposix.3') diff --git a/external-libs/pcre/doc/pcreposix.3 b/external-libs/pcre/doc/pcreposix.3 new file mode 100644 index 00000000..5198630f --- /dev/null +++ b/external-libs/pcre/doc/pcreposix.3 @@ -0,0 +1,194 @@ +.TH PCRE 3 +.SH NAME +PCRE - Perl-compatible regular expressions. +.SH SYNOPSIS OF POSIX API +.B #include +.PP +.SM +.br +.B int regcomp(regex_t *\fIpreg\fR, const char *\fIpattern\fR, +.ti +5n +.B int \fIcflags\fR); +.PP +.br +.B int regexec(regex_t *\fIpreg\fR, const char *\fIstring\fR, +.ti +5n +.B size_t \fInmatch\fR, regmatch_t \fIpmatch\fR[], int \fIeflags\fR); +.PP +.br +.B size_t regerror(int \fIerrcode\fR, const regex_t *\fIpreg\fR, +.ti +5n +.B char *\fIerrbuf\fR, size_t \fIerrbuf_size\fR); +.PP +.br +.B void regfree(regex_t *\fIpreg\fR); + +.SH DESCRIPTION +.rs +.sp +This set of functions provides a POSIX-style API to the PCRE regular expression +package. See the +.\" HREF +\fBpcreapi\fR +.\" +documentation for a description of the native API, which contains additional +functionality. + +The functions described here are just wrapper functions that ultimately call +the PCRE native API. Their prototypes are defined in the \fBpcreposix.h\fR +header file, and on Unix systems the library itself is called +\fBpcreposix.a\fR, so can be accessed by adding \fB-lpcreposix\fR to the +command for linking an application which uses them. Because the POSIX functions +call the native ones, it is also necessary to add \fR-lpcre\fR. + +I have implemented only those option bits that can be reasonably mapped to PCRE +native options. In addition, the options REG_EXTENDED and REG_NOSUB are defined +with the value zero. They have no effect, but since programs that are written +to the POSIX interface often use them, this makes it easier to slot in PCRE as +a replacement library. Other POSIX options are not even defined. + +When PCRE is called via these functions, it is only the API that is POSIX-like +in style. The syntax and semantics of the regular expressions themselves are +still those of Perl, subject to the setting of various PCRE options, as +described below. "POSIX-like in style" means that the API approximates to the +POSIX definition; it is not fully POSIX-compatible, and in multi-byte encoding +domains it is probably even less compatible. + +The header for these functions is supplied as \fBpcreposix.h\fR to avoid any +potential clash with other POSIX libraries. It can, of course, be renamed or +aliased as \fBregex.h\fR, which is the "correct" name. It provides two +structure types, \fIregex_t\fR for compiled internal forms, and +\fIregmatch_t\fR for returning captured substrings. It also defines some +constants whose names start with "REG_"; these are used for setting options and +identifying error codes. + +.SH COMPILING A PATTERN +.rs +.sp +The function \fBregcomp()\fR is called to compile a pattern into an +internal form. The pattern is a C string terminated by a binary zero, and +is passed in the argument \fIpattern\fR. The \fIpreg\fR argument is a pointer +to a regex_t structure which is used as a base for storing information about +the compiled expression. + +The argument \fIcflags\fR is either zero, or contains one or more of the bits +defined by the following macros: + + REG_ICASE + +The PCRE_CASELESS option is set when the expression is passed for compilation +to the native function. + + REG_NEWLINE + +The PCRE_MULTILINE option is set when the expression is passed for compilation +to the native function. Note that this does \fInot\fR mimic the defined POSIX +behaviour for REG_NEWLINE (see the following section). + +In the absence of these flags, no options are passed to the native function. +This means the the regex is compiled with PCRE default semantics. In +particular, the way it handles newline characters in the subject string is the +Perl way, not the POSIX way. Note that setting PCRE_MULTILINE has only +\fIsome\fR of the effects specified for REG_NEWLINE. It does not affect the way +newlines are matched by . (they aren't) or by a negative class such as [^a] +(they are). + +The yield of \fBregcomp()\fR is zero on success, and non-zero otherwise. The +\fIpreg\fR structure is filled in on success, and one member of the structure +is public: \fIre_nsub\fR contains the number of capturing subpatterns in +the regular expression. Various error codes are defined in the header file. + +.SH MATCHING NEWLINE CHARACTERS +.rs +.sp +This area is not simple, because POSIX and Perl take different views of things. +It is not possible to get PCRE to obey POSIX semantics, but then PCRE was never +intended to be a POSIX engine. The following table lists the different +possibilities for matching newline characters in PCRE: + + Default Change with + + . matches newline no PCRE_DOTALL + newline matches [^a] yes not changeable + $ matches \\n at end yes PCRE_DOLLARENDONLY + $ matches \\n in middle no PCRE_MULTILINE + ^ matches \\n in middle no PCRE_MULTILINE + +This is the equivalent table for POSIX: + + Default Change with + + . matches newline yes REG_NEWLINE + newline matches [^a] yes REG_NEWLINE + $ matches \\n at end no REG_NEWLINE + $ matches \\n in middle no REG_NEWLINE + ^ matches \\n in middle no REG_NEWLINE + +PCRE's behaviour is the same as Perl's, except that there is no equivalent for +PCRE_DOLLARENDONLY in Perl. In both PCRE and Perl, there is no way to stop +newline from matching [^a]. + +The default POSIX newline handling can be obtained by setting PCRE_DOTALL and +PCRE_DOLLARENDONLY, but there is no way to make PCRE behave exactly as for the +REG_NEWLINE action. + +.SH MATCHING A PATTERN +.rs +.sp +The function \fBregexec()\fR is called to match a pre-compiled pattern +\fIpreg\fR against a given \fIstring\fR, which is terminated by a zero byte, +subject to the options in \fIeflags\fR. These can be: + + REG_NOTBOL + +The PCRE_NOTBOL option is set when calling the underlying PCRE matching +function. + + REG_NOTEOL + +The PCRE_NOTEOL option is set when calling the underlying PCRE matching +function. + +The portion of the string that was matched, and also any captured substrings, +are returned via the \fIpmatch\fR argument, which points to an array of +\fInmatch\fR structures of type \fIregmatch_t\fR, containing the members +\fIrm_so\fR and \fIrm_eo\fR. These contain the offset to the first character of +each substring and the offset to the first character after the end of each +substring, respectively. The 0th element of the vector relates to the entire +portion of \fIstring\fR that was matched; subsequent elements relate to the +capturing subpatterns of the regular expression. Unused entries in the array +have both structure members set to -1. + +A successful match yields a zero return; various error codes are defined in the +header file, of which REG_NOMATCH is the "expected" failure code. + +.SH ERROR MESSAGES +.rs +.sp +The \fBregerror()\fR function maps a non-zero errorcode from either +\fBregcomp()\fR or \fBregexec()\fR to a printable message. If \fIpreg\fR is not +NULL, the error should have arisen from the use of that structure. A message +terminated by a binary zero is placed in \fIerrbuf\fR. The length of the +message, including the zero, is limited to \fIerrbuf_size\fR. The yield of the +function is the size of buffer needed to hold the whole message. + +.SH STORAGE +.rs +.sp +Compiling a regular expression causes memory to be allocated and associated +with the \fIpreg\fR structure. The function \fBregfree()\fR frees all such +memory, after which \fIpreg\fR may no longer be used as a compiled expression. + +.SH AUTHOR +.rs +.sp +Philip Hazel +.br +University Computing Service, +.br +Cambridge CB2 3QG, England. + +.in 0 +Last updated: 03 February 2003 +.br +Copyright (c) 1997-2003 University of Cambridge. -- cgit v1.2.3-70-g09d2