From 582b500cd996c96054615870fd13d6ab0ea77428 Mon Sep 17 00:00:00 2001
From: Jay Berkenbilt
Date: Sat, 10 Oct 2009 15:10:05 +0000
Subject: start integrating windows port

git-svn-id: svn+q:///qpdf/trunk@757 71b93d88-0707-0410-a8cf-f5a4172ac649
---
 external-libs/pcre/doc/pcretest.txt | 357 ++++++++++++++++++++++++++++++++++++
 1 file changed, 357 insertions(+)
 create mode 100644 external-libs/pcre/doc/pcretest.txt

(limited to 'external-libs/pcre/doc/pcretest.txt')

diff --git a/external-libs/pcre/doc/pcretest.txt b/external-libs/pcre/doc/pcretest.txt
new file mode 100644
index 00000000..0e9cd138
--- /dev/null
+++ b/external-libs/pcre/doc/pcretest.txt
@@ -0,0 +1,357 @@
+PCRETEST(1)                                                        PCRETEST(1)
+
+
+
+NAME
+       pcretest - a program for testing Perl-compatible regular expressions.
+
+SYNOPSIS
+       pcretest [-C] [-d] [-i] [-m] [-o osize] [-p] [-t] [source] [destination]
+
+       pcretest was written as a test program for the PCRE regular expression
+       library itself, but it can also be used for experimenting with regular
+       expressions. This document describes the features of the test program;
+       for details of the regular expressions themselves, see the pcrepattern
+       documentation. For details of PCRE and its options, see the pcreapi
+       documentation.
+
+
+OPTIONS
+
+
+       -C        Output the version number of the PCRE library, and all
+                 available information about the optional features that
+                 are included, and then exit.
+
+       -d        Behave as if each regex had the /D modifier (see below); the
+                 internal form is output after compilation.
+
+       -i        Behave as if each regex had the /I modifier; information
+                 about the compiled pattern is given after compilation.
+
+       -m        Output the size of each compiled pattern after it has been
+                 compiled. This is equivalent to adding /M to each regular
+                 expression. For compatibility with earlier versions of
+                 pcretest, -s is a synonym for -m.
+
+       -o osize  Set the number of elements in the output vector that is used
+                 when calling PCRE to be osize.
+                 The default value is 45, which
+                 is enough for 14 capturing subexpressions. The vector size
+                 can be changed for individual matching calls by including \O
+                 in the data line (see below).
+
+       -p        Behave as if each regex had the /P modifier; the POSIX
+                 wrapper API is used to call PCRE. None of the other options
+                 has any effect when -p is set.
+
+       -t        Run each compile, study, and match many times with a timer,
+                 and output resulting time per compile or match (in
+                 milliseconds). Do not set -t with -m, because you will then
+                 get the size output 20000 times and the timing will be
+                 distorted.
+
+
+DESCRIPTION
+
+       If pcretest is given two filename arguments, it reads from the first
+       and writes to the second. If it is given only one filename argument, it
+       reads from that file and writes to stdout. Otherwise, it reads from
+       stdin and writes to stdout, and prompts for each line of input, using
+       "re>" to prompt for regular expressions, and "data>" to prompt for data
+       lines.
+
+       The program handles any number of sets of input on a single input file.
+       Each set starts with a regular expression, and continues with any num-
+       ber of data lines to be matched against the pattern.
+
+       Each line is matched separately and independently. If you want to do
+       multiple-line matches, you have to use the \n escape sequence in a sin-
+       gle line of input to encode the newline characters. The maximum length
+       of a data line is 30,000 characters.
+
+       An empty line signals the end of the data lines, at which point a new
+       regular expression is read. The regular expressions are given enclosed
+       in any non-alphameric delimiters other than backslash, for example
+
+         /(a|bc)x+yz/
+
+       White space before the initial delimiter is ignored. A regular expres-
+       sion may be continued over several input lines, in which case the new-
+       line characters are included within it.
+       It is possible to include the
+       delimiter within the pattern by escaping it, for example
+
+         /abc\/def/
+
+       If you do so, the escape and the delimiter form part of the pattern,
+       but since delimiters are always non-alphameric, this does not affect
+       its interpretation. If the terminating delimiter is immediately fol-
+       lowed by a backslash, for example,
+
+         /abc/\
+
+       then a backslash is added to the end of the pattern. This is done to
+       provide a way of testing the error condition that arises if a pattern
+       finishes with a backslash, because
+
+         /abc\/
+
+       is interpreted as the first line of a pattern that starts with "abc/",
+       causing pcretest to read the next line as a continuation of the regular
+       expression.
+
+
+PATTERN MODIFIERS
+
+       The pattern may be followed by i, m, s, or x to set the PCRE_CASELESS,
+       PCRE_MULTILINE, PCRE_DOTALL, or PCRE_EXTENDED options, respectively.
+       For example:
+
+         /caseless/i
+
+       These modifier letters have the same effect as they do in Perl. There
+       are others that set PCRE options that do not correspond to anything in
+       Perl: /A, /E, /N, /U, and /X set PCRE_ANCHORED, PCRE_DOLLAR_ENDONLY,
+       PCRE_NO_AUTO_CAPTURE, PCRE_UNGREEDY, and PCRE_EXTRA respectively.
+
+       Searching for all possible matches within each subject string can be
+       requested by the /g or /G modifier. After finding a match, PCRE is
+       called again to search the remainder of the subject string. The differ-
+       ence between /g and /G is that the former uses the startoffset argument
+       to pcre_exec() to start searching at a new point within the entire
+       string (which is in effect what Perl does), whereas the latter passes
+       over a shortened substring. This makes a difference to the matching
+       process if the pattern begins with a lookbehind assertion (including \b
+       or \B).
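The difference between the two repeat strategies can be sketched outside pcretest. The following is a minimal illustration, assuming Python's re module as a stand-in for PCRE: the pos argument of Pattern.search() plays the role of startoffset, and slicing the subject plays the role of passing a shortened substring. The function names find_all_offset and find_all_sliced are hypothetical and are not part of pcretest or PCRE, and the empty-match handling is simplified compared with what pcretest does.

```python
import re


def find_all_offset(pattern, subject):
    """/g-style repeat: keep the whole subject and move the start offset."""
    rx = re.compile(pattern)
    results, pos = [], 0
    while pos <= len(subject):
        m = rx.search(subject, pos)  # assertions can still see text before pos
        if m is None:
            break
        results.append(m.group(0))
        # Simplification: step one character past an empty match.
        pos = m.end() if m.end() > m.start() else m.end() + 1
    return results


def find_all_sliced(pattern, subject):
    """/G-style repeat: pass only the remainder after the previous match."""
    rx = re.compile(pattern)
    results, rest = [], subject
    while True:
        m = rx.search(rest)  # text before the slice is invisible to the pattern
        if m is None:
            break
        results.append(m.group(0))
        if m.end() == 0:
            # Empty match at the very start: drop one character to make progress.
            if not rest:
                break
            rest = rest[1:]
        else:
            rest = rest[m.end():]
    return results


if __name__ == "__main__":
    print(find_all_offset(r"\Bi(\w\w)", "Mississippi"))
    print(find_all_sliced(r"\Bi(\w\w)", "Mississippi"))
```

With the pattern \Bi(\w\w) and the subject "Mississippi", the offset version finds iss, iss, ipp, while the sliced version finds only iss, ipp: after the first match the remainder "issippi" starts with "i", and \B cannot match at the start of a string.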
+
+       If any call to pcre_exec() in a /g or /G sequence matches an empty
+       string, the next call is done with the PCRE_NOTEMPTY and PCRE_ANCHORED
+       flags set in order to search for another, non-empty, match at the same
+       point. If this second match fails, the start offset is advanced by
+       one, and the normal match is retried. This imitates the way Perl han-
+       dles such cases when using the /g modifier or the split() function.
+
+       There are a number of other modifiers for controlling the way pcretest
+       operates.
+
+       The /+ modifier requests that as well as outputting the substring that
+       matched the entire pattern, pcretest should in addition output the
+       remainder of the subject string. This is useful for tests where the
+       subject contains multiple copies of the same substring.
+
+       The /L modifier must be followed directly by the name of a locale, for
+       example,
+
+         /pattern/Lfr
+
+       For this reason, it must be the last modifier letter. The given locale
+       is set, pcre_maketables() is called to build a set of character tables
+       for the locale, and this is then passed to pcre_compile() when compil-
+       ing the regular expression. Without an /L modifier, NULL is passed as
+       the tables pointer; that is, /L applies only to the expression on which
+       it appears.
+
+       The /I modifier requests that pcretest output information about the
+       compiled expression (whether it is anchored, has a fixed first charac-
+       ter, and so on). It does this by calling pcre_fullinfo() after compil-
+       ing an expression, and outputting the information it gets back. If the
+       pattern is studied, the results of that are also output.
+
+       The /D modifier is a PCRE debugging feature, which also assumes /I. It
+       causes the internal form of compiled regular expressions to be output
+       after compilation. If the pattern was studied, the information returned
+       is also output.
+
+       The /S modifier causes pcre_study() to be called after the expression
+       has been compiled, and the results used when the expression is matched.
+
+       The /M modifier causes the size of memory block used to hold the com-
+       piled pattern to be output.
+
+       The /P modifier causes pcretest to call PCRE via the POSIX wrapper API
+       rather than its native API. When this is done, all other modifiers
+       except /i, /m, and /+ are ignored. REG_ICASE is set if /i is present,
+       and REG_NEWLINE is set if /m is present. The wrapper functions force
+       PCRE_DOLLAR_ENDONLY always, and PCRE_DOTALL unless REG_NEWLINE is set.
+
+       The /8 modifier causes pcretest to call PCRE with the PCRE_UTF8 option
+       set. This turns on support for UTF-8 character handling in PCRE, pro-
+       vided that it was compiled with this support enabled. This modifier
+       also causes any non-printing characters in output strings to be printed
+       using the \x{hh...} notation if they are valid UTF-8 sequences.
+
+       If the /? modifier is used with /8, it causes pcretest to call
+       pcre_compile() with the PCRE_NO_UTF8_CHECK option, to suppress the
+       checking of the string for UTF-8 validity.
+
+
+CALLOUTS
+
+       If the pattern contains any callout requests, pcretest's callout func-
+       tion will be called. By default, it displays the callout number, and
+       the start and current positions in the text at the callout time. For
+       example, the output
+
+         --->pqrabcdef
+           0    ^  ^
+
+       indicates that callout number 0 occurred for a match attempt starting
+       at the fourth character of the subject string, when the pointer was at
+       the seventh character. The callout function returns zero (carry on
+       matching) by default.
+
+       Inserting callouts may be helpful when using pcretest to check compli-
+       cated regular expressions. For further information about callouts, see
+       the pcrecallout documentation.
+
+       For testing the PCRE library, additional control of callout behaviour
+       is available via escape sequences in the data, as described in the fol-
+       lowing section. In particular, it is possible to pass in a number as
+       callout data (the default is zero). If the callout function receives a
+       non-zero number, it returns that value instead of zero.
+
+
+DATA LINES
+
+       Before each data line is passed to pcre_exec(), leading and trailing
+       whitespace is removed, and it is then scanned for \ escapes. Some of
+       these are pretty esoteric features, intended for checking out some of
+       the more complicated features of PCRE. If you are just testing "ordi-
+       nary" regular expressions, you probably don't need any of these. The
+       following escapes are recognized:
+
+         \a         alarm (= BEL)
+         \b         backspace
+         \e         escape
+         \f         formfeed
+         \n         newline
+         \r         carriage return
+         \t         tab
+         \v         vertical tab
+         \nnn       octal character (up to 3 octal digits)
+         \xhh       hexadecimal character (up to 2 hex digits)
+         \x{hh...}  hexadecimal character, any number of digits
+                    in UTF-8 mode
+         \A         pass the PCRE_ANCHORED option to pcre_exec()
+         \B         pass the PCRE_NOTBOL option to pcre_exec()
+         \Cdd       call pcre_copy_substring() for substring dd
+                    after a successful match (any decimal number
+                    less than 32)
+         \Cname     call pcre_copy_named_substring() for substring
+                    "name" after a successful match (name termin-
+                    ated by next non-alphanumeric character)
+         \C+        show the current captured substrings at callout
+                    time
+         \C-        do not supply a callout function
+         \C!n       return 1 instead of 0 when callout number n is
+                    reached
+         \C!n!m     return 1 instead of 0 when callout number n is
+                    reached for the mth time
+         \C*n       pass the number n (may be negative) as callout
+                    data
+         \Gdd       call pcre_get_substring() for substring dd
+                    after a successful match (any decimal number
+                    less than 32)
+         \Gname     call pcre_get_named_substring() for substring
+                    "name" after a successful match (name termin-
+                    ated by next non-alphanumeric character)
+         \L         call
+                    pcre_get_substring_list() after a
+                    successful match
+         \M         discover the minimum MATCH_LIMIT setting
+         \N         pass the PCRE_NOTEMPTY option to pcre_exec()
+         \Odd       set the size of the output vector passed to
+                    pcre_exec() to dd (any number of decimal
+                    digits)
+         \S         output details of memory get/free calls during
+                    matching
+         \Z         pass the PCRE_NOTEOL option to pcre_exec()
+         \?         pass the PCRE_NO_UTF8_CHECK option to
+                    pcre_exec()
+
+       If \M is present, pcretest calls pcre_exec() several times, with dif-
+       ferent values in the match_limit field of the pcre_extra data struc-
+       ture, until it finds the minimum number that is needed for pcre_exec()
+       to complete. This number is a measure of the amount of recursion and
+       backtracking that takes place, and checking it out can be instructive.
+       For most simple matches, the number is quite small, but for patterns
+       with very large numbers of matching possibilities, it can become large
+       very quickly with increasing length of subject string.
+
+       When \O is used, it may be higher or lower than the size set by the -o
+       option (or defaulted to 45); \O applies only to the call of pcre_exec()
+       for the line in which it appears.
+
+       A backslash followed by anything else just escapes the anything else.
+       If the very last character is a backslash, it is ignored. This gives a
+       way of passing an empty line as data, since a real empty line termi-
+       nates the data input.
+
+       If /P was present on the regex, causing the POSIX wrapper API to be
+       used, only \B and \Z have any effect, causing REG_NOTBOL and REG_NOTEOL
+       to be passed to regexec() respectively.
+
+       The use of \x{hh...} to represent UTF-8 characters is not dependent on
+       the use of the /8 modifier on the pattern. It is recognized always.
+       There may be any number of hexadecimal digits inside the braces. The
+       result is from one to six bytes, encoded according to the UTF-8 rules.
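The "one to six bytes" figure corresponds to the original definition of UTF-8 (RFC 2279), which covers values up to 0x7FFFFFFF; the current definition (RFC 3629) stops at four bytes. The following Python sketch shows that encoding rule. It is an illustration only, not pcretest's own code, and the function name encode_utf8_extended is hypothetical.

```python
def encode_utf8_extended(cp: int) -> bytes:
    """Encode a value of up to 31 bits using the original 1-6 byte UTF-8 rules."""
    if cp < 0 or cp > 0x7FFFFFFF:
        raise ValueError("value does not fit in 31 bits")
    if cp < 0x80:
        return bytes([cp])  # plain ASCII: a single byte

    # (number of bytes, exclusive upper limit, leading-byte prefix)
    layouts = (
        (2, 0x800, 0xC0),
        (3, 0x10000, 0xE0),
        (4, 0x200000, 0xF0),
        (5, 0x4000000, 0xF8),
        (6, 0x80000000, 0xFC),
    )
    for nbytes, limit, lead in layouts:
        if cp < limit:
            break

    # Peel off six bits at a time for the continuation bytes (10xxxxxx).
    tail = []
    for _ in range(nbytes - 1):
        tail.append(0x80 | (cp & 0x3F))
        cp >>= 6

    # Whatever remains goes into the low bits of the leading byte.
    return bytes([lead | cp] + tail[::-1])


if __name__ == "__main__":
    for value in (0x41, 0xE9, 0x20AC, 0x7FFFFFFF):
        print(hex(value), encode_utf8_extended(value).hex())
```

For values that are valid Unicode scalar values, the result agrees with Python's built-in chr(value).encode("utf-8"); the five- and six-byte forms only arise for values above 0x1FFFFF.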
+
+
+OUTPUT FROM PCRETEST
+
+       When a match succeeds, pcretest outputs the list of captured substrings
+       that pcre_exec() returns, starting with number 0 for the string that
+       matched the whole pattern. Here is an example of an interactive
+       pcretest run.
+
+         $ pcretest
+         PCRE version 4.00 08-Jan-2003
+
+           re> /^abc(\d+)/
+         data> abc123
+          0: abc123
+          1: 123
+         data> xyz
+         No match
+
+       If the strings contain any non-printing characters, they are output as
+       \0x escapes, or as \x{...} escapes if the /8 modifier was present on
+       the pattern. If the pattern has the /+ modifier, then the output for
+       substring 0 is followed by the rest of the subject string, identi-
+       fied by "0+" like this:
+
+           re> /cat/+
+         data> cataract
+          0: cat
+          0+ aract
+
+       If the pattern has the /g or /G modifier, the results of successive
+       matching attempts are output in sequence, like this:
+
+           re> /\Bi(\w\w)/g
+         data> Mississippi
+          0: iss
+          1: ss
+          0: iss
+          1: ss
+          0: ipp
+          1: pp
+
+       "No match" is output only if the first match attempt fails.
+
+       If any of the sequences \C, \G, or \L are present in a data line that
+       is successfully matched, the substrings extracted by the convenience
+       functions are output with C, G, or L after the string number instead of
+       a colon. This is in addition to the normal full list. The string length
+       (that is, the return from the extraction function) is given in paren-
+       theses after each string for \C and \G.
+
+       Note that while patterns can be continued over several lines (a plain
+       ">" prompt is used for continuations), data lines may not. However, new-
+       lines can be included in data by means of the \n escape.
+
+
+AUTHOR
+
+       Philip Hazel
+       University Computing Service,
+       Cambridge CB2 3QG, England.
+
+Last updated: 09 December 2003
+Copyright (c) 1997-2003 University of Cambridge.
--
cgit v1.2.3-54-g00ecf