Misplaced Pages

BSR

Article snapshot taken from Wikipedia with creative commons attribution-sharealike license.

Perl Compatible Regular Expressions (PCRE) is a library written in C, which implements a regular expression engine, inspired by the capabilities of the Perl programming language. Philip Hazel started writing PCRE in summer 1997. PCRE's syntax is much more powerful and flexible than either of the POSIX regular expression flavors (BRE, ERE) and than that of many other regular-expression libraries.


39-807: BSR may refer to: Backslash-R, a class of options in Perl Compatible Regular Expressions; Basrah International Airport, IATA code; Vasai Road railway station, Mumbai, India, station code; Birmingham Sound Reproducers or BSR McDonald, a former UK audio manufacturer; Bit Scan Reverse, a find-first-set x86 instruction; Bootstrap Router in Protocol Independent Multicast; Brain stimulation reward; The British School at Rome; Brown Student/Community Radio, Providence, RI, US; Brussels Sound Revolution,

78-399: A stack overflow occurs if the call stack pointer exceeds the stack bound. The call stack may consist of a limited amount of address space, often determined at the start of the program. The size of the call stack depends on many factors, including the programming language, machine architecture, multi-threading, and amount of available memory. When a program attempts to use more space than

117-692: A Belgian new beat band, best known for the single Qui? (1989); Building Safety Regulator, a part of the Health and Safety Executive of the UK Government responsible for building safety in England; Beijing Subway Rolling Stock Equipment, a rolling stock manufacturer based in Beijing, China. Topics referred to by the same term. This disambiguation page lists articles associated with

156-464: A backreference provides a mechanism to refer to that part of the subject that has previously matched a subpattern, whereas a subroutine provides a mechanism to reuse an underlying previously defined subpattern. The subpattern's options, such as case independence, are fixed when the subpattern is defined. (a.c)(?1) would match "aacabc" or "abcadc", whereas using a backreference (a.c)\1 would not, though both would match "aacaac" or "abcabc". PCRE also supports
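The contrast between a backreference and a subroutine can be sketched in Python. This is only an approximation: Python's standard re module is not PCRE and has no (?1) subroutine syntax, so the subroutine call is emulated here by writing the subpattern a second time in a non-capturing group. The names backref and subroutine_like are hypothetical.

```python
import re

# Backreference: \1 must repeat the exact text captured by group 1.
backref = re.compile(r"(a.c)\1")

# Stand-in for the PCRE subroutine call (a.c)(?1). Python's re lacks (?1),
# so the subpattern a.c is simply written out again (an assumption made
# for illustration, not PCRE syntax).
subroutine_like = re.compile(r"(a.c)(?:a.c)")

for subject in ["aacaac", "abcabc", "aacabc", "abcadc"]:
    print(
        subject,
        bool(backref.fullmatch(subject)),
        bool(subroutine_like.fullmatch(subject)),
    )
```

Both patterns accept "aacaac" and "abcabc", but only the subroutine-style pattern accepts "aacabc" and "abcadc", because it re-applies the subpattern rather than the previously matched text.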

195-479: A backslash typically gives it a special meaning. In the case where the sequence has not been defined to be special, an error occurs. This is different from Perl, which gives an error only if it is in warning mode (PCRE2 does not have a warning mode). In basic POSIX regular expressions, sometimes backslashes escaped non-alpha-numerics (e.g. \.), and sometimes they introduced a special feature (e.g. \(\)). Single-letter character classes are supported in addition to

234-456: A few kilobytes should be allocated dynamically instead of as a local variable. An example of a very large stack variable in C: On a C implementation with 8-byte double-precision floats, the declared array consumes 8 megabytes of data; if this is more memory than is available on the stack (as set by thread creation parameters or operating system limits), a stack overflow will occur. Stack overflows are made worse by anything that reduces

273-523: A horizontal ellipsis, and if encountered while the ANY newline is in effect, it would trigger newline processing. See below for configuration and options concerning what matches backslash-R. When PCRE is compiled, a default is selected for what matches \R. The default can be either to match the linebreaks corresponding to ANYCRLF or those corresponding to ANY. The default can be overridden when necessary by including (*BSR_UNICODE) or (*BSR_ANYCRLF) at

312-406: A lookbehind assertion to be the same length as each other, whereas PCRE allows those alternative branches to have different lengths from each other as long as each branch still has a fixed length. PCRE supports neither (??{...}) (a callback whose return is evaluated as being part of the pattern) nor the (?{}) construct, although the latter can be emulated using (?Cn). Recursion control verbs added in

351-429: A loop like on the right side. A function like the example above on the left would not be a problem in an environment supporting tail-call optimization; however, it is still possible to create a recursive function that may result in a stack overflow in these languages. Consider the example below of two simple integer exponentiation functions. Both pow(base, exp) functions above compute an equivalent result, however,

390-467: A non-Perl Oniguruma construct for subroutines. They are specified using \g<subpat-number> or \g<subpat-name>. Atomic grouping is a way of preventing backtracking in a pattern. For example, a++bc will match as many "a"s as possible and never back up to try one less. Patterns may assert that previous text or subsequent text contains a pattern without consuming matched text (zero-width assertion). For example, /\w+(?=\t)/ matches

429-665: A pattern. Differences between PCRE2 and Perl (as of Perl 5.9.4) include but are not limited to: This meant that "<<!>!>!>><>>!>!>!>" =~ /^(<(?:[^<>]+|(?3)|(?1))*>)()(!>!>!>)$/ would match in Perl but not in PCRE2 until release 10.30. In Perl "aba" =~ /^(a(b)?)+$/; will result in $1 containing "a" and $2 containing undef, but in PCRE will result in $2 containing "b". This means that \g{}


468-624: A word followed by a tab, without including the tab itself. Look-behind assertions cannot be of uncertain length, though (unlike Perl) each branch can be a different fixed length. \K can be used in a pattern to reset the start of the current whole match. This provides a flexible alternative approach to look-behind assertions because the discarded part of the match (the part that precedes \K) need not be fixed in length. E.g. \b for matching zero-width "word boundaries", similar to (?<=\W)(?=\w)|(?<=\w)(?=\W)|^|$. A comment begins with (?# and ends at
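The lookahead and word-boundary behaviour described here can be tried out in Python, whose standard re module shares this part of the syntax with PCRE (it is not PCRE itself, and it has no \K). The sample strings and variable names below are assumptions chosen for illustration.

```python
import re

# Lookahead: match a word only when a tab follows, without consuming the tab.
m = re.search(r"\w+(?=\t)", "name\tvalue")
word = m.group(0)  # the tab is not part of the match

# \b compared with the longer lookaround alternation quoted in the text.
longhand = r"(?<=\W)(?=\w)|(?<=\w)(?=\W)|^|$"
subject = "hi there"
boundary_positions = [mt.start() for mt in re.finditer(r"\b", subject)]
longhand_positions = [mt.start() for mt in re.finditer(longhand, subject)]

print(repr(word), boundary_positions, longhand_positions)
```

For this subject the two patterns report the same positions. They are only similar, not identical: on a subject that begins or ends with a non-word character, ^ and $ still match there while \b does not.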

507-444: Is available on the call stack (that is, when it attempts to access memory beyond the call stack's bounds, which is essentially a buffer overflow), the stack is said to overflow, typically resulting in a program crash. The most common cause of stack overflow is excessively deep or infinite recursion, in which a function calls itself so many times that the space needed to store the variables and information associated with each call

546-518: Is considered obsolete, and the current 8.45 release is likely to be the last. The new PCRE2 code (the 10.xx series) has had a number of extensions and coding improvements and is where development takes place. A number of prominent open-source programs, such as the Apache and Nginx HTTP servers, and the PHP and R scripting languages, incorporate the PCRE library; proprietary software can do likewise, as

585-600: Is more than can fit on the stack. An example of infinite recursion in C. The function foo, when it is invoked, continues to invoke itself, allocating additional space on the stack each time, until the stack overflows, resulting in a segmentation fault. However, some compilers implement tail-call optimization, allowing infinite recursion of a specific sort (tail recursion) to occur without stack overflow. This works because tail-recursion calls do not take up additional stack space. Some C compiler options will effectively enable tail-call optimization; for example, compiling

624-481: Is passed to its following invocation. As no other information outside of the current function invocation must be stored, a tail-recursion optimizer can "drop" the prior stack frames, eliminating the possibility of a stack overflow. The other major cause of a stack overflow results from an attempt to allocate more memory on the stack than will fit, for example by creating local array variables that are too large. For this reason some authors recommend that arrays larger than

663-451: Is set). It also affects PCRE matching procedure (since version 7.0): when an unanchored pattern fails to match at the start of a newline sequence, PCRE advances past the entire newline sequence before retrying the match. If the newline option alternative in effect includes CRLF as one of the valid linebreaks, it does not skip the \n in a CRLF if the pattern contains specific \r or \n references (since version 7.3). Since version 8.10,

702-492: Is slower than the normal (ASCII-only) non-UCP alternative. Note that the UCP option requires the library to have been built to include Unicode support (this is the default for PCRE2). Very early versions of PCRE1 supported only ASCII code. Later, UTF-8 support was added. Support for UTF-16 was added in version 8.30, and support for UTF-32 in version 8.32. PCRE2 has always supported all three UTF encodings. ^ and $ can match at

741-439: Is unambiguous in Perl, but potentially ambiguous in PCRE. This is no longer a difference since PCRE 8.34 (released on 2013-12-15), which no longer allows group names to start with a digit. Within lookbehind assertions, both PCRE and Perl require fixed-length patterns. That is, both PCRE and Perl disallow variable-length patterns using quantifiers within lookbehind assertions. However, Perl requires all alternative branches of

780-427: The (*UTF) option at the beginning of a pattern can be used instead of setting an external option to invoke UTF-8, UTF-16, or UTF-32 mode. A pattern may refer back to the results of a previous match. For example, (a|b)c\1 would match either "aca" or "bcb" and would not match, for example, "acb". A sub-pattern (surrounded by parentheses, like (...) ) may be named by including a leading ?P<name> after

819-517: The PCRE2 library is built. Large performance benefits are possible when (for example) the calling program utilizes the feature with compatible patterns that are executed repeatedly. The just-in-time compiler support was written by Zoltan Herczeg and is not addressed in the POSIX wrapper. The use of the system stack for backtracking can be problematic in PCRE1, which is why this feature of the implementation


858-422: The Perl 5.9.x series are also not supported. Support for experimental backtracking control verbs (added in Perl 5.10) is available in PCRE since version 7.3. They are (*FAIL), (*F), (*PRUNE), (*SKIP), (*THEN), (*COMMIT), and (*ACCEPT). Perl's corresponding use of arguments with backtracking control verbs is not generally supported. Note however that since version 8.10, PCRE supports

897-468: The above simple program using gcc with -O1 will result in a segmentation fault, but not when using -O2 or -O3, since these optimization levels imply the -foptimize-sibling-calls compiler option. Other languages, such as Scheme, require all implementations to include tail-recursion as part of the language standard. A recursive function that terminates in theory but causes a call stack buffer overflow in practice can be fixed by transforming

936-402: The beginning and end of a string only, or at the start and end of each "line" within the string, depending on what options are set. When PCRE is compiled, a newline default is selected. Which newline/linebreak is in effect affects where PCRE detects ^ line beginnings and $ ends (in multiline mode), as well as what matches dot (regardless of multiline mode, unless the dotall option (?s)
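The effect of multiline mode on ^ and $ can be illustrated with Python's standard re module, used here only as a stand-in for PCRE. One assumption to keep in mind: Python treats only "\n" as a line terminator, whereas PCRE's newline convention is configurable as described above.

```python
import re

text = "one\ntwo\nthree"

# Default: ^ and $ anchor to the whole string, so no single word spans it.
whole_string = re.findall(r"^\w+$", text)

# Multiline mode: ^ and $ also match at the start and end of each line.
per_line = re.findall(r"^\w+$", text, flags=re.MULTILINE)

print(whole_string, per_line)
```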

975-457: The compile option PCRE2_UCP is set. The option can be set for a pattern by including (*UCP) at the start of the pattern. The option alters behavior of the following metacharacters: \B, \b, \D, \d, \S, \s, \W, \w, and some of the POSIX character classes. For example, the set of characters matched by \w (word characters) is expanded to include letters and accented letters as defined by Unicode properties. Such matching

1014-483: The entire string. If the U flag is set, then quantifiers are ungreedy (lazy) by default, while ? makes them greedy. Unicode defines several properties for each character. Patterns in PCRE2 can match these properties: e.g. \p{Ps}.*?\p{Pe} would match a string beginning with any "opening punctuation" and ending with any "close punctuation" such as [abc]. Matching of certain "normal" metacharacters can be driven by Unicode properties when

1053-454: The following verbs with a specified argument: (*MARK:markName), (*SKIP:markName), (*PRUNE:markName), and (*THEN:markName). Since version 10.32 PCRE2 has supported (*ACCEPT:markName), (*FAIL:markName), and (*COMMIT:markName). Perl allows quantifiers on the (?!...) construct, which is meaningless but harmless (albeit inefficient); PCRE produces an error in versions before 8.13. Stack overflow In software,

1092-460: The library is BSD-licensed. As of Perl 5.10, PCRE is also available as a replacement for Perl's default regular-expression engine through the re::engine::PCRE module. The library can be built on Unix, Windows, and several other environments. PCRE2 is distributed with a POSIX C wrapper, several test programs, and the utility program pcregrep / pcre2grep that is built in tandem with the library. The just-in-time compiler can be enabled when

1131-401: The longer POSIX names. For example, \d matches any digit exactly as [[:digit:]] would in POSIX regular expressions. A ? may be placed after any repetition quantifier to indicate that the shortest match should be used. The default is to attempt the longest match first and backtrack through shorter matches: e.g. a.*?b would match first "ab" in "ababab", where a.*b would match
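The greedy and lazy behaviour on "ababab" can be checked with a short Python sketch. Python's standard re module is not PCRE, but it handles the ? suffix on quantifiers the same way for this case; the variable names are illustrative.

```python
import re

subject = "ababab"

# Lazy: .*? takes as few characters as possible, so the first "ab" is matched.
lazy = re.search(r"a.*?b", subject).group(0)

# Greedy: .* takes as many characters as possible, then backtracks to a "b".
greedy = re.search(r"a.*b", subject).group(0)

print(lazy, greedy)
```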

1170-426: The metacharacter \N always matches any character other than linebreak characters. It has the same behavior as . when the dotall option aka (?s) is not in effect. The newline option can be altered with external options when PCRE is compiled and when it is run. Some applications using PCRE provide users with the means to apply this setting through an external option. So the newline option can also be stated at

1209-464: The next closing parenthesis. A pattern can refer back to itself recursively or to any subpattern. For example, the pattern \((a*|(?R))*\) will match any combination of balanced parentheses and "a"s. PCRE expressions can embed (?Cn), where n is some number. This will call out to an external user-defined function through the PCRE API and can be used to embed arbitrary code in


1248-486: The one on the left is prone to causing a stack overflow because tail-call optimization is not possible for this function. During execution, the stack for these functions will look like this: Notice that the function on the left must store in its stack exp number of integers, which will be multiplied when the recursion terminates and the function returns 1. In contrast, the function at the right must only store 3 integers at any time, and computes an intermediary result which
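The two exponentiation functions this passage refers to are not present in the snapshot. The following is a minimal Python sketch of what such a pair might look like, plus the loop that a tail-call optimizer would effectively produce. The function names are hypothetical, and one caveat applies: CPython does not eliminate tail calls, so pow_accum below still uses one frame per call there; it only shows the shape of a tail-recursive definition.

```python
def pow_recursive(base: int, exp: int) -> int:
    """Not tail-recursive: the multiplication runs after the inner call returns,
    so every pending 'base *' must be kept on the stack."""
    if exp > 0:
        return base * pow_recursive(base, exp - 1)
    return 1


def pow_accum(base: int, exp: int, accum: int = 1) -> int:
    """Tail-recursive: the recursive call is the last action, and the
    intermediate result travels in the accum argument."""
    if exp > 0:
        return pow_accum(base, exp - 1, accum * base)
    return accum


def pow_loop(base: int, exp: int) -> int:
    """Loop form: only base, exp and accum are live at any time."""
    accum = 1
    while exp > 0:
        accum *= base
        exp -= 1
    return accum


print(pow_recursive(2, 10), pow_accum(2, 10), pow_loop(2, 10))
```

The loop version handles very large exponents without growing the call stack, which is the point of the transformation.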

1287-423: The opening parenthesis. Named subpatterns are a feature that PCRE adopted from Python regular expressions. This feature was subsequently adopted by Perl, so now named groups can also be defined using (?<name>...) or (?'name'...), as well as (?P<name>...). Named groups can be backreferenced with, for example: (?P=name) (Python syntax) or \k'name' (Perl syntax). While
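Since the (?P<name>...) and (?P=name) forms originate in Python, they can be shown directly with Python's standard re module. The group name word and the sample sentence are assumptions for illustration; the Perl-style spellings mentioned above are not used here.

```python
import re

# (?P<word>...) names the group; (?P=word) must repeat what it matched.
doubled = re.compile(r"(?P<word>\w+) (?P=word)")

m = doubled.search("it was the the best")
repeated = m.group("word")  # access the capture by name
whole = m.group(0)

print(repeated, "|", whole)
```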

1326-453: The recursion into a loop and storing the function arguments in an explicit stack (rather than the implicit use of the call stack). This is always possible because the class of primitive recursive functions is equivalent to the class of LOOP computable functions. Consider this example in C++-like pseudocode: A primitive recursive function like the one on the left side can always be transformed into

1365-473: The start of the pattern using one of the following: When not in UTF-8 mode, corresponding linebreaks can be matched with (?:\r\n?|\n|\x0B|\f|\x85) or \R. In UTF-8 mode, two additional characters are recognized as line breaks with (*ANY): On Windows, in non-Unicode data, some of the ANY linebreak characters have other meanings. For example, \x85 can match
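The explicit alternation given for linebreaks can be exercised in Python. Python's standard re module has no \R escape, so this sketch uses only the spelled-out alternation; the LINEBREAK name and the sample text are assumptions for illustration.

```python
import re

# The alternation from the text: CRLF or CR, LF, VT, FF, NEL.
LINEBREAK = re.compile(r"(?:\r\n?|\n|\x0B|\f|\x85)")

# One sample of each linebreak form between single letters.
text = "a\r\nb\nc\rd\x0be\x0cf\x85g"
parts = LINEBREAK.split(text)

print(parts)
```

Because \r\n? is tried first, a CRLF pair is consumed as one linebreak instead of two.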

1404-557: The start of the pattern. A (*newline) option can be provided in addition to a (*BSR..) option, e.g., (*BSR_UNICODE)(*ANY)rest-of-pattern. The backslash-R options also can be changed with external options by the application calling PCRE2, when a pattern is compiled. Linebreak options such as (*LF) documented above; backslash-R options such as (*BSR_ANYCRLF) documented above; Unicode Character Properties option (*UCP) documented above; (*UTF8) option documented as follows: if PCRE2 has been compiled with UTF support,

1443-522: The title BSR. If an internal link led you here, you may wish to change the link to point directly to the intended article. Retrieved from "https://en.wikipedia.org/w/index.php?title=BSR&oldid=1239745757". Perl Compatible Regular Expressions While PCRE originally aimed at feature-equivalence with Perl,

1482-406: The two implementations are not fully equivalent. During the PCRE 7.x and Perl 5.9.x phase, the two projects coordinated development, with features being ported between them in both directions. In 2015, a fork of PCRE was released with a revised programming interface (API). The original software, now called PCRE1 (the 1.xx–8.xx series), has had bugs mended, but no further development. As of 2020, it

1521-444: Was changed in PCRE2. The heap is now used for this purpose, and the total amount can be limited. The problem of stack overflow, which came up regularly with PCRE1, is no longer an issue with PCRE2 from release 10.30 (2017). Like Perl, PCRE2 has consistent escaping rules: any non-alpha-numeric character may be escaped to mean its literal value by prefixing a \ (backslash) before the character. Any alpha-numeric character preceded by
