
DOCUMENTATION  -=- Last edit Time-stamp: <01/07/2013 14:15 Sebastien@PC-SEKI>
=============

This is the documentation for uo_regex (pbni C++ extension) and unvo_regexp
(plain pbscript version).

This code is released under the MIT license, which allows reuse of the code
for any purpose and is GPL-compatible with fewer restrictions.
Please read license.txt for the details.

Preamble
========

The unvo_regexp userobject was first written to map the vbscript.regex OLE
object into an object usable in PB10. The possibilities (and limitations ;)
are those offered by the vbscript object. No less, no more.

The uo_regex (C++ PBNI extension) was later written as an attempt to get the
same functionalities in a compiled extension without the bottleneck of
repetitive OLE object instantiations. As unvo_regexp was already heavily
used in a project and uo_regex was planned as a replacement for it,
uo_regex mimics unvo_regexp by design.


Quick start
===========

unvo_regexp regex
regex = create unvo_regexp
regex.initialize('[0-9]+', true, true)
if regex.test('foobar 42') then messagebox('test','match!')
// or
if regex.search('foobar 42') > 0 then messagebox('search','match : ' + regex.match(1))
destroy regex


List of the available methods & properties for unvo_regexp (powerscript + vbscript.regex)
==========================================================

Properties
----------

boolean globalScope : tells whether the regex searches and/or replaces in global mode 
		      (default = true)

boolean ignoreCase : tells whether the matching ignores case 
		     (default = true = case INsensitive)

string searchPattern : stores the regexp pattern

Methods
-------

initialize(string as_pattern, boolean ab_globalscope, boolean ab_casesensitive) 
   Sets the global, ignoreCase & pattern attributes of the ole regex. 
   You MUST call initialize() to compile the search pattern before actual search.

initialize()
   Same as the previous initialize, reuses the instance variables as arguments

test( string as_teststring ) 
   Tells only if the given string matches the regexp.

search( string as_searchstring )
   Looks for the previously compiled regexp in the given search string. The
   right name for this method should be 'execute' but it is a reserved word
   in PowerScript ;)
   Returns : the number of matches, or -1
   Remark : if you don't search in global mode, there can be only 0 or 1 match.

matchCount( )  
   Returns the number of matches in the previous search() 

match( long al_index )  
   Returns the content of the given match. The first match has index 1.

matchPosition( long al_index )
   Returns the offset of the given match or -1 if al_index is wrong.

matchLength( long al_index )
   Returns the length of the given match or -1 if al_index is wrong.

replace( string as_searchstring, string as_replacestring )
   Returns as_searchstring with the matches of the pattern replaced by
   as_replacestring. If no match occurs, the original string is returned.

groupcount( long al_matchindex )  
   When using the pattern grouping, returns the number of groups in the
   given match.

group( long al_matchindex, long al_groupindex )
   When using the pattern grouping, returns the content of the given group of
   the given match.
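
As a minimal sketch of how the methods above fit together, the following
PowerScript runs a global search and walks through the matches. The
variable names and the sample text are illustrative only; the calls are the
ones listed above. Whether offsets are 0-based or 1-based is not stated
here, so the sketch only displays them.

```powerscript
// Illustrative sketch: variable names and sample data are made up.
unvo_regexp lnv_regex
long ll_count, ll_i
string ls_report

lnv_regex = create unvo_regexp
// global scope = true to get every match; third argument as in the Quick start
lnv_regex.initialize('[0-9]+', true, true)

ll_count = lnv_regex.search('order 42, item 7, qty 128')
if ll_count > 0 then
   for ll_i = 1 to ll_count
      ls_report += lnv_regex.match(ll_i) &
         + ' at ' + string(lnv_regex.matchPosition(ll_i)) &
         + ', length ' + string(lnv_regex.matchLength(ll_i)) + '~r~n'
   next
   messagebox('matches', ls_report)
end if

destroy lnv_regex
```

The value returned by search() is reused as the loop bound; matchCount()
should give the same number after the call.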

List of the available methods for uo_regex (C++ wrapper for PCRE)
==========================================

pcreversion()
   Returns the version of PCRE wrapped in the pbx

getversion()
   Returns the short form of the PbniRegex version

getversionfull()
   Returns the full form of the PbniRegex version, including the build timestamp

The following methods and functions give the same results as with unvo_regexp :
initialize() - see the remark below about error codes
test( string as_teststring ) 
search( string as_searchstring )
matchCount()  
match( long al_index ) 
     (note that a null string might be returned if the regex was not initialized)
matchPosition( long al_index )
matchLength( long al_index )
groupcount( long al_matchindex )  
group( long al_matchindex, long al_groupindex ) 
     (note that a null string might be returned if the regex was not initialized)

  remark: the whole match will be in group #0 to mimic the vbscript behavior

Initialize() errors
-------------------
If the compilation of the regexp fails, an error message is output via the
OutputDebugString API and can be seen e.g. with the Sysinternals DebugView
tool or MS Visual Studio.
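
The message can probably also be read from PowerScript with getLastError(),
which is listed further down. The sketch below assumes that getLastError()
returns an empty or null string when the pattern compiled correctly; this
document does not state that, nor what initialize() itself returns, so
treat both points as assumptions to verify.

```powerscript
// Illustrative sketch: the empty-on-success behaviour is an assumption.
uo_regex lnv_regex
string ls_error

lnv_regex = create uo_regex
// the opening parenthesis is deliberately left unbalanced
lnv_regex.initialize('([0-9]+', true, true)

ls_error = lnv_regex.getLastError()
if isnull(ls_error) then ls_error = ''
if len(ls_error) > 0 then
   messagebox('regex error', ls_error)
end if

destroy lnv_regex
```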

Replace
-------
replace( string as_searchstring, string as_replacestring )

The PCRE engine can only search and does not provide replacing
functionality, therefore I have implemented my own replace method. 
While the replace function has the same prototype as the vbscript one, it
has been extended and can use the perl-like backslash-number notation
(\1, \2 .. \n) to reuse the matched groups in the replacement.
\0 returns the whole match.
Since 1.3.1: replace() supports named groups with the perlish notation $+{name}
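
A minimal sketch of both notations, swapping two words around a comma. The
variable names and sample strings are illustrative. If replace() behaves as
described above, both results should read 'John Doe'.

```powerscript
// Illustrative sketch: variable names and sample data are made up.
uo_regex lnv_regex
string ls_swapped, ls_named

lnv_regex = create uo_regex

// numbered groups: \1 is the first group, \2 the second
lnv_regex.initialize('(\w+), (\w+)', true, true)
ls_swapped = lnv_regex.replace('Doe, John', '\2 \1')

// named groups (1.3.1 and later): same swap with $+{name}
lnv_regex.initialize('(?<last>\w+), (?<first>\w+)', true, true)
ls_named = lnv_regex.replace('Doe, John', '$+{first} $+{last}')

destroy lnv_regex
```

The backslash is not an escape character in PowerScript string literals
(the tilde is), so \1 and \w can be written as-is.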

The following functions are in addition to the unvo_regexp ones :

group(long al_matchindex, string as_name)
   Returns a group by its name

groupposition(long al_matchindex, long al_groupindex)
   Returns the offset of the given group in the given match.

groupposition(long al_matchindex, string as_name)
   Returns the offset of the named group in the given match.

grouplength(long al_matchindex, long al_groupindex)
   Returns the length of the given group in the given match.

grouplength(long al_matchindex, string as_name)
   Returns the length of the named group in the given match.

setmultiline(boolean ismulti)
   Enables or disables matching the EOL character with '.'

ismultiline()
   Returns the state of the multiline setting.

setutf8(boolean isutf) - DEPRECATED
   Sets the possibility to use utf-8 strings for pattern & search strings
   NOTE : pbniregex now ALWAYS uses utf-8 encoding internally.
   This method does nothing.

isutf8() - DEPRECATED
   Returns the state of the utf-8 processing.
   NOTE : pbniregex now ALWAYS uses utf-8 encoding internally.
   This method always returns true.

study()
   Studies the regexp in order to speed up (if possible)
   repetitive use of it.

   From the PCRE documentation :
    At present, studying a pattern is useful only for non-anchored patterns
   that do not have a single fixed starting character. A bitmap of possible
   starting bytes is created. 

getdotmatchnewline()
   Tells if the dot char '.' matches on newlines or not.

setdotmatchnewline(boolean match)
   Sets the matching of the newline characters.

setextendedsyntax(boolean extended)
   Sets the compliance with the perl extended syntax :
    whitespace data characters in the pattern are totally ignored except
   when escaped or inside a character class. Whitespace does not include the
   VT character (code 11). In addition, characters between an unescaped #
   outside a character class and the next newline, inclusive, are also
   ignored. This is equivalent to Perl's /x option, and it can be changed
   within a pattern by a (?x) option setting.

getextendedsyntax()
   Tells whether the regex can use the perl extended syntax.

setungreedy(boolean greedy)
   Sets the UNgreediness of the regex

getungreedy()
   Returns the state of the regex UNgreediness.

getnameduplicates()
   Returns the state of the regex usage of duplicates in named capturing groups.
   Note that the flag may change if either setnameduplicates(true) is called
   or the option (?J) is used in the pattern.

setnameduplicates(boolean allow_dup)
   Sets the flag that allows duplicates in named capturing groups

getpattern()
   Returns the pattern that was given to initialize()

getLastError()
   Returns the last error message, when a pattern failed to compile

stringTest()
   Debugging function that simply returns the given string.
   Useful to track down string encoding and / or conversion problems.
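
As a hedged sketch of the option setters listed above, the following
PowerScript turns on the ungreedy and dot-matches-newline settings before
compiling a pattern. It assumes that the options must be set before
initialize() so that they apply at compile time; this document does not
state the required order. Names and sample data are illustrative.

```powerscript
// Illustrative sketch: the call order is an assumption, see the note above.
uo_regex lnv_regex
long ll_count

lnv_regex = create uo_regex
lnv_regex.setungreedy(true)          // quantifiers match as little as possible
lnv_regex.setdotmatchnewline(true)   // '.' also matches newline characters
lnv_regex.initialize('<b>.*</b>', true, true)

ll_count = lnv_regex.search('<b>one</b>~r~n<b>two</b>')
if ll_count > 0 then
   messagebox('first match', lnv_regex.match(1))
end if

destroy lnv_regex
```

With a greedy '.*' and the dot matching newlines, one would expect a single
match spanning both tags; the ungreedy setting is meant to stop at the
first '</b>'.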

Named groups
------------

Starting with release 1.3.1, capturing groups can be named.
 - a named capturing group is declared by (?<name>) or (?'name')
 - inside a pattern, a named group can be referred to by \k<name>, \k'name', \g<name>, \g'name'
 - the replace() method can reference named groups with $+{name} in addition to the 
   \1, \2, \n notation.

You can have duplicated names in groups, provided that only one group is matching at a time
(otherwise you will get unwanted results: the last named group that matches is used).
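
A minimal sketch of reading named groups after a search, using the
group() and groupposition() overloads that take a name. The pattern, the
group names and the sample text are illustrative only.

```powerscript
// Illustrative sketch: pattern, names and sample data are made up.
uo_regex lnv_regex
string ls_year, ls_month

lnv_regex = create uo_regex
lnv_regex.initialize('(?<year>[0-9]{4})-(?<month>[0-9]{2})', true, true)

if lnv_regex.search('released 2013-07') > 0 then
   // first argument is the match index, second the group name
   ls_year = lnv_regex.group(1, 'year')
   ls_month = lnv_regex.group(1, 'month')
   messagebox('named groups', ls_year + ' / ' + ls_month &
      + ' (year at ' + string(lnv_regex.groupposition(1, 'year')) + ')')
end if

destroy lnv_regex
```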

   
Global functions
----------------

In order to replace some functions written in powerscript by native code,
the PbniRegex.pbx also provides the following global functions :

fastreplaceall(string as_source, string as_pattern, string as_replace)
   Returns as_source with every occurrence of as_pattern replaced by
   as_replace. If as_pattern is not found, returns the as_source input
   string unchanged.
   Remark : the search is ALWAYS case sensitive.

fastreplaceall2(string as_source, string as_pattern, string as_replace, boolean ab_casesensitive)
   Same as fastreplaceall but with the possibility to specify if the search is case-sensitive or not.
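
A short usage sketch of both functions. The sample strings are illustrative
and deliberately avoid regex metacharacters, because this section does not
say whether as_pattern is treated as a plain string or as a regexp.

```powerscript
// Illustrative sketch: sample data is made up.
string ls_result

// always case sensitive: only the lowercase 'fish' is expected to change
ls_result = fastreplaceall('one FISH, two fish', 'fish', 'bird')

// ab_casesensitive = false: both occurrences are expected to change
ls_result = fastreplaceall2('one FISH, two fish', 'fish', 'bird', false)
```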


;;----------------------------------------------------------------------------
;; Local Variables:
;; mode:text
;; fill-column:76
;; indent-tabs-mode:nil
;; tab-stop-list:(2 4 6 8 16 24 32 40 48 56 64 72 80 88 96 104 112 120)
;; time-stamp-format: "%02d/%02m/%:y %02H:%02M %u@%s"
;; End:
