
1. Scripts in this package
--------------------------

The file FindStringsToNewFile.zip contains 4 scripts:

   - FindStringsToNewFile.js
     searches the entire active file for strings with a regular expression
     and writes all found strings, or only parts of the found strings,
     line by line to a new file. It can be used for small files of
     some KiB and larger files of up to a few MiB.

   - FindStringsToNewFileExtended.js
     is like FindStringsToNewFile.js with the difference that it can
     be used also for really large files with many MiB or even GiB.
     This script can't be used with UE for Windows v13.00 and UES v6.20.

   - FindStringsWithLineNumbers.js
     is like FindStringsToNewFile.js with the difference that line number
     information is also output on every line with a found string in the
     new file. This script can be used for small files with some KiB and
     larger files up to a few MiB.

   - FindStringsWithLineNumbersExtended.js
     is like FindStringsWithLineNumbers.js with the difference that it
     can be used also for really large files with many MiB or even GiB.
     This script can't be used with UE for Windows v13.00 and UES v6.20.



2. Regular expression search string
-----------------------------------

The scripts search the entire active file for strings using the RegExp
object of JavaScript. For more information on the RegExp object see for
example https://www.w3schools.com/jsref/jsref_obj_regexp.asp

The search is always executed case-insensitive (modifiers "gi" used).

The regular expression search string entered during script execution must
be in Perl syntax. But please note that the RegExp object of JavaScript
is not as powerful as the Perl regular expression engine of UltraEdit.
For example lookbehind is not supported by RegExp object of JavaScript.

There are even more differences. For example, forward slashes '/' must be
escaped with a backslash character. In FindStringsToNewFile.js and
FindStringsToNewFileExtended.js the character ^ is not interpreted as
start of a line and $ is not interpreted as end of a line, because a large
string is searched at once and ^ means beginning and $ means end of the
string to search in. The other 2 scripts run the regular expression search
on single lines without the line ending characters, so in these scripts
^ matches at start of a line and $ matches at end of a line.
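
The difference can be reproduced in plain JavaScript outside of UltraEdit.
The following lines are only a sketch with a made-up sample text and are
not taken from the scripts:

```javascript
// Plain JavaScript sketch; the sample text is made up for illustration.
var text = "alpha 1\nbeta 2\ngamma 3";

// Whole text searched at once without the "m" modifier:
// ^ matches only at the very start, $ only at the very end of the text.
var wholeStart = text.match(/^\w+/gi);   // ["alpha"]
var wholeEnd = text.match(/\d$/gi);      // ["3"]

// Text searched line by line: ^ and $ apply to every single line.
var lines = text.split("\n");
var perLine = [];
for (var i = 0; i < lines.length; i++) {
    var found = lines[i].match(/^\w+/i);
    if (found) perLine.push(found[0]);
}
// perLine is ["alpha", "beta", "gamma"]
```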

For example, entering \d+ during script execution searches the entire
file for integer numbers and writes all found numbers to a new file.
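
A minimal sketch of such a search in plain JavaScript could look as follows.
The function name findStrings and the sample text are assumptions made for
this illustration. Note that a backslash must be doubled inside a JavaScript
string, but not when the search string is entered during script execution.

```javascript
// Hypothetical helper, not taken from the scripts: returns all matches of
// a search string in a text, one found string per line.
function findStrings(text, searchString) {
    var regex = new RegExp(searchString, "gi");   // modifiers named above
    var found = text.match(regex);                // null if nothing is found
    return found ? found.join("\r\n") : "";
}

var sample = "Order 15 has 3 items.\r\nOrder 16 has 120 items.";
var output = findStrings(sample, "\\d+");
// output is "15\r\n3\r\n16\r\n120"
```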

With FindStringsWithLineNumbers.js or FindStringsWithLineNumbersExtended.js
it is possible to mimic a Find in Files on the currently active file only
by using ^.*STRING OF INTEREST.*$ as regular expression.

The search string can be predefined in the scripts below the introductory
comment. This is useful if one of the scripts is always used on the same
type of file with the same regular expression search string, in which case
entering the search string on every script execution is tedious and
error-prone.

The found strings are written to a new file line by line.

The scripts report with a message box how many strings were found.



3. Regular expression replace string
------------------------------------

Instead of getting just the entire found string, it is also possible to
output to the new file just parts of the found strings, with or without
additional text.

This can be achieved by using 1 or up to 9 marking (tagging) groups in the
regular expression search string. A marking group is an expression enclosed
in parentheses (round brackets).

The scripts search in the regular expression search string entered by the
user of the script or predefined in the script at top for an opening ( with
next character being not a question mark ? and also not a closing ) and a
closing ). So whenever a pair of opening and closing parentheses with at
least 1 character between is found in the search regular expression, the
scripts ask the script user for the replace string to determine a special
output format.

Non marking groups as often needed in complex regular expressions are
defined by (?:...) which is the reason for ignoring an opening round
bracket with next character being a question mark.

Please note that the script does not check if a pair of parentheses really
builds a valid marking group. Escaped parentheses or round brackets used for
other purposes like lookahead are not detected by the script. The user of
the script should know if the search string contains marking groups or not.
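
The check described above can be approximated with a single regular
expression. This is only a sketch under the assumption that the description
is complete. The function name hasMarkingGroup is hypothetical and the code
is not the actual code of the scripts.

```javascript
// Approximation of the check described above (assumption, not the
// scripts' actual code): an opening ( not followed by ? or ), and a
// closing ) somewhere later in the search string.
function hasMarkingGroup(searchString) {
    return /\([^?)].*\)/.test(searchString);
}

var realGroup = hasMarkingGroup('src="(.+?)"');   // true: marking group
var nonMarking = hasMarkingGroup("(?:ab)+c");     // false: (?: is ignored
var escaped = hasMarkingGroup("\\(abc\\)");       // true: false positive
```

The last call shows why the user must know whether the search string really
contains marking groups: escaped parentheses look like a group to such a
simple check.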

Like the search regular expression also the replace regular expression can
be predefined at top of the scripts after the introductory comment if one
of the scripts is used always on same type of file with same regular
expression replace string.

The string parts in a found string marked by the up to 9 marking groups
are referenced in the replace string with $1, $2, ..., $9 in any order.
Additional characters like the separator character for a CSV file can
be added also to the replace regular expression.
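
How a replace string with references to marking groups produces the output
can be sketched in plain JavaScript with the replace method of a string.
The function name formatFound and the sample data are assumptions for this
illustration only.

```javascript
// Hypothetical helper: formats one found string with a replace string.
function formatFound(foundString, searchString, replaceString) {
    var regex = new RegExp(searchString, "i");
    return foundString.replace(regex, replaceString);
}

// Two marking groups output in swapped order, separated by a semicolon
// as it could be needed for a CSV file.
var line = formatFound("width=640", "(\\w+)=(\\d+)", "$2;$1");
// line is "640;width"
```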



4. Usage notes
--------------

Always save all files before running one of the scripts. It is best to have
only the file opened in UE/UES which should be searched for strings of
interest.

Use FindStringsToNewFile.js or FindStringsWithLineNumbers.js by
default for small files with just some KiB or up to about 50 MiB.

It is possible that the standard scripts work also for larger files,
but that cannot be guaranteed as maximum size of input text which can
be processed in one block depends also on size of output which of course
cannot be estimated by the scripts.

A successful termination of a script results always in a message prompt.

If on execution of a standard script the small dialog with the Cancel button
disappears within some seconds and there is no new file, or even worse
UE/UES crashes, the standard script failed due to a memory problem.

In this case you might see an error message in the output window as if
there were a syntax error in the script (see introductory comments in
the scripts).

In such cases use the extended version of the appropriate script.

Please read also the comments in the scripts for more details,
especially the introductory comments at top of every script.

The script shows an error message if the entered expression string is not a
valid regular expression for a JavaScript RegExp object. The error message
contains also the error text from JavaScript core.
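
Such a validation can be sketched with a try/catch block around the creation
of the RegExp object. The function name compileSearch and the wording of the
error message are assumptions, and the exact error text depends on the
JavaScript engine.

```javascript
// Hypothetical helper: returns a RegExp object or an error text.
function compileSearch(searchString) {
    try {
        return { regex: new RegExp(searchString, "gi"), error: "" };
    } catch (e) {
        // e.message holds the error text of the JavaScript engine.
        return { regex: null, error: "Invalid regular expression: " + e.message };
    }
}

var good = compileSearch("\\d+");   // valid expression
var bad = compileSearch("(\\d+");   // invalid: unbalanced parenthesis
```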

Before running the extended scripts on very large files it is good practice
to test first on a small file created by copying a block from the large file
to a new file if the regular expressions you want to use produce the result
you want.

The extended scripts may not be able to process a really large file in one
run because of some limitations, see chapter 6 with the known limitations.
The script informs the user at end with a message prompt whether the script
must be executed once again to continue processing the input file or
whether the file has been processed completely.

If an extended script must be executed once again on a file according to
message prompt at end of script execution, please run simply the script
once again. In case of using UE for Windows >= v14.20 or UES >= v9.00 the
script has stored all variable values determined on first run of the script
on current input file in user clipboard 8 and reloads their values from the
clipboard. So search string, replace string and line number format must be
entered only once.

Using an extended script with UE for Windows < v14.20 or UES < v9.00 requires
setting the caret manually to top of the input file before running the script
the first time on a very large file. Also the input data must be entered on
every script run as with those old versions of UE / UES it is not possible to
remember the variable values in user clipboard 8 as with newer versions of
UE / UES. And instead of getting only one output file, the extended scripts
executed with UE for Windows < v14.20 or UES < v9.00 produce always a new
output file on every script run. Finally the total number of found strings
is just for the current script run instead of all runs on one input file.



5. Some examples
----------------

Here are some examples in question and answer style.

Q: How to get all image references in an HTML file output into a new file?

A: Use script FindStringsToNewFile.js with search string: src=".+?"


Q: But I want only the image references without src="..."?

A: Use script FindStringsToNewFile.js with search string: src="(.+?)"
   and enter as string for the output just: $1


Q: I have a very large CSV file with comma as separator and more than
   2 million lines and I want data column 8 and data column 5 in
   this order with line number as first value written to a new CSV file
   if data column 3 contains the word "phone". How to get this output?

A: Use script FindStringsWithLineNumbersExtended.js with search string:

      ^(?:[^,]*,){2}phone,[^,]*,([^,]*),(?:[^,]*,){2}([^,]*,).*$

   and enter as string for the output: $2$1
   and specify for line number format: #,


Q: I want to know which lines contain the number 304 in my log file
   with only 20 MB.

A: Use script FindStringsWithLineNumbers.js with search string: 304


Q: Thanks. But my file is a website access log with the line format
      ip - date/time - "GET URL HTTP/1.1" 304 ... other data ...
   and it would be good to get just the URL in the new file with
   HTTP status code 304. Is that also possible?

A: Yes, use script FindStringsToNewFile.js with search string:

      "GET ([^ ]+).*?" 304

   and enter as string for the output: $1
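
   A sketch in plain JavaScript with made-up log lines shows which strings
   are found. The sample data are assumptions for this illustration.

```javascript
// Made-up log lines in the format given in the question.
var log = '10.0.0.1 - 2013-04-14 - "GET /index.html HTTP/1.1" 304 0\n' +
          '10.0.0.2 - 2013-04-14 - "GET /logo.png HTTP/1.1" 200 5120\n' +
          '10.0.0.3 - 2013-04-14 - "GET /style.css HTTP/1.1" 304 0';

var search = /"GET ([^ ]+).*?" 304/gi;
var urls = [];
var match;
// exec() with modifier g continues after the previous found string.
while ((match = search.exec(log)) !== null) {
    urls.push(match[1]);   // first marking group, same as $1
}
// urls is ["/index.html", "/style.css"]
```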


Q: I use FindStringsToNewFileExtended.js with search string:

      [12][09]\d\d-[01]\d-[0-3]\d [0-5]\d:[0-5]\d:[0-6]\d

   to find date and time strings and write them to a new file. That works
   fine. But I would like the date and time string format changed in the
   output file from yyyy-mm-dd hh:mm:ss to hh:mm:ss dd.mm.yyyy. Is this
   possible with marking groups?

A: Yes, it is. Slightly modify the search string to:

      ([12][09]\d\d)-([01]\d)-([0-3]\d) ([0-5]\d:[0-5]\d:[0-6]\d)

   and specify as replace string for special output: $4 $3.$2.$1
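
   The reordering can be verified in plain JavaScript. The sample text is
   made up and the code is only a sketch of the principle, not the code
   of the script.

```javascript
var source = "([12][09]\\d\\d)-([01]\\d)-([0-3]\\d) ([0-5]\\d:[0-5]\\d:[0-6]\\d)";
var searchAll = new RegExp(source, "gi");   // finds all strings
var searchOne = new RegExp(source, "i");    // reformats one found string

var text = "Started 2013-04-14 09:30:05, finished 2013-04-14 17:45:59.";
var found = text.match(searchAll) || [];
var output = [];
for (var i = 0; i < found.length; i++) {
    output.push(found[i].replace(searchOne, "$4 $3.$2.$1"));
}
// output is ["09:30:05 14.04.2013", "17:45:59 14.04.2013"]
```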



6. Known limitations
--------------------

The scripts require UltraEdit for Windows >= v13.10 or UEStudio >= v6.30
with the message box commands and access to output window.

The two scripts FindStringsToNewFile.js and FindStringsWithLineNumbers.js can
also be used with UE for Windows v13.00 and UES v6.20. In this case the final
message of the script is written to a new file instead of being shown in a
message box.

The 2 extended versions can't be used with UE for Windows v13.00 and UES v6.20.

With UE for Windows < v14.20 and UES < v9.00 writing the found strings into
the new file can take very long (from several seconds up to many minutes)
if a large number of strings is found by the script. The write command
of UltraEdit is very slow on large blocks.


On release of this package on 2013-04-14, the latest available versions of
UltraEdit for Windows and UEStudio were:

UE:  19.00.0.1028
UES: 12.20.0.1006

These versions of UE and UES, like all previous versions, do not manage
memory for selected text well. A selected text is always copied to RAM with
2 bytes per character if the read-only "selection" property of the UltraEdit
document object is used, as all 4 scripts need to do. But the string object
with the selected text is not immediately deleted in RAM when the selection
is discarded, although it is not accessible anymore. Instead every once
selected text remains in memory until the script terminates, and only then
are all of them removed from RAM.

This memory usage behavior for selected text is bad for the two scripts
designed for running on really large files with hundreds of MiB or even
a few GiB. On execution of the script more and more RAM is allocated by
UltraEdit resulting sooner or later in an out of memory situation and a
crash of UE/UES or unexpected termination of the script with an error.

Using an x64 computer with more than 2 GiB RAM is no solution for this
problem as a 32-bit application like UE/UES and the integrated JavaScript
core can allocate only up to 2 GiB of RAM.

To work around this bad memory usage behavior, the two scripts

   FindStringsToNewFileExtended.js
   FindStringsWithLineNumbersExtended.js

process in one run at most 50,000 x 10 and 50,000 x 8 lines, respectively.
Depending on the length of the lines, this limit should avoid an out of
memory situation during script execution.

Reduce the number of lines per block (variable c_nMaxLinesPerBlock) or the
maximum loop run value (variable c_nMaxLoopRuns) at top of the two scripts
after the introductory comment if the script fails or UE/UES crashes
because of an out of memory situation caused by very long lines.
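
The principle of processing a file in blocks with a limited number of loop
runs per script execution can be sketched in plain JavaScript as follows.
Only the two variable names are taken from the scripts. The function name
processRun, the tiny values and the sample lines are assumptions, and the
real scripts work on the file in UE/UES instead of an array.

```javascript
// The values are reduced for the demonstration.
var c_nMaxLinesPerBlock = 3;   // lines processed per block
var c_nMaxLoopRuns = 2;        // blocks processed per script run

// Processes the lines from startLine on and returns the found strings
// together with the line index at which the next run has to continue.
function processRun(lines, startLine, regex) {
    var found = [];
    var next = startLine;
    for (var run = 0; run < c_nMaxLoopRuns && next < lines.length; run++) {
        var block = lines.slice(next, next + c_nMaxLinesPerBlock).join("\n");
        var matches = block.match(regex);
        if (matches) found = found.concat(matches);
        next += c_nMaxLinesPerBlock;
    }
    if (next > lines.length) next = lines.length;
    return { found: found, nextLine: next, finished: next >= lines.length };
}

var lines = ["a1", "b2", "c3", "d4", "e5", "f6", "g7", "h8"];
var first = processRun(lines, 0, /\d/g);                 // lines 1 to 6
var second = processRun(lines, first.nextLine, /\d/g);   // lines 7 to 8
// first.finished is false, second.finished is true
```

Smaller values for the two variables reduce the amount of text held in
memory per run, at the cost of more script runs for the same file.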



7. Other notes
--------------

Don't forget that the command Find with the advanced find option "List Lines
Containing String" can also be used to find entire lines containing
a string.

Furthermore, the command Find in Files can also be used to get lines with
a string either to the output window or a new file. Use the option
"Open Files".

Or use first command Copy File Name/Path in menu Edit before opening
Find in Files dialog, select "Files Listed", paste the copied file name
with Ctrl+V into edit field of "In Files/Types", clear "Directory" and
execute the Find searching now only in this file.

The results format in output window or a new file for Find in Files can
be customized under "Advanced - Configuration - Search - Set Find Output
Format".



8. Copyright
------------

The scripts are copyrighted by Mofi for free usage by UE/UES users.

The author cannot be responsible for any damage caused by the scripts.

You use the scripts at your own risk.


PS: 1 KiB = 1024 bytes, 1 MiB = 1024 KiB, 1 GiB = 1024 MiB.
