Nirvana Editor (NEdit) Help Documentation

Table of Contents

    Getting Started                    

    Basic Operation                     Macro/Shell Extensions
      Selecting Text                      Shell Commands and Filters
      Finding and Replacing Text          Learn/Replay
      Cut and Paste                       Macro Language
      Using the Mouse                     Macro Subroutines
      Keyboard Shortcuts                  Highlighting Information
      Shifting and Filling                Range Sets
      Tabbed Editing                      Action Routines
      File Format                         
                                        Customizing
    Features for Programming              Customizing NEdit
      Programming with NEdit              Preferences
      Tabs/Emulated Tabs                  X Resources
      Auto/Smart Indent                   Key Binding
      Syntax Highlighting                 Highlighting Patterns
      Finding Declarations (ctags)        Smart Indent Macros
      Calltips

    Regular Expressions                 NEdit Command Line
      Basic Regular Expression Syntax   Client/Server Mode
      Metacharacters                    Crash Recovery
      Parenthetical Constructs          Version
      Advanced Topics                   GNU General Public License
      Example Regular Expressions       Mailing Lists
                                        Problems/Defects

Getting Started

Welcome to NEdit!

NEdit is a standard GUI (Graphical User Interface) style text editor for programs and plain-text files. Users of Macintosh and MS Windows based text editors should find NEdit a familiar and comfortable environment. NEdit provides all of the standard menu, dialog, editing, and mouse support, as well as all of the standard shortcuts to which the users of modern GUI based environments are accustomed. For users of older style Unix editors, welcome to the world of mouse-based editing!

Help sections of interest to new users are listed under the "Basic Operation" heading in the top-level Help menu:

      Selecting Text
      Finding and Replacing Text
      Cut and Paste
      Using the Mouse
      Keyboard Shortcuts
      Shifting and Filling

Programmers should also read the introductory section under the "Features for Programming" section:

      Programming with NEdit

If you get into trouble, the Undo command in the Edit menu can reverse any modifications that you make. NEdit does not change the file you are editing until you tell it to Save.

Editing an Existing File

To open an existing file, choose Open... from the File menu. Select the file that you want to open in the pop-up dialog that appears and click on OK. You may open any number of files at the same time. Depending on your settings (cf. "Tabbed Editing"), each file can appear in its own editor window, or it can appear under a tab in the same editor window. Using Open..., rather than re-typing the NEdit command and running additional copies of NEdit, will give you quick access to all of the files you have open via the Windows menu, and ensure that you don't accidentally open the same file twice. NEdit has no "main" window. It remains running as long as at least one editor window is open.

Creating a New File

If you already have an empty (Untitled) window displayed, just begin typing in the window. To create a new Untitled window, choose New Window or New Tab from the File menu. To give the file a name and save its contents to the disk, choose Save or Save As... from the File menu.

Backup Files

NEdit maintains periodic backups of the file you are editing so that you can recover the file in the event of a problem such as a system crash, network failure, or X server crash. These files are saved under the name `~filename` (on Unix) or `_filename` (on VMS), where filename is the name of the file you were editing. If an NEdit process is killed, some of these backup files may remain in your directory. (To remove one of these files on Unix, you may have to prefix the `~` (tilde) character with a `\` (backslash) to prevent the shell from interpreting it as a special character.)

Shortcuts

As you become more familiar with NEdit, substitute the control and function keys shown on the right side of the menus for pulling down menus with the mouse.

Dialogs are also streamlined so you can enter information quickly and without using the mouse*. To move the keyboard focus around a dialog, use the tab and arrow keys. One of the buttons in a dialog is usually drawn with a thick, indented outline. This button can be activated by pressing Return or Enter. The Cancel or Dismiss button can be activated by pressing Escape. For example, to replace the string "thing" with "things", type:

      <ctrl-r>thing<tab>things<return>

To open a file named "whole_earth.c", type:

      <ctrl-o>who<return>

(how much of the filename you need to type depends on the other files in the directory). See the section called "Keyboard Shortcuts" for more details.

* Users who have set their keyboard focus mode to "pointer" should set "Popups Under Pointer" in the Default Settings menu to avoid the additional step of moving the mouse into the dialog.


Basic Operation

Selecting Text

NEdit has two general types of selections, primary (highlighted text), and secondary (underlined text). Selections can cover either a simple range of text between two points in the file, or they can cover a rectangular area of the file. Rectangular selections are only useful with non-proportional (fixed spacing) fonts.

To select text for copying, deleting, or replacing, press the left mouse button with the pointer at one end of the text you want to select, and drag it to the other end. The text will become highlighted. To select a whole word, double click (click twice quickly in succession). Double clicking and then dragging the mouse will select a number of words. Similarly, you can select a whole line or a number of lines by triple clicking or triple clicking and dragging. Quadruple clicking selects the whole file. After releasing the mouse button, you can still adjust a selection by holding down the shift key and dragging on either end of the selection. To delete the selected text, press delete or backspace. To replace it, begin typing.

To select a rectangle or column of text, hold the Ctrl key while dragging the mouse. Rectangular selections can be used in any context that normal selections can be used, including cutting and pasting, filling, shifting, dragging, and searching. Operations on rectangular selections automatically fill in tabs and spaces to maintain alignment of text within and to the right of the selection. Note that the interpretation of rectangular selections by Fill Paragraph is slightly different from that of other commands; the section titled "Shifting and Filling" has details.
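
As a rough illustration of what a rectangular selection covers, the Python sketch below takes the same column range from each line in a range of lines. The function name and its zero-based arguments are hypothetical, and the sketch assumes a fixed-width font and text without tab characters; it is not taken from NEdit itself.

```python
def rectangular_selection(text, first_line, last_line, left_col, right_col):
    """Return the text inside a rectangle, one string per selected line.

    Lines and columns are zero-based; right_col is exclusive.
    Assumes every character occupies exactly one column (no tabs).
    """
    lines = text.split("\n")
    return [line[left_col:right_col] for line in lines[first_line:last_line + 1]]


table = "alpha 1\nbeta  2\ngamma 3"
# Select the single column that holds the digits on all three lines.
print(rectangular_selection(table, 0, 2, 6, 7))
```

A line that is shorter than the rectangle simply contributes fewer characters, which is one reason real operations on rectangular selections have to pad with tabs and spaces to keep the text aligned.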

The middle mouse button can be used to make an additional selection (called the secondary selection). As soon as the button is released, the contents of this selection will be copied to the insert position of the window where the mouse was last clicked (the destination window). This position is marked by a caret shaped cursor when the mouse is outside of the destination window. If there is a (primary) selection adjacent to the cursor in the window, the new text will replace the selected text. Holding the shift key while making the secondary selection will move the text, deleting it at the site of the secondary selection, rather than copying it.

Selected text can also be dragged to a new location in the file using the middle mouse button. Holding the shift key while dragging the text will copy the selected text, leaving the original text in place. Holding the control key will drag the text in overlay mode.

Normally, dragging moves text by removing it from the selected position at the start of the drag, and inserting it at a new position relative to the mouse. Dragging a block of text over existing characters displaces the characters to the end of the selection. In overlay mode, characters which are occluded by blocks of text being dragged are simply removed. When dragging non-rectangular selections, overlay mode also converts the selection to rectangular form, allowing it to be dragged outside of the bounds of the existing text.

The section "Using the Mouse" summarizes the mouse commands for making primary and secondary selections. Primary selections can also be made via keyboard commands; see "Keyboard Shortcuts".


Finding and Replacing Text

The Search menu contains a number of commands for finding and replacing text.

The Find... and Replace... commands present dialogs for entering text for searching and replacing. These dialogs also allow you to choose whether you want the search to be sensitive to upper and lower case, or whether to use the standard Unix pattern matching characters (regular expressions). Searches begin at the current text insertion position.

Find Again and Replace Again repeat the last find or replace command without prompting for search strings. To selectively replace text, use the two commands in combination: Find Again, then Replace Again if the highlighted string should be replaced, or Find Again once more to go to the next string.

Find Selection searches for the text contained in the current primary selection (see Selecting Text). The selected text does not have to be in the current editor window; it may even be in another program. For example, if the word dog appears somewhere in a window on your screen, and you want to find it in the file you are editing, select the word dog by dragging the mouse across it, switch to your NEdit window and choose Find Selection from the Search menu.

Find Incremental, which opens the interactive search bar, is yet another variation on searching, where every character typed triggers a new search. After you've completed the search string, the next occurrence in the buffer is found by hitting the Return key, or by clicking on the icon to the left (magnifying glass). Holding a Shift key down finds the previous occurrences. To the right there is a clear button with an icon resembling "|<". Clicking on it empties the search text widget without disturbing selections. A middle click on the clear button copies the content of any existing selection into the search text widget and triggers a new search.

Searching Backwards

Holding down the shift key while choosing any of the search or replace commands from the menu (or using the keyboard shortcut) will search in the reverse direction. Users who have set the search direction using the buttons in the search dialog may find it a bit confusing that Find Again and Replace Again don't continue in the same direction as the original search (for experienced users, consistency of the direction implied by the shift key is more important).
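
The search behavior described in this section (start at the insertion position, optional case sensitivity, optional regular expressions, forward or backward direction) can be pictured with a short Python sketch. The function `find` and its parameters are hypothetical. Python's `re` syntax stands in for NEdit's regular expressions, which differ in details, and the sketch assumes a search simply fails when nothing is found in the chosen direction.

```python
import re


def find(text, pattern, cursor, backward=False, case_sensitive=False, regex=False):
    """Return (start, end) of the nearest match relative to cursor, or None."""
    flags = 0 if case_sensitive else re.IGNORECASE
    # A literal search is a regular expression with every character escaped.
    compiled = re.compile(pattern if regex else re.escape(pattern), flags)
    if backward:
        # Keep the last match that ends at or before the cursor.
        candidates = [m for m in compiled.finditer(text) if m.end() <= cursor]
        match = candidates[-1] if candidates else None
    else:
        # Forward searches begin at the insertion position.
        match = compiled.search(text, cursor)
    return (match.start(), match.end()) if match else None


buffer = "Dog cat dog"
print(find(buffer, "dog", 1))                 # forward from position 1
print(find(buffer, "dog", 8, backward=True))  # backward from position 8
```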

Selective Replacement

To replace only some occurrences of a string within a file, choose Replace... from the Search menu, enter the string to search for and the string to substitute, and finish by pressing the Find button. When the first occurrence is highlighted, use either Replace Again (^T) to replace it, or Find Again (^G) to move to the next occurrence without replacing it, and continue in such a manner through all occurrences of interest.

To replace all occurrences of a string within some range of text, select the range (see Selecting Text), choose Replace... from the Search menu, type the string to search for and the string to substitute, and press the "R. in Selection" button in the dialog. Note that selecting text in the Replace... dialog will unselect the text in the window.
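
The effect of replacing within a selection can be sketched in a few lines of Python: only the selected range is searched, and text outside it is left untouched. The function name and the character offsets are hypothetical, and the sketch assumes a literal, case-sensitive search string.

```python
def replace_in_selection(text, sel_start, sel_end, search, replacement):
    """Replace every occurrence of search inside text[sel_start:sel_end] only."""
    selected = text[sel_start:sel_end]
    return text[:sel_start] + selected.replace(search, replacement) + text[sel_end:]


buffer = "thing one\nthing two\nthing three\n"
# Offsets 10..20 cover the second line, so only that line changes.
print(replace_in_selection(buffer, 10, 20, "thing", "things"))
```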


Cut and Paste

The easiest way to copy and move text around in your file or between windows is to use the clipboard, an imaginary area that temporarily stores text and data. The Cut command removes the selected text (see Selecting Text) from your file and places it in the clipboard. Once text is in the clipboard, the Paste command will copy it to the insert position in the current window. For example, to move some text from one place to another, select it by dragging the mouse over it, choose Cut to remove it, click the pointer to move the insert point where you want the text inserted, then choose Paste to insert it. Copy copies text to the clipboard without deleting it from your file. You can also use the clipboard to transfer text to and from other Motif programs and X programs which make proper use of the clipboard.

There are many other methods for copying and moving text within NEdit windows and between NEdit and other programs. The most common such method is clicking the middle mouse button to copy the primary selection (to the clicked position). With many X programs, copying the selection by clicking the middle mouse button is the only way to transfer data to and from them. Holding the Shift key while clicking the middle mouse button moves the text, deleting it from its original position, rather than copying it. Other methods for transferring text include secondary selections, primary selection dragging, keyboard-based selection copying, and drag and drop. These are described in detail in the sections: "Selecting Text", "Using the Mouse", and "Keyboard Shortcuts".


Using the Mouse

Mouse-based editing is what NEdit is all about, and learning to use the more advanced features like secondary selections and primary selection dragging will be well worth your while.

If you don't have time to learn everything, you can get by adequately with just the left mouse button: Clicking the left button moves the cursor. Dragging with the left button makes a selection. Holding the shift key while clicking extends the existing selection, or begins a selection between the cursor and the mouse. Double or triple clicking selects a whole word or a whole line.

This section will make more sense if you also read the section called "Selecting Text", which explains the terminology of selections, that is, what is meant by primary, secondary, rectangular, etc.

Button and Modifier Key Summary

General meaning of mouse buttons and modifier keys:

Buttons

      Button 1 (left)    Cursor position and primary selection

      Button 2 (middle)  Secondary selections, and dragging and
                         copying the primary selection

      Button 3 (right)   Quick-access programmable menu and pan
                         scrolling

Modifier keys

      Shift   On primary selections, (left mouse button):
                 Extends selection to the mouse pointer
              On secondary and copy operations, (middle):
                 Toggles between move and copy

      Ctrl    Makes selection rectangular or insertion
              columnar

      Alt*    (on release) Exchange primary and secondary
              selections.

Left Mouse Button

The left mouse button is used to position the cursor and to make primary selections.

    Click         Moves the cursor

    Double Click  Selects a whole word

    Triple Click  Selects a whole line

    Quad Click    Selects the whole file

    Shift Click   Adjusts (extends or shrinks) the
                  selection, or if there is no existing
                  selection, begins a new selection
                  between the cursor and the mouse.

    Ctrl+Shift+   Adjusts (extends or shrinks) the
    Click         selection rectangularly.

    Drag          Selects text between where the mouse
                  was pressed and where it was released.

    Ctrl+Drag     Selects rectangle between where the
                  mouse was pressed and where it was
                  released.

Right Mouse Button

The right mouse button posts a programmable menu for frequently used commands.

    Click/Drag    Pops up the background menu (programmed
                  from Preferences -> Default Settings ->
                  Customize Menus -> Window Background).

    Ctrl+Drag     Pan scrolling.  Scrolls the window
                  both vertically and horizontally, as if
                  you had grabbed it with your mouse.

Middle Mouse Button

The middle mouse button is for making secondary selections, and copying and dragging the primary selection.

    Click         Copies the primary selection to the
                  clicked position.

    Shift+Click   Moves the primary selection to the
                  clicked position, deleting it from its
                  original position.

    Drag          1) Outside of the primary selection:
                      Begins a secondary selection.
                  2) Inside of the primary selection:
                      Moves the selection by dragging.

    Ctrl+Drag     1) Outside of the primary selection:
                      Begins a rectangular secondary
                      selection.
                  2) Inside of the primary selection:
                      Drags the selection in overlay
                      mode (see below).

When the mouse button is released after creating a secondary selection:

    No Modifiers  If there is a primary selection,
                  replaces it with the secondary
                  selection.  Otherwise, inserts the
                  secondary selection at the cursor
                  position.

    Shift         Move the secondary selection, deleting
                  it from its original position.  If
                  there is a primary selection, the move
                  will replace the primary selection
                  with the secondary selection.
                  Otherwise, moves the secondary
                  selection to the cursor position.

    Alt*          Exchange the primary and secondary
                  selections.

While moving the primary selection by dragging with the middle mouse button:

    Shift         Leaves a copy of the original
                  selection in place rather than
                  removing it or blanking the area.

    Ctrl          Changes from insert mode to overlay
                  mode (see below).

    Escape        Cancels drag in progress.

Overlay Mode: Normally, dragging moves text by removing it from the selected position at the start of the drag, and inserting it at a new position relative to the mouse. When you drag a block of text over existing characters, the existing characters are displaced to the end of the selection. In overlay mode, characters which are occluded by blocks of text being dragged are simply removed. When dragging non-rectangular selections, overlay mode also converts the selection to rectangular form, allowing it to be dragged outside of the bounds of the existing text.

Mouse buttons 4 and 5 are usually represented by a mouse wheel nowadays. They are used to scroll up or down in the text window.

* The Alt key may be labeled Meta or Compose-Character on some keyboards. Some window managers, including default configurations of mwm, bind combinations of the Alt key and mouse buttons to window manager operations. In NEdit, Alt is only used on button release, so regardless of the window manager bindings for Alt-modified mouse buttons, you can still do the corresponding NEdit operation by using the Alt key AFTER the initial mouse press, so that Alt is held while you release the mouse button. If you find this difficult or annoying, you can re-configure most window managers to skip this binding, or you can re-configure NEdit to use a different key combination.


Keyboard Shortcuts

Most of the keyboard shortcuts in NEdit are shown on the right hand sides of the pull-down menus. However, there are more which are not as obvious. These include: dialog button shortcuts; menu and dialog mnemonics; labeled keyboard keys, such as the arrows, page-up, page-down, and home; and optional Shift modifiers on accelerator keys, like [Shift]Ctrl+F.

Menu Accelerators

Pressing the key combinations shown on the right of the menu items is a shortcut for selecting the menu item with the mouse. Some items have the shift key enclosed in brackets, such as [Shift]Ctrl+F. This indicates that the shift key is optional. In search commands, including the shift key reverses the direction of the search. In Shift commands, it makes the command shift the selected text by a whole tab stop rather than by single characters.

Menu Mnemonics

Pressing the Alt key in combination with one of the underlined characters in the menu bar pulls down that menu. Once the menu is pulled down, typing the underlined characters in a menu item (without the Alt key) activates that item. With a menu pulled down, you can also use the arrow keys to select menu items, and the Space or Enter keys to activate them.

Keyboard Shortcuts within Dialogs

One button in a dialog is usually marked with a thick indented outline. Pressing the Return or Enter key activates this button.

All dialogs have either a Cancel or Dismiss button. This button can be activated by pressing the Escape (or Esc) key.

Pressing the tab key moves the keyboard focus to the next item in a dialog. Within an associated group of buttons, the arrow keys move the focus among the buttons. Shift+Tab moves backward through the items.

Most items in dialogs have an underline under one character in their name. Pressing the Alt key along with this character activates a button as if you had pressed it with the mouse, or moves the keyboard focus to the associated text field or list.

You can select items from a list by using the arrow keys to move the selection and space to select.

In file selection dialogs, you can type the beginning characters of the file name or directory in the list to select files.

Labeled Function Keys

The labeled function keys on standard workstation and PC keyboards, like the arrows, and page-up and page-down, are active in NEdit, though not shown in the pull-down menus.

Holding down the control key while pressing a named key extends the scope of the action that it performs. For example, Home normally moves the insert cursor to the beginning of a line. Ctrl+Home moves it to the beginning of the file. Backspace deletes one character, Ctrl+Backspace deletes one word.

Holding down the shift key while pressing a named key begins or extends a selection. Combining the shift and control keys combines their actions. For example, to select a word without using the mouse, position the cursor at the beginning of the word and press Ctrl+Shift+RightArrow. The Alt key modifies selection commands to make the selection rectangular.

Under X and Motif, there are several levels of translation between keyboard keys and the actions they perform in a program. The "Customizing NEdit", and "X Resources" sections of the Help menu have more information on this subject. Because of all of this configurability, and since keyboards and standards for the meaning of some keys vary from machine to machine, the mappings may be changed from the defaults listed below.

Modifier Keys (in general)

    Ctrl   Extends the scope of the action that the key
           would otherwise perform.  For example, Home
           normally moves the insert cursor to the beginning
           of a line. Ctrl+Home moves it to the beginning of
           the file.  Backspace deletes one character, Ctrl+
           Backspace deletes one word.

    Shift  Extends the selection to the cursor position. If
           there's no selection, begins one between the old
           and new cursor positions.

    Alt    When modifying a selection, makes the selection
           rectangular.

(For the effects of modifier keys on mouse button presses, see the section titled "Using the Mouse")

All Keyboards

    Escape        Cancels operation in progress: menu
                  selection, drag, selection, etc.  Also
                  equivalent to cancel button in dialogs.

    Backspace     Delete the character before the cursor

    Ctrl+BS       Delete the word before the cursor

    Arrows --

      Left        Move the cursor to the left one character

      Ctrl+Left   Move the cursor backward one word
                  (Word delimiters are settable, see
                  "Customizing NEdit", and "X Resources")

      Right       Move the cursor to the right one character

      Ctrl+Right  Move the cursor forward one word

      Up          Move the cursor up one line

      Ctrl+Up     Move the cursor up one paragraph.
                  (Paragraphs are delimited by blank lines)

      Down        Move the cursor down one line.

      Ctrl+Down   Move the cursor down one paragraph.

    Ctrl+Return   Return with automatic indent, regardless
                  of the setting of Auto Indent.

    Shift+Return  Return without automatic indent,
                  regardless of the setting of Auto Indent.

    Ctrl+Tab      Insert an ASCII tab character, without
                  processing emulated tabs.

    Alt+Ctrl+<c>  Insert the control-code equivalent of
                  a key <c>

    Ctrl+/        Select everything (same as Select
                  All menu item or ^A)

    Ctrl+\        Unselect

    Ctrl+U        Delete to start of line

PC Standard Keyboard

    Ctrl+Insert   Copy the primary selection to the
                  clipboard (same as Copy menu item or ^C)
                  for compatibility with Motif standard key
                  binding
    Shift+Ctrl+
    Insert        Copy the primary selection to the cursor
                  location.

    Delete        Delete the character before the cursor.
                  (Can be configured to delete the character
                  after the cursor, see "Customizing NEdit",
                  and "X Resources")

    Ctrl+Delete   Delete to end of line.

    Shift+Delete  Cut, remove the currently selected text
                  and place it in the clipboard. (same as
                  Cut menu item or ^X) for compatibility
                  with Motif standard key binding
    Shift+Ctrl+
    Delete        Cut the primary selection to the cursor
                  location.

    Home          Move the cursor to the beginning of the
                  line

    Ctrl+Home     Move the cursor to the beginning of the
                  file

    End           Move the cursor to the end of the line

    Ctrl+End      Move the cursor to the end of the file

    PageUp        Scroll and move the cursor up by one page.

    PageDown      Scroll and move the cursor down by one
                  page.

    F10           Make the menu bar active for keyboard
                  input (Arrow Keys, Return, Escape,
                  and the Space Bar)

    Alt+Home      Switch to the previously active document.

    Ctrl+PageUp   Switch to the previous document.

    Ctrl+PageDown Switch to the next document.

Specialty Keyboards

On machines with different styles of keyboards, text editing actions are generally matched to the labeled keys, such as Remove, Next-screen, etc. If you prefer different key bindings, see the section titled "Key Binding" under the Customizing heading in the Help menu.


Shifting and Filling

Shift Left, Shift Right

While shifting blocks of text is most important for programmers (See Features for Programming), it is also useful for other tasks, such as creating indented paragraphs.

To shift a block of text one tab stop to the right, select the text, then choose Shift Right from the Edit menu. Note that the accelerator keys for Shift Left and Shift Right are Ctrl+9 and Ctrl+0, which carry the left and right parentheses on most keyboards. Remember them as adjusting the text in the direction pointed to by the parenthesis character. Holding the Shift key while selecting either Shift Left or Shift Right will shift the text by one character.

It is also possible to shift blocks of text by selecting the text rectangularly, and dragging it left or right (and up or down as well). Using a rectangular selection also causes tabs within the selection to be recalculated and substituted, such that the non-whitespace characters remain stationary with respect to the selection.
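
What the Shift Left and Shift Right menu commands do to a block of lines can be sketched in Python as follows. The function name is hypothetical; the caller picks the distance (one column, or a tab stop, taken here to be eight columns as an assumption). The sketch also assumes indentation made of spaces only and leaves empty lines alone, which may not match NEdit's behavior exactly.

```python
def shift_lines(text, columns):
    """Shift every line right (columns > 0) or left (columns < 0) by spaces."""
    shifted = []
    for line in text.split("\n"):
        if columns >= 0:
            # Empty lines stay empty so no trailing whitespace is created.
            shifted.append(" " * columns + line if line else line)
        else:
            # Remove at most -columns leading spaces, never any visible text.
            leading = len(line) - len(line.lstrip(" "))
            removable = min(-columns, leading)
            shifted.append(line[removable:])
    return "\n".join(shifted)


TAB_STOP = 8  # assumed tab distance, for illustration only
block = "if ready:\n    go()"
print(shift_lines(block, TAB_STOP))   # one tab stop to the right
print(shift_lines("    a\n b", -2))   # two columns to the left
```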

Filling

Text filling using the Fill Paragraph command in the Edit menu is one of the most important concepts in NEdit, and it will be well worth your while to understand how to use it properly.

In plain text files, unlike word-processor files, there is no way to tell which lines are continuations of other lines, and which lines are meant to be separate, because there is no distinction in meaning between newline characters which separate lines in a paragraph, and ones which separate paragraphs from other text. This makes it impossible for a text editor like NEdit to tell parts of the text which belong together as a paragraph from carefully arranged individual lines.

In continuous wrap mode (Preferences -> Wrap -> Continuous), lines automatically wrap and unwrap themselves to line up properly at the right margin. In this mode, you simply omit the newlines within paragraphs and let NEdit make the line breaks as needed. Unfortunately, continuous wrap mode is not appropriate in the majority of situations, because files with extremely long lines are not common under Unix and may not be compatible with all tools, and because you can't achieve effects like indented sections, columns, or program comments, and still take advantage of the automatic wrapping.

Without continuous wrapping, paragraph filling is not entirely automatic. Auto-Newline wrapping keeps paragraphs lined up as you type, but once the text is entered, NEdit can no longer distinguish newlines which join wrapped text from newlines which must be preserved. Therefore, editing in the middle of a paragraph will often leave the right margin messy and uneven.

Since NEdit can't act automatically to keep your text lined up, you need to tell it explicitly where to operate, and that is what Fill Paragraph is for. It arranges lines to fill the space between two margins, wrapping the lines neatly at word boundaries. Normally, the left margin for filling is inferred from the text being filled. The first line of each paragraph is considered special, and its left indentation is maintained separately from the remaining lines (for leading indents, bullet points, numbered paragraphs, etc.). Otherwise, the left margin is determined by the furthest left non-whitespace character. The right margin is either the Wrap Margin, set in the preferences menu (by default, the right edge of the window), or can also be chosen on the fly by using a rectangular selection (see below).

There are three ways to use Fill Paragraph. The simplest is, while you are typing text, and there is no selection, simply select Fill Paragraph (or type Ctrl+J), and NEdit will arrange the text in the paragraph adjacent to the cursor. A paragraph, in this case, means an area of text delimited by blank lines.
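
The margin rules described above can be approximated with Python's standard `textwrap` module. This is only a sketch under stated assumptions, not NEdit's algorithm: the function names are hypothetical, the default margin of 72 columns is arbitrary, the paragraph is assumed to contain no blank lines, and a one-line paragraph is assumed to reuse its own indentation for continuation lines.

```python
import textwrap


def leading_whitespace(line):
    """Return the run of whitespace at the start of line."""
    return line[:len(line) - len(line.lstrip())]


def fill_paragraph(paragraph, wrap_margin=72):
    """Re-wrap one paragraph so that no line extends past wrap_margin."""
    lines = paragraph.split("\n")
    # The first line is special: it keeps its own indentation.
    first_indent = leading_whitespace(lines[0])
    # The remaining lines share the smallest indentation found among them,
    # that is, the furthest-left non-whitespace character sets the margin.
    rest_indent = min((leading_whitespace(l) for l in lines[1:]),
                      key=len, default=first_indent)
    words = " ".join(line.strip() for line in lines)
    return textwrap.fill(words, width=wrap_margin,
                         initial_indent=first_indent,
                         subsequent_indent=rest_indent)


messy = "    The quick brown fox\njumps over\nthe lazy dog"
print(fill_paragraph(messy, wrap_margin=20))
```

Filling a selection that spans several paragraphs would amount to splitting the text at blank lines and applying the same function to each piece.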

The second way to use Fill Paragraph is with a selection. If you select a range of text and then choose Fill Paragraph, all of the text in the selection will be filled. Again, continuous text between blank lines is interpreted as paragraphs and filled individually, respecting leading indents and blank lines.

The third, and most versatile, way to use Fill Paragraph is with a rectangular selection. Fill Paragraph treats rectangular selections differently from other commands. Instead of simply filling the text inside the rectangular selection, NEdit interprets the right edge of the selection as the requested wrap margin. Text to the left of the selection is not disturbed (the usual interpretation of a rectangular selection), but text to the right of the selection is included in the operation and is pulled in to the selected region. This method enables you to fill text to an arbitrary right margin, without going back and forth to the wrap-margin dialog, as well as to exclude text to the left of the selection such as comment bars or other text columns.


Tabbed Editing

NEdit is able to display files in distinct editor windows, or to display files under tabs in the same editor window. The options for controlling the tabbed interface are found under Preferences -> Default Settings -> Tabbed Editing (cf. "Preferences", also "NEdit Command Line").

Notice that you can re-group tabs at any time by detaching and attaching them, or moving them, to other windows. This can be done using the Windows menu, or using the context menu, which pops up when right clicking on a tab.

You can switch to a tab by simply clicking on it, or you can use the keyboard. The default keybindings to switch tabs (which are Ctrl+PageUp/-Down and Alt+Home, see "Keyboard Shortcuts") can be changed using the actions previous_document(), next_document() and last_document().


File Format

While plain-text is probably the simplest and most interchangeable file format in the computer world, there is still variation in what plain-text means from system to system. Plain-text files can differ in character set, line termination, and wrapping.

While character set differences are the most obvious and pose the most challenge to portability, they affect NEdit only indirectly via the same font and localization mechanisms common to all X applications. If your system is set up properly, you will probably never see character-set related problems in NEdit. NEdit cannot display Unicode text files, or text in any multi-byte character set.

The primary difference between an MS DOS format file and a Unix format file is how the lines are terminated. Unix uses a single newline character. MS DOS uses a carriage-return and a newline. NEdit can read and write both file formats, but internally, it uses the single character Unix standard. NEdit auto-detects MS DOS format files based on the line termination at the start of the file. Files are judged to be DOS format if all of the first five line terminators, within a maximum range, are DOS-style. To change the format in which NEdit writes a file from DOS to Unix or vice versa, use the Save As... command and check or un-check the MS DOS Format button.
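
The detection rule can be sketched as follows. This is only an illustration in Python, not NEdit's source code. The function name, the handling of files with fewer than five terminators, and the scan limit of 2000 characters (standing in for the unspecified "maximum range") are all assumptions.

```python
def looks_like_dos(data, max_terminators=5, max_scan=2000):
    """Guess whether text uses DOS (CR LF) line endings.

    Returns True only if at least one terminator is found and every one
    of the first max_terminators terminators within the first max_scan
    characters is CR LF.  max_scan is an assumed value.
    """
    seen = 0
    for i, ch in enumerate(data[:max_scan]):
        if ch == "\n":
            if i == 0 or data[i - 1] != "\r":
                return False          # bare newline: Unix style
            seen += 1
            if seen >= max_terminators:
                break                 # later terminators are not examined
    return seen > 0
```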

Wrapping within text files can vary among individual users, as well as from system to system. Both Windows and MacOS make frequent use of plain text files with no implicit right margin. In these files, wrapping is determined by the tool which displays them. Files of this style also exist on Unix systems, despite the fact that they are not supported by all Unix utilities. To display this kind of file properly in NEdit, you have to select the wrap style called Continuous. Wrapping modes are discussed in the sections: Customizing -> Preferences, and Basic Operation -> Shifting and Filling.

The last and most minute of format differences is the terminating newline. Some Unix compilers and utilities require a final terminating newline on all files they read and fail in various ways on files which do not have it. Vi and approximately half of Unix editors enforce the terminating newline on all files that they write; Emacs does not enforce this rule. Users are divided on which is best. NEdit makes the final terminating newline optional (Preferences -> Default Settings -> Terminate with Line Break on Save).


Features for Programming

Programming with NEdit

Though general in appearance, NEdit has many features intended specifically for programmers. Major programming-related topics are listed in separate sections under the heading: "Features for Programming": Syntax Highlighting, Tabs/Emulated Tabs, Finding Declarations (ctags), Calltips, and Auto/Smart Indent. Minor topics related to programming are discussed below:

Language Modes

When NEdit initially reads a file, it attempts to determine whether the file is in one of the computer languages that it knows about. Knowing what language a file is written in allows NEdit to assign highlight patterns and smart indent macros, and to set language specific preferences like word delimiters, tab emulation, and auto-indent. Language mode can be recognized from both the file name and from the first 200 characters of content. Language mode recognition and language-specific preferences are configured in: Preferences -> Default Settings -> Language Modes....

You can set the language mode manually for a window, by selecting it from the menu: Preferences -> Language Modes.

Backlighting [EXPERIMENTAL]

NEdit can be made to set the background color of particular classes of characters to allow easy identification of those characters. This is particularly useful if you need to be able to distinguish between tabs and spaces in a file where the difference is important. The colors used for backlighting are specified by a resource, "nedit*backlightCharTypes". You can turn backlighting on and off through the Preferences -> Apply Backlighting menu entry.

If you prefer to have backlighting turned on for all new windows, use the Preferences -> Default Settings -> Apply Backlighting menu entry. This setting can be saved along with other preferences using Preferences -> Save Defaults.

Important: In future versions of NEdit, the backlighting feature will be extended and reworked such that it becomes easier to configure. The current way of controlling it through a resource is generally considered to be below NEdit's usability standards. These future changes are likely to be incompatible with the current format of the "nedit*backlightCharTypes" resource, though. Therefore, it is expected that there will be no automatic migration path for users who customize the resource.

Line Numbers

To find a particular line in a source file by line number, choose Goto Line #... from the Search menu. You can also directly select the line number text in the compiler message in the terminal emulator window (xterm, decterm, winterm, etc.) where you ran the compiler, and choose Goto Selected from the Search menu.

To find out the line number of a particular line in your file, turn on Statistics Line in the Preferences menu and position the insertion point anywhere on the line. The statistics line continuously updates the line number of the line containing the cursor.

To go to a specific column on a given line, choose Goto Line #... from the Search menu and enter a line number and a column number separated by a comma (e.g. enter "100,12" for line 100, column 12). If you want to go to a column on the current line, just leave out the line number (e.g. enter ",45" to go to column 45 on the current line).
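
The input format accepted by Goto Line #... can be summarized with a small Python sketch. The function name parse_goto is hypothetical, and the sketch only mirrors the three forms described above ("line", "line,column", and ",column").

```python
def parse_goto(text):
    """Parse "line", "line,column" or ",column" into (line, column).

    A missing part is returned as None, meaning "leave unchanged".
    """
    line_part, _, col_part = text.partition(",")
    line = int(line_part) if line_part.strip() else None
    column = int(col_part) if col_part.strip() else None
    return line, column
```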

Matching Parentheses

To help you inspect nested parentheses, brackets, braces, quotes, and other characters, NEdit has both an automatic parenthesis matching mode, and a Goto Matching command. Automatic parenthesis matching is activated when you type, or move the insertion cursor after a parenthesis, bracket, or brace. It momentarily highlights either the opposite character ('Delimiter') or the entire expression ('Range') when the opposite character is visible in the window. To find a matching character anywhere in the file, select it or position the cursor after it, and choose Goto Matching from the Search menu. If the character matches itself, such as a quote or slash, select the first character of the pair. NEdit will match {, (, [, <, ", ', `, /, and \. Holding the Shift key while typing the accelerator key (Shift+Ctrl+M, by default), will select all of the text between the matching characters.

When syntax highlighting is enabled, the matching routines can optionally make use of the syntax information for improved accuracy. In that case, a brace inside a highlighted string will not match a brace inside a comment, for instance.
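
The basic idea behind finding a matching delimiter is to scan away from the starting character while counting nesting depth. The Python sketch below illustrates this for the four bracket pairs only. It is a simplified illustration rather than NEdit's matching routine: it ignores quotes, slashes, and syntax highlighting information, and the names used are hypothetical.

```python
PAIRS = {"(": ")", "[": "]", "{": "}", "<": ">"}
CLOSERS = {close: open_ for open_, close in PAIRS.items()}

def find_match(text, pos):
    """Return the index of the bracket matching text[pos], or -1 if none."""
    ch = text[pos]
    if ch in PAIRS:
        target, step = PAIRS[ch], 1       # scan forward for the closer
    elif ch in CLOSERS:
        target, step = CLOSERS[ch], -1    # scan backward for the opener
    else:
        return -1
    depth = 0
    i = pos + step
    while 0 <= i < len(text):
        if text[i] == ch:
            depth += 1                    # a nested pair begins
        elif text[i] == target:
            if depth == 0:
                return i
            depth -= 1                    # a nested pair ends
        i += step
    return -1
```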

Opening Included Files

The Open Selected command in the File menu understands the C preprocessor's #include syntax, so selecting an #include line and invoking Open Selected will generally find the file referred to, unless doing so depends on the settings of compiler switches or other information not available to NEdit.

Interface to Programming Tools

Integrated software development environments, such as SGI's CaseVision and Centerline Software's Code Center, can be interfaced directly with NEdit via the client/server interface. These tools allow you to click directly on compiler and runtime error messages and have NEdit open files and select lines of interest. The easiest method is usually to use the tool's interface for character-based editors like vi to invoke nc, but programmatic interfaces can also be derived using the source code for nc.

There are also some simple compile/review, grep, ctree, and ctags browsers available in the NEdit contrib directory on ftp.nedit.org.


Tabs/Emulated Tabs

Changing the Tab Distance

Tabs are important for programming in languages which use indentation to show nesting, as short-hand for producing white-space for leading indents. As a programmer, you have to decide how to use indentation, and how or whether tab characters map to your indentation scheme.

Ideally, tab characters map directly to the amount of indent that you use to distinguish nesting levels in your code. Unfortunately, the Unix standard for interpretation of tab characters is eight characters (probably dating back to mechanical capabilities of the original teletype), which is usually too coarse for a single indent.

Most text editors, NEdit included, allow you to change the interpretation of the tab character, and many programmers take advantage of this, and set their tabs to 3 or 4 characters to match their programming style. In NEdit you set the hardware tab distance in Preferences -> Tabs... for the current window, or Preferences -> Default Settings -> Tabs... (general), or Preferences -> Default Settings -> Language Modes... (language-specific) to change the defaults for future windows.

Changing the meaning of the tab character makes programming much easier while you're in the editor, but can cause you headaches outside of the editor, because there is no way to pass along the tab setting as part of a plain-text file. All of the other tools which display, print, and otherwise process your source code have to be made aware of how the tabs are set, and must be able to handle the change. Non-standard tabs can also confuse other programmers, or make editing your code difficult for them if their text editors don't support changes in tab distance.

Emulated Tabs

An alternative to changing the interpretation of the tab character is tab emulation. In the Tabs... dialog(s), turning on Emulated Tabs causes the Tab key to insert the correct number of spaces and/or tabs to bring the cursor to the next emulated tab stop, as if tabs were set at the emulated tab distance rather than the hardware tab distance. Backspacing immediately after entering an emulated tab will delete the fictitious tab as a unit, but as soon as you move the cursor away from the spot, NEdit will forget that the collection of spaces and tabs is a tab, and will treat it as separate characters. To enter a real tab character with Emulated Tabs turned on, use Ctrl+Tab.

It is also possible to tell NEdit not to insert ANY tab characters at all in the course of processing emulated tabs, and in shifting and rectangular insertion/deletion operations, for programmers who worry about the misinterpretation of tab characters on other systems.
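
To make the relationship between emulated and hardware tab stops concrete, here is a minimal Python sketch of what pressing the Tab key might insert. It is an approximation for illustration only. The function name and parameters are hypothetical, columns are counted from 0, and NEdit's own logic may differ in detail (for example in how it treats whitespace already present before the cursor).

```python
def emulated_tab(column, emulated_dist, hardware_dist, use_tabs=True):
    """Return the whitespace that moves the cursor from column (0-based)
    to the next emulated tab stop."""
    target = (column // emulated_dist + 1) * emulated_dist
    out = []
    col = column
    if use_tabs:
        # Use a hardware tab whenever it does not overshoot the target.
        while (col // hardware_dist + 1) * hardware_dist <= target:
            out.append("\t")
            col = (col // hardware_dist + 1) * hardware_dist
    out.append(" " * (target - col))
    return "".join(out)
```

With use_tabs set to False, the sketch corresponds to the setting in which no tab characters are inserted at all.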


Auto/Smart Indent

Programmers who use structured languages usually require some form of automatic indent, so that they don't have to continually re-type the sequences of tabs and/or spaces needed to maintain lengthy running indents. NEdit therefore offers "smart" indent, in addition to the traditional automatic indent which simply lines up the cursor position with the previous line.

Smart Indent

Smart indent macros are only available by default for C and C++, and while these can easily be configured for different default indentation distances, they may not conform to everyone's exact C programming style. Smart indent is programmed in terms of macros in the NEdit macro language which can be entered in: Preferences -> Default Settings -> Indent -> Program Smart Indent. Hooks are provided for intervening at the point that a newline is entered, either via the user pressing the Enter key, or through auto-wrapping; and for arbitrary type-in to act on specific characters typed.

To type a newline character without invoking smart-indent when operating in smart-indent mode, hold the Shift key while pressing the Return or Enter key.

Auto-Indent

With Indent set to Auto (the default), NEdit keeps a running indent. When you press the Return or Enter key, spaces and tabs are inserted to line up the insert point under the start of the previous line.
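
The running indent can be pictured with a short Python sketch. This is only an illustration of the behavior described above, with a hypothetical function name. It copies the leading spaces and tabs of the current line onto the new line.

```python
def auto_indent_newline(text, cursor):
    """Insert a newline at cursor followed by the current line's indent.

    Returns the new text and the new cursor position.
    """
    line_start = text.rfind("\n", 0, cursor) + 1
    line = text[line_start:cursor]
    indent = line[:len(line) - len(line.lstrip(" \t"))]
    new_text = text[:cursor] + "\n" + indent + text[cursor:]
    return new_text, cursor + 1 + len(indent)
```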

Regardless of indent-mode, Ctrl+Return always does the automatic indent; Shift+Return always does a return without indent.

Block Indentation Adjustment

The Shift Left and Shift Right commands as well as rectangular dragging can be used to adjust the indentation for several lines at once. To shift a block of text one character to the right, select the text, then choose Shift Right from the Edit menu. Note that the accelerator keys for these menu items are Ctrl+9 and Ctrl+0, which correspond to the left and right parenthesis on most keyboards. Remember them as adjusting the text in the direction pointed to by the parenthesis character. Holding the Shift key while selecting either Shift Left or Shift Right will shift the text by one tab stop (or by one emulated tab stop if tab emulation is turned on). The help section "Shifting and Filling" under "Basic Operation" has details.


Syntax Highlighting

Syntax Highlighting means using colors and fonts to help distinguish language elements in programming languages and other types of structured files. Programmers use syntax highlighting to understand code faster and better, and to spot many kinds of syntax errors more quickly.

To use syntax highlighting in NEdit, select Highlight Syntax in the Preferences menu. If NEdit recognizes the computer language that you are using, and highlighting rules (patterns) are available for that language, it will highlight your text, and maintain the highlighting, automatically, as you type.

If NEdit doesn't correctly recognize the type of the file you are editing, you can manually select a language mode from Language Modes in the Preferences menu. You can also program the method that NEdit uses to recognize language modes in Preferences -> Default Settings -> Language Modes....

If no highlighting patterns are available for the language that you want to use, you can create new patterns relatively quickly. The Help section "Highlighting Patterns" under "Customizing", has details.

If you are satisfied with what NEdit is highlighting, but would like it to use different colors or fonts, you can change these by selecting Preferences -> Default Settings -> Syntax Highlighting -> Text Drawing Styles. Highlighting patterns are connected with font and color information through a common set of styles so that colorings defined for one language will be similar across others, and patterns within the same language which are meant to appear identical can be changed in the same place. To understand which styles are used to highlight the language you are interested in, you may need to look at the "Highlighting Patterns" section as well.

Syntax highlighting is CPU intensive, and under some circumstances can affect NEdit's responsiveness. If you have a particularly slow system, or work with very large files, you may not want to use it all of the time. Syntax highlighting introduces two kinds of delays. The first is an initial parsing delay, proportional to the size of the file. This delay is also incurred when pasting large sections of text, filtering text through shell commands, and other circumstances involving changes to large amounts of text. The second kind of delay happens when text which has not previously been visible is scrolled in to view. Depending on your system, and the highlight patterns you are using, this may or may not be noticeable. A typing delay is also possible, but unlikely if you are only using the built-in patterns.


Finding Declarations (ctags)

NEdit can process tags files generated using the Unix ctags command or the Exuberant Ctags program. Ctags creates index files correlating names of functions and declarations with their locations in C, Fortran, or Pascal source code files. (See the ctags manual page for more information). Ctags produces a file called "tags" which can be loaded by NEdit. NEdit can manage any number of tags files simultaneously. Tag collisions are handled with a popup menu to let the user decide which tag to use. In 'Smart' mode NEdit will automatically choose the desired tag based on the scope of the file or module. Once loaded, the information in the tags file enables NEdit to go directly to the declaration of a highlighted function or data structure name with a single command. To load a tags file, select "Load Tags File" from the File menu and choose a tags file to load, or specify the name of the tags file on the NEdit command line:

      nedit -tags tags

NEdit can also be set to load a tags file automatically when it starts up. Setting the X resource nedit.tagFile to the name of a tag file tells NEdit to look for that file at startup time (see "Customizing NEdit"). The file name can be either a complete path name, in which case NEdit will always load the same tags file, or a file name without a path or with a relative path, in which case NEdit will load it starting from the current directory. The second option allows you to have different tags files for different projects, each automatically loaded depending on the directory you're in when you start NEdit. Setting the name to "tags" is an obvious choice since this is the name that ctags uses. NEdit normally evaluates relative path tag file specifications every time a file is opened. All accessible tag files are loaded at this time. To disable the automatic loading of tag files specified as relative paths, set the X resource nedit.alwaysCheckRelativeTagsSpecs to False.

To unload a tags file, select "Un-load Tags File" from the File menu and choose from the list of tags files. NEdit will keep track of tags file updates by checking the timestamp on the files, and automatically update the tags cache.

To find the definition of a function or data structure once a tags file is loaded, select the name anywhere it appears in your program (see "Selecting Text") and choose "Find Definition" from the Search menu.
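
For readers curious about what a tags file contains, the following Python sketch reads the common ctags line format (tag name, file name, and address, separated by tab characters) into a lookup table. It is an illustration only, not NEdit's tags reader, and the function name is hypothetical. A name that appears more than once ends up with several entries, which is the situation described above as a tag collision.

```python
def load_tags(lines):
    """Map each tag name to a list of (file, address) entries."""
    tags = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("!_TAG_"):
            continue                  # skip pseudo-tags written by Exuberant Ctags
        parts = line.split("\t", 2)
        if len(parts) < 3:
            continue                  # not a well-formed entry
        name, path, address = parts
        # Extended formats append ';"' plus extra fields after the address.
        address = address.split(';"', 1)[0]
        tags.setdefault(name, []).append((path, address))
    return tags
```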


Calltips

Calltips are little yellow boxes that pop up to remind you what the arguments and return type of a function are. More generally, they're a UI mechanism to present a small amount of crucial information in a prominent location. To display a calltip, select some text and choose "Show Calltip" from the Search menu. To kill a displayed calltip, hit Esc.

Calltips get their information from one of two places -- either a tags file (see "Finding Declarations (ctags)") or a calltips file. First, any loaded calltips files are searched for a definition, and if nothing is found then the tags database is searched. If a tag is found that matches the highlighted text then a calltip is displayed with the first few lines of the definition -- usually enough to show you what the arguments of a function are.

You can load a calltips file by choosing "Load Calltips File" from the File menu. You can unload a calltips file by selecting it from the "Unload Calltips File" submenu of the File menu. You can also choose one or more default calltips files to be loaded for each language mode using the "Default calltips file(s)" field of the Language Modes dialog.

The calltips file format is very simple. Calltips files are organized in blocks separated by blank lines. The first line of the block is the key, which is the word that is matched when a calltip is requested. The rest of the block is displayed as the calltip.

Almost any text at all can appear in a calltip key or a calltip. There are no special characters that need to be escaped. The only issues to note are that trailing whitespace is ignored, and you cannot have a blank line inside a calltip. (Use a single period instead -- it'll be nearly invisible.) You should also avoid calltip keys that begin and end with '*' characters, since those are used to mark special blocks.

There are five special block types--comment, include, language, alias, and version--which are distinguished by their first lines, "* comment *", "* include *", "* language *", "* alias *", and "* version *" respectively (without quotes).

Comment blocks are ignored when reading calltips files.

Include blocks specify additional calltips files to load, one per line. The ~ character can be used for your $HOME directory, but other shell shortcuts like * and ? can't be used. Include blocks allow you to make a calltips file for your project that includes, say, the calltips files for C, Motif, and Xt.

Language blocks specify which language mode the calltips should be used with. When a calltip is requested it won't match tips from languages other than the current language mode. Language blocks only affect the tips listed after the block.

Alias blocks allow a calltip to have multiple keys. The first line of the block is the key for the calltip to be displayed, and the rest of the lines are additional keys, one per line, that should also show the calltip.

Version blocks are ignored for the time being.

You can use calltips in your own macros using the calltip() and kill_calltip() macro subroutines and the $calltip_ID macro variable. See the Macro Subroutines section for details.


Regular Expressions

Basic Regular Expression Syntax

Regular expressions (regex's) are useful as a way to match inexact sequences of characters. They can be used in the `Find...' and `Replace...' search dialogs and are at the core of Color Syntax Highlighting patterns. To specify a regular expression in a search dialog, simply click on the `Regular Expression' radio button in the dialog.

A regex is a specification of a pattern to be matched in the searched text. This pattern consists of a sequence of tokens, each being able to match a single character or a sequence of characters in the text, or assert that a specific position within the text has been reached (the latter is called an anchor.) Tokens (also called atoms) can be modified by adding one of a number of special quantifier tokens immediately after the token. A quantifier token specifies how many times the previous token must be matched (see below.)

Tokens can be grouped together using one of a number of grouping constructs, the most common being plain parentheses. Tokens that are grouped in this way are also collectively considered to be a regex atom, since this new larger atom may also be modified by a quantifier.

A regex can also be organized into a list of alternatives by separating each alternative with pipe characters, `|'. This is called alternation. A match will be attempted for each alternative listed, in the order specified, until a match results or the list of alternatives is exhausted (see Alternation section below.)

The 'Any' Character

If a dot (`.') appears in a regex, it means to match any character exactly once. By default, dot will not match a newline character, but this behavior can be changed (see help topic Parenthetical Constructs, under the heading, Matching Newlines).

Character Classes

A character class, or range, matches exactly one character of text, but the candidates for matching are limited to those specified by the class. Classes come in two flavors as described below:

     [...]   Regular class, match only characters listed.
     [^...]  Negated class, match only characters NOT listed.

As with the dot token, by default negated character classes do not match newline, but can be made to do so.

The characters that are considered special within a class specification are different than the rest of regex syntax as follows. If the first character in a class is the `]' character (second character if the first character is `^') it is a literal character and part of the class character set. This also applies if the first or last character is `-'. Outside of these rules, two characters separated by `-' form a character range which includes all the characters between the two characters as well. For example, `[^f-j]' is the same as `[^fghij]' and means to match any character that is not `f', `g', `h', `i', or `j'.
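
The class rules above can be tried out in any regex engine that follows the same conventions. The short sketch below uses Python's re module, which is a different engine from NEdit's but treats these particular class constructs the same way. One difference to keep in mind is that a negated class in Python also matches a newline, which NEdit's does not by default.

```python
import re

# [^f-j] matches any single character other than f, g, h, i, or j.
not_f_to_j = re.findall(r"[^f-j]", "efghijk")

# A leading ']' and a trailing '-' are literal members of the class.
bracket_or_dash = re.findall(r"[]a-]", "x]a-y")

print(not_f_to_j)        # ['e', 'k']
print(bracket_or_dash)   # [']', 'a', '-']
```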

Anchors

Anchors are assertions that you are at a very specific position within the search text. NEdit regular expressions support the following anchor tokens:

     ^    Beginning of line
     $    End of line
     <    Left word boundary
     >    Right word boundary
     \B   Not a word boundary

Note that the \B token ensures that neither the left nor the right character are delimiters, or that both left and right characters are delimiters. The left word anchor checks whether the previous character is a delimiter and the next character is not. The right word anchor works in a similar way.

Quantifiers

Quantifiers specify how many times the previous regular expression atom may be matched in the search text. Some quantifiers can produce a large performance penalty, and can in some instances completely lock up NEdit. To prevent this, avoid nested quantifiers, especially those of the maximal matching type (see below.)

The following quantifiers are maximal matching, or "greedy", in that they match as much text as possible.

     *   Match zero or more
     +   Match one  or more
     ?   Match zero or one

The following quantifiers are minimal matching, or "lazy", in that they match as little text as possible.

     *?   Match zero or more
     +?   Match one  or more
     ??   Match zero or one

One final quantifier is the counting quantifier, or brace quantifier. It takes the following basic form:

     {min,max}  Match from `min' to `max' times the
                previous regular expression atom.

If `min' is omitted, it is assumed to be zero. If `max' is omitted, it is assumed to be infinity. Whether specified or assumed, `min' must be less than or equal to `max'. Note that both `min' and `max' are limited to 65535. If both are omitted, then the construct is the same as `*'. Note that `{,}' and `{}' are both valid brace constructs. A single number appearing without a comma, e.g. `{3}' is short for the `{min,min}' construct, or to match exactly `min' number of times.

The quantifiers `{1}' and `{1,1}' are accepted by the syntax, but are optimized away since they mean to match exactly once, which is redundant information. Also, for efficiency, certain combinations of `min' and `max' are converted to either `*', `+', or `?' as follows:

     {} {,} {0,}    *
     {1,}           +
     {,1} {0,1}     ?

Note that {0} and {0,0} are meaningless and will generate an error message at regular expression compile time.

Brace quantifiers can also be "lazy". For example {2,5}? would try to match 2 times if possible, and will only match 3, 4, or 5 times if that is what is necessary to achieve an overall match.
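
The difference between greedy and lazy matching is easiest to see side by side. The sketch below uses Python's re module as a stand-in, since it shares the quantifier forms shown here. It is not NEdit's engine, and other parts of the two syntaxes differ.

```python
import re

text = '"one" and "two"'

# Greedy: .* grabs as much as possible, so the match runs to the last quote.
greedy = re.search(r'".*"', text).group()

# Lazy: .*? grabs as little as possible, so the match stops at the next quote.
lazy = re.search(r'".*?"', text).group()

# Brace quantifiers follow the same rule.
brace_greedy = re.search(r"a{2,5}", "aaaaaaa").group()
brace_lazy = re.search(r"a{2,5}?", "aaaaaaa").group()

# A lazy quantifier still matches more when the overall match requires it.
lazy_forced = re.search(r"a{2,5}?b", "aaaab").group()

print(greedy)        # "one" and "two"
print(lazy)          # "one"
print(brace_greedy)  # aaaaa
print(brace_lazy)    # aa
print(lazy_forced)   # aaaab
```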

Alternation

A series of alternative patterns to match can be specified by separating them with vertical pipes, `|'. An example of alternation would be `a|be|sea'. This will match `a', or `be', or `sea'. Each alternative can be an arbitrarily complex regular expression. The alternatives are attempted in the order specified. An empty alternative can be specified if desired, e.g. `a|b|'. Since an empty alternative can match nothingness (the empty string), this guarantees that the expression will match.
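
The ordering rule and the effect of an empty alternative can be illustrated with Python's re module, used here only as a stand-in engine that handles alternation in the same left-to-right fashion.

```python
import re

# Each alternative is tried in order at every position.
found = re.findall(r"a|be|sea", "a bee by the sea")

# The first alternative that matches wins, even if a later one is longer.
first_wins = re.search(r"a|ab", "ab").group()

# An empty alternative always succeeds, matching the empty string.
empty_ok = re.search(r"a|b|", "xyz")

print(found)              # ['a', 'be', 'sea']
print(first_wins)         # a
print(empty_ok.span())    # (0, 0)
```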

Comments

Comments are of the form `(?#<comment text>)' and can be inserted anywhere and have no effect on the execution of the regular expression. They can be handy for documenting very complex regular expressions. Note that a comment begins with `(?#' and ends at the first occurrence of an ending parenthesis, or the end of the regular expression... period. Comments do not recognize any escape sequences.


Metacharacters

Escaping Metacharacters

In a regular expression (regex), most ordinary characters match themselves. For example, `ab%' would match anywhere `a' followed by `b' followed by `%' appeared in the text. Other characters don't match themselves, but are metacharacters. For example, backslash is a special metacharacter which 'escapes' or changes the meaning of the character following it. Thus, to match a literal backslash would require a regular expression to have two backslashes in sequence. NEdit provides the following escape sequences so that metacharacters that are used by the regex syntax can be specified as ordinary characters.

     \(  \)  \-  \[  \]  \<  \>  \{  \}
     \.  \|  \^  \$  \*  \+  \?  \&  \\

Special Control Characters

There are some special characters that are difficult or impossible to type. Many of these characters can be constructed as a sort of metacharacter or sequence by preceding a literal character with a backslash. NEdit recognizes the following special character sequences:

     \a  alert (bell)
     \b  backspace
     \e  ASCII escape character (***)
     \f  form feed (new page)
     \n  newline
     \r  carriage return
     \t  horizontal tab
     \v  vertical tab

     *** For environments that use the EBCDIC character set,
         when compiling NEdit set the EBCDIC_CHARSET compiler
         symbol to get the EBCDIC equivalent escape
         character.

Octal and Hex Escape Sequences

Any ASCII (or EBCDIC) character, except null, can be specified by using either an octal escape or a hexadecimal escape, each beginning with \0 or \x (or \X), respectively. For example, \052 and \X2A both specify the `*' character. Escapes for null (\00 or \x0) are not valid and will generate an error message. Also, any escape that exceeds \0377 or \xFF will either cause an error or have any additional character(s) interpreted literally. For example, \0777 will be interpreted as \077 (a `?' character) followed by `7' since \0777 is greater than \0377.

An invalid digit will also end an octal or hexadecimal escape. For example, \091 will cause an error since `9' is not within an octal escape's range of allowable digits (0-7) and truncation before the `9' yields \0 which is invalid.
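
The rules for octal escapes can be restated as a small Python sketch. It is an interpretation of the description above, not NEdit's parser. The function name is hypothetical, and it assumes no limit on the number of digits other than the \0377 ceiling.

```python
def parse_octal_escape(s):
    """Parse an octal escape given the text after the backslash, e.g. "052".

    Returns (character, remaining_text); raises ValueError for a null escape.
    """
    if not s.startswith("0"):
        raise ValueError("octal escapes begin with \\0")
    value, i = 0, 1
    while i < len(s) and s[i] in "01234567":
        candidate = value * 8 + int(s[i])
        if candidate > 0o377:
            break                 # further digits are treated as literal text
        value, i = candidate, i + 1
    if value == 0:
        raise ValueError("null escape is not valid")
    return chr(value), s[i:]
```

For instance, "052" yields the `*' character, "0777" yields `?' with a literal `7' left over, and "091" is rejected because truncation before the `9' leaves a null escape.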

Shortcut Escape Sequences

NEdit defines some escape sequences that are handy shortcuts for commonly used character classes.

   \d  digits            0-9
   \l  letters           a-z, A-Z, and locale dependent letters
   \s  whitespace        \t, \r, \v, \f, and space
   \w  word characters   letters, digits, and underscore, `_'

\D, \L, \S, and \W are the same as the lowercase versions except that the resulting character class is negated. For example, \d is equivalent to `[0-9]', while \D is equivalent to `[^0-9]'.

These escape sequences can also be used within a character class. For example, `[\l_]' is the same as `[a-zA-Z_]', extended with possible locale dependent letters. The escape sequences for special characters, and octal and hexadecimal escapes are also valid within a class.

Word Delimiter Tokens

Although not strictly a character class, the following escape sequences behave similarly to character classes:

     \y   Word delimiter character
     \Y   Not a word delimiter character

The `\y' token matches any single character that is one of the characters that NEdit recognizes as a word delimiter character, while the `\Y' token matches any character that is NOT a word delimiter character. Word delimiter characters are dynamic in nature, meaning that the user can change them through preference settings. For this reason, they must be handled differently by the regular expression engine. As a consequence of this, `\y' and `\Y' can not be used within a character class specification.


Parenthetical Constructs

Capturing Parentheses

Capturing Parentheses are of the form `(<regex>)' and can be used to group arbitrarily complex regular expressions. Parentheses can be nested, but the total number of parentheses, nested or otherwise, is limited to 50 pairs. The text that is matched by the regular expression between a matched set of parentheses is captured and available for text substitutions and backreferences (see below.) Capturing parentheses carry a fairly high overhead both in terms of memory used and execution speed, especially if quantified by `*' or `+'.

Non-Capturing Parentheses

Non-capturing parentheses are of the form `(?:<regex>)'. They provide grouping only and do not incur the overhead of normal capturing parentheses. They should not be counted when numbering the capturing parentheses used with backreferences and substitutions. Because the number of capturing parentheses allowed in a regex is limited, it is advisable to use non-capturing parentheses when possible.
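
The numbering rule can be seen in a short Python sketch (Python's `re' module is used as an assumption; it shares the `(?:...)' syntax). Only the capturing pairs receive numbers.

```python
import re

pattern = re.compile(r"(?:ab)+(c)(?:d)(e)")
match = pattern.match("ababcde")

# Only (c) and (e) are numbered; the two (?:...) groups are skipped.
captured = match.groups()
group_count = pattern.groups
```

Here group 1 is `c' and group 2 is `e', even though four pairs of parentheses appear in the pattern.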

Positive Look-Ahead

Positive look-ahead constructs are of the form `(?=<regex>)' and implement a zero width assertion of the enclosed regular expression. In other words, a match of the regular expression contained in the positive look-ahead construct is attempted. If it succeeds, control is passed to the next regular expression atom, but the text that was consumed by the positive look-ahead is first unmatched (backtracked) to the place in the text where the positive look-ahead was first encountered.

One application of positive look-ahead is a manual first character discrimination optimization. You can begin the expression with a positive look-ahead containing a character class that lists every character with which the following (potentially complex) regular expression could start. This quickly filters out match attempts that cannot possibly succeed.
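
Both points can be sketched in Python, whose `re' module shares the `(?=...)' syntax. The patterns and sample text are made up for illustration, and whether the filter actually saves time depends on the engine.

```python
import re

# The look-ahead checks for "bar" but gives the text back: only "foo" is matched.
m = re.search(r"foo(?=bar)", "foobar")
matched_text = m.group()
match_end = m.end()

# Hypothetical first-character filter: the class lists every character the
# identifier pattern can start with, so other positions are rejected up front.
plain = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\(")
guarded = re.compile(r"(?=[A-Za-z_])[A-Za-z_][A-Za-z0-9_]*\(")

text = "1 + f(x) * go(2)"
plain_hits = plain.findall(text)
guarded_hits = guarded.findall(text)
```

The guarded pattern finds exactly what the plain one finds; the look-ahead only changes how quickly hopeless positions are abandoned.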

Negative Look-Ahead

Negative look-ahead takes the form `(?!<regex>)' and is exactly the same as positive look-ahead except that the enclosed regular expression must NOT match. This can be particularly useful when you have an expression that is general, and you want to exclude some special cases. Simply precede the general expression with a negative look-ahead that covers the special cases that need to be filtered out.
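
A minimal sketch of the "general expression minus special cases" idea, written with Python's `re' module as an assumption. The general expression is "any word" and the excluded special case is the exact word `foo'.

```python
import re

# \w+ is the general expression; (?!foo\b) filters out the special case
# before the general expression is tried at each word start.
words_except_foo = re.findall(r"\b(?!foo\b)\w+", "foo food bar")
```

The word `food' survives because the look-ahead only rejects `foo' when it is followed by a word boundary.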

Positive Look-Behind

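Positive look-behind constructs are of the form `(?<=<regex>)' and implement a zero width assertion of the enclosed regular expression on the text immediately preceding the current matching position.

Some brace quantifiers are equivalent to the standard quantifiers:

     {} {,} {0,}    *
     {1,}           +
     {,1} {0,1}     ?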

Note that {0} and {0,0} are meaningless and will generate an error message at regular expression compile time.

Brace quantifiers can also be "lazy". For example, {2,5}? tries to match 2 times if possible, and matches 3, 4, or 5 times only if that is necessary to achieve an overall match.
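
The difference between greedy and lazy brace quantifiers can be observed with a short Python sketch (Python's `re' module is an assumption here; it accepts the same `{2,5}?' form).

```python
import re

# Greedy: takes as many repetitions as allowed.
greedy = re.match(r"a{2,5}", "aaaaa").group()

# Lazy: stops at the minimum when nothing else forces it further.
lazy = re.match(r"a{2,5}?", "aaaaa").group()

# Lazy, but the trailing b forces 4 repetitions to achieve an overall match.
lazy_forced = re.match(r"a{2,5}?b", "aaaab").group()
```

The lazy form settles for two repetitions on its own and extends only when the rest of the pattern requires it.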

Alternation

A series of alternative patterns to match can be specified by separating them with vertical pipes, `|'. An example of alternation would be `a|be|sea'. This will match `a', or `be', or `sea'. Each alternative can be an arbitrarily complex regular expression. The alternatives are attempted in the order specified. An empty alternative can be specified if desired, e.g. `a|b|'. Since an empty alternative matches the empty string, this guarantees that the expression will match.
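
The ordering rule and the empty alternative can both be demonstrated in Python, used here as an assumption since its `re' module also tries alternatives from left to right.

```python
import re

# Each alternative matches its own word.
whole_words = [w for w in ("a", "be", "sea", "bee")
               if re.fullmatch(r"a|be|sea", w)]

# Order matters: `a' is tried first and wins, even though `ab' would also fit.
first_wins = re.match(r"a|ab", "ab").group()

# The trailing empty alternative matches the empty string, so a match
# is always reported, here with zero length.
empty_match = re.match(r"a|b|", "xyz")
empty_text = empty_match.group()
```

Putting the longer alternative first (`ab|a') would change the second result, which is why the order of alternatives deserves attention.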

Comments

Comments are of the form `(?#<comment text>)'. They can be inserted anywhere and have no effect on the execution of the regular expression, which makes them handy for documenting very complex regular expressions. Note that a comment begins with `(?#' and ends at the first closing parenthesis or at the end of the regular expression, whichever comes first. Comments do not recognize any escape sequences.
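
A brief sketch of inline comments, using Python's `re' module as an assumption. It accepts the `(?#...)' form, although details such as how an unterminated comment is treated may differ from NEdit.

```python
import re

commented = re.compile(r"\d{4}(?#year)-\d{2}(?#month)")
uncommented = re.compile(r"\d{4}-\d{2}")

sample = "released 2024-05, patched 2024-11"
with_comments = commented.findall(sample)
without_comments = uncommented.findall(sample)
```

Both patterns return the same matches, showing that the comments document the expression without changing what it matches.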


Metacharacters

Escaping Metacharacters

In a regular expression (regex), most ordinary characters match themselves. For example, `ab%' would match anywhere `a' followed by `b' followed by `%' appeared in the text. Other characters don't match themselves, but are metacharacters. For example, backslash is a special metacharacter which 'escapes' or changes the meaning of the character following it. Thus, matching a literal backslash requires two backslashes in sequence in the regular expression. NEdit provides the following escape sequences so that metacharacters that are used by the regex syntax can be specified as ordinary characters.

     \(  \)  \-  \[  \]  \<  \>  \{  \}
     \.  \|  \^  \$  \*  \+  \?  \&  \\
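
The effect of escaping can be seen in a short Python sketch. Python's `re' module is used as an assumption, and only escapes that behave the same way there (`\.', `\*', and `\\') are shown.

```python
import re

text = "abc a.c a*c"

# Unescaped, `.' is a metacharacter that matches any character.
dot_as_meta = re.findall(r"a.c", text)

# Escaped, `\.' and `\*' match only the literal characters.
dot_literal = re.findall(r"a\.c", text)
star_literal = re.findall(r"a\*c", text)

# Two backslashes in the regex match one literal backslash in the text.
path = "C:\\temp\\x"          # the text C:\temp\x, with two backslashes
backslashes = re.findall(r"\\", path)
```

The Python source needs raw strings (or doubled backslashes) on top of the regex-level escaping; text typed into a search field does not need that extra layer.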

Special Control Characters

There are some special characters that are difficult or impossible to type. Many of these characters can be constructed as a sort of metacharacter or sequence by preceding a literal character with a backslash. NEdit recognizes the following special character sequences:

     \a  alert (bell)
     \b  backspace
     \e  ASCII escape character (***)
     \f  form feed (new page)
     \n  newline
     \r  carriage return
     \t  horizontal tab
     \v  vertical tab

     *** For environments that use the EBCDIC character set,
         when compiling NEdit set the EBCDIC_CHARSET compiler
         symbol to get the EBCDIC equivalent escape
         character.

Octal and Hex Escape Sequences

Any ASCII (or EBCDIC) character, except null, can be specified by using either an octal escape or a hexadecimal escape, each beginning with \0 or \x (or \X), respectively. For example, \052 and \X2A both specify the `*' character. Escapes for null (\00 or \x0) are not valid and will generate an error message. Also, any escape that exceeds \0377 or \xFF will either cause an error or have any additional character(s) interpreted literally. For example, \0777 will be interpreted as \077 (a `?' character) followed by `7' since \0777 is greater than \0377.

An invalid digit will also end an octal or hexadecimal escape. For example, \091 will cause an error since `9' is not within an octal escape's range of allowable digits (0-7) and truncation before the `9' yields \0 which is invalid.
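
The arithmetic behind the examples above can be checked with a few lines of Python. This only converts the digit strings to character codes and assumes ASCII; it does not exercise NEdit's parser.

```python
# \052 (octal) and \x2A (hexadecimal) name the same character code.
star_from_octal = chr(int("052", 8))
star_from_hex = chr(int("2A", 16))

# \0377 and \xFF are the same upper limit.
octal_limit = int("377", 8)
hex_limit = int("FF", 16)

# \0777 exceeds the limit, so only \077 is taken as the escape.
too_large = int("777", 8)
truncated_char = chr(int("077", 8))
```

Octal 52 and hexadecimal 2A are both 42, the code for `*'. Octal 777 is 511, which is above the limit of 255, while octal 77 is 63, the code for `?'.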

Shortcut Escape Sequences

NEdit defines some escape sequences that are handy shortcuts for commonly used character classes.

   \d  digits            0-9
   \l  letters           a-z, A-Z, and locale dependent letters
   \s  whitespace        \t, \r, \v, \f, and space
   \w  word characters   letters, digits, and underscore, `_'

\D, \L, \S, and \W are the same as the lowercase versions except that the resulting character class is negated. For example, \d is equivalent to `[0-9]', while \D is equivalent to `[^0-9]'.

These escape sequences can also be used within a character class. For example, `[\l_]' is the same as `[a-zA-Z_]', extended with possible locale dependent letters. The escape sequences for special characters, and octal and hexadecimal escapes are also valid within a class.

Word Delimiter Tokens

Although not strictly a character class, the following escape sequences behave similarly to character classes:

     \y   Word delimiter character
     \Y   Not a word delimiter character

The `\y' token matches any single character that is one of the characters that NEdit recognizes as a word delimiter character, while the `\Y' token matches any character that is NOT a word delimiter character. Word delimiter characters are dynamic in nature, meaning that the user can change them through preference settings. For this reason, they must be handled differently by the regular expression engine. As a consequence of this, `\y' and `\Y' can not be used within a character class specification.


Parenthetical Constructs

Capturing Parentheses

Capturing Parentheses are of the form `(<regex>)' and can be used to group arbitrarily complex regular expressions. Parentheses can be nested, but the total number of parentheses, nested or otherwise, is limited to 50 pairs. The text that is matched by the regular expression between a matched set of parentheses is captured and available for text substitutions and backreferences (see below.) Capturing parentheses carry a fairly high overhead both in terms of memory used and execution speed, especially if quantified by `*' or `+'.

Non-Capturing Parentheses

Non-Capturing Parentheses are of the form `(?:<regex>)' and facilitate grouping only and do not incur the overhead of normal capturing parentheses. They should not be counted when determining numbers for capturing parentheses which are used with backreferences and substitutions. Because of the limit on the number of capturing parentheses allowed in a regex, it is advisable to use non-capturing parentheses when possible.

Positive Look-Ahead

Positive look-ahead constructs are of the form `(?=<regex>)' and implement a zero width assertion of the enclosed regular expression. In other words, a match of the regular expression contained in the positive look-ahead construct is attempted. If it succeeds, control is passed to the next regular expression atom, but the text that was consumed by the positive look-ahead is first unmatched (backtracked) to the place in the text where the positive look-ahead was first encountered.

One application of positive look-ahead is the manual implementation of a first character discrimination optimization. You can include a positive look-ahead that contains a character class which lists every character that the following (potentially complex) regular expression could possibly start with. This will quickly filter out match attempts that can not possibly succeed.

Negative Look-Ahead

Negative look-ahead takes the form `(?!<regex>)' and is exactly the same as positive look-ahead except that the enclosed regular expression must NOT match. This can be particularly useful when you have an expression that is general, and you want to exclude some special cases. Simply precede the general expression with a negative look-ahead that covers the special cases that need to be filtered out.

Positive Look-Behind

Positive look-behind constructs are of the form `(?<=<regex>)' and implement a

     {} {,} {0,}    *
     {1,}           +
     {,1} {0,1}     ?

Note that {0} and {0,0} are meaningless and will generate an error message at regular expression compile time.

Brace quantifiers can also be "lazy". For example, {2,5}? tries to match 2 times if possible, and matches 3, 4, or 5 times only if that is necessary to achieve an overall match.
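
The difference between the greedy and lazy forms can be seen in a short sketch. Python's `re' module is assumed here as a stand-in for NEdit's engine; it treats `{2,5}' and `{2,5}?' the same way for this case.

```python
import re

text = "aaaaa"
greedy = re.match(r"a{2,5}", text).group(0)    # takes as many as allowed
lazy = re.match(r"a{2,5}?", text).group(0)     # stops at the minimum of 2

# The lazy form grows beyond 2 only because a "b" must follow.
forced = re.match(r"a{2,5}?b", "aaaab").group(0)
print(greedy, lazy, forced)
```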

Alternation

A series of alternative patterns to match can be specified by separating them with vertical pipes, `|'. An example of alternation would be `a|be|sea'. This will match `a', or `be', or `sea'. Each alternative can be an arbitrarily complex regular expression. The alternatives are attempted in the order specified. An empty alternative can be specified if desired, e.g. `a|b|'. Since an empty alternative can match nothingness (the empty string), this guarantees that the expression will match.
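
The ordering rule and the empty alternative can be sketched like this, using Python's `re' module as a stand-in (it also tries alternatives left to right). The patterns are chosen only to make the ordering visible.

```python
import re

# Alternatives are tried in the order written, so the order can change
# what is matched.
first_wins = re.match(r"a|ab", "ab").group(0)   # "a" is tried first
reordered = re.match(r"ab|a", "ab").group(0)    # "ab" is tried first

# A trailing empty alternative matches the empty string, so the
# expression as a whole cannot fail.
empty_alt = re.match(r"a|b|", "xyz")
print(first_wins, reordered, repr(empty_alt.group(0)))
```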

Comments

Comments are of the form `(?#<comment text>)'. They can be inserted anywhere and have no effect on the execution of the regular expression. They can be handy for documenting very complex regular expressions. Note that a comment begins with `(?#' and always ends at the first closing parenthesis or at the end of the regular expression, whichever comes first. Comments do not recognize any escape sequences.
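
As a small illustration, the sketch below compares a commented pattern with its uncommented form. It relies on Python's `re' module, which accepts the same `(?#...)' syntax; the date-like sample text is made up.

```python
import re

commented = r"\d{4}(?# year)-\d{2}(?# month)"
plain = r"\d{4}-\d{2}"
text = "released on 2024-05-17"

# The comments are ignored, so both patterns match the same text.
with_comments = re.search(commented, text).group(0)
without_comments = re.search(plain, text).group(0)
print(with_comments, without_comments)
```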


Metacharacters

Escaping Metacharacters

In a regular expression (regex), most ordinary characters match themselves. For example, `ab%' would match anywhere `a' followed by `b' followed by `%' appeared in the text. Other characters don't match themselves, but are metacharacters. For example, backslash is a special metacharacter which 'escapes' or changes the meaning of the character following it. Thus, to match a literal backslash would require a regular expression to have two backslashes in sequence. NEdit provides the following escape sequences so that metacharacters that are used by the regex syntax can be specified as ordinary characters.

     \(  \)  \-  \[  \]  \<  \>  \{  \}
     \.  \|  \^  \$  \*  \+  \?  \&  \\
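
The effect of escaping can be sketched with Python's `re' module as a stand-in. Only the backslash and period escapes are exercised here; not every sequence in the table above is necessarily accepted by Python, and the sample strings are invented.

```python
import re

# Two backslashes in the regex match one literal backslash in the text.
path = "C:\\temp\\notes.txt"          # the text holds single backslashes
backslashes = re.findall(r"\\", path)

# An unescaped "." matches any character; "\." matches only a period.
loose = re.findall(r"3.14", "3.14 3x14")
strict = re.findall(r"3\.14", "3.14 3x14")
print(len(backslashes), loose, strict)
```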

Special Control Characters

There are some special characters that are difficult or impossible to type. Many of these characters can be constructed as a sort of metacharacter or sequence by preceding a literal character with a backslash. NEdit recognizes the following special character sequences:

     \a  alert (bell)
     \b  backspace
     \e  ASCII escape character (***)
     \f  form feed (new page)
     \n  newline
     \r  carriage return
     \t  horizontal tab
     \v  vertical tab

     *** For environments that use the EBCDIC character set,
         set the EBCDIC_CHARSET compiler symbol when compiling
         NEdit to get the EBCDIC equivalent escape character.

Octal and Hex Escape Sequences

Any ASCII (or EBCDIC) character, except null, can be specified by using either an octal escape or a hexadecimal escape, each beginning with \0 or \x (or \X), respectively. For example, \052 and \X2A both specify the `*' character. Escapes for null (\00 or \x0) are not valid and will generate an error message. Also, any escape that exceeds \0377 or \xFF will either cause an error or have any additional character(s) interpreted literally. For example, \0777 will be interpreted as \077 (a `?' character) followed by `7' since \0777 is greater than \0377.

An invalid digit will also end an octal or hexadecimal escape. For example, \091 will cause an error since `9' is not within an octal escape's range of allowable digits (0-7) and truncation before the `9' yields \0 which is invalid.
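
The truncation rules described above can be modelled in a few lines of Python. This is only a sketch of the behaviour as stated in the text, not NEdit's actual parser: the function name `decode_numeric_escape' is hypothetical, and where the text allows "an error or literal interpretation" for oversized values, the sketch assumes the literal interpretation.

```python
def decode_numeric_escape(escape):
    # Model of the rules in the text (assumed, not NEdit's code).
    # Returns (character, leftover text that is taken literally).
    if escape[:2] == "\\0":
        digits, base = "01234567", 8
    elif escape[:2] in ("\\x", "\\X"):
        digits, base = "0123456789abcdefABCDEF", 16
    else:
        raise ValueError("not an octal or hexadecimal escape")
    value, pos = 0, 2
    # Consume digits while they are valid and the value stays <= 0xFF.
    while pos < len(escape) and escape[pos] in digits:
        candidate = value * base + int(escape[pos], base)
        if candidate > 0xFF:
            break
        value, pos = candidate, pos + 1
    if value == 0:
        raise ValueError("an escape for null is not valid")
    return chr(value), escape[pos:]

print(decode_numeric_escape(r"\052"))    # the '*' character
print(decode_numeric_escape(r"\0777"))   # '?' with a literal '7' left over
```

Under this model, `\091' raises an error because truncating before the `9' leaves a null escape, which matches the example in the text.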

Shortcut Escape Sequences

NEdit defines some escape sequences that are handy shortcuts for commonly used character classes.

   \d  digits            0-9
   \l  letters           a-z, A-Z, and locale dependent letters
   \s  whitespace        \t, \r, \v, \f, and space
   \w  word characters   letters, digits, and underscore, `_'

\D, \L, \S, and \W are the same as the lowercase versions except that the resulting character class is negated. For example, \d is equivalent to `[0-9]', while \D is equivalent to `[^0-9]'.

These escape sequences can also be used within a character class. For example, `[\l_]' is the same as `[a-zA-Z_]', extended with possible locale dependent letters. The escape sequences for special characters, and octal and hexadecimal escapes are also valid within a class.
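
A short sketch of these equivalences, assuming Python's `re' module as a stand-in. Two differences are worth flagging: Python has no `\l', so the letter class is spelled out by hand, and the `re.ASCII' flag is used so that `\d' means exactly `[0-9]'.

```python
import re

text = "x1 = 42\tdone"

digits = re.findall(r"\d", text, re.ASCII)
explicit_digits = re.findall(r"[0-9]", text)
non_digits = re.findall(r"\D", "a1b2", re.ASCII)   # negated, like [^0-9]

# Python has no \l, so the class [\l_] from the text is written out
# (ASCII letters only; locale dependent letters are not covered).
identifier_starts = re.findall(r"[a-zA-Z_]", "_a9Z")
print(digits, non_digits, identifier_starts)
```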

Word Delimiter Tokens

Although not strictly a character class, the following escape sequences behave similarly to character classes:

     \y   Word delimiter character
     \Y   Not a word delimiter character

The `\y' token matches any single character that NEdit recognizes as a word delimiter character, while the `\Y' token matches any character that is NOT a word delimiter character. Word delimiter characters are dynamic in nature, meaning that the user can change them through preference settings. For this reason, the regular expression engine must handle them differently from ordinary character classes. As a consequence, `\y' and `\Y' cannot be used within a character class specification.
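
Python has no `\y' or `\Y', but their dynamic nature can be imitated by building a character class from a configurable delimiter string at run time. The sketch below is only an analogy: the helper `delimiter_patterns' and both delimiter sets are hypothetical, and NEdit takes its real set from the preference settings.

```python
import re

def delimiter_patterns(delimiters):
    """Build stand-ins for the two tokens from a delimiter string."""
    escaped = "".join(re.escape(ch) for ch in delimiters)
    is_delim = re.compile("[" + escaped + "]")
    not_delim = re.compile("[^" + escaped + "]")
    return is_delim, not_delim

# Hypothetical delimiter set (not NEdit's default).
is_delim, not_delim = delimiter_patterns(" .,;:()-")
delims_found = is_delim.findall("a.b-c d")
others_found = not_delim.findall("a.b")

# Changing the set changes what the same kind of token matches.
underscore_only, _ = delimiter_patterns("_")
changed = underscore_only.findall("a_b.c")
print(delims_found, others_found, changed)
```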


Parenthetical Constructs

Capturing Parentheses

Capturing parentheses are of the form `(<regex>)' and can be used to group arbitrarily complex regular expressions. Parentheses can be nested, but the total number of parentheses, nested or otherwise, is limited to 50 pairs. The text matched by the regular expression between a matched set of parentheses is captured and is available for text substitutions and backreferences (see below). Capturing parentheses carry a fairly high overhead in both memory use and execution speed, especially if quantified by `*' or `+'.
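
Both uses of captured text can be sketched with Python's `re' module as a stand-in. The `\1' and `\2' references below follow Python's conventions for backreferences and replacement strings, which may differ in detail from NEdit's substitution syntax; the sample strings are invented.

```python
import re

# Group 1 captures a word; the backreference \1 requires the same text
# to appear again.
doubled = re.search(r"\b(\w+) \1\b", "it was was a typo")
repeated_word = doubled.group(1)

# In a substitution, captured text can be reused in the replacement.
swapped = re.sub(r"(\w+)=(\w+)", r"\2=\1", "key=value")
print(repeated_word, swapped)
```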

Metacharacters

Escaping Metacharacters

In a regular expression (regex), most ordinary characters match themselves. For example, `ab%' would match anywhere `a' followed by `b' followed by `%' appeared in the text. Other characters don't match themselves, but are metacharacters. For example, backslash is a special metacharacter which 'escapes' or changes the meaning of the character following it. Thus, to match a literal backslash would require a regular expression to have two backslashes in sequence. NEdit provides the following escape sequences so that metacharacters that are used by the regex syntax can be specified as ordinary characters.

     \(  \)  \-  \[  \]  \<  \>  \{  \}
     \.  \|  \^  \$  \*  \+  \?  \&  \\

Special Control Characters

There are some special characters that are difficult or impossible to type. Many of these characters can be constructed as a sort of metacharacter or sequence by preceding a literal character with a backslash. NEdit recognizes the following special character sequences:

     \a  alert (bell)
     \b  backspace
     \e  ASCII escape character (***)
     \f  form feed (new page)
     \n  newline
     \r  carriage return
     \t  horizontal tab
     \v  vertical tab

     *** For environments that use the EBCDIC character set,
         set the EBCDIC_CHARSET compiler symbol when compiling
         NEdit to get the EBCDIC equivalent escape character.

Octal and Hex Escape Sequences

Any ASCII (or EBCDIC) character, except null, can be specified by using either an octal escape or a hexadecimal escape, each beginning with \0 or \x (or \X), respectively. For example, \052 and \X2A both specify the `*' character. Escapes for null (\00 or \x0) are not valid and will generate an error message. Also, any escape that exceeds \0377 or \xFF will either cause an error or have any additional character(s) interpreted literally. For example, \0777 will be interpreted as \077 (a `?' character) followed by `7' since \0777 is greater than \0377.

An invalid digit will also end an octal or hexadecimal escape. For example, \091 will cause an error since `9' is not within an octal escape's range of allowable digits (0-7) and truncation before the `9' yields \0 which is invalid.
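
The octal rules above can be modelled with a small function. This is only a sketch of the behavior as described, not NEdit's source code: the function name is hypothetical, and where the text allows either an error or literal interpretation of extra digits, the sketch chooses literal interpretation.

```python
def decode_octal_escape(body):
    """Model of the octal escape rule described above (hypothetical helper).

    `body` is the text following the backslash and must start with '0'.
    Returns (decoded character, leftover text treated literally).
    """
    if not body.startswith("0"):
        raise ValueError("octal escapes start with \\0")
    value = 0
    pos = 1
    while pos < len(body) and body[pos] in "01234567":
        candidate = value * 8 + int(body[pos])
        if candidate > 0o377:       # would exceed \0377: stop, the rest is literal
            break
        value = candidate
        pos += 1
    if value == 0:                  # escapes for null are not valid
        raise ValueError("escape for null is not valid")
    return chr(value), body[pos:]


print(decode_octal_escape("052"))    # the '*' example
print(decode_octal_escape("0777"))   # truncated after \077
```

Both notations in the `*' example denote the same value: 052 in octal and 2A in hexadecimal are each 42 in decimal.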

Shortcut Escape Sequences

NEdit defines some escape sequences that are handy shortcuts for commonly used character classes.

   \d  digits            0-9
   \l  letters           a-z, A-Z, and locale dependent letters
   \s  whitespace        \t, \r, \v, \f, and space
   \w  word characters   letters, digits, and underscore, `_'

\D, \L, \S, and \W are the same as the lowercase versions except that the resulting character class is negated. For example, \d is equivalent to `[0-9]', while \D is equivalent to `[^0-9]'.

These escape sequences can also be used within a character class. For example, `[\l_]' is the same as `[a-zA-Z_]', extended with possible locale dependent letters. The escape sequences for special characters, and octal and hexadecimal escapes are also valid within a class.
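
The equivalences in this section can be spelled out as explicit character classes. The sketch below uses Python's `re' module as a stand-in (an assumption) and writes the classes out by hand, because Python's own `\s' also matches newline and Python has no `\l'. Locale dependent letters are left out.

```python
import re

# Explicit classes written out from the table above (ASCII letters only).
DIGIT = "[0-9]"                # \d
NOT_DIGIT = "[^0-9]"           # \D
SPACE = "[ \t\r\v\f]"          # \s as listed above: no newline
WORD = "[a-zA-Z0-9_]"          # \w

digits = re.findall(DIGIT + "+", "rev 42, build 7")
non_digits = re.findall(NOT_DIGIT + "+", "a1b22c")
identifier = re.fullmatch("[a-zA-Z_]" + WORD + "*", "_tmp9") is not None
newline_is_space = re.fullmatch(SPACE, "\n") is not None

print(digits, non_digits, identifier, newline_is_space)
```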

Word Delimiter Tokens

Although not strictly a character class, the following escape sequences behave similarly to character classes:

     \y   Word delimiter character
     \Y   Not a word delimiter character

The `\y' token matches any single character that NEdit recognizes as a word delimiter, while the `\Y' token matches any single character that is NOT a word delimiter. Word delimiter characters are dynamic: the user can change them through preference settings. For this reason, the regular expression engine must handle them differently from ordinary character classes, and as a consequence `\y' and `\Y' cannot be used within a character class specification.
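
To illustrate what "dynamic" means here, the sketch below builds delimiter and non-delimiter matchers from a delimiter string supplied at run time. It is written in Python as an approximation only: the helper name and the delimiter strings are hypothetical, and NEdit's engine does not work by building character classes this way.

```python
import re

def delimiter_classes(delimiters):
    """Return (delimiter, non-delimiter) patterns for a delimiter string."""
    body = "".join(re.escape(ch) for ch in delimiters)
    return re.compile("[" + body + "]"), re.compile("[^" + body + "]")

# Hypothetical delimiter set; in NEdit the real set comes from preferences.
is_delim, not_delim = delimiter_classes(".,;()- \t")

# Changing the set changes what counts as a delimiter.
is_delim2, not_delim2 = delimiter_classes("_")

print(bool(is_delim.fullmatch("-")), bool(is_delim2.fullmatch("-")))
```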


Parenthetical Constructs

Capturing Parentheses

Capturing parentheses are of the form `(<regex>)' and can be used to group arbitrarily complex regular expressions. Parentheses can be nested, but the total number of capturing parentheses, nested or otherwise, is limited to 50 pairs. The text matched by the regular expression between a matched set of parentheses is captured and is available for text substitutions and backreferences (see below). Capturing parentheses carry a fairly high overhead in both memory use and execution speed, especially if quantified by `*' or `+'.
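
A brief sketch of captured text being reused in a substitution and in a backreference. Python's `re' module is used as a stand-in (an assumption), so the `\1' and `\2' notation shown is Python's, and the sample strings are made up.

```python
import re

pattern = re.compile(r"(\w+)=(\w+)")
text = "set color=blue"

captured = pattern.search(text).groups()       # text matched by each pair
swapped = pattern.sub(r"\2=\1", text)          # captures reused in a substitution

# Backreference: \1 must match the same text the first pair captured.
doubled = re.search(r"(\w+) \1", "it is is here").group(1)

print(captured, swapped, doubled)
```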

Non-Capturing Parentheses

Non-capturing parentheses are of the form `(?:<regex>)'. They provide grouping only and do not incur the overhead of normal capturing parentheses. They are not counted when numbering capturing parentheses for use with backreferences and substitutions. Because the number of capturing parentheses allowed in a regex is limited, it is advisable to use non-capturing parentheses when possible.
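
The effect on numbering can be sketched as follows, with Python's `re' module standing in for NEdit's engine (an assumption):

```python
import re

capturing = re.match(r"(ab)+(c)", "ababc")
grouping_only = re.match(r"(?:ab)+(c)", "ababc")

# With (ab) capturing, 'c' is the second captured text.
# With (?:ab), the group is not counted and 'c' becomes the first.
print(capturing.groups(), grouping_only.groups())
```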

Positive Look-Ahead

Positive look-ahead constructs are of the form `(?=<regex>)' and implement a zero width assertion of the enclosed regular expression. In other words, a match of the regular expression contained in the positive look-ahead construct is attempted. If it succeeds, control is passed to the next regular expression atom, but the text that was consumed by the positive look-ahead is first unmatched (backtracked) to the place in the text where the positive look-ahead was first encountered.

One application of positive look-ahead is the manual implementation of a first character discrimination optimization. You can include a positive look-ahead that contains a character class which lists every character that the following (potentially complex) regular expression could possibly start with. This will quickly filter out match attempts that can not possibly succeed.
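
Both points can be sketched with Python's `re' module as a stand-in for NEdit's engine (an assumption). The sample text and keyword list are hypothetical, and the second pattern only shows the shape of the first character test; no speed-up is claimed for Python.

```python
import re

text = "x = compute(y)"

# The look-ahead requires '(' after the name but does not consume it.
m = re.search(r"\w+(?=\()", text)
name, end = m.group(), m.end()

# First character discrimination: a cheap class test placed in front of
# an alternation whose members can only start with 'f' or 'w'.
keyword = re.compile(r"(?=[fw])(?:foreach|for|while)")
hits = keyword.findall("for x; while y; if z; foreach w")

print(name, text[end], hits)
```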

Negative Look-Ahead

Negative look-ahead takes the form `(?!<regex>)' and is exactly the same as positive look-ahead except that the enclosed regular expression must NOT match. This can be particularly useful when you have an expression that is general, and you want to exclude some special cases. Simply precede the general expression with a negative look-ahead that covers the special cases that need to be filtered out.
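
A minimal sketch of this filtering idea, using Python's `re' module as a stand-in (an assumption). Note that `\b' in the pattern is Python's word boundary, which differs from NEdit, where `\b' denotes backspace. The word list is made up.

```python
import re

# General expression: any lowercase word.
# Special cases filtered out by the negative look-ahead: 'int' and 'char'.
# (\b is Python's word boundary here, not NEdit's backspace escape.)
pattern = re.compile(r"\b(?!int\b|char\b)[a-z]+\b")

names = pattern.findall("int foo char bar integer")
print(names)
```

Only the exact words listed in the look-ahead are excluded; `integer' still matches because `int' is not followed by a word boundary there.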

Positive Look-Behind

Positive look-behind constructs are of the form `(?<=<regex>)' and implement a

     {} {,} {0,}    *
     {1,}           +
     {,1} {0,1}     ?

Note that {0} and {0,0} are meaningless and will generate an error message at regular expression compile time.

Brace quantifiers can also be "lazy". For example {2,5}? would try to match 2 times if possible, and will only match 3, 4, or 5 times if that is what is necessary to achieve an overall match.

Alternation

A series of alternative patterns to match can be specified by separating them with vertical pipes, `|'. An example of alternation would be `a|be|sea'. This will match `a', or `be', or `sea'. Each alternative can be an arbitrarily complex regular expression. The alternatives are attempted in the order specified. An empty alternative can be specified if desired, e.g. `a|b|'. Since an empty alternative can match nothingness (the empty string), this guarantees that the expression will match.

Comments

Comments are of the form `(?#<comment text>)' and can be inserted anywhere and have no effect on the execution of the regular expression. They can be handy for documenting very complex regular expressions. Note that a comment begins with `(?#' and ends at the first occurrence of an ending parenthesis, or the end of the regular expression... period. Comments do not recognize any escape sequences.


Metacharacters

Escaping Metacharacters

In a regular expression (regex), most ordinary characters match themselves. For example, `ab%' would match anywhere `a' followed by `b' followed by `%' appeared in the text. Other characters don't match themselves, but are metacharacters. For example, backslash is a special metacharacter which 'escapes' or changes the meaning of the character following it. Thus, to match a literal backslash would require a regular expression to have two backslashes in sequence. NEdit provides the following escape sequences so that metacharacters that are used by the regex syntax can be specified as ordinary characters.

     \(  \)  \-  \[  \]  \<  \>  \{  \}
     \.  \|  \^  \$  \*  \+  \?  \&  \\

Special Control Characters

There are some special characters that are difficult or impossible to type. Many of these characters can be constructed as a sort of metacharacter or sequence by preceding a literal character with a backslash. NEdit recognizes the following special character sequences:

     \a  alert (bell)
     \b  backspace
     \e  ASCII escape character (***)
     \f  form feed (new page)
     \n  newline
     \r  carriage return
     \t  horizontal tab
     \v  vertical tab

     *** For environments that use the EBCDIC character set,
         when compiling NEdit set the EBCDIC_CHARSET compiler
         symbol to get the EBCDIC equivalent escape
         character.)

Octal and Hex Escape Sequences

Any ASCII (or EBCDIC) character, except null, can be specified by using either an octal escape or a hexadecimal escape, each beginning with \0 or \x (or \X), respectively. For example, \052 and \X2A both specify the `*' character. Escapes for null (\00 or \x0) are not valid and will generate an error message. Also, any escape that exceeds \0377 or \xFF will either cause an error or have any additional character(s) interpreted literally. For example, \0777 will be interpreted as \077 (a `?' character) followed by `7' since \0777 is greater than \0377.

An invalid digit will also end an octal or hexadecimal escape. For example, \091 will cause an error since `9' is not within an octal escape's range of allowable digits (0-7) and truncation before the `9' yields \0 which is invalid.

Shortcut Escape Sequences

NEdit defines some escape sequences that are handy shortcuts for commonly used character classes.

   \d  digits            0-9
   \l  letters           a-z, A-Z, and locale dependent letters
   \s  whitespace        \t, \r, \v, \f, and space
   \w  word characters   letters, digits, and underscore, `_'

\D, \L, \S, and \W are the same as the lowercase versions except that the resulting character class is negated. For example, \d is equivalent to `[0-9]', while \D is equivalent to `[^0-9]'.

These escape sequences can also be used within a character class. For example, `[\l_]' is the same as `[a-zA-Z_]', extended with possible locale dependent letters. The escape sequences for special characters, and octal and hexadecimal escapes are also valid within a class.

Word Delimiter Tokens

Although not strictly a character class, the following escape sequences behave similarly to character classes:

     \y   Word delimiter character
     \Y   Not a word delimiter character

The `\y' token matches any single character that NEdit recognizes as a word delimiter, while the `\Y' token matches any character that is NOT a word delimiter. Word delimiters are dynamic: the user can change them through preference settings, so the regular expression engine must handle them differently from fixed character classes. As a consequence, `\y' and `\Y' cannot be used within a character class specification.
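
To see why a delimiter set that is only known at run time needs separate handling, consider the following sketch. It is not how NEdit implements `\y' and `\Y'; it only emulates them in Python by building a character class from a delimiter string each time. The function name delimiter_patterns and the delimiter string DELIMS are hypothetical, chosen for illustration; in NEdit the real set comes from the user's preferences.

```python
import re


def delimiter_patterns(delimiters):
    """Build (delimiter, non-delimiter) single-character patterns
    from a non-empty string of delimiter characters."""
    body = "".join(re.escape(ch) for ch in delimiters)
    return "[" + body + "]", "[^" + body + "]"


# Hypothetical delimiter set, standing in for a user preference.
DELIMS = " \t.,;()"
is_delim, not_delim = delimiter_patterns(DELIMS)

# Runs of non-delimiter characters, i.e. the "words" in the text.
words = re.findall(not_delim + "+", "foo(bar, baz)")

if __name__ == "__main__":
    print(words)
```

Changing DELIMS changes what counts as a word without touching the rest of the pattern, which is the behavior the dynamic tokens provide.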


Parenthetical Constructs

Capturing Parentheses

Capturing Parentheses are of the form `(<regex>)' and can be used to group arbitrarily complex regular expressions. Parentheses can be nested, but the total number of parentheses, nested or otherwise, is limited to 50 pairs. The text that is matched by the regular expression between a matched set of parentheses is captured and available for text substitutions and backreferences (see below). Capturing parentheses carry a fairly high overhead both in terms of memory used and execution speed, especially if quantified by `*' or `+'.
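
The two uses of captured text, substitution and backreference, can be sketched as follows. The example runs on Python's `re' module, which is an assumption of this illustration; the replacement string uses Python's syntax, and NEdit's own substitution syntax is described elsewhere in this help. The names swap_name and DOUBLED_WORD are hypothetical.

```python
import re

# Two capturing groups: group 1 is the surname, group 2 the given name.
NAME = r"(\w+), (\w+)"


def swap_name(text):
    """Rewrite 'Surname, Given' as 'Given Surname' using the captured text."""
    return re.sub(NAME, r"\2 \1", text)


# A backreference inside the pattern: \1 must repeat what group 1 captured.
DOUBLED_WORD = r"(\w+) \1"
doubled = re.search(DOUBLED_WORD, "it is is fine")

if __name__ == "__main__":
    print(swap_name("Doe, Jane"), "|", doubled.group(0))
```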

Non-Capturing Parentheses

Non-Capturing Parentheses are of the form `(?:<regex>)'. They facilitate grouping only and do not incur the overhead of normal capturing parentheses. They should not be counted when determining numbers for capturing parentheses which are used with backreferences and substitutions. Because of the limit on the number of capturing parentheses allowed in a regex, it is advisable to use non-capturing parentheses when possible.
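
A small sketch of the numbering rule, again using Python's `re' module as a stand-in (an assumption of this example): the non-capturing group is skipped when groups are counted, so the only capturing pair is group 1.

```python
import re

# The first set of parentheses only groups; the second set captures.
pattern = re.compile(r"(?:ab)+(cd)")
match = pattern.match("ababcd")

if __name__ == "__main__":
    print(pattern.groups, match.group(1))
```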

Positive Look-Ahead

Positive look-ahead constructs are of the form `(?=<regex>)' and implement a zero width assertion of the enclosed regular expression. In other words, a match of the regular expression contained in the positive look-ahead construct is attempted. If it succeeds, control is passed to the next regular expression atom, but the text that was consumed by the positive look-ahead is first unmatched (backtracked) to the place in the text where the positive look-ahead was first encountered.

One application of positive look-ahead is the manual implementation of a first character discrimination optimization. You can include a positive look-ahead that contains a character class which lists every character that the following (potentially complex) regular expression could possibly start with. This will quickly filter out match attempts that can not possibly succeed.
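
Both points can be illustrated with a sketch in Python's `re' module, used here as an assumed stand-in for NEdit's engine. The first pattern shows that the look-ahead tests text without consuming it; the second shows the first character discrimination idea, with a class listing the possible first characters of the alternatives that follow. The pattern names are hypothetical.

```python
import re

# Match a name only when '(' follows; the '(' is tested but not consumed.
CALL_NAME = r"\w+(?=\()"
names = re.findall(CALL_NAME, "foo(bar) baz(qux)")
first = re.search(CALL_NAME, "foo(bar)")

# First character discrimination: [bcr] lists every character that
# break, continue or return can start with.
KEYWORD = r"(?=[bcr])(?:break|continue|return)"
keywords = re.findall(KEYWORD, "return or break, never goto")

if __name__ == "__main__":
    print(names, first.end(), keywords)
```

Whether the look-ahead actually speeds up matching depends on the engine and the pattern, so it is worth measuring before relying on it.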

Negative Look-Ahead

Negative look-ahead takes the form `(?!<regex>)' and is exactly the same as positive look-ahead except that the enclosed regular expression must NOT match. This can be particularly useful when you have an expression that is general, and you want to exclude some special cases. Simply precede the general expression with a negative look-ahead that covers the special cases that need to be filtered out.
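
The "general expression with excluded special cases" pattern might look like the sketch below, written for Python's `re' module as an assumed stand-in for NEdit's engine. The general expression is any run of word characters; the special case filtered out is a name beginning with `tmp_'. The names NOT_TEMP and non_temp_names are hypothetical.

```python
import re

# General case: a run of word characters.
# Special case to exclude: anything that starts with "tmp_".
NOT_TEMP = r"(?!tmp_)\w+"


def non_temp_names(names):
    """Keep only the names that match the general pattern in full."""
    return [n for n in names if re.fullmatch(NOT_TEMP, n)]


if __name__ == "__main__":
    print(non_temp_names(["tmp_a", "count", "tmp_b", "total"]))
```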

Positive Look-Behind

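Positive look-behind constructs are of the form `(?<=<regex>)' and implement a zero width assertion of the enclosed regular expression on the text preceding the current matching position.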

     {} {,} {0,}    *
     {1,}           +
     {,1} {0,1}     ?

Note that {0} and {0,0} are meaningless and will generate an error message at regular expression compile time.

Brace quantifiers can also be "lazy". For example {2,5}? would try to match 2 times if possible, and will only match 3, 4, or 5 times if that is what is necessary to achieve an overall match.
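
The difference between the greedy and lazy forms can be seen in a short sketch. It uses Python's `re' module, assumed here to behave like NEdit for this construct: the lazy form stops at the minimum count unless a longer repetition is needed for the rest of the pattern to match.

```python
import re

# Greedy: takes as many as allowed (up to 5).
greedy = re.match(r"a{2,5}", "aaaaaa").group()

# Lazy: settles for the minimum of 2 when nothing else is required.
lazy = re.match(r"a{2,5}?", "aaaaaa").group()

# Lazy, but forced to 4 repetitions so that the following 'b' can match.
forced = re.match(r"a{2,5}?b", "aaaab").group()

if __name__ == "__main__":
    print(greedy, lazy, forced)
```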

Alternation

A series of alternative patterns to match can be specified by separating them with vertical pipes, `|'. An example of alternation would be `a|be|sea'. This will match `a', or `be', or `sea'. Each alternative can be an arbitrarily complex regular expression. The alternatives are attempted in the order specified. An empty alternative can be specified if desired, e.g. `a|b|'. Since an empty alternative can match nothingness (the empty string), this guarantees that the expression will match.
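
The three properties described above (matching any alternative, trying alternatives in order, and the always-successful empty alternative) can be sketched with Python's `re' module, used as an assumed stand-in for NEdit's engine.

```python
import re

# Each alternative can match on its own.
found = re.findall(r"a|be|sea", "a bee by the sea")

# Alternatives are tried in order, so the first one that works wins.
ordered = re.match(r"a|ab", "ab").group()

# The empty alternative matches the empty string, so a match always exists.
empty = re.match(r"a|b|", "xyz")

if __name__ == "__main__":
    print(found, repr(ordered), repr(empty.group()))
```

Because order matters, a longer alternative that shares a prefix with a shorter one should usually be listed first.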

Comments

Comments are of the form `(?#<comment text>)' and can be inserted anywhere and have no effect on the execution of the regular expression. They can be handy for documenting very complex regular expressions. Note that a comment begins with `(?#' and ends at the first occurrence of an ending parenthesis, or the end of the regular expression... period. Comments do not recognize any escape sequences.
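
A brief sketch of an inline comment, using Python's `re' module, which happens to support the same `(?#...)' form (an assumption that it mirrors NEdit closely enough for illustration). The commented pattern matches exactly what the uncommented one does.

```python
import re

# The comments label each part and are ignored during matching.
VERSION = r"\d+(?#major)\.\d+(?#minor)"
PLAIN = r"\d+\.\d+"

commented = re.search(VERSION, "version 3.14")
plain = re.search(PLAIN, "version 3.14")

if __name__ == "__main__":
    print(commented.group(), plain.group())
```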


Metacharacters

Escaping Metacharacters

In a regular expression (regex), most ordinary characters match themselves. For example, `ab%' would match anywhere `a' followed by `b' followed by `%' appeared in the text. Other characters don't match themselves, but are metacharacters. For example, backslash is a special metacharacter which 'escapes' or changes the meaning of the character following it. Thus, to match a literal backslash would require a regular expression to have two backslashes in sequence. NEdit provides the following escape sequences so that metacharacters that are used by the regex syntax can be specified as ordinary characters.

     \(  \)  \-  \[  \]  \<  \>  \{  \}
     \.  \|  \^  \$  \*  \+  \?  \&  \\

Special Control Characters

There are some special characters that are difficult or impossible to type. Many of these characters can be constructed as a sort of metacharacter or sequence by preceding a literal character with a backslash. NEdit recognizes the following special character sequences:

     \a  alert (bell)
     \b  backspace
     \e  ASCII escape character (***)
     \f  form feed (new page)
     \n  newline
     \r  carriage return
     \t  horizontal tab
     \v  vertical tab

     *** For environments that use the EBCDIC character set,
         when compiling NEdit set the EBCDIC_CHARSET compiler
         symbol to get the EBCDIC equivalent escape
         character.)

Octal and Hex Escape Sequences

Any ASCII (or EBCDIC) character, except null, can be specified by using either an octal escape or a hexadecimal escape, each beginning with \0 or \x (or \X), respectively. For example, \052 and \X2A both specify the `*' character. Escapes for null (\00 or \x0) are not valid and will generate an error message. Also, any escape that exceeds \0377 or \xFF will either cause an error or have any additional character(s) interpreted literally. For example, \0777 will be interpreted as \077 (a `?' character) followed by `7' since \0777 is greater than \0377.

An invalid digit will also end an octal or hexadecimal escape. For example, \091 will cause an error since `9' is not within an octal escape's range of allowable digits (0-7) and truncation before the `9' yields \0 which is invalid.

Shortcut Escape Sequences

NEdit defines some escape sequences that are handy shortcuts for commonly used character classes.

   \d  digits            0-9
   \l  letters           a-z, A-Z, and locale dependent letters
   \s  whitespace        \t, \r, \v, \f, and space
   \w  word characters   letters, digits, and underscore, `_'

\D, \L, \S, and \W are the same as the lowercase versions except that the resulting character class is negated. For example, \d is equivalent to `[0-9]', while \D is equivalent to `[^0-9]'.

These escape sequences can also be used within a character class. For example, `[\l_]' is the same as `[a-zA-Z_]', extended with possible locale dependent letters. The escape sequences for special characters, and octal and hexadecimal escapes are also valid within a class.

Word Delimiter Tokens

Although not strictly a character class, the following escape sequences behave similarly to character classes:

     \y   Word delimiter character
     \Y   Not a word delimiter character

The `\y' token matches any single character that is one of the characters that NEdit recognizes as a word delimiter character, while the `\Y' token matches any character that is NOT a word delimiter character. Word delimiter characters are dynamic in nature, meaning that the user can change them through preference settings. For this reason, they must be handled differently by the regular expression engine. As a consequence of this, `\y' and `\Y' can not be used within a character class specification.


Parenthetical Constructs

Capturing Parentheses

Capturing Parentheses are of the form `(<regex>)' and can be used to group arbitrarily complex regular expressions. Parentheses can be nested, but the total number of parentheses, nested or otherwise, is limited to 50 pairs. The text that is matched by the regular expression between a matched set of parentheses is captured and available for text substitutions and backreferences (see below.) Capturing parentheses carry a fairly high overhead both in terms of memory used and execution speed, especially if quantified by `*' or `+'.

Non-Capturing Parentheses

Non-Capturing Parentheses are of the form `(?:<regex>)' and facilitate grouping only and do not incur the overhead of normal capturing parentheses. They should not be counted when determining numbers for capturing parentheses which are used with backreferences and substitutions. Because of the limit on the number of capturing parentheses allowed in a regex, it is advisable to use non-capturing parentheses when possible.

Positive Look-Ahead

Positive look-ahead constructs are of the form `(?=<regex>)' and implement a zero width assertion of the enclosed regular expression. In other words, a match of the regular expression contained in the positive look-ahead construct is attempted. If it succeeds, control is passed to the next regular expression atom, but the text that was consumed by the positive look-ahead is first unmatched (backtracked) to the place in the text where the positive look-ahead was first encountered.

One application of positive look-ahead is the manual implementation of a first character discrimination optimization. You can include a positive look-ahead that contains a character class which lists every character that the following (potentially complex) regular expression could possibly start with. This will quickly filter out match attempts that can not possibly succeed.
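A minimal sketch of both ideas, assuming Python's `re' module behaves like NEdit for these constructs (it is only a stand-in, and no claim is made here about speed). The first pattern shows that the look-ahead consumes no text; the second prepends a look-ahead listing every character a number could start with, and the sample text is invented for the example.

```python
import re

# The unit "px" must follow the digits but is not part of the match.
px_number = re.compile(r"\d+(?=px)")
m = px_number.search("width: 120px")
print(m.group(), m.span())

# First character discrimination: a number can only start with a sign,
# a dot, or a digit, so the look-ahead rejects every other position early.
number = r"[+-]?(?:\d+\.?\d*|\.\d+)"
filtered = r"(?=[+\-.0-9])" + number

text = "x = -3.5, y = .25, z = 7"
plain_hits = re.findall(number, text)
filtered_hits = re.findall(filtered, text)
print(plain_hits, filtered_hits)
```

The filter changes which positions are tried, not which text is matched, so both lists are the same.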

Negative Look-Ahead

Negative look-ahead takes the form `(?!<regex>)' and is exactly the same as positive look-ahead except that the enclosed regular expression must NOT match. This can be particularly useful when you have an expression that is general, and you want to exclude some special cases. Simply precede the general expression with a negative look-ahead that covers the special cases that need to be filtered out.
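For instance, a general "identifier" expression can be preceded by a negative look-ahead that rules out a few keywords. The sketch below uses Python's `re' module as an assumed stand-in for NEdit; the keyword list is arbitrary and chosen only for illustration.

```python
import re

# General case: any identifier. Special cases to exclude: if, else, while.
# The \b inside the look-ahead keeps names such as while_loop from being excluded.
identifier = re.compile(r"\b(?!(?:if|else|while)\b)[A-Za-z_]\w*")

names = identifier.findall("if count while_loop else total")
print(names)
```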

Positive Look-Behind

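Positive look-behind constructs are of the form `(?<=<regex>)' and implement a zero width assertion of the enclosed regular expression on the text preceding the current matching position.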

     {} {,} {0,}    *
     {1,}           +
     {,1} {0,1}     ?

Note that {0} and {0,0} are meaningless and will generate an error message at regular expression compile time.

Brace quantifiers can also be "lazy". For example, {2,5}? would try to match 2 times if possible, and will only match 3, 4, or 5 times if that is what is necessary to achieve an overall match.
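The equivalences in the table and the lazy behavior can be checked with a short sketch. Python's `re' module is used as a stand-in and is not NEdit's engine; only the forms {0,}, {1,}, and {0,1} are exercised, since the abbreviated forms such as {} and {,} are not assumed to carry over. The sample strings are invented.

```python
import re

SAMPLE = "a ab abb abbb"

def same_matches(brace_form, shorthand):
    """True when both patterns find exactly the same matches in SAMPLE."""
    return re.findall(brace_form, SAMPLE) == re.findall(shorthand, SAMPLE)

star_equiv = same_matches(r"ab{0,}", r"ab*")
plus_equiv = same_matches(r"ab{1,}", r"ab+")
opt_equiv = same_matches(r"ab{0,1}", r"ab?")

# Greedy takes as many repetitions as allowed, lazy as few as allowed.
greedy = re.match(r"a{2,5}", "aaaaaa").group()
lazy = re.match(r"a{2,5}?", "aaaaaa").group()
# The lazy form still grows when that is needed for an overall match.
forced = re.match(r"a{2,5}?b", "aaaab").group()

print(star_equiv, plus_equiv, opt_equiv, greedy, lazy, forced)
```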

Alternation

A series of alternative patterns to match can be specified by separating them with vertical pipes, `|'. An example of alternation would be `a|be|sea'. This will match `a', or `be', or `sea'. Each alternative can be an arbitrarily complex regular expression. The alternatives are attempted in the order specified. An empty alternative can be specified if desired, e.g. `a|b|'. Since an empty alternative can match nothingness (the empty string), this guarantees that the expression will match.
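Because alternatives are tried in order, the order can change which text is matched. The sketch below shows this, along with the empty alternative, using Python's `re' module as an assumed stand-in for NEdit's engine; the sample strings are invented.

```python
import re

# The example from the text, applied to a sample sentence.
found = re.findall(r"a|be|sea", "a bee by the sea")

# Alternatives are attempted left to right: the first one that works wins.
first_wins = re.match(r"a|ab", "ab").group()
reordered = re.match(r"ab|a", "ab").group()

# A trailing empty alternative matches the empty string, so the
# expression succeeds even where neither `a' nor `b' is present.
empty_alt = re.match(r"a|b|", "xyz")

print(found, first_wins, reordered, repr(empty_alt.group()))
```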

Comments

Comments are of the form `(?#<comment text>)'. They can be inserted anywhere, have no effect on the execution of the regular expression, and can be handy for documenting very complex regular expressions. A comment begins with `(?#' and ends at the first closing parenthesis or at the end of the regular expression, whichever comes first. Comments do not recognize any escape sequences, so a closing parenthesis cannot be escaped inside a comment.
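A small sketch of a commented expression follows. It uses Python's `re' module, which accepts the same `(?#...)' form for simple comments; how Python treats unterminated comments or backslashes inside a comment is not assumed to match NEdit, so only the basic case is shown.

```python
import re

plain = re.compile(r"\d{4}-\d{2}-\d{2}")
commented = re.compile(r"\d{4}(?#year)-\d{2}(?#month)-\d{2}(?#day)")

date = "2024-01-31"
# The comments document the pattern but do not change what it matches,
# and they do not create any capture groups.
print(bool(plain.fullmatch(date)), bool(commented.fullmatch(date)))
print(commented.groups)
```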


Metacharacters

Escaping Metacharacters

In a regular expression (regex), most ordinary characters match themselves. For example, `ab%' would match anywhere `a' followed by `b' followed by `%' appeared in the text. Other characters don't match themselves, but are metacharacters. For example, backslash is a special metacharacter which 'escapes' or changes the meaning of the character following it. Thus, to match a literal backslash would require a regular expression to have two backslashes in sequence. NEdit provides the following escape sequences so that metacharacters that are used by the regex syntax can be specified as ordinary characters.

     \(  \)  \-  \[  \]  \<  \>  \{  \}
     \.  \|  \^  \$  \*  \+  \?  \&  \\
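The difference between a metacharacter and its escaped form can be sketched as follows, using Python's `re' module as an assumed stand-in for NEdit's engine. The sample strings are invented for the example.

```python
import re

# Unescaped, `.' matches any character; escaped, it matches only a dot.
dot_any = re.fullmatch(r"3.14", "3x14")
dot_literal = re.fullmatch(r"3\.14", "3x14")
dot_exact = re.fullmatch(r"3\.14", "3.14")

# Unescaped, `+' and the parentheses are operators, so this pattern
# does not match the text it appears to spell out.
as_operators = re.search(r"a+b (c)", "a+b (c)")
as_literals = re.search(r"a\+b \(c\)", "a+b (c)")

# A literal backslash takes two backslashes in the expression.
backslash = re.search(r"\\", r"C:\temp")

print(bool(dot_any), bool(dot_literal), bool(dot_exact))
print(as_operators, as_literals.group(), backslash.start())
```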

Special Control Characters

There are some special characters that are difficult or impossible to type. Many of these characters can be constructed as a sort of metacharacter or sequence by preceding a literal character with a backslash. NEdit recognizes the following special character sequences:

     \a  alert (bell)
     \b  backspace
     \e  ASCII escape character (***)
     \f  form feed (new page)
     \n  newline
     \r  carriage return
     \t  horizontal tab
     \v  vertical tab

     *** For environments that use the EBCDIC character set,
         set the EBCDIC_CHARSET compiler symbol when compiling
         NEdit to get the EBCDIC equivalent escape character.

Octal and Hex Escape Sequences

Any ASCII (or EBCDIC) character, except null, can be specified by using either an octal escape or a hexadecimal escape, each beginning with \0 or \x (or \X), respectively. For example, \052 and \X2A both specify the `*' character. Escapes for null (\00 or \x0) are not valid and will generate an error message. Also, any escape that exceeds \0377 or \xFF will either cause an error or have any additional character(s) interpreted literally. For example, \0777 will be interpreted as \077 (a `?' character) followed by `7' since \0777 is greater than \0377.

An invalid digit will also end an octal or hexadecimal escape. For example, \091 will cause an error since `9' is not within an octal escape's range of allowable digits (0-7) and truncation before the `9' yields \0 which is invalid.
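The rules above can be traced with a short sketch. This is not NEdit's parser: decode_numeric_escape is a hypothetical helper that follows only the behavior described in the text, and where the text allows either an error or literal interpretation of extra characters, the sketch assumes the literal interpretation.

```python
OCTAL_DIGITS = "01234567"
HEX_DIGITS = "0123456789abcdefABCDEF"

def decode_numeric_escape(escape):
    """Decode one octal or hex escape; return (character, leftover text).

    Raises ValueError for a null escape or for text that is not a
    numeric escape at all.
    """
    if escape[:2] == "\\0":
        digits, base, pos = OCTAL_DIGITS, 8, 1   # the leading 0 counts as a digit
    elif escape[:2] in ("\\x", "\\X"):
        digits, base, pos = HEX_DIGITS, 16, 2
    else:
        raise ValueError("not a numeric escape")

    value = 0
    while pos < len(escape) and escape[pos] in digits:
        candidate = value * base + int(escape[pos], base)
        if candidate > 0xFF:
            break                 # stop before exceeding octal 377 / hex FF
        value = candidate
        pos += 1

    if value == 0:
        raise ValueError("null escape is not valid")
    return chr(value), escape[pos:]

print(decode_numeric_escape(r"\052"))    # octal 52 is decimal 42
print(decode_numeric_escape(r"\X2A"))    # hex 2A is decimal 42
print(decode_numeric_escape(r"\0777"))   # octal 77 is decimal 63, one digit left over
```

Octal 777 is decimal 511, which is above the limit of 255, so decoding stops after `\077' and the final `7' is returned as leftover text.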

Shortcut Escape Sequences

NEdit defines some escape sequences that are handy shortcuts for commonly used character classes.

   \d  digits            0-9
   \l  letters           a-z, A-Z, and locale dependent letters
   \s  whitespace        \t, \r, \v, \f, and space
   \w  word characters   letters, digits, and underscore, `_'

\D, \L, \S, and \W are the same as the lowercase versions except that the resulting character class is negated. For example, \d is equivalent to `[0-9]', while \D is equivalent to `[^0-9]'.

These escape sequences can also be used within a character class. For example, `[\l_]' is the same as `[a-zA-Z_]', extended with possible locale dependent letters. The escape sequences for special characters, and octal and hexadecimal escapes are also valid within a class.
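The equivalences can be checked with a sketch in Python's `re' module, which is only a stand-in for NEdit's engine. Two differences are assumed and worked around: Python has no `\l', so the class `[a-zA-Z_]' is written out (without locale dependent letters), and Python's `\s' also matches newline, so the whitespace set listed in the table is spelled as an explicit class. The re.ASCII flag restricts `\d' and `\w' to ASCII.

```python
import re

text = "id_42 = 7;\tx9"

# \d and \D against the explicit classes [0-9] and [^0-9].
digits = re.findall(r"\d", text, re.ASCII)
digits_class = re.findall(r"[0-9]", text)
non_digits = re.findall(r"\D", text, re.ASCII)
negated_class = re.findall(r"[^0-9]", text)

# Stand-in for `[\l_]' followed by word characters.
IDENT_START = r"[a-zA-Z_]"
idents = re.findall(IDENT_START + r"\w*", text, re.ASCII)

# The whitespace set from the table, written as an explicit class.
NEDIT_WHITESPACE = r"[\t\r\v\f ]"
tab_is_space = re.fullmatch(NEDIT_WHITESPACE, "\t") is not None
newline_is_space = re.fullmatch(NEDIT_WHITESPACE, "\n") is not None

print(digits, idents, tab_is_space, newline_is_space)
```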

Word Delimiter Tokens

Although not strictly a character class, the following escape sequences behave similarly to character classes:

     \y   Word delimiter character
     \Y   Not a word delimiter character

The `\y' token matches any single character that is one of the characters that NEdit recognizes as a word delimiter character, while the `\Y' token matches any character that is NOT a word delimiter character. Word delimiter characters are dynamic in nature, meaning that the user can change them through preference settings. For this reason, they must be handled differently by the regular expression engine. As a consequence of this, `\y' and `\Y' can not be used within a character class specification.
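The dynamic nature of the delimiter set can be pictured with a sketch in Python, which has no `\y' or `\Y'. The helper delimiter_patterns is hypothetical and simply rebuilds a pair of character classes from whatever delimiter string it is given; the delimiter strings below are sample values, not NEdit's actual setting.

```python
import re

def delimiter_patterns(delimiters):
    """Build stand-ins for `\\y' and `\\Y' from a delimiter string.

    re.escape makes characters such as `]', `^', `-' and backslash
    safe to place inside a character class.
    """
    body = re.escape(delimiters)
    is_delimiter = re.compile("[" + body + "]")
    not_delimiter = re.compile("[^" + body + "]")
    return is_delimiter, not_delimiter

# The same text gives different results when the preference changes.
y1, Y1 = delimiter_patterns(".,;")
y2, Y2 = delimiter_patterns("-")

print(y1.findall("a,b.c"), Y1.findall("a,b.c"))
print(y2.findall("a-b,c"), Y2.findall("a-b,c"))
```

Since the class has to be rebuilt whenever the delimiter string changes, it cannot be a fixed part of a compiled character class, which is in line with the restriction described above.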


Parenthetical Constructs

Capturing Parentheses

Capturing Parentheses are of the form `(<regex>)' and can be used to group arbitrarily complex regular expressions. Parentheses can be nested, but the total number of parentheses, nested or otherwise, is limited to 50 pairs. The text that is matched by the regular expression between a matched set of parentheses is captured and available for text substitutions and backreferences (see below.) Capturing parentheses carry a fairly high overhead both in terms of memory used and execution speed, especially if quantified by `*' or `+'.

Non-Capturing Parentheses

Non-Capturing Parentheses are of the form `(?:<regex>)' and facilitate grouping only and do not incur the overhead of normal capturing parentheses. They should not be counted when determining numbers for capturing parentheses which are used with backreferences and substitutions. Because of the limit on the number of capturing parentheses allowed in a regex, it is advisable to use non-capturing parentheses when possible.

Positive Look-Ahead

Positive look-ahead constructs are of the form `(?=<regex>)' and implement a zero width assertion of the enclosed regular expression. In other words, a match of the regular expression contained in the positive look-ahead construct is attempted. If it succeeds, control is passed to the next regular expression atom, but the text that was consumed by the positive look-ahead is first unmatched (backtracked) to the place in the text where the positive look-ahead was first encountered.

One application of positive look-ahead is the manual implementation of a first character discrimination optimization. You can include a positive look-ahead that contains a character class which lists every character that the following (potentially complex) regular expression could possibly start with. This will quickly filter out match attempts that can not possibly succeed.

Negative Look-Ahead

Negative look-ahead takes the form `(?!<regex>)' and is exactly the same as positive look-ahead except that the enclosed regular expression must NOT match. This can be particularly useful when you have an expression that is general, and you want to exclude some special cases. Simply precede the general expression with a negative look-ahead that covers the special cases that need to be filtered out.

Positive Look-Behind

Positive look-behind constructs are of the form `(?<=<regex>)' and implement a

     {} {,} {0,}    *
     {1,}           +
     {,1} {0,1}     ?

Note that {0} and {0,0} are meaningless and will generate an error message at regular expression compile time.

Brace quantifiers can also be "lazy". For example {2,5}? would try to match 2 times if possible, and will only match 3, 4, or 5 times if that is what is necessary to achieve an overall match.

Alternation

A series of alternative patterns to match can be specified by separating them with vertical pipes, `|'. An example of alternation would be `a|be|sea'. This will match `a', or `be', or `sea'. Each alternative can be an arbitrarily complex regular expression. The alternatives are attempted in the order specified. An empty alternative can be specified if desired, e.g. `a|b|'. Since an empty alternative can match nothingness (the empty string), this guarantees that the expression will match.

Comments

Comments are of the form `(?#<comment text>)' and can be inserted anywhere and have no effect on the execution of the regular expression. They can be handy for documenting very complex regular expressions. Note that a comment begins with `(?#' and ends at the first occurrence of an ending parenthesis, or the end of the regular expression... period. Comments do not recognize any escape sequences.


Metacharacters

Escaping Metacharacters

In a regular expression (regex), most ordinary characters match themselves. For example, `ab%' would match anywhere `a' followed by `b' followed by `%' appeared in the text. Other characters don't match themselves, but are metacharacters. For example, backslash is a special metacharacter which 'escapes' or changes the meaning of the character following it. Thus, to match a literal backslash would require a regular expression to have two backslashes in sequence. NEdit provides the following escape sequences so that metacharacters that are used by the regex syntax can be specified as ordinary characters.

     \(  \)  \-  \[  \]  \<  \>  \{  \}
     \.  \|  \^  \$  \*  \+  \?  \&  \\
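
The effect of escaping can be sketched in Python, which shares the escapes used below (`\\', `\.'). Some entries in the list above, such as `\&', `\<', and `\>', are specific to NEdit and are deliberately left out. The helper name matches is hypothetical.

```python
import re


def matches(pattern, text):
    """True if pattern matches anywhere in text."""
    return re.search(pattern, text) is not None


# Ordinary characters match themselves.
print(matches(r"ab%", "xxab%yy"))    # -> True

# Two backslashes in the regex match one literal backslash in the text.
print(matches(r"\\", "C:\\temp"))    # -> True

# An unescaped dot is a metacharacter; an escaped dot is a literal dot.
print(matches(r"a.c", "abc"))        # -> True
print(matches(r"a\.c", "abc"))       # -> False
print(matches(r"a\.c", "a.c"))       # -> True
```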

Special Control Characters

There are some special characters that are difficult or impossible to type. Many of these characters can be constructed as a sort of metacharacter or sequence by preceding a literal character with a backslash. NEdit recognizes the following special character sequences:

     \a  alert (bell)
     \b  backspace
     \e  ASCII escape character (***)
     \f  form feed (new page)
     \n  newline
     \r  carriage return
     \t  horizontal tab
     \v  vertical tab

     *** For environments that use the EBCDIC character set,
         when compiling NEdit set the EBCDIC_CHARSET compiler
         symbol to get the EBCDIC equivalent escape
         character.

Octal and Hex Escape Sequences

Any ASCII (or EBCDIC) character, except null, can be specified by using either an octal escape or a hexadecimal escape, each beginning with \0 or \x (or \X), respectively. For example, \052 and \X2A both specify the `*' character. Escapes for null (\00 or \x0) are not valid and will generate an error message. Also, any escape that exceeds \0377 or \xFF will either cause an error or have any additional character(s) interpreted literally. For example, \0777 will be interpreted as \077 (a `?' character) followed by `7' since \0777 is greater than \0377.

An invalid digit will also end an octal or hexadecimal escape. For example, \091 will cause an error since `9' is not within an octal escape's range of allowable digits (0-7) and truncation before the `9' yields \0 which is invalid.
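
The arithmetic behind these examples can be checked with a few lines of Python. This only converts the digit strings to ASCII characters; it does not model NEdit's escape parser, and the function names are hypothetical.

```python
def decode_octal(digits):
    """Return the ASCII character for a string of octal digits."""
    return chr(int(digits, 8))


def decode_hex(digits):
    """Return the ASCII character for a string of hexadecimal digits."""
    return chr(int(digits, 16))


print(decode_octal("052"))          # \052  -> '*'
print(decode_hex("2A"))             # \X2A  -> '*'
print(decode_octal("077"))          # \077  -> '?'
print(int("777", 8) > 0o377)        # \0777 exceeds the \0377 limit -> True

try:
    int("9", 8)                     # `9' is not an octal digit
except ValueError as err:
    print("invalid octal digit:", err)
```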

Shortcut Escape Sequences

NEdit defines some escape sequences that are handy shortcuts for commonly used character classes.

   \d  digits            0-9
   \l  letters           a-z, A-Z, and locale dependent letters
   \s  whitespace        \t, \r, \v, \f, and space
   \w  word characters   letters, digits, and underscore, `_'

\D, \L, \S, and \W are the same as the lowercase versions except that the resulting character class is negated. For example, \d is equivalent to `[0-9]', while \D is equivalent to `[^0-9]'.

These escape sequences can also be used within a character class. For example, `[\l_]' is the same as `[a-zA-Z_]', extended with possible locale dependent letters. The escape sequences for special characters, and octal and hexadecimal escapes are also valid within a class.
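
The equivalences above can be tried out in Python by spelling the classes out explicitly. Explicit classes are used because Python's own shortcuts differ from NEdit's: Python has no `\l', and its `\s' also matches newline. Locale dependent letters are ignored here. All names are assumptions for the sketch.

```python
import re

DIGIT = r"[0-9]"                       # what \d stands for
NON_DIGIT = r"[^0-9]"                  # what \D stands for
LETTER_OR_UNDERSCORE = r"[a-zA-Z_]"    # [\l_], without locale letters
WHITESPACE = "[ \t\r\v\f]"             # \s as listed above: no newline


def find(pattern, text):
    """Return every single-character match of pattern in text."""
    return re.findall(pattern, text)


print(find(DIGIT, "a1_b2"))                   # -> ['1', '2']
print(find(NON_DIGIT, "a1_b2"))               # -> ['a', '_', 'b']
print(find(LETTER_OR_UNDERSCORE, "a1_b2"))    # -> ['a', '_', 'b']
print(find(WHITESPACE, "a b\tc\nd"))          # -> [' ', '\t']
```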

Word Delimiter Tokens

Although not strictly a character class, the following escape sequences behave similarly to character classes:

     \y   Word delimiter character
     \Y   Not a word delimiter character

The `\y' token matches any single character that is one of the characters that NEdit recognizes as a word delimiter character, while the `\Y' token matches any character that is NOT a word delimiter character. Word delimiter characters are dynamic in nature, meaning that the user can change them through preference settings. For this reason, they must be handled differently by the regular expression engine. As a consequence, `\y' and `\Y' cannot be used within a character class specification.
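
Python has no `\y' or `\Y', but the dynamic nature of the delimiter set can be imitated by building a character class at run time from whatever delimiter string is currently configured. This is only an analogy, not a description of how NEdit implements these tokens; the function name and the delimiter strings are hypothetical.

```python
import re


def delimiter_patterns(delimiters):
    """Build (delimiter, non-delimiter) patterns from a non-empty string.

    The first pattern plays the role of \\y, the second the role of \\Y.
    """
    escaped = "".join(re.escape(ch) for ch in delimiters)
    return re.compile("[" + escaped + "]"), re.compile("[^" + escaped + "]")


# One hypothetical preference setting.
is_delim, not_delim = delimiter_patterns(" .,;()")
print(is_delim.findall("foo(bar, baz)"))    # -> ['(', ',', ' ', ')']
print(not_delim.findall("a.b"))             # -> ['a', 'b']

# After the user changes the setting, the patterns must be rebuilt.
is_delim2, _ = delimiter_patterns("_")
print(is_delim2.findall("a_b.c"))           # -> ['_']
```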


Parenthetical Constructs

Capturing Parentheses

Capturing parentheses are of the form `(<regex>)' and can be used to group arbitrarily complex regular expressions. Parentheses can be nested, but the total number of parentheses, nested or otherwise, is limited to 50 pairs. The text matched by the regular expression between a matched pair of parentheses is captured and made available for text substitutions and backreferences (see below). Capturing parentheses carry a fairly high overhead in both memory use and execution speed, especially when quantified by `*' or `+'.
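
A brief sketch of capturing, backreferences, and substitution, in Python, which uses the same `\1' notation for both. The doubled-word pattern and the names DOUBLED, doubled_words, and dedupe are invented for the example.

```python
import re

# (\w+) captures a word; \1 requires the same text to appear again.
DOUBLED = re.compile(r"\b(\w+) \1\b")


def doubled_words(text):
    """Return each word that appears twice in a row."""
    return DOUBLED.findall(text)


def dedupe(text):
    """Replace each doubled word with a single copy, using \\1."""
    return DOUBLED.sub(r"\1", text)


line = "this is is a test test case"
print(doubled_words(line))    # -> ['is', 'test']
print(dedupe(line))           # -> 'this is a test case'
```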

Non-Capturing Parentheses

Non-capturing parentheses are of the form `(?:<regex>)'. They provide grouping only and do not incur the overhead of normal capturing parentheses. They are not counted when numbering capturing parentheses for use with backreferences and substitutions. Because the number of capturing parentheses allowed in a regex is limited, it is advisable to use non-capturing parentheses when possible.
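
The effect on group numbering can be seen in a short Python sketch; Python's `re' module uses the same `(?:<regex>)' form. The patterns and names are assumptions made for illustration.

```python
import re

# Two capturing groups: `(ab)' is group 1, `(c)' is group 2.
CAPTURING = re.compile(r"(ab)+(c)")
# `(?:ab)' only groups, so `(c)' becomes group 1.
NON_CAPTURING = re.compile(r"(?:ab)+(c)")

print(CAPTURING.groups, CAPTURING.match("ababc").groups())
# -> 2 ('ab', 'c')
print(NON_CAPTURING.groups, NON_CAPTURING.match("ababc").groups())
# -> 1 ('c',)
```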

Positive Look-Ahead

Positive look-ahead constructs are of the form `(?=<regex>)' and implement a zero width asse