To fix some problems with lexing in Scintilla and to add new capabilities, there are going to be some major changes. It is likely these will go into the release after the next one. That release will be called 2.20 as the changes are not completely backwards compatible.
A problem with current Scintilla is that lexers and lexer options
such as properties and keywords are attached to the view object
rather than the
Document object. When two views are
showing one document then it is possible for two different lexers to be
called to style the text leading to arbitrary and confusing results.
To fix this, lexer state is being moved from the view to the
Document, although the state is still being set up through the view
as it is providing the API to client code.
This will change the scope of some settings so may require changes to applications. Applications that only set up properties or word lists at initialisation or when changing languages will have to repeat these for each document. Conversely, there will no longer be a need to set parameters for each view on a document or when switching between documents on a view since documents retain settings.
Some languages may benefit from features like styling local variables differently to global variables or showing fields that are not present in a structure in an error style. These sorts of features require that something like a symbol table is maintained by the lexer.
Lexers currently have only limited space to store information about each document: the document's style bytes and line state (a single integer per line). There are some other locations that could be used, like unused bits in folding state, but using these for lexer state may not be compatible with future changes. This makes it too difficult to implement a symbol table with only the current features.
The solution is to create a lexer object which can contain arbitrary
additional data. Each document has a separate lexer object. Lexer
objects implement the ILexer interface.
The lexer object may contain any data required for the functioning of the lexer. This can include information extracted from the document such as a list of functions.
There is no current way for a lexer to indicate that changing a
property or keyword list should cause restyling. In SciTE, you can for
example add a keyword to keywordclass.cpp, then return to a C++ document
and not see any change to existing styles. Only lexing done after the
addition will use the new keywords. Lexer objects will be responsible
for storing properties and word lists. They provide PropertySet and
WordListSet methods to receive these parameters and
return a position where lexing should be restarted from (normally 0
although lexers may be more intelligent about this) or -1 if the change
does not affect lexing or folding.
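The return convention described above might look something like the following sketch. OptionStore is a hypothetical class, it always restarts from position 0 rather than trying to be clever, and its two word lists are an arbitrary choice for the example.

```cpp
#include <map>
#include <string>

// Hypothetical class showing the return convention: a restart position
// (here always 0) when a change affects lexing, or -1 when nothing needs
// to be restyled.
class OptionStore {
    std::map<std::string, std::string> props;
    std::string keywords[2];
public:
    int PropertySet(const char *key, const char *val) {
        std::string &slot = props[key];
        if (slot == val)
            return -1;      // unchanged: no relexing needed
        slot = val;
        return 0;           // changed: restart lexing from the beginning
    }
    int WordListSet(int n, const char *wl) {
        if (n < 0 || n >= 2 || keywords[n] == wl)
            return -1;      // unknown list or identical contents
        keywords[n] = wl;
        return 0;
    }
};
```

A smarter lexer could return the position of the first place the new keyword or property could matter instead of 0.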
Release is called to destroy the lexer object.
PrivateCall allows for direct communication between the
application and a lexer. An example would be where an application
maintains a single large data structure containing symbolic information
about system headers (like Windows.h) and provides this to the lexer
where it can be applied to each document. This avoids the costs of
constructing the system header information for each document. This is
invoked with the SCI_PRIVATELEXERCALL API.
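One way the shared system header case could be arranged is sketched below. SystemSymbols, opSetSystemSymbols and SymbolAwareLexer are all hypothetical; the meaning of the operation number and pointer is a private agreement between one application and one lexer.

```cpp
#include <set>
#include <string>

// Hypothetical shared table built once by the application.
struct SystemSymbols {
    std::set<std::string> names;
};

// Hypothetical operation code agreed between application and lexer.
const int opSetSystemSymbols = 1;

class SymbolAwareLexer {
    const SystemSymbols *shared = nullptr;  // not owned by the lexer
public:
    void *PrivateCall(int operation, void *pointer) {
        if (operation == opSetSystemSymbols)
            shared = static_cast<const SystemSymbols *>(pointer);
        return nullptr;
    }
    bool IsSystemSymbol(const std::string &name) const {
        return shared && shared->names.count(name) != 0;
    }
};
```

Each document's lexer object receives the same pointer, so the table is built once and the application remains responsible for keeping it alive.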
Currently lexers interact with the document through a concrete class
derived from the Accessor abstract base class, with the base class
providing most of the functionality and the derived class implementing
communication with the document. This communication is either direct,
for lexers linked into Scintilla, or through messages, for external
lexers. In the new scheme, the only way of communicating with the
document is through the IDocument interface, which can be used for
external lexers as well as lexers linked into Scintilla.
This avoids dependence on GUI windowing code and makes it easier to move a lexer between being housed in a shared library and being linked into Scintilla. It could also be used for lexers housed within application code although this has not yet been implemented.
Because IDocument is an interface, it can be used across
build boundaries (such as between two DLLs) where the implementation
cannot be seen from the client and so cannot be optimized by the compiler.
This gets in the way of efficient buffering, so the task of buffering is
moved to a helper class that is local to the lexer. Example helper
classes are the simple LexAccessor and its subclass Accessor,
which provides more services. These may be used by lexers or lexers may
create their own helper classes.
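The buffering idea can be sketched as follows. ICharSource is a hypothetical cut-down stand-in for the document interface, and BufferedReader is not the real LexAccessor; the buffer is kept tiny so that refills are easy to observe.

```cpp
#include <algorithm>
#include <string>

// Hypothetical cut-down stand-in for the document interface.
class ICharSource {
public:
    virtual ~ICharSource() {}
    virtual int Length() const = 0;
    virtual void GetCharRange(char *buffer, int position, int length) const = 0;
};

// Simple in-memory implementation used to exercise the helper.
class StringSource : public ICharSource {
    std::string text;
public:
    explicit StringSource(const std::string &text_) : text(text_) {}
    int Length() const override { return static_cast<int>(text.size()); }
    void GetCharRange(char *buffer, int position, int length) const override {
        text.copy(buffer, length, position);
    }
};

// Buffering helper local to the lexer: one virtual call fills a window,
// then reads inside that window are plain array accesses.
class BufferedReader {
    enum { bufferSize = 16 };
    const ICharSource &source;
    const int length;           // cached so range checks need no virtual call
    char buf[bufferSize];
    int startPos = 0;
    int endPos = 0;             // window is [startPos, endPos)
public:
    int fills = 0;              // number of virtual calls made to fetch text
    explicit BufferedReader(const ICharSource &source_)
        : source(source_), length(source_.Length()) {}
    char At(int position) {
        if (position < 0 || position >= length)
            return ' ';         // out of range: return a harmless default
        if (position < startPos || position >= endPos) {
            startPos = position;
            endPos = std::min(position + static_cast<int>(bufferSize), length);
            source.GetCharRange(buf, startPos, endPos - startPos);
            fills++;
        }
        return buf[position - startPos];
    }
};
```

Since the helper is compiled with the lexer, the compiler can inline At even though the document sits behind an opaque interface.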
The use of interfaces between components is similar to COM or XPCOM.
Using actual COM or XPCOM would add complexity. The interfaces are
defined as C++ but can be emulated by C and probably by other languages
that are compatible with COM.
SCI_METHOD is defined to be
whatever is needed to specify a reasonable calling convention on each
platform so that each side of the interface can call the other. This is
__stdcall on Windows and is unspecified on Unix.
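A definition along the following lines would match that description. The _WIN32 test is an assumption about how the platform is detected, and IVersioned is a hypothetical interface used only to show where the macro goes.

```cpp
// Assumed definition following the description above: __stdcall on
// Windows, nothing on other platforms.
#ifdef _WIN32
#define SCI_METHOD __stdcall
#else
#define SCI_METHOD
#endif

// Hypothetical interface showing where the macro is placed.
class IVersioned {
public:
    virtual ~IVersioned() {}
    virtual int SCI_METHOD Version() const = 0;
};

class VersionZero : public IVersioned {
public:
    int SCI_METHOD Version() const override { return 0; }
};
```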
The ILexer and IDocument interfaces may be
expanded in the future with extended versions. The
Version method indicates which interface is
implemented and thus which methods may be called.
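A client could guard calls to newer methods roughly as in this sketch. IDocSketch, IDocSketch2, the version constants and LineCount as the "newer" method are all hypothetical and chosen only for illustration.

```cpp
// Hypothetical version numbers for the original and extended interfaces.
enum { versionOriginal = 0, versionExtended = 1 };

class IDocSketch {
public:
    virtual ~IDocSketch() {}
    virtual int Version() const = 0;
    virtual int Length() const = 0;
};

// Extended interface: adds a method that older implementations lack.
class IDocSketch2 : public IDocSketch {
public:
    virtual int LineCount() const = 0;
};

class BasicDoc : public IDocSketch {
public:
    int Version() const override { return versionOriginal; }
    int Length() const override { return 10; }
};

class ExtendedDoc : public IDocSketch2 {
public:
    int Version() const override { return versionExtended; }
    int Length() const override { return 10; }
    int LineCount() const override { return 3; }
};

// Only call the newer method when Version says it is implemented.
int LinesOrMinusOne(const IDocSketch *doc) {
    if (doc->Version() >= versionExtended)
        return static_cast<const IDocSketch2 *>(doc)->LineCount();
    return -1;
}
```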
Scintilla tries to minimize the consequences of modifying text to
only relex and redraw the line of the change where possible. Lexer
objects contain their own private extra state which can affect later
lines. For example, if the C++ lexer is greying out inactive code
segments then changing the statement
#define BEOS 0 to
#define BEOS 1 may require restyling and redisplaying later parts of the
document. The lexer can call
ChangeLexerState to signal to
the document that it should relex and display more.
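The preprocessor case might be handled along these lines. DocSketch is a hypothetical stand-in that only records the request, and PreprocLexer with its SawDefine method is an invented illustration rather than the C++ lexer's actual code.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for the document side; it only records the request.
struct DocSketch {
    int length = 0;
    int relexStart = -1;
    int relexEnd = -1;
    void ChangeLexerState(int start, int end) {
        relexStart = start;
        relexEnd = end;
    }
};

class PreprocLexer {
    std::map<std::string, std::string> defines;  // private state affecting later lines
public:
    // Called after lexing a "#define NAME VALUE" line that ends at lineEnd.
    void SawDefine(const std::string &name, const std::string &value,
                   int lineEnd, DocSketch &doc) {
        auto it = defines.find(name);
        const bool changed = (it == defines.end()) || (it->second != value);
        defines[name] = value;
        if (changed)
            doc.ChangeLexerState(lineEnd, doc.length);  // ask for the rest to be relexed
    }
};
```

Re-lexing a line whose definition is unchanged makes no request, so the common case of editing within a line stays cheap.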
SetErrorStatus is used to notify the document of
exceptions. Exceptions should not be thrown over build boundaries as the
two sides may be built with different compilers or incompatible
build options.
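A lexer could keep exceptions on its own side of the boundary with a wrapper like the one sketched here. ErrorSink, LexGuarded and the status values are hypothetical; only the idea of converting exceptions into a status passed to SetErrorStatus comes from the text.

```cpp
#include <new>
#include <stdexcept>

// Hypothetical status codes.
enum { statusOk = 0, statusFailure = 1, statusBadAlloc = 2 };

// Hypothetical stand-in for the document's error reporting.
struct ErrorSink {
    int status = statusOk;
    void SetErrorStatus(int s) { status = s; }
};

// Runs body and converts any exception into a status code so that
// nothing is thrown across the build boundary.
template <typename Body>
void LexGuarded(ErrorSink &doc, Body body) {
    try {
        body();
    } catch (const std::bad_alloc &) {
        doc.SetErrorStatus(statusBadAlloc);
    } catch (...) {
        doc.SetErrorStatus(statusFailure);
    }
}
```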
External lexers will require changes.
They will have to implement a lexer object factory function (exposed
through GetLexerFactory) instead of the current Lex and
Fold functions. Once a lexer object has been created,
it is called exactly the same as internal lexer objects.
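The factory arrangement, together with Release for destruction, could be sketched as below. ILexerSketch, LexerDemo and GetFactorySketch are hypothetical simplifications; the real interface has more methods and the exported function's exact signature is not shown here.

```cpp
// Hypothetical minimal lexer interface: destroyed only through Release.
class ILexerSketch {
public:
    virtual void Release() = 0;
    virtual const char *Name() const = 0;
protected:
    virtual ~ILexerSketch() {}
};

// A factory is a plain function returning a new lexer object.
typedef ILexerSketch *(*LexerFactorySketch)();

class LexerDemo : public ILexerSketch {
public:
    static int liveCount;   // tracks live objects for the example only
    LexerDemo() { ++liveCount; }
    ~LexerDemo() { --liveCount; }
    void Release() override { delete this; }
    const char *Name() const override { return "demo"; }
    static ILexerSketch *Factory() { return new LexerDemo(); }
};
int LexerDemo::liveCount = 0;

// What a library might export: one factory per lexer index.
LexerFactorySketch GetFactorySketch(unsigned int index) {
    if (index == 0)
        return &LexerDemo::Factory;
    return nullptr;
}
```

Creating and destroying the object on the same side of the boundary avoids mixing allocators between the host and the library.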
Existing lexers do not have to change much as helper classes such as
LexerSimple provide a very similar
environment. The set of headers used by lexers has changed but is fairly
consistent among lexers so can just be copied from a lexer included
with Scintilla. Lexers should not include Platform.h and only use
headers from the include and lexlib directories. Using headers from the
src, win32, or gtk directories makes the code dependent on features that
may change so should not be done.
A lexer may be converted to an
ILexer implementing class
by defining a class derived from
ILexer, a factory
function and changing the
LexerModule to use the factory
function rather than lexing and folding functions. Initially it is
simplest to derive the class from
LexerBase as this
provides some default functionality including standard property set and
word lists. Later these should be overridden to optimize the handling of
changes to properties and word lists.
Around 60 lines of boiler-plate additional code are needed to convert an existing lexer into an external lexer that implements ILexer.
An implementation of all this is available from http://www.scintilla.org/nulex.zip
Additional directories have been used to impose some more order on the source code. Lexers have been moved into the lexers directory and classes used by lexers are in the lexlib directory. The build files work for Windows and GTK+, but those for OS X have not been updated.
The C++ lexer included has some code to show whether or not code
is active based on preprocessor state with inactive code shown in
different styles to active code. This is turned on with
lexer.cpp.track.preprocessor=1 and
keywords5 containing a set of preprocessor definitions
in the form NAME=VALUE or just NAME.
Definitions within the source will be picked up if
lexer.cpp.update.preprocessor=1.
Both these options have some cost in terms of speed and memory. The
inactive states are 64 greater than their active counterparts.
Example properties to achieve above:

lexer.cpp.track.preprocessor=1
lexer.cpp.update.preprocessor=1

keywords5.$(file.patterns.cpp)=\
PLAT_GTK=1 \
_MSC_VER \
PLAT_GTK_WIN32=1

# White space
style.cpp.64=fore:#808080,fore:#C0C0C0
# Comment: /* */.
style.cpp.65=$(style.cpp.1),fore:#90B090
style.cpp.66=$(style.cpp.2),fore:#90B090
style.cpp.67=$(style.cpp.3),fore:#D0D0D0
style.cpp.68=$(style.cpp.4),fore:#90B0B0
style.cpp.69=$(style.cpp.5),fore:#9090B0
style.cpp.70=$(style.cpp.6),fore:#B090B0
style.cpp.71=$(style.cpp.7),fore:#B090B0
style.cpp.72=$(style.cpp.8),fore:#C0C0C0
style.cpp.73=$(style.cpp.9),fore:#B0B090
style.cpp.74=$(style.cpp.10),fore:#B0B0B0
style.cpp.75=$(style.cpp.11),fore:#B0B0B0
style.cpp.76=$(style.cpp.12),fore:#000000
style.cpp.77=$(style.cpp.13),fore:#007F00
style.cpp.78=$(style.cpp.14),fore:#7FAF7F
style.cpp.79=$(style.cpp.15),fore:#C0C0C0
style.cpp.80=$(style.cpp.16),fore:#C0C0C0
style.cpp.81=$(style.cpp.17),fore:#C0C0C0
style.cpp.82=$(style.cpp.18),fore:#C0C0C0
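
For code that consumes these styles, the offset of 64 between active and inactive states can be expressed with a few helpers. The function names are hypothetical, and IsInactive assumes that all active style numbers are below 64, which holds for the styles listed in the example properties.

```cpp
// From the text: an inactive state is 64 greater than its active counterpart.
const int inactiveOffset = 64;

// Map an active style number to its inactive counterpart.
inline int InactiveStyle(int activeStyle) {
    return activeStyle + inactiveOffset;
}

// Assumes active styles are all below 64.
inline bool IsInactive(int style) {
    return style >= inactiveOffset;
}

// Recover the active style that an inactive style is derived from.
inline int BaseStyle(int style) {
    return IsInactive(style) ? style - inactiveOffset : style;
}
```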