Scintilla and SciTE

Code Style

Introduction

The source code of Scintilla and SciTE follows my preferences. Some of these decisions are arbitrary and based on my sense of aesthetics, but it's good to have all the code look the same even if it's not exactly how everyone would prefer.

Code that does not follow these conventions will be accepted, but will be modified as time goes by to fit the conventions. Scintilla code follows the conventions more closely than SciTE except for lexers which are relatively independent modules. Lexers that are maintained by others are left as they are submitted except that warnings will be fixed so the whole project can compile cleanly.

The AStyle formatting program with '--style=attach --indent=force-tab=8 --keep-one-line-blocks --pad-header --unpad-paren --pad-comma --indent-cases --align-pointer=name --pad-method-prefix --pad-return-type --pad-param-type --align-method-colon --pad-method-colon=after' arguments formats code in much the right way although there are a few bugs in AStyle.

Language features

Design goals for Scintilla and SciTE include portability to currently available C++ compilers on diverse platforms with high performance and low resource usage. Scintilla has stricter portability requirements than SciTE as it may be ported to low-capability platforms. Scintilla code must build with C++17 which can be checked with "g++ --std=c++17". SciTE can use C++17 features that are widely available from g++ 7.1, MSVC 2017.6 and Clang 5.0 compilers.

To achieve portability, only a subset of C++ features is used. Exceptions and templates may be used but, since Scintilla can be used from C as well as C++, exceptions may not be thrown out of Scintilla and all exceptions should be caught before returning from Scintilla. A 'Scintilla' namespace is used. This helps with name clashes on macOS.
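The rule about catching exceptions before returning can be sketched as below. This is only an illustration: Repeat and RepeatedLength are hypothetical names invented for the example, and reporting failure as -1 is an assumption, not how Scintilla reports errors.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

namespace Scintilla {

// Hypothetical internal operation that may throw.
std::string Repeat(const char *s, int count) {
	if (!s || count < 0) {
		throw std::invalid_argument("Repeat: bad argument");
	}
	std::string result;
	for (int i = 0; i < count; i++) {
		result += s;
	}
	return result;
}

}

// Hypothetical C-callable entry point. Every exception is caught here and
// reported as -1 so that nothing propagates into a C caller.
extern "C" int RepeatedLength(const char *s, int count) noexcept {
	try {
		const std::string repeated = Scintilla::Repeat(s, count);
		return static_cast<int>(repeated.length());
	} catch (...) {
		return -1;
	}
}
```

A C caller sees only an int result: RepeatedLength("ab", 3) yields 6, while a null pointer or a negative count yields -1 instead of an exception.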

The goto statement is not used because of bad memories from my first job maintaining FORTRAN programs. The union feature is not used as it can lead to non-type-safe value access.

The SCI_METHOD preprocessor definition should be used when implementing interfaces which include it like ILexer and only there.

Headers should always be included in the same order as given by the scripts/HeaderOrder.txt file.

Casting

Do not use old C-style casts like (char *)s. Instead use the strictest form of C++ cast possible like const_cast<char *>(s). Use static_cast and const_cast where possible rather than reinterpret_cast.

The benefit of using the new style casts is that they explicitly detail what evil is occurring and act as signals that something potentially unsafe is being done.

Code that treats const seriously is easier to reason about both for humans and compilers, so use const parameters and avoid const_cast.
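A small sketch of these casting rules follows. All of the function names are hypothetical and exist only for the example. LegacyLength stands in for an older API that takes a non-const pointer.

```cpp
#include <cassert>
#include <cstring>

// All names below are hypothetical, for illustration only.

// static_cast documents a deliberate value conversion: the char becomes
// an unsigned value in 0..255 whatever the signedness of plain char.
constexpr int UnsignedValue(char ch) noexcept {
	return static_cast<unsigned char>(ch);
}

// A legacy C-style API that takes a non-const pointer although it never
// modifies the text.
int LegacyLength(char *s) {
	return static_cast<int>(std::strlen(s));
}

// The const parameter keeps callers free of casts; the single const_cast
// marks the one place where constness is removed.
int Length(const char *s) {
	assert(s);
	return LegacyLength(const_cast<char *>(s));
}
```

Callers of Length pass string literals or other const text directly, so the only cast that removes constness sits in one reviewed place rather than at every call site.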

Warnings

To help ensure code is well written and portable, it is compiled with almost all warnings turned on. This sometimes results in warnings about code that is completely good (false positives) but changing the code to avoid the warnings is generally fast and has little impact on readability.

Initialise all variables and minimise the scope of variables. If a variable is defined just before its use then it can't be misused by code before that point. Use loop declarations that are compatible with both the C++ standard and currently available compilers.
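As an illustration of initialising variables and keeping their scope small, here is a minimal sketch. LongestLength is a hypothetical function written for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical function for this sketch.
int LongestLength(const std::vector<std::string> &lines) {
	int longest = 0;	// initialised where it is defined
	for (std::size_t i = 0; i < lines.size(); i++) {
		// Defined just before its use and const as it never changes.
		const int length = static_cast<int>(lines[i].length());
		if (length > longest) {
			longest = length;
		}
	}
	return longest;
}
```

Neither i nor length is visible outside the loop, so code before or after the loop cannot misuse them.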

Allocation

Memory exhaustion can occur in many Scintilla methods. This should be checked for and handled but once it has happened, it is very difficult to do anything as Scintilla's data structures may be in an inconsistent state. Fixed length buffers are often used as these are simple and avoid the need to worry about memory exhaustion but then require that buffer lengths are respected.

The C++ new and delete operators are preferred over C's malloc and free as new and delete are type safe.
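The type safety point can be shown with a short sketch. Marker and DefaultMarkerLine are hypothetical names used only here. With malloc the result would be a void * that needs a cast and holds uninitialised memory, whereas new returns a correctly typed pointer to an initialised object.

```cpp
#include <cassert>

// Hypothetical type for this sketch.
struct Marker {
	int line = -1;
	int number = 0;
};

int DefaultMarkerLine() {
	// new yields a Marker * without a cast and runs the member
	// initialisers, so line is already -1. It throws std::bad_alloc
	// if memory is exhausted.
	Marker *marker = new Marker;
	const int line = marker->line;
	delete marker;
	return line;
}
```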

Bracketing

Start brackets, '{', should be located on the line of the control structure they start and end brackets, '}', should be at the indented start of a line. When there is an else clause, this occurs on the same line as the '}'. This format uses fewer lines than alternatives, allowing more code to be seen on screen. Fully bracketed control structures are preferred because this makes it more likely that modifications will be correct and it allows Scintilla's folder to work. No parentheses around returned expressions as return is a keyword, not a function call.

bool fn(int a) {
	if (a) {
		s();
		t();
	} else {
		u();
	}
	return !a;
}

Spacing

Spaces on both sides of '=' and comparison operators and no attempt to line up '='. No space before or after '(', when used in calls, but a space after every ','. No spaces between tokens in short expressions but may be present in longer expressions. Space before '{'. No space before ';'. No space after '*' when used to mean pointer and no space after '[' or ']'. One space between keywords and '('.

void StoreConditionally(int c, const char *s) {
	if (c && (baseSegment == trustSegment["html"])) {
		baseSegment = s+1;
		Store(s, baseSegment, "html");
	}
}

Names

Identifiers use mixed case and no underscores. Class, function and method names start with an upper case letter and use further upper case letters to distinguish words. Variables start with a lower case letter and use upper case letters to distinguish words. Loop counters and similar variables can have simple names like 'i'. Function calls should be differentiated from method calls with an initial '::' global scope modifier.

class StorageZone {
public:
	void Store(const char *s) {
		Media *mediaStore = ::GetBaseMedia(zoneDefault);
		for (int i=mediaStore->cursor; mediaStore[i]; i++) {
			mediaStore->Persist(s[i]);
		}
	}
};

Submitting a lexer

Lexers have been moved to the separate Lexilla project on GitHub, which may have updated instructions.

Add an issue or pull request on the Lexilla project page.

Define all of the lexical states in a modified LexicalStyles.iface.

Ensure there are no warnings under the compiler you use. Warnings from other compilers will be noted on the feature request.

sc.ch is an int: do not pass this around as a char.

The ctype functions like isalnum and isdigit only work on ASCII (0..127) and may cause undefined behaviour including crashes if used on other values. Check with IsASCII before calling is*.
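A minimal sketch of guarding a ctype call follows. The IsASCII shown here is an assumed definition matching the 0..127 range given above, and the real helper may differ. IsAlphaNumeric is a hypothetical wrapper.

```cpp
#include <cassert>
#include <cctype>

// Assumed definition for this sketch; the project's own IsASCII may differ.
constexpr bool IsASCII(int ch) noexcept {
	return (ch >= 0) && (ch < 0x80);
}

// Hypothetical wrapper that is safe for any int, such as a value from
// sc.ch: isalnum is only reached when ch is in the ASCII range.
inline bool IsAlphaNumeric(int ch) noexcept {
	return IsASCII(ch) && (std::isalnum(ch) != 0);
}
```

Because the parameter is an int, a character beyond ASCII such as 0x4E2D is rejected by the range check instead of being truncated to a char or passed to isalnum.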

Functions, structs and classes in lexers should be in an unnamed namespace (see LexCPP) or be marked "static" so they will not leak into other lexers.
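The two ways of keeping lexer helpers private can be sketched as below. Every name is hypothetical and this is not taken from LexCPP. ExampleLexerMeasure stands in for whatever single entry point a lexer exports.

```cpp
#include <cassert>

namespace {

// Internal linkage: invisible to other lexers linked into the same library.
constexpr bool IsIdentifierStart(int ch) noexcept {
	return ((ch >= 'a') && (ch <= 'z')) || ((ch >= 'A') && (ch <= 'Z')) || (ch == '_');
}

}

// 'static' gives a single function the same internal linkage.
static int CountIdentifierStarts(const char *s) noexcept {
	assert(s);
	int count = 0;
	for (; *s; s++) {
		if (IsIdentifierStart(static_cast<unsigned char>(*s))) {
			count++;
		}
	}
	return count;
}

// Hypothetical exported entry point: the only name other code can see.
int ExampleLexerMeasure(const char *s) {
	return CountIdentifierStarts(s);
}
```

Another lexer can define its own IsIdentifierStart with different rules without causing a link error or silently sharing this one.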

If you copy from an existing lexer, remove any code that is not needed since it makes it more difficult to maintain and review.

When modifying an existing lexer, try to maintain as much compatibility as possible. Do not renumber lexical styles as current client code may be built against the earlier values.

Properties

Properties provided by a new lexer should follow the naming conventions and should include a comment suitable for showing to end users. The convention is for properties that control styling to be named lexer.<lexername>.* and those that control folding to be named fold.<lexername>.*. Examples are "lexer.python.literals.binary" and "fold.haskell.imports".

The properties "fold" and "fold.comment" are generic and can be used by any lexer.

See LexPython for examples of properties in an object lexer and LexHTML for a functional lexer.