Adding Incremental parse to WisiToken

Posts

Update the Ada parser

September 28, 2020

The WisiToken error recovering parser was not hard to update to use the new syntax tree; once that was working, I started updating the Ada parser to use it. I took a detour here; Ada 2020 is almost out, so I added the new syntax to the Ada parser. Along the way I found a better way to implement the indent engine. That's all working now; the ada-mode tests all pass. So back to incremental parse. Lahav 2004 describes an "incremental scanner"; only the tokens that actually require scanning due to text changes are scanned. The other tokens following any change are adjusted to reflect the change in position. WisiToken uses the term "lexer" instead of "scanner"; it uses the re2c lexer. The lexing process must modify the existing terminal stream in the syntax tree, rather than creating a new one; so it makes sense to also modify the parse tree at the same time, breaking it down where it must change.

Altering WisiToken data structures, take 2

July 31, 2020

Found another paper; Overbey 2004 - Practical, Incremental, Noncanonical Parsing: Celentano's Method and the Generalized Piecewise LR Parsing Algorithm. That at least discusses incremental generalized parsing, but it gives no details on the syntax tree structure. Implementing code is also more educational than thinking about it. It proved impractical to keep both the array-based and allocation-based syntax trees, so I just changed it to allocation-based. Then I ran into the problem of finalizing the syntax tree; many nodes are referenced several times, in several different parse streams, so we can't traverse each stream to deallocate nodes. That also means we can't delete the nodes created by a stream when the stream terminates due to a parse error. I ended up adding an array of node access, and using that to deallocate all the nodes when finalizing the tree. In order to delete nodes from terminated streams, we wait until the parse is complete, then copy the final tree from...

Bibliography

July 25, 2020

I'll maintain a bibliography list in this post (alphabetical by author or web site name): papers/books: Lahav 2004 - Efficient Semantic Analysis for Text Editors Describes some implementation details for a parser based on Wagner, Graham 1998 Leake 2020 - Robust error correction in a generalized LR parser Describes the error correction algorithm used in WisiToken Overbey 2004 - Practical, Incremental, Noncanonical Parsing: Celentano's Method and the Generalized Piecewise LR Parsing Algorithm No details on data structures. Wagner, Graham 1998 - Efficient and Flexible Incremental Parsing T horough discussion of incremental parsing, but does not discuss generalized parsing. Web sites: Lezer - incremental generalized parser/generator, written in Typescript Tree-sitter - incremental generalized parser/generator; generator written in Rust, parser in C

Daydreams about incremental vs partial parse

July 25, 2020

WisiToken currently supports "partial-parse"; Emacs passes only part of a file to the parser, in order to speed up the response. Emacs ada-mode uses regular expressions to find the start and end point of the parse region. This works fairly well, but not as well as I'd like, which is the motivation behind implementing incremental parse. Incremental parse is initially intended to replace partial-parse, but it might be useful to keep both. "Standard" incremental parse requires a full parse when the file is first opened; that will be my initial approach. Some of my customers report that the most common use case is navigating in files, to understand the code; navigation typically requires a full file parse anyway. I've daydreamed about relaxing that; using the current partial parse initially, and extending the parsed region on demand. There could be several disjoint parsed regions in a file (that's supported now by the wisi Gnu ELPA package; it does not requi...

Altering current WisiToken data structures

July 25, 2020

We must decide how to alter the current WisiToken data structures to accommodate incremental parsing. (I'll use "we" in similar contexts, because that's the style in academic papers (it refers the reader in collaboration with the author). But there's no one else involved here, just me). Another useful paper: Lahav 2004 - Efficient Semantic Analysis for Text Editors That describes a "kmn-list" for representing changes to a source file. We'd eventually like to support the Language Server Protocol; that represents a text change by an affected range and the inserted text (https://microsoft.github.io/language-server-protocol/specifications/specification-current/#textDocument_didChange). Lahav 2004 also describes a P-Tree, which is an enhanced parse tree used for incremental parsing. It is a list of partial parse trees, which can range from the list of tokens produced by the lexer, to a single fully parsed tree. Each P-Tree node is allocated from memory se...

Just getting started

July 24, 2020

I'm just getting started adding incremental parse to WisiToken , so I decided to try writing a blog about the process. Maybe people will be interested in this experience. First step; literature search, mostly in Google scholar. The best paper seems to be Wagner, Graham 1998 Efficient and Flexible Incremental Parsing . There are many papers that cite that one, but none seems to improve on it much. Next step: add a test to the WisiToken test set. It won't be very elaborate; just a string to lex and parse, an edit to the string, and expected results - a token sequence and some representation of the syntax tree. I'll add cases as the code progresses.

Search This Blog

Adding Incremental parse to WisiToken

Posts

Starting on incremental parser

Update the Ada parser

Altering WisiToken data structures, take 2

Bibliography

Daydreams about incremental vs partial parse

Altering current WisiToken data structures

Just getting started