Posts

Showing posts from July, 2020

Altering WisiToken data structures, take 2

Found another paper; Overbey 2004 - Practical, Incremental, Noncanonical Parsing: Celentano's Method and the Generalized Piecewise LR Parsing Algorithm. That at least discusses incremental generalized parsing, but it gives no details on the syntax tree structure. Implementing code is also more educational than thinking about it. It proved impractical to keep both the array-based and allocation-based syntax trees, so I just changed it to allocation-based. Then I ran into the problem of finalizing the syntax tree; many nodes are referenced several times, in several different parse streams, so we can't traverse each stream to deallocate nodes. That also means we can't delete the nodes created by a stream when the stream terminates due to a parse error. I ended up adding an array of node access, and using that to deallocate all the nodes when finalizing the tree. In order to delete nodes from terminated streams, we wait until the parse is complete, then copy the final tree from...

Bibliography

I'll maintain a bibliography list in this post (alphabetical by author or web site name): papers/books: Lahav 2004 - Efficient Semantic Analysis for Text Editors     Describes some implementation details for a parser based on Wagner, Graham 1998 Leake 2020 - Robust error correction in a generalized LR parser     Describes the error correction algorithm used in WisiToken Overbey 2004 - Practical, Incremental, Noncanonical Parsing: Celentano's Method and the Generalized Piecewise LR Parsing Algorithm      No details on data structures. Wagner, Graham 1998 - Efficient and Flexible Incremental Parsing     T horough discussion of incremental parsing, but does not discuss generalized parsing. Web sites: Lezer  - incremental generalized parser/generator, written in Typescript Tree-sitter - incremental generalized parser/generator; generator written in Rust, parser in C

Daydreams about incremental vs partial parse

WisiToken currently supports "partial-parse"; Emacs passes only part of a file to the parser, in order to speed up the response. Emacs ada-mode uses regular expressions to find the start and end point of the parse region. This works fairly well, but not as well as I'd like, which is the motivation behind implementing incremental parse. Incremental parse is initially intended to replace partial-parse, but it might be useful to keep both. "Standard" incremental parse requires a full parse when the file is first opened; that will be my initial approach. Some of my customers report that the most common use case is navigating in files, to understand the code; navigation typically requires a full file parse anyway. I've daydreamed about relaxing that; using the current partial parse initially, and extending the parsed region on demand. There could be several disjoint parsed regions in a file (that's supported now by the wisi Gnu ELPA package; it does not requi...

Altering current WisiToken data structures

We must decide how to alter the current WisiToken data structures to accommodate incremental parsing. (I'll use "we" in similar contexts, because that's the style in academic papers (it refers the reader in collaboration with the author). But there's no one else involved here, just me). Another useful paper: Lahav 2004 - Efficient Semantic Analysis for Text Editors That describes a "kmn-list" for representing changes to a source file. We'd eventually like to support the Language Server Protocol; that represents a text change by an affected range and the inserted text (https://microsoft.github.io/language-server-protocol/specifications/specification-current/#textDocument_didChange). Lahav 2004 also describes a P-Tree, which is an enhanced parse tree used for incremental parsing. It is a list of partial parse trees, which can range from the list of tokens produced by the lexer, to a single fully parsed tree. Each P-Tree node is allocated from memory se...

Just getting started

I'm just getting started adding incremental parse to WisiToken , so I decided to try writing a blog about the process. Maybe people will be interested in this experience. First step; literature search, mostly in Google scholar. The best paper seems to be Wagner, Graham 1998 Efficient and Flexible Incremental Parsing . There are many papers that cite that one, but none seems to improve on it much. Next step: add a test to the WisiToken test set. It won't be very elaborate; just a string to lex and parse, an edit to the string, and expected results - a token sequence and some representation of the syntax tree. I'll add cases as the code progresses.