Incremental Parsing with emacs-tree-sitter

Tuấn-Anh Nguyễn

Tree-sitter is a parser generator and an incremental parsing library. emacs-tree-sitter is its most popular Emacs binding; it aims to be the foundation for Emacs packages that understand the structure of source code. Examples include better code highlighting, folding, indexing, and structural navigation.

In this talk, I will describe the current state of emacs-tree-sitter's APIs and functionality. I will also discuss areas that need improvement and contributions from the community.

  • Actual start and end time (EST): Start: 2020-11-29T09.49.24; Q&A: 2020-11-29T10.13.56; End: 2020-11-29T10.31.44

Questions

Q20: Can we integrate it with the Spacemacs Python layer?

Q19: The Python mode example was pretty good. Is that something that one can use already?

Yes, already using it at work right now.

Q18: Regarding Emacs integration, will it always need to be a foreign library or can it be included / linked directly in compilation?

Building a parser from source needs Node.js (https://tree-sitter.github.io/tree-sitter/creating-parsers#dependencies), so I don't know whether it'll be in-tree and included at compile time.

The core library is a dynamic module; it would be better for it to be included in core Emacs eventually. Language definitions might be better distributed separately.

Q17: Is there a link to the slides?

Yes, will post in IRC later.

Slides: https://ubolonton.org/slides/emacs-tree-sitter-emacsconf2020.pdf

Q16: Are there any language major modes that have integrated already?

Not yet (answered during talk).

TypeScript: integration is being discussed, but it is not integrated yet.

Q15: Is it possible to use tree-sitter for structural editing?

Covered by Q4 / Q8 / Q11.

Q14: Is there a folding mode for tree-sitter?

Not yet. There are multiple code folding frameworks inside Emacs, and it's better to integrate with these modes rather than writing something new entirely.

+1 Would be nice if it worked with outshine mode or similar.

Q13: MaxCity on IRC asks: "That pop up M-x window. How do you get that?"

Most likely ivy-posframe (https://github.com/tumashu/ivy-posframe/). Or not. Cool!

It's custom helm code.

Q12: I'm new to the tree-sitter world. Is it easy to install/use it also on Windows? (I have to use winbloat at work)

The usual approach is to hope someone else has made a precompiled version and download it. Otherwise you'll have to set up a development environment with mingw-msys or whatever.

  • No, both tree-sitter and tree-sitter-langs provide pre-compiled binaries for macOS, Linux, and Windows.

Yes, it should work out-of-the-box on Windows, provided that Emacs was compiled with module support turned on.
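
A quick way to check the module-support condition from inside Emacs, as a rough sketch: the built-in variable module-file-suffix is nil when Emacs was built without dynamic module support. The message texts are purely illustrative.

```emacs-lisp
;; Evaluate with M-: or in the *scratch* buffer.
;; `module-file-suffix' is non-nil only when this Emacs can load dynamic modules.
(if (bound-and-true-p module-file-suffix)
    (message "Dynamic modules are supported (suffix: %s)" module-file-suffix)
  (message "This Emacs was built without dynamic module support"))
```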

Q11: Is it possible to use this for refactoring too?

For refactoring inside a single buffer, it's very doable right now with some glue code. More extensive refactoring, where you want to touch all files in a project, needs some kind of understanding of the language's module system and how modules are laid out in the filesystem, including files that are not yet loaded into Emacs. That sounds like something a lot more extensive. Sounds like an IDE in Emacs.

Q10: Can language major-mode authors start taking advantage of this now? Or is it intended to be used as a minor-mode?

It is a minor mode that the major modes depend on.

Q9: I'm completely new to tree-sitter, how do I use it as an end user? Is there an easy example config out there by the organizer or otherwise that shows standard usage with whatever programming language? Or are we not there yet?

Answering own question: Sounds like major mode maintainers need to integrate.

Syntax highlighting is pretty easy to activate: https://ubolonton.github.io/emacs-tree-sitter/getting-started/. Nice, tree-sitter-hl-mode looks easy.

Need to add more examples to the documentation.
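
As a minimal sketch of what end-user activation can look like, the snippet below follows the getting-started page linked above. It assumes the tree-sitter and tree-sitter-langs packages are already installed (for example from MELPA) and that Emacs has dynamic module support; details may differ between package versions.

```emacs-lisp
;; Init-file fragment; assumes `tree-sitter' and `tree-sitter-langs' are installed.
(require 'tree-sitter)
(require 'tree-sitter-langs)

;; Turn on tree-sitter-mode in every buffer whose major mode has a known grammar.
(global-tree-sitter-mode)

;; Switch on tree-sitter based highlighting whenever tree-sitter-mode is enabled.
(add-hook 'tree-sitter-after-on-hook #'tree-sitter-hl-mode)
```

Buffers in languages without a bundled grammar should simply keep their regular font-lock highlighting.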

Q8: (Following on from Q4) Could there be a standardised approach to coding automatic refactorings in the future? e.g. so that whichever language mode you are using, you could see a menu of available refactoring operations?

Not sure about this. Most refactoring operations are highly specific to a class of languages. There is no single approach for all languages, but maybe one for object-oriented languages, one for Lisp-type languages, one for JavaScript and TypeScript…

I meant the Lisp and user interfaces being unified, not the implementations of the refactorings. But maybe it belongs in a separate mode on top. So you could have a defrefactor macro or similar.

Q7: How extensive will the compatibility be between highlighting grammars for Emacs and those for Vim/Neovim with Tree-sitter?

For the time being, it looks like nvim-treesitter also uses the S-expression syntax for queries, so it shouldn't be too hard. See https://github.com/nvim-treesitter/nvim-treesitter/blob/master/queries/rust/highlights.scm.

  • No effort has been spent on compatibility yet. Each editor has its own existing conventions for highlighting. Having a common set of basic "capture names" is possible, and will require efforts from multiple editor communities. (Emacs and NeoVim for now. The editor that introduced Tree-sitter, Atom, hasn't used these queries for highlighting.)

Q6: Will it ever be possible to write Tree-sitter grammars in a Lisp, or will JS be required?

The grammar part is written in JSON; you don't actually need to understand JS to write it. Using Lisp would merely give you an S-expression version, which wouldn't buy you much.

  • Ah, so all that is needed is (json-encode '(grammar …))? Great!

Q5: Could you show the source that was matched by the parser in the debug view in addition to the grammar part matched?

Q4: Could this be used with packages like smartparens that aim to bring structural editing to non-S-expression-based languages? AST-based refactoring?

It is one of the goals, but not yet achieved.

Q3: Do you think Tree-sitter would be useful for Org buffers? I can imagine it being used to keep a parsed AST of an Org buffer (e.g. like org-element's output) updated in real time.

An obstacle here is Org not having anything anywhere close to a formal grammar, so that would need to be corrected first.

FIXME: Add link to an emacs-tree-sitter project/snippet for org-mode.

Q2: Will Elisp performance with GCCEmacs become competitive enough to make an Elisp implementation of Tree-sitter more attractive?

The point of this project is to reuse other people's efforts, not to rewrite them.

It's a possibility. In terms of probability, probably not. It's a huge amount of work. The GC latency is also a fundamental issue.

Q1: Do you think that this package can be included into Emacs/GNU ELPA?

Yes, it is just a matter of paperwork.

Notes

Sunday, Nov 29 2020, ~ 9:56 AM - 10:46 AM EST
Sunday, Nov 29 2020, ~ 6:56 AM - 7:46 AM PST
Sunday, Nov 29 2020, ~ 2:56 PM - 3:46 PM UTC
Sunday, Nov 29 2020, ~ 3:56 PM - 4:46 PM CET
Sunday, Nov 29 2020, ~10:56 PM - 11:46 PM +08
