Tree-sitter beyond syntax highlighting
Abin Simon (IRC: meain on libera.chat, Matrix: @meain:matrix.org, mail@meain.io)
In this talk, Abin Simon shares many ways in which Tree-sitter can help improve your text editing workflow. Afterwards, he will answer questions via IRC.
- 00:00.000 Opening
- 00:24.201 Introduction to Tree-sitter
- 00:50.280 Querying Tree-sitter tree
- 01:37.040 Syntax highlighting
- 02:15.640 Custom syntax highlighting
- 03:47.120 Text objects
- 05:48.760 Code folding
- 06:20.480 Navigating config files
- 08:10.480 Navigating code
- 08:21.560 Intelligent templates
- 09:31.520 Structural editing
- 09:59.080 tree-sitter-save-excursion
- 10:26.240 The future
Description
Tree-sitter has seen a lot of development recently, but more often than not folks are only aware of its use for syntax highlighting. The idea of this talk is to introduce some other use cases where they could benefit from Tree-sitter.
This talk will be an overview, with demos, of the kinds of things that they will be able to do with Tree-sitter, but it won't go in depth into how they would do all of them. The presentation will link to the resources mentioned during the talk where folks can learn more about each of them.
This session will introduce them to things like (not a final list):
- Text objects using tree-sitter: https://github.com/meain/evil-textobj-tree-sitter/
- Folding using tree-sitter: https://github.com/emacs-tree-sitter/ts-fold
- Navigating config headings: https://blog.meain.io/2022/navigating-config-files-using-tree-sitter/
- Using tree-sitter for narrowing: https://blog.meain.io/2022/more-treesitter-emacs/#narrow-to-language-level-constructs
- Intelligent snippets using tree-sitter: https://blog.meain.io/2021/intelligent-snippets-treesitter/
- Using tree-sitter to get which-func like functionality: https://blog.meain.io/2022/more-treesitter-emacs/#show-current-class%2Ffunction-name-in-modeline
- Some useful tree-sitter functions: tree-sitter-save-excursion
Discussion
Notes
- The speaker's blog: https://blog.meain.io/
- Fancy Narrow: https://github.com/Malabarba/fancy-narrow
- Text objects using tree-sitter in evil-mode: https://github.com/meain/evil-textobj-tree-sitter/
- Notes/Slides: https://github.com/meain/emacsconf-talk-tree-sitter
Questions and answers
- Q: Which tree-sitter package is being used? I think there are three different ones.
- A: Most of what is demoed here is using https://github.com/emacs-tree-sitter/elisp-tree-sitter
- Q: Can the folds be treated as outlines, as in outline-minor-mode folds?
- A: I don't think the package ts-fold which I showcased works with outline mode, but it should be simple enough to add something like that (https://github.com/emacs-tree-sitter/ts-fold)
- Q: Is there any benefit to using tree-sitter for sexp-based languages? (+1)
- A: Being able to query for specific things like variables / conditions might come in handy
- Q: Do you have to have an LSP set up in order to use tree-sitter?
- A: I still use eglot for LSP. While tree-sitter helps with highlighting, folding, navigation, etc., it is better thought of as working on a single file. So when I need to do project-wide things like jump to definition, find references, or renames, LSP comes in handy.
- Q: Is there any example configuration for the transition from a traditional major mode to the new *-ts-major-mode? It seems that the configuration of the major mode (xxx-mode-hook, yasnippet, etc.) has to be rewritten.
- A: I am just starting to work with builtin tree-sitter, so don't have much input here unfortunately
- Q: So, is there a tree-sitter language definition for elisp?
- A: I'm just starting to look into built-in tree sitter, but I feel like we should be able to do all of them.
- Q: awesome stuff. i always wonder when itll appear in my fingers. sure the lib is in 29 but i guess some glue is required?
- Q: Thanks for the great talk. I have one question: will tree-sitter be able to highlight syntax in source blocks of Org files?
- A: there is nothing technically stopping one from enabling highlighting in config blocks. Since I don't use org, I've not really looked into how it currently is.
- Q: what about the relationship between emacs-tree-sitter and treesitter in core emacs
- A: There are just two Emacs-side implementations for the tree-sitter lib https://tree-sitter.github.io/tree-sitter/ . The first one is in Rust and the second in C within core.
- A: I have not extensively tested out the builtin one, but both should be more or less the same. The builtin one is less mature as of now and has a slightly different api.
- A: Plus most plugins that work with tree-sitter will be working with elisp-tree-sitter only as of now
- I'm speaking of the third-party package. Perhaps it has been fixed. IIUC the issue is not the Emacs package but rather tree-sitter itself.
- A: You might wanna open an issue at https://github.com/emacs-tree-sitter/tree-sitter-langs if you are having issues with tree-sitter highlighting
- Q: biggest difference between the treesitter functionality built into Emacs 29 and the emacs-tree-sitter github package?
- Q: Are there any sample configurations about *-ts-mode integrated with default major-mode?
- Q: Building Emacs 29 with native tree-sitter support seems challenging. Any useful tips?
- building emacs ... tips from Xah Lee http://xahlee.info/emacs/emacs/building_emacs_on_linux.html
- Q: How much of what you showed can be done with the built-in tree-sitter?
- Q: How easy is it to hack the syntax definition?
- A: It is super easy once you learn a bit about tree-sitter. This is what the highlight queries look like: https://github.com/emacs-tree-sitter/tree-sitter-langs/blob/master/queries/python/highlights.scm Once we have this, the tree-sitter integration can take care of the rest.
- Q: So Emacs 29 includes the original tree sitter C library by Max Brunsfeld or is it a custom rewrite?
- A: Both elisp-tree-sitter and tree-sitter in Emacs core are Emacs-side wrappers on top of the tree-sitter lib from Max.
- Q: What was the name of the module used for doing AST queries on the current buffer?
- A: For viewing and querying the tree, there are built-in commands: tree-sitter-debug-mode and tree-sitter-query-builder
Other IRC discussions
- thanks for the great talk meain!
- thank you for the talk
- Great talk. Can I use this with Python? Bash?
- Amazing stuff!!! I need that YAML thing!
- When I am writing lisp macros I'm always having problems with the highlighting. I'm seeing this can be achieved with tree-sitter. Is there a more streamlined way of doing it with tree-sitter, with less code?
- thank you for tree-sitter talk, It's awesome
- Now I definitely need to try tree-sitter
- very inspiring talk, the future looks bright!
- Very well done talk. Thank you.
- I've actually added a lot of highlighting for rust mode on my editor, using tree sitter. It's very powerful once you get into it
- with the new *-ts-mode, seems that a lot of configurations of language specific major mode have to be rewritten
- A: yup, there are quite a few things that are being rewritten a bit, like indent, highlight etc.
- yup, previously I would just jump to top after reformatting code. These days I've been also trying https://github.com/radian-software/apheleia which has been pretty good at keeping the position by using some other methods
- I have used your tree-sitter package for a long time. It works very stably.
- I used tree-sitter to write a plugin to replace paredit: https://github.com/manateelazycat/grammatical-edit
Transcript
[00:00:00.000] Hey everyone, my name is Abin Simon and this talk is about "Tree-sitter: Beyond Syntax Highlighting." For those who are not aware of what Tree-sitter is, let me give you a quick intro. Tree-sitter, at its core, is a parser generator tool and an incremental parsing library. What it essentially means is that it gives you an always up-to-date AST [abstract syntax tree] of your code. In the current Emacs frame, what you see to the right
[00:00:27.960] is the AST tree produced by Tree-sitter of the code that is on the left. For example, if you go to this "if" statement, you can see it goes here. It is also really good at handling errors. For example, if I were to delete this [if statement], it still parses out a tree as much as it can, but with an error node.
[00:00:50.280] Now let's see how we can query the tree to get the information that we need. Let's first try to get all the identifiers in the buffer. It highlights all the identifiers in the buffer, but let's say we want to get something a little more precise. Let's say we wanted to get this "i" here. This, in our case, would be this identifier inside this assignment expression inside this "for" statement. We can write it out like this. I hope this gives you a basic idea of how Tree-sitter works and how you can query to get the information that you need.
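The nested lookup described here (an identifier inside an assignment expression inside a "for" statement) can be modelled in a few lines. The sketch below is a toy model in Python, not Tree-sitter's API: the `Node` class, the `query` helper, and the node kinds (`for_statement`, `assignment_expression`, `identifier`) are assumptions made for illustration, and real node names depend on the grammar in use.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a syntax tree node (not a Tree-sitter type)."""
    kind: str
    text: str = ""
    children: list = field(default_factory=list)


def walk(node):
    """Yield node and all of its descendants in pre-order."""
    yield node
    for child in node.children:
        yield from walk(child)


def match_chain(node, kinds):
    """Yield the last node of every parent -> child chain of `kinds` rooted at node."""
    if node.kind != kinds[0]:
        return
    if len(kinds) == 1:
        yield node
        return
    for child in node.children:
        yield from match_chain(child, kinds[1:])


def query(root, kinds):
    """Try the chain at every node of the tree and collect the matches."""
    return [match for node in walk(root) for match in match_chain(node, kinds)]


# Hand-built tree for something like: for (i = 0; ...) { print(...) }  total = 1
root = Node("program", children=[
    Node("for_statement", children=[
        Node("assignment_expression", children=[
            Node("identifier", "i"), Node("number", "0")]),
        Node("call_expression", children=[Node("identifier", "print")]),
    ]),
    Node("assignment_expression", children=[
        Node("identifier", "total"), Node("number", "1")]),
])

# Every identifier in the "buffer"
all_ids = [n.text for n in query(root, ["identifier"])]
# Only the identifier directly inside an assignment inside a for statement
loop_ids = [n.text for n in query(
    root, ["for_statement", "assignment_expression", "identifier"])]

print(all_ids)   # ['i', 'print', 'total']
print(loop_ids)  # ['i']
```

The broad pattern matches every identifier, while the longer chain narrows the result to the loop variable, which is the same narrowing step shown in the demo.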
[00:01:37.040] First of all, let's see how Tree-sitter can help us with syntax highlighting. This is the default syntax highlighting by Emacs for SQL. Now let's see how Tree-sitter helps. This is the syntax highlighting in Emacs which Tree-sitter enabled. You'll see that we're able to target a lot more things and highlight them. That said, you don't always have to highlight everything. I personally prefer a much simpler theme.
[00:02:15.640] Now let's see how Tree-sitter helps you simplify adding custom syntax highlighting to your code. This is a Python file which has a class and a few member functions. Anyone who has used Python will know that the "self" keyword, while it is passed in as an argument, it has more meaning than that. Let's see if you can use Tree-sitter to highlight just the "self" keyword. If you look at the Tree-sitter tree, you can see that this is the first identifier in the list of parameters for a function definition. This is how you would query for the first identifier inside parameters inside a function definition. Now, if you see here, it also matches "cls", but let's restrict it to match just "self". Now we have a Tree-sitter query that identifies the first argument to the function definition and is also called "self". We can use this to apply custom highlighting onto this. This is pretty much all the code that you'll need to do this. The first block here is essentially to say to Tree-sitter to highlight anything with python.self with the face of custom-set. Now the second block here essentially is how we match for that. Now if you go back into a Python buffer and re-enable python-mode, we'll see that "self" is highlighted differently.
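To make the matching rule concrete (first parameter of a function definition, and named "self"), here is a minimal sketch that applies the same rule with Python's standard `ast` module instead of Tree-sitter. It is a swapped-in technique for illustration only: the `find_self_params` helper and the `SAMPLE` source are hypothetical, and the talk's actual setup uses a Tree-sitter query plus Emacs Lisp configuration.

```python
import ast

SAMPLE = '''class Greeter:
    def hello(self, name):
        return name

    @classmethod
    def make(cls):
        return cls()
'''


def find_self_params(source):
    """Return (line, column) for each first parameter that is named 'self'."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Positional-only parameters come before regular ones
            params = node.args.posonlyargs + node.args.args
            # Rule 1: first parameter; rule 2: it is called "self"
            if params and params[0].arg == "self":
                hits.append((params[0].lineno, params[0].col_offset))
    return sorted(hits)


print(find_self_params(SAMPLE))  # [(2, 14)]
```

Only the `self` in `hello` is reported; `cls` in `make` is also a first parameter but fails the name check, which mirrors the restriction added to the query in the demo.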
[00:03:47.120] How about creating text objects? Tree-sitter can help there too. For those who don't know, text objects is an idea that comes from Vim, and you can do things like select word, delete word, things like that. There are other text objects like line and paragraph. For each text object, you can have operations that are defined on them. For example, delete, copy, select, comment, all of these are operations that you can do. Let's try and use Tree-sitter to add more text objects. This is a plugin that I wrote which lets you add more text objects into Emacs. It gives you code-aware text objects like functions, conditionals, loops, and such. Let's see an example scenario of how something like this could come in handy. For example, I can select inside this condition or inside this function and do things like that. Let's say I want to take this conditional, move to the next function, and create it here. What I would do is something like delete the conditional, move to the next function, create a conditional there, and paste. Let's try another example. Let's say I want to take this and move it to the end. If I had to do it without text objects, I'd probably have to go back to the previous comma, delete till next comma, find the closing bracket, and paste before. That works, but let's see how Tree-sitter can simplify it. With Tree-sitter, I can say delete the argument, go to the end of the next argument, and then paste. Tree-sitter essentially helps Emacs understand the code better semantically. Here is yet another use case. I work at a remote company, and I often find myself being in a call with my teammates, explaining the code to them. And one thing that really comes in handy is the narrowing capability of Emacs. Specifically, the fancy-narrow package. I use it to narrow just the function, or I could narrow to the conditional.
[00:05:48.760] Next on the list would be code folding. This is a package which uses Tree-sitter to improve the code folding functionalities of Emacs. Code folding has always been this thing that I've had a love-hate relationship with. It usually works most of the time, but then fails if the indentation is wrong or we do something weird with the arguments. But now with Tree-sitter in the mix, it's a lot more precise. I can fold comments, I can fold functions, I can fold conditionals. You get the idea.
[00:06:20.480] I work with Kubernetes, which means I end up having to write and read a lot of YAML files. And navigating big YAML files is a mess. The two main problems are, one, figuring out where I am, and two, navigating to where I want to be. Let's see how Tree-sitter can help us with both of these. This is an example YAML file. To be precise, this is the values file of the Redis helm chart. I'm somewhere in the file on tag under image, but I don't know what this tag is for. But with the help of Tree-sitter, I've been able to add this information into my header line. If you see in the header line, you'll see that I'm under sentinel.image. Now let's see how this helps with navigation. Let's say I want to enable persistence on master node. So with the help of Tree-sitter, I was able to enumerate every field that is available in this YAML file, and I can pass that information onto imenu, which I can then use to go to exactly where I want to. Also, since we're not dealing with any language specific constructs, this is very easy to extend to other similar languages or config files in this case. So for example, this is a JSON file, and I can navigate to location or project. And just like in YAML, it shows me where I'm at. I'm in projects.name, or I'm inside projects.highlights. Or how about Nix? This is my home.nix file. Again, I can search for services, and this lists all the services that I've enabled. How about just services.description? So this is all the services that I've enabled and have descriptions.
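The "enumerate every field" step amounts to flattening nested keys into dotted paths such as sentinel.image.tag. The sketch below shows that idea on JSON using only Python's standard `json` module rather than a Tree-sitter tree, so it carries no buffer positions; the `dotted_paths` and `index_entries` helpers and the `CONFIG` sample are hypothetical names for illustration.

```python
import json

CONFIG = """
{
  "sentinel": {"image": {"tag": "7.0"}},
  "master": {"persistence": {"enabled": false}},
  "projects": [{"name": "a"}, {"name": "b"}]
}
"""


def dotted_paths(value, prefix=""):
    """Yield a dotted path for every key nested anywhere under value."""
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            yield path
            yield from dotted_paths(child, path)
    elif isinstance(value, list):
        # List items do not add a path segment; they share the parent's path
        for child in value:
            yield from dotted_paths(child, prefix)


def index_entries(text):
    """Return the unique dotted paths of a JSON document, in document order."""
    # dict.fromkeys drops repeated paths (from list items) and keeps order
    return list(dict.fromkeys(dotted_paths(json.loads(text))))


for entry in index_entries(CONFIG):
    print(entry)
```

A list like this is what would be handed to a completion interface such as imenu; in the Tree-sitter version each entry would also carry the position of its node so the editor can jump there.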
[00:08:10.480] Now that we have seen this for config files, let's see how similar things apply for code. Just like in config files, I can see which function I'm under, and if I go to the next function, it changes.
[00:08:21.560] Okay, here is something really awesome. This is probably one of my favorites, and one of the things that actually made me understand how powerful Tree-sitter is, and got me into it. I work with a lot of Go code, and anyone who has worked with Go will tell you how repetitive it is handling errors. For those who don't write Go, let me give you a rough idea of what I'm talking about. If you want to bubble up the error, the way you would do it is just to return the error to the function that called it. Over here, you can either return nil or an empty value, and at the end, you return error. Let's try and use Tree-sitter to do this. Using the help of Tree-sitter, let's make Emacs go back, figure out what the return arguments are, figure out what their default values are, and automatically fill in the return statement. It would look something like this. In my case, it filled in the complete form, it figured out what the return arguments are, what their types are, and what their default values are, and filled out the entire return. And since this is a template, I can go to the next function, do the same thing, next function, do the same thing, next function, do the same thing.
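Once the return types of the enclosing function are known, filling in the return statement is a small mapping from each Go type to a placeholder value. The sketch below covers only that last step, in Python; in the talk's setup the types come from the Tree-sitter tree, whereas here they are passed in as a list. The `fill_value` and `error_return` helpers are hypothetical, and treating any unknown named type as a struct is an assumption that will be wrong for named non-struct types.

```python
NUMERIC_TYPES = {
    "int", "int8", "int16", "int32", "int64",
    "uint", "uint8", "uint16", "uint32", "uint64", "uintptr",
    "float32", "float64", "complex64", "complex128", "byte", "rune",
}

# Type spellings whose zero value is nil in Go
NIL_PREFIXES = ("*", "[]", "map[", "chan ", "<-chan ", "func(")


def fill_value(go_type):
    """Return the text to put in a bubbled-up error return for one Go type."""
    if go_type == "error":
        return "err"  # the error being passed up, not a zero value
    if go_type == "string":
        return '""'
    if go_type == "bool":
        return "false"
    if go_type in NUMERIC_TYPES:
        return "0"
    if go_type.startswith(NIL_PREFIXES) or go_type in ("any", "interface{}"):
        return "nil"
    # Assumption: any other named type is a struct, so use a composite literal
    return go_type + "{}"


def error_return(return_types):
    """Build the return statement for a function with the given return types."""
    return "return " + ", ".join(fill_value(t) for t in return_types)


print(error_return(["int", "string", "error"]))  # return 0, "", err
print(error_return(["*User", "error"]))          # return nil, err
print(error_return(["Config", "error"]))         # return Config{}, err
```

Because the statement is derived from the function's signature, the same template produces a different, correct-looking return in each function it is expanded in.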
[00:09:31.520] Here is a really fascinating use case of Tree-sitter, structural editing. You might be aware of plugins like paredit, which seems to "know" your code. This sort of takes it onto another level. It is in its early stages, but what this lets you do is completely treat your code as an AST, and edit as if it's a tree instead of characters. I am not going to go much in depth into it, but if you're interested, there is a talk from last year's EmacsConf around it.
[00:09:59.080] I'm just going to end this with one last tiny thing that I found in the tree-sitter-extras package. It's this tiny macro called tree-sitter-save-excursion. It works pretty much like save-excursion, but better. It uses the Tree-sitter syntax tree instead of just the code to figure out where to restore the position. My main use case for this was with code formatters. Since the code moves around a lot when it gets formatted, save-excursion was completely useless, but this came in handy.
[00:10:26.240] I'll just leave you off with what the future of Tree-sitter looks like for Emacs. So far, every Tree-sitter related feature that I've talked about is powered by this library. But there is talk about Tree-sitter coming into the core. It will most probably be landing in Emacs 29, and if you want to check out the work on Tree-sitter in core Emacs, you can check out the feature/tree-sitter branch. You'll probably see more and more features and packages relying upon Tree-sitter, and even major modes being powered by Tree-sitter. And that's a wrap from me. Thank you.
Captioner: sachac
Questions or comments? Please e-mail mail@meain.io