Back to the talks Previous by time: Back to school with Emacs Next by time: How to incorporate handwritten notes into Emacs Orgmode Track: Development

Tree-sitter beyond syntax highlighting

Abin Simon (IRC: meain on libera.chat, Matrix: @meain:matrix.org, mail@meain.io)

In this talk, Abin Simon shares many ways in which Tree-sitter can help improve your text editing workflow. Afterwards, he will answer questions via IRC.

Schedule for Saturday, Dec 3 2022 (times in US/Eastern):

9:00-9:05 AM Saturday opening remarks (sat-open)
9:05-9:25 AM Emacs journalism (or everything's a nail if you hit it with Emacs) (journalism)
9:45-9:55 AM Back to school with Emacs (school)
10:05-10:15 AM How to incorporate handwritten notes into Emacs Orgmode (handwritten)
10:45-11:05 AM Writing and organizing literature notes for scientific writing (science)
11:25-11:35 AM The Emacs Buddy initiative (buddy)
1:00-1:20 PM Attending and organizing Emacs meetups (meetups)
1:40-1:55 PM Linking personal info with Hyperbole implicit buttons (buttons)
2:15-2:40 PM Real estate and Org table formulas (realestate)
3:00-3:25 PM Health data journaling and visualization with Org Mode and gnuplot (health)
3:45-4:05 PM Edit live Jupyter notebook cells with Emacs (jupyter)
4:50-4:55 PM Saturday closing remarks (sat-close)

Development track:
10:00-10:15 AM Tree-sitter beyond syntax highlighting (treesitter)
10:25-10:45 AM lsp-bridge: a smooth-as-butter asynchronous LSP client (lspbridge)
10:55-11:15 AM asm-blox: a game based on WebAssembly that no one asked for (asmblox)
11:25-11:35 AM Emacs should become a Wayland compositor (wayland)
1:00-1:25 PM Using SQLite as a data source: a framework and an example (sqlite)
1:50-2:30 PM Revisiting the anatomy of Emacs mail user agents (mail)
2:50-3:10 PM Maintaining the Maintainers: Attribution as an Economic Model for Open Source (maint)
3:35-3:40 PM Bidirectional links with eev (eev)
3:50-3:55 PM Short hyperlinks to Python docs (python)
4:05-4:35 PM Haskell code exploration with Emacs (haskell)

Format: 12-min talk followed by IRC Q&A (#emacsconf-dev)
Etherpad: https://pad.emacsconf.org/2022-treesitter
Discuss on IRC: #emacsconf-dev

Times in different timezones:
Saturday, Dec 3 2022, ~10:00 AM - 10:15 AM EST (US/Eastern)
which is the same as:
Saturday, Dec 3 2022, ~9:00 AM - 9:15 AM CST (US/Central)
Saturday, Dec 3 2022, ~8:00 AM - 8:15 AM MST (US/Mountain)
Saturday, Dec 3 2022, ~7:00 AM - 7:15 AM PST (US/Pacific)
Saturday, Dec 3 2022, ~3:00 PM - 3:15 PM UTC
Saturday, Dec 3 2022, ~4:00 PM - 4:15 PM CET (Europe/Paris)
Saturday, Dec 3 2022, ~5:00 PM - 5:15 PM EET (Europe/Athens)
Saturday, Dec 3 2022, ~8:30 PM - 8:45 PM IST (Asia/Kolkata)
Saturday, Dec 3 2022, ~11:00 PM - 11:15 PM +08 (Asia/Singapore)
Sunday, Dec 4 2022, ~12:00 AM - 12:15 AM JST (Asia/Tokyo)

Chapters:
00:00:00.000 Opening
00:24.201 Introduction to Tree-sitter
00:50.280 Querying Tree-sitter tree
01:37.040 Syntax highlighting
02:15.640 Custom syntax highlighting
03:47.120 Text objects
05:48.760 Code folding
06:20.480 Navigating config files
08:10.480 Navigating code
08:21.560 Intelligent templates
09:31.520 Structural editing
09:59.080 tree-sitter-save-excursion
10:26.240 The future

Description

Tree-sitter has seen a lot of development recently, but more often than not folks are only aware of its use for syntax highlighting. The idea of this talk is to introduce some other use cases where they could benefit from Tree-sitter.

This talk will be an overview, with demos, of the kinds of things folks can do with Tree-sitter, but it won't go in depth into how to implement each of them. The presentation will link to the resources mentioned during the talk, where folks can learn more about each of them.

This session will introduce them to things like (not final list):

Transcript

[00:00:00.000] Hey everyone, my name is Abin Simon and this talk is about "Tree-sitter: Beyond Syntax Highlighting." For those who are not aware of what Tree-sitter is, let me give you a quick intro. Tree-sitter, at its core, is a parser generator tool and an incremental parsing library. What it essentially means is that it gives you an always up-to-date AST [abstract syntax tree] of your code. In the current Emacs frame, what you see to the right

[00:00:27.960] is the AST produced by Tree-sitter for the code on the left. For example, if I go to this "if" statement, you can see it goes here. It is also really good at handling errors. For example, if I were to delete this [if statement], it still parses out as much of a tree as it can, but with an error node.

[00:00:50.280] Now let's see how we can query the tree to get the information that we need. Let's first try to get all the identifiers in the buffer. It highlights all the identifiers in the buffer, but let's say we want to get something a little more precise. Let's say we wanted to get this "i" here. This, in our case, would be this identifier inside this assignment expression inside this "for" statement. We can write it out like this. I hope this gives you a basic idea of how Tree-sitter works and how you can query to get the information that you need.
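To make the difference between "every identifier" and "this particular identifier" concrete, here is a self-contained Python sketch. It does not use Tree-sitter's API. Instead, it hand-builds a small tree with hypothetical node type names and implements a chain-of-direct-children lookup, which roughly mirrors how a nested Tree-sitter pattern such as `(for_statement (assignment_expression (identifier) @name))` matches. The `Node` class, `walk`, and `query_path` are illustrative assumptions, and a real grammar may name or nest these nodes differently.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a Tree-sitter node: a type, text, and children."""
    type: str
    text: str = ""
    children: list[Node] = field(default_factory=list)


def walk(node):
    """Yield `node` and all of its descendants, depth first."""
    yield node
    for child in node.children:
        yield from walk(child)


def query_path(root, path):
    """Find nodes matching a chain of node types, each a direct child of the previous.

    Every node whose type equals path[0] is a possible starting point, so the
    chain can begin anywhere in the tree, as a top-level query pattern can.
    """
    results = []
    for start in walk(root):
        if start.type != path[0]:
            continue
        frontier = [start]
        for node_type in path[1:]:
            frontier = [c for n in frontier for c in n.children if c.type == node_type]
        results.extend(frontier)
    return results


def ident(name):
    return Node("identifier", name)


# Hand-built tree for: for (i = 0; i < n; i++) { total = total + i; }
tree = Node("program", children=[
    Node("for_statement", children=[
        Node("assignment_expression", children=[ident("i"), Node("number", "0")]),
        Node("binary_expression", children=[ident("i"), ident("n")]),
        Node("update_expression", children=[ident("i")]),
        Node("statement_block", children=[
            Node("expression_statement", children=[
                Node("assignment_expression", children=[
                    ident("total"),
                    Node("binary_expression", children=[ident("total"), ident("i")]),
                ]),
            ]),
        ]),
    ]),
])

# The broad query: every identifier in the "buffer".
all_identifiers = [n.text for n in walk(tree) if n.type == "identifier"]
# The precise query: the identifier directly inside the loop's own assignment.
loop_variable = [n.text for n in query_path(
    tree, ["for_statement", "assignment_expression", "identifier"])]
```

The assignment inside the loop body is not matched. It sits under `statement_block`, not directly under `for_statement`, so only the loop's initializer `i` is returned.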

[00:01:37.040] First of all, let's see how Tree-sitter can help us with syntax highlighting. This is the default syntax highlighting by Emacs for SQL. Now let's see how Tree-sitter helps. This is the syntax highlighting in Emacs with Tree-sitter enabled. You'll see that we're able to target a lot more things and highlight them. That said, you don't always have to highlight everything. I personally prefer a much simpler theme.

[00:02:15.640] Now let's see how Tree-sitter helps you simplify adding custom syntax highlighting to your code. This is a Python file which has a class and a few member functions. Anyone who has used Python will know that the "self" keyword, while it is passed in as an argument, has more meaning than that. Let's see if we can use Tree-sitter to highlight just the "self" keyword. If you look at the Tree-sitter tree, you can see that this is the first identifier in the list of parameters for a function definition. This is how you would query for the first identifier inside parameters inside a function definition. Now, if you look here, it also matches "cls", so let's restrict it to match just "self". Now we have a Tree-sitter query that identifies the first argument to the function definition that is also named "self". We can use this to apply custom highlighting to it. This is pretty much all the code that you'll need to do this. The first block here essentially tells Tree-sitter to highlight anything matched as python.self with the custom-set face. The second block is how we match for it. Now if we go back into a Python buffer and re-enable python-mode, we'll see that "self" is highlighted differently.
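For a runnable analogue of the "self" query, the sketch below uses Python's standard-library `ast` module as a stand-in for a Tree-sitter parse tree. This is an assumption: in the talk, the work is done by a Tree-sitter query inside Emacs. The sketch applies the same two conditions: take the first parameter of a function definition, and keep it only if it is named "self". It returns the (line, column) positions that a highlighter would color. `self_parameter_positions` is a hypothetical helper name.

```python
import ast


def self_parameter_positions(source):
    """Return sorted (line, column) pairs of first parameters named "self".

    Lines are 1-based and columns 0-based, as reported by the ast module.
    """
    positions = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            params = node.args.posonlyargs + node.args.args
            # Same restriction as the query: first parameter only, named "self".
            if params and params[0].arg == "self":
                positions.append((params[0].lineno, params[0].col_offset))
    return sorted(positions)


SOURCE = '''\
class Greeter:
    def __init__(self, name):
        self.name = name

    @classmethod
    def create(cls, name):
        return cls(name)

    def greet(self):
        return "hi " + self.name
'''

positions = self_parameter_positions(SOURCE)
```

The classmethod's `cls` parameter is skipped, which matches the narrowing step described above.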

[00:03:47.120] How about creating text objects? Tree-sitter can help there too. For those who don't know, text objects are an idea that comes from Vim, and they let you do things like select a word, delete a word, and so on. There are other text objects like line and paragraph. For each text object, you can have operations that are defined on them. For example, delete, copy, select, and comment are all operations that you can do. Let's try and use Tree-sitter to add more text objects. This is a plugin that I wrote which lets you add more text objects into Emacs. It helps you add code-aware text objects like functions, conditionals, loops, and such. Let's see an example scenario of how something like this could come in handy. For example, I can select inside this condition or inside this function. Let's say I want to take this conditional, move to the next function, and create it there. What I would do is delete the conditional, move to the next function, create a conditional there, and paste. Let's try another example. Let's say I want to take this argument and move it to the end. If I had to do it without text objects, I'd probably have to go back to the previous comma, delete till the next comma, find the closing bracket, and paste before it. That works, but let's see how Tree-sitter can simplify it. With Tree-sitter, I can say delete the argument, go to the end of the next argument, and then paste. Tree-sitter essentially helps Emacs understand the code better semantically. Here is yet another use case. I work at a remote company, and I often find myself on a call with my teammates, explaining code to them. One thing that really comes in handy is the narrowing capability of Emacs, specifically the fancy-narrow package. I use it to narrow to just the function, or I could narrow to the conditional.
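At its core, a code-aware text object finds the innermost node of a given kind that contains point and uses that node's start and end as the region. The sketch below shows that lookup. It uses Python's `ast` module in place of Tree-sitter. The `text_object_at` and `extract` helpers and the kind-to-node-type table are hypothetical, not the API of the plugin mentioned in the talk. It also assumes ASCII source, because `ast` column offsets count UTF-8 bytes.

```python
import ast

# Hypothetical mapping from text-object kind to the node types that form it.
TEXT_OBJECT_TYPES = {
    "function": (ast.FunctionDef, ast.AsyncFunctionDef),
    "conditional": (ast.If,),
    "loop": (ast.For, ast.AsyncFor, ast.While),
}


def text_object_at(source, line, col, kind):
    """Return ((start_line, start_col), (end_line, end_col)) of the innermost
    `kind` node containing point, or None if point is not inside one."""
    best = None
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, TEXT_OBJECT_TYPES[kind]):
            continue
        start = (node.lineno, node.col_offset)
        end = (node.end_lineno, node.end_col_offset)
        # Among enclosing nodes, the innermost one starts last.
        if start <= (line, col) <= end and (best is None or start >= best[0]):
            best = (start, end)
    return best


def extract(source, region):
    """Return the text covered by a region, like "select inside function"."""
    lines = source.splitlines(keepends=True)
    (l1, c1), (l2, c2) = region
    if l1 == l2:
        return lines[l1 - 1][c1:c2]
    return lines[l1 - 1][c1:] + "".join(lines[l1:l2 - 1]) + lines[l2 - 1][:c2]


SRC = "def outer(x):\n    if x > 0:\n        return x\n    return -x\n"
conditional = text_object_at(SRC, 3, 8, "conditional")  # point on "return x"
function = text_object_at(SRC, 3, 8, "function")
```

Delete, copy, or comment then become ordinary operations on the returned region.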

[00:05:48.760] Next on the list would be code folding. This is a package which uses Tree-sitter to improve the code folding functionality of Emacs. Code folding has always been something I've had a love-hate relationship with. It works most of the time, but then fails if the indentation is wrong or we do something weird with the arguments. But now with Tree-sitter in the mix, it's a lot more precise. I can fold comments, I can fold functions, I can fold conditionals. You get the idea.
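Tree-based folding takes fold regions from node boundaries instead of guessing them from indentation. Here is a minimal sketch using Python's `ast` module as a stand-in (it is not the folding package from the talk). It lists the (start line, end line) pairs of multi-line blocks. `fold_ranges` and the chosen set of foldable node types are assumptions.

```python
import ast

# Node types treated as foldable in this sketch.
FOLDABLE = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef,
            ast.If, ast.For, ast.While, ast.With, ast.Try)


def fold_ranges(source):
    """Return sorted (start_line, end_line) pairs for multi-line foldable nodes."""
    ranges = []
    for node in ast.walk(ast.parse(source)):
        # One-line nodes have nothing to hide, so skip them.
        if isinstance(node, FOLDABLE) and node.end_lineno > node.lineno:
            ranges.append((node.lineno, node.end_lineno))
    return sorted(ranges)


SRC = (
    "class A:\n"
    "    def f(self):\n"
    "        if True:\n"
    "            return 1\n"
    "        return 2\n"
    "\n"
    "    def g(self): return 3\n"
)
ranges = fold_ranges(SRC)
```

Python's `ast` discards comments, so this sketch cannot fold them. A Tree-sitter tree keeps comment nodes, which is what makes folding comments possible there.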

[00:06:20.480] I work with Kubernetes, which means I end up having to write and read a lot of YAML files. And navigating big YAML files is a mess. The two main problems are, one, figuring out where I am, and two, navigating to where I want to be. Let's see how Tree-sitter can help us with both of these. This is an example YAML file. To be precise, this is the values file of the Redis Helm chart. I'm somewhere in the file, on tag under image, but I don't know what this tag is for. With the help of Tree-sitter, I've been able to add this information to my header line. If you look at the header line, you'll see that I'm under sentinel.image. Now let's see how this helps with navigation. Let's say I want to enable persistence on the master node. With the help of Tree-sitter, I was able to enumerate every field that is available in this YAML file, and I can pass that information on to imenu, which I can then use to go exactly where I want to. Also, since we're not dealing with any language-specific constructs, this is very easy to extend to other similar languages, or config files in this case. For example, this is a JSON file, and I can navigate to location or project. And just like in YAML, it shows me where I'm at: I'm in projects.name, or I'm inside projects.highlights. Or how about Nix? This is my home.nix file. Again, I can search for services, and this lists all the services that I've enabled. How about just services.description? This is all the services that I've enabled that have descriptions.
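The imenu side of this mostly consists of walking the tree of keys and producing one dotted path per field. The sketch below does this for parsed JSON using only Python's standard library. `field_paths` is a hypothetical helper, not the author's implementation. List items reuse their parent's path, following the projects.name style shown in the talk. Because the walker only looks at nested mappings and lists, it would apply to any config format that parses into those structures.

```python
import json


def field_paths(value, prefix=""):
    """Yield a dotted path for every object key, depth first.

    List items reuse their parent's path, so each element of "projects"
    contributes its own "projects.name" entry (one per location in the file).
    """
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            yield path
            yield from field_paths(child, path)
    elif isinstance(value, list):
        for item in value:
            yield from field_paths(item, prefix)


DOC = """
{
  "basics": {"name": "Jane", "location": {"city": "Springfield"}},
  "projects": [
    {"name": "alpha", "highlights": ["fast"]},
    {"name": "beta", "highlights": []}
  ]
}
"""

paths = list(field_paths(json.loads(DOC)))
# imenu-style narrowing: keep only entries whose last component is "name".
names = [p for p in paths if p.endswith(".name")]
```

A real implementation would also record each key's buffer position next to its path, so that imenu can jump to it. The header-line breadcrumb is the same path, computed for the key that encloses point.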

[00:08:10.480] Now that we have seen this for config files, let's see how similar things apply for code. Just like in config files, I can see which function I'm under, and if I go to the next function, it changes.
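To show the current function in the header line, collect the names of the definitions that enclose point, from the outside in. Here is a sketch that uses Python's `ast` module as a stand-in for Tree-sitter. The `enclosing_scope` helper and the dotted Class.method format are assumptions.

```python
import ast

SCOPES = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)


def enclosing_scope(source, line):
    """Return a dotted name like "Class.method" for the definitions enclosing
    `line` (1-based), or "" at top level."""
    names = []
    node = ast.parse(source)
    while True:
        for child in ast.iter_child_nodes(node):
            start = getattr(child, "lineno", None)
            end = getattr(child, "end_lineno", None)
            if start is not None and end is not None and start <= line <= end:
                if isinstance(child, SCOPES):
                    names.append(child.name)
                node = child  # keep descending through if/for/with blocks too
                break
        else:
            # No child contains the line, so node is the innermost match.
            return ".".join(names)


SRC = (
    "class Server:\n"
    "    def start(self):\n"
    "        if ready:\n"
    "            run()\n"
    "\n"
    "    def stop(self):\n"
    "        pass\n"
    "\n"
    "def main():\n"
    "    Server().start()\n"
)
```

Re-running this whenever point moves gives the "changes when I go to the next function" behavior described above.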

[00:08:21.560] Okay, here is something really awesome. This is probably one of my favorites, and one of the things that actually made me understand how powerful Tree-sitter is, and got me into it. I work with a lot of Go code, and anyone who has worked with Go will tell you how repetitive error handling is. For those who don't write Go, let me give you a rough idea of what I'm talking about. If you want to bubble up the error, the way you do it is to return the error to the calling function. For the other return values, you return either nil or an empty value, and at the end, you return the error. Let's try and use Tree-sitter to do this. With the help of Tree-sitter, let's make Emacs go back, figure out what the return arguments are, figure out what their default values are, and automatically fill in the return statement. It would look something like this. In my case, it filled in the complete form: it figured out what the return arguments are, what their types are, and what their default values are, and filled out the entire return. And since this is a template, I can go to the next function and do the same thing, and again for the next function, and the next.
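The template's core step is turning the enclosing function's result types into zero values. Below is a hedged Python sketch of that mapping. It assumes the result types have already been extracted as strings, for example by querying the enclosing Go function declaration with Tree-sitter. The `zero_value` and `error_return` helpers are hypothetical. The Go language specification fixes the zero values for numbers, strings, booleans, and nil-able types (pointers, slices, maps, channels, functions, interfaces). This sketch assumes any other named type is a struct and gives it a `T{}` literal. A real implementation would need type information to handle named non-struct types such as `time.Duration`.

```python
NUMERIC_TYPES = {
    "int", "int8", "int16", "int32", "int64",
    "uint", "uint8", "uint16", "uint32", "uint64", "uintptr",
    "float32", "float64", "complex64", "complex128", "byte", "rune",
}
# Prefixes of types whose zero value is nil: pointers, slices, maps,
# channels, function types, and interface literals.
NIL_PREFIXES = ("*", "[]", "map[", "chan ", "chan<-", "<-chan", "func(", "interface")


def zero_value(go_type):
    """Return Go source text for the zero value of `go_type`."""
    t = go_type.strip()
    if t in NUMERIC_TYPES:
        return "0"
    if t == "string":
        return '""'
    if t == "bool":
        return "false"
    if t in ("error", "any") or t.startswith(NIL_PREFIXES):
        return "nil"
    # Assumption: any other named type (or an array) is a composite type,
    # so an empty composite literal is its zero value.
    return t + "{}"


def error_return(result_types, err_name="err"):
    """Build a return statement that bubbles up `err_name` as the last value."""
    values = [zero_value(t) for t in result_types]
    if values and result_types[-1].strip() == "error":
        values[-1] = err_name
    return "return " + ", ".join(values) if values else "return"


example = error_return(["int", "*Config", "string", "error"])
```

Named result parameters, such as `(n int, err error)`, would need their names stripped before the types are passed in.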

[00:09:31.520] Here is a really fascinating use case of Tree-sitter: structural editing. You might be aware of plugins like paredit, which seem to "know" your code. This takes it to another level. It is in its early stages, but what it lets you do is treat your code completely as an AST, and edit it as a tree instead of as characters. I am not going to go much in depth into it, but if you're interested, there is a talk about it from last year's EmacsConf.
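Structural editing operates on nodes rather than characters. The sketch below is a small illustration and is not the tool from last year's talk. It moves one call argument to the end of the argument list, the same edit as the argument example in the text-objects section. It does this by editing a Python `ast` tree and printing it back with `ast.unparse`, which requires Python 3.9 or later. `move_argument_to_end` is a hypothetical helper.

```python
import ast


class _MoveArgument(ast.NodeTransformer):
    """Move positional argument `index` to the end in calls to `func_name`."""

    def __init__(self, func_name, index):
        self.func_name = func_name
        self.index = index

    def visit_Call(self, node):
        self.generic_visit(node)  # handle nested calls first
        if isinstance(node.func, ast.Name) and node.func.id == self.func_name:
            # Detach the argument node and re-attach it as the last child.
            node.args.append(node.args.pop(self.index))
        return node


def move_argument_to_end(source, func_name, index):
    """Return `source` with the argument moved in every call to `func_name`.

    Raises IndexError if a matching call has too few positional arguments.
    """
    tree = _MoveArgument(func_name, index).visit(ast.parse(source))
    return ast.unparse(tree)


edited = move_argument_to_end("result = connect(host, port, timeout)", "connect", 0)
```

`ast.unparse` regenerates the whole file, which drops comments and original formatting. Editors built on Tree-sitter can instead use each node's exact byte range to edit the original text in place.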

[00:09:59.080] I'm just going to end this with one last tiny thing that I found in the tree-sitter-extras package. It's this tiny macro called tree-sitter-save-excursion. It works pretty much like save-excursion, but better. It uses the Tree-sitter syntax tree instead of just the code to figure out where to restore the position. My main use case for this was with code formatters. Since the code moves around a lot when it gets formatted, save-excursion was completely useless, but this came in handy.
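The idea behind restoring point by syntax tree can be sketched in four steps. First, record which node point is in, as a path of child indices from the root. Second, record how far into that node point is. Third, reformat. Finally, follow the same path in the new tree. The Python version below illustrates this approach using `ast`, with `ast.unparse` standing in for a code formatter. It is not the macro's actual implementation. The helpers `node_path`, `follow`, and `save_excursion` are hypothetical, and the sketch assumes ASCII text because `ast` column offsets count UTF-8 bytes.

```python
import ast


def _located_children(node):
    """Direct children that carry source positions (skips Load, Add, ...)."""
    return [c for c in ast.iter_child_nodes(node)
            if getattr(c, "lineno", None) is not None]


def node_path(tree, point):
    """Return (child-index path, node) for the innermost node containing point."""
    path, node = [], tree
    while True:
        for i, child in enumerate(_located_children(node)):
            start = (child.lineno, child.col_offset)
            end = (child.end_lineno, child.end_col_offset)
            if start <= point < end:
                path.append(i)
                node = child
                break
        else:
            return path, node


def follow(tree, path):
    """Walk the same child-index path in another tree, as far as it exists."""
    node = tree
    for i in path:
        children = _located_children(node)
        if i >= len(children):
            break
        node = children[i]
    return node


def save_excursion(source, point, formatter):
    """Run `formatter` on `source`; map point, a (line, col) pair, onto the result."""
    path, old_node = node_path(ast.parse(source), point)
    new_source = formatter(source)
    if not path:
        return new_source, point  # point was outside every node; give up
    # How far into the node point was, when it is on the node's first line.
    offset = point[1] - old_node.col_offset if point[0] == old_node.lineno else 0
    new_node = follow(ast.parse(new_source), path)
    if getattr(new_node, "lineno", None) is None:
        return new_source, point  # the tree changed shape too much
    if new_node.lineno == new_node.end_lineno:
        offset = min(offset, new_node.end_col_offset - new_node.col_offset)
    return new_source, (new_node.lineno, new_node.col_offset + offset)


def fmt(source):
    """Stand-in formatter: normalize spacing by round-tripping through ast."""
    return ast.unparse(ast.parse(source)) + "\n"


new_src, new_pt = save_excursion("total=price*qty+tax\n", (1, 13), fmt)
```

In this example, point starts on the `t` of `qty`. After formatting, it lands on the same character at column 17. A plain character offset would have stayed at column 13, which is now the space before `*`.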

[00:10:26.240] I'll just leave you with what the future of Tree-sitter looks like for Emacs. So far, every Tree-sitter-related feature that I've talked about is powered by this library. But there is talk about Tree-sitter coming into the core. It will most probably land in Emacs 29, and if you want to check out the work on Tree-sitter in core Emacs, you can check out the feature/tree-sitter branch. You'll probably see more and more features and packages relying upon Tree-sitter, and even major modes powered by Tree-sitter. And that's a wrap from me. Thank you.

Questions or comments? Please e-mail mail@meain.io

