Writing a Language Server In OCaml for Emacs, fun, and profit
Austin Theriault (he/they) - last name pronounced tare -e -o, austin@cutedogs.org
Format: 17-min talk; Q&A: BigBlueButton conference room
Status: Q&A to be extracted from the room recordings
Talk
Duration: 16:04 minutes
- 00:00.000 Introduction
- 00:16.540 What is Semgrep?
- 00:40.720 How do we show security bugs early?
- 01:37.880 What is the Language Server Protocol?
- 02:29.040 Case study: Rust Analyzer
- 03:42.760 Rust Analyzer in action
- 04:09.960 Why is this useful?
- 05:36.220 So what about Emacs?
- 06:40.700 Technical part - Brief communication overview
- 07:58.760 Example request
- 08:03.380 LSP capabilities
- 09:23.380 Tips on writing a LS
- 11:03.480 Supporting a LS through LSP mode in Emacs
- 12:06.000 Create a client
- 13:07.300 Add to list of client packages
- 14:11.680 Add documentation!
- 14:17.880 Adding commands and custom capabilities
- 15:01.360 Thanks for listening
Q&A
Description
Recently, while working at Semgrep, Inc., I wrote a language server for our SAST tool in OCaml: https://github.com/returntocorp/semgrep/tree/develop/src/language_server. I then added support for it to Emacs: https://github.com/emacs-lsp/lsp-mode/blob/master/clients/lsp-semgrep.el. In this talk I plan to go over what LSP is, why it's important, how to get started writing a language server, and how to support a language server in Emacs.
About the speaker:
Austin Theriault is a software engineer at Semgrep, Inc. working on their SAST tool Semgrep. In this talk he will cover the Language Server Protocol, a way to provide language features to an editor; why it's important to the future of editors; how someone might go about writing a server; and how to integrate it with Emacs.
Discussion
Questions and answers
- Q: Why not write the LSP server in OCaml? I missed the reasoning to
switch to Rust/etc. Is it performance?
- A: Not performance: the OCaml "stack" (cross-compilation, libraries, etc.) is less developed than what is available for writing LSP servers in, e.g., Rust or TypeScript.
- Q: What are the corner cases, limitations, and other issues you
encountered in implementing an LSP server with client in Emacs, that
were surprising?
- A: Several, but performance is the big one, and with it implementing caching. Then delivery/distribution (doing so cross-platform given the OCaml tooling, etc.).
Transcript
lsp-mode,
which is an LSP client commonly included
in popular Emacs distributions.
So a lot of people already have that.
If you're using Emacs 29 or greater, you have eglot-mode,
which is a lighter weight version of lsp-mode.
It's just another LSP client.
When I wrote the Semgrep language server,
Emacs 29 hadn't come out yet.
I'm not going to talk too much about eglot-mode
because I did everything in lsp-mode,
but I would imagine a lot of this stuff is very similar.
Here's a list of some supported languages.
eglot-mode is pretty similar.
So, lsp-mode's repository is on GitHub,
like everything, and it has a ton of different clients
for a ton of different languages and frameworks and tools,
like Semgrep, and these are available
to anyone who installs LSP mode.
Alternatively, you can make a separate package
and just use LSP mode as a library,
but I'm not going to focus on this,
because there's already a ton of resources out there
on packaging and Emacs.
So, our steps, very quickly, are going to look like
adding an Emacs Lisp file that contains some logic,
add an entry somewhere, so we added a new client
to the list of clients, and then do some documentation,
because documentation's great.
clients/ folder in lsp-mode/,
literally just add, like, lsp-whatever it is,
require the library, and register a client.
Registering a client just means, like,
saying what kind of connection it is.
It's most likely going to be standard I/O,
because that's pretty easy to implement,
and then you just pass it the executable
that you actually want to run.
Say what the activation function is,
so this is when the client should start,
so you can specify the language
or the major mode or whatever,
and now your client will start whenever that's triggered,
and then finally provide just a server ID,
so that way it's easy to keep track of,
and then run this LSP consistency check function.
This just makes sure everything up there is good.
You can do more advanced stuff with making an LSP client
that I'm not going to get into,
but just know that these aren't your only options,
and then finally provide your client.
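
The registration steps just described can be sketched roughly as follows. This is a minimal illustration and not the actual Semgrep client: the file name `lsp-my-ls.el`, the executable `my-language-server`, its `--stdio` flag, and the server ID `my-ls` are all placeholders chosen for the example. The lsp-mode functions used (`lsp-register-client`, `make-lsp-client`, `lsp-stdio-connection`, `lsp-activate-on`, `lsp-consistency-check`) are the ones existing clients in the `clients/` folder use.

```elisp
;;; lsp-my-ls.el --- Hypothetical lsp-mode client sketch -*- lexical-binding: t; -*-

;; Assumption: `my-language-server' is a placeholder executable that
;; speaks LSP over standard I/O when started with `--stdio'.

(require 'lsp-mode)

(lsp-register-client
 (make-lsp-client
  ;; Connection type: standard I/O, given the command line to run.
  :new-connection (lsp-stdio-connection '("my-language-server" "--stdio"))
  ;; Activation function: start the client for buffers whose LSP
  ;; language id is "python".
  :activation-fn (lsp-activate-on "python")
  ;; Server ID, so lsp-mode can keep track of this client.
  :server-id 'my-ls))

;; Run lsp-mode's consistency check for this package.
(lsp-consistency-check lsp-my-ls)

(provide 'lsp-my-ls)
;;; lsp-my-ls.el ends here
```

With a file like this on the load path, opening a buffer that matches the activation function and running `M-x lsp` should start the placeholder server, assuming the executable is installed.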
lsp-mode supports,
and now you've added support for a whole new language,
whole new framework, whole new tool to Emacs,
and it's taking you, what, like, what is that,
20 lines of Lisp? No, not even, like, 15.
15 lines of Lisp, whole new language for Emacs.
It's really exciting. Now that you have your client,
let's do some documentation. Go fill out this, like, name,
where the repository, the source code is,
because free software is great,
and you should open source your stuff.
Specify the installation command.
What's cool about this is
this can be run automatically from Emacs,
so if it's, like, pip install pyright, right,
you can put that there, and Emacs will ask you,
do you want to install the language server,
and you can hit yes
and users will just have it installed for them,
and then you can say whether or not it's a debugger.
This is completely separate,
so there's this thing called DAP,
which is the debugger adapter protocol,
and it's similar to LSP but for debuggers,
which is very cool,
lsp-notify and then a custom method,
and it's great because now you can just scan your project
from a simple Emacs function.
Requests, very similar to notifications.
You send it and then pass it a lambda
and do something with the result,
and so that's adding custom capabilities.
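
A rough sketch of what such custom commands might look like, assuming a server that understands two made-up methods, `myLs/scanWorkspace` and `myLs/ruleCount`. The method names, their parameters, and the `my-ls-` command names are hypothetical; `lsp-notify` and `lsp-request-async` are lsp-mode functions.

```elisp
(require 'lsp-mode)

(defun my-ls-scan-workspace ()
  "Ask the server to scan the whole project (hypothetical method)."
  (interactive)
  ;; A notification is fire-and-forget: no response is expected.
  (lsp-notify "myLs/scanWorkspace" (list :full t)))

(defun my-ls-show-rule-count ()
  "Request a value from the server and report it (hypothetical method)."
  (interactive)
  ;; A request takes a callback (here a lambda) that receives the
  ;; server's result once the response arrives.
  (lsp-request-async
   "myLs/ruleCount"
   nil
   (lambda (result)
     (message "Server reports %s rules" result))))
```

Both commands are meant to be called from a buffer where lsp-mode is already active, since the message is sent to the workspace attached to the current buffer.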
lsp-mode and the docs.
lsp-mode, right, that's where you want to add your client.
The docs are great, super useful.
Rust Analyzer is just a great reference
for language servers in general
if you want to write one or if you just want to, like,
see how they work. It's all just really well done.
It's great code, very readable.
And then down here is just a long video tutorial,
a longer video tutorial, not by me,
by someone else, on how to add a language client to Emacs,
but hopefully this is sufficient for y'all,
and now it's time for some Q&A.
Captioner: sachac
Q&A transcript (unedited)
who are currently watching, who have questions, put them into the pad that I can ask them. I'm kind of monitoring the IRC concurrently. So the first question that we have on the pad is concerning why you have switched from OCaml. Maybe the person has missed it in the talk, if you've mentioned it. Why have you switched from OCaml to, in this case, I guess, Rust?

language server that I wrote mine for my company in OCaml, but I wouldn't recommend it just in general unless like you're doing something specific with OCaml. And the reason for that, and I recommended Rust or like TypeScript, is like OCaml is great. It's very performant, but its cross compilation story is not great. It's like really hard to cross compile like from one platform to another. And then like the ecosystem and its standard library is also not great. And like Rust, its cross compilation is great. Its ecosystem is great. OCaml is great if you need to use it, but it's just it's not ideal. And there's just also no good examples of a language server in OCaml. There's the official like OCaml language server, but they use a ton of super advanced language features, like module functors and a bunch of other random stuff. So it's not really readable. But Rust, there's Rust Analyzer, which is readable. In TypeScript, there's like a million different ones. So it's less of a, not OCaml is like, it's not that OCaml isn't great. It's more of a, these other languages would probably just be easier. So.

for example, like Neovim or some other editors are just revenue fine because of the so it's a standard LSP specification that you're using. So you can also, for instance, use it in other editors, like for instance, Neovim or so. It's most, most editors nowadays support it. Like obviously Emacs, Neovim, Sublime, VS Code, Intel, all the IntelliJ ones. So yeah, that's, that's the fun part. You don't have to write 10 different languages to get a bunch of editor support.

So I didn't have really time to hear into your talk.
So I'm sorry if I ask you questions that you have already said. How was the experience of writing an LSP? So have you any knowledge beforehand or do you just read it all on yourself?

which is what motivated me to do this talk. Basically, I just looked at the specification, and I knew Rust Analyzer was cool. And so I looked at Rust Analyzer, and I looked at Pyright. And I just went from there. I found out about all this because I was already using Emacs, I already knew about it. I was like, this is going to be easier than something else. So yeah, there's the experience is fine. It's just a lot of wiring stuff up. It's not a lot of like hard thinking until you get to like performance heavy stuff. Like, so for Semgrep, like we're doing a ton of like code parsing and like analyzing. And so that's, it takes up like a ton of processing power. So like for stuff like that, like now you have to think about caching and like ordering things. So that part's hard, but that's more of a, like very much application specific thing.

I think not. It's nothing I can see. No questions, that's kind of odd to be honest. I cannot really ask questions concerning LSP specific. Let's call, let's ask something very unspecific concerning the Emacs usage. And when have you started? How did you came through it and stuff like this?

me and my friends just were like, got obsessed with Linux for whatever reason. And then like we traveled down like the, like the free software, like we just thought that was like very entertaining and like interesting to read about all the free software stuff. They were like, yeah, that's cool. And so we all started using Linux. And I'm like, well, if I'm using free software, I'm going to use Emacs. And so I started using Emacs just to try it out. And then I kind of got, I feel like, Stockholm syndrome into it. And now I've realized like, I don't know, now that I've done the like actual work to get into Emacs, it's just, there's so much more I can do with it.
But yeah, it was somewhat unintentional.

like 2 years ago using Emacs. And also just, oh, there's at first some cool people on YouTube, so System Crafters and people like this. And also, ah, VS Code, I used a lot of VS Code beforehand and then VS Codium because open source and then oh are there any other alternatives and I came to like Neovim and Emacs and often switching around but I stick to Emacs at some point to be honest.

cool. I will say that. And also just like I like Vim. Vim is cool but like being able to like write Lisp and like modify your editor on the fly is just like very appealing to me. I don't know, Emacs was tough at first because like all the like default key bindings are just kind of like and then and then I read somewhere someone was like yeah well Richard Stallman uses evil mode so it's okay. I was like alright, I can, that's like blessing enough for me. Like I'm just gonna switch to evil mode. And I was like, this is way, way better as far as key bindings go.

I think, half a year to the default key bindings from Vim beforehand. I switched back to Evil and now I'm losing some kind of hybrid styles. It's kind of weird. But we have a question on the pad. So what are the corner cases, limitations, and other issues you encountered in implementing an LSP server with client in Emacs that were surprising?

limitations are definitely like, once again, they're going to be very application specific, but it's usually just the performance part. So like I was saying before, right, in general if you're doing language tooling, you're gonna be doing either parsing or interpreting or something like that, which is very just like computationally heavy and so if you're trying to like do that stuff while someone is editing a file right like every keystrokes every like 1 to 2 seconds if they have a fast computer that's great but a lot of people don't have like that fast of a computer that they can go and like do compilation every single keystroke.
So like, I would say, I would say the like limitation is just how fast your computer is and how good you are at like implementing caching for like whatever you're doing. That's also just the main issues I've run into is just it's a constant uphill battle. People will somehow find larger and larger files. You'll end up with files that are like thousands, like tens of thousands of lines long and you think yeah, surely no one would expect like instantaneous response for like like editing a file that has like tens of thousands of lines, but then they do. As far as corner cases go, I would say the corner case is like, just in general is actually distributing the language server. Cause like writing the language server is fine. Like wiring everything up is fine. But then like, once you actually have to go and distribute it, well, now you're distributing in a binary. Like I was saying before with OCaml, doesn't have great cross compilation. So for Semgrep, for our language server, we target Linux and macOS, and we have a ton of people who use Windows, but compiling OCaml for Windows is basically impossible. So our corner case there, the way we solved it was now we're transpiling OCaml to JavaScript, which is a huge can of worms. Like it's a lot of fun. It's very interesting, but like it's not ideal. And so that's what I was saying before. I recommend like Rust or TypeScript because those are way more portable and a lot easier to install. And you don't have to worry about any of that weird packaging stuff. So yeah, I would say that's like the main corner case and the main limitation is just speed and caching.

someone doesn't want to refactor or something. How did you start? So did you have any way to still be relatively performant when they have big files or is it just not supported? I don't care.

And the way we ended up doing that, so Semgrep is like you write this generic pattern.
You kind of write the language, but then there's these other symbols and stuff that are included in that, this like meta language. And so what happens is, is most languages get, they get parsed and then into a syntax tree, right? Like whatever the language's syntax tree is, and then they get, the syntax tree gets converted into this, like, we call it like an abstract syntax tree, which is like abstract from like any, like language-specific syntax tree. And so then we can cache that, which is really good because like if someone types something like we don't have to go through and do like the full parsing and like converting, we only have to do it incrementally. And so that's, that's how we dealt with that. Or the other option is that we just, we just cache whatever the previous results are, and then run it asynchronously, and they might get it delayed. But we've ended up doing more AST caching, which is fun and cool.

Blaine. If Eglot is a subset of LSP mode, can Eglot conflict with LSP mode if both are present in your init.el file?

mode a ton, so I'm not 100% sure. I think all of the key bindings and commands, if you just install it out of the box, I think they're different. So I don't think there's like any like overlap as far as that stuff goes but you will have the overlap of like you entered, like you started a major mode for like some language, like they'll both probably start the language server and provide diagnostics and everything. And so then now you're getting like, you're just like doubling the work your computer is doing. So there's that conflict. But if you prefer Eglot mode or LSP mode for like one language or framework, like one major mode and LSP mode for the other, I think you should be fine.

we have like one minute on the stream and then we'll switch back and to the pre-recorded stuff I guess.
interruption but I'm just doing a little bit of time keeping, so thank you so much, Austin. Sadly, I wasn't able to follow the Q&A because I was in the other track answering questions. If, Austin, you want to stay and answer some more questions, feel free to do so. People tend to start talking as soon as we go off air, and I wouldn't be surprised with LSP that people would do the same. We're gonna move on for this track. We're gonna move on in 20 seconds to the next- So Floey, thank you for hosting.
Questions or comments? Please e-mail austin@cutedogs.org