Juicemacs: Exploring Speculative JIT Compilation for ELisp in Java

Kana (they/them) - IRC: kanakana, Blog: https://kyo.iroiro.party - ActivityPub: @kana@f.iroiro.party - Bluesky: @kana.iroiro.party, kana@iroiro.party

Format: 20-min talk; Q&A: Etherpad
Etherpad: https://pad.emacsconf.org/2025-juicemacs
Status: TO_REVIEW_QA

Duration: 19:10 minutes

Description

Just-in-time (JIT) compilation helps dynamic languages run fast, and speculative compilation makes them run faster, as has been showcased by JVMs, LuaJIT, JavaScript engines, and many more JIT runtimes. However, Emacs native-compilation, despite its JIT compilation (native-comp-jit-compilation), does not speculate about runtime execution, making it effectively a JIT-ish AOT (ahead-of-time) compiler. By introducing a speculative runtime for ELisp, we could potentially improve ELisp performance even further, with many new optimization opportunities.

Juicemacs is my work-in-progress toy project re-implementing Emacs in Java. At its centre sits an ELisp JIT runtime powered by Graal Truffle, a JIT interpreter framework based on partial evaluation and Futamura projections. This talk will cover the following along with some demonstrations:

  • What is Juicemacs and its ambition? How compatible is it (or does it plan to be) with GNU Emacs and how feature-complete is it now?

  • What is speculative compilation? How is it useful for an ELisp JIT runtime?

  • How is the performance of Juicemacs compared to Emacs nativecomp? How do we interpret the benchmarks?

  • What is Truffle and partial evaluation? What is needed if we are to implement a speculative runtime in C without Truffle?

  • What JIT techniques and other things does Juicemacs plan to explore? How to get involved?

Relevant links:

About the speaker:

Hello! This is Kana, an Emacs hobbyist and Java lover from China. A few years ago I discovered the Truffle JIT compilation framework and have since hoped to implement a JIT runtime myself. Last year I finally started implementing one for ELisp, called Juicemacs, and have made some progress. In this talk I will share what I've learned during the journey, including how three interpreters out of four (or more?) in Emacs are implemented in Juicemacs and how speculative compilation can make some optimizations possible.

Transcript

Hello! This is Kana! And today I'll be talking about Just-In-Time compilation, or JIT, for Emacs Lisp, based on my work-in-progress Emacs clone, Juicemacs. Juicemacs aims to explore a few things that I've been wondering about for a while. For example, what if we had better or even transparent concurrency in ELisp? Or, can we have a concurrent GUI? One that neither blocks, nor is blocked by, Lisp code? And finally, what can JIT compilation do for ELisp? Will it provide better performance?

However, a main problem with explorations in Emacs clones is that Emacs is a whole universe. And that means, to make these explorations meaningful for Emacs users, we need to cover a lot of Emacs features before we can ever begin. For example, one of the features of Emacs is that it supports a lot of encodings. Let's look at this string: it can be encoded in both Unicode and Shift-JIS, a Japanese encoding system. But currently, Unicode does not have an official mapping for this "ki" (﨑) character. So when we map from Shift-JIS to Unicode, in most programming languages, you end up with something like this: it's a replacement character. But Emacs actually extends the Unicode range by threefold, and uses the extra range to losslessly support characters like this. So if you want to support this feature, that basically rules out all string libraries with Unicode assumptions.

For another, you need to support the regular expressions in Emacs, which are really irregular. For example, they support asserting about the user cursor position. And they also use some character tables, modifiable from Lisp code, to determine case mappings. All that makes it really hard, or even impossible, to use any existing regexp libraries. Also, you need a functional garbage collector. You need threading primitives, because Emacs already has some threading support. And you might want the performance of your clone to match Emacs, even with its native compilation enabled.
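The lossless-decoding idea above can be sketched in plain Java. This is only an illustration of the concept, not Juicemacs's or Emacs's actual character representation: codes with no Unicode mapping get moved into a private range above Unicode's ceiling instead of collapsing into U+FFFD, so decoding stays reversible.

```java
// Sketch: characters as plain ints, with an extended range above Unicode's
// ceiling used to round-trip codes that have no Unicode mapping.
// The constants and offsets here are illustrative, not the real ones.
class WideChar {
    static final int UNICODE_MAX = 0x10FFFF; // last valid Unicode code point

    // Map a charset-specific code with no Unicode equivalent into the
    // extended range instead of a replacement character.
    static int encodeUnmapped(int rawCode) {
        return UNICODE_MAX + 1 + rawCode;
    }

    // True if the int is an ordinary Unicode code point.
    static boolean isUnicode(int c) {
        return c >= 0 && c <= UNICODE_MAX;
    }

    // Recover the original charset-specific code losslessly.
    static int decodeUnmapped(int c) {
        return c - (UNICODE_MAX + 1);
    }
}
```

With ints as characters, a round trip through the extended range recovers the original Shift-JIS code exactly, which is what a Unicode-only string library cannot express.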
Not to mention you also need a GUI for an editor. And so on. For Juicemacs, building on Java and a compiler framework called Truffle helps in getting better performance; and by choosing a language with a good GC, we can actually focus more on the challenges above.

Currently, Juicemacs has implemented three out of at least four of the interpreters in Emacs: one for Lisp code, one for bytecode, and one for regular expressions, all of them JIT-capable. Other than these, Emacs also has around two thousand built-in functions in C code, and Juicemacs has around four hundred of them implemented. It's not that many, but it is surprisingly enough to bootstrap Emacs and run the portable dumper, or pdump for short. Let's have a try. So this is the binary produced by Java native image. And it's loading all the files needed for bootstrapping. Then it dumps the memory to a file to be loaded later, giving us fast startup. As we can see here, it throws some frame errors because Juicemacs doesn't have an editor UI or functional frames yet. But otherwise, it can already run quite a lot of Lisp code. For example, this code uses the benchmark library to measure the performance of this Fibonacci function. And we can see here, the JIT engine is already kicking in and making the execution faster. In addition to that, with a bit of workaround, Juicemacs can also run some of ERT, the Emacs Lisp Regression Testing suite that comes with Emacs. So... yes, there are a bunch of test failures, which means we are not that compatible with Emacs and need more work. But the whole testing procedure runs fine, and it has proper stack traces, which is quite useful for debugging Juicemacs.

So with that, a rather functional JIT runtime, let's now try to look into today's topic, JIT compilation for ELisp. So, you probably know that Emacs has supported native-compilation, or nativecomp for short, for some time now. It mainly uses GCC to compile Lisp code into native code, ahead of time.
And during runtime, Emacs loads those compiled files and gets the performance of native code. However, for installed packages, for example, we might want to compile them when we actually use them instead of ahead of time. And Emacs supports this through the native-comp-jit-compilation flag. What it does is, during runtime, Emacs sends loaded files to external Emacs worker processes, which then compile those files asynchronously. And when the compilation is done, the current Emacs session loads the compiled code back and improves its performance on the fly. When you look at this procedure, however, it is ahead-of-time compilation, done at runtime. And that is what current Emacs calls JIT compilation.

But if you look at some other JIT engines, you'll see much more complex architectures. So, take LuaJIT for example. In addition to this red line here, which leads us from an interpreted state to a compiled native state, which is also what Emacs does, LuaJIT also supports going from a compiled state back to its interpreter. And this process is called "deoptimization". In contrast to its name, deoptimization actually enables a huge category of JIT optimizations, called speculation. Basically, with speculation, the compiler can use runtime statistics to speculate, to make bolder assumptions in the compiled code. And when the assumptions are invalidated, the runtime deoptimizes the code, updates statistics, and then recompiles the code based on the new assumptions, and that will make the code more performant.

Let's look at an example. So, here is a really simple function that adds one to the input number. But in Emacs, it is not that simple, because Emacs has three categories of numbers: fixnums, or machine-word-sized integers; floating-point numbers; and big integers. And when we compile this, we need to handle all three cases.
And if we analyze the code produced by Emacs, as is shown by this gray graph here, we can see that it has two paths: one fast path that does fast fixnum addition, and one slow path that calls out to an external plus-one function to handle floating-point numbers and big integers. Now, if we pass integers into this function, it's pretty fast because it's on the fast path. However, if we pass in a floating-point number, then it has to go through the slow path, doing an extra function call, which is slow. What speculation might help with here is that it can have flexible fast paths. When we pass a floating-point number into this function, which currently has only fixnums on the fast path, it also has to go through the slow path. But the difference is that a speculative runtime can deoptimize and recompile the code to adapt to this. And when it recompiles, it might add floating-point numbers onto the fast path, and now floating-point operations are also fast. And this kind of speculation is why speculative runtimes can be really fast.

Let's take a look at some benchmarks. They're obtained with the elisp-benchmarks library on ELPA. The blue line here is for nativecomp, and these blue areas mean that nativecomp is slower. And, likewise, green areas mean that Juicemacs is slower. At a glance, the two (or four) actually seem roughly on par, to me. But let's take a closer look at some of them. So, the first few benchmarks are the classic Fibonacci benchmarks. We know that the series is formed by adding the previous two numbers in the series. And looking at this expression here, Fibonacci benchmarks are quite intensive in number additions, subtractions, and function calls, if you use recursion. And that is exactly why the Fibonacci series is a good benchmark. And looking at the results here... wow. Emacs nativecomp executes instantaneously. It's a total defeat for Juicemacs, seemingly. Now, if you're into benchmarks, you know something is wrong here: we are comparing different things.
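The speculate-deoptimize-recompile cycle described above can be sketched in plain Java. This is a conceptual toy, not Juicemacs's actual implementation: the "compiled fast path" is just a flag per admitted type, and a recursive retry stands in for recompilation after deoptimization.

```java
// Toy model of type speculation for a 1+ function. Fast paths exist only
// for types already observed; an unseen type "deoptimizes" (here: falls
// through to the slow path), records the type, and retries, modeling a
// recompile that admits the new type onto the fast path.
class SpeculativeAdd1 {
    boolean seenLong = false;    // fast path admitted for fixnum-like values
    boolean seenDouble = false;  // fast path admitted for floats
    int deoptCount = 0;          // how many times we had to deoptimize

    Object add1(Object x) {
        // Fast paths: only the types speculation has admitted so far.
        if (seenLong && x instanceof Long l) return l + 1;
        if (seenDouble && x instanceof Double d) return d + 1;
        // Slow path: record the new type, then retry; after this, the
        // same type never pays the slow-path cost again.
        deoptCount++;
        if (x instanceof Long) seenLong = true;
        else if (x instanceof Double) seenDouble = true;
        else throw new IllegalArgumentException("not a number: " + x);
        return add1(x);
    }
}
```

The point of the sketch: a float triggers exactly one deoptimization, after which floats are as cheap as integers, which is the "flexible fast paths" behaviour a static AOT compile cannot offer.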
So let's look under the hood and disassemble the function with this convenient Emacs command called disassemble... And these two lines of code are what we got. So, we can already see what's going on here: GCC sees that Fibonacci is a pure function, because it returns the same value for the same arguments, so GCC chooses to do the computation at compile time and inserts the final number directly into the compiled code. It is actually great! Because it shows that nativecomp knows about pure functions, and can do all kinds of things like removing or constant-folding them. And Juicemacs just does not do that. However, we are also concerned about the things we mentioned earlier: the performance of number additions, or function calls. So, in order to let the benchmarks show some extra things, we need to modify them a bit... by simply making things non-constant.

With that, Emacs gets much slower now. And again, let's look at what's happening behind these numbers. Similarly, with the disassemble command, we can look into the assembly. And again, we can already see what's happening here. So, Juicemacs, due to its speculative nature, supports fast paths for all three kinds of numbers. However, currently, Emacs nativecomp does not have any fast path for the operations here, like additions, subtractions, or comparisons, which is exactly what Fibonacci benchmarks are measuring. Emacs, at this time, has to call some generic external functions for them, and this is slow. But is nativecomp really that slow? So, I also ran the same benchmark in Common Lisp, with SBCL. And nativecomp actually looks fast compared to untyped SBCL. That's because SBCL also emits call instructions when it has no type info. However, once we declare the types, SBCL is able to compile a fast path for fixnums, which makes its performance on par with speculative JIT engines (that is, Juicemacs), because now both of us are on fast paths.
Additionally, if we are bold enough to pass this (safety 0) declaration to SBCL, it will remove all the slow paths and type checks, and its performance is close to what you get with C. Well, we probably don't want safety zero most of the time. But even then, if nativecomp were to get fast paths for more constructs, there certainly is quite some room for performance improvement.

Let's look at some more benchmarks. For example, for this inclist, or increment-list, benchmark, Juicemacs is really slow here. Partly, it comes from the cost of Java boxing integers. On the other hand, Emacs nativecomp actually has fast paths for all of the operations in this particular benchmark. And that's why it can be so fast, and that also shows that nativecomp has a lot of potential for improvement. There is another benchmark here that uses advices. So Emacs Lisp supports using advices to override functions, by wrapping the original function and an advice function together inside a glue function. And in this benchmark, we advise the Fibonacci function to cache the first ten entries to speed up computation, as can be seen in the speed-up in the Juicemacs results. However, it seems that nativecomp does not yet compile glue functions, and that makes advices slower.

With these benchmarks, let's discuss this big question: Should GNU Emacs adopt speculative JIT compilation? Well, the hidden question is actually, is it worth it? And my personal answer is, maybe not. The first reason is that slow-path cases, like floating-point numbers, are actually not that frequent in Emacs. And optimizing for fast paths like fixnums can already get us very good performance. And the second, or main, reason is that speculative JIT is very hard. LuaJIT, for example, took a genius to build. Even with the help of GCC, we would need to hand-write all that fast-path, slow-path, and switching logic. We would need to find a way to deoptimize, which requires mapping machine registers back to the interpreter stack.
And also, speculation needs runtime info, which costs us extra memory. Moreover, as is shown by some of the benchmarks above, there is some low-hanging fruit in nativecomp that might get us better performance with relatively low effort. Compared to this, a JIT engine is a huge, huge undertaking.

But for Juicemacs, the JIT engine comes a lot cheaper, because we are cheating by building on an existing compiler framework called Truffle. Truffle is a meta-compiler framework, which means that it lets you write an interpreter, add required annotations, and it will automatically turn the interpreter into a JIT runtime. So for example, here is a typical bytecode interpreter. After you add the required annotations, Truffle will know that the bytecode here is constant, and that it should unroll this loop here to inline all that bytecode. And then, when Truffle compiles the code, it knows that the first iteration here does x plus one, and the second does return. And then it will compile all that into "return x plus 1", which is exactly what we would expect when compiling this pseudocode.

Building on that, we can also easily implement speculation, by using this transferToInterpreterAndInvalidate function provided by Truffle. Truffle will automatically turn that into deoptimization. Now, for example, when this add function is supplied with two floating-point numbers, it will go through the slow path here, which might lead to a compiled slow path, or deoptimization. And going the deoptimization way, it can then update the runtime stats. And now, when the code is compiled again, Truffle will know that these compilation stats suggest that we have floating-point numbers. And this floating-point addition branch will then be incorporated into the fast path. To put it into Java code... most operations are just as simple as this. And it supports fast paths for integers, floating-point numbers, and big integers.
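The bytecode-interpreter example above can be sketched like this in plain Java. This is only a minimal stand-in for the talk's slide: in real Truffle code, the dispatch loop would carry annotations (such as @ExplodeLoop) so that partial evaluation unrolls it over the constant bytecode array and compiles the whole thing down to straight-line code like "return x + 1".

```java
// A tiny bytecode interpreter. When `code` is a compile-time constant,
// a partial evaluator can unroll the dispatch loop: one ADD1 iteration
// becomes `x = x + 1`, the RETURN iteration becomes `return x`, and the
// interpreter overhead disappears entirely from the compiled code.
class MiniInterp {
    static final int OP_ADD1 = 0;   // x = x + 1
    static final int OP_RETURN = 1; // return x

    static long run(byte[] code, long x) {
        int pc = 0; // program counter into the bytecode array
        while (true) {
            switch (code[pc++]) {
                case OP_ADD1 -> x = x + 1;
                case OP_RETURN -> { return x; }
                default -> throw new IllegalStateException("bad opcode");
            }
        }
    }
}
```

For the constant program {ADD1, RETURN}, unrolling the loop leaves exactly "return x + 1", which is the result the talk describes.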
And the simplicity of this not only saves us work, but also enables Juicemacs to explore more things more rapidly. And actually, I have done some silly explorations. For example, I tried to constant-fold more things. Many of us have an Emacs config that stays largely unchanged, at least during one Emacs session. And that means many of the global variables in ELisp are constant. And with speculation, we can speculate about the stable ones and try to inline them as constants. And this might improve performance, or maybe not? Because we will need a full editor to get real-world data. I also tried changing cons lists to be backed by arrays, because maybe arrays are faster, I guess? But in the end, setcdr requires some kind of indirection, and that actually makes the performance worse. And for regular expressions, I tried borrowing techniques from PCRE JIT, which is quite fast in itself, but they are unfortunately unsupported by the Java Truffle runtime. So, looking at these, well, explorations can fail, certainly. But with Truffle and Java, these are, for now, not that hard to implement, and very often they teach us something in return, whether or not they fail.

Finally, let's talk about some explorations that we might get into in the future. For the JIT engine, for example, I'm currently looking into the implementation of nativecomp to maybe reuse some of its optimizations. For the GUI, I'm very, very slowly working on one. If it ever completes, there is one thing I'm really looking forward to implementing: inlining widgets, or even other buffers, directly into a buffer. Well, it's because people sometimes complain about Emacs's GUI capabilities, but I personally think that supporting inlining, like a whole buffer inside another buffer as a rectangle, could get us very far in layout abilities. And this approach should also be compatible with terminals. And I really want to see how this idea plays out with Juicemacs.
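The stable-global speculation mentioned earlier can be sketched in plain Java. This is a hypothetical illustration, not Juicemacs's code: the boolean flag plays the role of a Truffle Assumption, and in compiled code a read would be folded to the cached constant as long as the flag holds, deoptimizing when a write invalidates it.

```java
// Toy model of speculating that an ELisp global variable is constant.
// While `stable` holds, compiled code could embed `value` directly as a
// constant; a write invalidates the speculation, which in a real runtime
// would trigger deoptimization of all code that inlined the value.
class SpeculativeGlobal {
    private Object value;
    private boolean stable = true; // "assumption": nobody reassigns this

    SpeculativeGlobal(Object initial) { value = initial; }

    // In compiled code, this read would fold to a constant while stable.
    Object get() { return value; }

    // A setq-like write: invalidate the constant-ness assumption.
    void set(Object newValue) {
        stable = false;
        value = newValue;
    }

    boolean assumedConstant() { return stable; }
}
```

Since most configs are set once at startup and never touched again, the assumption would hold for the vast majority of globals during a session, which is what makes this speculation plausible.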
And of course, there's Lisp concurrency. Currently I'm thinking of a JavaScript-like, transparent, single-thread model, using Java's virtual threads. But anyway, if you are interested in JIT compilation, Truffle, or anything above, or maybe you have your own ideas, you are very welcome to reach out! Juicemacs does need to implement many more built-in functions, and any help would be much appreciated. And I promise, it can be a very fun playground to learn about Emacs and do crazy things. Thank you!

Questions or comments? Please e-mail kana@iroiro.party