Track: General

An introduction to the Emacs Reader

Divyá - IRC: divya, Website: https://www.phimulambda.org Mastodon: https://mathstodon.xyz/@divyaranjan, divya@subvertising.org

Format: 35-min talk; Q&A: BigBlueButton conference room
Etherpad: https://pad.emacsconf.org/2025-reader
Status: TO_REVIEW_QA

Talk

00:00.720 An introduction to the Emacs reader
00:44.760 Yet another document viewer in Emacs?
02:05.760 Architecture of Emacs Reader
06:00.280 A word on dynamic modules
07:39.560 Features of Emacs Reader
07:56.760 Memory efficiency
11:18.720 Performance and speed
14:23.680 Scanned PDFs
17:08.960 System-level multi-threading
23:44.240 Native Emacs integrations
25:10.340 (Naive) dark mode
26:01.140 Challenges and further improvements
29:14.272 What Emacs can learn?
32:32.300 Contributing to the development
33:35.520 Acknowledgements

Duration: 34:37 minutes

Q&A

Duration: 20:12 minutes

Description

https://codeberg.org/divyaranjan/emacs-reader

This talk will introduce a new document reader that I have been building for Emacs for the last few months. I will showcase the basic features of the document reader, how well it integrates with Emacs, and the performance and other improvements that it provides in comparison to the existing document viewing options such as DocView, PDF Tools and others.

I will also describe the core architectural decisions that were made, specifically the fact that it is a dynamic module, and the pains and pleasures of interfacing Emacs with C and vice versa.

I will give a high-level tour of the codebase, which is pretty small as of now (<3K LOC), so that if a fellow Emacs developer wishes to contribute, they know where and how to get started.

In conclusion, I'll summarize the current features we're in the process of developing, what challenges we're facing in doing so, and what we wish to work on for the upcoming versions of the package.

About the speaker:

I’m Divyá from India. My background has been as a mathematics teacher and now I'm a programmer. I’ve been hacking on free software as a hobby and, for the past three to four years, living inside Emacs. I loved reading PDFs in Emacs via pdf-tools, but poor hardware and maintenance gaps pushed me to build a faster reader. I learned MuPDF and Emacs Dynamic Modules and wrote The Emacs Reader: a dynamic-module-based, high-performance, resource-friendly document viewer for Emacs that supports PDFs and other formats (EPUB, CBZ) while integrating natively with Emacs.

Discussion / notes

  • Q: Is there scope for integrating the C library into Emacs itself with muPDF becoming an optional dependency?
    • A: That would entail integrating a PDF engine into the Emacs source tree; not sure that's a good idea.
  • Q: The dynamic modules sound great, and it's amazing they've been there since 2017. Why do you think they have been so slow to take off? Is there prior art with them?
    • A: Mostly because Elisp is so nice to use for almost everything you need to do in Emacs. It's only in very specific cases that you need to care about real-time latency and memory efficiency. And packages like libvterm and others do use it for such purposes.
  • Q: How is pdf-tools difficult to install? I install it using the built-in package manager. Looking at the emacs-reader installation instructions, I don't see how to install it that easily. I don't use use-package or straight. Question answered in presentation.
    • A: Just the list of dependencies required to build epdfinfo itself makes it difficult, and when you install pdf-tools it does a huge autotools build as well. Emacs Reader depends only on MuPDF (and Emacs, of course), and always will.
  • Q: What tool(s) did you use to measure the memory usage between the three packages?
    • A: Valgrind's massif + massif-visualizer
    • I've been using perf and then visualizing with hotspot when debugging FFI in Common Lisp... it's felt successful
    • (I've never had success with valgrind, but I've not dug deep with it)
    • I discovered perf only recently when I had a deep need for low-level optimisation (which is not something I often need) - it's a really nice workflow!
  • Q: How is the conversion between ELisp and the foreign language type system done? For example when interfacing with a C++ library that makes heavy use of C++ object system and templates?
    • A: Basically, dynamic modules make you write Emacs Lisp in your language. Consult the blogpost above for a more elaborate and complete explanation.
  • Q: pdf-tools renders high quality images. Does emacs-reader do that?
    • A: Yes! We can render high-quality images just fine!
  • Q: Can one look at pdf metadata with emacs-reader? Can annotations be added? Does it understand forms? Can it handle encrypted pdfs?
    • A: Support for all of this is planned.
  • Q: I installed emacs-reader already. It is as promised :) Great job! How can I associate odt files to open with emacs-reader?
    • A: It should just work with the find-file command.
  • Q: If a pdf file is open in emacs-reader and I regenerate the pdf with some changes, does emacs-reader actually refresh the pdf on its own or do I have to reload the pdf?
    • A: Yes, if it's a complete file with the same filename; no, if it's still being created with LaTeX. We need SyncTeX for that.
  • Q: What are the challenges with integrating with SyncTeX and AUCTeX? This would be great to see as pdf-tools handles this well.
    • A: Planned, no major obstacles anticipated. The only reason we haven't done it yet is that the planned highlighting and text selection features are more important.
  • Q:  Loved that presentation! Will you be giving another talk on the architecture you went over? A deep-dive there would be awesome.
  • Q: Is there search functionality? Something like isearch and occur?
    • A: Not as such yet. But it is HIGH PRIORITY.
  • Q: Do dynamic modules prevent the customization that Elisp usually provides (advice, hooks, etc.)?
    • A: No, you can do everything you want on the Elisp side. On the dynamic module side it's a bit more tricky; there is not much support there right now.
  • Q: Follow-up on dynamic module: Do you usually create an Elisp shim from the FFI and then use them with Elisp code?
    • A: Yes, we usually wrap dynamic module functions in Elisp to make sure the foreign function gets called when it's needed.
  • Q: Is searching on the roadmap? Or is it already available as a feature? Thanks!
    • A: YES! HIGH PRIORITY <3
  • Q: Will there be occur-like searching?
    • A: YES! Basically everything in PDF Tools is planned!
  • Q: What is your timing expectation for it to appear on ELPA?
    • A: By next major release, which will be in 1-2 months.
  • Q: is this essentially FFI?
    • A: Yup
  • Q: interesting.. is that how webkit integration works?
    • (audience): not really.
  • Q: thank you. Are there other packages that use dynamic modules?
    • (audience): Yes, vterm also uses a C module
  • Q: Has any work been done to make org-noter work with emacs reader?
    • (audience): not yet. It is in the plans, though.
    • (audience): Nice. That would allow me to switch, because I use org-noter quite a bit.
  • Q: Are any of you doing simple editing of PDFs in Emacs? I'm thinking about form-filling, adding signatures, that sort of thing
    • A: that's something I'd also like to integrate once we have other basic features ready
    • (audience): not yet. We do want to add annotation support though. Not sure if that's the same thing.
    • I don't believe it is, but I could be mistaken
    • A: slightly, but not exactly, mupdf does support forms and signatures so we shouldn't have much issues except making it work with emacs.
    • I'm not fond of annotations, because it mutates the original PDF.
  • Q: From the example when calling page 56, is there another thread immediately fetching the next 5 pages for cache?
    • A: yeah indeed I'll talk about it later in the slides, you'll have to build mupdf from scratch in that case.
    • A: there are multiple threads competing to fill the cache window, depending upon how long it takes to render each slice.
  • Q: here is a question that I am a bit embarrassed to ask... is there an easy way to install emacs-reader with package-vc without use-package?
    • yes!
    • I am asking just because I've been postponing learning use-package for ages
      • you can use just package-vc totally. use-package uses package-vc under the hood anyway
        • I tried this - (package-vc-install "https://codeberg.org/divyaranjan/emacs-reader")
        • you need to do a build step. you can just go and do it manually
  • Q: how does the dark mode compare to midnight mode in pdf-tools?
    • it is simply inverting the colors
    • Q: so it does not support setting the foreground and background colours?
      • It does not.
    • pdf-tools has pdf-view-themed-minor-mode that tries to match a PDF with the current theme. It's kinda nice. I'm glad Emacs Reader is aiming to have something similar in the near future.
    • dark mode integration with the dark themes would be cool
  • Q: that is a question that I've had for ages - how do we handle that PDFs that (La)TeX generates one page at a time, and that will be broken until the last page is generated...
    • using synctex
  • Q: I will try to attend one of your streamings... where can I find info about them?
    • A: https://tv.dyne.org/c/phimulambda
    • we also have an IRC at #phi-mu-lambda you're also welcomed there edrx
  • Q: Oh, emacs reader can open epubs? I use nov.el for that, and it has trouble sometimes with complex epubs.

    • yup it can :D
    • it comes for free with mupdf
    • I mean if mupdf supports it, emacs reader will too
    • the only thing we don't support is djvu, but i have plans of making it supported in upstream mupdf
    • Oh cool, I didn't know about the other formats!
  • https://codeberg.org/divyaranjan/emacs-reader

  • Love this arch diagram step through
  • The linked blog for dynamic modules: https://www.phimulambda.org/blog/emacs-dynamic-module.html
  • You can set the maximum ram usage in Okular settings. If it is configured to load the entire document scrolling is instant even in large documents [not the author: this is only relevant to hugely beefy machines, surely? And it sounds like the Reader does it naturally, if you will]

  • 16GB of RAM goes a long way even for huge documents like technical datasheets or photo pdfs. Just be careful not to open too many of them while also having a lot of browser tabs.

  • Looks like Reader simply displays the old page until the new one is ready. While better than showing a white page it's not instant like Okular
  • A: Emacs Reader doesn't require 16GB RAM to do the same, because it doesn't need to cache everything at once, unlike Okular. The talk shows a video demonstration of Okular getting stuck when not configured to use maximum RAM (in which case it caches the pages).
  • I didn't realize pdf-tools was using so much RAM.
    • I definitely noticed, lol
    • The memory graphs are already a very compelling point for emacs reader. I will definitely give this a try.
    • If you are dealing with large PDFs you really feel the difference
    • For doc-view yes. For pdf-tools I have noticed it, but very rarely.
    • But yeah, it is interesting how much memory it's taking
    • I will definitely try this
    • pdf tool caches the pages. once you are moving too fast, it is almost as slow as doc view, because it doesn't have it cached anymore
  • This looks promising! Dealing with PDFs is an important part of what I do with Emacs (academic work). pdf-tools was certainly a much-needed improvement over DocView; for example, not being able to select text was quite an issue. So when you get text selection and annotations working, I will certainly be looking forward to your library.
  • Great talk about emacs-reader. Looking forward to using it.
  • Thank you for making this! :-)
  • Really tasteful typography on your slides divya
    • A: Thank you it's all in Org :)
  • I like pdf-tools, but I'm open to trying something new.
  • pdf-tools mostly works well, but these points are very much valid
  • also have been fairly satisfied with pdf-tools and avoided installation pain because the package manager of choice makes it easy, intrigued by reader and curious to try regardless though
  • I've been using emacs-reader for some time - I have packaged it for nix, although I haven't submitted it to nixpkgs in the hope that emacs-reader will one day end up in ELPA/MELPA. https://git.sr.ht/~johnhamelink/nix/tree/master/item/home/modules/emacs/src/epkgs/reader.nix
  • Org-noter integration would be very good
    • I am also a big org-noter user
    • A: I used pdf-tools for several years, indeed I love org-noter as well. certainly a priority integration for us.
  • I like the use of diagrams in this talk. It makes it easier to understand.
    • I just realized the thread pool has a bunch of "threads" in it!
    • That's what those were! I was thinking, "What does 'S' stand for?"
  • very impressive work, I'll have to try this later
  • Very nice. I'm definitely giving this a try.
  • On my OS: MuPDF version 1.19.0 too old. Require ≥ 1.26.0.
    • yeah that is a major problem
  • Already looks very promising and the upcoming features are likely to make me switch completely from pdf-tools.
  • Exciting project!
  • very cool!
  • Playing with mupdf standalone, epubs look very nice!
  • very nice talk.
  • great talk divya! great talk indeed!
  • Great talk, thank you!
  • I'm sold on Emacs Reader. (looking forward to org-noter support)
  • fantastic talk! many claps! =)
  • Super cool, emacs-reader on the list to try, we'll see if I can get it installed before the ELPA release

  • I'm super excited about this 😊

  • great presentation 👏

Transcript

[00:00:00.720] An introduction to the Emacs reader
Hello EmacsConf! Today I'm here to introduce you to the Emacs Reader. It is a general-purpose document viewer that lives inside our beloved Emacs. It tries to prioritize memory and performance efficiency as much as possible even when you're using a lower-end hardware. And, most importantly, it tries to do things in an Emacs manner. That is, it tries to integrate with existing packages as much as possible instead of reinventing the wheel. And architecturally, it tries to take the advantage of dynamic or native modules which were introduced back in 2015 into Emacs.
[00:00:44.760] Yet another document viewer in Emacs?
You would ask, why exactly do we need another document viewer in Emacs? Don't we already have the built-in DocView and the notorious pdf-tools? Well, the built-in DocView has unusable latency, and I'm going to show you this later when I compare this with Emacs Reader. The famous pdf-tools actually has multiple issues. One, it is extremely memory-hungry regardless of what kind of PDFs you're reading. And, well, it can only read PDFs. Poppler, the library which pdf-tools uses, is actually sub-optimal, especially relative to MuPDF, which is what Emacs Reader is based on. pdf-tools is also extremely painful to install. If you've ever installed pdf-tools, you know that it has a bunch of dependencies, including a server that is supposedly packaged across package managers, system package managers. It's extremely difficult and painful to install. And of course, pdf-tools has not been maintained as much over the last couple of years. There are huge PRs that have gone unnoticed and unmerged.
[00:02:05.760] Architecture of Emacs Reader
Architecturally, Emacs Reader takes a distance from both DocView and pdf-tools. So how DocView works is that it basically wraps around a tool called mutool. mutool is actually a command line tool from MuPDF itself. It relies on mutool and a bunch of other similar command line tools, and basically makes process calls from Elisp to the CLI tools. That's how DocView works, and that's why it sort of has latency issues because that's the best you can do by literally calling CLI tools and outputting the images into Emacs. How pdf-tools works is that it tries to have a server-client model. So the client is Emacs and the server is basically something they call epdfinfo. It's supposed to render the images using Poppler and then send the images to Emacs which then tries to display. I think the server client model is terrible. One, for latency purposes, and two, it makes things unnecessarily more complicated. Here is where we come and introduce dynamic modules. So Emacs Reader is based on the concept of dynamic modules which I'm going to talk about in a bit. But how it works is that we have C modules. So we have the emacs-module.h, that's the dynamic module header which every dynamic module package must have. And then we have our C files. And these C files essentially define functions that are going to be used in Emacs but in C. We then load these C modules using simple (require ...) in our Elisp modules. And then whenever we call something in the Emacs runtime, say I'm going to open PDF files in (find-file) or (reader-open-doc), what it does is that it tries to use one of the functions that is wrapped in Elisp, but actually tries to call a function in C. And then the C module is actually going to make calls to the MuPDF. Here the MuPDF system package, this is actually a system package that is dynamically linked to the C modules. So we're basically just using it as a shared library. 
So you have the fz_load_page, for example, it's a MuPDF function that we're going to be using in the C modules. So it's going to make a shared dynamic call to MuPDF and then render the page and then show this to Emacs. This pipeline, I argue, is much better and leaner and efficient than a server-client model. One, because we don't really need the server-client model. So back when Politza first introduced pdf-tools, that was like 10 years ago in 2015, the concept of dynamic modules were not integrated into Emacs. I think they came around like one or two years late, 2017. So that's the best he could go with. We don't really have to, today, because, since we can use MuPDF as a shared library which can render things in real-time and just give us the rendered images which we can then display, there's no reason for a server to do things for us. So that's the main architectural difference that Emacs Reader introduces compared to pdf-tools and DocView.
[00:06:00.280] A word on dynamic modules
What exactly are dynamic modules? Well, I can't really give you a full-fledged explanation, but essentially dynamic modules let you evaluate native compiled code in other languages like C, C++, Rust that behaves like regular Emacs Lisp. So when our Emacs C modules, the render-core.c or render-theme.c, when all of these are compiled, and they're called from the Elisp modules. They behave like Elisp even though they're as fast as a C function because they're compiled C code. But you essentially call them just like Elisp functions. You can find them using C-h f and so on. So you can call any function from any language that supports the C ABI, which is virtually everything, without leaving Emacs and without losing any performance. This is extremely helpful when you want to use existing libraries like MuPDF or any other cryptographic library that is written in C and you don't want to rewrite the entire thing in Elisp, but you can just use it as a native library. You can read more on how dynamic modules work and how you can write one in this blog. This is something that I wrote myself just after starting this package and it will give you a bit more guidance on how to use dynamic modules more efficiently. I think dynamic modules should be used more and more in Emacs and I think their advantages have not been exploited as much as they should.
[00:07:39.560] Features of Emacs Reader
Now we're going to talk a bit about the core features of Emacs Reader. And these are the following features that we're going to talk about. And finally, to talk about some challenges that we faced.
[00:07:56.760] Memory efficiency
First is memory efficiency. I already told you that Emacs Reader's first priority is to make sure that we are not slow and we are not taking a bunch of memory unnecessarily. So here's a graph of the heap memory size as it grows for DocView. So this is again in emacs -Q. So this is a fresh Emacs session with just DocView. It grows up to 900 MB for a very small PDF that is a LaTeX PDF. No scanned huge PDF. It's a 2 MB PDF. But when I scrolled from the beginning of the PDF to the end, it went up to 900 MB. That's the memory heap size. Does pdf-tools make this any better? It actually doesn't. So, pdf-tools pretty much does the same thing. If you look at it here, in case you're going to ask me whether these are two different graphs or whether I'm just showing you the same graph: they're actually two different graphs, because if you look at the DocView graph, it uses cairo and it uses librsvg, because DocView by default converts the images into SVG. The rendered images are SVGs. pdf-tools doesn't, so you don't see any librsvg calls here or anything. So this is pdf-tools and it basically takes up the same amount of memory, 900 MB, with exactly the same operation, exactly the same PDF, scrolling from the first page to the last. Where do we stand? Well, we actually do much better. So let me zoom in on this. So if you see, we stand at a peak of 72 MB. Exactly the same PDF, exactly the same operation from the beginning to the end, around 285 pages scrolled. We take much less than 80 MB. And actually, to be very frank, the only memory that we're storing in Emacs, oh, sorry, not in Emacs, in the MuPDF heap is just about 30 MB. It's this dark red one. That's the cache that we're storing. That's the memory that we're interacting with in real time. The rest is stuff that Emacs adds on top of it and a bit of libmupdf. So you can see, in terms of memory, we're saving... we're literally down by, what, a factor of 10!
This was a priority for us since the beginning, because when I was starting to use pdf-tools, it was unusable for me because I was on a lower-end hardware and I thought it should not be really that difficult for a document reader to not take a gigabyte of memory. It really shouldn't because you're not really doing that much, you're just displaying images. So that's how efficient we are in terms of memory. Let's see how efficient we are in terms of speed.
[00:11:18.720] Performance and speed
So Emacs Reader is actually as fast as pdf-tools, and it is actually way more faster than DocView. In some cases, it actually beats existing standalone document readers and browsers. So let's actually see this in action. So here we are with a few emacs -Q sessions. I'm using emacs -Q so as to give you... that this is actually as less overhead possible. So we have first DocView. All of these tests are going to be done on the same PDF. It's the documentation manual from MuPDF. So if I scroll, this is fine. I'm just pressing n and it seems to work fine. If I press and hold n, I have pressed n and I'm holding. And Emacs is stuck. And it's going to stay stuck because it's making calls to the CLI tool that I said, mutool. And after it's done getting stuck, it is going to get back. As you can see, if you go back, you're able to go back fine. It does not get stuck because what Emacs does is it basically calls mutool, like fetches a bunch of pages, essentially all the pages that you asked for it, and it puts them into the memory. And that's it. It puts them into the memory and then scrolls through it. So going back, you will most likely not have any stuck issues. Sometimes you do because some images do get GC'd. But that's the idea. Whenever there's no image in memory, it gets stuck. And it gets stuck good. That's DocView. pdf-tools is actually not problematic here. pdf-tools is extremely efficient and extremely fast. So we can go through the pages without any issues. We can zoom. The zoom did get stuck a bit, but that's relatively fine. Emacs Reader is exactly as fast as pdf-tools here. So this is pdf-view, this is Emacs Reader. Let's scroll through the pages. As you can see, nothing is getting stuck because we're not really waiting for any tool to send us any images. We just have a little cache and we're scrolling through them and rendering images in real time. Zooming also works fine. So, with regards to this, we're in parity with pdf-tools.
[00:14:23.680] Scanned PDFs
Now, where pdf-tools and actually a lot of readers have issues is when they're dealing with scanned PDF. So, we have this PDF which is notorious for being really difficult to render because this is entirely built with scanned images. This is the kind of PDF that you get from Internet Archive. This is essentially someone took photos of the book in a camera and literally turned them into a PDF. Emacs Reader actually does not have any issues rendering this. As you can see, it renders it smoothly and fine without any halts. I can change Emacs even while it's doing so, and it does not have any issues. pdf-tools are the same. PDF also does not have any issues. Sorry. Click pdf-view-mode. pdf-view (pdf-tools) is a bit slower but does not have any issues. It works. Here, actually, pdf-tools and Emacs Reader are more efficient than even browsers. So, if I try to open the same page in a browser, I'm trying to scroll. And after I've scrolled and I leave, scrolling is going to load for a bunch of seconds to give me the page. It's more than five seconds, as you can see, and this is actually totally not usable. If you're going to read this book, an electromagnetics book, you're going to have a terrible time reading this in a browser, which is supposed to be the fastest thing alive. You sort of have the same experience in Okular. So this is Okular. If I try to scroll through this, it will do the same thing. And while it is better than the browser, it still takes a while and it still has, like, if you zoom, you're going to have a bit of a delay. You don't really face that in Emacs Reader. We zoom in and out just fine. And even with using mouse, you can zoom in and out just fine. So this is how Emacs Reader performs in terms of speed with these other tools. Now we will go back to the original presentation.
[00:17:08.960] System-level multi-threading
Now, how exactly is Emacs Reader able to do a lot of this? I wish I could sort of spend an entire session just talking about this, but I can't. So I'm just going to make this short. When you load Emacs Reader, in the standard output, it's going to say this: that eight threads have been initialized. Now, what we did with Emacs here is that we enabled system-level multithreading. Now, Emacs is not multithreaded. We all know that notoriously. It is single-threaded. But we don't really need Emacs to be multithreaded, though. Emacs does not need to be multithreaded. What needs to be multithreaded is the rendering part because that's the most expensive part. In Emacs, we're only just displaying images. Emacs itself does not have a PDF engine that is rendering stuff. MuPDF is supposed to take care of that. So if I can do multithreading in the rendering pipeline, that is when I'm rendering pages instead of displaying them, that's fine for me because the rendering part most of the time, especially in scanned PDFs, is the most expensive part. So if you look at this graph, we have two parts here. We have the display pipeline and we have the rendering pipeline. In the display pipeline, we have just the Emacs session which has the reader loaded and that's the main thread. Then we have the rendering pipeline which has the MuPDF system package dynamically linked. So when you load Emacs Reader, we initialize a thread pool with eight threads. Now what you do is let's say we are at page 50. At page 50, the Emacs Reader maintains a cache. It's like a stack of pages that we keep in memory all the time. This cache is entirely outside of Emacs. It is not inside Emacs environment. It is in the C memory heap, in the MuPDF memory heap that is outside of Emacs environment. It does not make any calls to Emacs anything. It does not have a single Elisp line. So this cache is stored outside. 
Now when I want to retrieve anything from this cache, let's say, so I have cached up until 55, from 45 to 55. So what happens is that when you're at page n (here, 50), you always have a cache from n - 5 to n + 5. So you have a cache of 5 pages forward and 5 pages backward. But let's say I want to go to page 56. So I will ask Emacs to render page 56. And I'm not going to ask MuPDF directly. I'm going to ask the thread pool to do this job. And the thread pool is going to assign one thread to it. Let's say thread 1, which is going to render page 56. So this thread is going to make calls to MuPDF through our dynamic module code. And MuPDF, after rendering it, is going to store it in the cache. So we're going to add page 56 to this. Now, while this is happening, Emacs Reader does not, like Emacs itself, the session is not going to be stuck because we just made a call to the thread. We just asked the thread. So like this, this call, like it's done. So you just assign something to a thread and then this is fine. Like, you're not waiting for the thread to complete or anything. Emacs is not waiting for the thread to complete. The dynamic module or the C side might wait for it to complete, but that is entirely different from the Emacs session. So Emacs Reader can continue to display page 50 while the rendering pipeline is still rendering the 56th page. And when Emacs asks to display page 56, it's going to ask the thread pool. Then the thread pool is going to assign another thread, let's say this one, to retrieve page 56 from the memory cache. And then page 56 is going to be sent to Emacs to be displayed. Again, the retrieval part is entirely independent of Emacs. Emacs does not have to wait for it. Emacs only needs to wait to display it. So, the displaying part and the rendering pipeline are entirely asynchronous, so to speak. And in the diagram, if you see, all the arrows that are magenta in color, they are native to the Emacs runtime.
That is, they are single-threaded. They are connected to Emacs. And all the arrows that are red in color, they are totally asynchronous. They can be multi-threaded if you want. They are multi-threaded by default because they interact only with the MuPDF shared library and the C heap. They do not touch anything in the Emacs runtime. This is how we're able to switch quickly between these huge scanned PDFs that have huge images in each of their pages because we don't really wait for each page to be rendered. And Emacs does not wait for that. So that's another architectural feature of Emacs Reader: we are system-level multithreaded. Now Emacs Reader also supports almost all document formats. It supports PDF, EPUB, MOBI, XPS, CBZ comics, and it even supports other non-ebook formats like the OpenDocument format, so you can open LibreOffice documents in it, and even stuff like PPT and Excel, even though they're not going to be supported in as nice a manner. And we can do that because MuPDF does this. MuPDF has support for all of this and it treats them just as it treats PDF. Nothing special. The only thing that we don't support right now is DjVu. I'm going to work on making it supported in upstream MuPDF. That's going to take a long time, but it's in the plans.
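The sliding cache described in this section (five pages on either side of the current page) can be sketched in a few lines of C. This is only an illustration of the idea, not Emacs Reader's actual code: the names `cache_window`, `cache_window_for`, `cache_window_contains` and `CACHE_RADIUS` are hypothetical, and pages are assumed to be numbered from 0.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical constant: the talk describes 5 pages forward and 5 back. */
#define CACHE_RADIUS 5

/* Inclusive range of pages that should be kept rendered in the cache. */
typedef struct {
    int first;
    int last;
} cache_window;

/* Compute the window around `current`, clamped to the valid page
   range [0, page_count - 1] (0-based numbering is an assumption). */
cache_window cache_window_for(int current, int page_count)
{
    assert(page_count > 0);
    cache_window w;
    w.first = current - CACHE_RADIUS;
    w.last = current + CACHE_RADIUS;
    if (w.first < 0)
        w.first = 0;
    if (w.last > page_count - 1)
        w.last = page_count - 1;
    return w;
}

/* True when `page` is already covered by the window, i.e. it can be
   displayed from the cache without asking for a new render. */
bool cache_window_contains(cache_window w, int page)
{
    return page >= w.first && page <= w.last;
}
```

With the numbers from the talk, the window at page 50 covers pages 45 to 55, so a jump to page 56 falls outside it and has to be handed to a render thread. Near the start or end of a document the window is simply clamped and becomes smaller.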
[00:23:44.240] Native Emacs integrations
Now with Emacs Reader, we also integrate with existing Emacs packages as much as possible. So bookmarks, C-x r b, you can do it natively. So you can save a page as a bookmark just as you save anything else in Emacs as a bookmark. There's also saveplace integration. So you can scroll a PDF, close it, and then come back to it at the same page that you saved it at. Sorry, that you closed it at. And it's going to work just out of the box because of the saveplace package in Emacs that is built in. We also have imenu integration for table of contents. So if you see this, this is imenu and you can scroll through the contents just like you scroll through any imenu. You can also do it in the menu bar by clicking. It works just as nice. We also have something like the outline mode that pdf-tools has. So if you press O in a document, it's going to give you this outline. And these are buttons that are clickable. You can click them. You can press Enter at them. And this is the menu bar item that I was looking at. If you click here, index, it's going to show you the exact same thing but in a different interface.
[00:25:10.340] (Naive) dark mode
We also have a naive dark mode, which is not really as nice as we would like it to be, and dark mode fanatics, I'm sure, will have issues with it, but we're going to improve it in time. For now, this is what we have. It can be enabled per document, so you can have one document that is in dark mode and another one that is not. That is nice to have. Eventually we're going to work on more themes. You should be able to integrate it with Emacs themes as much as possible. You could make it the default so that it inherits colors from the Emacs theme. That is one of the things that we also have planned.
[00:26:01.140] Challenges and further improvements
We did face a bunch of challenges while trying to implement these features. One of the initial challenges was that SVGs were actually a bad idea. They're huge, especially in scanned PDFs, and they make things much slower. So we chose PPM instead, which is about the simplest image format possible. Now, it was also very difficult to make reader-mode window-specific. So, you know, while you're scrolling a document in one window, another window showing the same document should not change. We should be able to have multiple pages of the same document in different windows. That was very difficult because, as I told you about the cache, the cache works in an idiosyncratic manner, and we needed to make it so that each window has its own cache instead of having a global cache for each file. That took some rewriting. And because we needed to do this system-level multithreading, we needed to use a specific version of MuPDF in which a bug affecting this had been fixed. That's 1.26.0. Because we did that, a lot of the GNU/Linux distributions did not have this latest version. So we had to package it in-tree, as a git submodule. That was a horror! But now I think most GNU/Linux distributions already have this [version]. As for the upcoming features that we have planned, the first one is that we need to rewrite the display mechanism entirely from scratch to use a tiled rendering approach. Right now we just take an image and display it inside an Emacs buffer, just like that. That will be changed so that the image is displayed in a tiled manner: there will be multiple tiles, but it'll be pixel-perfect, so you won't see a difference. The reason to do this is to implement text selection, actually. We can't really do text selection without running into a bunch of memory, latency, and other issues if we don't do tiling.
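The tiled display described above amounts to cutting one page image into a grid of fixed-size rectangles whose edges meet exactly, so the result looks identical to a single image. A minimal sketch of that arithmetic follows; the 256-pixel tile size and the names `tile_rect`, `tile_count`, and `tile_at` are assumptions for illustration, not the layout Emacs Reader will necessarily use.

```c
#include <assert.h>
#include <stddef.h>

#define TILE_SIZE 256  /* assumed tile edge length in pixels */

/* Position and size of one tile, in page pixel coordinates. */
typedef struct { size_t x, y, w, h; } tile_rect;

/* Number of tiles needed to cover `extent` pixels (ceiling division). */
size_t tile_count(size_t extent)
{
    return (extent + TILE_SIZE - 1) / TILE_SIZE;
}

/* Rectangle of the tile at (col, row) for a page of page_w x page_h
   pixels. Tiles on the right and bottom edges are clipped to the page,
   so the tiles cover the page exactly, with no gaps or overlap. */
tile_rect tile_at(size_t page_w, size_t page_h, size_t col, size_t row)
{
    assert(col < tile_count(page_w) && row < tile_count(page_h));
    tile_rect r;
    r.x = col * TILE_SIZE;
    r.y = row * TILE_SIZE;
    r.w = (page_w - r.x < TILE_SIZE) ? page_w - r.x : TILE_SIZE;
    r.h = (page_h - r.y < TILE_SIZE) ? page_h - r.y : TILE_SIZE;
    return r;
}
```

With a grid like this, an operation that affects a small region of the page only has to touch the tiles that region intersects rather than the whole page image.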
So we need to do those two things; they are at the highest priority right now. And then, once we're done with that, we're going to support annotations, highlighting, everything that you're used to in pdf-tools and org-noter. And once we're done with that, we're also going to integrate with AUCTeX and SyncTeX. Because right now, when a PDF gets updated, especially a LaTeX PDF, since there is no SyncTeX integration, it can't really handle it nicely, and it sometimes even crashes Emacs. So that's something that we are planning to implement.
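Going back to the image format choice at the start of this section: the appeal of PPM is easy to see from how little an encoder needs. The binary variant (P6) is a short ASCII header followed by raw RGB bytes. The following is a minimal sketch, assuming 8-bit RGB input; `ppm_encode` is a hypothetical helper, not a function from Emacs Reader or MuPDF.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Encode an 8-bit RGB buffer (width * height * 3 bytes) as binary PPM (P6).
   Returns a malloc'd buffer and stores its size in *out_len, or returns
   NULL on failure. The caller frees the result. */
unsigned char *ppm_encode(const unsigned char *rgb, unsigned width,
                          unsigned height, size_t *out_len)
{
    assert(rgb != NULL && out_len != NULL);

    /* Header: magic number, dimensions, maximum channel value. */
    char header[64];
    int hlen = snprintf(header, sizeof header, "P6\n%u %u\n255\n",
                        width, height);
    if (hlen < 0 || (size_t)hlen >= sizeof header)
        return NULL;

    /* Body: the pixel bytes, copied through unchanged. */
    size_t body = (size_t)width * height * 3;
    unsigned char *out = malloc((size_t)hlen + body);
    if (out == NULL)
        return NULL;
    memcpy(out, header, (size_t)hlen);
    memcpy(out + hlen, rgb, body);
    *out_len = (size_t)hlen + body;
    return out;
}
```

There is no compression and no markup to build or parse, which is the trade-off against SVG that the talk describes: larger raw output per pixel, but almost no work to produce.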
[00:29:14.272] What Emacs can learn?
Now, from this experiment, what exactly can Emacs, the Emacs core devs, and others who are building packages learn? Well, the first thing is that all of this should not really be this difficult, because all we're asking from Emacs is to display images in real time and update them in real time. That should not be that difficult a thing to do, but apparently it is. And that's why Emacs's graphical interface needs to be more modular, more composable, and flexible for real-time graphics. If it is supposed to have things like, again, a document reader, or something like a video editor, Emacs's graphical interface needs to grow and be more mature. One of the things that's stopping it from doing that is actually Emacs's overlay functionality. Right now, the way we display an image in a buffer is using an overlay, actually multiple overlays. Overlays are static in the sense that if I attach one image to one overlay, I need to supply an entirely different image to update that overlay. So I need to create another image, change it in memory, and then display it to update it. I can't change the image data of the overlay in real time. And that is a big issue. I've actually started an emacs-devel mailing list thread about it. I talked to Eli about it as well. He said there's a possibility that this can be changed, but it's going to take a certain amount of rewriting. There are also issues with Emacs GC. Emacs GC sometimes leaks memory when you update images too quickly. That is, when you have a bunch of images that are getting churned out too quickly, Emacs GC starts leaking and RAM usage just goes up to a huge number of gigabytes. That's also a huge problem. The dynamic module API, the emacs-module.h header, needs to have more helpers.
It's really bare bones, and I like that it is bare bones so that other languages can use it, but at the same time, I think it'll be really good if we can have some helpers that make memory interaction easier, for strings and so on, which we also faced some issues with. Emacs's fractional scaling system seems to be broken across different toolkits. We have bug reports that say that in pgtk on Wayland, something seems to render differently because they have fractional scaling enabled. So overall, I think Emacs needs to focus on improving the graphical interface pipeline to be a much more mature one.
[00:32:32.300] Contributing to the development
And finally, how can you contribute to the development of Emacs Reader? Well, we are on Codeberg. We are not on GitHub, sorry. You can go there, look through the issues, and send us a PR if you're interested. The next major release is going to go to GNU ELPA, finally. We are not yet on GNU ELPA, so you can't do M-x package-install to install our package; you would need to install it through use-package :vc. And since we're going to go to GNU ELPA, we request you to assign your copyright to Emacs, because GNU ELPA is essentially part of GNU Emacs. So you would need to do a copyright assignment if you make a non-trivial contribution. You can join us on IRC at #phi-mu-lambda. I also stream the development of this package bi-weekly on Sundays on PeerTube at the following channel. Feel free to join us.
[00:33:35.520] Acknowledgements
Finally, I want to thank Tushar, who has been persistently contributing to the project since 0.1.0. I'm very, very thankful to him for his suggestions and for his code contributions as well. I would also like to thank Prom, who fixed a major bug in the Windows build, since I don't really use Windows anymore, so that was really nice, and Teeoius, for fixing a pthread bug. I would also like to thank others who helped fix little things, who come to the stream to chat, who sort of see me bang my head against these C memory bugs. So thank you to all of those. And thank you finally to the viewers and to the EmacsConf organizers as well. This is a splendid opportunity. Thank you.

Captioner: jay_bird

Q&A transcript (unedited)

The first question, and I'm reading from the Etherpad here: is there scope for integrating the C library into Emacs itself, with MuPDF becoming an optional dependency?

Right, so integrating the C library into Emacs itself is like having MuPDF inside the Emacs source tree. I don't think Emacs devs would be inclined to do that, and I don't think we really need it. As it is, Emacs with DocView needs mutool, which is something you need to install from MuPDF anyway. So I think it is almost expected that you install MuPDF from the system package manager. And I think that is better as it is, because we don't really need to have a whole PDF engine inside Emacs.

Next question, also from the pad: dynamic modules seem great, and it's amazing that they've been there since 2017. Why do you think they've been so slow to get adopted? Is there prior art with them?

Right, that's a good question. Actually, I think one of the reasons is that, most of the time, people love Emacs because they can do so much with Elisp. I think there is certainly a bias towards trying to do things with Elisp. There's only a specific class of problems that you can solve with dynamic modules, such as this one, where you want to use a native library to do something in a faster, better way. I use that quite a lot. There's of course libvterm, which uses a dynamic module and does it really well. And I think there's another one, a plotting library or package in Emacs that was using something from Python. So dynamic modules are good, but I think they don't really reach the surface-level packages, your day-to-day packages, because most of the day-to-day packages that we use in Emacs can be done with Elisp. So unless you really need something system-level efficient, most of the time you don't want to write C or C++ or something. But there is actually a really nice Rust crate for native modules, and there's a really nice Haskell package.
So there's actually really good support for multiple languages. So it's there, it's just not used as much. Yeah.

So what you're saying is, if Elisp weren't so simple to learn and easy to use and so fully featured, we'd get a lot more mileage out of this super cool dynamic module feature. Yeah.

Cool, I'll bring in the next question. How difficult is PDF Tools to install? The questioner is installing it using the built-in package manager. Looking at the Emacs Reader installation instructions, it doesn't necessarily cover how to install that easily; the person is not using use-package or straight. Oh, and they say that you didn't catch much of this in the presentation.

Okay, so do you want me to skip that, or should I answer? It's your choice, if you would like to say more.

Yeah, I think just as a thing, the reason I said PDF Tools is difficult is that PDF Tools has a huge list of dependencies. The only thing Emacs Reader depends on is MuPDF, nothing else. There's a single dependency. PDF Tools depends on a lot of things, and they have their own server, which is packaged as a system package, which you don't really find everywhere. And there are GNU/Linux systems where the package is very difficult to build because of so many dependencies. So my goal was to reduce the number of dependencies. And then right now it's very, it's sort of a key to install Emacs Reader. Once we go to GNU ELPA, it's just going to be M-x package-install, just that. Right now you have to use package-vc a bit.

Boy, we get spoiled as Emacs users. Everything just gets so easy for us. It's like an IDE for our whole machine.

What tools did you use to measure the memory usage between the three packages?

Yeah, that's a good question. During my development, I mostly used Valgrind for debugging purposes. Valgrind is a suite of debugging tools, and one of the tools that it has is Massif. It's a heap analyzer, a heap profiler.
So Valgrind plus Massif, and then there's a KDE package called Massif-Visualizer. So I first get the Massif output using Valgrind, and then put that output into Massif-Visualizer. That gives me the graphs.

Are there Emacs integrations for those components at all? Does Valgrind have them?

I don't think so. There are, I think, a few packages which do something with Massif, but I don't think they're maintained. Yeah.

Gotcha. Cool. Awesome opportunity there for someone spunky.

How is conversion between Elisp and foreign language types? For example, when interfacing with a C++ library that makes heavy use of the C++ object system and templates.

Yeah, that's a good question. The go-to answer is the blog post that I wrote, which is an extensive explanation of how the internals of dynamic modules work. The short answer is that this works with anything that is compatible with the C ABI. When you compile that language's code, say when I compile C++ code, I would have a particular API. So we have a dynamic module API, which is emacs-module.h, the file that I showed. You have to put that into your C++ program and then link it. emacs-module.h is basically going to use things in your Emacs installation to interact with this C++ code. So it's basically FFI. What this gives you is that you can have things in C++. Let's say you want to do multi-threading the way I did system-level multi-threading. You can have C++ be responsible for the multi-threading, but you want the output of the multi-threading to go into Emacs. So then you write a C++ function, which is going to be a dynamic module function. A dynamic module function is written in the language that you target, that is, C++ or C or Rust. And then that is going to be compiled into a shared library, like an .so, a shared object, and then that shared object is going to be loaded into Emacs using require.
So when I do require render-core in one of the slides that I showed, I'm basically loading that shared object, and that shared object already has the compiled dynamic module functions and so on. But my blog will explain that better.

Gotcha. I thought that was pretty clear. I'm looking forward to seeing that blog post and understanding what I glossed over trying to understand from that explanation. That was great.

Can one look at PDF metadata with Emacs Reader? Can you do annotations? Does it understand forms? Can it handle encrypted PDFs? In other words, I think, reading between the lines: wow, this is awesome, is there anything I can't do?

You're right. So Emacs Reader will be able to do all of those things. It can do annotations. It will be able to do forms. And we have an issue open for encrypted PDFs. The thing is, right now we are struggling with making Emacs Reader very efficient in terms of highlighting and text selection, because of the challenges that I mentioned in the slides. So it will be able to do all that once we nail down the basic features in an efficient manner.

Gotcha. A commenter or questioner says: I installed Emacs Reader already, as promised. Great job. How can I associate ODT files to open with Emacs Reader?

You don't really need to do anything. You should just be able to do find-file, C-x C-f, and open it. It should open with Emacs Reader because we have an auto-mode-alist entry that takes an ODT file and opens it with reader-mode. So you should just be able to do find-file. If you're not able to do that, you should open a bug report.

And I'll just mention we've got about 10 minutes left of our live Q&A, but if you're watching the stream, it's possible that we'll just keep going. The questions just keep coming, and I just love that. So feel free to join the BBB link that should have shown in the IRC chat. Jump in and we can take questions as long as Divya has steam for that.
If a PDF file is open in Emacs Reader and I regenerate the PDF with some changes, does Emacs Reader refresh the PDF on its own, or do I reload it?

Right, that's also a really good question. One answer is that it depends on how you change the PDF. For example, if I just replace the PDF with something else of the same name, Emacs will update it immediately. If you have auto-revert-mode on, it'll just revert the buffer and reload the PDF really nicely. But if you're doing something like LaTeX, where you're writing something in LaTeX and LaTeX is continuously producing the PDF, that needs SyncTeX integration. Because LaTeX, while it's producing the PDF, does a lot of funky things. It does not provide a renderable PDF all the time. So Emacs will sort of crash trying to render a PDF that is not ready yet. So we need SyncTeX to sync with LaTeX to do that nicely.

Okay, so we have to do some care and feeding of the exact timing if we have more of a continuous process behind the curtains, so to speak. That makes a lot of sense to me.

What are the challenges with integrating SyncTeX and AUCTeX? This would be great to see, as PDF Tools handles it as well.

Yeah, yeah. So we have SyncTeX and AUCTeX planned. I don't really see any major obstacles to doing that, to be very honest. I think we can do it in a much simpler way than PDF Tools does. The only reason we haven't done it yet is because, again, we have more important things, highlighting and text selection and those features, planned, but it's anticipated. Yeah.

All right. This next question: I love your presentation. Will you be giving another talk with a deep dive on the architecture you went over? That would be awesome.

I'm not sure if an EmacsConf talk will be appropriate for this, but I do stream bi-weekly. So you're always welcome to come on my stream and ask, and I would be very happy to go deep into this.

I'm looking forward to catching that myself.
Thank you for the shout. Is there search functionality, something like isearch and occur?

Yeah, we don't really have it, but this is the most immediate feature after we have text selection. So once we have text selection, once we're able to select the text, then we can have isearch so that it can highlight the text. Yeah.

All right. I'll read out this question and then I have to do a little bookkeeping on the pad. Does the dynamic module prevent the customization that Emacs usually provides, advice, hooks, et cetera, or does everything just kind of...

No, if you have a dynamic module, it doesn't limit you from doing anything. You can do everything on the Elisp side that you want, and you only take care of certain things on the dynamic module side. If you're asking whether you can do advice, hooks, and all of that from the dynamic module itself, that's a bit tricky, because something like calling a macro, or doing macros in dynamic modules, is not really that nice. You have to pretty much manually expand the macro yourself in the dynamic module. So if you want to do it from the dynamic module, there's not much support right now, but you can do everything on the Elisp side without touching the dynamic module.

Got it. So those are the questions that I see. I'm just going to take a quick peek, but let me invite you, if you want to. We've got just about 5 minutes left, and I will get carried away sometimes and fail to make this invitation before we cut away live, especially if we do keep going a bit. that you have live onto the stream. Of course, you don't have to do that. You said a lot in your presentation.

No, I think mostly that's fine. I'm just really happy that people are interested in the package, and I would be glad to have contributors and viewers or anything. That would be nice.

Awesome. So here comes one more question, or actually a couple more questions coming in.
Following up on dynamic modules, do you usually create an Elisp shim for the foreign function interface and then use them with Elisp?

Yeah, so basically how you do it is, let's say I have a C function that I've written in the dynamic module. It's a dynamic module function. When I'm trying to call the dynamic module function, most of the time I don't call it directly. I wrap it inside a proper Elisp function and then call that Elisp function. I think it's better to do it that way because you can take care of certain cases around when you want the dynamic module function to be called. Maybe sometimes you don't want the dynamic module function to be called immediately. So it's better to wrap it. Yeah.

Okay. So timing issues. Yeah. For the purposes of managing timing issues, that Elisp shim is preferred. Yeah. Makes sense.

So the question here is about searching; the person is searching for a roadmap. Is that already available as a feature?

Searching is on the roadmap. It is not available yet as a feature, but it's a priority.

I think you may have touched on that. Sorry. All right. Those are the questions that I see. We've got just a couple of minutes. I'm not sure if you have more you wanted to say, but I have to say how much I appreciate your talk, especially you jumping in live with us and just taking everything on the fly. I think this is a big part of what adds the energy; you in particular are just a really dynamic speaker. Thank you.

Thank you. Thank you. I enjoyed it as well.

A person is, and I think this may have been touched on already, but let's maybe get into it more specifically. We've said that search is kind of a next-up type of feature as the current iteration stabilizes. The question was, you know, occur-like, how would you?

Totally. There will be occur searches. There will be isearch enabled. Everything you're used to with PDF Tools; we would be at parity with all the features that you're used to with PDF Tools. So certainly occur, and anything that is important in Emacs with text and that can be done with PDFs. We really want to do that because I want the package to be as knitted into the Emacs ecosystem as possible.

Okay. We'll see if we can get in this last question here. Do you have a timing expectation for ELPA?

Yeah, next major release, essentially. The next major release is most likely going to be within a month or two. So once we have the next major release, we're going to be...

Timing couldn't be more perfect. Maybe this is a good point to break. We'll be cutting away to the next talk in just a couple of minutes. So let me say one more time, on behalf of all the attendees and all the volunteers and everybody, how much we appreciate your talks and your awesome contribution to the Emacs world.

Thanks, Corwin.

Questions or comments? Please e-mail divya@subvertising.org
