Improving compiler diagnostics with Overlays

Jeff Trull (he/him) - Pronunciation: rhymes with "hull" and "dull", IRC: jaafar, @jaafar@hachyderm.io

Format: 21-min talk; Q&A: BigBlueButton conference room
Status: Q&A to be extracted from the room recordings

Talk

  • 00:00.000 Introduction
  • 00:33.560 Overlays and what they can do
  • 02:02.500 Simple overlay example - creating an overlay
  • 02:35.700 Adding properties
  • 03:10.940 Deleting an overlay
  • 03:24.660 Setting fonts the right way
  • 03:59.540 More properties
  • 04:12.580 Visibility
  • 04:49.780 Adding text
  • 05:27.820 Custom properties
  • 05:45.380 Notes on properties
  • 06:36.100 Improving C++ compiler output
  • 08:17.680 The problem with C++ error messages
  • 08:30.240 Many standard class templates have default arguments
  • 08:47.520 Some types are aliases for longer things, too
  • 09:20.960 Reporting type information accurately means long lines
  • 10:18.240 Emacs can help - Treat C++ type names as just another kind of balanced expression
  • 11:49.320 Add overlays to improve readability
  • 12:22.400 Create a minor mode that runs during compilation
  • 12:59.500 Parsing types as balanced expressions
  • 14:16.100 Indent and fill with overlays - Use ancient "pretty printing" algorithms
  • 14:52.260 Overlays can mimic line breaks and indentation
  • 15:14.520 Hiding details - Marking depths with overlays
  • 17:12.660 Hiding to a target depth
  • 18:04.900 Demo
  • 20:10.220 Conclusion

Duration: 20:57 minutes

Q&A

Listen to just the audio:
Duration: 11:48 minutes

Description

Overlays are a feature of Emacs that allow changing the appearance of text while preserving its contents. They play a prominent role in packages like org-mode, which uses them to hide or reveal custom properties and display inline images, and magit, which uses them to highlight diffs.

The presenter will give an introduction to the features of overlays, demonstrating how to:

  • Create and use overlays in Emacs Lisp code
  • Query locations in an existing buffer to find out what overlays are present.

He will then demonstrate a new compilation minor mode for improving the readability of error messages, using overlays to flexibly reformat portions of the compiler output under user control.

Discussion

Questions and answers

  • Q: How did you draw the underbraces and overbraces?
  • Q: You've got a nice sounding keyboard. What kind is it?
    • A: Sorry about that. It's an ErgoDox EZ
  • Q: Do you find that the "invasive" reformatting interferes with navigation?
    • A: A bit. You can't move your cursor into the not-real buffer text (indentation). But the original text is still visible, so that works fine.
  • Q: Can you show us the keybindings of your minor map for editing overlays?
    • A: It's C-c - and C-c + but you can change it.
  • Q: Your examples were with C++, have you experimented with any other languages? Oh, thanks for the interesting talk by the way!
    • A: Other languages don't have the same unpleasant behavior :) I say this as a long time fan of C++. But it should be possible!
  • Q: Would it be possible to include overlays in the source file itself? There are some language modes (Rust, for instance) that do this.
    • A: [someone else] Sounds like enriched-mode. [Jeff] I'm not sure what this question means; it's the error messages that are the big issue
  • Q: What are your plans for tspew in the future?
    • A: Better future-proofing and more options for formatting
  • Q: What is your repository link https://github.com/jefftrull ?
  • Q: What IDEs do C++ programmers use?  If not emacs?  How do they deal with these error messages?
    • A: VSCode is quite popular, as well as CLion and also XCode. I think they simply display the error messages as is.
  • Q: Have you tried to use treesitter to parse the output?

    • A: I think it wants to parse an entire buffer. If I could write a grammar for a portion of the text and point it at that, that would be great. I could maybe have made a tree-sitter grammar if I could have applied it to a small bit of the output.
      • (not the speaker): ISTM that since you set up the syntax tables to recognize <> as parens/whatever that Emacs should be able to parse the effective lists as sexps, but I'm not an expert on that
    • (not the speaker) yes, it's true, often you want to select what the root source node type would be, and AFAIK you cannot change it
  • Q: "org-mode, which uses them to hide or reveal custom properties" I thought they used buffer-invisibility-spec or something like that

    • A: yes that's part of it
      • Do you know if they also use text properties? org code is usually pretty messy, so I don't know much about it
      • A: org has been moving toward text properties but I think there is a flag that will use overlays instead (!). There's some controversy about performance that I will touch on in a bit
        • Interesting, does that initiative predate the recent performance improvements by Stefan Monnier?
          • A: I think so. They were known to be a problem for some time, but then that happened?
  • Q: Did you use, e.g. syntax-ppss to parse the depth using the syntax table?
    • A: No, I tried to, though... maybe there's a better way

Notes and discussion

  • The org file containing the presentation is here: https://media.emacsconf.org/2023/emacsconf-2023-overlay--improving-compiler-diagnostics-with-overlays--jeff-trull.org
  • Tony Aldon's Reddit post on visibility https://www.reddit.com/r/emacs/comments/t1r2wq/have_you_ever_wondered_how_orgmode_toggles_the/
  • Overlay performance (maybe) fixed https://www.reddit.com/r/emacs/comments/yg4mvt/the_noverlay_branch_was_merged_to_master_this/
  • I think I might need to change subed-waveform to use text properties instead of overlays or fix something else that I'm doing incorrectly, since the overlays get left behind when I kill text
    • A: yeah you have to track them yourself
  • Can you put the overlay object in a text property to track it?
    • A: I don't think you would mix properties and overlays in that manner. There are overlay search functions; people typically add a property that identifies them as theirs. Or you can store references in a list or something
  • A: One of my reasons for doing this was frustration and people talking about how great VSCode was and I knew that Emacs was a good match for certain kinds of problems people don't even try to solve in IDEs
  • A: I actually edited this down I know it's still a lot of detail :)
  • This is really good!
  • Very impressive! And well explained. Thank you.
  • yeah try doing that in VSCode! yeah.
  • this is slick!
  • i'm not a fan of ligatures, but imho :: just begs for it
    • Same, I want to see the actual thing that'll be given to the compiler/interpreter/whatever.
  • That was great, showing how relatively easy it is to extend Emacs with features like that.
  • From the speaker: yantar92: your help was much appreciated in the weeks I spent putting this together :)

Transcript

[00:00:00.000] Introduction

Hi, I'm Jeff Trull, and today I'm going to talk to you about improving C++ compiler diagnostics using overlays and other features from Emacs. First an overview of my talk. I'm going to cover what overlays are and how you can use them in code, then I'm going to talk about C++ and why its compiler errors can be so onerous. Finally, we'll take that information and build a new minor mode using overlays and other Emacs features.

[00:00:33.560] Overlays and what they can do

First of all, overlays. What are they? They are objects consisting of a buffer range and a set of properties. That means that they cover a region in a buffer. The properties can be a certain set of special property names, in which case they can be used to cause special effects in the buffer, but they never change the underlying text. You can use them for things like hiding things. So, for example, overlays are working right now in this window. org-present, the technology I'm using for this presentation, is hiding the asterisk before every headline, as well as the things called emphasis markers; that is, those things that make things look monospaced for verbatim, or italic, or bold. The special characters we use to mark off those sections are also hidden by org-present using overlays. But those things are still in the buffer and they're still visible to code. So if I run this little snippet of code down here, it's going to go up to the headline "Overlays and what they can do," and it's going to tell us what's there in the buffer. Let's go down and run this. So according to this code, the contents of the buffer to the left of the headline is a star and a space, which means that even though we can't see that star, it's still there, because it's hidden by an overlay. And that's kind of the essence of what overlays are.
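The snippet run on the slide is not part of this transcript. As a rough stand-in, the sketch below hides a headline's leading star with an overlay and then reads the same positions back. The function name `demo-hidden-prefix` and the temporary buffer are assumptions made for illustration, not the talk's code.

```elisp
;; Sketch only: a temporary buffer stands in for the presentation buffer.
(defun demo-hidden-prefix ()
  "Hide a headline's leading \"* \" and return what the buffer still holds there."
  (with-temp-buffer
    (insert "* Overlays and what they can do\n")
    (let ((ov (make-overlay 1 3)))        ; covers the star and the space
      (overlay-put ov 'invisible t)       ; changes display only
      ;; The characters are still there for code to read.
      (buffer-substring-no-properties 1 3))))
```

Calling `(demo-hidden-prefix)` returns the two hidden characters, which is the point made above: the overlay affects what is displayed, not what the buffer contains.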

[00:02:02.500] Simple overlay example - creating an overlay

Let's do a simple overlay example. We have some text on the right here, which is a famous poem by William Carlos Williams, which has been the subject of many memes. Let's create an overlay that covers it. I'll go down here and use this snippet of code here. We'll go up to the top, and we'll mark everything between #+BEGIN_VERSE and #+END_VERSE. You can see we've created an overlay from position 74 to 224.
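A minimal sketch of this step, assuming the verse block is delimited exactly by `#+BEGIN_VERSE` and `#+END_VERSE` lines. The function name `demo-verse-overlay` is hypothetical, and the positions it finds depend on the buffer, so they will not match the 74 to 224 seen in the talk.

```elisp
;; Sketch only: `demo-verse-overlay' is a hypothetical helper name.
(defun demo-verse-overlay ()
  "Return a new overlay covering the body of the first verse block."
  (save-excursion
    (goto-char (point-min))
    (let* ((start (progn (search-forward "#+BEGIN_VERSE\n") (point)))
           ;; `match-beginning' gives the start of the closing marker.
           (end (progn (search-forward "#+END_VERSE") (match-beginning 0))))
      (make-overlay start end))))
```

`make-overlay` returns the overlay object, so keeping it in a variable is what later allows properties to be added to it or the overlay to be deleted.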

[00:02:35.700] Adding properties

Now we can take that overlay that we already created and add a property, in this case a face property, to change the appearance of the text. This is a poem, and it's currently using a face that is monospaced, and so it looks like a computer program, even though it's a poem. I think it would be nicer to use something with variable-width font, maybe with some serifs. So let's give that a try. Now you can see that the poem looks quite a bit different. It looks more like what we'd see in a book.
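One way this could look, as a sketch: `overlay-put` sets the `face` property on an existing overlay. The built-in `variable-pitch` face is used here as an assumption; the talk's actual face or font family is not recorded in the transcript, and `variable-pitch` is proportional but not necessarily a serif font.

```elisp
;; Sketch only: `demo-set-book-face' is a hypothetical helper name.
(defun demo-set-book-face (ov)
  "Give overlay OV a proportional face and return OV."
  ;; The `face' overlay property overrides the appearance of the
  ;; covered text without touching the text itself.
  (overlay-put ov 'face 'variable-pitch)
  ov)
```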

[00:03:10.940] Deleting an overlay

We can also delete overlays. So I've named this one. So we can just go down and run delete-overlay and get rid of it, and it'll go back to the appearance it had before. And there it is. It's back to normal.
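A small sketch of the same idea in a temporary buffer, under the assumption that the overlay was kept in a variable. The function name `demo-overlay-lifecycle` is made up for this example.

```elisp
;; Sketch only: creates, styles, and deletes an overlay, then reports
;; what is left.
(defun demo-overlay-lifecycle ()
  "Return (BUFFER REMAINING) for an overlay after `delete-overlay'."
  (with-temp-buffer
    (insert "the plums\n")
    (let ((ov (make-overlay (point-min) (point-max))))
      (overlay-put ov 'face 'variable-pitch)
      (delete-overlay ov)
      ;; A deleted overlay belongs to no buffer, and the buffer
      ;; no longer lists it.
      (list (overlay-buffer ov)
            (overlays-in (point-min) (point-max))))))
```

Both elements of the returned list are nil: the overlay object still exists as a Lisp value, but it is detached from the buffer and has no further effect on display.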

[00:03:24.660] Setting fonts the right way

Now, if you're interested in changing all of the verses inside an Org Mode file to a different face or a different font family, this isn't the way you'd really do it. I'll just show you that real quick. The right way is probably to change the org-verse face, which is the face used for all of the verse blocks inside your Org Mode file. And so this is how you do it here: face-remap-add-relative. Let's give it a try. It worked!
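A hedged sketch of the face-remapping approach mentioned above. The font family name is an assumption (any installed family would do), and `demo-verse-serif` is a hypothetical name; `org-verse` is the face named in the talk.

```elisp
;; Sketch only: remaps the `org-verse' face in the current buffer.
(defun demo-verse-serif ()
  "Remap `org-verse' to a serif family in this buffer; return the cookie."
  ;; "DejaVu Serif" is an assumed family name, not the one from the talk.
  (face-remap-add-relative 'org-verse :family "DejaVu Serif"))
```

The remapping is buffer-local, and the returned cookie can be passed to `face-remap-remove-relative` to undo it.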

[00:03:59.540] More properties

There are more advanced things that you can do other than just changing fonts. There's a whole long list of them in the manual, but let's talk about the ones we're going to use today.

[00:04:12.580] Visibility

You can make text invisible, just like org-present did. The simplest way is to set the invisible property to true, so here's a code snippet that will do that. What we're going to do is go and find the word "plums" inside the poem, and then we're going to make it invisible by creating an overlay that covers it, and then setting the invisible property to true. Boom! It's gone. We've eaten the plums. Visibility is a huge topic and very complicated. There are powerful mechanisms for using it. I suggest reading the manual if you'd like to know more about that.
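The snippet described here could be sketched as follows. The helper name `demo-hide-word` is hypothetical, and it only handles the first occurrence of the word.

```elisp
;; Sketch only: hides the first occurrence of WORD with an overlay.
(defun demo-hide-word (word)
  "Hide the first occurrence of WORD in the current buffer.
Return the overlay, or nil if WORD is not found."
  (save-excursion
    (goto-char (point-min))
    (when (search-forward word nil t)
      (let ((ov (make-overlay (match-beginning 0) (match-end 0))))
        (overlay-put ov 'invisible t)
        ov))))
```

Evaluating `(demo-hide-word "plums")` in the poem's buffer makes the word disappear from the display while searches and code still see it.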

[00:04:49.780] Adding text

Another thing we can do with properties is to add text either before or after an overlay. Since we've made the word "plums" invisible, or anything that you make invisible in the buffer, if you add text then afterwards, it looks like you've replaced the original words with new words. So let's add a property, a before-string property, to the overlay that we used before to make it seem as though we're eating cherries instead of plums. Boom! There it is. So that's how you can replace words using overlays.
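A self-contained sketch of the replacement trick, combining `invisible` with `before-string`. The name `demo-show-instead` is an assumption for illustration.

```elisp
;; Sketch only: makes WORD display as REPLACEMENT without editing the buffer.
(defun demo-show-instead (word replacement)
  "Display REPLACEMENT in place of the first occurrence of WORD.
Return the overlay, or nil if WORD is not found."
  (save-excursion
    (goto-char (point-min))
    (when (search-forward word nil t)
      (let ((ov (make-overlay (match-beginning 0) (match-end 0))))
        (overlay-put ov 'invisible t)                 ; hide the original
        (overlay-put ov 'before-string replacement)   ; show this instead
        ov))))
```

For example, `(demo-show-instead "plums" "cherries")` leaves "plums" in the buffer while the window shows "cherries".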

[00:05:27.820] Custom properties

You can also have custom properties that you name and then use yourself. For example, you can use it to mark regions in the buffer. You can also use it to add information to regions in the buffer for your own tracking in a minor mode or something like that, which we will use.
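As an illustration of custom properties, the sketch below tags overlays with a made-up `demo-owner` property and then queries a region for overlays carrying that tag. All names here are hypothetical; only `make-overlay`, `overlay-put`, `overlay-get`, and `overlays-in` are standard functions.

```elisp
(require 'seq)

;; Sketch only: `demo-owner' is an invented property name.
(defun demo-tag-region (beg end)
  "Create an overlay from BEG to END marked as belonging to this demo."
  (let ((ov (make-overlay beg end)))
    (overlay-put ov 'demo-owner 'demo)
    ov))

(defun demo-tagged-overlays (beg end)
  "Return the overlays between BEG and END that carry the demo tag."
  (seq-filter (lambda (ov) (eq (overlay-get ov 'demo-owner) 'demo))
              (overlays-in beg end)))
```

Tagging overlays this way lets a package find and clean up only its own overlays, for instance with `(remove-overlays beg end 'demo-owner 'demo)`, without disturbing overlays created by other modes.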

[00:05:45.380] Notes on properties

Finally, two notes on properties. We've been talking about overlay properties, but there's also something called text properties. Text properties are attached to text in a buffer. When you copy that text, the properties come along with it. If you modify the properties, the buffer is considered modified. Org Mode makes heavy use of text properties, as we can see by running this little code snippet here, which is going to tell us the properties and the string attached to the "Some poetry" headline on the right. There's also some controversy regarding performance. It may be that text properties perform better than overlay properties, so do some research if you're going to make heavy use of them. I prefer overlays because they're just easier to use.
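The difference in copying behaviour can be checked with a short sketch. The function name and the `demo-note` property are assumptions for the example.

```elisp
;; Sketch only: puts a text property and an overlay property on the
;; same text, then copies the text.
(defun demo-copy-props ()
  "Return (FACE NOTE) found on a copy of text with both kinds of property."
  (with-temp-buffer
    (insert "plums")
    (put-text-property 1 6 'face 'bold)                        ; text property
    (overlay-put (make-overlay 1 6) 'demo-note "overlay only") ; overlay property
    (let ((copy (buffer-substring 1 6)))
      (list (get-text-property 0 'face copy)
            (get-text-property 0 'demo-note copy)))))
```

The copied string keeps the `face` text property but has no trace of the overlay's `demo-note`, since overlays belong to the buffer rather than to the text.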

[00:06:36.100] Improving C++ compiler output

C++ compiler output. So my day job is C++ programmer, and although I've been an Emacser for many years, it can be a little bit of a chore dealing with errors. The error messages that come out of the compiler can be pretty hard to understand. This has often been a barrier, particularly for people who are new to C++. So let's see what that's like. I have an example which is generously supplied by Ben Deane of Intel. So let's see what it looks like when you compile a C++ program that has a difficult error in it. Okay. Okay. So you see we have a lot of fairly verbose messages. The most verbose one I think is probably here. This one here. These are pretty bad. I think there might be bigger ones. Oh, yeah. Here we go. Here's my favorite one. You can see... Let's look for specialization... Basically, this whole section of the buffer here, that is specifying the specific types that a function template was instantiated with. And it's a lot there. So if you're trying to figure out what's wrong with your program and you're looking at something like this, it can be really, really hard to understand. Okay. Back to our presentation.

[00:08:17.680] The problem with C++ error messages

So it's often this way in C++ because we compose types from other types. They can be long to begin with, but then a couple of other factors come into play.

[00:08:30.240] Many standard class templates have default arguments

First of all, we can have default template arguments. These are arguments you didn't write, but that are implicitly there and can sometimes refer to the arguments that you did write, which causes them to get a bit bigger, such as these allocator arguments here and here.

[00:08:47.520] Some types are aliases for longer things, too

Then there are type aliases. For example, std::string here expands to a type with three template arguments. So you can imagine, when we combine those two things together, our simple vector of maps from strings to ints becomes this humongous thing here, which... Let's run the comparison. Yeah.

[00:09:20.960] Reporting type information accurately means long lines

So in summary, to properly understand an error when you're a C++ programmer requires knowing the exact types that were supplied to your function. And types are built recursively, and therefore the types can-- the correct exact name for the type can just be really huge and have many levels and layers to it. So when I was trying to understand the things I'd done wrong, especially when I was a newer C++ programmer, but honestly still even recently, if I was having a really intractable problem, I would just copy the entire error message out, stick it in the scratch buffer, and then manually reformat it so I could see what it was telling me I'd actually called the function or whatever it was with, the exact type. I had to sit there and go through the whole thing. But there's a better way. Now, anyway.

[00:10:18.240] Emacs can help - Treat C++ type names as just another kind of balanced expression

So what can Emacs do to help us with this problem? First of all, if you think about a type name, it's a lot like what we call S-expressions or balanced expressions. Lisp code itself is an S-expression. It's basically things with parentheses and little atoms or symbols in it, or strings or numbers. But parenthesized balanced expressions are things that Emacs was actually built to deal with. They were... I found an old manual from 1981, and the two major modes that they recommended or that they actually documented in the manual were one, assembly language, and two, Lisp. They mentioned that there were other modes, but they didn't say anything about them. So Lisp is something with a really long history with Emacs. Balanced expressions and manipulating them and doing them efficiently is just a thing that Emacs knows how to do, and Emacs is good at it. There's just a legacy of algorithms and functions for doing it. So we take types, and we take the angle brackets in the types, and we get the symbols right. Then we can treat them as though they were balanced expressions or S-expressions, the same kind that Emacs is really good at handling.

[00:11:49.320] Add overlays to improve readability

Secondly, we can use overlays to improve the readability of errors. We can take long lines and break and indent them using before-strings, so the same thing I used to add "cherries" into the poem. We can use that to insert new lines followed by indentation and produce a much nicer-looking listing of a type. We can also use the invisible property to hide unwanted detail.
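A minimal sketch of a display-only line break, assuming two spaces per indentation level. `demo-break-before` is a hypothetical helper, not a function from the talk's package.

```elisp
;; Sketch only: a zero-length overlay whose `before-string' draws a
;; newline plus indentation at POS.
(defun demo-break-before (pos depth)
  "Make the text at POS appear on a new line indented by DEPTH levels.
Return the overlay; the buffer text is left untouched."
  (let ((ov (make-overlay pos pos)))
    (overlay-put ov 'before-string
                 (concat "\n" (make-string (* 2 depth) ?\s)))
    ov))
```

Because the newline and spaces live in the overlay, searching the compiler output still matches the original one-line text.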

[00:12:22.400] Create a minor mode that runs during compilation

Last of all, we can create a minor mode. When we're compiling things in Emacs, we often use compilation-mode. compilation-mode allows you to install compilation filters that run when the compiler is producing output, and at that time, then, we can add our overlays. We can also add in minor-mode commands that do whatever we want to the keymap. In this case, we're going to show and hide lower-level details interactively so that we can see a simplified version or a more detailed version of a type, depending on our needs.
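The outline of such a minor mode might look like the sketch below. It is not the talk's implementation: every `demo-diag-` name is invented, the filter only tags each chunk of output rather than parsing it, and which of the two keys shows more detail is an assumption. `compilation-filter-hook` and `compilation-filter-start` are the standard hooks into compilation-mode output.

```elisp
(require 'compile)

(defvar-local demo-diag-depth 2
  "Hypothetical setting: how many nesting levels to show.")

(defun demo-diag-more-detail ()
  "Show one more level of detail."
  (interactive)
  (setq demo-diag-depth (1+ demo-diag-depth)))

(defun demo-diag-less-detail ()
  "Show one less level of detail."
  (interactive)
  (setq demo-diag-depth (max 0 (1- demo-diag-depth))))

(defun demo-diag-filter ()
  "Tag the compiler output that was just inserted.
A real filter would parse the new text and add formatting overlays."
  ;; `compilation-filter-start' is where the new output begins;
  ;; point is at its end when the hook runs.
  (let ((ov (make-overlay compilation-filter-start (point))))
    (overlay-put ov 'demo-diag t)))

(define-minor-mode demo-diag-mode
  "Sketch of a minor mode that decorates compiler output with overlays."
  :lighter " Diag"
  :keymap (let ((map (make-sparse-keymap)))
            (define-key map (kbd "C-c +") #'demo-diag-more-detail)
            (define-key map (kbd "C-c -") #'demo-diag-less-detail)
            map)
  (if demo-diag-mode
      (add-hook 'compilation-filter-hook #'demo-diag-filter nil t)
    (remove-hook 'compilation-filter-hook #'demo-diag-filter t)
    (remove-overlays (point-min) (point-max) 'demo-diag t)))
```

Under these assumptions the mode would be switched on in compilation buffers with something like `(add-hook 'compilation-mode-hook #'demo-diag-mode)`. Installing the filter buffer-locally keeps other compilation buffers unaffected, and disabling the mode removes only the overlays it tagged.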

[00:12:59.500] Parsing types as balanced expressions

First of all, parsing types as balanced expressions. We need to be able to quickly locate the boundaries and the contents of parenthesized expressions, or in this case, expressions in angle brackets. We use a syntax table inside Emacs to allow movement functions like forward-list to jump between matching angle brackets. By default, they're just parentheses. First of all, let's look at our syntax table. We're going to add here syntax entries to handle angle brackets as though they were parentheses. Then we have a lot of types that have colons in them, and those are namespaces in C++. By default, Emacs does not recognize them as parts of symbols, so we're going to tell Emacs that a colon is something called a symbol constituent, that it can be part of a name. Once we do that, then we can use our functions like forward-list, backward-word, all of the navigation and movement functions that we have that do things, that do more complicated things like S-expressions and so on, can be used now with our angle brackets and inside of our types.
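The syntax-table setup described here can be sketched as follows. The names `demo-type-syntax-table` and `demo-template-args` are hypothetical, and the sketch ignores cases a real parser must handle, such as `operator<` or `->` appearing in the output.

```elisp
;; Sketch only: angle brackets become paired delimiters and the colon
;; becomes a symbol constituent.
(defvar demo-type-syntax-table
  (let ((table (make-syntax-table)))
    (modify-syntax-entry ?< "(>" table)   ; open delimiter matching >
    (modify-syntax-entry ?> ")<" table)   ; close delimiter matching <
    (modify-syntax-entry ?: "_" table)    ; lets std::map read as one symbol
    table)
  "Syntax table treating C++ template brackets as balanced delimiters.")

(defun demo-template-args (type)
  "Return the text from the first `<' in TYPE through its matching `>'."
  (with-temp-buffer
    (insert type)
    (with-syntax-table demo-type-syntax-table
      (goto-char (point-min))
      (search-forward "<")
      (backward-char)
      (let ((start (point)))
        (forward-list)                    ; jump past the matching >
        (buffer-substring-no-properties start (point))))))
```

For example, `(demo-template-args "std::vector<std::map<std::string, int>> v")` returns `"<std::map<std::string, int>>"`: `forward-list` skips the nested brackets and stops after the outermost closing one.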

[00:00:02.899] ...on IRC. So I see two questions coming in already on the pad. The first question is: how did you draw the underbraces and overbraces?

Sorry, Jeff, you're muted on BigBlueButton.

...everything twice. I'm hearing everything twice. So it's, it's about, with about a 5... You're right. Thank you so much. I... MPB is showing the BigBlueButton. Okay, sorry, everyone. Okay, now I'm together. Let's see.

How did I draw the overbraces and underbraces? LaTeX. That is a, that's a, yeah, an SVG, I think, produced by LaTeX through a separate file. I tried to do, like, a LaTeX code block and didn't get around to it. Also, the code to produce it in TikZ was really, really long, so I didn't put it in.

...sounding keyboard. What kind is it?

Sorry. It is an ErgoDox split keyboard for my wrists. Sorry about the noise.

I mean, I like to hear it. We like to hear it. I think a lot of us do.

Let's see. Someone's asking for ligatures. Do you have any questions, Ben? Charles?

...do you find that the invasive, quote unquote, reformatting interferes with navigation?

Let me see. Yeah, it's weird. The good news is that, oh, you know what? The first thing I did, my first attempt at this, I actually made all of the incoming text invisible and just replaced it with my own text. And that was actually a lot worse. The more of the input that is removed or made invisible, the harder the navigation becomes. So the fact that now I'm just inserting line breaks and spaces makes it a lot easier. And I can still search. And when I get to the destination of the search, I'm still in proper normal text. So it got a little better by changing my strategy a bit, but it's still a little bit of a problem. Let's see. I'll go look at the Etherpad. Where is it?

...you'd like me to. And then if at any point you want to take the questions from IRC, then feel free to do that as well.

...of your minor map for editing overlays?

Well, I have a minor mode keymap for increasing or decreasing the level of detail.
And the key bindings are, like... I can't remember what it is. If you go and you look at the source on GitHub, you can see it there. I forgot what I bound them to. Something that I'm allowed to do. They have restrictions on what key bindings you can make in minor modes, and I carefully followed the directions. I don't remember what it was. It's like Control-C P or something like that. Or, yeah. Sorry.

Your examples were with C++; have you experimented with any other languages? I haven't. I guess this is just a perennial pain point for C++ programmers. So that's kind of why my... and I am one, and I guess that's why my focus was there. You'd probably have to rewrite some of the parsers to use something else.

Would it be possible to include overlays in the source file itself? I actually don't understand this question. In the source file itself, there are language modes that do this. No, I'm not certain I understand that question. Maybe you could edit it a little bit more, overlays in the source file.

What are your plans for tspew in the future? It's a little fragile, so it might be nice to investigate. I think you can get the compiler to output error messages in different formats, which might be more parsable, or the parsing might be more maintainable. That might be an interesting thing to investigate. And the other thing is I have just one way of reformatting the output, where everything on the same level is vertically aligned. But I think some people might want to make more use of the horizontal space on the screen and take the sort of sibling parts of the type and line them up straight across and take up a little bit less vertical space.

Enriched mode. I don't know what enriched mode is. Interesting.

Oh, what's my repository link? Let me get that then. I don't know how to format this properly, but it's jefftrull slash tspew. Yeah, it's on GitHub. Something like that.

Let's see. This looks like the Etherpad. It looks like all the Etherpad questions. We have one here from Charles.
Can overlays work as hypertext so you can link an error message back to the source? Yeah, actually, that's done by default in compilation-mode. That's one of the features you get, which has been around for literally decades. Oh, yeah. Is it already there? Yes, it's already there.

Let's see. Do we have anything on IRC? Let me see. OK, it seems like we've run out of questions. Is that true?

...although we still have a couple more minutes, like maybe 3, 4 minutes on the stream. So yeah. And then, of course, once the stream does move on to the next talk, folks are welcome to join Jeff here on BigBlueButton. If Jeff still has a few more minutes to just chat here or ask questions here, that works as well.

...if anyone's excited about the tool. The notes are available online, right? I uploaded an org file that was my talk, and I actually included some references. Like, at the end, there's some links and stuff like that. Whenever you see, like, an underlined thing in my presentation, I was kind of thinking people would have access to the actual presentation itself so they could go and see what it was I was linking to, some PDF somewhere.

How annoying is this for multiple compilers? It's annoying, Ben. I basically have separate parsers for Clang and GCC, and I'm not supporting MSVC at the moment. So yeah, that's where I do worry about its fragility, about the way I'm kind of parsing these error messages, which are idiosyncratic.

Oh, yeah, great. Thank you, Amin. That's good. Should just follow that link, I guess.

...down a little bit underneath the video embedding itself. There's timestamps. And then below the timestamps, I see a bunch of links, including one that says download.org. Is that the right one?

Yeah, that's it. That's the one. Yeah, you can also see all of my hacks to org-present are in there as well. I followed the System Crafters thing and made a bunch of my own modifications. org-present has this problem where every heading is a slide, which I don't like.
I kind of want hierarchy. You know? Oh, no. Sorry. Every level 1 heading is a slide. And I kind of want hierarchy among the slides. And I had to sort of invent it in that system myself through navigation.

It looks like things have quieted down. Shall we call it?

...great talk, Jeff. And also to the audience for questions and discussions. People are welcome to stay here on BBB if Jeff has time to continue the discussions and ask any questions they might have. Otherwise, yeah, we can wrap it.

And I love this conference. I've been a happy attendee since, like, 2015 or something. So yeah, it's great. Thank you for your work.

...in large part, thanks to awesome people like you who give these amazing talks. So thank you as well.

Questions or comments? Please e-mail emacsconf-org-private@gnu.org