Improving compiler diagnostics with Overlays
Jeff Trull (he/him) - Pronunciation: rhymes with "hull" and "dull", IRC: jaafar, @jaafar@hachyderm.io
Format: 21-min talk; Q&A: BigBlueButton conference room
Status: Q&A to be extracted from the room recordings
Talk
Duration: 20:57 minutes
- 00:00.000 Introduction
- 00:33.560 Overlays and what they can do
- 02:02.500 Simple overlay example - creating an overlay
- 02:35.700 Adding properties
- 03:10.940 Deleting an overlay
- 03:24.660 Setting fonts the right way
- 03:59.540 More properties
- 04:12.580 Visibility
- 04:49.780 Adding text
- 05:27.820 Custom properties
- 05:45.380 Notes on properties
- 06:36.100 Improving C++ compiler output
- 08:17.680 The problem with C++ error messages
- 08:30.240 Many standard class templates have default arguments
- 08:47.520 Some types are aliases for longer things, too
- 09:20.960 Reporting type information accurately means long lines
- 10:18.240 Emacs can help - Treat C++ type names as just another kind of balanced expression
- 11:49.320 Add overlays to improve readability
- 12:22.400 Create a minor mode that runs during compilation
- 12:59.500 Parsing types as balanced expressions
- 14:16.100 Indent and fill with overlays - Use ancient "pretty printing" algorithms
- 14:52.260 Overlays can mimic line breaks and indentation
- 15:14.520 Hiding details - Marking depths with overlays
- 17:12.660 Hiding to a target depth
- 18:04.900 Demo
- 20:10.220 Conclusion
Q&A
Description
Overlays are a feature of Emacs that allow changing the appearance of text while preserving its contents. They play a prominent role in packages like org-mode, which uses them to hide or reveal custom properties and display inline images, and magit, which uses them to highlight diffs.
The presenter will give an introduction to the features of overlays, demonstrating how to:
- Create and use overlays in Emacs Lisp code
- Query locations in an existing buffer to find out what overlays are present
He will then demonstrate a new compilation minor mode for improving the readability of error messages, using overlays to flexibly reformat portions of the compiler output under user control.
Discussion
Questions and answers
- Q: How did you draw the underbraces and overbraces?
- A: TikZ, the greatest drawing tool ever. See https://tex.stackexchange.com/a/128096/105203. I went to some effort to match up the colors, font, and background to Emacs. I got quite close, I think.
- Q: You've got a nice sounding keyboard. What kind is it?
- A: Sorry about that. It's an ErgoDox EZ.
- Q: Do you find that the "invasive" reformatting interferes with navigation?
- A: A bit. You can't move your cursor into the not-real buffer text (indentation). But the original text is still visible, so that works fine.
- Q: Can you show us the keybindings of your minor mode map for editing overlays?
- A: It's C-c - and C-c +, but you can change it.
- Q: Your examples were with C++; have you experimented with any other languages? Oh, thanks for the interesting talk, by the way!
- A: Other languages don't have the same unpleasant behavior. I say this as a longtime fan of C++. But it should be possible!
- Q: Would it be possible to include overlays in the source file itself? There are some language modes (Rust, for instance) that do this.
- A: [someone else] Sounds like enriched-mode. [Jeff] I'm not sure what this question means; it's the error messages that are the big issue.
- Q: What are your plans for tspew in the future?
- A: Better future-proofing and more options for formatting
- Q: What is your repository link https://github.com/jefftrull ?
- Q: What IDEs do C++ programmers use, if not Emacs? How do they deal with these error messages?
- A: VSCode is quite popular, as well as CLion and also Xcode. I think they simply display the error messages as is.
- Q: Have you tried to use tree-sitter to parse the output?
- A: I think it wants to parse an entire buffer. If I could write a grammar for a portion of the text and point it at that, that would be great. I could maybe have made a tree-sitter grammar if I could have applied it to a small bit of the output.
- (not the speaker): ISTM that since you set up the syntax tables to recognize <> as parens/whatever that Emacs should be able to parse the effective lists as sexps, but I'm not an expert on that
- (not the speaker) yeah, it's true, often you want to select what the root source node type would be, and AFAIK you cannot change it
- Q: "org-mode, which uses them to hide or reveal custom properties" I thought they used buffer-invisibility-spec or something like that.
- A: yes that's part of it
- Do you know if they also use text properties? Org code is usually pretty messy, so I don't know much about it.
- A: org has been moving toward text properties but I think there is a flag that will use overlays instead (!). There's some controversy about performance that I will touch on in a bit
- Interesting, does that initiative predate the recent performance improvements by Stefan Monnier?
- A: I think so. They were known to be a problem for some time, but then that happened?
- Q: Did you use, e.g. syntax-ppss to parse the depth using the syntax table?
- A: No, I tried to, though... maybe there's a better way.
Notes and discussion
- The org file containing the presentation is here: https://media.emacsconf.org/2023/emacsconf-2023-overlay--improving-compiler-diagnostics-with-overlays--jeff-trull.org
- Tony Aldon's Reddit post on visibility https://www.reddit.com/r/emacs/comments/t1r2wq/have_you_ever_wondered_how_orgmode_toggles_the/
- Overlay performance (maybe) fixed https://www.reddit.com/r/emacs/comments/yg4mvt/the_noverlay_branch_was_merged_to_master_this/
- I think I might need to change subed-waveform to use text properties instead of overlays or fix something else that I'm doing incorrectly, since the overlays get left behind when I kill text
- A: yeah you have to track them yourself
- Can you put the overlay object in a text property to track it?
- A: I don't think you would mix properties and overlays in that manner. There are overlay search functions; people typically add a property that identifies the overlays as theirs, or you can store references in a list or something.
- A: One of my reasons for doing this was frustration, plus people talking about how great VSCode was; I knew that Emacs was a good match for certain kinds of problems people don't even try to solve in IDEs.
- A: I actually edited this down; I know it's still a lot of detail.
- This is really good!
- Very impressive! And well explained. Thank you.
- yeah try doing that in VSCode! yeah.
- this is slick!
- i'm not a fan of ligatures, but imho :: just begs for it
- Same, I want to see the actual thing that'll be given to the compiler/interpreter/whatever.
- That was great, showing how relatively easy it is to extend Emacs with features like that.
- From the speaker: yantar92: your help was much appreciated in the weeks I spent putting this together
Transcript
Hi, I'm Jeff Trull, and today I'm going to talk to you about improving C++ compiler diagnostics using overlays and other features from Emacs. First an overview of my talk. I'm going to cover what overlays are and how you can use them in code, then I'm going to talk about C++ and why its compiler errors can be so onerous. Finally, we'll take that information and build a new minor mode using overlays and other Emacs features.
First of all, overlays.
What are they?
They are objects consisting of a buffer range
and a set of properties.
That means that they cover a region in a buffer.
The properties can be a certain set
of special property names,
in which case they can be used to cause
special effects in the buffer,
but they never change the underlying text.
You can use them for things like hiding text.
So, for example, overlays are working right now
in this window. org-present,
the technology I'm using for this presentation,
is hiding the asterisk before every headline,
as well as the things called emphasis markers;
that is, those things that make things look
monospaced for verbatim, or italic, or bold.
The special characters we use to mark off those sections
are also hidden by org-present
using overlays.
But those things are still in the buffer
and they're still visible to code.
So if I run this little snippet of code down here,
it's going to go up to the headline "Overlays
and what they can do," and it's going to tell us
what's there in the buffer.
Let's go down and run this.
So according to this code, the contents of the buffer
to the left of the headline is a star and a space,
which means that even though we can't see that star,
it's still there, because it's hidden by an overlay.
And that's kind of the essence of what overlays are.
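To make the "range plus properties" idea concrete, here is a toy model in Python rather than Emacs Lisp. The Overlay class and render function are hypothetical stand-ins, not Emacs's API (in Emacs you would call make-overlay and overlay-put); they support only the invisible and before-string properties and use 0-based, end-exclusive indices, whereas Emacs buffer positions start at 1.

```python
from dataclasses import dataclass, field


@dataclass
class Overlay:
    """Hypothetical stand-in for an Emacs overlay: a range plus properties."""
    start: int  # 0-based index of the first covered character
    end: int    # end-exclusive
    props: dict = field(default_factory=dict)


def render(text, overlays):
    """Return what the window would show; `text` itself is never modified."""
    shown = []
    for i, ch in enumerate(text):
        # A before-string is drawn just before the overlay's first character.
        for ov in overlays:
            if ov.start == i and "before-string" in ov.props:
                shown.append(ov.props["before-string"])
        # Characters covered by an invisible overlay are skipped on display.
        if any(ov.start <= i < ov.end and ov.props.get("invisible") for ov in overlays):
            continue
        shown.append(ch)
    return "".join(shown)


# Hide the "* " in front of an Org headline, the way org-present does.
buffer_text = "* Overlays and what they can do\n"
stars = Overlay(0, 2, {"invisible": True})
print(render(buffer_text, [stars]))  # the display no longer shows the star
print(repr(buffer_text[:2]))         # but code still sees '* '
```

The point of the model is the last line: display and contents are separate, so code that reads the buffer is unaffected by what overlays hide.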
Let's do a simple overlay example.
We have some text on the right here,
which is a famous poem by William Carlos Williams,
which has been the subject of many memes.
Let's create an overlay that covers it.
I'll go down here and use this snippet of code here.
We'll go up to the top, and we'll mark everything
between #+BEGIN_VERSE
and #+END_VERSE.
You can see we've created an overlay
from position 74 to 224.
Now we can take that overlay that we already created
and add a property, in this case a face property,
to change the appearance of the text.
This is a poem, and it's currently using
a face that is monospaced,
and so it looks like a computer program,
even though it's a poem.
I think it would be nicer to use something
with variable-width font, maybe with some serifs.
So let's give that a try.
Now you can see that the poem looks quite a bit different.
It looks more like what we'd see in a book.
We can also delete overlays.
So I've named this one.
So we can just go down and run delete-overlay
and get rid of it, and it'll go back to
the appearance it had before.
And there it is.
It's back to normal.
Now, if you're interested in changing all of the verses
inside an Org Mode file to a different face
or a different font family,
this isn't the way you'd really do it.
I'll just show you that real quick.
The right way is probably to change the org-verse face,
which is the face used for all of the verse blocks
inside your Org Mode file.
And so this is how you do it here:
face-remap-add-relative.
Let's give it a try.
It worked!
There are more advanced things that you can do other than just changing fonts. There's a whole long list of them in the manual, but let's talk about the ones we're going to use today.
You can make text invisible, just like org-present did.
The simplest way is to set the invisible property to true,
so here's a code snippet that will do that.
What we're going to do is
go and find the word "plums" inside the poem,
and then we're going to make it invisible
by creating an overlay that covers it,
and then setting the invisible property to true.
Boom!
It's gone.
We've eaten the plums.
Visibility is a huge topic and very complicated.
There are powerful mechanisms for using it.
I suggest reading the manual
if you'd like to know more about that.
Another thing we can do with properties
is to add text either before or after an overlay.
Since we've made the word "plums" invisible
(and the same goes for anything you make invisible in the buffer),
if you then add text in its place,
it looks like you've replaced the original words
with new words.
So let's add a property, a before-string property,
to the overlay that we used before
to make it seem as though we're eating cherries
instead of plums.
Boom!
There it is.
So that's how you can replace words using overlays.
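The plums-to-cherries trick combines two properties on one overlay: invisible hides the original word and before-string draws the replacement where it used to be. The sketch below uses the same kind of hypothetical toy model (not the Emacs API; overlay_on_word is a made-up helper) and also shows that dropping the overlay, as delete-overlay does, restores the original display.

```python
from dataclasses import dataclass, field


@dataclass
class Overlay:
    """Toy overlay: 0-based, end-exclusive range plus a property dict."""
    start: int
    end: int
    props: dict = field(default_factory=dict)


def render(text, overlays):
    """Display `text` with before-strings inserted and invisible ranges skipped."""
    shown = []
    for i, ch in enumerate(text):
        for ov in overlays:
            if ov.start == i and "before-string" in ov.props:
                shown.append(ov.props["before-string"])
        if any(ov.start <= i < ov.end and ov.props.get("invisible") for ov in overlays):
            continue
        shown.append(ch)
    return "".join(shown)


def overlay_on_word(text, word, props):
    """Create a toy overlay covering the first occurrence of `word`."""
    start = text.index(word)  # raises ValueError if the word is absent
    return Overlay(start, start + len(word), dict(props))


line = "I have eaten the plums"
eat = overlay_on_word(line, "plums", {"invisible": True, "before-string": "cherries"})
print(render(line, [eat]))  # I have eaten the cherries
print(render(line, []))     # with the overlay gone, the plums are back
```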
You can also have custom properties that you name and then use yourself. For example, you can use it to mark regions in the buffer. You can also use it to add information to regions in the buffer for your own tracking in a minor mode or something like that, which we will use.
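Custom properties are also the usual way to find your own overlays again, which comes up in the discussion about overlays being left behind. In Emacs Lisp, overlays-in returns the overlays in a region and remove-overlays can filter by a property name and value; the Python below is only a toy analogue of that filtering, with a hypothetical my-mode property as the tag.

```python
from dataclasses import dataclass, field


@dataclass
class Overlay:
    """Toy overlay: 0-based, end-exclusive range plus a property dict."""
    start: int
    end: int
    props: dict = field(default_factory=dict)


def overlays_in(overlays, beg, end):
    """Non-empty overlays sharing at least one character with [beg, end).
    (Emacs's overlays-in also has rules for empty overlays, omitted here.)"""
    return [o for o in overlays if o.start < o.end and o.start < end and o.end > beg]


def find_tagged(overlays, beg, end, name, value=True):
    """Only the overlays in [beg, end) whose custom property `name` equals `value`."""
    return [o for o in overlays_in(overlays, beg, end) if o.props.get(name) == value]


def remove_tagged(overlays, name, value=True):
    """Drop our own overlays and leave everybody else's alone."""
    return [o for o in overlays if o.props.get(name) != value]


ours = Overlay(0, 5, {"my-mode": True, "depth": 1})  # "my-mode" is a made-up tag
theirs = Overlay(3, 8, {"face": "bold"})              # e.g. from another package
ours_later = Overlay(10, 12, {"my-mode": True})
everything = [ours, theirs, ours_later]

print(find_tagged(everything, 0, 9, "my-mode"))  # only `ours`; `ours_later` is outside
print(remove_tagged(everything, "my-mode"))      # only `theirs` survives
```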
Finally, two notes on properties. We've been talking about overlay properties, but there's also something called text properties. Text properties are attached to text in a buffer. When you copy that text, the properties come along with it. If you modify the properties, the buffer is considered modified. Org Mode makes heavy use of text properties, as we can see by running this little code snippet here, which is going to tell us the properties and the string attached to the "Some poetry" headline on the right. There's also some controversy regarding performance. It may be that text properties perform better than overlay properties, so do some research if you're going to make heavy use of them. I prefer overlays because they're just easier to use.
C++ compiler output. My day job is C++ programming, and although I've been an Emacser for many years, it can be a little bit of a chore dealing with errors. The error messages that come out of the compiler can be pretty hard to understand. This has often been a barrier, particularly for people who are new to C++. So let's see what that's like. I have an example which was generously supplied by Ben Deane of Intel. Let's see what it looks like when you compile a C++ program that has a difficult error in it. Okay. So you see we have a lot of fairly verbose messages. The most verbose one, I think, is probably this one here. These are pretty bad. I think there might be bigger ones. Oh, yeah. Here we go. Here's my favorite one. Let's look for "specialization"... Basically, this whole section of the buffer specifies the exact types that a function template was instantiated with, and there's a lot there. So if you're trying to figure out what's wrong with your program and you're looking at something like this, it can be really, really hard to understand. Okay. Back to our presentation.
So it's often this way in C++ because we compose types from other types. They can be long to begin with, but then a couple of other factors come into play.
First of all, we can have default template arguments. These are arguments you didn't write, but that are implicitly there and can sometimes refer to the arguments that you did write, which causes them to get a bit bigger, such as these allocator arguments here and here.
Then there are type aliases.
For example, std::string
here expands to
a type with three template arguments.
So you can imagine, when we combine
those two things together,
our simple vector of maps from strings to ints
becomes this humongous thing here, which...
Let's run the comparison.
Yeah.
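The growth from default arguments and aliases can be reproduced by simply spelling the defaults out. These are hypothetical helpers written from the standard declarations of std::basic_string, std::map, and std::vector; real compilers print their own spellings (for example, libstdc++ puts basic_string in the inline namespace std::__cxx11), so treat the exact text as illustrative.

```python
def string_t():
    """std::string is an alias for basic_string with three template arguments."""
    return "std::basic_string<char, std::char_traits<char>, std::allocator<char> >"


def map_t(key, value):
    """std::map with its defaulted comparator and allocator written out."""
    return (f"std::map<{key}, {value}, std::less<{key}>, "
            f"std::allocator<std::pair<const {key}, {value}> > >")


def vector_t(elem):
    """std::vector with its defaulted allocator written out."""
    return f"std::vector<{elem}, std::allocator<{elem}> >"


written = "std::vector<std::map<std::string, int>>"
reported = vector_t(map_t(string_t(), "int"))

print(len(written), len(reported))
# The key type is repeated inside the comparator and the pair, and the whole
# map type is repeated inside the vector's allocator, so string_t() appears 6 times.
print(reported.count("std::basic_string<"))
```

The repetition is the real cost: every defaulted argument that mentions a user-written type copies that type's full spelling again.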
So in summary, to properly understand an error as a C++ programmer, you need to know the exact types that were supplied to your function. Types are built recursively, so the correct, exact name for a type can be really huge and have many levels and layers to it. When I was trying to understand what I'd done wrong, especially as a newer C++ programmer, but honestly even recently when I had a really intractable problem, I would copy the entire error message out, stick it in the scratch buffer, and manually reformat it so I could see the exact type it was telling me I'd actually called the function (or whatever it was) with. I had to sit there and go through the whole thing. But there's a better way now, anyway.
So what can Emacs do to help us with this problem? First of all, if you think about a type name, it's a lot like what we call S-expressions or balanced expressions. Lisp code itself is an S-expression. It's basically things in parentheses with little atoms or symbols in them, or strings or numbers. And parenthesized balanced expressions are things that Emacs was actually built to deal with. I found an old manual from 1981, and the two major modes that it actually documented were, one, assembly language, and two, Lisp. It mentioned that there were other modes, but it didn't say anything about them. So Lisp has a really long history with Emacs. Manipulating balanced expressions, and doing it efficiently, is just a thing that Emacs knows how to do, and Emacs is good at it. There's a legacy of algorithms and functions for doing it. So if we take types, treat the angle brackets in them as brackets, and get the symbol characters right, then we can treat types as balanced expressions, the same kind that Emacs is really good at handling.
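The talk does this with Emacs's own machinery (syntax tables plus movement functions such as forward-list). As a language-neutral illustration of the same idea, here is a small recursive-descent parser, swapped in for that machinery, that reads a type name as a balanced expression: each name<...> becomes a (name, children) pair, much like a Lisp list. It assumes the input contains only names, angle brackets, and commas; function types, operators such as operator<, and anything after the outermost type are out of scope.

```python
def parse_type(text):
    """Parse 'name<arg, arg<...>>' into nested (name, [children]) tuples."""
    pos = 0

    def parse_one():
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in "<>,":
            pos += 1
        name = text[start:pos].strip()
        children = []
        if pos < len(text) and text[pos] == "<":
            pos += 1  # step over '<'
            while True:
                children.append(parse_one())
                if pos >= len(text):
                    raise ValueError("unbalanced '<' in type name")
                sep = text[pos]
                pos += 1  # step over ',' or '>'
                if sep == ">":
                    break
                if sep != ",":
                    raise ValueError(f"unexpected {sep!r} at {pos - 1}")
        return (name, children)

    return parse_one()


def depth(tree):
    """Nesting depth: a plain name is 0, name<plain> is 1, and so on."""
    _name, children = tree
    return 1 + max(map(depth, children)) if children else 0


tree = parse_type("std::vector<std::map<std::string, int>>")
print(tree)
print(depth(tree))
```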
Secondly, we can use overlays
to improve the readability of errors.
We can take long lines and break and indent them
using before-strings, the same thing
I used to add "cherries" to the poem.
We can use that to insert new lines
followed by indentation
and produce a much nicer-looking listing of a type.
We can also use the invisible
property
to hide unwanted detail.
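Here is a rough sketch of breaking and indenting with before-strings. It is not the pretty-printing algorithm the talk refers to, just a naive rule chosen for brevity: start a new line after every '<' and after every ',', indented two spaces per nesting level, and hide the spaces that follow each comma. Each entry in breaks would correspond to an overlay whose before-string is a newline plus indentation, and each position in hidden to an overlay with the invisible property; layout_overlays and render are hypothetical names, and the text itself is never changed.

```python
def layout_overlays(text, indent=2):
    """Plan display-only edits: {position: before-string} and a set of hidden positions."""
    breaks, hidden = {}, set()
    depth = 0
    for i, ch in enumerate(text):
        if ch == "<":
            depth += 1
            breaks[i + 1] = "\n" + " " * (indent * depth)
        elif ch == ">":
            depth -= 1
        elif ch == ",":
            j = i + 1
            while j < len(text) and text[j] == " ":
                hidden.add(j)  # the original space is hidden, not deleted
                j += 1
            breaks[j] = "\n" + " " * (indent * depth)
    return breaks, hidden


def render(text, breaks, hidden):
    """Apply the planned before-strings and invisibility to produce the display."""
    shown = []
    for i, ch in enumerate(text):
        shown.append(breaks.get(i, ""))
        if i not in hidden:
            shown.append(ch)
    return "".join(shown)


error_type = "std::vector<std::map<std::string, int>>"
breaks, hidden = layout_overlays(error_type)
print(render(error_type, breaks, hidden))
```

This prints one template argument per line, indented by depth, with the closing brackets kept together after the last argument. A real pretty printer would only break lines that do not fit within the window width.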
Last of all, we can create a minor mode.
When we're compiling things in Emacs,
we often use compilation-mode.
compilation-mode allows you to install
compilation filters that run
when the compiler is producing output,
and at that time, we can add our overlays.
We can also add minor-mode commands
to the keymap that do whatever we want.
In this case, we're going to show and hide
lower-level details interactively
so that we can see a simplified version
or a more detailed version of a type, depending on our needs.
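The show/hide commands can be sketched in two steps: mark every <...> region with its nesting depth (in the mode, overlays carrying a depth property; here, plain tuples), then elide every region deeper than a target depth. Because deeper regions lie inside the first too-deep one, only regions at exactly target + 1 need an ellipsis. mark_depths and show_to_depth are hypothetical names, and the "..." marker is an assumption about how hidden detail is shown; raising or lowering target stands in for the minor-mode commands (C-c + and C-c -, per the Q&A).

```python
def mark_depths(text):
    """Return (start, end, depth) for the inside of every <...> pair."""
    stack, marks = [], []
    for i, ch in enumerate(text):
        if ch == "<":
            stack.append(i + 1)  # the region starts just after '<'
        elif ch == ">":
            if not stack:
                raise ValueError(f"unmatched '>' at {i}")
            start = stack.pop()
            marks.append((start, i, len(stack) + 1))  # depth 1 = outermost
    if stack:
        raise ValueError("unmatched '<'")
    return marks


def show_to_depth(text, marks, target):
    """Display text with everything nested deeper than `target` replaced by '...'."""
    shown, pos = [], 0
    too_deep = sorted(m for m in marks if m[2] == target + 1 and m[0] < m[1])
    for start, end, _depth in too_deep:
        shown.append(text[pos:start])
        shown.append("...")
        pos = end
    shown.append(text[pos:])
    return "".join(shown)


error_type = "std::vector<std::map<std::string, int>>"
marks = mark_depths(error_type)
target = 1
print(show_to_depth(error_type, marks, target))  # std::vector<std::map<...>>
target = max(0, target - 1)                      # like a "show less" command
print(show_to_depth(error_type, marks, target))  # std::vector<...>
```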
First of all, parsing types as balanced expressions.
We need to be able to quickly locate
the boundaries and the contents
of parenthesized expressions,
or in this case, expressions in angle brackets.
We use a syntax table inside Emacs
to allow movement functions like forward-list
to jump between matching angle brackets.
By default, angle brackets aren't treated as parentheses.
First of all, let's look at our syntax table.
We're going to add here syntax entries
to handle angle brackets as though they were parentheses.
Then we have a lot of types
that have colons in them, which separate namespaces in C++.
By default, Emacs does not recognize them
as parts of symbols, so we're going to tell Emacs
that a colon is something called a symbol constituent,
that it can be part of a name.
Once we do that, we can use functions
like forward-list and backward-word;
all of the navigation and movement functions we have
for more complicated things like S-expressions
can now be used with our angle brackets
and inside our types.
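In Emacs Lisp, the syntax-table changes described above would be entries along the lines of (modify-syntax-entry ?< "(>" table), (modify-syntax-entry ?> ")<" table), and (modify-syntax-entry ?: "_" table). The Python below does not use Emacs at all; it is a toy model of a syntax table (a dict from character to class) with hypothetical forward_list and symbol_at functions, meant only to show what those entries buy you: bracket matching across angle brackets, and std::vector read as one symbol.

```python
# Toy syntax table: character -> (class, matching bracket).
SYNTAX = {
    "(": ("open", ")"), ")": ("close", "("),
    "<": ("open", ">"), ">": ("close", "<"),  # angle brackets act as parentheses
    ":": ("symbol", None),                     # so std::vector is one symbol
    "_": ("symbol", None),
}


def syntax_class(ch):
    if ch in SYNTAX:
        return SYNTAX[ch][0]
    if ch.isalnum():
        return "word"
    if ch.isspace():
        return "whitespace"
    return "punctuation"


def forward_list(text, pos):
    """Return the position just after the next balanced group at or after pos."""
    expected = []  # stack of closing characters still to be seen
    for i in range(pos, len(text)):
        ch = text[i]
        cls = syntax_class(ch)
        if cls == "open":
            expected.append(SYNTAX[ch][1])
        elif cls == "close":
            if not expected or expected[-1] != ch:
                raise ValueError(f"unbalanced {ch!r} at {i}")
            expected.pop()
            if not expected:
                return i + 1
    raise ValueError("no complete group after pos")


def symbol_at(text, pos):
    """The run of word and symbol constituents around pos."""
    def inside(c):
        return syntax_class(c) in ("word", "symbol")
    start, end = pos, pos
    while start > 0 and inside(text[start - 1]):
        start -= 1
    while end < len(text) and inside(text[end]):
        end += 1
    return text[start:end]


error_type = "std::vector<std::map<std::string, int>> x"
print(forward_list(error_type, 0))   # 39: just past the outer '>'
print(forward_list(error_type, 12))  # 38: just past the map's '>'
print(symbol_at(error_type, 25))     # std::string, colons included
```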
Questions or comments? Please e-mail emacsconf-org-private@gnu.org