emacs-gc-stats: Does garbage collection actually slow down Emacs?
Ihor Radchenko (he) - Mastodon: https://emacs.ch/@yantar92, yantar92@posteo.net
Format: 34-min talk; Q&A: BigBlueButton conference room
Status: Q&A to be extracted from the room recordings
Description
Talk sources, PDF, raw data, and analysis are published at https://dx.doi.org/10.5281/zenodo.10213384 .
Is Emacs responsiveness really affected by a slow garbage collector? Should `gc-cons-threshold' be increased during startup? Or maybe during the whole Emacs session?
I will try to answer these questions using real data collected from Emacs users who installed the https://elpa.gnu.org/packages/emacs-gc-stats.html package and submitted their results to https://lists.gnu.org/archive/html/emacs-gc-stats/.
About the speaker:
Materials science researcher, Org mode user for many years, Org mode (unofficial) co-maintainer
The talk is an excuse to sum up emacs-gc-stats data for later discussion of changing Emacs GC defaults: https://yhetil.org/emacs-devel/87v8j6t3i9.fsf@localhost/
Discussion
Questions and answers
- Q: Are the GC duration statistics correlated with users? I mean:
does the same user experience GCs of various durations, or do some
users experience GCs of >0.2 s exclusively while others never
experience GCs of >0.2 s?
- A: Some users have <0.1 s GC time, while others struggle with nearly 1 s. It really varies. But the number of people with >0.2 s is significant enough to make GC a big deal. You can check it yourself: there are GC stats plots for each individual user in https://zenodo.org/records/10213384.
- Q: Having recently been working on a high-performance smooth
scrolling mode, which needs to respond to scroll events
arriving >50-60 times per second, a 100ms delay is very
noticeable in this scenario. For normal buffer interaction and
commands 0.1s is a reasonable dividing line, but I'd estimate you can
easily feel a 20ms delay during various "fast" interactions. Do
you think there is hope to "spread out" GC latency to keep it
below say 15ms, even if more frequent (without just repeating many
short GCs in a row)?
- A: The only reasonable "spread out" is deferring GC to after that scrolling. Like (let ((gc-cons-threshold )) (do the scrolling)). This is also what is recommended by Emacs devs (AFAIR).
- Q: Opinions about gcmh-mode?
- A: (Not Ihor): Ironically it uses too many timers, creating
garbage of its own. It should use timer-set-time instead of
creating and throwing away timers after each command (via
post-command-hook).
- Interesting!
- A: (from Ihor): The problem is that it ends up consuming a ton of memory, increasing GC time, and that most GCs occur when Emacs is being used intensively and there is no chance for Emacs to go idle and perform the GC. Since the GC cons threshold is raised to ~1G (gcmh-high-cons-threshold) while Emacs is used, you will face really bad hangs (seconds to tens of seconds, regularly). It ends up not helping much; I recommend increasing gc-cons-percentage to 0.2 or so instead.
- Q: Is there some way to free up memory (such as via
unload-feature) in Emacs? Often I only need a package loaded for a
single task/short period but it persists in memory afterwards.
- A: https://elpa.gnu.org/packages/memory-usage.html, and the built-in M-x memory-report. Most of the time, it is some history/cache variables of large buffers that are occupying memory. The library code itself rarely affects GC. (The other question is when libraries add timers/heavy mode-line constructs/post-command-hooks/etc. That is indeed a problem, but it is solved by disabling or not using a package; no need to unload.)
- Q: Very nice presentation! I just experimented with the threshold
and lowered my gc-elapsed from 1.1 to 0.06 seconds (during startup).
Interestingly, going to 10MB increased the time, 4MB was the
sweet-spot for my system. What is the recommended way to lower the
value back to the default value after startup is complete?
- A: after-init-hook
- Q: What were you using to flip through the PNGs? (Thanks for the
answer. look-mode on MELPA does that too.)
- A: https://feh.finalrewind.org/
- Q: What was the final point you were making regarding Emacs 30? You
got cut off...
- A: M-x malloc-trim
- Q: With 16-32G of RAM and minimal OS swapping, how about systematically doing this temporary deferral @yantar92 suggested and leave it down for a longer GC at night and whatnot? Or would cons/allocation also degrade too noticeably?
- Not the speaker: That would cause Emacs to use a lot more total memory
- Indeed. Essentially the question is at what point all my daily mostly-textual Emacs usage doesn't come close to using all the available memory on a 32G sys? (but my mind went more to being concerned about new cons/alloca and fragmentation for the intra-day use) I'll have to look into it more before being cogent. One more onto the todo list then
- A: For increasing thresholds up to RAM limits, do remember that individual GC time will increase. With 32GB of RAM you will likely make individual GCs prohibitively slow sooner rather than later. I'd say that it only makes sense to increase the thresholds when you have multiple agglomerated GCs. Going beyond this is of little use. (I am thinking about adding some kind of summary statistics command to emacs-gc-stats, so that one can look into GC duration, frequency, init time, and agglomeration and then adjust the settings according to the results.)
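The temporary deferral discussed in the answers above (let-binding `gc-cons-threshold` around a latency-sensitive operation) could be sketched as follows. The macro name `my/with-deferred-gc` is hypothetical, and `most-positive-fixnum` is an assumed value: the original answer leaves the threshold unspecified.

```elisp
;; Hypothetical helper: run BODY with GC effectively deferred.
;; `most-positive-fixnum' is an assumed value; the answer above
;; does not say which threshold to bind.
(defmacro my/with-deferred-gc (&rest body)
  "Evaluate BODY with `gc-cons-threshold' temporarily raised."
  (declare (indent 0) (debug t))
  `(let ((gc-cons-threshold most-positive-fixnum))
     ,@body))

;; Usage: garbage created inside the form is only collected after
;; the binding is undone and the normal thresholds apply again.
(my/with-deferred-gc
  (dotimes (_ 1000)
    (make-string 100 ?x)))
```

As noted in the answers, the deferred garbage still has to be collected later, so the GC that follows may take longer and memory use grows in the meantime.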
Notes
- https://elpa.gnu.org/packages/emacs-gc-stats.html
- Data, presentation, and analysis: https://dx.doi.org/10.5281/zenodo.10213384
- This presentation is a direct continuation of emacs-devel thread:
- https://yhetil.org/emacs-devel/20230310110747.4hytasakomvdyf7i@Ergus/
- At some point, Eli asked to collect GC statistics - https://yhetil.org/emacs-devel/83y1n2n11e.fsf@gnu.org/
- https://elpa.gnu.org/packages/emacs-gc-stats.html and my talk summarizing the results are the answer to that request.
- Now, we can continue the discussion on emacs-devel with real
data at hand
- I hope to push for a temporary bump of
`gc-cons-threshold' during Emacs init and possibly for increasing
`gc-cons-percentage'.
- Came for clear-cut magic bullet answers, left with nuanced analysis - and that, surprise, Eli was overall right? Now what to do with that viral gc init snippet that I've never taken time to measure myself but keep anyway...
- A: I do believe that temporarily raising thresholds is ok for init time. That's the only clear-cut conclusion, unfortunately
- Thanks yantar92, both for the detailed investigation and exposition. I've been deferring to much-smarter-than-me Henrik for my default position (Doom has it in its init), for lack of doing any measurements myself.
- Thanks for your work on this project. Very thorough.
- Definitely a huge extra thanks for the tireless Org-mode work yantar92!
- A: Do not take things Doom does blindly. I am still horrified by let-binding major-mode
- Good advice, thanks. I don't personally (more of a vanilla/DIY type myself), but I'd be remiss not to leverage Henrik's insights nonetheless
- A: (fun fact: memory-info tries to get memory information on remote system when connected via TRAMP) ... not a problem (anymore; after that very surprising bug report) for emacs-gc-stats
Transcript
[00:00:00.000] Introduction
Hello everyone, my name is Ihor Radchenko, and you may know me from the Org mailing list. However, today I'm not going to talk about Org mode. Today I'm going to talk about Emacs performance and how it's affected by its memory management code. First, I will introduce the basic concepts of Emacs memory management and what garbage collection is. Then I will show you user statistics collected from volunteer users over the last half year, and I will end with some guidelines on how to tweak Emacs garbage collection customizations to optimize Emacs performance, and when it is or is not necessary to do so.
[00:00:51.080] About garbage collection in Emacs
Let's begin. What is garbage collection? To understand what garbage collection is, we need to realize that anything you do in Emacs is some kind of command. Any command is most likely running some Elisp code. Every time you run Elisp code, you most likely need to allocate some memory in RAM. Some of this memory is retained for a long time and some of this memory is transient. Of course, Emacs has to clear this transient memory from time to time, so as not to occupy all the available RAM in the computer. In this small example, we have one global variable that is assigned a value, but when assigning the value, we first allocate a temporary variable and then a temporary list and only retain some part of this list in this global variable. In terms of a memory graph, we can represent this as two variable slots, one transient, one permanent, and then a list of three cons cells, part of which is retained through the global variable, while the rest is reachable only from the temporary variable symbol. The first element of the list is not used and it might be cleared at some point.
[00:02:09.760] Garbage collection in Emacs
So that's what Emacs does.
Every now and then, Emacs goes through all the memory
and identifies which parts of the memory are not used
and then clears them so that it can free up the RAM.
This process is called garbage collection
and Emacs uses a very simple and old algorithm
which is called Mark & Sweep.
So doing this mark and sweep process
is basically two stages.
First, Emacs scans all the memory that is allocated
and then identifies which memory is still in use
which is linked to some variables, for example,
and which memory is not used anymore
even though it was allocated in the past.
The second stage [??] frees whatever memory
is no longer in use. During this process
Emacs cannot do anything else.
So basically, every time Emacs scans the memory,
it freezes up and doesn't respond to anything,
and if it takes too much time so that users can notice it,
then of course Emacs is not responsive at all,
and if this garbage collection is triggered too frequently,
then it's not just unresponsive every now and then.
It's unresponsive all the time,
almost all the time,
so you cannot even type normally
or run some normal commands.
This mark and sweep algorithm is taking longer
the more memory Emacs uses. So basically,
the more buffers you open, the more packages you load,
the more complex commands you run, the more memory is used,
and basically, the longer Emacs takes
to perform a single garbage collection.
Of course, Emacs being Emacs
this garbage collection can be tweaked.
In particular users can tweak
how frequently Emacs does garbage collection
using two basic variables: gc-cons-threshold
and gc-cons-percentage.
gc-cons-threshold
is the raw number of bytes
Emacs needs to allocate
before triggering another garbage collection,
and gc-cons-percentage
is similar,
but it's defined in terms of a fraction
of already-allocated memory.
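To see what these two variables are currently set to in a running session, something like the following can be evaluated (for example with M-: or in the *scratch* buffer). It only reads the values and changes nothing.

```elisp
;; Print the two variables discussed above.
;; `gc-cons-threshold' is a number of bytes; `gc-cons-percentage'
;; is a fraction (0.1 means 10 percent).
(message "gc-cons-threshold: %d bytes; gc-cons-percentage: %s"
         gc-cons-threshold gc-cons-percentage)
```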
If you follow various Emacs forums,
you may be familiar with people complaining about
garbage collection. There are many many suggestions
about what to do with it.
Most frequently, you see gc-cons-threshold
recommended to be increased,
and a number of pre-packaged Emacs distributions
like Doom Emacs do increase it.
I have seen suggestions which are actually horrible
to disable garbage collection temporarily
or for a long time.
Which is nice... You can see it quite frequently,
which indicates there might be some problem.
However, every time one user posts about this problem,
it's just one data point and it doesn't mean
that everyone actually suffers from it.
It doesn't mean that everyone should do it.
So in order to understand if garbage collection
is really a common problem,
we do need some kind of statistics,
and only using the actual statistics
can we understand if it should be recommended for everyone
to tweak the defaults, or whether
it should be recommended only for certain users,
or maybe Emacs devs should be asked
to do something about the defaults.
And what I did some time ago is exactly this.
I tried to collect the user statistics.
So I wrote a small package on ELPA
and some users installed this package
and then reported back these statistics
of the garbage collection for their particular use.
By now we have obtained 129 user submissions
with over 1 million GC records in there.
Some of these submissions
used default GC settings without any customizations.
Some used increased gc-cons-threshold
and gc-cons-percentage.
So using this data we can try to draw
some reliable conclusions on what should be done
and whether anything should be done about garbage collection
at the Emacs dev level or at least at the user level.
Of course we need to keep in mind
that there's some kind of bias
because it's more likely
that users who already have problems with GC,
or think they have problems with GC,
will report and submit the data.
But anyway, having statistics is much more useful
than just having anecdotal evidence
from one or another Reddit post.
And just one thing I will do
during the rest of my presentation
is that for all the statistics
I will normalize user data
so that every user contributes equally.
For example, if one user submits
100 hours of Emacs uptime statistics
and another user submits one hour of Emacs uptime,
I will still make it so that they contribute equally.
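One way to make every user contribute equally is to compute a statistic per user first and then average over users, instead of pooling all GC records together. The sketch below does this for "fraction of GCs longer than some limit". It is only an illustration: the function names and the data layout (an alist of user and durations) are assumptions, and the exact normalization used in the published analysis may differ.

```elisp
(require 'cl-lib)

(defun my/user-fraction-above (durations limit)
  "Return the fraction of DURATIONS (in seconds) greater than LIMIT."
  (/ (float (cl-count-if (lambda (d) (> d limit)) durations))
     (length durations)))

(defun my/normalized-fraction-above (data limit)
  "Average the per-user fractions so every user contributes equally.
DATA is an alist of (USER . DURATIONS); this layout is assumed."
  (/ (apply #'+ (mapcar (lambda (entry)
                          (my/user-fraction-above (cdr entry) limit))
                        data))
     (length data)))

;; A user with four records and a user with one record weigh the same:
;; user a has 1 of 4 GCs above 0.1 s, user b has 1 of 1.
(my/normalized-fraction-above
 '((a . (0.05 0.05 0.05 0.3))
   (b . (0.3)))
 0.1)
```

Pooling the same five records without normalization would instead give 2 out of 5, letting the user with more uptime dominate.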
Let's start with one of the most obvious things
we can look into,
which is the time it takes
for a single garbage collection.
Here you see frequency distribution of GC duration
for all the 129 users we got
and you can see that most of the garbage collections
are done quite quickly in less than 0.1 second
and less than 0.1 second is usually just not noticeable.
So even though there is garbage collection
it will not interrupt the work in Emacs.
However, there is a fraction of users
who experience garbage collections
that take 0.2, 0.3 or even half a second,
which will be quite noticeable.
For the purposes of this study
I will consider anything that is less than 0.1 second
insignificant, so you will not notice it
and, obviously,
all the Emacs usage will be just normal.
But if it's more than 0.1 or 0.2 seconds
then it will be very noticeable
and you will see Emacs hang for a little while
or not so little while. In terms of numbers
it's better to plot the statistics not as a distribution
but as a cumulative distribution.
So at every point of this graph,
for example here at 0.4 seconds,
you see the percentage of users, here almost 90%,
who have no more than 0.4 second GC duration.
So if we take one
critical GC duration, which is 0.1 second,
and look at how many users have it,
we have 56%: that is,
44% of users have less than 0.1 second GC duration
and the remaining 56% have more than 0.1 second.
So you can see that more than half of users
actually have a noticeable GC delay,
so Emacs freezes for some noticeable time,
and a quarter of users actually have a very noticeable one,
where Emacs freezes such that you see an actual delay,
which is quite a significant and important point.
But apart from the duration of each individual gc
it is important to see how frequent it is
because even if you do notice a delay
even a few seconds delay
it doesn't matter if it happens once
during the whole Emacs session.
So if you look into the frequency distribution again, here
I plot the time between subsequent garbage collections
versus how frequent it is, and we have a very clear trend
that most of the garbage collections are quite frequent:
we are talking about every few seconds to a few tens of seconds.
There are a few outliers which are at very round numbers
like 60 seconds, 120 seconds, 300 seconds.
These are usually timers, so
you have something running on a timer
and it is a complex command
and it triggers garbage collection,
but it's not the majority.
Again to run the numbers
it's better to look into cumulative distribution
and see that 50% of garbage collections
are basically less than 10 seconds apart.
And we can combine it with the previous data
and look at garbage collections
that are less than 10 seconds from each other
and also take more than, say, 0.1 seconds.
Then we see that
one quarter of all garbage collections
are noticeable and also frequent,
and 9% take
more than 0.2 seconds: very noticeable and also frequent.
So basically this constitutes Emacs freezing:
9% of all the garbage collections are Emacs freezing.
Of course, if you remember, there is a bias,
but 9% is quite a significant number.
So garbage collection can really slow things down,
not for everyone but for a significant fraction of users.
Another thing I'd like to look into
is what I call agglomerated GCs.
What I mean by agglomerated is
when you have one garbage collection
and then another garbage collection immediately after it.
So in terms of numbers, I took
every subsequent garbage collection
which is either immediately after
or no more than one second after the previous one.
So from the point of view of users,
multiple garbage collections add up together
into one giant garbage collection.
And if you look into the numbers
of how many agglomerated garbage collections there are,
you can see even numbers over 100.
So 100 garbage collections going one after another.
Even if you think about each garbage collection
taking 0.1 second, if we look at 100 of them
it's 10 seconds in total.
It's like Emacs hanging forever,
and even 10 in a row is a significant number.
So again, it would be very annoying to run into such a thing.
How frequently does it happen?
Again we can plot the cumulative distribution
and we see that about 19 percent
of all the garbage collections are at least two together
and 8 percent are more than 10. So you think, oh,
each garbage collection is not taking much time,
but when you have 10 of them, yeah, that becomes a problem.
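The grouping described above (consecutive GCs no more than one second apart count as one agglomerated GC) can be sketched as a small function. The name `my/agglomerate-gcs` and the input format (a sorted list of GC timestamps in seconds) are assumptions for illustration; whether the published analysis measures the gap from the start or the end of the previous GC is not stated in the talk.

```elisp
(defun my/agglomerate-gcs (timestamps &optional max-gap)
  "Group sorted GC TIMESTAMPS (in seconds) into runs.
Consecutive GCs no more than MAX-GAP seconds apart (default 1.0)
belong to the same run.  Return the list of run lengths."
  (let ((max-gap (or max-gap 1.0))
        (runs nil)
        (count 0)
        (prev nil))
    (dolist (ts timestamps)
      (if (and prev (<= (- ts prev) max-gap))
          ;; Close enough to the previous GC: extend the current run.
          (setq count (1+ count))
        ;; Otherwise finish the current run (if any) and start a new one.
        (when (> count 0) (push count runs))
        (setq count 1))
      (setq prev ts))
    (when (> count 0) (push count runs))
    (nreverse runs)))

;; Three GCs in quick succession, one isolated GC, then a pair.
(my/agglomerate-gcs '(0.0 0.5 1.2 10.0 30.0 30.3))
```

A run length of 1 is an isolated GC; the long runs are the ones a user perceives as a single extended freeze.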
Another thing is to answer a question
that some people complain about, which is that
the longer you use Emacs, the slower Emacs becomes.
Of course it may be caused by garbage collection
and I wanted to look into how garbage collection time
and other statistics,
other parameters are evolving over time.
And what I can see here is a cumulative distribution
of GC duration for like first 10 minutes of Emacs uptime
first 100 minutes first 1000 minutes.
And if you look closer then you see
that each individual garbage collection on average
takes longer as you use Emacs longer.
However, this increase is not much, maybe 10 percent.
Basically, garbage collection
slows Emacs down more as you use Emacs more, but not much.
So basically, if you do see Emacs
being slower and slower over time,
it's probably not really garbage collection,
because it doesn't change too much.
And if you look into the time
between individual garbage collections,
you see that the time actually increases
as you use Emacs longer which makes sense
because initially like first few minutes
you have all kind of packages loading
like all the port loading and then later
everything is loaded and things become more stable.
So the conclusion on this part is that
if Emacs becomes slower in a long session
it's probably not caused by garbage collection.
And one word of warning of course is that
it's all nice and all when I present the statistics
but it's only an average
and if you are an actual user like here is one example
which shows a total garbage collection time
like accumulated together over Emacs uptime
and you see different lines
which correspond to different sessions of one user
and you see they are wildly different
one time there is almost no garbage collection,
another time you see a lot of garbage collection,
probably because Emacs is used more heavily
or with a different pattern of usage,
and even during a single Emacs session
you see a different slope
of this curve which means that
sometimes garbage collection is infrequent
and sometimes it's much more frequent
so it's probably much more noticeable one time
and less noticeable other time.
So if you think about these statistics of course
they only represent an average usage
but sometimes it can get worse sometimes it can get better.
The last parameter I'd like to talk about is
garbage collection during Emacs init.
Basically, if you think about what happens during Emacs init,
when Emacs is just starting up,
then whatever garbage collection happens
there, whether it's once or several times,
it all contributes to Emacs taking longer to start.
And again we can look into the statistics
and see what is the total GC duration after Emacs init,
and we see that for 50% of all the submissions
garbage collection adds more than one second
to Emacs init time, and for 20% of users
it's an extra three seconds of Emacs start time,
which is very significant,
especially for people who are used to Vim,
which can start in a fraction of a second.
And this is just garbage collection;
garbage collection is not
everything Emacs does during startup,
so the rest adds even more to the load time.
Okay that's all nice and all
but what can we do about these statistics
can we draw any conclusions
and the answer is of course
like the most important conclusion here is that
yes garbage collection can slow down Emacs
at least for some people and what to do about it
there are two variables which you can tweak
these are gc-cons-threshold and gc-cons-percentage
and having the statistics I can at least look a little bit
into what is the effect of increasing these variables
like most people just increase gc-cons-threshold
and in all the submissions people did increase it
as it doesn't make much sense to decrease it
and make things worse
of course for these statistics
the exact values of these increased thresholds
are not always the same
but at least we can look into some trends
so first and obvious thing we can observe
is when we compare
the standard gc settings standard thresholds
and increased thresholds for time between
subsequent gcs and as one may expect
if you increase the threshold
Emacs will do garbage collection less frequently
so the spacing between garbage collection increases
okay the only thing is that
if garbage collection is less frequent
then each individual garbage collection becomes longer
so if you think about increasing
garbage collection thresholds be prepared
that each individual Emacs freeze will take longer
this is one caveat; when we talk about
these agglomerated gcs which come one after another
if you increase the threshold sufficiently
then wherever garbage collections
were done one after another
we can now make it so that they are actually separated
so you don't see one giant freeze caused by
like 10 gcs in a row
instead you can make it so that they are separated
and in statistics it's very clear
that the number of agglomerated garbage collections
decreases dramatically when you increase the thresholds
it's particularly evident when we look into startup time
if you look at gc duration during Emacs startup
and if we look into what happens
when you increase the thresholds
it's very clear that Emacs startup becomes faster
when you increase gc thresholds
so that's all for actual user statistics
and now let's try to arrive at
some actual recommendations
on what numbers to set and before we start
let me explain a little bit about
the difference between these two variables
which are gc-cons-threshold and gc-cons-percentage
so if you think about Emacs memory
like there's a certain memory allocated by Emacs
and then as you run commands and keep using Emacs
there is more memory allocated
and Emacs decides when to do garbage collection
according to these two variables
and actually what it does is it chooses the larger one
so say you are late in an Emacs session
you have a lot of Emacs memory allocated
then you have gc-cons-percentage
which is a percent of the already allocated memory
and that percent is probably going to be the largest
because you have more memory
and more memory means that a percent of it is larger
so you have a larger number caused
by gc-cons-percentage
so in this scenario when the Emacs session is already running
for a long time and there is a lot of memory allocated
you have gc-cons-percentage
controlling the garbage collection
while early in Emacs there is not much memory allocated
Emacs is just starting up then gc-cons-threshold
is controlling how frequently garbage collection happens
because smaller allocated memory
means its percentage will be a small number
so in terms of default values at least
gc-cons-threshold is 800 kilobytes
and gc-cons-percentage is 0.1, that is 10 percent
so gc-cons-percentage becomes larger than that threshold
when you have more than eight megabytes of memory
allocated by Emacs which is quite early
so the threshold will probably hold just during the startup
and once you start using your Emacs
and once you load all the histories
all the kinds of buffers it's probably going to take
much more than eight megabytes
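The "larger of the two" rule described above can be written down as a simplified model. The function name `my/effective-gc-threshold` is hypothetical and the model ignores other details of the real implementation; the defaults of 800000 bytes and 0.1 are the stock values of `gc-cons-threshold` and `gc-cons-percentage`.

```elisp
(defun my/effective-gc-threshold (allocated-bytes &optional threshold percentage)
  "Simplified model of how much Emacs may allocate before the next GC.
ALLOCATED-BYTES is the memory already in use.  THRESHOLD and
PERCENTAGE default to the stock values of `gc-cons-threshold'
\(800000 bytes) and `gc-cons-percentage' (0.1).
This is an illustrative model, not the actual C implementation."
  (let ((threshold (or threshold 800000))
        (percentage (or percentage 0.1)))
    ;; Emacs uses whichever of the two limits is larger.
    (max threshold (* percentage allocated-bytes))))

;; Early in the session (1 MB allocated): the fixed threshold wins.
(my/effective-gc-threshold 1000000)
;; Later (20 MB allocated): 10 percent of the heap is the larger limit.
(my/effective-gc-threshold 20000000)
```

With the defaults, the two limits cross at 800000 / 0.1 = 8,000,000 bytes of allocated memory, which is the "eight megabytes" figure from the talk.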
so now we understand this
we can draw certain recommendations
about tweaking the gc thresholds
so first of all I need to emphasize
that any time you increase gc threshold
an individual garbage collection time increases
so it's not free at all
if you don't have problems with garbage collection
and half of the users don't have much of a problem
you don't need to tweak anything
only when gc is frequent and slow
when Emacs is really freezing frequently
you may consider increasing gc thresholds
and in particular I recommend
increasing gc-cons-percentage
because that's what mostly controls gc
when Emacs is running for a long session
and the numbers are probably like
yeah we can estimate the effect of these numbers
like for example if you have the default value
for gc-cons-percentage of 0.1 which is 10 percent
and then increase it twice
obviously you get twice less frequent gcs
but it will come at the cost of extra 10 percent gc time
and if you increase it 10 times you can think about
10x less frequent gcs
but almost twice longer individual garbage collection time
so probably you want to set the number closer to 0.1
another part of the users may actually
try to optimize Emacs startup time
which is quite frequent problem
in this case it's probably better to increase gc-cons-threshold
but not too much so like
first of all it makes sense to check
whether garbage collection is a problem at all
during startup and there are two variables
which can show what is happening with garbage collection
so gcs-done is a variable that shows
how many garbage collections
that is the number of garbage collections triggered so far
so when you check the value
right after you start Emacs
you will see that
number and the gc-elapsed variable
gives you the number of seconds
which Emacs spent doing garbage collection
so this is probably the most important variable
and if you see it's large then you may consider tweaking it
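A minimal way to check these two variables right after startup is to print them from `emacs-startup-hook`, for example from the init file. This only reports the numbers; it does not change any GC setting.

```elisp
;; Report how much of the startup was spent in garbage collection.
;; `gcs-done' is the number of GCs so far, `gc-elapsed' the total
;; time spent in GC (seconds), `emacs-init-time' the whole init time.
(add-hook 'emacs-startup-hook
          (lambda ()
            (message "Startup: %d GCs, %.2f s in GC, init time %s"
                     gcs-done gc-elapsed (emacs-init-time))))
```

If the GC share of the init time is small, there is little to gain from touching the thresholds.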
for the Emacs startup we can estimate some bounds
because in the statistics I never saw anything
that is more than 10 seconds extra
and even 10 seconds is probably
a really really hard upper bound
so say you want to decrease the gc contribution
by an order of magnitude or two orders of magnitude
let's say as a really hard top estimate
then it corresponds to an 80 megabyte gc-cons-threshold
and probably much less so
there's no point setting it
to a few hundred megabytes
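A temporary bump during init, restored afterwards, might look like the sketch below. The variable name `my/gc-cons-threshold-original` is hypothetical and the 50 MB figure is an assumption chosen to stay under the 80 MB upper estimate given above; `after-init-hook` is the hook suggested in the Q&A for restoring the value.

```elisp
;; Place in early-init.el, or at the very top of init.el.

(defvar my/gc-cons-threshold-original gc-cons-threshold
  "Value of `gc-cons-threshold' before the startup bump.")

;; Assumed value: 50 MB, below the 80 MB top estimate from the talk.
(setq gc-cons-threshold (* 50 1024 1024))

;; Restore the previous value once init has finished, so the raised
;; threshold does not stay in effect for the whole session.
(add-hook 'after-init-hook
          (lambda ()
            (setq gc-cons-threshold my/gc-cons-threshold-original)))
```

Saving and restoring the previous value, rather than hard-coding a number in the hook, keeps whatever default the running Emacs version ships with.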
there's one caveat which is important to keep in
mind though that increasing the gc thresholds
is not just increasing individual gc time
there's also an actual real impact on the RAM usage
so like if you increase gc threshold
it increases the RAM usage of Emacs
and you shouldn't think that like okay
I increased the threshold by like 100 megabytes
then 100 megabytes extra RAM usage doesn't matter
it's not 100 megabytes
because less frequent garbage collection means
it will lead to memory fragmentation
so in practice if you increase the thresholds
to tens or hundreds of megabytes
we are talking about gigabytes extra RAM usage
for me personally when I tried to play with gc thresholds
I have seen Emacs taking two gigabytes
compared to several times less
with default settings so it's not free at all
and only when you either have a lot of free RAM
and you don't care or when your Emacs is really slow
then you may need to consider
tweaking these defaults so again don't tweak defaults
if you don't really have a problem
and of course this RAM problem is a big big deal
for Emacs devs because from the point of view of a single user
you have like normal laptop most likely like normal PC
with a lot of RAM you don't care about these things too much
but Emacs in general can run on like all kinds of machines
including low-end machines with very limited RAM
and anytime Emacs developers consider increasing
the defaults for garbage collection
it's like they always have to consider
if you increase them too much
then Emacs may just stop running on certain platforms
so that's a very big consideration in terms
of the global defaults for everyone
although I would say that it might be relatively
safe to increase gc-cons-threshold
because it mostly affects startup and during startup
it's probably not the peak usage of Emacs
and as Emacs runs for longer
that's probably where most of the RAM will be used later
on the other hand gc-cons-percentage is much more debatable
because it has pros and cons
it will increase the RAM usage
it will increase the individual GC time so
if we consider changing it it's much more tricky
and we have to discuss it and probably measure the impact on users
and a final note on or from the point of view
of Emacs development is
that this simple mark-and-sweep algorithm
is like a very old and not the state-of-the-art algorithm
there are variants of garbage collection
that are like totally non-blocking
so Emacs just doesn't have to freeze
during the garbage collection
or there are variants of garbage collection algorithm
that do not scan all the memory just fraction of it
and scan another fraction less frequently
so there are actually ways just to change
the garbage collection algorithm to make things much faster
of course compared to just changing the numbers
the variable values
it is much more tricky and someone has to implement it
obviously it would be nice if someone implements it
but so far it's not happening so yeah it would be nice
but maybe not so quickly
there is more chance to change the defaults here
to conclude let me reiterate the most important points
so from point of view of users you need to understand that
yes garbage collection may be a problem
but not for everyone so like
you should only think about changing the variables
when you really know that garbage collection
is the problem for you so if you have
slow Emacs startup and you know that it's caused by
garbage collection
you can check the gc-elapsed variable
then you may increase gc-cons-threshold
to a few tens of megabytes not more
it doesn't make sense to increase it much more
and if you really have major problems
with Emacs being laggy
then you can increase gc-cons-percentage
to like 0.2 0.3 maybe
one is probably overkill
but do watch your Emacs RAM usage it may be really impacted
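For a long-running session, the recommendation above amounts to a one-line setting plus a way to keep an eye on the result. In this sketch 0.2 is taken from the low end of the range mentioned in the talk, and the command name `my/gc-report` is hypothetical.

```elisp
;; Raise the fraction of already-allocated memory that may be
;; allocated again before the next GC (stock value: 0.1).
(setq gc-cons-percentage 0.2)

(defun my/gc-report ()
  "Show the number of GCs so far and the time spent in them."
  (interactive)
  (message "%d GCs, %.2f s total, %.3f s on average"
           gcs-done gc-elapsed
           ;; Guard against division by zero in a fresh session.
           (if (> gcs-done 0) (/ gc-elapsed gcs-done) 0.0)))
```

Running `M-x my/gc-report` before and after the change, under similar usage, gives a rough idea of whether individual GCs became noticeably longer.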
for Emacs developers I'd like to emphasize
that there is a real problem with garbage collection
and nine percent of all the garbage collection
data points we have correspond
to really slow noticeable Emacs freezes
that are also really frequent less than 10 seconds apart
I'd say that it's really worth
increasing gc-cons-threshold at least during startup
because it really impacts the Emacs startup time
making Emacs startup much faster
ideally we need to reimplement
the garbage collection algorithm of course it's not easy
but it would be really nice
and for gc-cons-percentage defaults it's hard to say
we may consider changing it but it's up to discussion
and we probably need to be conservative here
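
Until any default changes, the startup suggestion can be approximated in a personal configuration. The following is a sketch under assumptions: the variable name `my/gc-cons-threshold-original` and the 32 MiB value are illustrative, and the code is meant to sit at the very top of `early-init.el` or `init.el`.

```emacs-lisp
;; Remember the value in effect before we touch anything
;; (hypothetical helper variable, not part of Emacs).
(defvar my/gc-cons-threshold-original gc-cons-threshold
  "Value of `gc-cons-threshold' before the startup-time increase.")

;; Raise the threshold while the init files are being loaded.
;; A few tens of megabytes; 32 MiB is an assumed example value.
(setq gc-cons-threshold (* 32 1024 1024))

;; Restore the previous value once initialization has finished and
;; report how much time startup spent in garbage collection.
(add-hook 'after-init-hook
          (lambda ()
            (setq gc-cons-threshold my/gc-cons-threshold-original)
            (message "Startup: %d GCs, %.2f s spent in GC"
                     gcs-done gc-elapsed)))
```

Restoring the saved value, rather than hard-coding a number, keeps the snippet correct even if the built-in default differs between Emacs versions.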
so we came to the end of my talk
and this presentation
all the data will be available publicly
and you can reproduce all the statistic graphs if you wish
and thank you for your attention
Q&A transcript (unedited)
Thank you for your nice talk, I can say it's the Emacs GC. We have some questions on the pad and maybe before I would like to ask you something about the last thing you said, concerning changing the GC strategy, that it's unlikely that it will be happening any time soon. Yeah. Is there any discussion going on, or why is it the case that the strategy is not changing? I think, yesterday you heard from, one of the dev talks that like there was one small, short comment that, oh yeah, it would be nice to change this algorithm but it's hard. that hard but because it's a very low level code and it must be like very carefully weighted. So that can be, it needs to be made sure that the carousel will work. It's all bugs. If you have bugs and you can see that, so it's nothing to work anymore. Yeah. Maybe sometime. there was a branch on generational GC, if I remember correctly, but they didn't go anywhere, unfortunately. questions on the pad. So the first one is, are the GC duration statistics correlated with users? I mean, does the same user experience GCs of various durations? Or do some users experience GCs of greater than 0.2 s exclusively, while others never experience them? So is it correlated to user behavior? I guess you said it in your talk. then almost every user has like 1 or 2 occasions when GC takes more than 0.2 seconds, but it's like, maybe something else is using CPU and that's why, but in practice, there are users who don't have a problem. Half of them, that's what I saw from the statistics. And there are users who have like really big problems, like 1 second GC time. on us in the talk, but could you like extract on if it's a package, that's a problem or we as a user behavior are there. okay. I'm sharing my screen now, different user statistics. So like you can see this duration for each individual user basically. 
So you can see like here for example it's like averages around 0.25 seconds which is noticeable and here is like 0.1 like someone is all over the place, probably some. Then like, what else can we see here? Yeah, some users like have sub 0.1, no problem at all. And I have seen some that are really, really bad. I mean, seconds, 0.5 seconds. I don't know how that guy uses Emacs. Yeah. you can see it varies. usually the number of packages, like all kinds of timers going on under the hood. I think I tried to list... I'll go through this. I briefly outlined some important parts. Here, when you have something like an org agenda, it will most likely trigger a lot of GCs. When you have a lot of timers, when you have something calculated on the mode line, it will be frequently triggered. these packages are using a lot of memory. Like I remember I was surprised by this, package, home org that was, caching all the results. And for large org files, it was like several hundred megabytes of data. Well, it just becomes slower. Yeah. Someone asks, what software you're using for flipping through the PNGs. Maybe you could shortly throw it in. So to one statement you have made, there was a question concerning the timings. So you said, okay, everything above 0.1 second is fine. Maybe there's a short story of someone who asked a question. terms of trying to adjust the GC time. I mean, if you make GCs less frequent, you increase the individual GC time. If you make them more frequent, you decrease the individual GC time, but then they are more frequent. So what is the point? I think the way to go here is you can raise gc-cons-threshold for the duration of scrolling, like just for a command. I think it's a recommendation from Emacs devs. So like you do something along the lines. Yeah, I'm surely doing something on my screen and I forgot that I'm not sharing anything. So, basically, if you have some command that is very important that it should run very quickly. 
You temporarily increase that threshold, you run that command, then that's all. That's probably the best. So basically, the best you can do is to delay it after the command. its stuff. OK. The third one has been already answered, but I just want to get your information from it. Opinions on the GCMH mode. but that's more like a technical problem. But there's another problem there. Yeah, I prepared a small snippet here. So if you look at the GCMH mode, it has this concept of low threshold and high threshold and most of the time it's running high threshold and then when Emacs is idle, it falls back to lower threshold and then it does the GC while Emacs is not used. That's a good idea, of course. That's the core idea of GCMH mode. Unfortunately, the most annoying GC is when you're actively using Emacs. And then you have this huge value of gc-cons-threshold, and look at the docstring: this should be set to a value that makes GC unlikely but does not cause OS paging. So yeah, no wonder like if you don't do GC, your RAM usage will skyrocket. So they don't, they cannot put it too much, but this is like already like, how much was it? 1 gigabyte, that's the default. And the problem is when you have 1 gigabyte to garbage collect, it causes really long GC time. So in GCMH mode, when you're actually using Emacs, really heavily, the GCs become terrible, terribly slow. So it may help in case you don't have too many problems with GC, but I will say that in such situation, you can simply increase gc-cons-percentage, as I recommend, and it should do it. But in case of really big problems with garbage collection, no, I don't think that will help much. I used it myself and it didn't help much for my stuff. freeing up memory. Is there some way to free up memory such as via unload-feature on Emacs? Often I only need a package loaded for a single task or short period, but it persists in memory afterwards. a problem. 
I mean, the libraries, the problem is some extra, like some variable contents or some histories, some caches. That's what's eating most of the memory. There is a package called memory-usage and the built-in M-x memory-report. They allow you to see which variables take a lot of memory. And that way you can try to see which packages are actually problematic. So for example, I recall, and that was not exactly, I remember there was a package that was literally in command line, like prompt history. I think it was in command. And when you do like, when you save every message in your chat into prompt history, that can grow very fast and can go to several hundred megabytes just in that history. And that can cause major problems. So, yes, profiling the largest variables with the largest buffers that might give some clues. Again, there is no silver bullet. patterns. At first, very nice presentation. I just experimented with the threshold and lowered my gc-elapsed from 1.1 to 0.06 seconds during startup. Interestingly, going to 10 megabytes increased the time. 4 megabytes was a sweet spot for my system. What is the recommended way to lower the value back to the default value after startup is completed? it temporarily raises gc-cons-threshold during startup and, yeah, after-init-hook, the code is like, it's one of the commonly suggested approaches and I believe it's the right one. So Peter, do you have any questions that you want to question? And maybe as a side note, we only have 4 minutes left and afterwards this happy weekend will still be open, but we will switch back to the talks. collection, but I just wanted to thank Ihor for his engagement in the community. And especially with, I'm a co-maintainer on org-noter and he's helped us a lot with getting us up to date with newer versions of Org and stuff like that. So just wanted to thank you in person. you talked a bit about memory fragmentation. So is there any way to or is it fixed by Emacs itself? So you have like your OS. 
Yeah, Emacs releases the memory and then the OS can rearrange it depending on the implementation of its memory manager. not so it could be that Emacs is like right? Yeah. And you see, can release a part of this page just like here and there. And depending on the exact situation of your RAM at each moment of time, the OS may or may not be able to rearrange it; you cannot really predict it. It really varies like you use Windows, you use Linux, you use like malloc, something else, but it has nothing to do with Emacs. It's just something you have to deal with. are giving the memory back to the operating system, not just holding it as used and then to our own memory, like stuff as Emacs that we do not need to interact with the operating but yeah, unfortunately, because nothing much can be done on Emacs. just holding it and when it needs more, then just get more from the OS. for example, there's something called image cache. And because Emacs stores images in uncompressed format, it can occupy quite a lot of memory. In particular, when you will like view PDFs, like you open 10, like 20 PDFs in one session, you may have like some image cache blowing up, but that's not common for people. So in the next... there's this one command, which is, I think, in Emacs 30, called malloc-trim. M-x malloc-trim. It's interactive. So that can help to release some memory. I think the way it works is like forces OS to make use of the released memory. we are by the way, switch back to the next talk. But not release like, even Emacs says, okay, this memory is free, depending on the implementation, I might think, okay, but I still hold that memory associated with Emacs just in case Emacs needs more memory, and I can immediately put the data there without like more arrangement to allocate more. And this malloc-trim basically forces the OS to release it, like no matter what. Emacs, I have the feeling they are only using Emacs. 
So it would be kind of interesting if you just take like, I don't know, 2 gigabytes or something of memory and Emacs like does what it wants on that and the OS cannot really take it back. This was my idea when I it doesn't mean that OS cannot take it back. It may still like allocate certain portion, even technically free, but just for future. So this is where malloc-trim works. It's like, it says, yes, OS, I'm really not going to hold on to this free memory. For sure. If you try this M-x malloc-trim, you will see like a few tens to hundreds of megabytes freed immediately. I guess on the pad, there was nothing else. I guess we can just close it. Thanks for the discussion. Thanks for answering the questions. And yeah, for your volunteer work. And yeah, for quietly panicking in the background, right? Yeah, I mean... You have to be quiet, you're panicking in the background.
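
The suggestion from the Q&A of temporarily raising the threshold around a latency-critical command (for example while scrolling) might be sketched as follows. This is an illustration, not code shown in the talk: `my/call-with-relaxed-gc` and `my/fast-scroll-command` are hypothetical names, and the 64 MiB value is an assumed example.

```emacs-lisp
;; Hypothetical helper: call FN with ARGS while `gc-cons-threshold' is
;; temporarily raised.  `gc-cons-threshold' is a special (dynamically
;; bound) variable, so the `let' binding is undone automatically when
;; FN returns or signals an error, and GC happens after the command.
(defun my/call-with-relaxed-gc (fn &rest args)
  "Call FN with ARGS, making garbage collection unlikely meanwhile."
  (let ((gc-cons-threshold (max gc-cons-threshold
                                (* 64 1024 1024)))) ; assumed value
    (apply fn args)))

;; Usage sketch: wrap one specific command that must stay responsive.
;; `my/fast-scroll-command' stands in for your own command.
;; (advice-add 'my/fast-scroll-command :around #'my/call-with-relaxed-gc)
```

Because the binding is local to the call, the long-running session keeps its normal threshold, which avoids the steadily growing memory use discussed for permanently high thresholds.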
Questions or comments? Please e-mail yantar92@posteo.net