
Results of the 2022 Emacs Survey

Timothy (he/him, IRC: tecosaur)

00:00.000 Introduction 00:26.040 The 2020 Emacs User Survey 01:54.360 The design of the survey 03:18.560 Survey frameworks 04:01.021 Writing a new survey framework in Julia 05:40.200 In practice 06:50.560 Results 07:39.600 Going forward 09:11.160 Responses 11:17.000 Geography 12:32.280 Gender 14:04.440 Occupations 16:11.320 Free and open source software 17:02.440 Emacs versions 17:56.360 Languages 19:25.800 Prose 20:03.400 Packages 21:04.920 Documentation 21:38.440 Moving forward 22:44.200 Time 23:26.200 How long the survey is open for 24:25.200 Plan going forward

Description

If I am giving this talk, then the 2022 Emacs Survey will have taken place a month ago! I will go through the motivations, implementation, results, and plans of the Emacs Survey.

Outline:

  • A quick overview of the main results of the 2022 Emacs Survey
  • Discussion of motivation, implementation, and future plans

Discussion

Notes

  • 2020: 7000 participants in the survey (Adrien Brochard)
  • 2022: new survey framework: Julia (language) + the Genie web framework
  • 2022: 6600 responses (1000 partial); 115 nations; 96% male (a more balanced split among younger respondents)
  • the plots are using Makie, a Julia plotting library
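
The charts were made with Makie; the speaker mentions choosing Paul Tol's colour-blind-friendly schemes, and the feedback below asks for slice labels rather than a colour key in the margin. As a rough illustration only (toy data, not the survey's figures), here is what a labelled pie chart can look like in CairoMakie, using Makie's built-in Wong colours, another colour-blind-friendly palette:

    using CairoMakie

    labels = ["Org", "Magit", "Other"]              # toy categories, not the survey's results
    values = [3, 4, 5]
    colors = Makie.wong_colors()[1:length(values)]  # colour-blind-friendly palette

    fig = Figure()
    ax  = Axis(fig[1, 1], aspect = DataAspect())
    pie!(ax, values; color = colors, strokecolor = :white, strokewidth = 2)
    hidedecorations!(ax); hidespines!(ax)

    # Tie the labels to the same colours in a legend instead of a marginal colour key.
    Legend(fig[1, 2], [PolyElement(color = c) for c in colors], labels)
    save("pie.png", fig)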

From IRC

  • Magit
    • magit is more popular than org mode :-O
    • Magit was more popular in 2020 too, however the gap has narrowed in the 2022 results 😀
    • magit is a miracle
    • I've read about people who use Emacs for magit ONLY!
    • magit deserves every heart given
    • I love magit and I am not even a developer so I can only imagine how amazing it is for people who use it for actual work
    • Especially since for programmers, which is a large chunk of emacs users, version control is probably more important than org I would assume
    • ieure, i see why people might think that, but svn is non negotiable at work. my question was: does magit do anything useful for all merging VCS that could be included in VC?
    • The difference between terminal git and magit is huge for me, even as someone who doesn't use it every day
    • If your repos are mostly standard source and not dependent on huge binary assets or anything, you're missing out on better VCS first and foremost
    • but judging by the way users extol magit, it must do something better than vc that can be applied to all merging version control systems (non locking ones). if someone can tell me what it is and i find it useful, i might add it to vc
    • I use both VC and Magit actively and I've found it's mostly about the enhanced interactivity with Git. It's very visual and also allows you to pretty much interactively set/unset flags to various Git commands as you would on the terminal but much faster all thanks to transient.el.
    • +1 for transient being a big part of why people find magit easier to use.
  • New versions
    • We have to be careful about selection bias when it comes to versions used.
    • my worry is that people on older versions are less up to date with what is going on and might not hear about the survey :/
    • the way i see it packages don't really have to officially support versions of emacs older than the latest release
    • unless they're big and see frequent updates for new platforms
    • i.e. TRAMP
    • otherwise users of older versions will just backport those packages themselves

Feedback

  • Survey framework
    • oh yeah, the new framework was super pleasant to use, am a fan
    • Liked how there was a way the responses were saved (locally?), and there was a possibility to resume answering later!
    • from a user's perspective, the UX was amazing including download-options for my own answers. Impressive.
    • I loved this year's platform too! so another +1 here
  • found the pie charts a little hard to follow, e.g. what color related to which package, etc. Maybe add more labeling to the chart itself?
    • on that topic, some of the colors were also very close (eg. Haskell and Java in the language graph)
  • I'd suggest that bar charts with more than 5 colors be labeled at the bar versus in the margin. (Also take mercy on the color blind)
    • i'd suggest using a different tiled pattern for each bar instead of colors. That makes them easier to follow, especially for the color blind, or for people who cannot see colors well at night (me)
    • Speaker: Hopefully this isn't too bad for the colour blind, I chose Paul Tol's colour schemes for that reason (among others)
  • i took part in that survey! proud of accomplishment
  • First, a general BIG THANK YOU for your survey!
  • nice, thanks for working on all of this
  • this is an amazing talk.
  • org mode huge, always forget how much of that pie it brings in
  • nice pie chart :)
  • lobste.rs getting bigger lately?
  • i am going to be interested in how the results differ when you factor out /r/emacs
  • the graphics are gorgeous
  • hence why I sprinkle in phrases like "within this more engaged subset of users ..." 😉
  • Hmmm, I didn't think about using Emacs as my chat / email client as counting as writing prose in Emacs. But it does.
  • (and the same with Org 😛)
  • magit applies to many modes, there is only one org-mode ;]
  • I don't think I can stop using magit
  • man who are these people filling the survey in less than 10 min?
  • Excellent talk! Thank you.
  • thanks for the talk btw, it was very interesting
  • Very informative. Nicely done!
  • Excellent work, looking forward to further analysis
  • Very nice presentation!
  • Thanks Timothy, great talk!
  • Thanks, great talk! clapclapclap
  • Great presentation
  • next year, I will join survey
  • Great talk and thanks for all this nice work!!
  • my compliments for your great analysis of the survey's data

Questions and answers

  • Q: will there be another survey next year as well?
    • That's the plan, and the year after, etc.
  • Q: Do Emacs developers take into account the survey results? I mean, they are volunteers working on what they find useful/interesting for them, which is of course great.
    • A: There's no obligation for emacs-devel or Emacs package maintainers to do anything in response to the survey results, but hopefully the results will be able to inform development choices they make.
  • Q: are you planning to have the software used beyond Emacs surveys?
    • A: It could well be, it's written as a general survey platform
  • Q: Is the survey software available in source code via Gitlab/Github/...? What's the license?
  • Q: Are the raw results available so we can run some data analysis ourselves?
  • Q: Any specific reason why you chose Julia as a language to code the survey (curious about it)?
    • A: I use it a lot, and like doing so :)
  • Q: Do you have any insight on the degree of selection bias (the respondents may represent a very particular segment of the overall users, in terms of motivation to respond). The nb of days of response after announce may indicate that respondents are very much in touch with emacs news.
    • A: We can try to look at the degree to which the survey referrer (r/emacs, HN, etc.) changes the survey results, but ultimately this is a hard question. At the end of the day, the way I view things is that we can just do our best to investigate how much of an effect is seen in the results, but this is the best shot we have available.
    • A2: That said, we can compare a few particular statistics to other surveys done in a wider population to gauge how close our results are to them, but that assumes that those other surveys (e.g. Stack Overflow's developer survey) are themselves representative of the Emacs user base, which can itself be quite an assumption.
  • Q: Are the pies in gnuplot or something else?
    • A: The plots are using Makie, a Julia plotting library.
  • Q: Thoughts on an emacs package to fill out the survey? Just more work for you ;)
    • A: Hopefully more work for someone else ;)
  • Q: Is the survey framework open sourced already or still in the works?
  • Q: what did you use to draw the diagram in p.7?
    • Inkscape
  • Q: you might go into this in a second, but are there any specific questions you are looking to go into when you find the time?

Transcript

[00:00:00.000] Hello everyone and thanks for tuning in. I'm Timothy, and in this talk, we'll be going over the 2022 Emacs User Survey. Since this is the first time we're discussing this, we'll be going over the survey itself a bit, how it's being put together and run, and then we'll have a little taste of the results with more analysis to be published in the future.

[00:00:26.040] To start with though, a bit of background. So in 2020, we had an Emacs User Survey run by Adrien Brochard. Now this is, to the best of my knowledge, the first time that a large-scale Emacs User Survey has actually been run. About 7,000 people responded to the survey, so in many respects, it was quite successful. And what's significant about this is that, with this being the first time that a large-scale survey has been run, it actually provided some insight into questions about how the community is using Emacs, allowing for much better guesses than just speculation based on the small number of people who usually respond on the mailing list. So, why are we doing another survey? Well, to start with, in order to get the most value out of an Emacs User Survey, it's quite helpful if the information in it is recent. Furthermore, we can actually get some more value if we can examine trends: shifts in the way that people are using Emacs, where the pain points lie, what people are enjoying the most, etc. So in both of these respects, it's to our benefit if the survey is actually a regular event, instead of just something that's run once.

[00:01:54.360] Now, with this in mind, we ran the 2022 Emacs User Survey with the plan that this will actually become an annual event. In the design of the survey, there are a few goals here. The main one is value to the user community. Now, "value to the user community" is a rather nebulous phrase. In this case, what's meant in particular is value in questions, for example, things like pain points with Emacs, which versions people are using, which capabilities people are making the most use of, which could potentially be helpful to emacs-devel, but also to our collection of Emacs package maintainers and the whole community. Actually, I think, going beyond just the packages, we've also got the people who develop tutorials, guides, and all of that sort of surrounding activity, which can benefit from a clear understanding of how Emacs users use Emacs. Separately to that, I think, as an Emacs user myself, it's rather interesting to see how other people are using Emacs and what their experience is. So yes, basically, you've got utility and interest as the two separate driving factors as we try to pick questions which can actually give us all of this without taking up too much of the respondents' time.

[00:03:18.560] Now, last time in 2020, the Emacs survey that Adrien ran used, I think Google Forms, if I recall correctly, with an option to send in responses manually. This worked, but it's not great, particularly given that this is for a survey being run in an ardently FOSS community. Ideally, we actually want to find a survey framework that respects the priorities of users, is open source, ideally free and open source, and is a relatively pleasant experience. Unfortunately, looking at available options, it seems that one always has to compromise on at least one, if not all of those criteria, which is quite far from ideal.

[00:04:01.021] So what's the obvious solution? Okay, we should just write a new survey framework. Obviously, this is easier said than done. But around a year ago, I actually started doing exactly this. I've used the programming language Julia quite a bit on a day-to-day basis. And there just so happens to be a web framework for that called Genie. So I thought I'd give it a shot. And well, here we are today. I ended up putting something together which could take a set of questions written in Julia using a survey library, parse that into a helpful structure, then construct HTML forms based on that, ingest results from those HTML forms, and just sort of handle that altogether. Now, all of this ends up being fed into an SQLite DB. So everything's there, even partial responses. One of the goals with the actual design of this has been to just minimize what's actually done on the client side. So that means JavaScript, cookies, the whole lot. Basically, as far as this could reasonably be taken, we've just got static HTML being shoved to the user, or respondent rather. And then we just take an HTTP POST request back and update the results that way. Now, by doing things like actually paging the survey, we can allow for incremental saving of results and a few other niceties, while essentially preserving an experience that doesn't really require any particular capabilities from the client, which is sort of a nice, clean, minimal experience as far as I'm concerned.
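
To make that flow a little more concrete, here is a minimal sketch, not taken from the actual survey codebase: a Genie route serves a page of plain HTML, the browser sends the answers back with an ordinary HTTP POST, and each page's answers are upserted into SQLite. The table layout, form fields, and the render_page helper are all invented for illustration.

    # Minimal sketch of the described flow (hypothetical, not the survey's real code):
    # static HTML out, plain HTTP POST back, answers upserted into SQLite page by page.
    using Genie, Genie.Router, Genie.Requests, Genie.Renderer.Html
    using SQLite

    const db = SQLite.DB("responses.sqlite")
    SQLite.DBInterface.execute(db, """
        CREATE TABLE IF NOT EXISTS answers (
            respondent TEXT, question TEXT, value TEXT,
            PRIMARY KEY (respondent, question))""")

    # Stand-in for the framework's form generator: one page of the survey as static HTML.
    render_page(n) = """
        <form method="post" action="/survey">
          <input type="hidden" name="page" value="$n">
          <input type="hidden" name="uid" value="demo">
          <label>Years using Emacs <input name="emacs_years"></label>
          <button>Next</button>
        </form>"""

    route("/survey") do                       # serve the first page: no JavaScript needed
        html(render_page(1))
    end

    route("/survey", method = POST) do        # ingest one page's answers
        answers = postpayload()               # Dict of the submitted form fields
        rid  = string(pop!(answers, :uid, "anon"))
        page = parse(Int, string(pop!(answers, :page, "1")))
        for (question, value) in answers
            SQLite.DBInterface.execute(db,
                "INSERT OR REPLACE INTO answers VALUES (?, ?, ?)",
                (rid, string(question), string(value)))
        end
        html(render_page(page + 1))           # hand the next page straight back
    end

    up(8080)

Incremental saving falls out of the per-page upsert: if a respondent stops halfway, whatever pages they already posted are sitting in the database.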

[00:05:40.200] So what does this actually look like in practice? Well, one of the nice things about this is that because the questions themselves are written in Julia, we can get some nice features like custom validators and other fancy behavior, and directly specify how we actually want questions to be registered in the database. So here we have, for example, two questions we had from this Emacs survey. One is a multi-select. Another one is just putting in the number of years people have used Emacs for. I think this gives a brief overview of the capabilities. One of the things I'd like to draw particular attention to here is that in the multi-select, you'll see an array of options, the first one of which actually maps to a different value to be stored, for convenience. And then the final one is a special one, :other, and you can see that's a bit different to the rest in that it's got that colon in front: it's a symbol, not a string. And this is quite a nice one, because of the way that this framework's been designed, when we have an :other value like that, instead of it just being a sort of tick box "Other", it actually provides the option to write in your own response, different to all of the above.
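
The framework's actual constructors aren't reproduced here, so the snippet below is only an illustrative guess at what the two questions described above might look like; the MultiSelect and NumberInput names and their arguments are invented for illustration, not the framework's real API.

    # Hypothetical question definitions, standing in for the real framework's constructors.
    # A multi-select: the first option stores a different value than it displays,
    # and the trailing :other symbol (a Symbol, not a String) becomes a write-in field.
    referrer = MultiSelect(
        :referrer, "Where did you hear about this survey?",
        ["r/emacs (Reddit)" => "reddit",   # displayed text => stored value
         "Hacker News",
         "IRC",
         :other])                          # adds an "Other: ____" free-text option

    # A numeric question with a custom validator attached.
    emacs_years = NumberInput(
        :emacs_years, "How many years have you used Emacs for?",
        validate = x -> 0 <= x <= 80 || "That seems like rather a lot of years.")

Because the questions are plain Julia values, a validator is just a function, which is where the fancy validation mentioned below comes from.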

[00:06:50.560] Okay, so at the very end, we've now got a completely FOSS survey framework, which is rather nice. So what have we got? A decent array of input types. It would be nice to expand, but at the moment I think we could just about describe it as a rich set. Zero JavaScript required, though a little bit is used for progressive enhancement. As demonstrated, we can get some fancy validation going on. And then, because we've got the results tied into this quite nicely, we can actually have them available live and in quite a number of formats. I'm not sure how much you saw in the architecture diagram, but we've got all sorts of things here: CSV, TSV, plain text, JSON, or just grab a copy of the SQLite database, but only the relevant bits. Or something called JLD2, which preserves a lot of type information and a few other nice things.
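
Serving the live results in several formats is mostly a matter of re-reading the database and handing it to the relevant writer. A rough sketch, assuming the toy answers table from the earlier snippet rather than the survey's real schema:

    using SQLite, DataFrames, CSV, JSON3, Tables

    db = SQLite.DB("responses.sqlite")
    df = DataFrame(SQLite.DBInterface.execute(db,
        "SELECT respondent, question, value FROM answers"))

    CSV.write("results.csv", df)                  # pass delim = '\t' for the TSV variant
    open("results.json", "w") do io
        JSON3.write(io, Tables.rowtable(df))      # one JSON object per stored answer
    end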

[00:07:39.600] Now, what are we going to do going forward from here? Well, there are a few minor issues. For example, there's a memory leak which resulted in the service being restarted, I think every day or two, while the survey was running. I actually have the suspicion that that's largely responsible for the roughly 1% of respondents, about 75 people, who described the survey experience as not great. Overall though, the feedback has been quite positive. There's been some detailed written feedback, but just from the quick great/okay/not great options, we had about two-thirds of people saying that the user experience was great, which is really nice to hear for the first time it's been run. A few other things would be nice to add in future, for example control flow. By this, I mean the option to present different questions based on previous answers, which would be quite nice for streamlining the experience. For example, having a set of questions for first-time respondents, or for people who are involved in the packaging side of things, without actually cluttering the experience for everybody else. That'd be quite nice. Further to all of this, I think on top of the standard web interface, it'd be quite nice to actually write a server API. And the particular reason why I mention this is because it could potentially allow for basically an Emacs survey package. I mean, we already use Emacs for so many things, we might as well fill the survey out from within it as well. Okay, so this is how the survey has been conducted.

[00:09:11.160] Now, what do the responses look like? At this stage, I was actually hoping to get into some somewhat sophisticated analysis, because there's quite a bit that you can dig out of the responses that we've received. However, unfortunately, I've been much more limited on time than I'd hoped for, so that's going to have to come later. For now, we're just going to take a bit of a peek at some of the really basic answers. Well, it's not even really analysis. Expect to see lots of pie charts, basically. But there's still a bit of interest there, so we'll go through a bit of that and just give a bit of a tease as to what might come in the future. So, to sum up for starters, we've had about 6,500 responses. It is worth noting that a thousand of those are partials, so people who gave up on the survey partway through. Given that the 2020 survey had about 7,000 responses, I'd say we're basically on par here. This ran over a month, and interestingly, about half of these respondents did not participate in the 2020 survey. I think at this point, it's not really clear what to make of that. There's been a two-year gap between the surveys, it's been done quite differently, and yes, there's not enough, really, to say. What could be interesting, though, is that once this starts running regularly, we can see whether there's regular churn in the survey respondents, or if we have a consistent core of people who respond each year, and then just people who come by every now and then and go, "Oh, why not respond to this year's survey?" But we're going to have to wait a bit to actually see how people treat the survey. Now, these responses came from quite a wide range of places: we've got 115 nations represented here. Collectively, these respondents have spent about a thousand hours giving us information. So I think, if nothing else, just from the effort that people have put into actually giving us useful data to work with, it's worth giving at least a good effort to trying to extract some value out of these responses.

[00:11:17.000] Now, overall we found a lot of responses came from America, no surprises there, but as mentioned, we've got a good mix around the globe. The usual suspects make up the rest of the responses: a whole bunch in Europe, a whole bunch around Asia, a bit in Australasia as well, and yes, there's nothing particularly surprising here, it's largely in line with expectations. What I find a bit more interesting, though, is that if we actually normalise the number of responses from each nation by the population of said nation, essentially giving a popularity of Emacs, or at least of Emacs survey respondents, for each nation, we end up finding that Europe, particularly Scandinavia, becomes a bit of a hotspot. So I'm not sure what's going on in Sweden, Finland and Norway, but it seems to be particularly popular around there. It's also worth noting that we now find that the proportion of respondents in countries like America, Canada, Australia and most of Europe actually becomes quite comparable, which, yes, once again sort of lines up with the expectations from the last slide.
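
That per-country normalisation is straightforward to reproduce; here is a sketch with toy placeholder numbers (not the survey's actual counts), assuming the responses have already been tallied per country:

    using DataFrames

    # Toy placeholder inputs, NOT real survey figures, just to make the recipe runnable.
    counts     = DataFrame(country = ["A", "B"], responses = [10, 20])
    population = Dict("A" => 5_000_000, "B" => 50_000_000)

    counts.per_million = [1_000_000 * r / population[c]
                          for (c, r) in zip(counts.country, counts.responses)]
    sort!(counts, :per_million, rev = true)   # the per-capita view described above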

[00:12:32.280] Okay, getting into some of the other demographic information. The demographic questions were new to this survey. In the 2020 survey, people were asked what they thought of being asked about some demographic information in a future survey, and the overwhelming response was, "Sure, I don't really mind." And so that's what we've done here. One of the ones of somewhat interest is the age-gender breakdown. So we expect Emacs to be used predominantly among people in software and programming, and within the industry, I think it's quite widely documented to have roughly a 75-25% split between male and female. Interestingly, in Emacs it's a much more aggressively biased result. So we had about 96% of respondents being male, with just 4% for the rest. Interestingly, though, if we look at the young respondents, say, for example, under 25, we go from 96% male to 88%. So it's fair to say that the young respondents are, in this respect, a somewhat more diverse group. Hopefully, as future surveys go on, we'll see this continue rather than die off to the sort of, well, at this point it's more like 99% if you look at the older ages. But we'll see.

[00:14:04.440] Occupations was an interesting slide as well, and an interesting question as well. We've got the usual suspects here. I mean, it's a text editor, well, a Lisp machine masquerading as a text editor, mainly used for programming, and so we expect lots of software development and that sort of thing. But that's only just over half of the responses. We've got a huge chunk from academia, and then really just an odd bag of all sorts of other things, including things which you wouldn't really associate with programming and software at all. Things like creative writing, publishing, legal, yes. And then you've got this chunk of Other, which I think is the fourth most popular option here. And what we have here is about 500 different responses from a huge range of activities. It's really quite interesting to read things like "naval officer", and just... all sorts of surprising occupations for Emacs. And I think this is a particular area where, I imagine, compared to other code editors, your VS Code and the like, Emacs may have a particularly diverse set of industries and occupations represented in its users. Now, if you look at where the responses actually came from, we've got the usual suspects up top, Hacker News and r/emacs. But then we actually get a much more graduated breakdown than in the 2020 survey. We do see familiar results here, like IRC, Telegram, Emacs China, and Twitter. But now you've got a few new entries, things like the Fediverse, Discourse, and Matrix, which didn't pop up previously. So I think this is, yes, quite a nice sign in terms of actually hitting a wide range of pockets of Emacs users across different platforms, which bodes well for the potential representativeness of this survey.

[00:16:11.320] Unsurprisingly, if we're talking about Emacs, and particularly people who are quite engaged in it, which is who the respondents to this survey are, we find that we also get quite a high degree of care for free and open source software. So if you have a look here, only about a quarter of users didn't express a strong preference towards FOSS. In fact, we had over a quarter saying that they would accept significant or even any compromise to use FOSS software over a proprietary alternative, which, given the nature of Emacs, is not terribly surprising, but a strong showing nonetheless.

[00:17:02.440] Now, let's start getting to things which are actually useful for potential Emacs development and packaging. If you're thinking about which Emacs versions to support, it looks like you can do fantastically well in terms of hitting most users if you support Emacs 27+. That hits about 96% of respondents. Interestingly though, you can actually make an argument for being even more aggressive. I mean, if you have a look at Emacs 28+, that's still over three-quarters of respondents. We've got, at this point, a quarter using the unreleased HEAD version, even though it's getting close to release. Obviously, as stated, we're hitting a more engaged-with-the-community subset of Emacs users here, but still, I think it's interesting to see that, with Emacs's increasingly frequent update schedule, users are actually picking up those updates quite promptly as they roll out.

[00:17:56.360] Continuing on with how people actually use Emacs: languages. We've got the usual suspects here: lots of Python, quite a bit of JavaScript and C, lots of shell. What I find quite interesting, though, is if we actually bring in the 2020 Stack Overflow language usage survey data, which maps quite well to the array of language options we provided here. They had a general Lisp option, which I've folded into Common Lisp since they listed Clojure separately. I think that seems like a fairly safe bet. But other than that, the only languages that we missed are Scheme and Elisp. What we can do is look at the relative popularity of different languages from our Emacs user survey compared to Stack Overflow's. What do we find? Well, Clojure and Common Lisp are far above the rest, I imagine in no small part due to the fantastic SLIME and CIDER packages. Following that, we see Haskell being particularly prominent, and then a collection of other languages: your Erlang, Elixir, Julia, Perl and the rest. And then lastly, if we have a look at the ones which have significantly diminished popularity compared to Stack Overflow, we end up with what I could probably cast as the more enterprise-oriented languages. Things like C#, Java, TypeScript and the like.
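
The relative-popularity comparison here is just a ratio of shares between the two surveys. A sketch with toy shares (not the real figures), assuming each survey has been reduced to a fraction-of-respondents column per language:

    using DataFrames

    # Toy shares (fraction of respondents using each language), NOT real survey figures.
    emacs = DataFrame(lang = ["Clojure", "Python", "Java"], share = [0.10, 0.40, 0.10])
    so    = DataFrame(lang = ["Clojure", "Python", "Java"], share = [0.02, 0.45, 0.35])

    rel = innerjoin(emacs, so, on = :lang, renamecols = "_emacs" => "_so")
    rel.relative = rel.share_emacs ./ rel.share_so
    sort!(rel, :relative, rev = true)   # >1 means over-represented among Emacs respondents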

[00:19:25.800] So, that's interesting. Now, earlier, when we were looking at the split of Emacs users, we found that we actually had a fair few in more creative areas, like writing and publishing. So, looking at prose, we'd expect a decent chunk to be using Emacs for prose, but it's actually more than just a little slice. We've got about a third of users, a whopping amount, saying they frequently use Emacs for writing prose. I'd imagine that the availability of things like Org mode and AUCTeX probably helps with this.

[00:20:03.400] Moving on to other packages, or more packages, we've actually got a very similar split here to the 2020 survey. Org has seen a bit of growth in popularity. We've got some new arrivals here as well. For example, Vertico has popped onto the scene and overtaken Ivy here, along with a few other new packages like Consult. Other than that, quite comparable. What I find rather interesting here, though, is that people who listed a small number of packages actually predominantly listed packages other than the most common set. So of the people who only listed one package, basically two-thirds, or actually three-quarters, of those responses named other packages, despite the fact that overall, packages other than the highlighted selection here only constitute a quarter of responses. So there might be something a bit more to look at there.

[00:21:04.920] Now when people are using packages, we also asked what types of documentation people would like to see more of on package READMEs. Basically we've got a big mix here. It seems like generally people are interested in seeing more in various forms, whether it be tutorials, overviews, screenshots, comparisons, or clips and videos. So full READMEs with a lot of context seem to be quite desirable from this.

[00:21:38.440] Now, moving forward, what are we going to do? So, 800 people gave some detailed feedback on the survey, which is quite nice. I'm going to be taking a good read of all of those responses and using that to improve the process and also the set of questions. All of you can also give some feedback on the questions: ones that you found most useful in this survey, ones that you think might not add much value, and/or new questions that you think might be a good addition. Once I've done a bit more analysis, particularly the more sophisticated analysis which I'm planning, which will probably come out maybe in the first quarter of next year, we can see which questions seem to have provided the most interesting or surprising results, and those are probably worth keeping. Lastly, once we actually have an API and potentially even an Emacs package, we could automate a large number of the questions, things like Emacs version and the set of packages used, and that could just streamline the experience of actually filling out the survey, make it a bit more frictionless.

[00:22:44.200] Now, talking of the question of questions, a quick survey is a good survey. If we're asking people to dedicate their time to filling this out, it's good to try to get as much value as we can without asking them to donate too much of their time. How has the survey done in this respect? I'm actually very happy with how it's done. We got a few comments in the feedback saying that it was a bit on the long side, but the median time was about 12 minutes, which doesn't seem too bad, and most commonly we saw people completing it in about 8 minutes. For a once-per-year survey, I think this seems fairly reasonable. Getting closer to a 5-10 minute range would be nice, but this isn't far off.

[00:23:26.200] Lastly, we're also going to be considering how long the survey is open for. So from the initial opening date, what we have here is a plot of the page which people ended up on and when they started the survey. So what we can see is a huge spike in the first few days. I've just realised that this plot is actually labelled incorrectly. Please disregard the minutes to complete the survey. This should be days after survey opening that a response is actually submitted. And what we have here is a big spike in popularity in the first week basically, and then it trickles down to a fairly consistent level after that. I'm about to publish a last call for survey responses, so I'll see if any final bump happens, but this indicates that we can probably just have the survey open for a week or two and that should be sufficient.

[00:24:25.200] Alright, so what's the general plan going forwards? Well, as stated earlier, the idea is to run this annually and then consistently improve the questions, the experience, and the analysis that's done. This year has been the hardest by far because a lot had to be set up from scratch. The hope is that moving on from here, a lot of it can be reused. For example, with my comments about more sophisticated analysis being down the line, once that's all worked out, as long as nothing changes too drastically, we should be able to reuse a lot of that work quite easily in future years. Alright, that's it for now. Hopefully, you've found this an interesting peek into how the survey is operated and some of the initial results, and hopefully, I'll see you around next year for the 2023 survey. Thanks for listening.

Captioner: sachac

Questions or comments? Please e-mail emacsconf-org-private@gnu.org
