LLM clients in Emacs, functionality and standardization
Andrew Hyatt (he/him) - ahyatt@gmail.com - https://urbanists.social/@ahyatt - http://github.com/ahyatt
Format: 21-min talk ; Q&A: BigBlueButton conference room
Status: Q&A to be extracted from the room recordings
Talk
Duration: 20:26 minutes

00:00.000 Intro to the Talk
00:25.080 What are LLMs?
01:56.360 Power of LLMs (Magit Demo)
03:32.240 Drawbacks of LLMs (regex demo)
05:20.120 Embeddings
07:32.800 Image Generation
08:48.480 Fine-tuning
11:08.160 Open Source
12:02.840 The Future
14:08.200 LLMs in Emacs - existing packages
18:15.960 Abstracting LLM challenges
19:04.080 Emacs is the ideal interface for LLMs
20:01.960 Outro
Q&A
Description
As an already powerful way to handle a variety of textual tasks, Emacs seems uniquely well poised to take advantage of Large Language Models (LLMs). We'll go over what LLMs are and what they are used for, then list the significant LLM client packages already available for Emacs. The functionality these packages provide can be broken down into a set of basic features, but each package currently manages things in an uncoordinated way. Each might support different LLM providers, or perhaps local LLMs, and those LLMs support different functionality. Some packages connect directly to the LLM APIs, while others use popular non-Emacs packages to do so. The LLMs themselves are evolving rapidly. There is a need for some standardization, so users don't have to configure their API keys or other setup independently for each package, but also a risk that any standardization will be premature. We show what has been done in the area of standardization so far, and what should happen in the future.
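To make the configuration problem concrete, here is a minimal sketch of the "configure once, use everywhere" setup that a shared client library (such as the llm package discussed in the talk) makes possible. The constructor and function names follow that package's documented API at the time of writing and may differ in current versions, so check its README before copying this:

(require 'llm)
(require 'llm-openai)   ; or llm-ollama, llm-gpt4all, ... for local models

;; One provider object, configured once by the user...
(defvar my/llm-provider
  (make-llm-openai :key (getenv "OPENAI_API_KEY"))
  "The provider that every LLM-using package in this config could share.")

;; ...which any client package can then call without caring whether it is
;; talking to OpenAI, a local Ollama server, or something else entirely.
(llm-chat my/llm-provider
          (llm-make-simple-chat-prompt
           "Suggest a one-line commit message for a typo fix in the manual."))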
About the speaker:
Andrew Hyatt has contributed the Emacs websocket package, the triples package (a triple-based DB library), and the ekg package (a tag-based note-taking application). He has been using various other LLM integrations, and as part of extending ekg, he's been working on his own.
Discussion
Questions and answers
- Q: What is your use case for Embedding? Mainly for searching?
- A:
- I got you. It kind of expands our memory capacity. (A rough sketch of this kind of embedding search appears after this question list.)
- A:
- Q: What do you think about "Embed Emacs manual" vs. "GPT's Emacs manual"?
- A:
- Yes, GPTs actually work by kind of embedding your document into their memory and then using the logic provided by GPT-4 or other versions. I never tried that one, but I'm just wondering if you have ever tried comparing the difference.
- A:
- Q: When deferring commit messages to an LLM, what (if anything) do
you find you have lost?
- A:
- Q: Can you share your font settings in your Emacs config? (Yeah, those are some nice fonts for reading)
- A: I think it was Menlo, but I've since changed it (I'm experimenting with Monaspace).
- Q: In terms of standardisation, do you see a need for a medium-to-large scale effort?
- A:
- I mean, as a use case, the interface is quite simple because we're just providing an API to a server. I'm not sure what standardization we are really looking at. I mean, it's more about how we use those callbacks from the LLM.
- A:
- Q: What are your thoughts on the carbon footprint of LLM usage?
- A:
- Q: LLMs are slow in responding. Do you think Emacs should provide more async primitives to keep it responsive? E.g. url-retrieve is quite bad for building API clients.
- A:
- gptel.el is async, and very good at tracking the point.
- A:
- Q: Speaking of which, anyone trained/fine-tuned/prompted a model with their Org data yet and applied it to interesting use cases (planning/scheduling, etc.) and care to comment?
- A:
- I use GPTs for doing my weekly review. I don't rely on it entirely; it helps me find things I never thought about, and I just use it as an alternative way to do the review. I find it kind of interesting to do so.
- A:
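As promised in the embeddings answer above, here is a rough sketch of semantic search over a set of notes using the llm package's embedding call plus cosine similarity. The function names follow that package's documented API at the time of writing; the loop below re-embeds every note on each search purely for illustration (a real implementation, like ekg's, stores the vectors once), so treat this as a sketch rather than a recipe:

(require 'llm)
(require 'seq)

(defun my/cosine-similarity (a b)
  "Cosine similarity between the float vectors A and B."
  (let ((dot 0.0) (na 0.0) (nb 0.0))
    (dotimes (i (length a))
      (let ((x (aref a i)) (y (aref b i)))
        (setq dot (+ dot (* x y))
              na  (+ na (* x x))
              nb  (+ nb (* y y)))))
    (/ dot (* (sqrt na) (sqrt nb)))))

(defun my/search-notes (provider query notes)
  "Return NOTES (a list of strings) sorted by semantic similarity to QUERY.
Embeds every note on each call; a real implementation would cache these."
  (let ((query-vec (llm-embedding provider query)))
    (seq-sort-by (lambda (note)
                   (my/cosine-similarity query-vec
                                         (llm-embedding provider note)))
                 #'> notes)))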
Notes and discussion
- gptel is another package doing a good job with flexible configuration and choice of LLM/API
- I came across this adapter to run multiple LLMs, Apache 2.0 license too! https://github.com/predibase/lorax
- It will turn out that the escape hatch for AGI will be someone's integration of LLMs into their Emacs, enabling M-x control.
- I don't know what question to ask, but I found the presentation extremely useful. Thank you.
- I think we are close to getting semantic search down for our own files
- yeah, khoj uses embeddings to search Org, I think
- I tried it a couple of times, latest about a month ago. The search was quite bad unfortunately
  - did you try the GPT version or just the PyTorch version?
  - just the local ones. For GPT I used a couple of other packages to embed in OpenAI APIs. But I am too shy to send all my notes :D
  - Same for me. But I really suspect that GPT will be way better. They now also support LLama, which is hopeful
  - I keep meaning to revisit the idea of the Remembrance Agent and see if it can be updated for these times (and maybe local HuggingFace embeddings)
- I think Andrew is right that Emacs is uniquely positioned, being a unified integrated interface with good universal abstractions (buffers, text manipulation, etc), and across all use cases and notably one's Org data. Should be interesting...!
- Speaking of which, anyone trained/fine-tuned/prompted a model with their Org data yet and applied it to interesting use cases (planning/scheduling, etc.) and care to comment?
- The ubiquitous integration of LLMs (multi-modal) for anything and everything in/across Emacs and Org is both 1) exciting, 2) scary.
- I could definitely use semantic search across all of my stored notes. Can't remember what words I used to capture things.
- Indeed. A "working group" / "birds of a feather" type of thing around the potential usages and integration of LLMs and other models into Emacs and Org-mode would be interesting, especially as this is what pulls people into other platforms these days.
- To that end, Andrew is right that we'll want to abstract it into the right abstractions and interfaces. And not just LLMs by vendor/models, but what comes after LLMs/GPTs in terms of approach.
- I lean toward thinking that LLMs may have some value but to me a potentially wrong result is worse than no result
- I think it would depend on the use case. A quasi-instant first approximation that can readily be fixed/tweaked can be quite useful in some contexts.
- not to mention the "summarization" use cases (for papers, and even across papers I've found, like a summarization across abstracts/contents of a multiplicity of papers and publications around a topic or in a field - weeks of grunt work saved, not to mention of procrastination avoided)
- IMHO summarization is exactly where LLMs can't be useful because they can't be trusted to be accurate
- https://dindi.garjola.net/ai-assistants.html
- A friend wrote this: https://www.jordiinglada.net/sblog/llm.html
- https://blogs.microsoft.com/on-the-issues/2023/09/07/copilot-copyright-commitment-ai-legal-concerns/
- I have a feeling this is one of them "if you can't beat them, join them" scenarios. I don't see that ending with a big global rollback due to such issues anytime soon...
- (discussion about LLMs, copyright, privacy)
- I spent more time than I was hoping to setting up some custom Marginalia(s?) the other day, notably for cases where the "category" is dynamic, the annotation/affixation function varies, the candidates are an alist of key-value pairs and not just directly the value, and many little specificities like that. Idem for org-ql many moons back, org-agenda, etc. That sort of workflow always involves the same things: learning/reading, examples, trials, etc. I wonder if LLMs could be integrated at various points in that recurring exercise, to take just a sample case.
- that's yet another great use case for LLMs: externalizing one's thinking for its own sake, if only to hear back the echo of one's "voice", and do so with an infinitely patient quasi-omniscient second party.
- oooh, might be a good one for blog post writing: generate some follow-up questions people might have
- Yeah, a "rubber duck" LLM could be very handy
- I'm sure there would be great demand for such a thing, to dry-run one's presentations (video or text) and generate anticipated questions and so on. Great take. (A rough sketch of this idea appears after this list.)
- I've seen some journaling prompts along those lines. I think it'll get even more interesting as the text-to-speech and speech-to-text parts get better. Considering how much people bonded with Eliza, might be interesting to see what people can do with a Socratic assistant...
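Here is the hedged sketch referred to in the "follow-up questions" thread above: asking a provider, asynchronously, for the questions a reader of the current buffer might have. The llm-chat-async calling convention (response callback, then error callback) follows the llm package's documentation; the provider and model name in the usage comment are only examples, and the prompt wording is obviously something to tune:

(require 'llm)
(require 'llm-ollama)  ; assumes a local Ollama server; any provider works

(defun my/draft-follow-up-questions (provider)
  "Ask PROVIDER for follow-up questions a reader of the current buffer might have."
  (llm-chat-async
   provider
   (llm-make-simple-chat-prompt
    (concat "Here is a draft post.  List five follow-up questions a "
            "curious reader might ask after reading it:\n\n"
            (buffer-substring-no-properties (point-min) (point-max))))
   (lambda (response)                      ; called with the reply text
     (with-current-buffer (get-buffer-create "*draft-questions*")
       (erase-buffer)
       (insert response)
       (pop-to-buffer (current-buffer))))
   (lambda (_type msg)                     ; called on error
     (message "LLM error: %s" msg))))

;; Example use, with a local model (the model name is just an example):
;; (my/draft-follow-up-questions (make-llm-ollama :chat-model "mistral"))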
Transcript
Captioner: bala
Q&A transcript (unedited)
I think this is the start of the Q&A session. So people can just ask me questions here. Or I think maybe these questions are going to be read by someone. Yes, thank you. Should I start doing that? I also know that there's questions in the either pad room, so I could start out answering those as well. If you prefer to read the questions yourself, by all means, or if you would prefer me to read them to you, that also works. I think it'll just be more interesting then. what is your use case for embedding, mainly for searching? searching. And I think it is very useful when you're searching for something in a vague way. Just to give you an example, I have a note system called EKG. I type all my notes on it. You can find it on GitHub and Melba. But I wrote something at some point a year ago or something. I wrote something that I just vaguely remembered. Oh, this was about a certain kind of communication. I wanted communicating to large audiences. There's some interesting tip that I wrote down that was really cool. And I was like, well, I need to find it. So I did an embedding search for something like, you know, tips for communicating. Like those words may not have been in what I was trying to find at all, But it was able to find it. And that is something that's very hard to do in other ways. Like, you know, if you had to do this with normal search, you have to do synonyms. And like maybe those synonyms wouldn't cover it. Like with embedding, you can basically get at like the vague sentiment. You're like, you know, you're, you know, you can really query on like what things are about as opposed to what words they have. Also, it's super good for similarity search. So you could say, look, I have a bunch of things that are encoded with embeddings that I want to show. For example, you can make an embedding for every buffer. You'd be like, well, show me buffers that are similar to this buffer. That doesn't sound super useful, but this is the kind of thing you could do. And so if you have a bunch of notes or something else that you want to search on, you'd be like, what's similar to this buffer? Or what notes are similar to each other? What buffers are similar to each other? It's super good for this sort of thing. And it's also good for this kind of retrieval augmented generation, where you sort of, you retrieve things and the purpose is not for you to see them, but then you pass that to the LLM. And then it's able to be a little bit more accurate because it has the actual text that you're trying to, that is relevant, and it can cite from and things like that. And then it could give you a much better answer that's kind of, you know, not just from its own little neural nets and memory. next question. What do you think about embed Emacs manual versus GPT's Emacs manual? trying to say. So I mean, if someone wrote that and wants to expand on it a little bit, but I think that maybe you're saying like you could embed, have embeddings for like various, like every paragraph or something of the Emacs manual. But it's also the case that like GPT is already for sure already read it, right? And so you could ask questions that are about Emacs and our ELISP or whatever part of the manual you want to find. And it will do a reasonably good job, especially the better models will do a reasonably good job of saying you something that is vaguely accurate. But if you do this retrieval augmented generation with embeddings, you can get something that is very accurate. At least I think. 
I haven't tried it, but this is a technique that works in other similar cases. So you can also imagine like, oh, this whole thing I said, like, oh, you can query for vague things and get parts of the manual, perhaps. I'm not exactly sure if that would be useful, but maybe. Usually when I'm looking things up in the Emacs manual or Elist manual, I have something extremely specific and I kind of know where to look. But having other ways to get at this information is always good. if you would like to read that yourself, or would you like me to read it for you? I've never tried. Yeah, the question is like OK, there is a difference between the kind of thing as I just described. I have not tried the difference with the EMAX manual itself. It'd be interesting to see what this is, but I would expect like these techniques, the retrieval augmented generation is generally pretty good. And I suspect it would, I would bet money on the fact that it's gonna give you, you know, better results than just, you know, doing a free form query without any retrieval augmented generation. When deferring commit messages to an LLM, what, if anything, do you find you might have When deferring anything to a computer, like, you know, I used to have to remember how to get places, and now, you know, on the few occasions which I drive, like, It could just tell me how to get places. So similar things could occur here where like, okay, I'm just leaving the LLM. And so I'm kind of missing out on some opportunity to think coherently about a particular commit. Particular commits are kind of low level. I don't think it's usually relatively obvious and what they're doing. And in this case, I think there's not much loss. But for sure, in other cases, if you're starting to get into situations where it's writing your emails and all this stuff. First of all, it's in 1 sense, I'm not sure you might be losing something by delegating things. On the other hand, you know, when you're interacting with these LLMs, you have to be extremely specific about what you want, or else it's just not going to do a good job. And that might actually be a good thing. So the question might be that maybe you might gain things by using an LLM to do your work. It might not actually even save you that much time, at least initially, because you have to kind of practice again super specific about what you want to get out of the output it's going to give you so like oh I'm you know maybe you know you're on the emacs devel mailing list and you're like okay write this email about this about this And here's what I want to say. And here's the kind of tone I want to use. And here's the like, oh, you might want to specify like everything that you kind of want to get into this. Usually it's easier just to write the email. But I think that practice of kind of understanding what you want is not something you normally do. And I think it's going to be an interesting exercise that will help people understand. That said, I haven't done that much of that, so I can't say, oh, yeah, I've done this and it works for me. Maybe. I think it's an interesting thing to explore. Let's see. Can you share your font settings in your Emacs config? Those are some nice fonts for reading. Unfortunately, I don't save those kinds of things, like a history of this. I've kind of switched now to, what was that? I think I wrote it down in the, I switched to MunaSpace, which just came out like a week or 2 ago, and is also pretty cool. So I think it's Menlo. 
The internal question, what font are you using? as well that it might be Menlo. OK, Cool. Yeah, next question. In terms of standardization, do you see a need for the medium to large scale effort needed? And then they also elaborate about it. I don't know if it's large scale, but at least it's probably medium scale. There's a lot of things that are missing that we don't have right now in emacs when you're dealing with LLMs. 1 is, a prompting system. And by that, I mean, you know, prompts are just like big blocks of text, but there's also senses that like prompts need to be composable and you need to be able to iterate on parts of the prompt. And so it's also customizable. Users might want to customize it. On the other hand, it's not super easy to write the prompt. So you want really good defaults. So the whole prompt system is kind of complicated. That needs to be kind of standardized, because I don't think there's any tools for doing something like that right now. I personally use my system, my note system for EKG. I don't think that's appropriate for everyone, but it does, I did write it to have some of these capabilities of composability that I think are useful for a prompt generation. It'd be nice to have a system like that, but for general use. I don't, this is something I've been meaning to think about, like how to do it, but like this, you know, if someone's interested in getting this area, like, I would love to chat about that or, you know, I think there's a lot of interesting ideas that we could have to have a system that allows us to make progress here. And also, I think there's more to standardization to be done. 1 thing I'd also like to see that we haven't done yet is a system for standardizing on getting structured output. This is gonna be super useful. I have this for open AIs API, cause they support it. And it's really nice, cause then you can write elist functions that like, okay, I'm going to call the LLM. I'm gonna get structured output. I know what that structure is going to be. It's not going to be just a big block of text. I could turn it into a, you know, a P list or something. And then I could get the values out of that P list. And I know that way I could do, I could write actual apps that are, you know, very, very sort of, you know, useful for very specific purposes and not just for text generation. And I think that's 1 of the most important things we want to do. And I have some ideas about how to do it. I just haven't pursued those yet. But if other people have ideas, I think this would be really interesting to add to the LLM package. So contact me there. So I'm not sure how long we're going to be on stream for, because this is the last talk before the break. If we are on the stream long-term, then great. But if not, folks are welcome to continue writing questions on the pad. And hopefully, Andrew will get to them at some point. Or if Andrew maybe has some extra time available and wants to stay on BigBlueButton here, then folks are also welcome to join here and chat with Andrew directly as well. Okay, awesome. So yeah, the next question is, what are your thoughts on the carbon footprint of LLM usage? I don't have any particular knowledge or opinions about that. It's something I think we should all be educating ourselves more about. It is really, I mean, there's 2 parts of this, right? They take a, there's a huge amount of carbon footprint involved in training these things. Then running them is relatively lightweight. 
So the question is not necessarily like once it's trained, like I don't feel like it's a big deal to keep using it, but like training these things is kind of like the big carbon cost of it. But like right now, the way everything's going, like every, you know, all, you know, the top 5 or 6 tech companies are all training their LLMs, and this is all costing a giant amount of carbon probably. On the other hand these same companies are pretty good about using the least amount of carbon necessary you know they have their own their tricks for doing things very efficiently. responding. Do you think Emacs should provide more async primitives to keep it responsive? Like the URL retrieve is quite bad at building API clients with it. Building API clients with it? people should be using the LLM client. And So right now, 1 thing I should have mentioned at the top is that there are new packages that I recorded this talk that you just saw several months ago. And so like Elama, there's this package Elama that came out that is using the LM package. And so for example, it doesn't need to worry about this sort of thing because it just uses LLM and package and the LLM package worries about this. And while I'm on the subject of things I forgot to mention, I also should just mention very quickly that there is now an open source model, Mistral. And so that's kind of this new thing on the scene that happened after I recorded my talk. And I think it's super important to the community and important that we have the opportunity to use that if we want to. Okay, but to answer the actual question, there has been some talk about the problems with URL retrieve in the URL package in general in EmacsDevEl. It's not great. I would like to have better primitives. And I've asked the author of Please PLZ to kind of provide some necessary callbacks. I think that's a great library. And I'd like to see that kind of like, It's nice that we have options, and that is an option that uses curl on the back end, and that has some benefits. So there's this big debate about whether we should have primitives or just use curl. I'm not exactly sure what the right call is, but there has been discussions about this. is async and apparently very good at tracking the point. to LLM, although I believe it's going to move to LLM itself sometime soon. anyone trained or fine-tuned or prompted a model with their org data yet and applied it to interesting use cases like planning, scheduling, et cetera, and maybe care to comment? I think it is interesting. Like this is what I kind of mentioned at the very end of the talk. There is a lot of stuff there like you could you know if you especially mean an LLM can kind of work as sort of like a secretary kind of person that could help you prioritize Still it's a slightly unclear how what the best way to use it is So I think there's more of a question for the community about like what people have been trying. I see someone has mentioned that they are using it for weekly review. And it's kind of nice to like, maybe you could read your agenda or maybe this for like weekly review. It could like read all the stuff you've done and ask you questions about it. And like, what should happen next? Or like, is this going to cause a problem? Like, I can, I can understand if that could happen? That's like, that's kind of nice. 
And this kind of people have had good success out of using these LLMs to bounce ideas off of are, you know, for, you know, I've seen people say that like they want, they use it for reading and they kind of dialogue with the LM to kind of like do sort of active reading. So you can imagine doing something similar with your tasks where it's sort of you're engaged in dialogue about like planning your tax with some with a alum that could kind of understand what those are and ask you some questions I think it. You know, if it'd be nice. So, the problem is like there's no great way to share all this stuff. I guess if you have something like this, put it on Reddit. If you don't have Reddit, I don't know what to do. I would say put it somewhere. At the very least, I could maybe open up like an LLM discussion session on the LLM package GitHub, But not everyone likes to use GitHub. I don't know. It'd be nice if there's a mailing list or IRC chat for this sort of thing. But there isn't at the moment. of the questions on the pad so far. There was also some discussion or some chatter, I believe, on IRC. I'm not sure. Andrew, are you on IRC right has the chatter. So if there's chatter, then I'm not seeing it. channel. Oh, yes. I mean, I could see the channel, but I missed whatever came before. So if there's anything you want to kind of call out, I can try to answer it here. who are participating in the discussion there who have also joined here on BigBlueButton, Codin Quark and AeonTurn92. So you folks, if Andrew is still available and has time, you're welcome to chat here and ask questions or discuss here as well. and thank you for reading all the questions. great talk and the discussion. there's any questions. If not, I will log off after a few minutes. there was a small chat about local alarms. Because chat dpt is nice, no, but privacy concerns, and it's not free and stuff. Which, so The question is, what is the promise for local models? Misral, which you could run. The LLM package allows you to use, I think there's 3 kind of local things you could use. Like many of these things, there's like many kind of ways to do the same sort of thing. So LLM is supporting OLAMMA and LLAMMA-CPP. And let's see, 1 other. Which 1 is it? And maybe that's it. Maybe the, oh, GPT for all. So each 1 of these kind of has slightly different functionality. For example, I think GPT for all doesn't support embeddings. And I hear that Olama's embeddings are kind of currently broken. But basically they should support everything. And the open source models are, so the local models are reasonably good. Like I don't think you'd use them and be like, what is this horrible nonsense? Like it's, it gives you relatively good results. Like it's not gonna be at the level of like GPT 3.5 or 4, but it's not far away from GPT 3.5, I think. for connecting the actual working servers for Olama? what you could do is you could like for example you could download Olama which is just a way of setting up local models and running local models on your machine. So typically what it does, you like download a program, let's say Olama. Then Olama will have the ability to download models. And so you could choose from just a host of different models. Each 1 of these things has a bunch of different models. So it downloads all these things to your machine. But I would say that the key problem here is that it requires a fairly beefy machine. Why I was asking, because you briefly mentioned that there are some Israeli servers. 
I understand that they run it like a government or stuff like that? No, no, sorry. People want everyone? that sounded like Israeli servers. I know. Although, I'm sure the governments are working on their own LLMs, et cetera. But yeah, basically your choices are spend a, I mean, if you use open AI or something or anything else, you're really not spending any money. Like I've never been able to spend any money on OpenAI. Like unless you're doing something very intensive and really are using it to, you know, if you're using it for your personal use, it's just hard to spend any money. But on the other hand, it's not free. So you can, you know, There's no question about that. The problem is that it has a bad track record on privacy. This is probably the number 1 reason why you might want to use a local AI, a local LLM. Another 1 is like, you may not agree with the decisions. You know, there's a lot of trust and safety stuff that these companies have to do. Like they don't want like the LMs to kind of like give you, like tell you how you can make meth or how you can make a bomb, which they would do. They would totally do it. So, But each time you kind of restrict what is happening with what you can get out of the LM, it gets a little worse. So some people I guess even open source language modules will soon have HR spaces because it's simply a legal issue. probably will be, although I don't know of any offhand, that will are completely uncensored. I know people are interested and are running uncensored models. I don't know how to do it. I think it's a little bit dubious, but some people do want to do it. There's another reason for using local servers. Do you have any recommendation for models to run locally and also comments on whether a GPU is required? Usually a GPU, well, you can run it without a GPU, but it does run much better. Like for example, I think when I used, Lama is sort of like a standard. This was the model for that Facebook came out with for local use. And It was, yeah, it's good. It's, but it's now it's I think, Mistral is kind of like has a better performance, But there's also different model sizes. There's 7B, like the Lama 7B is OK. The Mistral 7B, 7 billion, are like, basically it'll take like, you can run it with like 16 gigs of RAM, is pretty good. It's probably about as equal to the LLAMA13B. Those are the number of parameters, if I remember correctly. And then there's a 7B, which I've never been able to run. And even if the 7B, if you run it without a GPU, it takes quite a while to answer. I think I've had experiences where it took literally like several, like 5 minutes before it even started responding, but you do eventually get something. And it could be that like things have gotten better since the last time I tried this, because things are moving fast. But it is super recommended to have a GPU. This is the problem. It's kind of like, yes, free software is great. But if free software is requiring that you have these kind of beefy servers and have all this hardware, that's not great. I think there's a case to be made. Yeah, yeah, that's right. it would be nice if FSL for all things could run something for open source model. And not free, but the key point is that it's Libre? I'll have to look it up, but I haven't explored this yet. But Google's server, which LLM does support, supports arbitrary models. So you can run LLMA or things like that. The problem is that even if you're running Mistral, which has no restrictions. 
So this is the kind of thing that like the Free Software Foundation cares a lot about. Like you want it to be like no restrictions, legal restrictions on you as you run the model. So even if it's running Mistral, just by using the server, the company server, it will impose some restrictions on you probably, right? There's gonna be some license that you have to, or something you have to abide by. So I think, yes, it depends on how much you care about it, I guess. I should find out more about that and make sure that it's a good point that I should, you know, people should be able to run free models over the server. So I should make sure we support that in the LLM package. So, is there any other questions Or is otherwise we can end the session. Yeah, all right. Thank you. Thank you. Thank you everyone who listened. I'm super happy like I, the interest is great. I think there's great stuff to be done here and I'm kind of super excited what we're going to do in the next year, so hopefully, like next year, and the conference we have something even more exciting to say about LLM and how they can be used with Emacs. So thank you.

Questions or comments? Please e-mail ahyatt@gmail.com