00:00.000 Introduction
00:48.180 Overview of talk
01:08.060 Why private AI?
03:16.020 What do I need for private AI?
05:16.348 Emacs and private AI
06:13.220 Pieces for an AI Emacs solution
07:56.340 Config file
08:52.100 Demo: Who was David Bowie?
10:21.700 Hallucinations
10:42.180 Next question: What are sea monkeys?
11:57.180 Writing Hello World in Emacs Lisp
12:32.580 Pieces for a better solution
13:36.900 What about the license?
14:56.580 Are there open source data model options?
15:14.520 Things to know
20:07.420 Q: Why is the David Bowie question a good one for testing a model? e.g. does it fail in interesting ways?
21:30.740 Q: What specific tasks do you use local AI for?
22:16.880 Q: Have you used any small domain-specific LLMs? What are the kinds of tasks they specialize in, and how do I find and use them?
22:46.540 Q: Are the various models updated regularly? Can you add your own data to pre-built models?
23:48.056 Q: What is your experience with RAG? Are you using them and how have they helped?
24:38.834 Q: Thoughts on running things on AWS/digital ocean instances, etc?
25:31.078 Q: What has your experience been using AI for cyber security applications? What do you usually use it for?
26:59.660 Q: Is there a disparity where you go to paid models because they are better and what problems would those be?
28:14.126 Q: What's the largest (in parameter size) local model you've been able to successfully run locally, and do you run into issues with limited context window size?
29:52.380 Q: Are there "Free" as in FSF/open source issues with the data?
31:09.557 Q: Given that large AI companies are openly stealing IP and copyright, thereby eroding the authority of such law (and eroding truth itself as well), can you see a future where IP & copyright law become untenable and what sort of onwards effect might that have?
32:18.060 Comment: File size is not going to be the bottleneck, your RAM is.
34:46.900 Q: Have you used local models capable of tool-calling?
35:44.860 Q: Will the models reach out to the web if they need to for more info?
36:31.300 Q: What scares you most about agentic tools? How would you think about putting a sandbox around it if you adopt an agentic workflow?
37:36.578 Q: Tool calling can be read-only, such as giving models the ability to search the web before answering your question. (No write access or execute access) I'm interested to know if local models are any good at calling tools, though.
38:41.660 Wrapping up
When experimenting with using AI with Emacs, many users have concerns. A few of the concerns that people have are the possibility of their information being shared with the AI provider (either to train newer models, or as a potential revenue source), the possibility of running up unpredictable costs with their cloud provider, and the potential environmental impact of using cloud AI. Using private/local AI models provides an AI environment that the user can fully control. Users can add to it incrementally over time as their skills and experience grow. This talk will be a quick intro to using Ollama Buddy, Ellama, and gptel to add the ability to have a private AI integrated into your Emacs session. We'll start with the basics and show people how they can add AI to their workflow safely and securely. Hopefully, people will come away from the talk feeling better about our AI futures.
The talk will start with a simple implementation: Ollama and Ollama Buddy and a couple of models. After that, it will build on that for the rest of the 20 minutes.
The goal is to show users multiple ways of using AI with Emacs and let them make their own choices.
About the speaker:
AI is everywhere and everyone is trying to figure out how to use it better. This talk will be a quick introduction to some of the tools and techniques a user can employ to integrate AI privately and securely into their Emacs workflow. The goal is to help people take the first steps on what will hopefully be a productive journey.
Discussion / notes
Q: Why is the David Bowie question a good one for testing a model?
e.g. does it fail in interesting ways?
A: Big fan, firstly; also Deepseek will tend to have errors, and I'm familiar with the data, so it's easy to spot hallucinations
Q: What specific tasks do you use local AI for?
A: Refactoring, for example converting Python 2 to Python 3; cybersecurity research
Q: Have you used any small domain-specific LLMs? What are the kinds
of tasks they specialize in, and how do I find and use them?
A: On the to-do list, but not something I have used very much yet
Q: Are the various models updated regularly? Can you add your own
data to pre-built models? +1
A: Fine-tuning needs a serious GPU and a lot of compute; RAG is the easier way to add your own data, e.g. via Open WebUI or LM Studio. Vector databases (e.g. PostgreSQL's pgvector) are another option.
Q-piggy-back: Will the models reach out to the web if they need to
for more info?
A: Haven't seen one; models change so much that it's hard to update them in place
Q: What is your experience with RAG? Are you using them and how have they helped?
A: Easy and fast; it loads your data first, hits it first, and will actually cite it. Start with that and iterate on top of it.
Q: Thoughts on running things on AWS/digital ocean instances, etc?
A: Prefer not to have the data leave home; AWS and DO work okay, Oracle has some free offerings, but tend to work locally most often
Q: What has your experience been using AI for cyber security
applications? What do you usually use it for?
A: Dumped logs into a model to have it do correlation; a long-term goal is doing intelligent queries over all his logs locally (Microsoft's Security Copilot tries to do this in the cloud)
Q: Is there a disparity where you go to paid models because they are better, and what problems would those be?
A: Paid models are good, but probably not economically sustainable at current subsidized prices. A decent local model like Llama 3.2 with your own data loaded is pretty competitive; you don't get all the benefits, but you have more control. It's a balancing act.
Q: What's the largest (in parameter size) local model you've been
able to successfully run locally, and do you run into issues with
limited context window size? The top tier paid models are up to
200k now.
A: Default context size is about 1024; upped to 8192 on the Pangolin. Mostly sticks with smaller, more quantized models; planning a 96 GB RAM box to be able to load a 70-billion-parameter model (slowly, for bragging rights).
Q: Are there "Free" as in FSF/open source issues with the data?
A: Yes. Where the data is coming from is a huge issue with AI
and will be an issue long term.
Q: Have you used local models capable of tool-calling?
A: Scared of agentic workflows and will be a slow adopter; has the model give him the commands, but still runs them by hand.
Q: What scares you most about agentic tools? How would you think
about putting a sandbox around it if you adopt an agentic workflow?
A: Air-gap; based on experiece in the defense industry
Q: Tool calling can be read-only, such as giving models the ability to search the web before answering your question. (No write access or execute access.) I'm interested to know if local models are any good at calling tools, though.
A: Yes, local models can do a lot of that; LM Studio, or Open WebUI with Ollama, are very capable, and a lot of companies now use Open WebUI to put their curated data behind it.
Q: Really interesting stuff, thank you for your talk. Given that large AI companies are openly stealing IP and copyright, thereby eroding the authority of such law (and eroding truth itself as well), can you see a future where IP & copyright law become untenable, and what sort of onwards effect might that have? Apologies if this is outside of the scope of your talk.
A: Not a lawyer, but it is getting really complicated; heavy lobbying on both sides, and a very pro-AI administration. It will be interesting to see what happens to copyright in the next 5-10 years.
File size is not going to be the bottleneck, your RAM is. You're
going to need 16 GB of RAM to run the smallest local models and
~512 GB RAM to run the largest ones. You'll need a GPU with this
much memory (VRAM) if you want it to run fast.
A: Memory layout matters too: unified memory (shared between CPU and GPU) lets you load bigger models at some bandwidth cost. Alex Ziskind's and Network Chuck's videos are good resources. You can even run models on a Raspberry Pi 5; they'll just run slow.
Great talk/info. Thanks.
it went very well!
(from the audience perspective)
respect his commitment to privacy
Very interesting talk! Thanks!
AI, you are on notice: we want SBOMs, not f-bombs!
Hey, everybody. Welcome from frigid Omaha, Nebraska. I'm just going to kick off my talk here, and we'll see how it all goes. Thanks for attending. So the slides will be available on my site, https://grothe.us, in the presentation section tonight or tomorrow. This is a quick intro to one way to do private AI in Emacs. There are a lot of other ways to do it. This one is really just more or less the easiest way to do it. It's a minimal viable product to get you an idea of how to get started with it and how to give it a spin. I really hope some of you give it a shot and learn something along the way.
So the overview of the talk broke down into these basic bullet points: why private AI, what do I need to do private AI, Emacs and private AI, pieces for an AI Emacs solution, a demo of a minimal viable product, and the summary.
Why private AI? This is pretty simple. Just read the terms and conditions for any AI system you're currently using. If you're using the free tiers, your queries, code, and uploaded information are being used to train the models. In some cases, you are giving the company a perpetual license to your data. You have no control over this, except for not using the engine. And keep in mind, the terms are changing all the time, and they're not normally changing for our benefit. So that's not necessarily a good thing. If you're using the paid tiers, you may be able to opt out of the data collection. But keep in mind, this can change, or they may start charging for that option. Every AI company wants more and more data. They need more and more data to train their models. It is just the way it is. They need more and more information to get it more and more accurate and to keep it up to date. There's been a story about Stack Overflow: it has like half the number of queries it had a year ago because people are using AI. The problem with that is now there's less data going to Stack Overflow for the AI to get. Vicious cycle, especially when you start looking at newer languages like Ruby and stuff like that. So it comes down to being an interesting time. Another reason to go private AI is your costs are going to vary. Right now, these services are being heavily subsidized. If you're paying Claude $20 a month, it is not costing Claude, those guys, $20 a month to host all the infrastructure, to build all these data centers. They are subsidizing that at very much a loss right now. When they start charging the real costs plus a profit, it's going to change. Right now, I use a bunch of different services. I've played with Grok and a bunch of other ones. But Grok right now is like $30 a month for a regular Super Grok. When they start charging the real cost of that, it's going to go from $30 to something a great deal more, perhaps, I think, $100 or $200 or whatever it really turns out to be when you figure everything into it. When you start adding that cost in, a lot of people using public AI right now are going to have no option but to move to private AI or give up on AI overall.
What do you need to be able to do private AI? If you're going to run your own AI, you're going to need a system with either some cores, a graphics processing unit, or a neural processing unit: a GPU or an NPU. I currently have four systems I'm experimenting with and playing around with on a daily basis. I have a System76 Pangolin with an AMD Ryzen 7 7840U and a Radeon 780M integrated graphics card. It's got 32 gigs of RAM. It's a beautiful piece of hardware. I really do like it. I have my main workstation, an HP Z620 with dual Intel Xeons and four NVIDIA K2200 graphics cards in it. Why the four NVIDIA K2200 graphics cards? Because I could buy four of them on eBay for $100 and they were still supported by the NVIDIA drivers for Debian. So that's why that is. A MacBook Air with an M1 processor, a very nice piece of kit I picked up a couple years ago, very cheap, but it runs AI surprisingly well. And an Acer Aspire with an AMD Ryzen 5700U in it. This was my old laptop. It was a sturdy beast. It was able to do enough AI to do demos and stuff, and I liked it quite a bit for that. I'm using the Pangolin for this demonstration because it's just better. Apple's M4 chip has 38 TOPS of NPU performance. The Microsoft Copilot PCs are now requiring 45 TOPS of NPU to be able to have the Copilot badge on them. And Raspberry Pi's new AI HAT is about 18 TOPS and is $70 on top of the cost of a Raspberry Pi 5. Keep in mind, Raspberry Pi recently raised the cost of their Pi 5s because of RAM pricing, which is going to affect a lot of these types of solutions in the near future. But there's going to be a lot of local power available in the future. That's what it really comes down to. A lot of people are going to have PCs on their desks that can run a decent private AI without much issue.
So for Emacs and private AI, there are a couple of popular solutions. Gptel, which is the one we're going to talk about: it's a simple, minimal interface. It integrates easily into your workflow. It's just, quite honestly, chef's kiss, a beautifully well-done piece of software. Ollama Buddy has more features, a menu interface, and quick access for things like code refactoring, text reformatting, et cetera. This is one you spend a little more time with, but you also get a little bit more back from it. Ellama is another one; it has some really good features and a different set of capabilities to it. Aidermacs is programming with your AI in Emacs. The closest thing I can come up with to compare it to is Cursor, except it's in Emacs. It's really quite well done. These are all really quite well done. There are a bunch of other projects out there. If you go out to GitHub and type "Emacs AI", you'll find a lot of different options.
So what is a minimal viable product that can be done? A minimal viable product to show what an AI Emacs solution is can be done with only two pieces of software. Llamafile: this is an amazing piece of software. This is a whole LLM contained in one file. And the same file runs on Mac OS X, Linux, Windows, and the BSDs. It's a wonderful piece of kit, based on this thing called Cosmopolitan that lets you create an executable that runs on a bunch of different systems. And gptel, which is an easy plug-in for Emacs, which we talked about in the last slide a bit. So setting up the LLM: you just go out and hit a page for it and do a wget of it. That's all it takes there. Chmod it so you can actually execute the executable, and then just go ahead and run it. And let's go ahead and do that. I've already downloaded it because I don't want to wait. And let's just take a look at it. I've actually downloaded several of them, but let's go ahead and just run Llama 3.2 with the 3 billion parameters. And that's it firing up. And it is nice enough to actually be listening on port 8080, which we'll need in a minute. So once you do that, you have to install gptel in Emacs. That's as simple as firing up Emacs, doing M-x package-install, and then just typing gptel, if you have your repositories set up right, which hopefully you do. And then you just go ahead and have it.
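For reference, here is a minimal sketch of those three setup steps (wget, chmod, run) driven from inside Emacs instead of a shell. The llamafile path and URL are illustrative placeholders, not the exact ones from the demo, and newer llamafiles may take different flags, so check --help:

    ;; Download a llamafile once, mark it executable, and launch it.
    ;; It serves an OpenAI-compatible API on port 8080 by default.
    (let ((llamafile (expand-file-name "~/llamafiles/Llama-3.2-3B.llamafile")))
      (unless (file-exists-p llamafile)
        (url-copy-file "https://example.org/path/to/Llama-3.2-3B.llamafile" ; placeholder URL
                       llamafile))
      (set-file-modes llamafile #o755)              ; the chmod +x step
      (start-process "llamafile" "*llamafile*" llamafile))

The *llamafile* buffer collects the server's log output, so you can watch it come up before pointing gptel at it.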
You also have to set up a config file. Here's my example config file as it's currently set up: requiring, ensuring gptel is loaded, and defining the Llamafile backend. You can put multiple backends into it, but I just have the one defined in this example. But it's pretty straightforward. Llamafile local, a name for it, stream, protocol HTTP. If you have HTTPS set up, that's obviously preferable, but a lot of people don't for their home labs. Host is just 127.0.0.1, port 8080. Keep in mind, some of the AIs run on a different port, so you may be on 8081 if you're running Open WebUI at the same time. The key: we don't need an API key because it's a local server. And the models: we can put multiple models on there if we want to. So if we create one with additional stuff, like RAG and stuff like that, we can actually name those models by their domain, which is really kind of cool. But that's all it takes.
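Here is a minimal sketch of a config like the one described above. Llamafile exposes an OpenAI-compatible API, so gptel's OpenAI backend works against it; the backend label and model name are arbitrary placeholders, and depending on your gptel version, model names may need to be strings rather than symbols:

    ;; Point gptel at the local Llamafile server; no API key needed.
    (require 'gptel)
    (setq gptel-backend
          (gptel-make-openai "Llamafile"   ; display name, anything works
            :host "127.0.0.1:8080"         ; use 8081 if Open WebUI has 8080
            :protocol "http"               ; local-only, so plain HTTP
            :stream t                      ; stream tokens as they arrive
            :key nil                       ; local server, no key
            :models '(llama-3.2-3b))       ; label is arbitrary for Llamafile
          gptel-model 'llama-3.2-3b)

Adding a second backend is just another gptel-make-openai (or gptel-make-ollama) call, which is how you would name models by their domain, as mentioned above.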
So let's go ahead and do a quick test of it. Oops. M-x gptel. And we're going to just choose the default buffer to make things easier. Going to resize it up a bit. And usually the go-to question I ask is: who was David Bowie? This one has actually turned out to be a really good question for figuring out whether or not an AI is complete. This is one that some engines do well on and other ones don't. And we can either do M-x gptel-send, or we can just do C-c and hit Enter. We'll just do C-c and Enter. And now it's going ahead and hitting our local AI system running on port 8080. And that looks pretty good, but let's go ahead and say, hey, it's set to terse mode right now: please expand upon this. And there we go. We're getting a full description of the majority of David Bowie's life and other information about him. So very, very happy with that.
One thing to keep in mind when you're looking for hallucinations, for how accurate the AI is and how it's compressed, is that it will tend to screw up on things like how many children he had and stuff like that. Let me see if it gets to that real quick. It's not actually in this one. All right, so that's the first question I always ask.
The next one is: what are sea monkeys? It gives you an idea of the breadth of the system. It's querying right now. Pulls it back correctly. Yes. And it's smart enough to actually detect that David Bowie even referenced sea monkeys in the song "Sea of Love", which came out as a hit single. So it's actually keeping the context alive, which is a very cool feature. I did not see that coming. Here's one that some people say is a really good one to ask: how many Rs are in "strawberry"? All right, now she's going off the reservation. She's going in a different direction. Let me go ahead and reopen that again, because it went down a bad hole there for a second.
Let me ask it to write hello world in Emacs Lisp. Yep, that works. So the point being here, that was like two minutes of setup, and now we have a small AI embedded inside the system. So that gives you an idea of just how easy it can be. And it's just running locally on the system. We also have the default system here as well. So not that bad.
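(For reference, the canonical one-liner the model should hand back for that prompt is something like:

    ;; "Hello, World!" in Emacs Lisp
    (message "Hello, World!")

which makes it an easy answer to verify at a glance.)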
That's a basic solution, a basic setup that will get you to the point where you can go, like... it's a party trick, but it's a very cool party trick. The way that gptel works is it puts things into buffers. It doesn't interfere with your flow that much; it's just an additional window you can pop open to ask questions and get information from, or dump code into and have it refactored, as in the sketch below. Gptel has a lot of additional options for things that are really cool for that. But if you want a better solution, I recommend Ollama or LM Studio. They're both more capable than Llamafile. They can accept a lot of different models. You can do things like RAG. You can do loading of things onto the GPU more explicitly, which can speed stuff up. One of the things about retrieval augmentation is it will let you put your data into the system, so you can start uploading your code and your information and actually be able to do analysis of it. Open WebUI provides more capabilities. It provides an interface that's similar to what you're used to seeing for ChatGPT and the other systems. It's really quite well done. And once again, gptel, I have to mention that, because that's the one I really kind of like. And Ollama Buddy is also another really nice one.
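As one example of that dump-code-in-a-buffer workflow, here is a hedged sketch that drives it from Lisp using gptel-request, gptel's programmatic entry point; the command name and prompt wording are my own, not from the talk:

    ;; Send the active region to the configured backend and show the
    ;; model's rewrite in a separate buffer for review before using it.
    (defun my/gptel-refactor-region (beg end)
      "Ask the current gptel backend to refactor the region BEG..END."
      (interactive "r")
      (gptel-request
          (concat "Refactor this code without changing its behavior:\n\n"
                  (buffer-substring-no-properties beg end))
        :callback (lambda (response _info)
                    (if (not response)
                        (message "gptel: request failed")
                      (with-current-buffer (get-buffer-create "*gptel-refactor*")
                        (erase-buffer)
                        (insert response)
                        (display-buffer (current-buffer)))))))

Since the backend above points at 127.0.0.1, nothing in this round trip leaves the machine.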
So what about the licensing of these models, since I'm going out pulling down a model and doing this stuff? Let's take a look at a couple of highlights from the Meta Llama 3 community license. If your service exceeds 700 million monthly users, you need additional licensing. Probably not going to be a problem for most of us. There's a competition restriction: you can't use this model to enhance competing models. And there are some limitations on using the Meta trademarks. Not that big a deal. Otherwise, it's a permissive license designed to encourage innovation and open development. Commercial use is allowed, but there are some restrictions on it. Yeah, you can modify the model, but you have to retain the license terms. And you can distribute the model with derivatives. And there are some very cool ones out there. There are people who've done things to try and make Llama be less, what's the phrase, ethical, if you're doing penetration testing research and stuff like that. It has some very nice value there. Keep in mind licenses also vary depending on the model you're using. Mistral AI has the non-production license. It's designed to keep it to research and development; you can't use it commercially. So it's designed to clearly delineate between research and development and somebody trying to actually build something on top of it.
And another question I get asked is: are there open source data model options? Yeah, but most of them are small or specialized currently. MoMo is a whole family of them, but they tend to be more specialized. It's very cool to see where it's going, and it's another thing that's just going forward. It's under the MIT license.
Some things to know to help you have a better experience with this. Get Ollama and Open WebUI working by themselves, then set up your config file. I was fighting both at the same time, and it turned out I had a problem with my Ollama; I had a conflict, so that was what my problem was. Llamafile and gptel are a great way to start experimenting, just to get an idea of how it works and figure out how the interfaces work. Tremendous. RAG, loading documents into it, is really easy with Open WebUI. You can create models, you can put things like help desk and developers and stuff like that, breaking it out. Hacker Noon has a "How to build a $300 AI computer" article. It's from March 2024, but it still has a lot of great information on how to benchmark the environments and what some values are, like for the Ryzen 5700U inside my Acer Aspire; that's where I got the idea of doing that. Make sure you do the ROCm stuff correctly to get the GPU extensions. It's just really good stuff. You don't need a great GPU or CPU to get started. Smaller models like TinyLlama can run on very small systems. That gets you the ability to start playing with it, start experimenting, figure out if it's for you, and move forward with it. The AMD Ryzen AI Max+ 395 mini PCs make really nice dedicated hosts. You used to be able to buy these for about $1,200. Now with the RAM price increase, if you want to get 128 gig, you're pushing two grand, so it gets a little tighter. Macs work remarkably well with AI. My MacBook Air was one of my go-tos for a while, but once I started doing anything AI, I had a five-minute window before the thermal throttling became an issue. Keep in mind that's a MacBook Air, so it doesn't have the greatest ventilation. If you get the MacBook Pros and stuff, they tend to have more ventilation, but you're still going to be pushing against that. So Mac Minis and the Mac Ultras and stuff like that tend to work really well for that. Alex Ziskind on YouTube has a channel. He does a lot of AI performance benchmarking, like "I load a 70 billion parameter model on this mini PC" and stuff like that. There's a lot of fun and interesting stuff there, and it's influencing my decision on buying my next AI-style PC. Small domain-specific LLMs are happening. An LLM that has all your code and information sounds like a really cool idea. It gives you capabilities to start training stuff that you couldn't do with the big ones. Even in terms of fine-tuning and stuff, it's remarkable to see where that space is coming along in the next year or so. HuggingFace.co has pointers to tons of AI models. You'll find the one that works for you there, hopefully. If you're doing cybersecurity, there's a whole bunch out there for that, with certain training and information on it. It's really good. One last thing to keep in mind is that hallucinations are real. You will get BS back from the AI occasionally, so do validate everything you get from it. Don't be using it for court cases like some people have and run into those problems. So, that is my talk. What I would like you to get out of it is: if you haven't tried it, give gptel and Llamafile a shot. Fire up a little small AI instance, play around with it a little bit inside your Emacs, and see if it makes your life better. Hopefully it will. And I really hope you guys learned something from this talk. Thanks for listening. And the links are at the end of the talk, if you have any questions. Let me see if we've got anything, Pat.
You do. You've got a few questions. What an awesome talk this was, actually. If you don't have a camera, I can get away with not having one too. Yeah, so there are a few questions, but first let me say thank you for a really captivating talk. I think a lot of people will be empowered by this to try to do more with less, especially locally: people environmentally concerned about the footprint of LLMs inside data centers, just thinking about how we can put the infrastructure we have at home to use and get more done with less. Because there was a study a while ago; someone said every time you do a Gemini query, it's like boiling a cup of water. I don't know how much direction you want. I'd be very happy to read out the questions for you. I'm having trouble getting to that tab, so you can follow along if you'd like.
[00:20:07.420] Q: Why is the David Bowie question a good one for testing a model? e.g. does it fail in interesting ways?
Why is the David Bowie question a good one to start with? Does it have interesting failure conditions, or what made you choose that? First off, huge fan of David Bowie. But it really taught me a few things about how the models work, in terms of things like how many kids he had, because Deepseek, which is a very popular Chinese model that a lot of people are using now, misidentifies him as having three daughters, and he has, I think, two sons and a daughter or something like that. So there are differences on that, and there's a whole lot of stuff, because his story spans like 60 years, so it gives good feedback. That's the real main reason I ask that question, because I just needed one... Sea monkeys I just picked because it was obscure. I used to have it write hello world in Forth because I thought that was an interesting one as well. It's just picking random ones like that. One question I ask a lot of models is: what is the closest star to the Earth? Because most of them will say Alpha Centauri or Proxima Centauri and not the sun. And I have a whole 'nother talk where I just argue with the LLM, trying to say, hey, the sun is a star, and it just wouldn't accept it. What? Oh, I can... You're there.
[00:21:30.740] Q: What specific tasks do you use local AI for?
I like to load a lot of my code into it and actually have it do analysis of it. I was actually going through some code I have for some pen testing, and I was having it modified to update it for the newer version, because I hate to say this, but it was written for Python 2, and I needed to update it for Python 3. And the 2to3 tool did not do all of it, but this actual tool was able to do the refactoring. It's part of my laziness. But I use that for anything I don't want to hit the web, and that's a lot of stuff when you start thinking about doing cybersecurity research, where you have your white papers and stuff like that in there. I've got a lot of that loaded into RAG in one model on my Open WebUI system.
[00:22:16.880] Q: Have you used any small domain-specific LLMs? What are the kinds of tasks they specialize in, and how do I find and use them?
Have you used any small domain-specific LLMs? If so, what kinds of tasks do they specialize in, and how do you find and use them? There are some for cybersecurity and stuff like that that I really need to dig into; it's on my to-do list. I've got a couple weeks off at the end of the year, and that's a big part of my plan for that.
[00:22:46.540] Q: Are the various models updated regularly? Can you add your own data to pre-built models?
Can you add your own data to the pre-built models? You can add data to a model in a couple of different ways. You can do something called fine-tuning, which requires a really nice GPU and a lot of CPU time. You're probably not going to do that. You can do retrieval-augmented generation, which is where you load your data on top of the system, put it inside a database, and it can actually scan that and stuff. I have another talk where I load the talk into the engine and start asking questions against it. If I would have had time, I would have done that, but it comes down to how many... That's RAG. RAG is pretty easy to do through Open WebUI or LM Studio. It's a great way: you just, like, point it to a folder and it just sucks all that state in, and it'll hit that data first. You can have, like, helpdesk and stuff and... The other options: there are vector databases. Like, if you use PostgreSQL, it has pgvector, which can do a lot of that stuff. I've not dug into that yet, but that is also on that to-do list. I've got a lot of stuff planned for...
[00:23:48.056] Q: What is your experience with RAG? Are you using them and how have they helped?
I don't even know what that means. Do you know what that means? Do you remember this question again? What is your experience with RAGs? That loads your data first, so it hits yours, and it'll actually cite it and stuff. There's a guy who wrote a RAG in 100 lines of Python, and it's an impressive piece of software. I think if you hit one of my sites, I've got a private AI talk where I actually refer to that. But retrieval augmentation: it's easy, it's fast, it puts your data into the system. Yeah, start with that and then iterate on top of that. That's one of the great things about AI, especially private AI: you can do whatever you want with it and build it up as you get more experience.
[00:24:38.834] Q: Thoughts on running things on AWS/digital ocean instances, etc?
Thoughts on running things on AWS, DigitalOcean, and so on? DigitalOcean, they have some of their GPUs. I still don't like having the data leave my house, to be honest, or at work, because I tend to do some stuff where I don't want it even hitting that situation. But they have pretty good stuff. Another one to consider is Oracle Cloud. Oracle has their AI infrastructure, and it's really well done. But, I mean, once again, even when they say your data is private, I don't necessarily trust it. But they do have good stuff: DigitalOcean, AWS, and Oracle Cloud, which has a free tier that isn't too bad, usually with a certain amount of resources. And Google also has it. But I still tend to keep more stuff on local PCs, because I'm just paranoid that way.
[00:25:31.078] Q: What has your experience been using AI for cyber security applications? What do you usually use it for?
Do you want to get into that, using AI for cybersecurity? You might have already touched on this. Yeah, really, for cybersecurity, what I've had to do is I've dumped logs to have it do correlation. Keep in mind, the size of that Llamafile we were using for figuring out David Bowie, writing the hello world, all that stuff, is like six gig. How does it get the entire world in six gig? I still haven't figured that out in terms of quantization. So I'm really interested in seeing the ability to take all this stuff out of all my logs, dump it all in there, and actually be able to do intelligent queries against that. Microsoft has a project called Security Copilot, which is trying to do that in the cloud. But I want to work on something to do that more locally and be able to actually drive this stuff over that. That's also one of the long-term goals. Those are the questions that I see. I want to just read out a couple of comments that I saw in IRC, though. jrootabaga says it went very well, from an audience perspective. And GGundam says, respect your commitment to privacy. And then somebody is telling us we might have skipped a question, so I'm just going to run back to my list. The "are the models updated regularly" one: I just didn't type in the answer there. And there's a couple more questions coming in, so...
[00:26:59.660] Q: Is there a disparity where you go to paid models because they are better and what problems would those be?
Is there a disparity where you go to paid models because they are better, and what problems, you know, would drive you to them? That's a good question. Paid models, I don't mind them. I think they're good, but I don't think they're actually economically sustainable under their current system. Because right now, if you're paying 20 bucks a month for Copilot and that goes up to 200 bucks, I'm not going to be as likely to use it. You know what I mean? But it does do some things in a way that I did not expect. For example, Grok was refactoring some of my code and dropped an F-bomb in the comments, which I did not see coming; but the other code before that, which I had gotten off GitHub, had F-bombs in it. So it was just emulating the style. But would that be something I'd want to turn in as a pull request? I don't know. There's a lot of money going into these AIs and stuff, but in terms of the ability to get a decent one, like Llama 3.2, and load your data into it, you can be pretty competitive. You're not going to get all the benefits, but you have more control over it. So it's a balancing act.
[00:28:14.126] Q: What's the largest (in parameter size) local model you've been able to successfully run locally, and do you run into issues with limited context window size?
What is the largest (in parameter size) local model that you've been able to successfully run locally, and do you run into issues with limited context window size? The top paid models will tend to have a larger ceiling. By default, the context size is, I think, 1024. But I've upped it to 8192 on this box, the Pangolin, because for some reason it's just working quite well. But the largest ones I've loaded have not been that huge. That's the reason why I'm planning on breaking down and buying a Ryzen. Actually, I'm going to buy an Intel Core Ultra 285H with 96 gig of RAM. Then I should be able to load a 70-billion-parameter model in that. How fast will it run? It's going to run slow as a dog, but it's going to be cool to be able to do it. It's an AI bragging rights thing, but I mostly stick with the smaller-size models and the ones that are more quantized, because they just tend to work better for me. I'm just anticipating that we're going to be going strong at the 10-minute mark, so I'm just letting you know: we can go as long as we like here. At a certain point, I may have to jump away and check in with the next speaker, but we'll post the entirety of this, even if we aren't able to stay with it all. Okay. And we've got 10 minutes where we're still going to stay live.
[00:29:52.380] Q: Are there "Free" as in FSF/open source issues with the data?
So, next question coming in, I see: are there free as in freedom, free as in FSF, issues with the data? Yes; where the data is coming from is a huge question with AI. It's astonishing that you can ask questions of models where you don't know where the data is coming from. That is gonna be one of the big issues long-term. There are people who are working on trying to figure out that stuff, but, I mean, if you look at... God, I can't remember who it was. Somebody was actually out torrenting books just to be able to build them into their AI system. I think it might've been Meta. So there's a lot of that going on. The open source side of this stuff is going to be tough. There are some models, like the mobile guys have got their own license, but where they're getting their data from, I'm not sure, so that's a huge question. That's a talk in itself. But yeah, if you train on your RAG and your data, you know where it's come from, you have a license that... but the other stuff is just more lines of supplement if you're using a smaller model. I'll read them out in order here. Really interesting stuff. Thank you for your talk.
[00:31:09.557] Q: Given that large AI companies are openly stealing IP and copyright, thereby eroding the authority of such law (and eroding truth itself as well), can you see a future where IP & copyright law become untenable and what sort of onwards effect might that have?
Given that large AI companies are openly stealing intellectual property and copyright, and thereby eroding the authority of such laws, and maybe obscuring the truth itself, can you see a future where IP and copyright law become untenable? I think that's a great question. I'm not a lawyer, but it is really getting complicated. It is getting to the point... I played with Sora a little bit, and it generated someone where you go, like, oh, that's Jon Hamm, that's Christopher Walken; you start figuring out who the people are that they're modeling stuff after. There is an apocalypse of some kind going to happen right now. But this is, once again, my personal opinion, and I'm not a lawyer, and I do not have money, so don't sue me. The current administration is very pro-AI, and there's a great deal of lobbying by those groups, and it's on both sides. It's going to be interesting to see what happens to copyright in the next 5-10 years. I just don't know how it keeps up without there being some adjustments and stuff.
[00:32:18.060] Comment: File size is not going to be the bottleneck, your RAM is.
File size is not going to be the bottleneck; RAM is. You'll need 16 gigabytes of RAM to run the smallest local models and 512 gigabytes of RAM to run the largest ones. You'll need a GPU with that much memory if you want it to run quickly. It also depends upon how your memory is laid out. Like the example being the Core Ultra 285H I plan to buy, which has 96 gig of memory: it's unified, the GPU and the CPU share it, but they go over the same bus. So the overall bandwidth tends to be a bit less, but you're able to load more of the model into memory, so it's able to do some additional stuff with it as opposed to coming off disk. It's all a balancing act. If you hit Ziskind's website, that guy's done some great work on figuring out how big a model you can run and what you can do with it. And some of this stuff is not obvious, because, like the example of that MacBook Air: for the five minutes I can run the model, it runs faster than a lot of other things that should be able to run it faster, just because of the way the ARM cores and the unified memory work on it. So it's a learning process. Network Chuck had a great video talking about building his own system with a couple of really powerful Nvidia cards and stuff like that in it, and actually setting his system up as a node and using a web UI on it. So there's a lot of stuff there, but it is a process of learning how big your data is, which models you want to use, how much information you need; it's part of the learning. And you can run models even on Raspberry Pi 5s if you want to. They'll run slow, don't get me wrong, but they're possible. So I'll just vamp for another second. We've got about five minutes before we'll be cutting over, but I just want to say, in case we get close for time here, how much I appreciate your talk. This is another one that I'm going to have to study after the conference. And thank you guys for putting on the conference. It's a great conference. It's well done. ...with the brains of the project, which is you.
[00:34:46.900] Q: Have you used local models capable of tool-calling?
Have you used local models capable of tool calling? I'm scared of agentic. I'm going to be a slow adopter of that. I want to do it, but I just don't have the, uh, intestinal fortitude right now to do it. I've had it give me the commands, but I still run the commands by hand. I'm looking into it, and once again, it's on that list, but that's a big step for me. Well, let me just scroll through, because we might have missed one question. Oh, I see. Here was the piggyback question. Now I see the question that I missed. So this was piggybacking on the question about model updates and adding data.
[00:35:44.860] Q: Will the models reach out to the web if they need to for more info?
And will models reach out to the web if they need more info? Or have you worked with any models that work that way? There was, like, a group working on something like a package updater that would do diffs on it, but models change so much, even with minor changes and fine-tuning, that it's hard just to update them in place. So I haven't seen one, but that doesn't mean they're not out there. Curious topic, though. Well, it's probably pretty good timing. Let me just scroll and make sure. And of course, before I can say that, there's one more question. So let's go ahead and have that. I want to make sure, while we're still live, though, that I give you a chance to offer any closing thoughts.
[00:36:31.300] Q: What scares you most about agentic tools? How would you think about putting a sandbox around it if you adopt an agentic workflow?
So what scares you most about the agentic tools? How would you think about putting a sandbox around that if you did adopt an agentic workflow? In terms of that, I would just control what it's able to talk to, what machines; I would actually have it be air-gapped. I work for a defense contractor, and we spend a lot of time dealing with air-gapped systems, because that's just kind of the way it works out for us. So agentic, it's just going to take a while to get trust. I want to see more stuff happening. Humans screw up stuff enough; the last thing we need is to multiply that by 1000. So in terms of that, I would be restricting what it can do. If you look at the capabilities: if I created a user and gave it permissions, I would have a lockdown through sudo of what it's able to do, what the account's able to do. I would do those kinds of things. But it's happening; I'm just going to be one of the laggards on that one. So: air gap, jail, extremely locked-down environments, like we're talking about separate physical machines, not Docker. Yeah, hopefully.
[00:37:36.578] Q: Tool calling can be read-only, such as giving models the ability to search the web before answering your question. (No write access or execute access) I'm interested to know if local models are any good at calling tools, though.
Tool calling can be read-only, such as giving models the ability to search the web before answering your question: no write access, no execute access. I'm interested to know if local models are any good at that. Yes, local models can do a lot of that stuff. It's in their capabilities. If you load LM Studio, you can do a lot of wonderful stuff with that, or with Open WebUI with Ollama. There are a lot of capabilities. It's amazing. Open WebUI is actually what a lot of companies are using now to put their data behind, their curated data and stuff like that. So it works well. I can confirm that from my own professional experience. Excellent. If you want to give us, like, a 30-second, 45-second wrap-up... Aaron, let me squeeze in mine: thank you again so much for preparing this talk and for entertaining all of our questions. This is a great one. I've enjoyed a lot of it. I've only had a couple of talks so far, but I'm looking forward to hitting the ones after this and tomorrow.
But the AI stuff is coming. Get on board. I definitely recommend it. If you want to just try it out and get a little taste of it, my minimal viable product with just Llamafile and gptel will get you to the point where you start figuring it out. Gptel is an amazing thing. It just gets out of your way, and it works so well with Emacs's design because it doesn't take your hands off the keyboard. It's just another buffer, and you just put information in there. It's quite a wonderful time, let's put it that way. That's all I got. So I'll stop the recording and you're on your own recognizance. If anybody has any questions or anything, my email address is ajgrothe@yahoo.com or at gmail. And thank you all for attending, and thanks again for the conference. Okay, I'm gonna go ahead and end the room there. Thank you. Excellent, thanks, bye.