Using SQLite as a data source: a framework and an example
Andrew Hyatt (he/him)
In this talk, Andrew Hyatt shows how to use SQLite to store and retrieve information using the triples package. Afterwards, he wil handle questions over BigBlueButton.
The following image shows where the talk is in the schedule for Sat 2022-12-03. Solid lines show talks with Q&A via BigBlueButton. Dashed lines show talks with Q&A via IRC or Etherpad.
Format: 22-min talk followed by live Q&A (done)
Etherpad: https://pad.emacsconf.org/2022-sqlite
Discuss on IRC: #emacsconf-dev
Status: TO_CAPTION_QA
Saturday, Dec 3 2022, ~11:00 AM - 11:25 AM MST (US/Mountain)
Saturday, Dec 3 2022, ~10:00 AM - 10:25 AM PST (US/Pacific)
Saturday, Dec 3 2022, ~6:00 PM - 6:25 PM UTC
Saturday, Dec 3 2022, ~7:00 PM - 7:25 PM CET (Europe/Paris)
Saturday, Dec 3 2022, ~8:00 PM - 8:25 PM EET (Europe/Athens)
Saturday, Dec 3 2022, ~11:30 PM - 11:55 PM IST (Asia/Kolkata)
Sunday, Dec 4 2022, ~2:00 AM - 2:25 AM +08 (Asia/Singapore)
Sunday, Dec 4 2022, ~3:00 AM - 3:25 AM JST (Asia/Tokyo)
Talk
00:00.000 Introduction 00:56.220 Why SQLite 02:29.870 The triples package 04:50.209 Exercise: Emacs bookmarks 07:40.740 Creating bookmarks 09:31.366 Retrieving bookmarks 11:10.034 Backlinks 13:59.380 Extensible entities 15:57.008 EKG package 18:12.772 The code 19:43.780 Renaming tags 20:52.147 Conclusion
Q&A
00:29.040 Is this built into Emacs? Multiple schemas, multiple databases? 01:22.756 What about collaborative editing with this? 02:22.913 What about using this on multiple computers? How do you synchronize your data? 03:16.640 Are you planning to further develop EKG? 04:45.303 Is it then possible to combine the triples DB with some custom tables in the same SQLite file? 06:01.734 What are your thoughts on adding a timestamp attribute to triples so that the database becomes append-only and by default you return the latest fact for a subject-object pair? 08:37.040 With EKG what about views like org roam node mind map view? Or org mode virtual view for integration with other org packages? 09:46.220 Can ordinary lisp data types (lists, symbols, etc) be stored in the database? 13:41.041 Beyond note-taking, what kind of packages do you think would benefit from triples library? 15:03.200 Are you trying to create a PIM with EKG? What information do you primarily want to manage? 17:10.021 What about using other databases programs: PostgreSQL, MongoDB, etc.? 17:55.899 What is your preferred reference to understand triples/graph dbs? (e.g. think better about schema design) 18:49.200 Will it slow down with the growth of a database? 20:39.728 What are your thoughts on allowing for a true graph DB backend? 22:58.960 Challenges with recording
Description
Emacs can now be built with SQLite, giving native support for reading and writing to a database. With this, we can start seriously considering a SQLite-first approach: instead of storing data on the filesystem, and using various ad-hoc solutions for metadata, we can use SQLite to store and search our data. This is essentially a tradeoff between the power and speed of SQLite and the universality of the filesystem. If we accept that this approach is useful, then a standard way to store information in database, may be useful and promote package interoperability, just as our single filesystem does. The triples packages is a RDF-like database for supplying such a flexible system for storing and retrieving data from SQLite. A sample application, ekg, a replacement for org-roam, is shown using this, and the advantages of the triple design are explained.
For more information and the packages discussed here, see the triples and ekg pages.
Discussion
- https://www.reddit.com/r/emacs/comments/zexv6b/using_sqlite_as_a_data_source_a_framework_and_an/
- https://news.ycombinator.com/item?id=33853509
Notes
- https://github.com/ahyatt/triples
- https://github.com/ahyatt/ekg
- https://www.gnu.org/software/hyperbole/man/hyperbole.html#Implicit-Buttons - discussed in the next talk could really help simplify access to your triples and ekg primitives; have a look when you have time.
- You had some delicious recipes in there. Made ME hungry!
- Thanks, that's insightful
Questions and answers
- Q: To what extent did Datomic influence the design of triples?
https://www.datomic.com/
- A: I wasn't aware of Datomics, but I think both triples and Datomics are influenced by RDF & trends in Knowledge Graph construction in the industry. I took a look at that page, and one interesting thing they do is (AFAICT) store edits to the database instead of actual values, allowing you to reconstruct the database or go forward or backward in time. This is very interesting, and I'm thinking of ways to make the database safer, but not sure if I can get to such sophisticated properties in this implementation.
- Q:built into Emacs? nice. multiple schemas (so to say
differentiation of "databases" (if you will so) are possible with
that built-in instance?
- A: Yes, with emacs 29. Full-featured with multiple databases, transactions, etc.
- Q:What about collaborative editing whith this? Multiple computers
with multiple emacs like crdt.el with org mode?
- A: Database are great for more async collaboration, multiple people / processes can add to the repository at the same time. But I think it's not going to be a great solution for multiple users modifying the same buffer.
- Q:What about using this on multiple computers? How would you
syncronize the data? (This is a minor problem in org-roam where if
you share files between multiple computers, their SQLite databases
get out of sync and require M-x org-roam-db-sync to rebuild the
SQLite database.)
- A: This is an unsolved problem, one that I'm interested in looking into. There possibly are standardized db solutions to this, but I don't yet know what they are.
- Q: With EKG what about views like org roam node mind map view? Or
org mode virtual view for integration with other org packages?
- A: This is possible, it just needs to interface with the database in a different way. It's all graphs, so really any triple library might be a good fit for this.
- Q: Are you planning to further develop EKG? It is highly
interesting to me, I do prefer SQLite over text.
- A: Andrew is using it; not ready for general use but quite soon -- towards the end of the month! Still in exploratory mode though. Still thinking about (some of) the fundamental concepts.
- Q: Is it then possible to combine the triples DB with some custom
tables in the same SQLite file? (e.g. to build a log table next to
the triples tables for quick query of event data)
- A: You could do that. AT the moment it's just one table (triples). It's designed to be one table in one DB -- beware of consistency issues if you add further tables, which you can definitely do.
- Q: What are your thoughts on adding a timestamp attribute to triples
so that the DB becomes append-only and by default you return the
latest fact for a subject/object pair?
- Q+ -> Use is to keep a record - you don´t delete? e.g. you get all past addresses of a person or all past versions of a given fact. Even version control for notes.
- A: I haven't thought of that / not seen in other triple stores
- A: Be ware that these DBs already take quite a bit of space
- A: may make synchronization easier
- Q: can ordinary lisp data types (lists, symbols, etc) be stored in
the data base
- A: Yes, if you don't specify, it defaults to a list. Not sure that was the right design choice; lists map to rows with a hidden index column. emacs-sql and this also represent most things as strings.
- Q: beyond note taking what kind of packages do you think would
benefit from triples library?
- A: Anything where you have lisp forms stored in a file would probably be better implemented via a database. And the triples library makes this easy and standardized. So, for example BBDB, which is a "database" should actually be in a database. And then you might want to annotate, tag, etc, so having be in one big database makes a lot of sense to me, because I think all this kind of info wants to live together.
- Q: Are you trying to create a PIM with EKG? What information do you
primarily want to manage?
- A: Yes, I think many uses of emacs is in line with PIM, and those use-cases are a good fit for triples / ekg. Notes, people, projects, maybe even being able to integrate with org and manage TODOs there as well.
- Q: What about using other databases programs Postgres mongoDB etc..
[see last q too, but I guess you refer to other relational, e.g.
Postgresql]
- A: Those could work, maybe the triples library would work via emacsql with those, but I haven't tested it. I'm not sure what the benefit would be, typically these database tend to be simple and small, so a more full-featured DB is probably overkill. MongoDB and those kinds of row-oriented databases probably wouldn't be a good fit, but I haven't tried it.
- Q: What is your preferred reference to understand triples/graph dbs?
(e.g. think better about schema design)
- A: I know from using them / talking about them. I will come back with some references!
- Q: Will it slow down with a growth of database?
- A: there is a tradeoff -- triples gives you a standard schema, but you lose the power of getting things in one SQL expression - which makes it slow. But I have a bunch of data (2yrs of org-roam usage) and it is still very fast. These limits exist, but the usual rate of content creation is not large enough to hit the limits.
- Q: What are your thoughts on allowing for a "true" graph-db
backend? (whatever the current best free software alternative to
neo4j is, I guess).
- A: In my usage, the graph DBs tended to be slow and somewhat clumsy in usage, we returned every time to SQL [please reword if appropriate]. At the moment not a glaring need (for the quantities of data people manage in Emacs).
- Q: How hungry did you get while writing and recording this?
- A: I forgot that I used recipes as an example in my demo! org-roam / ekg and other things are a great way to cook better, BTW. When you make a recipe, write a org-roam daily (or in ekg, an entry tagged with the date and the recipe) with notes about how it went, what could be better, etc. Then you can later see the recipe and notes on it at once, which helps you further refine the recipe.
- Q: beyond note taking what kind of packages do you think would benefit from triples library
Other discussions from IRC:
- I like the font he's using in his org doc.
- In some way, triples (turtle, RDF and similar things) describe a directed graph, where echa edge is like: Subject--Predicate-->Object. A datalog can describe this too (for example: with predicates like triple(S,P,O)).
Transcript
[00:00:00.000] Hello. I'm Andrew Hyatt. I've been working on Emacs, with Emacs, and to some extent on Emacs for a while. I've written the WebSockets library and Emacs calc tutorials. I've enjoyed use of many of everyone's incredible packages. So my thesis for this talk, why I'm giving this talk, is that I'm interested in SQLite. I think we should be exploring SQLite for applications in ways I think the community has shied away from. I'd like to introduce the triples package as a way, both easy and with interesting functionality, that will allow us to build extensible databases in a way that is, I think, a little bit unusual and perhaps compelling, I hope.
[00:00:56.220] So first of all, why SQLite? Why is this good? Well, SQLite is now built into Emacs. So you get a few things out of this when you use it for data. First of all, it's a database. It's extremely good for data, of course. There's a simplicity to data manipulation using a database, compared to data manipulation, that is, manipulating data in a text file. Text files are really not built for data. So when typically when you need to do this, like I know Org Mode is-- which I'm a huge, huge Org Mode fan-- it's all about sort of data in text. It does work, but you certainly would be hard pressed to make sweeping changes to your database that is represented in text. It's just not well suited for this sort of thing. It would take a long time, where the speed of SQL is incredibly impressive. I think certainly Emacs is not known for being extremely speedy. I think the overuse of text is part of this. Of course, text and using text, using files has awesome advantages as well. I'm really here to talk about the other side of the coin, right? Everyone can judge those advantages and disadvantages and make their own trade-offs, but I want to kind of make the pitch for SQLite.
[00:02:29.870] So let's talk about the triples package. The triples package is a package that is designed to give you a very generic schema. You don't have to do, for most of the common operations, you don't have to write SQL yourself. A lot of stuff is built in and is based on a very generic schema. That is, it's a single table. That table has, of course, fixed schema. It basically has three columns. It actually has four columns. In this talk, I'm not going to get into the fourth column and why, but it's useful. So the three columns are subject, predicate, and object. This is what it's related to what we call an RDF format. These things basically describe a link. The link is from the subject to the object. The link type is a predicate. That sounds overly theoretical, but the point is that you can describe a lot of things with this format. You probably describe everything with it. It's very simple because the schema is fixed. It's only this kind of data. That means for your application, you define a schema in subject, predicate, object format. That defines what data you can use, what types there are, what properties they have, how you can use the system, and what is legal to do. And this is stored as data. I think as Lisp people, I think we're all very onboard with the fact that you have a simple way to express everything, and you don't have these two systems. In this way, you don't have to have code as a system. Do you have to load code to use the triples package to make sure your schema is obeyed? No, it's all just built in to this database. I'll describe this. As I said, it's a little bit abstract right now, but it will become a lot clearer when we go through an example, which we're going to do now.
[00:04:50.209] As an exercise, let's create Emacs bookmarks, which basically are three things: a name, a file, and an annotation. I may be missing out on functionality. Of course, everything in Emacs, everything has lots and lots of functionality, but let's just start with this simple thing. First of all, we're going to open up a database. Pretty simple. I think there's nothing to explain there. But here on this line that I'm on right now, we are saying, okay, there's going to be a type called bookmark. It's going to have the following properties. First, a file, which is unique and a string. The second is an annotation, which is again unique and a string. Then we're going to have another type called named. First of all, why is it named as part of bookmark? As I'll get into, it's interesting when you start sharing this database with other things, not just bookmarks, but other types. Bookmarks are very similar to many other things that you might want to expand into. Those other things have names, but they're not bookmarks. It's nice to separate these concerns out and just have another type called named, which just basically has a name. We can execute this. It's not interesting to look at these. that is not all that useful for anything, It outputs something but that's okay. What's done is, actually, it's created a database and it's populated it with the schema. We can look at this. We won't go through all of this because it's a little bit too much for a short presentation like this, but you could see that there's something here that's like, oh, we have a subject bookmark. We have a property base/type. That just means that there's a property that's defined by the base. This means this is from the triples package itself. It's not some other package. Third is, what is the type of this object? It's a schema. This thing could be many types. As I said, if you have a-- we haven't seen an example yet, but if you have a bookmark, it's going to have a name type and a bookmark type. Everything is multi-typed and that's kind of a feature of this kind of storage system. I'm not going to go through everything, but you can see it's all there in triples, the whole schema, everything we just did.
[00:07:40.740] Let's create a bookmark. Again, we're going to connect to our database and we're going to basically set an entire subject. The subject is, it's basically like an entity. We're going to define a whole entity or you could refer to it as an object. That's a perfectly fine way to look at it, I think. It's going to have some identifier. That identifier could be anything. It doesn't have to be a string. It could really be anything, but we're going to give it a string called emacs-init. It does not matter what this identifier is, at least for our purposes. It does matter when you're linking to it, but I think nothing that I'm about to show you. This could truly be anything because it has a separate name. I'm giving it a name here, which is init. I'm just specifying the named type here and I'm specifying the bookmark type here and its values. We're going to do that. Now, if we look, we see everything we saw before, but now we have subject emacs-init. It has a type and it's named. We also see the same thing two lines down. This type is also a bookmark, but the name is init in the named/name, which is like the type is name, named, and the property is name. It's init. As you can see, this is just how everything looks. It's pretty straightforward and you can retrieve it. Now, we're looking at the database, but you don't really have to look at the database for... In fact, I think we're done looking at the triples format, because I think it's very simple. You've already got the hang of it, I think.
[00:09:31.366] Let's retrieve that just to make sure, yes, we can retrieve it. We're going to retrieve it and say, okay, what do we get when we load the emacs-init subject? Well, we get a plist of all of its properties, which then you can use in your application. There's many more ways to retrieve and there's many more ways to save. In fact, I think the way I did it here with set subject is probably not the right way to do it most of the time. It's usually because it'll erase everything. It's only really to be used when you're sure you control all the data, but you're never sure because there could be other packages that are also using this database, they could have their own data. You don't want to erase all the other Emacs init subject data. In this case, we did because we are confident it was a new entity, but in most cases, the right thing to do is just set it by type. Just say, we're just going to set the bookmark type, which is... The properties are this, and then the name type and the properties are that. That's a way that you could do things. There's also many retrieval types. The retrieval types: you can retrieve by a number of different ways, which I'm not going to get into, but you can read about in either the source or the readme in the package.
[00:11:10.034] We have backlinks as well. Let me explain what backlinks are. There's another feature of the triples. As I mentioned, these things can be thought about as links, but what could be a link in one direction could also be a link in the other direction, and we basically get this for free. Here's an example where we are again connecting toward bookmark. Here we're going to add a new type called tagged. We're going to give everything in tags. Here we're saying, okay, there's a type called tagged and it has a property called tags. This is not unique, so it's a list basically. It's a list of string. There also is a type called tag. This is for things that are tags themselves. Then it has a type called numbers and it has what we call a virtual reversed property. It's virtual because it's not actually stored. We just compute it by reversing the subject and the object. This is on tagged tags. When we query this, we can just say, okay, what are all the subjects that have tagged tags of me, the tag? In fact, let me demonstrate that for you. We're going to set the type on emacs-init. We're going to add the "tagged" type. This is the alternate way of setting data that I mentioned. This won't erase anything else. We're just adding something here. We're adding this type "tagged" to our previous bookmark emacs-init. We're going to add emacs and config as the tags. We're going to then set emacs as a tag and config as a tag. That just lets us have this virtual property. You have to do something. You can't get it out of thin air. The design decision we've made is: you at least need to tag it before you get the free property. What you should see... Let's try it out. We got the subject config, which we've set no data on. You can tell we're not sending any data. If I get that subject, the result is that it says its members are emacs-init. That's what a virtual reverse property. As we tag more things, this just continues to work because it's just doing a SQL query here.
[00:13:59.380] Besides showing off the backlinks function, this also shows off the general way you can have extensible entities. That is, it's possible that someone writes a bookmarks package that stores everything in a database, in the triples database, but then someone else can come and say, okay, I'm going to define my own types that's meant to work with this database, just like someone could do what I did here, just to add simple tagging. It's very easy to do. This stuff is not that easy to do otherwise. To do this in Lisp, I would say it's a little awkward. With databases, again, it's not only possible, it's relatively trivial, especially with this kind of database. The benefit is it's super easy to work with. With this kind of generic database, the drawback is it's not all that efficient as a special purpose table that is really built for efficiencies. A lot of times you have to do multiple lookups and things like that. Again, it's a trade-off for various things. As you can see, this is like one database for everything. That means that we don't have to all contribute to one giant database. All the packages that use triples, it doesn't have to be one database, but it's cool if it does. I don't know what I want to happen or what I expect to happen, but I think an interesting property is that this is a way for lots of data to live together and build off each other in ways that I think are hard to do with other forms of table layouts and things like that.
[00:15:57.008] Let's talk about a use of it, which is the EKG package. The EKG package is something I've written to demonstrate the triples library and use it for something I think is interesting, which is personal knowledge management systems of the same type, of the same genre that Org Roam is, but with different design decisions. I'll show it in action for a little bit. Let's just look at... it's all tag-based, same kind of tags we saw before when we were playing around with the bookmarks example application, What I want to show is: I can look at a tag. but everything here is... I could see notes with that tag. Everything you see here is in the database, no files involved. All of this is just a thing that's an entire object, the entire string. It works. It has tags. You can see not only this tag, but all the other tags associated with it and their notes. It's kind of an interesting way to do things. When you capture it, I think it's interesting. There's a lot of interesting design elements here. This tags thing is not part of the buffer. It's not like Org Roam. You see here in this other tags, these are things I've imported from Org Roam. This is why they look like they do. They have their own titles because I just wrote them all in Org Roam. What it looks like, really, for these notes is that it's just text. You really don't have to bother with this metadata. If I want another tag like pancakes, I can just add it here. Again, these tags will turn into data, triple data. The text is just a triple date[??] as well, but different to triple data. All these things are like that. You can open up any of these things, et cetera.
[00:18:12.772] I think the interesting thing here is to see the code. It's not super interesting to look at code for too long, but we don't have that long. Whenever we connect-- I just want to point out a few things. Whenever we connect, we have a schema. We just do this. Every time we connect, we just make sure it has the right schema. This way, the user is up to date. This schema just looks exactly like stuff I showed you in the triples when we were looking at bookmarks. It's not complicated. I even have stuff here with people to use this as a person database. I haven't figured out how I'm going to use this yet, but you can see just to do this, it's really trivial and it's pretty easy. Let's show a few other things, like getting all the tags. Here, we could just say, let's get the subjects of type tag. We have all the triples, all the ones that are of type tag, all the things that have a subject. All the subjects that have links, that have this type, this tag in it, we can return them all. of all these objects. Basically, it just gives you a list Again, you can think of these things as objects. All the objects of type tag, we'll just get them all. Super, super simple. Triples gives you this functionality out of the box. It's not that complicated.
[00:19:43.780] What I would like to show, and that shows my thesis for this whole talk, is this rename tag. Now, think about how you would rename a tag in Org Roam or anything where the tag is part of the files. It's like how you would re-tag everything in Org Mode. It's complicated and error-prone and slow. This is anything, this is instantaneous and super easy. Look, that's it. There's not that many places for an error to live here. One thing I would like to point out is that we are doing direct, not everything has to go through the triples package. Maybe it should, but the triples package is a fixed format, which is why it's okay-- whether it's a good idea, I'm not sure, but it's okay for client packages to just directly manipulate the tables. Here, we're just doing it just to update all the tags and then remove and set types so that the correct thing happens. As you can see, it's super, super simple.
[00:20:52.147] I think this proves my thesis about the advantages of applications with SQLite. Thank you for listening. I hope this puts ideas in your minds about taking advantage of this functionality. I hope to see more things using the triples library or otherwise that take advantage of this. Thank you for your time.
Captioner: sachac
Questions or comments? Please e-mail emacsconf-org-private@gnu.org