Unlocking linked data: replacing specialized apps with an Org-based semantic wiki

Abhinav Tushar (he/him) - abhinav@lepisma.xyz, https://lepisma.xyz, @lepisma@mathstodon.xyz

Format: 12-min talk; Q&A: Etherpad
Etherpad: https://pad.emacsconf.org/2024-links
Status: TO_FOLLOW_UP

00:00.000 Specialized Apps and Linked Data
01:30.000 Discovering Org Roam and Linked Notes
02:53.000 Enhanced Org Roam Buffer: Rich Links and Similar Nodes
06:35.000 Semantic Search on Link Contexts
08:26.000 Exposing notes outside Emacs
10:38.000 Future Directions and Potential Improvements

Duration: 11:21 minutes

Description

I try to maintain a lot of personal information, annotations, etc. in Org files, but I have historically switched back to purpose-built apps for different kinds of data. There are recipe managers for recipes, personal CRM tools for people-related notes, bookmark managers for managing web links, etc. While these apps do well with the kind of data they work on, they don't operate well together, in the sense that they don't treat links between entities as first-class citizens. I believe this gap is where a lot of personal information lives. As an example, consider the chain of links that tells me 'person a' gave me 'this recipe' on 'my anniversary'.

After using Zettelkasten via Org-roam for some time, I came to realize the power of the links that we (can) form between data of different kinds. For me, these links offset the loss that comes with leaving specialized apps. With this, I have gone back to Org files again, but this time deriving good value from links between notes. Of course, there are tons of other benefits of using Org files, like better longevity, portability, versioning, and developer accessibility.

In this talk, I will cover my workflow for creating and managing different kinds of notes in an Org mode based semantic wiki and the link types they tend to have. I will also show my workflow outside of Emacs, where I use small tools that sit on top of Org files to deliver missing features of niche apps (like availability on mobile devices, smart cross data-type queries, etc.).

About the speaker:

I am a Programmer and Machine Learning Engineer, and I love working with computers primarily because of the early experiences of infinite extensibility that Emacs gave me. For this talk I will cover my journey of using Org files for notes, then leaving for specialized applications, and finally coming back to Org to unlock the benefits of linked data.


Questions and answers

  • Q: Have you thought about doing the cosine similarity and sentence transformer calculations in Elisp so you don't need a separate Python process?  In my experience having to set up and manage additional state throws people off track.
    • A: I do want to try removing the dependency, but I haven't yet done any work in that direction. Mostly the problem is that model runtimes (for transformers) are much more readily available in other languages. But if there is an ONNX runtime (or dynamic module) for Elisp, we should be able to do this.
    • Thanks, I can try writing an ONNX runtime module, this can be useful for several Emacs tasks besides semantic linking.
  • Q: So far I have not used packages such as org-roam because I do not like the idea that it might become unmaintained some day. So I keep to the basic features in org for my workflow. Did you consider this aspect?
    • A: I thought about this too. But I have found the internals of org-roam simple enough that I don't think maintaining a fork would be any hassle. Anyway, it uses features already available in org-mode. The only real addition it makes is, IMO, maintaining an SQLite index.
      • Thank you for your advice. I'll take another look at org-roam. And thank you for your talk. It was quite inspiring to me.
  • Q: this is very cool and seems a bit influenced by logseq, which i am trying to transition away from and on to org roam. have you looked into somehow embedding the contents of a "linked" node into the parent itself? this is something that i miss quite a lot from logseq, where the contents were/could be transparently embedded and made for a nicer review experience
    • A: I haven't used logseq. When you say embedding, do you mean like document transclusion? Or something else?
    • yes, something like transclusion. quite useful for example in daily journalling where one can just dump the notes instead of figuring out a location. and then link them afterwards in the right file/node.
    • In some way, the org-roam buffer I showed shows linked nodes with nearby content. But I haven't done any work on transclusion till now.
    • This may be relevant to your question https://github.com/Vidianos-Giannitsis/Dotfiles/blob/master/emacs/.emacs.d/libs/zettelkasten.org#logseq-like-tagging-functionality. I don't remember exactly what it does because I don't use it myself, but I was curious to try and hack it after a discussion and it was relevant to how Logseq does transclusion in linked documents.
      • ooh, thanks for the link. this looks rather interesting :)
  • Q: How did you do the similarity search?
  • Q: Is your ml model for topics like "family members" available somewhere?
  • Q: is your org-roam config public? (init.el stuff) I've found vanilla org-mode not the most ergonomic. Thanks!
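
For readers following the similarity questions above, here is a minimal sketch of cosine-similarity ranking over note embeddings. It is not the speaker's implementation: the function names, the `linked` parameter, and the three-dimensional vectors are made up for illustration (a real setup would take its vectors from a sentence-embedding model, as the first question suggests), and "Book on Physics" is a hypothetical node title.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_nodes(target, embeddings, linked=(), top_k=3):
    """Rank nodes by similarity to `target`, skipping nodes already linked to it."""
    skip = set(linked) | {target}
    scored = [
        (cosine_similarity(embeddings[target], vector), title)
        for title, vector in embeddings.items()
        if title not in skip
    ]
    scored.sort(reverse=True)
    return [(title, round(score, 3)) for score, title in scored[:top_k]]


# Made-up vectors standing in for real sentence embeddings (assumption).
EMBEDDINGS = {
    "Book on Physics": [0.9, 0.1, 0.0],
    "Book on Mathematics": [0.8, 0.2, 0.1],
    "Bombay Sandwich": [0.0, 0.1, 0.9],
    "Person A": [0.3, 0.9, 0.1],
}

if __name__ == "__main__":
    # "Person A" is treated as already linked, so it is not suggested again.
    print(similar_nodes("Book on Physics", EMBEDDINGS, linked=["Person A"]))
```

Skipping already linked nodes mirrors the idea from the talk that similar nodes are suggestions for links that do not exist yet.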

Notes

Transcript

Hello. My name is Abhinav, and I'm going to talk about unlocking linked data in Org Mode. So, like a lot of you, I use Emacs and Org Mode for keeping a lot of my data, personal information. For example, if I'm trying to, you know, write journal entries, it's most likely going to be an Org Mode file. If I'm trying to save bookmarks or save links, again, they go into an Org Mode file.
Now, I was doing that earlier, but around last year, I think, I started to use these specialized applications. So, you know, for example, if I'm trying to save bookmarks, I'm going to use a bookmark manager. I specifically was using Raindrop for it. What happened with that is that it allowed me to save bookmarks. Let's say, you know, when I'm on the go, I'm on a mobile phone, I can just, you know, open my Android app and then save links there. I can also annotate and, you know, do other things that you can do on bookmarks. Similarly, you know, for reading, let's say, papers and PDFs, I would use Zotero. For keeping notes about people, I'll use a tool called Monica CRM.
Now all these tools, their aim is to kind of do one thing really well, but they kind of work in their own silos, and it's very hard to link data from one to another. For example, if you have a journal application, you can say things like, you know, "Hey, today I met this person, and then, you know, this person gave me this recipe," whatever. But you know that the person information is still kept in a different application, and the recipe information is still kept in a different application. You have to, like, you know, do a lot of work to kind of make them come together.
So, one thing that happened also last year was that I started using Org Roam a lot. So Org Roam is a Zettelkasten system, you know, which allows you to have linked notes.
I'll not go too much into that detail, but basically, with Org Roam, you can, you know, have a lot of these text based files that you make anyway and then keep them connected and then, you know, like, have this knowledge base that you can build around your information, your data. While it's a good system, I still feel like it doesn't provide, you know, a very good set of tools for working with links.
I'll show two kinds of things. First is that I'll show what my current knowledge base looks like, what kind of, you know, workflow I kind of use to save, let's say, any information or how do I, like, you know, connect new notes. The other is that, while this information base is working out well for me, I still want all of my external usages to be, you know, reflected back into this database of text files. So if I'm browsing something, I still want that thing to be, you know, saved in my Org Mode files, whether I'm browsing on Android or I'm browsing on, let's say, Firefox somewhere on a laptop. So, I'll show you those two things here. It's going to be a short talk, and then, yeah, hope you like it.
Okay. So we'll start with this thing. So this is a simple Org Roam node. It's, you know, it's a dummy node. I've made, like, a lot of dummy nodes here just to kind of show, so, you know, maybe some of that information will be sparse, but I hope I convey the meaning clearly. Okay. So here, usually, you know, if you're just using plain Org Mode, you just have this file. Right? There's nothing else. Now if you are using Org Roam, you can do something called org-roam-buffer-toggle, which will show you, you know, a few of those connected nodes. Now, usually, the connections shown here are only backlinks. So, basically, any other node that has linked to this node is going to be shown there. But in my extension, you can see more things. For example, here, in this case, you can see, first of all, we show both links.
So any link from this node to something else will also be shown there. Any link from that node to something else will also be shown there. So you can see, that is one thing. The other thing is that all these links are categorized by, you know, type of note. For example, this note specifically right now is of kind book. Tag is one identifier for it, but there are, like, other ways to identify, you know, a kind of note. But this is connected to another node, which is of a kind person. So as you can see, Person A asked me to read this book. So, you know, that link is shown there in a very rich format. So we have, like, more information about the link in this.
The other thing that you can see there is that there are also links which are not existing right now, but they could be, you know, possibly interesting for me. So these are similar nodes. So for these you can see the scores of similarity and then, you know, other nodes like Book on Mathematics, which is another dummy node that I made for this demo. Now these are nodes which, you know, again, I can just go in there and I can see if maybe they make sense or if, you know, I can just, like, make those connections explicitly.
So let's try some other node here. So let's say, this is a node of a recipe. It's Bombay Sandwich. It's the recipe I made. Now if you go here on the right, you can see, you know, there's, like, some things on some person liking the sandwich. There are some related notes also, similar notes which I have not linked. Plus, there are some journal entries. Now I use Org-Roam daily to kind of, you know, write down journal entries. And then, you know, what I have to do there is basically just write whatever I want and then just, you know, make those links to this node. So for doing that, I can see that there were two days where I made Bombay Sandwich, and I had some observations around it, which, you know, you can see here. Right. So, same for person.
Let's say if you go to Person B, you can see that, you know, this person has, again, a link to Person A and there's, like, some information around it. Plus, there are some similar notes there. This works really well. The similarity function works really well if you are basically trying to go to, you know, bookmarks that I have saved. So for example, here's a bookmark that I saved, which is Google at Interspeech 2023. Now this bookmark is a blog post from the Google AI team. It basically, you know, tells what research publications they had in this conference. Now if you go to the Similar Nodes here, you can see a very similar blog post from Google's team for other conferences that they attended. Right? Now this is very helpful for me, especially when I'm, like, reading something later. So I, like, save a lot of links together. And then when I'm deciding to read something, I just open this and then see, you know, how everything is connected, what else I have saved. Should I read something else or not?
One interesting feature I was realizing I should try out is that, you know, if I go to this node, which is Person B, you can see that while I'm linking this to Person A, I also have some context on that. So I've written specifically "uncle of Person A". Now if you have a semantic wiki, you will have a typed link, where you don't just have a plain link; you also have a type of the link. So in this case, the type of the link could be, you know, it's like uncle: whatever that link is. But, you know, I don't want to, like, go into that much detail, and I don't want to, like, learn how to link things, learn what kind of types I can make. So I can just write things in plain text. So I've written this in plain text. What I can do now is I can just search for links like this. For example, I can just do something like family members. Now this will show me all the links which have a context which makes sense as family members.
So basically, this is semantic search on links, on the context of the links, and then, you know, it kind of gives me what I want here. For example, here, in this demo, I just had, like, one node, one link, which had this uncle relationship. So that kind of works out. Now let's just try another search. For example, let's say if I'm just typing 'check before meeting'. So these are now again links where I have written something where I kind of should do something before meeting someone. So for example, the first one you can see, there's a person called Meeting Person. It's a demo node again. And one connection here is basically saying that, hey, you know, read this link before you go to meet them. Right? So it's also been very helpful for me. There are, like, a few patterns where I kind of feel this works out well. As I keep making more of the links and keep writing more context around the link, this becomes really helpful for me.
Okay. So the other few things, you know, how do I, like, work with systems outside Emacs. Right? So the first thing is that, you know, I haven't found anything that works really well for saving bookmarks when I'm on my Android phone. So I had to make a new application, and it's called pile-android. Now this application basically, you know, lets me do whatever I was doing with Raindrop, which was a bookmark manager. So I can open links. I can read stuff in Firefox on my Android phone, and then I can save all of that in my Org Roam database. Org Roam database here means the Org Roam files that I have. Because, again, these are plain text files, I can sync them from my mobile phone to my, you know, desktop and laptop and everything else. So that's one place where I kind of, you know, stop going to a new application. I just basically ingest everything in my Org Roam setup.
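
A minimal sketch of what saving a bookmark as a plain-text Org Roam file could look like, as described in the passage above. This is not the code of pile-android, and the talk does not say which layout it writes. The sketch assumes the usual org-roam v2 file layout (a property drawer with ID and ROAM_REFS, then #+title and #+filetags); the function names, the "bookmark" tag, the file-name scheme, and the example URL are all hypothetical.

```python
import uuid
from datetime import datetime, timezone
from pathlib import Path


def bookmark_to_org(url, title, note="", tags=("bookmark",), node_id=None):
    """Render one bookmark as the text of an Org Roam style file."""
    node_id = node_id or str(uuid.uuid4())
    lines = [
        ":PROPERTIES:",
        f":ID:       {node_id}",
        f":ROAM_REFS: {url}",
        ":END:",
        f"#+title: {title}",
        "#+filetags: :" + ":".join(tags) + ":",
        "",
    ]
    if note:
        lines += [note, ""]
    return "\n".join(lines)


def save_bookmark(directory, url, title, note="", now=None):
    """Write the bookmark to a new timestamped .org file and return its path."""
    now = now or datetime.now(timezone.utc)
    # Hypothetical naming scheme: timestamp plus a slug built from the title.
    slug = "".join(c if c.isalnum() else "_" for c in title.lower()).strip("_")
    path = Path(directory) / f"{now:%Y%m%d%H%M%S}-{slug}.org"
    path.write_text(bookmark_to_org(url, title, note), encoding="utf-8")
    return path


if __name__ == "__main__":
    # Example values only; the URL is a placeholder.
    print(bookmark_to_org("https://example.org/post", "Example post", note="Read later."))
```

Because the result is an ordinary text file, any file sync tool can carry it between a phone and a laptop, which is the property the talk relies on.
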
The other thing is that, when I'm browsing on my laptop, I still want to, you know, collect all the data inside my Org Roam system. So here's something which I call Org Roam Sidekick. Now what you can do here is that, let's say, if you want to search for something, you can basically do a search normally, which is going to do a web search. But if you call Org Roam SK, which is Sidekick, it will do a search on all of your Org Roam notes. So now this search is basically using Recoll. So Recoll kind of indexes all the plain text and does a full text search for you. But this is really helpful because when I'm searching for something, I still want to know that, hey, you know, I have saved some of those links earlier. So, can I, like, you know, see them back? And then, you know, it's a very good way to kind of not lose track of what you've already saved. The other thing I can do is, like, you know, again, since I have saved a project in my Org Roam, I can basically call Sidekick again, and I can see a note for that. That note here specifically is tracking my tasks for this project. And other than tasks, you know, again, I can see other things like similar notes. I can see, you know, other links that are there.
So yeah, there are still some optimizations to be done. I think, you know, the bookmark here is not very intuitive. I still want this to be following the browser as I switch tabs. But, again, those things are something I'll work on. Other optimizations include, you know, the way I'm doing the search using ML; that needs a little bit of fine tuning because every time I make a new link, I have to, like, rerun, you know, kind of rebuild the features and everything else, which I need to be real time. Yeah. So that concludes my talk. Hope you enjoyed it. Let me know if there are any questions. Thank you.
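
As a recap of the link-context search shown in the talk, here is a minimal sketch of the idea: treat each link together with the sentence around it as the unit that gets indexed, then rank those contexts against a free-text query. This is not the speaker's implementation. The record fields, function names, note contents, and link ids are hypothetical, and `toy_embed` is a word-count stand-in so the sketch runs without a model. A word-count embedding only matches shared words, so a query like 'family members' would not find an "uncle" link here; that needs a real sentence-embedding model passed in as `embed`.

```python
import math
import re
from collections import Counter

# Matches Org id links of the form [[id:SOME-ID][description]].
LINK_RE = re.compile(r"\[\[id:([^\]]+)\]\[([^\]]+)\]\]")


def link_contexts(source_title, org_text):
    """Yield one record per id link, keeping the line it appears on as context."""
    for line in org_text.splitlines():
        for match in LINK_RE.finditer(line):
            # Replace link markup with its description so the context reads as prose.
            context = LINK_RE.sub(lambda m: m.group(2), line).strip()
            yield {
                "source": source_title,
                "target_id": match.group(1),
                "target": match.group(2),
                "context": context,
            }


def toy_embed(text):
    """Word-count vector; a stand-in for a sentence-embedding model (assumption)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search_links(query, records, embed=toy_embed, top_k=5):
    """Return the link records whose context is most similar to the query."""
    query_vector = embed(query)
    scored = [(cosine(query_vector, embed(r["context"])), r) for r in records]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [record for _, record in scored[:top_k]]


# Dummy notes modeled on the demo nodes in the talk; ids are made up.
NOTES = {
    "Person B": "Uncle of [[id:a1][Person A]].",
    "Meeting Person": "Read [[id:b2][this link]] before the next meeting.",
}

if __name__ == "__main__":
    records = [r for title, text in NOTES.items() for r in link_contexts(title, text)]
    for hit in search_links("before meeting", records):
        print(hit["source"], "->", hit["target"], "|", hit["context"])
    # prints: Meeting Person -> this link | Read this link before the next meeting.
```

Keeping the embedding function as a parameter means the extraction and ranking code stays the same when the toy embedding is swapped for a real model.
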

Captioner: abhinav

Questions or comments? Please e-mail emacsconf-org-private@gnu.org