Unlocking linked data: replacing specialized apps with an Org-based semantic wiki
Abhinav Tushar (he/him) - abhinav@lepisma.xyz, https://lepisma.xyz, @lepisma@mathstodon.xyz
Format: 12-min talk ; Q&A: Etherpad
Etherpad: https://pad.emacsconf.org/2024-links
Status: TO_FOLLOW_UP
Duration: 11:21 minutes
00:00.000 Specialized Apps and Linked Data
01:30.000 Discovering Org Roam and Linked Notes
02:53.000 Enhanced Org Roam Buffer: Rich Links and Similar Nodes
06:35.000 Semantic Search on Link Contexts
08:26.000 Exposing notes outside Emacs
10:38.000 Future Directions and Potential Improvements
Description
I try to maintain a lot of personal information, annotations, etc. in Org files but have historically switched back to purpose-built apps for different kinds of data. There are recipe managers for recipes, personal CRM tools for people-related notes, bookmark managers for managing web links, etc. While these apps do well with the kind of data they work on, they don't operate well together, in the sense that they don't treat links between entities as first-class citizens. I believe this gap is where a lot of personal information lives. As an example, consider the chain of links that tells 'person a' gave me 'this recipe' on 'my anniversary'.
After using Zettelkasten via Org-roam for some time, I came to realize the power of links that we (can) form between data of different kinds. For me, these links offset the loss that comes with leaving specialized apps. With this, I have again gone back to Org files, but this time deriving good value from links between notes. Of course there are tons of other benefits of using Org files like better longevity, portability, versioning, and developer accessibility.
In this talk, I will cover my workflow of creating and managing different kinds of notes in an Org mode based semantic wiki and the link types they tend to have. I will also show my workflow outside of Emacs, where I use small tools that sit on top of Org files to deliver missing features of niche apps (like availability on mobile devices, smart cross data-type queries, etc.).
About the speaker:
I am a Programmer and Machine Learning Engineer, and I love working with computers primarily because of the early experiences of infinite extensibility that Emacs gave me. For this talk I will cover my journey of using Org files for notes, then leaving for specialized applications, and finally coming back to Org to unlock the benefits of linked data.
Another talk by this speaker:
- EmacsConf 2023: MatplotLLM, iterative natural language data visualization in org-babel
Discussion
Questions and answers
- Q: Have you thought about doing the cosine similarity and sentence
transformer calculations in Elisp so you don't need a separate
Python process? In my experience having to set up and manage
additional state throws people off track.
- A: I do want to try removing the dependency, but I haven't yet done any work in that direction. Mostly the problem is that model runtimes (for transformers) are much more easily available in other languages. But if there is an ONNX runtime (or dynamic module) for Elisp, we should be able to do this.
- Thanks, I can try writing an ONNX runtime module, this can be useful for several Emacs tasks besides semantic linking.
- Q: So far I have not used packages such as org-roam because I do not
like the idea that it might become unmaintained some day. So I keep
to the basic features in org for my workflow. Did you consider this
aspect?
- A: I thought about this too. But I have found the internals of
org-roam simple enough that I don't think maintaining a fork is
any hassle. Anyway it uses features already available in
org-mode. The only development addition it does is, IMO, to
maintain an SQLite index.
- Thank you for your advice. I'll take another look at org-roam. And thank you for your talk. It was quite inspiring to me.
- Q: this is very cool and seems a bit influenced by logseq, which i
am trying to transition away from and on to org roam. have you
looked into somehow embedding the contents of a "linked" node into
the parent itself? this is something that i miss quite a lot from
logseq, where the contents were/could be transparently embedded and
made for a nicer review experience
- A: I haven't used logseq. When you say embedding, do you mean like document transclusion? Or something else?
- yes, something like transclusion. quite useful for example in daily journalling where one can just dump the notes instead of figuring out a location. and then link them afterwards in the right file/node.
- In some way, the org-roam buffer I showed shows linked nodes with nearby content. But I haven't done any work on transclusion till now.
- This may be relevant to your question
https://github.com/Vidianos-Giannitsis/Dotfiles/blob/master/emacs/.emacs.d/libs/zettelkasten.org#logseq-like-tagging-functionality.
I don't remember exactly what it does because I don't use it
myself, but I was curious to try and hack it after a discussion
and it was relevant to how Logseq does transclusion in linked
documents.
- ooh, thanks for the link. this looks rather interesting
- Q: How did you do the similarity search?
- A: Similarity, as of now, is just using embedding vectors from a locally running transformer model and then matching using cosine scores. Code is here https://github.com/lepisma/org-roam-exts/tree/master/org-roam-sem
- Q: Is your ML model for topics like "family members" available
somewhere?
- A: https://github.com/lepisma/org-roam-exts/tree/master/org-roam-sem The model I am using is a simple lightweight embedding transformer model. See this line: https://github.com/lepisma/org-roam-exts/blob/a71f2ec3bb6bd9d2b21ab5fd70ec45fa18128896/org-roam-sem/src/org_roam_sem/featurize.py#L17C7-L17C77
- Q: is your org-roam config public? (init.el stuff) I've found
vanilla org-mode not the most ergonomic. Thanks!
- A: Do you mean https://github.com/lepisma/org-roam-exts
- Also some of my writing config is here -> https://github.com/lepisma/rogue/blob/master/lisp/r-writing.el
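The similarity search described in the answers above (embedding vectors matched by cosine score) can be sketched roughly as follows. This is a minimal illustration, not the code from org-roam-sem: the function names `cosine` and `rank_similar` are hypothetical, and the three-dimensional vectors are made-up stand-ins for what a real embedding model would produce.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def rank_similar(query, embeddings, top_k=3):
    """Return the top_k (title, score) pairs, highest cosine score first."""
    scored = [(title, cosine(query, vec)) for title, vec in embeddings.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors (assumption): a real setup would get these from the model.
embeddings = {
    "Book on Mathematics": [0.9, 0.1, 0.0],
    "Bombay Sandwich": [0.0, 0.2, 0.9],
    "Google at Interspeech 2023": [0.1, 0.9, 0.1],
}
query = [0.8, 0.2, 0.1]

for title, score in rank_similar(query, embeddings):
    print(f"{score:.2f}  {title}")
```

The same ranking would apply to the link-context search shown in the talk: embed the text written around each link instead of whole nodes, embed a query such as "family members", and sort by score.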
Notes
- This looks very useful, thanks for your work
- Looks really handy! One of the biggest inhibitors to my usage has been figuring out how to collect things on mobile without friction. Will check it out! +1
- Thank you all!
- A few project links from the talk:
- Very interesting talk
- This is super cool 🙂
Transcript
Hello. My name is Abhinav, and I'm going to talk about unlocking linked data in Org Mode. So, like a lot of you, I use Emacs and Org Mode for keeping a lot of my data, personal information. For example, if I'm trying to, you know, write journal entries, it's most likely going to be an Org Mode file. If I'm trying to save bookmarks or save links, again, they go into an Org Mode file. Now, I was doing that earlier, but around last year, I think, I started to use these specialized applications. So, you know, for example, if I'm trying to save bookmarks, I'm going to use a bookmark manager. I specifically was using Raindrop for it. What happened with that is that it allowed me to save bookmarks. Let's say, you know, when I'm on the go, I'm on a mobile phone, I can just, you know, open my Android app and then save links there. I can also annotate and, you know, do other things that you can do on bookmarks. Similarly, you know, for reading, let's say, papers and PDFs, I would use Zotero. For keeping notes about people, I'll use a tool called Monica CRM. Now all these tools, their aim is to kind of do one thing really well, but they kind of work in their own silos, and it's very hard to link data from one to another. For example, if you have a journal application, you can say things like, you know, "Hey, today I met this person, and then, you know, this person gave me this recipe," whatever. But you know that the person information is still kept in a different application, and the recipe information is still kept in a different application. You have to, like, you know, do a lot of work to kind of make them come together. So, one thing that happened also last year was that I started using Org Roam a lot. So Org Roam is a Zettelkasten system, you know, which allows you to have linked notes.
I'll not go too much into that detail, but basically, with Org Roam, you can, you know, have a lot of these text based files that you make anyway and then keep them connected and then, you know, like, have this knowledge base that you can build around your information, your data. While it's a good system, I still feel like it's not very good at providing, you know, a good amount of tools for working with links. I'll show two kinds of things. First is that I'll show what my current knowledge base looks like, what kind of, you know, workflow I kind of use to save, let's say, any information or how do I, like, you know, connect new notes. The other is that while this information base is working out well for me, I still want all of my external usages to be, you know, reflected back into this database of text files. So if I'm browsing something, I still want that thing to be, you know, saved in my Org Mode files, whether I'm browsing on Android or I'm browsing on, let's say, Firefox somewhere on a laptop. So, I'll show you those two things here. It's going to be a short talk, and then, yeah, hope you like it. Okay. So we'll start with this thing. So this is a simple Org Roam node. It's, you know, it's a dummy node. I've made, like, a lot of dummy nodes here just to kind of show, so, you know, maybe some of that information will be sparse, but I hope I convey the meaning clearly. Okay. So here, usually, you know, if you're just using plain Org Mode, you just have this file. Right? There's nothing else. Now if you are using Org Roam, you can do something called org-roam-buffer-toggle, which will show you, you know, a few of those connected nodes. Now, usually, the connections that are shown here, they only show you backlinks. So, basically, any other node that has linked to this node is going to be shown there. But in my extension, you can see more things. For example, here, in this case, you can see, first of all, we show both links.
So any link from this node to something else will also be shown there. Any link from that node to something else will also be shown there. So you can see, that is one thing. The other thing is that all these links are categorized in, you know, type of notes. For example, this note specifically right now is of kind book. Tag is one identifier for it, but there are, like, other ways to identify, you know, a kind of note. But this is connected to another node, which is of a kind person. So as you can see, Person A asked me to read this book. So, you know, that link is shown there in a very rich format. So we have, like, more information about the link in this. The other thing that you can see there is that there are also links which are not existing right now, but they could be, you know, possibly interesting for me. So these are similar nodes. So here you can see the scores of similarity and then, you know, other nodes like Book on Mathematics, which is another dummy node that I made for this demo. Now these are nodes which, you know, again, I can just go in there and I can see if maybe they make sense or if, you know, I can just, like, make those connections explicitly. So let's try some other node here. So let's say, so this is a node of a recipe. It's Bombay Sandwich. It's the recipe I made. Now if you go here on the right, you can see, you know, there's, like, some things on some person liking the sandwich. There are some related notes also, similar notes which I have not linked. Plus, there are some journal entries. Now I use Org-Roam daily to kind of, you know, write down journal entries. And then, you know, what I have to do there is basically just write whatever I want and then just, you know, make those links to this node. So for doing that, I can see that there were two days where I made Bombay Sandwich, and I had some observations around it, which, you know, you can see here. Right. So same for person.
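As an aside for readers, the idea of showing links in both directions, grouped by the kind of the node at the other end, can be sketched as below. The two-table layout is a simplified, hypothetical stand-in rather than Org Roam's actual database schema, and `linked_nodes`, the node ids, and the second link's context text are made up for illustration.

```python
import sqlite3

# Hypothetical, simplified tables (assumption): not the real Org Roam schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodes (id TEXT PRIMARY KEY, title TEXT, kind TEXT);
CREATE TABLE links (source TEXT, dest TEXT, context TEXT);
""")
conn.executemany("INSERT INTO nodes VALUES (?, ?, ?)", [
    ("n1", "Some Book", "book"),
    ("n2", "Person A", "person"),
    ("n3", "Book on Mathematics", "book"),
])
conn.executemany("INSERT INTO links VALUES (?, ?, ?)", [
    ("n2", "n1", "Person A asked me to read this book"),
    ("n1", "n3", "builds on ideas from this one"),  # invented demo context
])


def linked_nodes(conn, node_id):
    """Group links touching node_id, in both directions, by the other node's kind."""
    rows = conn.execute("""
        SELECT 'backlink', n.title, n.kind, l.context
        FROM links l JOIN nodes n ON n.id = l.source
        WHERE l.dest = ?
        UNION ALL
        SELECT 'forward', n.title, n.kind, l.context
        FROM links l JOIN nodes n ON n.id = l.dest
        WHERE l.source = ?
    """, (node_id, node_id)).fetchall()
    grouped = {}
    for direction, title, kind, context in rows:
        grouped.setdefault(kind, []).append((direction, title, context))
    return grouped


for kind, entries in linked_nodes(conn, "n1").items():
    for direction, title, context in entries:
        print(f"[{kind}] {direction}: {title} ({context})")
```

Keeping the surrounding text of each link as a `context` column is what lets a buffer show "Person A asked me to read this book" rather than a bare title.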
Let's say if you go to Person B, you can see that, you know, this person has, again, a link to Person A and there's, like, some information around it. Plus, there are some similar notes there. This works really well. The similarity function works really well if you are basically trying to go to, you know, bookmarks that I have saved. So for example, here's a bookmark that I saved, which is Google at Interspeech 2023. Now this bookmark is a blog post from the Google AI team. Basically, you know, it tells what research publications they had in this conference. Now if you go to the Similar Nodes here, you can see very similar blog posts from Google's team for other conferences that they attended. Right? Now this is very helpful for me, especially when I'm, like, reading something later. So I, like, save a lot of links together. And then when I'm deciding to read something, I just open this and then see, you know, how everything is connected, what else I have saved. Should I read something else or not? One interesting feature I was realizing I should try out is that, you know, if I go to this node, which is Person B, you can see that while I'm linking this to Person A, I also have some context on that. So I've written specifically uncle of Person A. Now if you have a semantic wiki, you will have a typed link where you don't have a plain link. You also have a type of the link. So in this case, the type of the link could be, you know, it's like uncle: whatever
that link is. But, you know, I don't
want to, like, go into that much detail,
and I don't want to, like, learn how
to link things, learn what kind of types
I can make. So I can just write
things in plain text. So I've written this
in plain text. What I can do now
is I can just search for links like
this. For example, I can just do something
like family members. Now this will show me
all the links which have a context which
makes sense as family members. So basically, this
is semantic search on links, on the context
of the links, and then, you know, it
kind of gives me what I want here.
For example, here, in this demo, I just
had, like, one node, one link, which had
this uncle relationship. So that kind of works
out. Now let's just try another search. For
example, let's say if I'm just typing 'check
before meeting'. So these are now again links
where I have written something where I kind
of should do something before meeting someone. So
for example, the first one you can see,
there's a person called Meeting Person. It's a
demo node again. And, I've written one note
about, one connection here is basically saying that,
hey, you know, read this link before you
go to meet them. Right? So it's also
been very helpful for me. There are, like,
few patterns where I kind of feel this
works out well. As I keep making more
of the links and keep writing more context
around the link, this kind of works
out really helpful. This becomes really helpful for
me. Okay. So the other few things, you
know, how do I, like, work with, systems
outside Emacs. Right? So the first thing
is that, you know, the I haven't found
anything that works really well for saving bookmarks,
when I'm on my Android phone. So I
had to make a new application, and
it's called pile-android. Now this application basically,
you know, lets me do whatever I was
doing with Raindrop, which was a bookmark manager.
So I can open links. I can read
stuff in Firefox on my browser on my,
Android phone, and then I can save all
of that in my Org Roam database. Org
Roam database here means the Org Roam files
that I have. Because, again, these are plain
text file, I can sync them through mobile
phone to my, you know, desktop and laptop
and everything else. So that's one place where
I kind of, you know, stop, going to
a, new application. I just basically ingest everything
in my Org Roam setup. The other thing
is that, when I'm browsing on my laptop,
I still want to, you know, collect all
the data inside my Org Roam system. So
so here's something which I call Org Roam
Sidekick. Now what you can do here is
that, let's say, if you want to search
for something, so you can basically do a
search normally, which is going to do a
web search. But if you call Org Roam
SK, which is Sidekick, it will do a
search on all of your Org Roam notes. So
now this search is basically using recoll. So
recoll kind of indexes all the plain text
and does a full text search for you.
But this this is really helpful because when
I'm searching for something and I still want
to know that, hey, you know, hey, I
have saved some of those links earlier. So,
can I, like, you know, see them back
and then, you know, it's a very
good way to kind of not lose track
of what you've already saved. The other
thing I can do is, like, I can
also, you know again, since I have saved
a project, in my Org Roam, I can
basically call, again, Sidekick again, and I can
see a note for that. That note here
specifically is tracking my tasks for this project.
And other than tasks, you know, again, I
can see other things like similar notes. I
can see, you know, other links that are
there. So yeah, so this, there's still some
optimizations to be done. I think this, you
know, the bookmark here is not very intuitive.
I still want, I want this to be
following the browser, as I switch tabs.
But, again, those things are something I'll work
on. Other optimizations include, you know, the way
I'm doing the search using ML that needs
a little bit of fine tuning because, every
time I make a new link, I have
to, like, rerun the, you know, re kind of
build the features and everything else, which I,
need it to be real time. Yeah. So
that concludes my talk. Hope you enjoyed it.
Let me know if there are any questions.
Thank you.
Captioner: abhinav
Questions or comments? Please e-mail emacsconf-org-private@gnu.org