
README-Driven Design

Adam Ard

Download compressed .webm video (26.6M)
Download compressed .webm video (21.4M, highly compressed)

Many source code projects these days begin with a README file. While most people use Markdown, if you use org-mode, you can use literate programming to generate all of your source code directly from the documentation. This strategy is a great way to keep your documentation from getting outdated, and it lets you use all the other wonderful features of org-mode. Watch "README-Driven Design" to see exactly how to make your README file a powerful literate document.

  • Actual start and end time (EST): Start: 2020-11-28T14:15:00; End: 2020-11-28T14:34:46

Questions

If you put all your code in an Org file (in addition to prose), doesn't that make the file very large for medium/large projects? (Since all the code across all files is tangled from a single README.org)

You are right, it would get pretty large. I haven't hit that point yet, but I plan to experiment with separate Org files that are imported into a master file.

If a collaborator edits the tangled file(s), is reverse-tangling in Org reliable? How do you integrate the reverse in a safe way?

So, I actually think this is the big unsolved problem right now: how to do reverse tangling. As far as I know, Emacs doesn't do that, but it would be really cool. I think it is probably a hard problem.

  • Actually it does! You have to enable comments that mark the boundaries of the code blocks (for example with the :comments link header argument), then run org-babel-detangle. Note that org-babel-detangle is pretty fragile right now.
  • Oh wonderful! I will have to check that out. There is always more to discover in Emacs. Thanks!

Would this approach make it harder to collaborate with contributors who don't use Org? / How to rectify these difficulties? (Thank you!)

I have had some success at work by managing an Org file myself and committing the tangled code and a README.md. I have to manually update my Org file, though, when someone makes a change to the raw source files. That process can be a pain. It would be awesome to find a way to make this easier, so that non-Emacs users can collaborate without being aware of the source Org file. An annotation-free reverse tangling process would be the holy grail of literate programming. It would be a great thesis project for someone.

Interesting. Did you ever use this approach on a large project? Could one incorporate also TDD into this workflow?

I have only really reached medium size, but I would love to try a larger project. I have seen people write whole books with literate programming, though (not sure if they were using Emacs); one example is http://www.pbr-book.org/. Here is a pretty large one I found on GitHub: https://github.com/nakkaya/ferret.

TDD is an interesting idea. I haven't tried doing that, but Org seems flexible enough to build a workflow around that.

Could you share the snippet for adding these source code blocks, it seems much better than the one I am using currently. Thanks!

Sure, it is documented in the literate programming demo here (https://github.com/adam-ard/literate-demo).

In Python, indentation is part of the syntax. How is this handled when <<>> syntax is used for functions, or even for a few lines of code that get reused in multiple functions? Does the user have to define different <<>> snippets for different indentations but otherwise identical code?

Not the speaker, but :noweb will add the prefix characters to all lines; see https://orgmode.org/manual/Noweb-Reference-Syntax.html. Python indentation is fine (and is used as an example in the manual :))

  • Exactly, I have done a lot of Python this way, it works great!
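To make the prefix behavior concrete, here is a toy model in Go (the language of the talk's demo). It is not Org's implementation: the function name expandWithPrefix, the one-reference-per-line limit, and the handling of text after the reference are my own simplifications. The idea it shows is that every line of the referenced block receives whatever preceded <<name>> on the referencing line, so the same snippet lands correctly at any indentation level, or behind a comment marker.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// refPattern captures the text before a <<name>> reference, the name itself,
// and any text after it. Only the first reference on a line is handled.
var refPattern = regexp.MustCompile(`^(.*?)<<([^<>]+)>>(.*)$`)

// expandWithPrefix is a toy model of noweb prefix handling: every line of the
// referenced block is emitted with the referencing line's prefix (indentation,
// a comment marker, ...) in front of it. Text after the reference is kept on
// the last inserted line. Unknown references are left as-is.
func expandWithPrefix(body string, blocks map[string]string) string {
	var out []string
	for _, line := range strings.Split(body, "\n") {
		m := refPattern.FindStringSubmatch(line)
		if m == nil {
			out = append(out, line)
			continue
		}
		prefix, name, suffix := m[1], m[2], m[3]
		ref, ok := blocks[name]
		if !ok {
			out = append(out, line)
			continue
		}
		refLines := strings.Split(ref, "\n")
		for i, refLine := range refLines {
			if i == len(refLines)-1 {
				refLine += suffix
			}
			out = append(out, prefix+refLine)
		}
	}
	return strings.Join(out, "\n")
}

func main() {
	// Hypothetical Python snippet reused at two different indentation levels.
	blocks := map[string]string{
		"greet": "name = \"EmacsConf\"\nprint(f\"Hello {name}\")",
	}
	body := "def say_hello():\n    <<greet>>\n\nclass Greeter:\n    def run(self):\n        <<greet>>"
	fmt.Println(expandWithPrefix(body, blocks))
}
```

Both copies of the snippet come out correctly indented (four spaces inside say_hello, eight inside run), so a single <<greet>> block serves both call sites.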

Could this structure be used with a SQL query with the output being an Org table?

Yep, I have done that before too. Org will send the query to a database and insert the results. It is super nice. You can add block properties to set the hostname of the database too, so it isn't limited to just databases running on your local machine.

Why do you export to Markdown when GitHub and others support rendering Org directly?

Good question. I do this because I usually work with people that don't use Emacs :( so I usually take the source files and the Markdown and commit them to Git. I keep the Org file to myself. If everyone used Emacs, I wouldn't bother with that step.

This file would be very useful for us to have as a reference. Could you share it, please?

Yep! See the links below for a couple of template files. An extended one from the talk is at: https://github.com/adam-ard/literate-demo

Notes

Transcript

00:00:03.600 --> 00:00:37.120 Adam: Hello! Welcome to README-Driven Design in Emacs by Adam Ard. If you're a programmer, you're accustomed to putting a README file at the root of your project. It's usually a Markdown file, but if you use an Org Mode file instead, you can take advantage of the great features that Org Mode provides, including literate programming, which lets you generate your source code and Markdown documentation dynamically. I want to walk you through a little bit of what this looks like.

00:00:37.120 --> 00:01:03.520 When you start a project, especially if you use something like GitHub, you begin with an automatically generated README.md file. So just delete that and create a README.org file instead. Starting with an empty Org file, like you see here, you can begin by recording important information about your project goals. You can add diagrams, code snippets, to-do lists, time tracking, and much more.

00:01:03.520 --> 00:01:38.880 I'm going to drop in some documentation that I've written about my project here, so you can see what this would look like. As you can see, I have a title, a description, and then a subsection, as well as some code snippets. Org Mode does a great job of formatting lists, code sections, diagrams, and so forth. It's as good as or better than Markdown, but when you use it in Emacs you can do a lot more.

00:01:38.880 --> 00:02:08.000 For example, you can dynamically create diagrams using Graphviz from a text description. If you go to this source block here and hit C-c C-c, you'll see that we generate a diagram dynamically. You can run these code snippets in place and have the results show up inside your file, which is a really powerful paradigm.

00:02:08.000 --> 00:02:19.520 But most importantly, for my purposes here, Org Mode provides you the ability to do literate programming.

00:02:19.520 --> 00:02:34.720 So take a quick look at this diagram that I generated here. It gives you a quick overview of what I mean by literate programming and how I'm using it. You can see that we start with a README.org file on top.

00:02:34.720 --> 00:03:17.120 At this point, we can do one of two things: tangle or weave. Tangle describes the process of generating source code, while weave is the process of generating documentation. These are terms that Donald Knuth used; he came up with the idea of literate programming in the early 1980s. But this is really all there is to it. You are simply using a literate source file, in this case the README.org, to generate the rest of the project files.

00:03:17.120 --> 00:03:59.479 So let's dig into the details of how this works. Hopefully you'll see how cool this is. Returning to the file here, let's assume we now have enough documentation that we want to get started coding. Maybe we'll just start with a Hello World app, so we can make sure that our environment is set up correctly. Let's get started with a code block. I created a little snippet to help me add a source block for literate programming quickly. There's not much to it, but there are some important annotations here.

00:04:01.599 --> 00:04:55.360 Excuse me. There's a property called :tangle that takes a file name as its value. There's also a :noweb property, set here to no-export. We'll explain no-export a little more later; it has to do with how the noweb references are handled in the tangle step versus the weave step. The :tangle property simply tells Emacs which file to generate (here, main.go) and where to put it on the file system.

00:04:55.360 --> 00:05:21.520 You'll notice that we're going to use Go. That's just the language that I've been using the most lately, but this programming strategy is language-agnostic. You could use any language or any mix of languages. You could create some files in Python, some files in Go, some files in Lisp, or whatever you want.

00:05:24.720 --> 00:05:56.400 Let's create a little Hello World. I'll use another snippet here to generate the basics of a Go program, and I'm just going to print Hello World. Then let's make it a section in our file. So now you can see we've got this snippet.

00:05:56.400 --> 00:06:42.319 When you have a source block inside Org Mode, you can easily pop into a language-specific buffer by typing C-c ' (single quote). So now I have a buffer that's in go-mode and lets you edit like you normally would. If you hit C-c ' (single quote) again, it goes back, and any changes you made are updated in the Org file. But you can do quite a bit just inside of here too. There's quite a bit of language-specific functionality in place, so you don't always have to go over to a separate buffer. It's a nice option sometimes.

00:06:42.319 --> 00:07:12.240 Now that you have the code in here, you're going to want to run it. Right now, it just lives here in this documentation. You need to get a copy of it into a separate file, and that's the tangle process. So I'm going to drop in a little more documentation really quickly here.

00:07:12.240 --> 00:07:44.879 Just as a side note, I like to follow this process: whenever I have an operation to perform, I like to document it here with a snippet that can be executed inline. Then I don't have to leave Org Mode, and I don't have to try to remember what I did later. So instead of just performing an operation, the first time I do something, I take the time to figure out what it is and document it, so that it's recorded.

00:07:44.879 --> 00:08:14.400 So here we find that to do a tangle operation, you run the command org-babel-tangle, which is an Elisp command. If you hit C-c C-c to run it in place, you get the result of main.go, which basically is telling us that we've tangled one file called main.go. You can see that that's true if you go to the file system and you look.

00:08:14.400 --> 00:08:41.120 Now in our demo directory, we have a README.org, we have that PNG that we generated, but we also have a main.go. If you visit that file, you'll see that it's just the source code that was in our documentation, which is exactly what we expected and what we wanted. So that's good. So if we return to where we were at...

00:08:41.120 --> 00:09:43.012 Now we're at the point where we have a file on the file system. Now we need to build it and run it. So let's follow the same philosophy and document the operations we're going to perform. I'm dropping in a build instruction section and a run instruction section. As you can see, we have a little bash source block, and another bash source block. The first one compiles: the go build command compiles the code, and the executable that gets generated should be called demo. The second one just runs it. If I type C-c C-c on the build block, we get an empty results block. When you compile things, no news is good news: it means there are no errors.

00:09:43.012 --> 00:10:30.839 So presumably, we've created an executable called demo. Let's look at the file system again and refresh... Yep. We have a demo executable here, which is exactly what we wanted. Let's go back. Now we should be able to run it. C-c C-c, and we get Hello World as a result, which is exactly what we were expecting. So that's already pretty cool, that you can do that much.

00:10:33.040 --> 00:11:09.760 That's really just the tip of the iceberg. To really use the more impressive features of literate programming, we need to do a little bit more at least. Really, to get the full benefit of it, we need to add some sections that will cause Emacs to have to tangle or assemble this file from different pieces.

00:11:09.760 --> 00:11:36.240 Imagine that we wanted to take this file and templatize it. So, using literate programming syntax, this angle bracket syntax, let's say that we want to create an imports section, a functions section, and then a main section. We'll get rid of this.

00:11:36.240 --> 00:11:56.639 So now you see, we've created something that looks a little bit like a template or a scaffolding or outline for what our file is going to be. It looks a little bit like pseudocode. What we're going to have literate programming do is dynamically insert those things into those slots.

00:11:56.639 --> 00:12:36.639 So let's create a section called "Say Hello." We want to add some functionality that makes our program say hello. Using a different snippet that I have for creating what I call a literate section, we create another source block that's almost the same as the one for the file, with just a few differences. Say we want to drop code into the imports section, and we want it to be in Go.

00:12:36.639 --> 00:13:14.399 Here we use the same :noweb no-export syntax, but we've added :noweb-ref imports, which ties the imports slot to this block. It tells Emacs that when you tangle, you want to put whatever's in here in that spot. You skip the :tangle file name because you're not actually creating a file; you're putting code into an existing one. So here, we would just add "fmt" to the imports.

00:13:14.399 --> 00:14:10.320 Let's add another section for functions. Let's create a function called sayHello that takes no arguments and returns nothing. All it does is pretty much the same thing we did before: print something. Let's say "Hello EmacsConf" this time. Now we have a function, but it won't do anything unless we invoke it. Let's do one last literate section called main, make it a Go source block, and invoke that function.

00:14:10.320 --> 00:14:39.839 Now you can see that we've got our scaffolding outline, and then the sections that we want to get tangled, or inserted. I've used a += syntax here that's borrowed a little bit from literate programming; it just says that I want to append this item to the imports section. It's really just to make it a little clearer what's going on.
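The "append" reading of += can be sketched as a toy collector in Go. The Block type and collectRefs function are hypothetical simplifications, not Org's data structures. The sketch joins blocks that share a :noweb-ref name in document order with a newline, which I understand to be Org's default separator (adjustable with the :noweb-sep header argument; check the manual for your Org version). The second "os" import is invented purely to show two sections appending to the same slot.

```go
package main

import (
	"fmt"
	"strings"
)

// Block is a hypothetical, stripped-down stand-in for an Org source block:
// only its :noweb-ref name and its body are kept.
type Block struct {
	NowebRef string
	Body     string
}

// collectRefs groups block bodies by :noweb-ref name. Bodies that share a name
// are joined with newlines in document order, i.e. each one "+=" appends.
func collectRefs(blocks []Block) map[string]string {
	parts := make(map[string][]string)
	for _, b := range blocks {
		if b.NowebRef == "" {
			continue // a block without :noweb-ref cannot be referenced by name
		}
		parts[b.NowebRef] = append(parts[b.NowebRef], b.Body)
	}
	refs := make(map[string]string, len(parts))
	for name, bodies := range parts {
		refs[name] = strings.Join(bodies, "\n")
	}
	return refs
}

func main() {
	doc := []Block{
		{NowebRef: "imports", Body: `"fmt"`},
		{NowebRef: "functions", Body: "func sayHello() {\n\tfmt.Println(\"Hello EmacsConf\")\n}"},
		{NowebRef: "main", Body: "sayHello()"},
		// Hypothetical later section appending a second import.
		{NowebRef: "imports", Body: `"os"`},
	}
	refs := collectRefs(doc)
	fmt.Println(refs["imports"])
}
```

Printing the imports slot shows "fmt" followed by "os" on the next line: the later section is appended rather than replacing the earlier one, which is why small sections can be scattered through the prose next to the code they explain.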

00:14:39.839 --> 00:14:57.760 When you generate documentation, you won't see these particular property annotations, and so you won't know immediately that this section goes in the imports area. So I usually put a little bit of documentation on top there, so that it's easy to see.

00:14:57.760 --> 00:15:21.120 If this were very complicated, you'd probably put some documentation above it to explain what you were doing, maybe right here. You could picture yourself explaining a complicated algorithm up here and having a nice way to document it.

00:15:21.120 --> 00:15:28.045 So now that we've got that here in the documentation, we need to make sure that it's going to tangle properly.

00:15:28.045 --> 00:16:20.479 Your best friend at this point is a keyboard shortcut that lets you preview the tangled result. If you press C-c C-v C-v, it will create a new buffer with the expanded contents, and you can see here that the fmt import went to the right place, the function went to the right place, and the function invocation went to the right place. We're feeling good. You can nest these things many layers deep. If you went into the sayHello function, you could add more sections. Emacs will keep track of all that and tangle it for you, so you get a lot of freedom and flexibility in how you document things.
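Here is a rough Go model of what the preview does when references nest. The scaffold is my guess at the outline built in the talk (imports, functions, and main slots), and the greeting reference is hypothetical, added only to show a second level of nesting inside sayHello. The expand function, its error on undefined references, and its cycle check are choices made for this sketch, not a description of Org's internals.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// refOnlyLine matches a line that holds nothing but an (optionally indented)
// <<name>> reference, like the slots in the talk's outline.
var refOnlyLine = regexp.MustCompile(`^(\s*)<<([^<>]+)>>\s*$`)

// scaffold is a guess at the outline from the talk: three named slots.
var scaffold = strings.Join([]string{
	"package main",
	"",
	"import (",
	"\t<<imports>>",
	")",
	"",
	"<<functions>>",
	"",
	"func main() {",
	"\t<<main>>",
	"}",
}, "\n")

// expand recursively replaces reference lines with the expanded body of the
// named block, re-indenting each inserted line. seen holds the names currently
// being expanded so that a cycle is reported instead of recursing forever.
func expand(body string, refs map[string]string, seen map[string]bool) (string, error) {
	var out []string
	for _, line := range strings.Split(body, "\n") {
		m := refOnlyLine.FindStringSubmatch(line)
		if m == nil {
			out = append(out, line)
			continue
		}
		indent, name := m[1], m[2]
		ref, ok := refs[name]
		if !ok {
			return "", fmt.Errorf("undefined reference <<%s>>", name)
		}
		if seen[name] {
			return "", fmt.Errorf("cyclic reference <<%s>>", name)
		}
		seen[name] = true
		inner, err := expand(ref, refs, seen)
		delete(seen, name) // the block may be reused again outside this chain
		if err != nil {
			return "", err
		}
		for _, l := range strings.Split(inner, "\n") {
			out = append(out, indent+l)
		}
	}
	return strings.Join(out, "\n"), nil
}

func main() {
	refs := map[string]string{
		"imports":   "\"fmt\"",
		"functions": "func sayHello() {\n\t<<greeting>>\n}",
		"main":      "sayHello()",
		// Hypothetical second level of nesting inside sayHello.
		"greeting": "fmt.Println(\"Hello EmacsConf\")",
	}
	src, err := expand(scaffold, refs, map[string]bool{})
	if err != nil {
		fmt.Println("tangle failed:", err)
		return
	}
	fmt.Println(src)
}
```

Running it prints a complete main.go with the fmt import, the sayHello function, and the call in main, the same shape the talk's preview shows, even though the Println line sits two references deep.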

00:16:20.479 --> 00:16:57.645 So now that we've previewed it and we feel good about it, we need to tangle so that we get the file on the file system. So C-c C-c on the tangle block, and again just main.go comes back. C-c C-c on the build block, and no errors come back. Then, if we did this right, when we run the program we should get "Hello EmacsConf." So C-c C-c, and there's Hello EmacsConf. I think that's pretty cool, actually.

00:16:57.645 --> 00:17:23.280 So we've got the breadcrumbs of the process we've gone through to get to this point, this initial document that has some tangling in it. We have documentation for how to tangle, how to build, how to run. We've really built a nice foundation for moving forward on our project and a nice way of breaking things out and documenting further.

00:17:23.280 --> 00:17:38.640 The last piece that we need to take care of is the weave that I showed you in the diagram above. So one more time, we'll drop in some documentation, this time on how to weave.

00:17:38.640 --> 00:18:35.520 It's really just an export function; there's no separate weave command going on here. We're just going to export what we've got here into Markdown format. We're using org-gfm-export-to-markdown, which produces GitHub-flavored Markdown. You can use the other, more standard type as well. Hit C-c C-c. Now you see we've got a README file, and if you look in the file system, it's right there. If you open that file in something like ghostwriter, you can see that it has generated some documentation.

00:18:35.520 --> 00:18:48.559 It puts an index at the top. I usually turn that off. It's easy to do that by putting an option at the top of your Org file (for example, #+OPTIONS: toc:nil), but some people like to have an index.

00:18:48.559 --> 00:19:22.802 Here you can see that it has generated the page nicely, formatted the snippets well, put the diagram in there, and preserved this literate programming syntax, which is important because that's how we want to view the documentation. That's what the no-export value was there to preserve: no-export means that when you export, the noweb references are not expanded.
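no-export is one of several values of the :noweb header argument. The Go lookup table below records my reading of a subset of those values as the Org manual describes them; the names nowebOutcome, Action, and Outcome are hypothetical, newer Org versions add further values, and you should check the manual for your version. It is meant to show why the woven Markdown keeps <<imports>> literally while the tangled main.go gets the real code.

```go
package main

import "fmt"

// Action is what Org is doing with a source block (hypothetical model).
type Action int

const (
	Evaluate Action = iota // running the block, e.g. with C-c C-c
	Tangle                 // org-babel-tangle
	Export                 // weaving, e.g. exporting to Markdown
)

func (a Action) String() string {
	return [...]string{"evaluate", "tangle", "export"}[a]
}

// Outcome says what happens to <<name>> references in a block body.
type Outcome string

const (
	Expand Outcome = "expand" // replaced by the referenced code
	Keep   Outcome = "keep"   // left in place literally as <<name>>
	Strip  Outcome = "strip"  // removed from the output
)

// nowebOutcome is a lookup table for a subset of :noweb values, based on my
// reading of the Org manual; it is not Org's implementation.
func nowebOutcome(mode string, action Action) (Outcome, error) {
	switch mode {
	case "yes":
		return Expand, nil
	case "no":
		return Keep, nil
	case "tangle":
		if action == Tangle {
			return Expand, nil
		}
		return Keep, nil
	case "eval":
		if action == Evaluate {
			return Expand, nil
		}
		return Keep, nil
	case "no-export":
		if action == Export {
			return Keep, nil
		}
		return Expand, nil
	case "strip-export":
		if action == Export {
			return Strip, nil
		}
		return Expand, nil
	default:
		return "", fmt.Errorf("unknown :noweb value %q", mode)
	}
}

func main() {
	for _, action := range []Action{Evaluate, Tangle, Export} {
		outcome, err := nowebOutcome("no-export", action)
		if err != nil {
			fmt.Println(err)
			return
		}
		fmt.Println(action, outcome)
	}
}
```

For no-export this prints expand for evaluate and tangle and keep for export, which matches the behavior in the talk: the tangled Go file is complete, while the woven README still shows the readable outline.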

00:19:22.802 --> 00:19:43.600 Hopefully that makes more sense now. Now you can see all the documentation. I think it demonstrates a pretty useful feature that's inside of Emacs. Hopefully you'll have as much fun using that as I have. So thanks!

Saturday, Nov 28 2020, ~ 2:18 PM - 2:38 PM EST
Saturday, Nov 28 2020, ~11:18 AM - 11:38 AM PST
Saturday, Nov 28 2020, ~ 7:18 PM - 7:38 PM UTC
Saturday, Nov 28 2020, ~ 8:18 PM - 8:38 PM CET
Sunday, Nov 29 2020, ~ 3:18 AM - 3:38 AM +08
