Haskell code exploration with Emacs
Yuchen Pei (he/him/himself/his/his, IRC: dragestil, id@ypei.org)
In this talk, Yuchen Pei demonstrates an Emacs package for exploring Haskell code and org documentation generated by a Haddock org backend. Afterwards, he will handle questions via BigBlueButton.
The following image shows where the talk is in the schedule for Sat 2022-12-03. Solid lines show talks with Q&A via BigBlueButton. Dashed lines show talks with Q&A via IRC or Etherpad.
Format: 30-min talk followed by live Q&A (done)
Etherpad: https://pad.emacsconf.org/2022-haskell
Discuss on IRC: #emacsconf-dev
Status: TO_CAPTION_QA
Saturday, Dec 3 2022, ~2:05 PM - 2:35 PM MST (US/Mountain)
Saturday, Dec 3 2022, ~1:05 PM - 1:35 PM PST (US/Pacific)
Saturday, Dec 3 2022, ~9:05 PM - 9:35 PM UTC
Saturday, Dec 3 2022, ~10:05 PM - 10:35 PM CET (Europe/Paris)
Saturday, Dec 3 2022, ~11:05 PM - 11:35 PM EET (Europe/Athens)
Sunday, Dec 4 2022, ~2:35 AM - 3:05 AM IST (Asia/Kolkata)
Sunday, Dec 4 2022, ~5:05 AM - 5:35 AM +08 (Asia/Singapore)
Sunday, Dec 4 2022, ~6:05 AM - 6:35 AM JST (Asia/Tokyo)
Talk
00:00.000 What is Haskell?
00:30.520 Parts of a Haskell program
01:33.640 Example of Haskell source code
02:13.400 Writing Haskell like Lisp
02:37.160 What is a code explorer?
03:53.760 Prior art
04:56.240 Haskell mode
05:46.080 Jumping to declarations
06:43.560 Finding references
07:24.840 The Haskell language server
08:20.520 Hoogle and Hackage
08:54.960 Haskell Code Explorer
09:34.600 Demo of Haskell Code Explorer
10:42.080 Learning about monads
12:35.480 Web client
13:39.920 User freedom
14:47.800 hcel
15:38.560 Demo
16:46.520 Declarations
17:38.920 Finding definitions and references
18:19.160 Eldoc
19:22.360 Searching for identifiers
20:32.560 Help buffer integration
22:01.440 Haddock
23:28.840 Servant
24:30.480 Org
25:50.320 Links
26:19.280 Navigation
28:41.160 Going the other direction
Q&A
01:42.120 Does it work with offline documentation?
03:50.720 What is the state of integration of Haskell with Emacs in 2022?
09:01.680 Have you tried any projects in literate Haskell?
12:51.360 Is the indexing faster when re-indexing? Would it be too slow to re-index on-demand?
Description
In this talk I will describe and demonstrate some tools for Haskell development using Emacs, including code exploration using hcel and haddock org documentation generation using haddorg.
Bio
Yuchen is a programmer and mathematician. He co-maintains librejs and h-node.org, and a couple of Emacs packages at GNU ELPA. He is also a licensing volunteer with the FSF. In his personal time, he likes to program in Haskell and elisp and license all of his programming work under AGPLv3+ (see https://g.ypei.me). He can be reached at id@ypei.org.
Discussion
Notes
Questions and answers
- Q: Is the indexing faster when re-indexing? Would it be too slow to re-index on-demand?
Transcript
[00:00:00.000] Today, I will talk about Haskell code exploration for Emacs. What is Haskell? It is a purely functional language. For example, every value in Haskell is immutable. The main compiler of Haskell is GHC. It provides an API for the whole compilation pipeline. For example, the tools mentioned in this talk, including hcel and haddorg, heavily utilize the GHC front-end API for parsing and understanding the identifiers in Haskell source files. Roughly speaking,
[00:00:31.544] a Haskell program consists of several parts. It begins with some front matter, including, for example, language extensions, which are optional language features one might want to use for convenience. The front matter also contains module exports. So for example, here we declare module F2Md.Config for this Haskell source file, which exports these four identifiers that other source files can use when importing F2Md.Config. Next comes a block of imports so that we can use libraries and the identifiers in these libraries. The bulk of a Haskell source file normally is a list of declarations, including values, types, instances, and so on. The difference between a value and a type is that the type of a value is a type, and the type of a type is a kind.
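The slide with F2Md.Config is not reproduced in this transcript, so here is a minimal sketch of the same layout. Everything in it is an assumption made for illustration: the module name Demo.Config, the LambdaCase extension, and the four exported identifiers are hypothetical and are not taken from the speaker's code.

```haskell
-- Front matter, part 1: an optional language extension.
{-# LANGUAGE LambdaCase #-}

-- Front matter, part 2: the module header with its export list.
-- Only these four identifiers are visible to modules that import Demo.Config.
module Demo.Config
  ( Verbosity (..)
  , Config (..)
  , defaultConfig
  , describe
  ) where

-- Imports: bring library identifiers into scope.
import Data.List (intercalate)

-- Declarations: two types ...
data Verbosity = Quiet | Loud
  deriving (Eq, Show)

data Config = Config
  { configName :: String
  , configVerbosity :: Verbosity
  }
  deriving (Eq, Show)

-- ... and two values (a function is a value too).
defaultConfig :: Config
defaultConfig = Config { configName = "demo", configVerbosity = Quiet }

describe :: Config -> String
describe cfg = intercalate ": " [configName cfg, level (configVerbosity cfg)]
  where
    -- \case comes from the LambdaCase extension enabled above.
    level = \case
      Quiet -> "quiet"
      Loud -> "loud"
```

After loading this in GHCi, `:type defaultConfig` reports the type `Config`, and `:kind Config` reports the kind `*` (also spelled `Type`), which is the value/type/kind layering mentioned above.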
[00:01:34.064] For example, here's a small block of Haskell source code. We define a Range type from a lower-end integer to a higher-end integer. We also declare a value r of the type Range, which is Range from 2 to 7. In Haskell, functions are curried by default, which basically means that we want to be able to utilize the partial application of functions. We don't require parens surrounding arguments
[00:02:17.384] when invoking a function. That makes it possible, if you want, to write Haskell like Lisp by adding a bit of redundant parens. So for example, here are two blocks of code, one Lisp, one Haskell, and they look quite similar to each other. What is a code explorer?
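The Range example and the Lisp-like style described above can be sketched as follows. This is a reconstruction under assumptions, not the code shown on the slides: the field type Integer and the helpers `fromTwo`, `square`, `sumOfSquares`, and `sumOfSquaresLispy` are hypothetical names chosen for illustration.

```haskell
-- A range from a lower-end integer to a higher-end integer.
-- (Integer is an assumption; the slide may use a different type.)
data Range = Range Integer Integer
  deriving (Eq, Show)

-- A value r of type Range: the range from 2 to 7.
r :: Range
r = Range 2 7

-- Functions are curried, so a constructor can be partially applied.
-- Hypothetical helper: fix the lower end at 2, leave the higher end open.
fromTwo :: Integer -> Range
fromTwo = Range 2

-- Idiomatic style: arguments follow the function without parentheses.
square :: Integer -> Integer
square x = x * x

sumOfSquares :: Integer -> Integer -> Integer
sumOfSquares x y = square x + square y

-- Lisp-like style: redundant parentheses and a prefix operator.
-- Compare: (defun sum-of-squares (x y) (+ (square x) (square y)))
sumOfSquaresLispy :: Integer -> Integer -> Integer
sumOfSquaresLispy x y = ((+) (square x) (square y))
```

Both spellings of the sum-of-squares function are equivalent; the second only adds parentheses that the compiler does not need.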
[00:02:38.000] A code explorer is a tool to browse a code base to aid code comprehension. A code explorer commonly comes with several functionalities or features, including a cross-referencer, which allows going to the definition of an identifier at point or looking up references of an identifier, like where it is used. The example in Emacs would be xref. A code explorer would also be able to show you documentation and signatures of identifiers at point. In Emacs, that would be eldoc. It also commonly allows you to search for identifiers. Something like that in Emacs could be describe-function and find-function. A code explorer is quite often implemented in two parts, the indexer and the server, where the indexer parses the source code files, indexes the identifiers, and stores the information about identifiers, like the definition sites and the occurrences, either in databases or in files. The other part is the server, which uses the database created by the indexer to serve the information about the identifiers. Before I present my solution to code exploring,
[00:03:57.104] some description of prior art is in order. There are several tools that you can use to aid code exploration, including tag-based tools like hasktags and hs-tags. The limitation with these tools is that they are focused on the current project only and do not work for cross-package references and definitions. Another problem with the tag-based tools is that they might not handle symbols with the same name properly. Sometimes they get confused, and they ask you to choose which definition site is the correct one, even though the symbol at point has only one definition, unambiguously. Another tool is haskell-mode.
[00:04:58.000] It has some limited support for eldoc by displaying the signature of an identifier at point, but the identifier has to be something that is commonly known or sort of built-in or comes from the base library of Haskell. So for example, it works for common functions like head and tail. And you can see that the signature is displayed here. However, it does not work for, let's say, IO. IO is a type. Maybe that's the reason. Let's find another function that's not from the base library. toJSON is from the Aeson library, so no signature is displayed here.
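To make the indexer/server split and the same-name problem of tag tables concrete, here is a toy model. It is only a sketch under assumptions: the types `Loc`, `IdentInfo`, `TagTable`, and `Index`, and all function names, are hypothetical, and it does not describe how hasktags, hcel, or any other real tool stores its data.

```haskell
import qualified Data.Map.Strict as Map

-- A source location: file path, line, column.
data Loc = Loc FilePath Int Int
  deriving (Eq, Ord, Show)

-- What the indexer records for one resolved identifier.
data IdentInfo = IdentInfo
  { identName :: String
  , definedAt :: Loc
  , occurrences :: [Loc]
  }
  deriving (Eq, Show)

-- A tag table knows names only, so one name may map to several sites.
type TagTable = Map.Map String [Loc]

-- A position-aware index maps each occurrence to exactly one identifier.
type Index = Map.Map Loc IdentInfo

-- "Indexer" side: build the stored structures.
buildTags :: [IdentInfo] -> TagTable
buildTags infos =
  Map.fromListWith (++) [(identName i, [definedAt i]) | i <- infos]

buildIndex :: [IdentInfo] -> Index
buildIndex infos = Map.fromList [(loc, i) | i <- infos, loc <- occurrences i]

-- "Server" side: answer queries from the stored structures.
tagCandidates :: TagTable -> String -> [Loc]
tagCandidates tags name = Map.findWithDefault [] name tags

definitionAt :: Index -> Loc -> Maybe Loc
definitionAt index loc = definedAt <$> Map.lookup loc index

referencesAt :: Index -> Loc -> [Loc]
referencesAt index loc = maybe [] occurrences (Map.lookup loc index)

-- Sample data: two unrelated identifiers that share the name "lookup".
sample :: [IdentInfo]
sample =
  [ IdentInfo "lookup" (Loc "A.hs" 3 1) [Loc "Main.hs" 10 5]
  , IdentInfo "lookup" (Loc "B.hs" 8 1) [Loc "Main.hs" 20 5]
  ]
```

With this data, `tagCandidates (buildTags sample) "lookup"` yields two candidate sites, so a name-only tool has to ask the user, whereas `definitionAt (buildIndex sample) (Loc "Main.hs" 10 5)` resolves to the single definition in A.hs because the query carries the position of the occurrence.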
[00:05:47.000] It also provides some sort of goto-declaration functionality to jump to any declaration in a file. To do that, one has to first run haskell-decl-scan-mode to enter this minor mode. Then we can run imenu to go to any declaration, like getHomeR. Apparently, after running that, we are able to go to definitions. So for example, let's see, we want to find the definition of getCityJR. And indeed, it works if it's within the same source file, of course. It still does not work for cross-package identifiers. So HandlerFor is probably an identifier from servant. Or no, not necessarily servant. Maybe WAI. Anyway, it's another library. And how about find-references?
[00:06:50.504] find-references also works somehow for this file. How about WidgetFor? It works for WidgetFor too. It has some support for goto-definition and find-references. But as usual, it does not support such things cross-package.
[00:07:26.000] And finally, we have the sledgehammer: HLS, the Haskell Language Server. It can be used with Eglot. HLS has many, many features because it is a language server, like renaming, eldoc for standard libraries, and so on. But the problem with HLS is, one, that it is very, very slow. And I wouldn't use it with my laptop. And two, it also does not support cross-package referencing. In fact, there's an outstanding GitHub issue about this. So cross-package referencing and goto-definition is sort of a common shortfall, a common problem for these existing Haskell code explorers.
[00:08:21.000] Then finally, we also have Hoogle and Hackage. Hoogle is a search engine for Haskell identifiers, and the results link to Hackage, which is the Haskell documentation website for all Haskell libraries. Hackage has functionality where you can jump to the source code file rendered in HTML, and you can click on the identifiers there to jump to definitions, but it does not support finding references, and it is rather basic.
[00:08:59.000] Then I learned about haskell-code-explorer, which is a fully-fledged Haskell code explorer. It is written by someone else. It is a web application for exploring Haskell package codebases. The official reference instance for haskell-code-explorer is available at this URL, which I will demo soon. What did I do with this package? I ported it to GHC 9.2. I renamed it to hcel because I want to focus on Emacs clients rather than JavaScript clients, which I will explain later. And I also wrote an Emacs client package, of course.
[00:09:37.000] This is what haskell-code-explorer looks like. On the homepage, there is a list of packages indexed by the indexer. One can filter it by the package name or look for identifiers directly across all packages. Let's have a look at base. There are three versions. Let's have a look at the latest version, 4.12.0.0. Once entering the package view, you are shown a list of all modules by their path, as well as a tree of these module files. You can filter by module name or file name, or you can search for an identifier within the same package or in all packages. Let's say we want to learn about Control.Monad.
[00:10:43.304] Now we are in the module view. The source file is presented to you, and it has links to identifiers. When you hover over them, the documentation shows up, including the signature and where it is defined. You can go to its definition or find references. Let's say we want to go to the definition of Monad. It jumps to the definition site of the Monad type class. If we click at the definition site, it brings up a list of references. On the left, you can choose which package you want to find references of Monad in. Let's look at a random one, avwx. Here is a list of results where Monad is used in avwx. This is a module path. One can go to any of these results. We can search for things in all packages or in the current package. Let's say I want to search for "Read". I think this is the "Read" that is commonly used in Haskell, the Read type class for parsing strings into values. I think that is more or less it. That is the Haskell Code Explorer web application in all its glory.
[00:12:38.304] Let's go back to the slides. That was the web application, which is basically a JavaScript client that talks to the server by sending requests and receiving and parsing the JSON results or JSON responses. Initially, I was interested in hacking the web client. It uses the ember.js web framework. The first thing to do was to npm install ember-cli. It gives me 12 vulnerabilities, 4 low, 2 moderate, 3 high, 3 critical. I don't know how often it is the case when we don't really care about these nasty vulnerabilities from Node.js or npm because they are so common. I don't quite like that.
[00:13:41.144] Another reason for favoring Emacs clients over JavaScript clients is user freedom. Emacs is geared towards user freedom. It allows users maximum freedom to customize or mod Emacs. I think Emacs clients can be a way to fix JavaScript traps, like using user scripts to replace non-free JavaScript. There are tools to do that, for example, Haketilo. Why write a JavaScript replacement if we can write an Elisp replacement? If we write all kinds of front-ends in Emacs for commonly-used web applications like Reddit, Hacker News, what have you, then we have an Emacs app store where we can just install these applications and browse the web more freely.
[00:14:51.184] Back to hcel, which is the Emacs client I wrote. I tried to reuse as many Emacs built-ins as possible, including eldoc for showing documentation, xref for the cross-referencer, compilation-mode for showing search results of identifiers, outline-mode for a hierarchical view of package, module, and identifiers, cursor-sensor-mode for highlighting identifiers, help-mode for displaying quick help for Haskell identifiers, integration with haddorg, which I will mention later, etc. It is available as hcel without the dot on GNU ELPA. Time for a demo.
[00:15:40.184] To start using hcel, surprise surprise, we run the hcel command. We are presented with a list of packages indexed by the hcel indexer. This is outline-mode, so we can tab to list all the modules represented by the module path. We can further tab into the list of identifiers declared in this module. Now it asks whether you want to open the module source. This is because some module source code can be quite large and it can take a bit of time. In this case, Control.Monad is quite small, so let's say yes. We see the list of identifiers. One can jump to an identifier from here. As you can see, the identifier at point is highlighted. This can be particularly useful in a large function declaration where you want to see, for example, all the occurrences of an identifier inside the body of the declaration.
[00:16:48.000] These are declarations which in Haskell mode are listed in imenu. We can do the same here in hcel source mode. It lists all the declarations with their signature. Let's say we want to jump to this funny operator. It worked and you can also go back and forth within the declarations by pressing "n" and "p". Similarly, you can do something similar in the outline mode by toggling the follow mode, just like in org-agenda. Let's turn it off.
[00:17:40.224] Now, how about finding definitions and references? Using xref, we can jump to the definition of Int and jump back. Jump to Maybe, jump back. Let's have a look at references of replicateM. There are plenty of them. Maybe we want to check out ghc-lib. Here are all the references and you can of course jump to any of them in the results. Cool. You may have already noticed
[00:18:21.864] the eldoc displaying the documentation and signature of identifiers. For example, here it shows the signature of replicateM, where it is defined, and its documentation. We can bring up the eldoc buffer. In the eldoc buffer, there are also links to other identifiers, which take you to the definition of these identifiers, like minBound. Apparently, this is not working. I'm pretty sure it maybe works. Let's go to Nothing or just... I think those didn't work because the module source for those identifiers is not open.
[00:19:24.144] Of course, you can search for any identifier across all indexed packages by invoking hcel-global-ids. Let's say we want to search for Read. We are presented with a list of results, which are identifiers starting with Read with capital R. They also show where they are defined and the documentation, just like in eldoc. One can also directly jump to the identifier in the minibuffer results. For example, we want to check out this Read2 defined in base-4.12.0.0 Data.Functor.Classes. There we go.
[00:20:34.000] Another functionality of hcel is the help buffer integration. We can do hcel-help and then let's say we want to learn about the Read type class. This is a help buffer and you can jump to other definitions within the help buffer to read the documentation, like readsPrec. It says: Server version cannot be satisfied. Actual version. This means we need to tell hcel that the server has the correct version. hcel-fetch-server-version. Wait a bit for it to update the knowledge of the server version. Now you can follow the links, Read, readsPrec. You can do "l" and "r" to navigate within the history. ReadS, ReadP. Just like in the help buffer for elisp code, you can jump to the definition. I believe that is everything, more or less. That concludes the demo.
[00:22:05.000] Now let's turn to haddorg, which is an Org backend for Haddock. Haddock is the documentation generator for Haskell packages. For example, on the official Haskell package documentation website Hackage, all the documentation is generated by Haddock into the HTML format. Haddock has several backends that convert the intermediate representation called interface to various output formats, including HTML, LaTeX, and Hoogle. HTML is the main format with a lot of features. LaTeX is less so, and I don't think it is widely used. Let's have a look at an HTML example. This is a PDF because these HTML files can be rather large and slow down EWW significantly. It's faster to convert it to PDF and read it in pdf-tools. Looks like this is as big as it goes. I hope you can still see it. Can I still enlarge it a bit more? Maybe.
[00:23:30.144] This is Servant.Server. It is a module in the servant-server package. It is a widely used package for writing servers. It starts with a heading, which is the name of the module, and the table of contents. Then a heading: Run a wai application from an API. Under this heading, there are all the relevant identifiers that are concerned with running a WAI application from an API, including serve, which is one of the main entry points for Servant.Server. It has a signature linkable to the other identifiers, the documentation, and an example with a Haskell source code block. That's what the HTML output looks like.
[00:24:31.000] As I mentioned, there are several downsides or drawbacks with that, like the HTML files can be huge and slow down EWW. Also, every module is an HTML file of its own, and there's also an HTML file for the package with a list of all the modules. Whereas the Org backend is better in that it is much more compact. All the modules under the same package are included in one Org file as sub-headings, level 2 headings. So, servant-server, Servant.Server, that is the module. So basically, this level 2 heading contains all the information in this PDF. Run a wai application from an API, serve. It has a signature that links to other identifiers and the documentation that's also linkable. The Haskell source block is now an Org source block, and you can do all sorts of interesting things with it using org-babel.
[00:25:52.744] Let's check the links as server. Right, so the link works. Application, right, Request. It also supports cross-package linking, so following the link to Request takes us from the servant-server package Org documentation to the WAI Org documentation.
[00:26:24.784] Another nice thing with Org documentation is that you can use Org functions like org-goto to jump to any identifier. Let's say we want to jump to Application. We have toApplication. So it jumps to toApplication. I guess Application is not an identifier, yes, it is more like a type alias, that's why we couldn't find it. So that is haddorg. And of course, I implemented a bit of integration between haddorg and hcel so that we can jump from one to the other. Let's go back to servant. Let's see, ServerT. Maybe we want to check out the source code definition of ServerT, to find out exactly what sort of type alias (or type synonym) it is. We run hcel-identifier-at-point-- sorry, hcel-haddorg-to-hcel-definition... Oh, we have an HTTP error. Typ ServerT not found in module src/Servant/Server.hs. Why? Well, this is because the hcel server only has knowledge of identifiers that are defined in the original source file. So, it is not aware of, say, identifiers that are re-exported in the module. Most likely, the Servant.Server module re-exports ServerT from another module. We will probably have better luck looking into some internal modules like this one. Let's try this type class HasContextEntry. So this time it worked.
[00:28:42.000] And, of course, we can go the other direction from hcel to haddorg. Let's say we want to display NamedContext in the haddorg documentation so that we can read about other identifiers' documentation that is related to NamedContext. We do hcel-identifier-at-point-to-haddorg. And it does take us to the servant-server Org file. Okay. And that concludes my presentation. You can find hcel in GNU ELPA, and you can also find the source code, as well as the source of haddorg and instructions on how to generate Org documentation using haddorg, in my cgit instance. Thank you for your attention. I hope you enjoy the rest of the conference. Thank you.
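The re-export situation described above can be illustrated with a small module. This is a generic sketch, not Servant code: the module name Demo.Facade and the function `wrap` are hypothetical, and `Identity` from base merely stands in for an identifier like ServerT that a module exposes without defining it.

```haskell
-- A module that exports one identifier it defines and one it only re-exports.
module Demo.Facade
  ( Identity (..)  -- re-exported: defined in Data.Functor.Identity, not here
  , wrap           -- defined in this file
  ) where

import Data.Functor.Identity (Identity (..))

-- The only declaration whose definition site is in this source file.
wrap :: a -> Identity a
wrap = Identity
```

Code that imports Demo.Facade can use both `Identity` and `wrap`, but a lookup that searches only this source file for definitions finds `wrap` and not `Identity`, which matches the "not found in module" error seen in the demo.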
Captioner: anush
Questions or comments? Please e-mail id@ypei.org