REPLs in strange places: Lua, LaTeX, LPeg, LPegRex, TikZ
Eduardo Ochs - IRC: edrx, edrx http://anggtwu.net/, eduardoochs@gmail.com
Format: 60-min talk; Q&A: IRC
Status: Q&A to be extracted from the room recordings
Duration: 59:10 minutes
00:00.000 Intro
00:21.560 Diagrams
01:03.320 eev
02:51.360 Another figure
08:52.560 eev-wconfig, magic, and black boxes
10:44.240 Lua
16:10.960 Object orientation in Lua
19:19.823 My init file
20:31.000 LaTeX and LuaLaTeX
25:28.280 Manim
26:30.880 Generating diagrams from REPLs
31:03.240 Parsers
39:03.200 ELpeg1.lua
50:04.160 Building lists
Description
Many years ago, when I started programming, my dream was to write games. I failed miserably in that, but I became fascinated by languages, and I discovered Forth - that was perfect for writing languages whose syntax was as simple as possible. Then I switched to GNU/Linux and I had a series of favorite languages; at some point I discovered Lua, that became not only my favorite language but also my favorite tool for implementing other languages. One of the main libraries of Lua is something called LPeg, that lets "people" implement complex parsers in just a few lines of code using PEGs - Parsing Expression Grammars.
I've put the "people" in the last paragraph between quotes because for many years I wasn't included in the "people who can implement complex parsers with LPeg"… lots of things in LPeg didn't make sense to me, and I couldn't visualize how it worked. Something was missing - some diagrams, maybe?
The main tool for drawing diagrams in LaTeX is something called TikZ, that is HUGE - its manual has over 1000 pages. TikZ includes lots of libraries and extensions, and each one of these libraries and extensions extends TikZ's core language with some extra constructs.
I don't know anyone - except for a handful of experts - who knows what is the "core language" of TikZ, that lies, or that should lie, below all these extensions… all of my friends who use TikZ are just "users" of TikZ - they've learned some parts of TikZ by starting with examples, and by then modifying these examples mostly by trial and error. In particular, no one among my friends knows how styles in TikZ really work; styles are implemented using "keys", that are hard to inspect from a running TeX - see [1] - and I found the chapter on "key management" in the manual very hard to understand. It feels as if something is missing from it… some diagrams, maybe?
In my day job I am a mathematician. I work in a federal university in Brazil, and besides teaching I do some research - mostly in areas in which the papers and theses have lots of diagrams, of many different kinds, and in which people use zillions of different programs to draw their diagrams. Every time that I see those diagrams I think "wow, I need to learn how to draw diagrams like that!", but until a few months ago this seemed to be impossible, or very hard, or very painful…
This presentation will be about a point in which all these ideas intersect. I am the author of an Emacs package called eev, that encourages using REPLs in a certain way; Lua can be used in several different styles, and if we use it in a certain way that most people hate - with lots of globals, and with an implementation of OO that makes everything inspectable and modifiable - then it becomes very REPL-friendly; there is an extension of LPeg called LPegRex ([2], [3]), that I found promising but hard to use, so I rewrote some parts of it to make them more REPL-friendly, and to make it print its ASTs in 2D ASCII art. The core of my presentation will be about how I am using REPLs written in Lua to write grammars, parsers, and tools to generate many kinds of diagrams, and how I am using these diagrams to document both my own code and other people's programs - the parts of them in which some diagrams seem to be missing. My hope is that people will find these ideas easy to port to other languages besides Lua, to other tools for generating diagrams besides LaTeX - SVG, maybe? - and to other ways to use REPLs in Emacs besides eev. Some ideas in this presentation were inspired by the blog post [4].
[1] https://tex.stackexchange.com/questions/663740/alternative-to-edef-c-pgfkeys-a
[2] https://github.com/edubart/lpegrex
[3] https://github.com/edubart/lpegrex/blob/main/parsers/lua.lua
[4] https://ianthehenry.com/posts/my-kind-of-repl/
About the speaker:
I am this person here: http://anggtwu.net/eepitch.html
Discussion
Questions and answers
- Q: If you had to summarize what you were trying to say in 3 sentences or less, what would you say?
- A: Ouch! I would answer with a link... this one: http://anggtwu.net/eev-for-longtime-emacs-users.html#summarize-in-one-paragraph
Notes
- Magic is good as long as you have the option to look behind the scenes when you want! Imagine if all code was assembly language.
hi edrx! =) great talk
- Q: is the code available as a tarball perhaps? or not at all yet?
- A: I didn't create a git repo with the code yet because I don't have any idea if anyone would want to test it today... everything is made to be used with this interface, http://anggtwu.net/eepitch.html
- as I know very few people who use eev - and who already use that interface - I wanted to ask them if they'd be ok with installing some files in ~/LUA/ and ~/LATEX/, or if they really needed to use other directories, or what... the things that are to be installed in ~/LUA/ are in a tarball, but a few of the files require some files in ~/LATEX/. I'm preparing the LATEX/ directory of the tarball now and I'll announce it on the eev mailing list soon
- Dealing with diagrams with Emacs is tricky. Having documented examples of that is nice and would be helpful
- A: I guess that the ideas that I presented would be easy to adapt to SVG diagrams, and to some packages that use Javascript to generate their diagram... but I don't want to write the code for SVG and for js diagrams all by myself. what do you use - or what have tried to use - to generate diagrams?
- i've personally tried using a bunch of different tools but never found anything that fully clicked for me or was remotely pleasant to use. i guess 'draw.io' is decent, but something in Emacs would be awesome
- do you think that musa's way to making emacs run javascript could work for draw.io?
- hmm no clue tbh, worth trying to ask him. but i must say i'm not super enthused about embedding js in emacs
- having tried most things (from exwm to org-protocol, to devtools debug protocol, and what not), I've converged on small personal extension that loads across browsers, locally, and stay connected with Emacs via the very useful emacs-websocket package, to interact both with the internal state of the browser (windows, tabs, etc.) and intra-page
Transcript
[00:00:00.000] Intro
Hi! My name is Eduardo Ochs and the title of this talk is: REPLs in strange places - Lua, LaTeX, LPeg, LPegRex, and TikZ. I'm the author of an Emacs package called eev, and this is a talk at the EmacsConf 2023, that is happening in December 2023, at the internets.
[00:00:22.520] Diagrams
This is one of the examples of diagrams that we are going to see - let me show how I generate it... one second, I have to use a smaller font here... this is a file called ParseTree2.lua... let me go back to this block of tests again... and now if I run this... we get these outputs here at the right, and then in this line here it generates a PDF, and if I type f8 here it shows the PDF in the lower right window.
[00:01:03.920] eev
Let me start by explaining briefly what is eev. First: it is something that appeared by accident in the mid-90s - I explained this story in my presentation at the EmacsConf 2019... it's a package... it's an Emacs package that is part of ELPA... it has at least 10 users - those are the ones that I know by name... eev means `emacs-execute-verbosely'... eev is something that treats eval-last-sexp as the central feature of Emacs... eev blurs the distinction between programmers and users, and it replaces the slogan "users should not be forced to see Lisp", that is something that Richard Stallman told me once, by "users should see Lisp instead of buttons" and "new users should see Lisp in the first 5 minutes"... I'm going to show some examples of that soon. Eev uses code in comments a lot, and also tests in comments... I changed my way of presenting it and it became very REPL-centric in the last few years, in the sense that I start by explaining its main features by its support for REPLs... eev supposes that we want to keep executable notes of everything - I'm also going to show examples of this in a second... eev has lots of "videos for people who hate videos", and it tries to do everything with very little magic and without black boxes - I'm going to explain many of these things very soon.
[00:02:50.320] Another figure
This is a figure that I'm going to show in detail soon, that is about something important about Lua... the font is very bad now, so let me change the font... the figure is this one... and... what most people do when they visit a file with something interesting on it is that they just go there and they set a bookmark there, or they put the position in a register... but I prefer to keep links to everything that is interesting as elisp hyperlinks. So, for example, this is an elisp hyperlink to a file, that goes to this anchor here, and to this string after this anchor... this is a variant that opens that file in the window at the right - here... and this is a sexp that changes the font. I have a command with a very short name that does that, but I prefer to keep that as a one-liner. About the videos... we can see the list of first-class videos of eev by executing this, M-x find-1stclassvideos, or by running this alias here, M-x 1c... and then what we see is this... the first sexp here regenerates this buffer - so we can make a mess here and then run this and the original buffer is regenerated again in a clean way... each of these things here opens a buffer with information about a video... let me take a specific example here... this video here is about one of the ancestors of this talk, that is a library that I wrote for creating diagrams in LaTeX using a package called Pict2e using REPLs... anyway... the thing is that if we run a sexp like this one and we don't have a local copy of the video eev will try to download a local copy - and instead of doing that by asking something like "do you want me to download the local copy? Blah blah blah blah blah..." it simply opens a buffer like this, I mean, if we don't have a local copy yet it will open a buffer like this one, in which these things here in comments are links to the documentation... I mean, this thing here explains the idea of local copies of files from the internet... 
there are more details here, and here... and this is a script that we can execute line by line, so instead of this script being hidden behind the button that we just press after a question like "Do you want me to do something blah blah blah? Yes or no?" the script is visible here and we can execute it step by step... it creates a terminal with a shell here in the right window, and when we type f8 in one of these lines here the lines are sent... (...) so this is going to download a copy of the video... the wget says that I already have a copy of the video and its subtitles... and so on. And after getting a copy of the video we can run this sexp here and it displays the video. I said that eev has lots of "videos for people who hate videos", and the idea is that very few people are going to watch the videos in real time... and most of the people that I know - or: most of the people that are interested in eev in some way... they are going to watch just small sections of the video, and most of the time they're just going to read the subtitles of the video. So, for each one of the videos we have a page about the video... let me see if I have internet here... yes. This is a page... and usually these pages have a link to another page that has all the subtitles of the video... uh, wherever... in this one it's not so visible... but anyway, there are several ways of accessing the subtitles of the video, and one of the ways is by running this sexp here, that opens a file in Lua that is what I use to generate the subtitles. Anyway... by the way, these things... each one of these things here is a hyperlink to a position of the video, so if I type this the right way it goes to that position. Anyway, let me go back... also, the tutorials of eev... 
the "intros" of eev, that start with "find-" and end with "-intro", they have lots of blocks that say "[Video links:]", like this one, and these blocks have links to positions in videos, and if we don't have a local copy of the video yet the thing shows us a script that lets us download the local copy. Anyway, I said that I was going to explain what I mean by "magic" and "black boxes". This is something that I've been trying to explain for a long time, and I think that I got a very good explanation about that in a video that I made about something called eev-wconfig, that is a tool for configuring eev on Windows without "magic" - without buttons that do things without explaining what they're doing. This is a part of the subtitles of the video, let me read that... eev-wconfig is an attempt to solve the problem of how to install these things on Windows both without magic and with very little magic. Remember this slogan: "any sufficiently advanced technology is indistinguishable from magic". Here in this video I'm going to use the term magic as a shorthand for sufficiently advanced technology, that is something that is complex and non-obvious and that is indistinguishable from magic in the sense of being almost impossible to understand. And I'm also going to use the term "black box" as a near-synonym for magic, and sometimes the term "black box" is more convenient even though it's a bit longer - it has more letters - because when I use the term black box it invites us to use expressions like "opening the black box", and I'm going to use that expression a lot. Now let me try to explain what is... sorry, let me change the font... what is Lua. Lua is a minimalistic language, in the sense of "batteries not included"... it uses associative tables for most of its data structures... and it is so minimalistic that its default print function, when we tell... when we create an associative table and we ask it to print... 
when we ask "print" to print an associative table it just prints the address of the table. Here are some examples... here is a table, and when we ask "print" to print it it just says that it's the table at this address here. So, one of the things that most people do when they start using Lua is that either they download a package with a pretty-printing function or they write their own pretty-printing functions. My own pretty-printing function is called PP, with upper case letters, and it works like this... and it prints associative tables in a way like this. It says that for the key 1 the value associated to it is 2, for the key 2 the value is 3, and for the key 3 the value is 5. When I started using Lua one of my favorite languages was also a language that used associative tables a lot - it was called Icon... and I had to write my own pretty-printing functions for Icon, so I just had to port my pretty-printing functions to Lua... and my first version looked something like this... it just had some global functions... lots of them, actually... and after a while I rewrote it, and I rewrote it again, and again, and again, and this is one of the versions of that, it is not even the default at this point... "Tos" is for "to string"... and this is a demo... it's very modular, so it's easy to replace parts of it, or to toggle flags... and this is an example. If I try to print the table of methods for a certain class... I will need a smaller font... it prints the table like this, with the names of the methods and then links to the source code of the functions... these links only make sense in Emacs and in eev... and when we run a link like this one... it shows the source code in the window at the right. So, for some functions the source code is three lines, for other ones it's one line... and whatever. Anyway, let me go back... Lua can be used in many different styles... most people hate other people's styles... 
when I started using it in the year 2000 I learned most of the basic language in a single day - it was very similar to things that I was already using... and then I rewrote the mini-language that I was using to generate the HTML for my pages in Lua... actually I had to rewrite it many times, but the first version I certainly did in my first weeks or first months using Lua... In the beginning I was just using it for writing programs that either didn't take any input at all - because the input was already in the source file - or that worked as Unix programs, that would read files and process these files in some way and output something. I mentioned the "basic language" here... I only learned how to use closures, metatables, and coroutines many years later... in the beginning, when I started using Lua, it didn't have a package manager... it appeared later, it is called LuaRocks... it has had this package manager for several years, but most of the rocks for LuaRocks are poorly documented and hacker-unfriendly, so you can't rely just on the documentation and you can't rely just on the source code, because, I mean... if you are a genius of course you can, but for people who are either lazy, or dumb, or whatever, like me, or unfocused... the source code is hard to understand and hard to tinker with. Some rocks are excellent. The best rocks are well documented but they are hacker-unfriendly in a sense that I hope that I'll be able to explain soon. The best rocks use local variables and metatables a lot - so if you are a beginner learning Lua you're not going to understand what their source code does... they use lots of dirty tricks.
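The point made above about `print` and pretty-printing can be sketched as follows. This is not the PP function from the talk: `pp_tostring` is a hypothetical name, the output format only imitates the "key=value" description given in the talk, and the sketch assumes tables without cycles. The function is global on purpose, so that it can be called and redefined from a REPL.

```lua
-- Minimal pretty-printer sketch (hypothetical; not the PP from the talk).
-- Assumes tables without cycles.
function pp_tostring(value)
  if type(value) == "string" then
    return string.format("%q", value)   -- quote strings
  end
  if type(value) ~= "table" then
    return tostring(value)              -- numbers, booleans, nil, ...
  end
  -- Collect "key=value" strings and sort them so the output is stable.
  local items = {}
  for k, v in pairs(value) do
    items[#items + 1] = pp_tostring(k) .. "=" .. pp_tostring(v)
  end
  table.sort(items)
  return "{" .. table.concat(items, ", ") .. "}"
end

t = {2, 3, 5}
print(t)               -- prints only an address, something like "table: 0x..."
print(pp_tostring(t))  -- {1=2, 2=3, 3=5}
```

Sorting the rendered items is a design shortcut: `pairs` does not guarantee any traversal order, and sorting strings avoids having to compare keys of different types.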
[00:16:08.520] Object orientation in Lua
Let me talk a bit about object orientation in Lua. It can be done in many ways... the main book about Lua, called "Programming in Lua", by one of the authors of the language, Roberto Ierusalimschy, presents several ways of doing object orientation in Lua... I hated all of these ways - and also the ways that I tried from the rocks. And then I wrote my own way of doing object orientation in Lua... it's very minimalistic, it's in this file here, eoo.lua... the main code is just these five lines here... and here's an example of how it works. Here we define the class Vector, with some metamethods... this metamethod here will tell Lua what to do when the user asks to add two vectors, this one here tells Lua what to do when the user asks Lua to convert a vector to a string, and... whatever, this one is something that I'm going to explain in a second. So, here we create a vector with these coordinates, 3 and 4... here we create another Vector... if we "print" here then Lua uses this function here, in the tostring... if we add the two vectors it uses this function here, in the add metamethod, and if we run the method :norm... it is defined here, in the table __index. Anyway... Even with this thing being so small I used to forget how its innards worked all the time. Actually I always forget how things work and I have to remember them somehow... and I have to have tricks for remembering, and tricks for summarizing things, and diagrams, and so on. And every time that I forgot how this thing worked I went back to the source code, and then I looked at the diagrams... or, of course, in the first times I had to draw the diagrams... and I ran the examples, and of course in the beginning I thought that the code was clear and my examples were very brief, and so I had to rewrite the examples many times until they became, let's say... perfect. 
I was saying that Lua can be used in many ways, and in my way of using Lua - in my favorite way - everything can be inspected and modified from REPLs, like we can do in Emacs and in SmallTalk, or sort of. So, in my favorite way of using Lua there's no security at all, everything can be changed at all times. Of course most people hate that...
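The Vector example described above can be sketched like this. It is an illustration in the spirit of what the talk describes, and the actual contents of eoo.lua may differ; the names `Class`, `Vector`, `v` and `w` and the coordinates of the second vector are assumptions. Everything is global, so every piece can be inspected or replaced from a REPL.

```lua
-- A sketch of metatable-based OO (an assumption, not necessarily eoo.lua).
-- Calling a class on a table turns that table into an instance of the class.
Class = {
  type   = "Class",
  __call = function (class, o) return setmetatable(o, class) end,
}
setmetatable(Class, Class)

Vector = Class {
  type       = "Vector",
  -- Used when two vectors are added with "+".
  __add      = function (v, w) return Vector {v[1] + w[1], v[2] + w[2]} end,
  -- Used by print and tostring.
  __tostring = function (v) return "(" .. v[1] .. "," .. v[2] .. ")" end,
  -- Methods live in the __index table.
  __index    = {
    norm = function (v) return math.sqrt(v[1]^2 + v[2]^2) end,
  },
}

v = Vector {3, 4}
w = Vector {20, 30}
print(v)          -- uses __tostring
print(v + w)      -- uses __add, then __tostring
print(v:norm())   -- norm is found through __index
```

Because the class is an ordinary table, a method can be replaced at any time with an assignment such as `Vector.__index.norm = ...`, which is the "no security at all" property mentioned above.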
[00:19:19.120] My init file
My init file has lots of classes... by the way, instead of keeping many small files with many things I put lots of stuff in just one big init file. My init file has lots of classes, and lots of global functions, and lots of cruft - and people hate that, of course. This is an example... this is the index at the top of my init file, the classes start here, and then we have some functions, and then we have functions that load certain packages, and then we have... cruft. Whatever. Most people think that my style of using Lua is dirty, and dangerous... and they wouldn't touch my Lua code with a 10-foot pole... but most of the things that I'm going to present here in this presentation are ideas that should be easy to port to other environments and other languages, especially the diagrams... so the code is not so important.
[00:20:35.280] LaTeX and LuaLaTeX
Now let me talk a bit about LuaLaTeX, that is LaTeX with a Lua interpreter embedded inside, and two ways of generating pictures in LaTeX: TikZ, that is very famous, and Pict2e, that is not very famous and that is very low level... and I think that not many people use it. I said before that when I learned Lua I realized that it was very good for writing little languages. I was doing my PhD at the time and typesetting the diagrams for my PhD thesis was very boring, so one of the things that I did was that I created a little language for typesetting the diagrams for me. It was called Dednat because initially it only generated diagrams for Natural Deduction, and then it had several versions... these are the slides for my presentation about Dednat6... "Dednat6 is an extensible semi-preprocessor for LuaLaTeX that understands diagrams in ASCII art"... in the sense that when I have a .tex file that has this, and when Dednat6 is loaded, when I give the right commands Dednat6 interprets this block here as something that defines this diagram... oops, sorry, it interprets this diagram here, this diagram in comments here, as something that defines a diagram called foo... a deduction called foo, and it generates this code here... so that we can just invoke the definition of the deduction by typing \ded{foo}. And Dednat6 also supports another language for typesetting bidimensional diagrams with arrows and stuff for Category Theory and blah blah blah... the specifications of these diagrams look like this... here is a... sorry, here is a very good example, this is a huge diagram... sorry, one second... so, the source code that generates this diagram here is just this thing at the left, so it's very visual... we can typeset the diagram in ASCII art here and then in this part here we tell how the nodes are to be joined, which arrows have to have annotations, and so on... and this language is extensible in the sense that... uh, where's that... 
here: comments that start with "%:" are interpreted as definitions for tree diagrams, lines that start with "%D" define 2D diagrams with arrows and stuff, and lines that start with "%L" contain blocks of Lua code that we can use to extend the interpreter on-the-fly... anyway, here are some recent examples of diagrams that I used Dednat6 to typeset... this diagram here was generated by this specification here... and this diagram here with the curved arrows was generated by this specification here. So, Dednat6 was very easy to extend, and at some point I started to use it to generate diagrams using Pict2e - mainly for the classes that I give at the University... I teach mathematics and whatever... in a bad place. Whatever... Let me show an animation... here is a diagram that I generated with Dednat6, and it is a flip book animation, like... we type PgUp and PgDn and we go to the next page of the book and to the previous page of the book... and here is the source code that generates that. This source code is not very visual, so it's quite clumsy to edit that diagram directly in the .tex file like that...
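The three comment prefixes mentioned above ("%:", "%D" and "%L") suggest a simple dispatch on the start of each line. The sketch below only illustrates that idea; it is not Dednat6's code, and `prefix_kinds`, `classify_line` and the kind names are all made up.

```lua
-- Hypothetical sketch: classify lines of a .tex file by comment prefix.
-- Only the three prefixes come from the talk; the rest is illustrative.
prefix_kinds = {
  {"%:", "tree"},       -- definitions for tree diagrams
  {"%D", "diagram2d"},  -- 2D diagrams with arrows
  {"%L", "lua"},        -- Lua code that extends the interpreter
}

function classify_line(line)
  for _, pair in ipairs(prefix_kinds) do
    local prefix, kind = pair[1], pair[2]
    -- Compare plain substrings: "%" is a magic character in Lua patterns.
    if line:sub(1, #prefix) == prefix then
      return kind, line:sub(#prefix + 1)
    end
  end
  return "tex", line    -- anything else is ordinary TeX
end

for _, line in ipairs({"%L x = 1", "%D diagram foo", "\\section{A}"}) do
  local kind, rest = classify_line(line)
  print(kind, rest)
end
```

Keeping the prefix table global means a new kind of block can be added from a REPL with `table.insert(prefix_kinds, ...)`, without editing the function.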
[00:25:28.080] Manim
These diagrams were inspired by something called Manim, that... I forgot the name of the guy, but it's a guy that makes many videos about Mathematics, and he created this library called Manim for generating his animations, and other people adapted his library to make it more accessible... I tried to learn it, but each animation, even an animation that has very few frames... each animation took ages to render, so it wasn't fun... and animations in PDFs can be rendered in seconds. So these things were fun for me, because my laptop is very very slow, and Manim was not fun.
[00:26:24.360] Generating diagrams from REPLs
Anyway, writing code like this inside a .tex file was not very fun because it was hard to debug... so in 2022 I started to play with ways of generating these diagrams from REPLs, and I found a way for Pict2e and a way for TikZ... each one of these ways became a video... if you go to the list of first-class videos of eev you're going to see that there's a video about Pict2e here and a video about TikZ... here you have some information like length, an explanation, etc... and here are the pages for these videos. My page about the video about Pict2e looks like this, it has some diagrams... whatever... and this one is much nicer, and a lot of people watched that video... I mean, I think that 250 people watched it - for me that's a million people... and this video is about how to extract diagrams from the manual... from the TikZ manual and how to run those examples in a REPL and modify them bit by bit... this is a screenshot... but let me go back. At that point these things were just prototypes, the code was not very nice... and in this year I wrote... I was able to unify those two ways of generating PDFs, the one for TikZ and the one for Pict2e, and I unified them with many other things that generated diagrams. The basis of these things is something called Show2.lua... I'm not going to show its details now, but its extension that generates TikZ code is just this, so we can specify a diagram with just a block like this, and then if we run :show00() it returns a string that is just the body... the inner body of the .tex file, if we run this we see the whole .tex file, and if we run this we save the .tex file and we compile the .tex file to generate a PDF... and if we run this we show the PDF in the lower right window. And that's the same thing for all my recent programs that generate PDFs - they are all integrated... here is the one that... the basis for all my modules that generate diagrams with Pict2e... 
its demos are not very interesting, so let me show some demos of extensions that do interesting things... so, this is a diagram that I created by editing it in a REPL... I create several Pict objects here... and if I execute this it compiles an object, generates a PDF, and if I tap this... here is the PDF. And if I just ask Lua to display what is "pux", here, it shows the source code in Pict2e of the diagram... and the nice thing is that it is indented, so it's easy to debug the Pict2e code. If anyone is interested the module that does the tricks for indentation is very easy to understand... it has lots of tests and test blocks, and I think that its data structures are easy to understand. Anyway... here is another example. The :show() is here... it generates a 3D diagram.
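The steps described above for Show2.lua (get the inner body, wrap it into a whole .tex file, save it, compile it) can be sketched in plain Lua. None of the names below come from Show2.lua: `tikz_body`, `full_tex` and `save_tex` are hypothetical, and the preamble is just one reasonable choice.

```lua
-- Hypothetical sketch of a "body -> whole .tex -> file" pipeline.
-- None of these names come from Show2.lua; they are illustrative only.
tikz_body = [[
\begin{tikzpicture}
  \draw (0,0) -- (1,1);
\end{tikzpicture}]]

-- Wrap a diagram body into a complete LaTeX document.
function full_tex(body)
  return table.concat({
    "\\documentclass{standalone}",
    "\\usepackage{tikz}",
    "\\begin{document}",
    body,
    "\\end{document}",
    "",
  }, "\n")
end

-- Write the complete document to a file and return the file name.
function save_tex(fname, body)
  local f = assert(io.open(fname, "w"))
  f:write(full_tex(body))
  f:close()
  return fname
end

print(full_tex(tikz_body))
-- To save and compile (assumes lualatex is installed and on the PATH):
-- save_tex("/tmp/sketch.tex", tikz_body)
-- os.execute("cd /tmp && lualatex sketch.tex")
```

Splitting the pipeline into small functions that each return a value is what makes it pleasant in a REPL: each intermediate result (the body, the whole file) can be printed and checked before the slow compilation step runs.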
[00:30:56.440] Parsers
Now let me talk about parsers and REPLs in VERY strange places... I mean, using REPLs to build parsers step by step and replacing parts by more complex parts. So, I said that Lua is very minimalistic, and everybody knows that implementations of regular expressions are big and complex... so, instead of coming with full regular expressions Lua comes with something called "patterns" and a library function called "string.match". Here is a copy of the part of the manual that explains the syntax... a part of the syntax of patterns... here's how string.match is described in the manual - it's just this... "looks for the first match of pattern in the string as blah blah blah"... and then we have to go to the other section of the manual that explains patterns. Lua patterns are so simple, so limited, that they don't even have the alternation operator... here is how it is described in the elisp manual - backslash-pipe specifies an alternative, blah blah blah. When we want to build more complex... regular expressions, patterns, grammars, etc... we have to use an external library for that... no, sorry, a library that is external but that was written by one of the authors of Lua itself. This library is called Lpeg, and its manual says... "Lpeg is a new pattern matching library for Lua based on Parsing Expression Grammars (PEGs)". The manual is very terse, I found it incredibly hard to read... it doesn't have any diagrams - it has some examples, though... and the Lua Wiki has a big page called Lpeg Tutorial with lots of examples... but it also doesn't have diagrams and I found some things incredibly hard to understand. For example, this is something that is in the manual of Lpeg that I saw and I thought: "Wow, great! This makes perfect sense and is going to be very useful!"... it's a way to build grammars that can be recursive, and they sort of can encode BNF grammars... 
we just have to translate the BNF a bit to get rid of some recursions and to translate them to something else. And the manual also has some things that I thought: "Oh, no! I don't have any idea of what this thing does"... and in fact I saw these things for the first time more than 10 years ago and they only started to make sense one year ago. One example is group captures. Lpeg also comes with a module called the Re module... let me pronounce it as in Portuguese - the Re module... its manual says: "The Re module (provided by the file re.lua in the distribution) supports a somewhat conventional regular expression syntax for pattern usage within lpeg"... and this is a quick reference... this thing is very brief, it has some nice examples but it's hard to understand anyway... and here are some comments about my attempts to learn Re.lua. This is a class... in this case it's a very small class... this file implements a :pm() method - I'm going to show examples of other :pm() methods very soon - so, this is a :pm() method for Re.lua that lets us compare the syntax of Lua patterns, Lpeg, and Re... let's see this example here... so, if we run this it loads my version of lpeg... no, sorry, my version of lpegrex... and it shows that when we apply the :pm() method to this Lua pattern, this lpeg pattern, and this Re pattern they all give the same results. So we can use this thing... this kind of thing here to show how to translate from Lua patterns, that are familiar because they're similar to regular expressions, only weaker... to lpeg, that is super weird and to Re, that is not so weird. Anyway, the comment says that in 2012 I had a project that needed a precedence parser that could parse arithmetical expressions with the right precedences... and at that point I was still struggling with pure lpeg, and I couldn't do much with it, so I tried to learn Re.lua instead, and I wrote this old class here... that allowed me to use a preprocessor on patterns for Lua. 
And the thing is that with this preprocessor I could specify precedence grammars using this thing here, that worked, but was super clumsy... and I gave up after a few attempts. And in 2022 I heard about something called lpegrex, that was a... a kind of extension of Re, and it was much more powerful than re.lua, but after a while I realized that it had the same defects as re.lua... and let me explain that, because it has all to do with the things about black boxes and magic that I talked about in the beginning.
Both... I mean, sorry, neither re.lua nor lpegrex had some features that I needed... they didn't let us explore... sorry, they received a pattern that was specified as a string, and they converted that into an lpeg pattern, but they didn't let us explore the lpeg patterns that they generated... their code was written in a way that was REPL-unfriendly - I couldn't modify parts of the code bit by bit in a REPL and try to change the code without changing the original file... the code was very hard to explore, to hack, and to extend - in my opinion... the documentation was not very clear... and I sent one or two messages to the developer of lpegrex and... he was too busy to help me. He answered very briefly, and, uh, to be honest I felt... rejected. I felt that I wasn't doing anything interesting... whatever, whatever...
So, in 2022 I was trying to learn lpegrex because I was thinking that it would solve my problems - but it didn't... it didn't have the features that I needed, it was hard to extend, hard to explore, and hard to debug, and I decided to rewrite it in a more hacker-friendly way - in the sense that... it was modular, and I could replace any part of the module from a REPL...
[00:39:35.400] ELpeg1.lua
My version of it was called ELpeg1.lua... and I decided that in my version I wouldn't have the part that receives a grammar specified as a string and converts that to lpeg... I would just have the backend part, that are the functions in lpeg that let us specify powerful grammars.
Let me go back. Let me explain a bit about lpeg... Lua has coercions: the + expects to receive true numbers, and if one of its arguments, or both of them, are strings, it converts the string... the strings to numbers, so in this case here, 2+"3", it returns the number 5... and this is the concatenation operator... it expects to receive strings, so in this case it will convert the number 2 to the string "2", and the concatenation of these two things will be 23... oops, sorry, "23" as a string.
LPeg also has some coercions. I usually set these globals to let me write my grammars in a very compact way, so instead of lpeg.B, lpeg.C, etc I use these globals, like uppercase B, uppercase C, and so on... and with these globals I can write things like this: C(1)*"_"... and lpeg knows that lpeg.C... it sort of expands this to lpeg.C, but lpeg.C expects to receive an lpeg pattern, and 1 is not yet an lpeg pattern, so it is coerced into an lpeg pattern by calling lpeg.P, so this short thing here becomes equivalent to lpeg.C(lpeg.P(1)), and the multiplication, when at least one of its arguments is an lpeg pattern... it expects to receive two lpeg patterns, and in this case the one at the right is just a string, so it is coerced to an lpeg pattern by using lpeg.P.
With this idea we can sort of understand the comparison here. I mean, let me run it again... this first part is very similar to a regular expression here at the left... and when we apply this... Lua pattern to this subject here the result is this thing here, this thing, this thing and this thing... 
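The two kinds of coercion described above can be tried directly. This is a small sketch, assuming the `lpeg` module is installed (if it is not, the lpeg half is skipped); it uses local aliases where the talk uses globals, and the subject "a_" is made up.

```lua
-- Lua's own coercions: "+" turns strings into numbers,
-- ".." turns numbers into strings.
local sum = 2 + "3"
local cat = 2 .. "3"
assert(sum == 5)
assert(cat == "23" and type(cat) == "string")

-- The lpeg half needs the external lpeg module.
local ok, lpeg = pcall(require, "lpeg")
if not ok then print("lpeg is not installed; skipping the lpeg part") return end

-- Short aliases (the talk sets these as globals; locals are used here).
local C, P = lpeg.C, lpeg.P

-- The number 1 and the string "_" are coerced to patterns by lpeg...
local short = C(1) * "_"
-- ...so the short form behaves like the fully explicit one.
local long = lpeg.C(lpeg.P(1)) * lpeg.P("_")

print(short:match("a_"), long:match("a_"))
```

Both patterns capture the single character before the underscore, so the two printed values are the same.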
I'm going to call each one of these results "captures", so each of these things between parentheses "captures" a substring of the original string and these captured substrings are returned in a certain order. Here is how to express the same thing in lpeg... it's very cryptic but it's a good way to understand some basic operators of lpeg, I mean we can look at the manual and understand what C, S and R do, and also exponentiation... and this strange thing here receives this string here, runs a function that I have defined, that converts it to an object of a certain class, and that class represents Re patterns, so this thing is treated as a pattern for re.lua, and it is matched against the string, and it returns the same thing as the other one. Also, this thing here has a comparison with lpegrex, but these patterns are very trivial, they don't do anything very strange... so let's go back and see what kinds of very strange things there are.
Here is the page of lpegrex at GitHub, here's the documentation... it's relatively brief, it explains lpegrex as being an extension of Re.lua, so it explains mainly the additional features... here is a quick reference that explains only the additional features... some of these things I was able to understand by struggling a lot, and some I wasn't able to even by spending several evenings trying to build examples... and this is something very nice. Lpegrex comes with some example parsers... and here is a parser that parses the Lua grammar - I mean, this is the grammar for Lua 5.4 at the end of the reference manual... it's just this... this is in a kind of BNF, and this is the BNF translated to the language of lpegrex, so this thing uses many constructions that are in re.lua and some extra constructions that are described here... and with these examples I was able to understand some of the... of these things here that are described here in the quick reference - but not all. 
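The three-way comparison described above (a Lua pattern, an lpeg pattern, and a Re pattern giving the same captures) can be sketched as follows. The exact patterns shown on screen are not in the transcript, so the subject and the three patterns here are stand-ins; the sketch assumes that `lpeg` and its `re` module are installed and stops early if they are not.

```lua
local subject = "abc_123"

-- 1. Lua pattern: parentheses mark the captures.
local a1, b1 = string.match(subject, "^([a-z]+)[_%-]([0-9]+)")
print(a1, b1)

local ok, lpeg = pcall(require, "lpeg")
if not ok then print("lpeg is not installed; skipping") return end
local C, R, S = lpeg.C, lpeg.R, lpeg.S

-- 2. lpeg: R is a character range, S is a set of characters,
--    p^1 means "one or more p", and C(p) captures what p matched.
local patt = C(R("az")^1) * S("_-") * C(R("09")^1)
local a2, b2 = patt:match(subject)
print(a2, b2)

local okre, re = pcall(require, "re")
if not okre then print("re.lua is not installed; skipping") return end

-- 3. Re: braces mark captures, "/" is ordered choice.
local a3, b3 = re.match(subject, "{[a-z]+} ('_' / '-') {[0-9]+}")
print(a3, b3)
```

The three print calls show the same pair of captures, which is the kind of side-by-side check that the :pm() methods are described as doing.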
So, I wasn't able to use lpegrex by itself, because some things didn't make much sense, and I decided to reimplement it in my own style, because that would be a way to map... to at the very least map what I had understood and what I didn't, learn one feature at a time, do comparisons, and so on. Here I pointed to two features of lpeg... in one I said "Oh, great! This thing can be used to define grammars, even recursive grammars", and so on... and this is an "Oh, no!" feature - one thing that didn't make any sense at all... group captures.
One thing that I did to understand group captures was to represent them as diagrams. Of course in the beginning I was drawing these diagrams by hand, but then I realized that I could use the bits of lpeg that I already knew to build a grammar that would parse a little language and generate these diagrams in LaTeX, and I was able to make this. In this diagram here this thing above the arrow is Lua code... a piece of Lua code that specifies an lpeg pattern... this thing here at the top is the string that is being matched, and the things below the underbraces are the captures that each thing... sorry, that each thing captures. For example, this underbrace here corresponds to this pattern here, that parses a single character but doesn't return any captures, this thing here parses a single "b" and doesn't return any captures, this thing here parses a single character and captures it, and this thing here parses the character "d" and captures it... and this other thing here transforms this pattern into another pattern... returns first a capture with all the string that was parsed by this pattern here, and then all the captures returned by this thing here before the ":".
So, this was a way to build concrete examples for things that the LPeg manual was explaining in a very terse way, and it worked for me - some things that were very mysterious started to make sense, and I started to have intelligent questions to ask in the mailing list. 
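The diagram described above can be approximated in code. This is a reconstruction from the spoken description, not the pattern from the slide: it assumes the subject is "abcd" and that the pieces are a plain single character, a plain "b", a captured single character, and a captured "d", with `:C()` applied to the whole thing. It needs the `lpeg` module and stops early without it.

```lua
local ok, lpeg = pcall(require, "lpeg")
if not ok then print("lpeg is not installed; skipping") return end
local P, C = lpeg.P, lpeg.C

-- P(1)   matches any single character, no capture
-- "b"    matches a literal "b", no capture
-- C(1)   matches any single character and captures it
-- C("d") matches a literal "d" and captures it
local inner = P(1) * "b" * C(1) * C("d")
local i1, i2 = inner:match("abcd")
print(i1, i2)            -- the captures "c" and "d"

-- inner:C() is lpeg.C(inner): first the whole matched substring,
-- then the captures that inner already produced.
local outer = inner:C()
local o1, o2, o3 = outer:match("abcd")
print(o1, o2, o3)        -- "abcd", then "c" and "d"
```

Running small variations of this in a REPL, one factor at a time, is a text-only version of reading the underbraces in the diagram.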
And with that I was able to understand what group captures are, and group captures that receive a name... Well, let me explain what this does. This thing here captures... sorry, parses the empty string and returns this as a constant... so, this is something that doesn't exist in regular expressions... it parses nothing and returns this as a capture... then this thing here returns these two constants here, and parses the empty string, and this thing here converts the results of this thing here into a group capture, and stores it in the label "d"... and then here's another constant capture.
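The sequence just described (a constant capture, a named group holding two constants, another constant capture) can be sketched with `lpeg.Cc`, `lpeg.Cg`, `lpeg.Cb` and `lpeg.Ct`. The numbers 10, 20, 30 and 40 are placeholders, since the actual values on the slide are not in the transcript; only the label "d" comes from the talk. The sketch needs the `lpeg` module and stops early without it.

```lua
local ok, lpeg = pcall(require, "lpeg")
if not ok then print("lpeg is not installed; skipping") return end
local Cc, Cg, Cb, Ct = lpeg.Cc, lpeg.Cg, lpeg.Cb, lpeg.Ct

-- Cc(...) matches the empty string and produces its arguments as captures.
-- Cg(p, "d") groups the captures of p under the label "d"; a named group
-- produces no values by itself.
local p = Cc(10) * Cg(Cc(20, 30), "d") * Cc(40)

-- Matched directly, only the anonymous captures come out.
local x1, x2, x3 = p:match("")
print(x1, x2, x3)

-- Cb("d") brings back all the values stored in the group named "d".
local y1, y2, y3, y4 = (p * Cb("d")):match("")
print(y1, y2, y3, y4)

-- Inside a table capture, anonymous captures fill positions 1, 2, ...
-- and the first value of the named group goes into the field "d".
local t = Ct(p):match("")
print(t[1], t[2], t.d)
```

The group really does hold two values, as the back capture shows, even though the table capture keeps only the first of them under the key "d".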
[00:50:03.720] Building lists
And I realized that these things here were similar to how Lua specifies building lists... when we build... sorry, tables. When we build a table, and we say that the first element of the table is here, this element is put at the end of the table... when after that we say d=42... we are putting the 42 in the slot whose key is "d". This was happening with lpeg captures, but there was something very strange... these group captures could hold more than one capture - more than one value... so there was something between lists and tables. I started to use this notation to... explain in my notation what they were doing... many things started to make sense, many mysterious sentences in the manual started to make sense... but some didn't... but at least I was able to send some intelligent questions to the mailing list, and the author of Lua and lpeg answered some of them... he was not very happy about my questions - he... told me that those diagrams were a waste of time, the manual was perfectly clear, and so on... whatever - but I was able to... so, it was weird, but I was able to understand lots of things from his answers.
This is a copy of one of my messages, then there's another one, another one, some of them had diagrams... then he complained about these diagrams, he said that these things here, that look like table constructors, "do not exist"... whatever... anyway, once I understood group captures many features were very easy to understand and I started to be able to use lpeg to build some very interesting things... I was able to reproduce some of the features that I saw in lpegrex - remember that this... where is that? this is the syntax of Lua... here - I was able to understand how these things here were translated to lpeg code... to lpeg patterns by using group captures in a certain way... I was able to implement them in ELpeg1.lua... 
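The table-constructor analogy from the start of this section can be tried in plain Lua. This is only an illustration of how constructors behave; the values and the helper `two` are made up, and nothing here is the notation used in the talk's diagrams.

```lua
-- Positional entries are appended in order; "d = 42" fills the slot
-- whose key is "d" and does not take a position.
local t = {10, 20, d = 42, 30}
print(t[1], t[2], t[3], t.d, #t)

-- A function that returns two values, standing in for something that
-- produces more than one value at once.
local function two() return 20, 30 end

-- A keyed slot keeps only the first value of the call.
local keyed = {10, d = two(), 40}
print(keyed[1], keyed[2], keyed.d)

-- A call in the last positional slot contributes all of its values.
local spread = {10, two()}
print(spread[1], spread[2], spread[3])
```

A table slot always holds exactly one value, which is where the analogy stops: a group capture can hold several, hence "something between lists and tables".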
And after some time I was able to use ELpeg1.lua to build grammars that were able to parse arithmetical expressions with the right precedence... and here's an example in which I built the grammar step by step... and I test the current grammar, and I replace a bit, and then I test the new grammar and so on... and you can see that the result is always a tree that is drawn in a nice two dimensional way... At this point these powers here are returned as a list, as an operation "pow" with several arguments, here... and then I apply a kind of parsing combinator, here... that transforms these trees into other trees and with these combinators here I can specify that the "^" is associative in a certain direction... that the "/" is associative in another direction... the "-" uses the same direction as the "/", and so on... and they have the right precedences.
So, here are the tests... here is my file ELpeg1.lua... it has several classes, each class has tests after it... I was able to implement something that lpegrex has, that is called "keywords", that is very useful for parsing programs in programming languages... I was able to implement something similar to the debugger... to the lpeg debugger lpeg uses... I was frustrated by some limitations of the lpeg debugger, and I implemented my own that is, uh... much better!... Let me show something else... I was able to translate a good part of the Lua parser, here, to ELpeg1.lua... I haven't finished yet, but I have most of the translation here... and after having all that I was able to build other grammars very quickly... writing new parsers finally became fun.
And here's one example that I showed in the beginning. If I remember correctly... I took a figure from Wikipedia... I don't have its link now... but I specified a grammar that parses exactly the example that appears in Wikipedia... so, with my grammar, considering that the top level entry is "Stmt", when I parse this string here the result is this tree... 
and I can do some operations on that, I can define how this thing is to be converted into LaTeX, I can define other operations that convert trees into other trees, and here are some tests of these operations... This is what I showed in the beginning... I'm not going to explain all the details of this thing now... this :show() converts this thing into LaTeX in the way specified by these instructions here, that say that... well, whatever... and here's the result - the LaTeXed result... and these diagrams here are generated by this file here, that defines a simple grammar that parses this thing here, and then LaTeXes it in a certain way, and also tests to check if this code here... this Lua code that generates an lpeg grammar... parses this subject here and returns the expected result...
So: this is the code that I wanted to show. I wanted to show many more things but I wasn't able to prepare them before the conference... and I hope that soon - for some value of "soon" - I'll be able to create REPL-based tutorials for lpeg, Re, and ELpeg1.lua... where lpeg is something very famous, Re is a module of lpeg... I could also do something like this for lpegrex... and ELpeg1.lua is the thing that I wrote, the one that has tests in comments, and the tests usually generate trees, and sometimes they generate TeX code. Yeah, so that's it! I wanted to present much more but I wasn't able to prepare it... so: sorry, thanks, bye! =)
Questions or comments? Please e-mail eduardoochs@gmail.com