Back to the talks Previous by track: So you want to be an Emacs-fluencer? Next by track: Gypsum: my clone of Emacs and ELisp written in Scheme Track: General

Transducers: finally, ergonomic data processing for Emacs!

Colin Woodbury (he) - https://x.com/@fosskers , @fosskers@m.fosskers.ca on Mastodon, https://www.fosskers.ca

Format: 27-min talk ; Q&A: BigBlueButton conference room
Etherpad: https://pad.emacsconf.org/2024-transducers
Status: TO_CAPTION_QA

Talk

00:00.000 Intro 00:41.520 What are transducers? 03:27.590 Common issues 05:47.280 Transducers 07:35.280 Using transducers 09:52.625 A more involved example with comp 11:49.333 In Emacs 14:29.469 Hash tables 14:58.040 Clarity 15:55.800 How do transducers work? 20:00.520 Transducers in the wild - CSV 26:03.240 Issues and next steps

Duration: 26:51 minutes

Q&A

01:09.920 Q: When I tried comparing transducers.el to cl-lib and dash (benchmark-compiled), I got the following results 05:40.840 Q: Do you know of any theoretical texts on transducers? 07:04.720 Q: Did you think about [compiler features, macros] viz your cl, fennel, elisp, porting of your transducers? 08:16.579 Q: Does t-buffer-read provide a lazy stream that\'s linewise, or charwise, or do something else entirely? 09:09.424 Q: Can the Elisp library be combined with the stream.el API or seq in general? 11:47.543 Q: How does one debug a t-comp expression? Can you single step and see intermediate results of the different statements you declare? 14:42.495 Q: Is there a path for transducers to enable elisp processing of otherwise overly large datasets as if just normal Emacs \"buffers\" (i.e. just pulling one thing at a time so essentially stream-like under the hood but buffer-like in interface), with none of the usual perf issues with a traditional buffer structure? 16:51.200 Q: Is there an option to read a csv/json and produce an alist or plist instead of a hash table for an entry? 17:50.520 Q: Is the common lisp version ready for 'production' use? Is it complete enough and the API stable enough? 18:17.477 Q: Do we need a pre-written \"t-\" version for every already existing reducing function like + or is there a function to construct them from already defined reducer 2-arg functions? 20:26.320 Q: Is the compelling argument for transducers is that it's a better abstraction?

Listen to just the audio:
Duration: 25:24 minutes

Description

Transducers are an ergonomic and extremely memory-efficient way to process a data source. Here "data source" means simple collections like Lists or Vectors, but also potentially large files or generators of infinite data.

Transducers…

  • allow the chaining of operations like map and filter without allocating memory between each step.
  • aren't tied to any specific data type; they need only be implemented once.
  • vastly simplify "data transformation code".
  • have nothing to do with "lazy evaluation".
  • are a joy to use!

In this talk, Colin will introduce Transducers, show how to use them, and demonstrate some Emacs-specific workflows that make live processing of large data sets in JSON and CSV a breeze.

About the speaker:

Colin has been active in the FOSS world since 2011, publishing libraries and applications primarily in Haskell and Rust. Since 2023 he has been using Lisps more and more, and after falling in love with Transducers from Clojure has ported the pattern to three other Lisps.

Colin is originally from Canada and lives in Japan.

Discussion

Questions and answers

  • Q: When I tried comparing transducers.el to cl-lib and dash (benchmark-compiled), I got the following results: cl-lib: 0.5 sec, 2 GCs dash: 0.3 sec, 1 GC, transducers: 0.75 sec, 2 GC cl-loop: 0.025 sec, 0 GC (note: 0.025, one order-of-magnitude faster) I expected transducers to be slower than cl-loop but faster than the cl-lib or dash.  However this isn't the case.  Any idea why? (benchmark is here: [https://old.reddit.com/r/emacs/comments/1h5c778/which_emacsconf_2024_talks_have_your_attention/m061dge/](https://old.reddit.com/r/emacs/comments/1h5c778/which_emacsconf_2024_talks_have_your_attention/m061dge/))

    • (benchmark-run-compiled 1000  (cl-loop for n from 1 below 2000 by 2           sum (* n n) into total           finally return total))
    • A: Loop is always going to win in cases like this where we are not doing two nested (e.g.) change calls, what I called comp steps.  tranducers shines while we need to do things which chain things together; we can sometimes express ourselves more clearly vs loop.  this may sound sounds like moving the goal-posts: it's really about internal function calls, avoiding stepping though each loop in ways which loop doesn't need to do, so loop might "win".
    • Note: I'm comparing against cl-lib and dash -- the cl-loop is only for reference. I'm curious about the performance gap between transducers and cl-lib/dash.  The number of function calls is the same for cl-lib and dash and transducers.
  • Q: Did you think about generators as a source cf lists, vectors, etc? Maybe I got a word wrong. After the development that generators and Series operations needed-each-other, not being redundant as had been thought. I forgot who the generators person was in lisp.

    • A: (not yet answered)
  • Q:  Do you know of any theoretical texts on transducers?

  • Q: Waters (lazy series in lisp, late 70s) said that this *should have* been done as an additional compiler feature in compilers, but if not it must be a macro package. Did you think about that viz your cl, fennel, elisp, porting of your transducers?

    • A: I think my work could help provide the basis for this; there's much more to be done.
  • Q: Does t-buffer-read provide a lazy stream that's linewise, or charwise, or do something else entirely?

    • A: t-file-read
  • Q: Can the Elisp library be combined with the stream.el API (https://elpa.gnu.org/packages/stream.html)?  Or seq in general?

    • A: I'd say these libraries are completely orthogonal. (Re: what made me ask this question was seeing `t-buffer-read' and hoping that this doesn't convert a buffer into a string)  With seq/generics it should just work: magic of defgeneric
  • Q: How does one debug a t-comp expression? Can you single step and see intermediate results of the different statements you declare?

    • A: In Common Lisp it is wonderful. In Emacs Lisp it is more complicated. I don't know if step debugging would work, but there is a "log" (?) function to print out values.
  • Q: Is there a path for transducers to enable elisp processing of otherwise overly large datasets as if just normal Emacs "buffers" (i.e. just pulling one thing at a time so essentially stream-like under the hood but buffer-like in interface), with none of the usual perf issues with a traditional buffer structure?

    • A: Possibly so yes
  • Q: Re the performance issues mentioned and that popped up recently in the lists/forums, to what extend does tailcall optimization and other mechanisms (incl. inlining, GC-friendliness, etc.) could eventually alleviate issues and enable their use at little to no extra cost?

    • A: Over time, with further work and optimizations. Some already there (tailcall notably)
  • Q: Is there an option to read a csv/json and produce an alist or plist instead of a hash table for an entry?

    • A:  Absolutely.
  • Q: Is the common lisp version ready for 'production' use? Is it complete enough and the API stable enough?

    • A: I use it all the time. I use it extensively. Programming a game, not realizing a dent in the frame rate.
  • Q: Do we need a pre-written "t-" version for every already existing reducing function like + or is there a function to construct them from already defined reducer 2-arg functions?

    • A: already defined. This is basically fold. Some built-in functions like plus already function like reducers. It's a coincidence that they do that. But there's an example in the README. Max is one that does not act like that. For instance, maybe I could screen share later, but if you just type in plus one, If you call plus one in Emacs or Common Lisp, you get back one. It actually only needs one argument. If you only type plus, it actually gives you zero. Plus and multiple satisfy the API of reducers. But if you have one that doesn't, like the max function, and similarly, just type in plus as a function call, just plus with nothing else, and you'll see. No, as a function. zero will come out. This basically means it satisfies the reducer API. But a function like max does not. If you just type in max and then one, it won't work. Pardon me, it did. But if you type in max with nothing else, it wouldn't work. Hence, we have to wrap it in something like fold. I would say go look at the fold function.
  • Q: Is the compelling argument for transducers is that it's a better abstraction? Seems like a lot of concerns/objections (while pragmatically valid) are focused on implementation. Can this abstraction allow for advances in implementation?

    • A: Yes, what I've basically done is mostly followed the pattern of usage that exists in Clojure and in Scheme's SERP 171. In theory, the service level API is the same no matter where you're using this, and that's the idea. If you learn them in one list, you should be able to use them everywhere. Then what it's actually doing under the hood is free for us to change around. My implementations are mostly based on the scheme with a few alterations here and there. And in the Common Lisp case, like adding some Common Lisp isms to improve usage like UX a little bit. But overall, we are free to do whatever we want internally to speed up performance. I just haven't done that work.
  • Q: is there a path for transducers to enable elisp processing of otherwise excessively huge data ets as if just a normal Emacs "buffer" (i.e. but just pulling one thing at a time so essentially stream-like under the hood), with none of the usual issue when a traditional buffer structure?
    • (not yet answered)
  • Q: So the "reducer API" means that the function accepts a variable number of arguments in a monoidal way?
    • that's what I gathered
  • Q: From your investigations and tests so far, do you think there would be the necessity of transducers to eventually go down into the C level code for things like using them to solve "infinitely-big" buffer-like interfaces and such?
    • A: Yeah, like, if we really wanted to go that hardcore, maybe there's some like C level stuff that we could you know, significant demand for such a thing. You know, so far there hasn't been such demand, but maybe there will be in the future. Yeah, perhaps there's some custom stuff we could do.
  • Q: why does the reducer have to sometimes be a custom function, but other times just something like #'+?
    • it depends on if that reducer function needed special input or not
  • Q: do you have FSF copyright assignment? it is nice to get low-level libraries like transducers on ELPA so other copyright-assigned packages can use them (and so they can be included in Emacs when they reach wide adoption)
    • transducers is on MELPA
  • Q: Is that #' business for some lisps and not others the lisp-1/lisp-2 distinction?
    • Sharp quote refers to (symbol-function 'my-symbol) in a lisp2
    • yes, it emphasizes using the "function" associated with the symbol (if there's one in the "function slot" for the symbol) as opposed to some "variable"-type value
    • (and in this case pkal is not asking about the sharp quote but a t-prefixed function as opposed to a standard function like +)
    • that's because of the separate namespace for function symbols?
    • function rather than symbol-function to be extra-nitpicky (to accomodate lambda forms)
    • If I remember correctly, single quote (') does not respect when a function symbol is overridden by a let, but (pound quote) #' does?
    • yes, my question was about the t- prefix.
    • "let"s only bind the symbol value slot, not the function slot.
    • yes, iiuc; in effect, it's sort of an early-binding/late-binding distinction
    • My bad. Should've specified flet.
    • @can't speak for the elisp case, but in the clojure case using things in transducer form has a slightly different "typing" shape as you'd expect and may require a wrapper if a function can't be used as is and such, though I missed the context and that may not be relevant to your point
  • Q: Question about how the transducers video was made? Did you use Reveal.js? Do you have a pointer to the html hosted presentation? How did you generate the content for Reveal?
    • So the presentation itself was done with RevealJS from Org Mode. So as you saw, I had a raw Org Mode buffer, which was which was the presentation itself, which I then just exported with a few certain settings, a few customizations. And then for screen recording, I used OBS, which worked flawlessly on Arch Linux. I used Sway, Wayland, and all of that. So all of that just worked, which was very impressive. Where do the HTML host the presentation? I don't have that presentation hosted anywhere.
    • Text is a little small bth
    • no keep the latin on screen, i was trying to read that
      • It was Lorem Ipsum
        • Translating lorem ipsum plus plus
    • Thanks for larger text

Notes

  • What made transducers click for me way back when was realizing that the usual operations (think map/reduce/filter/etc and their generalizations) implicitly or explicitly demand realized collections, plus intermediate ones between those operations (and GC), instead of composing such transformations into one operation and then making this applicable to streams of unknown-sizes and such.
  • Sorry but map is collect and filter is select for me :)
  • there are many ways to get to them (some may already think of those functions as folds, etc.), but for the bulk of people familiar with map/reduce/filter/etc, it's useful to enter the thinking realizing that you may have infinite streams (of any type even) on input, and may not want to have realized collections at every step of the usual function applications, thus the need to compose the things that make up a map/reduce/filter/etc before applying them one by one on a continued stream of things one at a time
  • ellis: I wrote about half of one in binutils libctf (generators, anyway). See the *_next() functions. Having seen this I'll try to mutate the internals to something closer, right now every iterator is independently implemented (mostly).
    • (inspired by generators, not transducers, since I didn't know about them)
    • still *far* less ugly than "call a function with every item" *_iter() stuff
  • Thanks for the answers fosskers, I'm quite hopeful with transducers working their way into many things, so thinking ahead to where that may happen and to solving perf hurdles
  • I'm totally sold. I'm working on a CL codebase right now, and these are going in there immediately
  • (also CL does not require TCO but good compilers support it to the extent that extensive use of FP is practical)
  • it's a tricky protocol to get perfect the first time for sure, is something that transcends language barriers so always fun to see more impls :)
  • CLTL2 docs on SERIES for those who are curious http://www.cs.cmu.edu/afs/cs.cmu.edu/project/ai-repository/ai/html/cltl/clm/node347.html#SECTION003400000000000000000
  • definitely watching this one more carefully. if it's CLOS-oriented i'm going to like it
  • note that full TCO is strictly more expressive than special-case optimizations as with emacs's cl-labels
  • in the general case, there not need to actually process such a composed operation on an item from the input until needed/"pulled" from the output side, so yes in a way

  • yea i think the next step in terms of performance would be using a 'plan' object internally to rewrite the lambda calls

  • Julia does loop fusion through its broadcast operator.
  • that gets very hairy and makes it far less simple imo
  • Re the current answer, it doesn't yet, but it's on the path where it could eventually (i.e. introspecting on the composed transformation and simplifying it, be it fusion-type or otherwise)
  • Waters agreed that ergonomics was the key key key thing. (In the PhD world, because of studies that people have trouble reading other peoples' iteration = loop codes)
  • "galaxy brain" is great expression.
  • CLTL2 docs on SERIES for those who are curious http://www.cs.cmu.edu/afs/cs.cmu.edu/project/ai-repository/ai/html/cltl/clm/node347.html#SECTION003400000000000000000
  • i'm curious about what mathematical structures are related to transducers but that's probably a README question
  • Q: transducers C impl when
  • ecl slime repl + transducers or series
  • I wrote about half of one in binutils libctf (generators, anyway). See the *_next() functions. Having seen this I'll try to mutate the internals to something closer, right now every iterator is independently implemented (mostly).
  • note that full TCO is strictly more expressive than special-case optimizations as with emacs's cl-labels
  • (inspired by generators, not transducers, since I didn't know about them)
  • still far less ugly than "call a function with every item" *_iter() stuff
  • Thanks for the answers fosskers, I'm quite hopeful with transducers working their way into many things, so thinking ahead to where that may happen and to solving perf hurdles
  • I'm totally sold. I'm working on a CL codebase right now, and these are going in there immediately
  • (also CL does not require TCO but good compilers support it to the extent that extensive use of FP is practical)
  • it's a tricky protocol to get perfect the first time for sure, is something that transcends language barriers so always fun to see more impls :)
  • absolutely. I will try :) of course libctf is ABI-stable so I can't just change what the user sees, but maybe I can make the internals less repetitive :)
  • (and maybe add another API that is more transducery.)
  • Nice ielm
  • @pkal: that's what I meant back up, that some functions happen to have the right form for use in transducers, but fosskers expounds much better, small incompatibilities (think order of params, variadicity, etc)
  • I used emacs for so long before finding out that ielm existed. (I wish I knew sooner.)

Feedback:

  • 👏👏
  • impressive
  • cool presentation
  • (still, very nice!)
  • I am so, so thrilled with this. This blew my mind. Thank you! 😊
  • 👏
  • nice 👏👏
  • deeply not ugly
  • Thank you for the excellent talk!
  • Thanks for the talk
  • Thanks fosskers
  • 👏
  • awesome 👏
  • always a pleasure to watch transducers in action - thanks fosskers!
  • Inspiring talk, promptly added to my aoc.asd. Will come in handy.
  • definitely watching this one more carefully. if it's CLOS-based i'm going to like it; CLOS-oriented, rather
  • Thank you fosskers! it is bound to end up in some, if not all of my common lisp projects

  • YouTube comment: Nice, this seems to do basically the same thing as C++ ranges, which I've enjoyed using a lot

  • YouTube comment: Good
  • YouTube comment: Nice presentation!

Transcript

Hi everyone, this is EmacsConf 2024. I'm Colin, and today I'll be talking about transducers. After introducing them, I'll share a bit of history about transducers and the problems that they solve, some basics about how we can use them, how they work, like how they're implemented, some demonstrations of how we can actually use them in the wild, and then some other discussions about issues that they have.
[00:00:41.520] What are transducers?
Okay, let's get right in. What are transducers? Transducers are a way to do streaming iteration with a modern API. Who are transducers for, and thereby, who is this talk for? Well, it's for people who want to do streamed data processing in Emacs. It's for people who perhaps aren't satisfied with the existing APIs, for example, the seq API, or some other common libraries that provide similar functionality. Maybe you're not a fan of the loop macro. Some people find it difficult to understand. Or maybe you've done a bunch of Clojure before, and you'd like more aspects of Clojure in your Emacs Lisp. Or maybe you're just interested in transducers in general, because the pattern has now been ported to multiple different Lisps. So I'm Colin. I'm fosskers on everything online, and I do mainly back-end programming work and a lot of open source software. I wrote Haskell for a long time, both as a hobbyist and professionally. Since the COVID years, I've been writing Rust, both open source and professionally. But now I find that in my spare time, I'm mostly writing Common Lisp. Some things I learned from my years of Haskell was that a lot of programming is just altering the shape of data. You know, sometimes we work through our algorithm line by line. We're trying to just tell the computer exactly what to do. But if we step back, a lot of the time we're just getting in data of some shape, changing it, and then passing it along. A lot of these patterns are common, identified decades ago. For instance, we have some collection, and we want to transform every element of that collection and then pass it on. Or maybe we're trying to filter out bad elements in that collection. Or maybe we're looking for a specific element in that collection. Yes, you could write all that with for loops, but these kind of common patterns were identified and given names decades ago. So why not use them? They say that there are two major problems in computer science, one being cache validation and the other being naming things.
[00:03:27.590] Common issues
I've identified five other problems that come up when we're trying to deal with collections of data, or big streams of data. One is that if we were trying to load a file all into memory all at once and process the whole thing, sometimes we can have memory problems. You've probably seen out-of-memory errors or such things. A second issue that comes up is that if we were looking at a giant for loop, in particular a nested for loop or such things, it can be hard to tell just by looking at the code what it's trying to do, what it intends. If we don't go character by character or line by line, it can be hard to understand it. Furthermore, and this is particularly an issue with Emacs Lisp, is that if one call, for instance, to seq-map, then piped into seq-filter, for instance, will have an intermediate allocation, the map will take the source container, allocate a new one, and then the filter will operate over the second one. This is wasteful. Furthermore, it can often be difficult to abort a stream. For instance, if we were filtering through our collection, but we knew we only wanted to go halfway, for instance, for some reason, we have no way to stop it halfway through. We just have to process the whole thing, even if we know we don't need to. Another issue is that for languages that have traits, or in Haskell they're called type classes, if you are defining what it means to map over something, you often have to redefine that for every kind of container or thing that you're iterating over. Wouldn't it be nice if we could define things like map just once and then reuse them everywhere? Now, transducers solve all five of these, without the addition of new language features, and with little more than plain old function composition.
[00:05:47.280] Transducers
If this is your first time hearing of transducers, yeah, no problem. They were originally invented in Clojure by Rich Hickey, and this is a quote from him. He thinks transducers are a fundamental primitive that decouple critical logic from list or sequence processing, and if he had to do Clojure all over, he'd put them at the bottom, at the very bottom of all the fundamental primitives. Now, that's Rich speaking quite highly of them. And I think he has a point here. They were invented originally in Clojure. In more recent years, they were brought over to Scheme via SRFI 171. That's where I found them when I was learning the Guile language. In the process of submitting a patch, I realized that there were other things to be improved. So I ported the pattern to Common Lisp, then Fennel, and then more recently, Emacs Lisp. The Common Lisp and Emacs Lisp APIs are identical. And the Fennel one is not identical, but fairly similar. Overall, everywhere you find transducers, they should basically be fairly uniform. When I originally made the Common Lisp variant first, I sampled the APIs from a number of different languages and came up with what I believed to be a representative sample of what most people would want out of such a library. I gave functions their common modern names. For instance, map is map and filter is filter and so on.
[00:07:35.280] Using transducers
What does the usage of transducers look like? Well, these examples will all be the Emacs Lisp variant, but the Common Lisp will look basically exactly the same, minus this little t- prefix. Running transducers requires three things. It requires a source. This could be an obvious thing like a list or a vector, but it could be other things like a file, or in Emacs list in particular, a buffer. A reducer is a function. It's something like the + operator or the * operator, or certain constructors of various containers. It takes values and collates them into some final version. Now, finally, we have what we're calling here a transducer chain. This could be one transducer function or it could be multiple composed together. These are the functions that actually take data and transform them somehow. For instance, this. We have a list of three elements. We want to reduce it into a vector. How we are going to transform the elements along the way: we are doing plus one to each of them. If this syntax is new to you, just know that this #' just means that this thing that comes after it is the name of the function. In Common Lisp and Emacs Lisp, this is necessary, but for Clojure and Scheme, it is not. So we can see here that just this example is not much different than any other normal map call you might see made, but if nothing else, it's a handy way to convert a list to a vector or anything else. There are many, many reducers available and many different forms that we can collate the final value into.
[00:09:52.625] A more involved example with comp
Let's see a more involved example. Okay, now we've got some more meat here. Here we can see usage of the comp function and a custom source, ints. Ints is an infinite generator of integer values. That's not like a list or a file. It will generate infinitely. Comp is letting us compose multiple transducer functions together. Notice that this is the opposite order of what we'd usually be used to from a function like comp. The order here is top to bottom, basically, so that the map goes first, then the filter, and then the take. So effectively is what we're doing is taking all the integers that exist, positive, adding one to them, filtering out only the even ones, but then just taking 10. Cons here is a function that just produces the ending result as a list. So what happens here specifically is how we are avoiding intermediate allocations. First, the number 0 will come through. It will be pulled out of this source internally by transduce. It will make its way into the map. The map will add it. Then it will immediately go into this filter step. So it's not like all the maps occur, and then all the filters occur. We do everything for each element. So the 0 comes in, now it's 1. The filter would occur. Well, it's going to fail that because it's not even, so it will just bail there. Now we'll go to the next one. Now 1 will come, it will become 2, then it will be saved by this evenp call, and then the take will capture it, because we only want 10 values here. You can see 2, 4, 6, 8, and so on is the result that we expect. So let's play around a little bit.
[00:11:49.333] In Emacs
Let's jump into Emacs and see what we can do. Alright, you should see my Emacs screen here. These are the actual notes for the actual presentation done in Org Mode. I'll boost that up in size for a little bit. That should be more than big enough for you. Just by changing the reducer, we can change the result. Okay, now it's a vector. Well, what else can we do to it? Well, let's just add up the results. Maybe we just want to count the results. Oh, indeed, there were 10. What if we want to find the average of the results? What if we want to find the median of the results? And so on. Here's some more interesting things that we could do. We could add different steps. So here we have all the integers. Let's add, hmm, okay, we'll keep that. We're going to add t-enumerate. What enumerate does is for each item that comes through, it is going to add a sort of index to it and make it a pair. In this case, it's going to be equal to what came in here. Well, we can change it. If we start this at 1, now it will be different. 1 will be paired with 0, and then 2 would be paired with 1, and so on. We'll accept that the even call will change that a little bit. Why we're doing this is because we want to form a hash table. Let's move that down to 3, maybe we'll get a better result. What do we see? Okay, here now the result is a hash table. What are its values? Well, 0 seems to have... The key of 0 seems to be paired with 2, the key of 1 seems to be paired with 4, and 2 seems to be paired with 6. Maybe let's jazz that up even a little bit more. We're going to start from a string and we'll call it hello. That's not going to work anymore and neither is that, but what we could do is we could say t-map #'string. I believe we'll do that. Let's see if that works. It did. So that's going to convert a character into a string. Let's just go two just to make it a little easier. Now you can see that we've constructed a hash table here. The key of 0 is mapped to the string of h and 1 is mapped to e. Now, I really like having this reducer in particular.
[00:14:29.469] Hash tables
Know that hash tables are also legal sources. I find that both in Emacs Lisp and in Common Lisp, dealing with hash tables--like creating them and altering them--can be a bit of a pain. Having them immediately available like this with transducers is very handy, I find. We can work with something that wasn't a hash table. We can construct it in a way that makes it amenable to that, and then reduce it down into a hash table, and here you go. Very handy. One last point is that you can see very clearly what this is attempting to do, as opposed to, say, a for loop. It's very clear what that step is doing, and then you can see what that is doing, and you know that the result is going to be two. Each line is kind of its own declarative step, and it should be clear, just by staring at this, basically what you're going to get out. This is one main difference from other languages that have things--say, for instance, Rust's iterator API--is the difference between the transducers and the reducers. If we go up here, for example, the difference between the transducers and the reducers and the sources is not explicitly laid out, whereas with transducers, it is. You have to be aware of how these things are different. I think that that helps clarity.
[00:15:55.800] How do transducers work?
Moving on. How do transducers work? Well, we want to go see the README. So, what we're going to do is we're going to go to here. You should still be able to see this. This is the CL example, actually. Let's go to transducers.el. Their APIs and READMEs are the same, but just for the sake of it, we will go see how this looks on the Emacs side, just so that nothing is a surprise. But recall that the APIs are essentially the same between the two. If you go to this section, writing your own primitives, you can read about how transducers are actually formed, whether or not you want to write them yourself or not. We can see here t-map. We accept the function that you want to operate with. Then you've got this extra little lambda here that's coming in, and it's receiving a thing that is named reducer. Now, while here we're calling it reducer, it's actually the chain of all the composed functions together. It's all those main transducer steps. Finally, it's the reducer all composed together with normal function composition. That will matter very soon. Now here's the actual meat. We can see the accumulative result that's coming in with the current element. Now we need to operate on this. Were it normally mapped, we would see us applying the F to the input. But here, you can see us applying the F to the input and then continuing on. So us calling the rest of the composed chain here is the effect of, in the previous slide, moving to the next step. We could ignore this line for now. If you're curious, please read the README in detail. Now, what about reducers? What do those look like? Well, let's just scroll down here. Recall that a reducer is a function that's consuming a stream, right? Zoom that up for you a little bit. Now, in the case of count, recall that this is how it's working, how we saw a moment ago. So clearly this list of five elements only has five things in it. Well, a reducer by structure is a function of two, one, or zero arguments. So we can see here in the case of two, this is the normal iterative case. We don't care about the input for count, we just care about the current accumulated count that we're doing, and we add one to it, and that's it. This then goes back to the loop and the whole process starts again with the next element. In this kind of done case, this is used internal to that sort of the supervising function transduce. It's just confirming the final result. Sometimes some post-processing is necessary here, but in the case of count, as it is so simple, that is not necessary. And now here's the base case. This is also used within that supervising transduce function at the very top. Well, if you're counting, you have to start from somewhere, right? In this case, well, what you're starting with is zero. In the case of cons, you'd be starting with an empty list. In the case of vector, you'd be starting with an empty vector and so on. Once again, if you are more curious, please take a look at the README.
[00:20:00.520] Transducers in the wild - CSV
Okay, transducers in the wild. Well, let's go take a look at processing some CSV data. We're going to open up a new Emacs Lisp bracket here. So I have a file. And in this file, let's just go look at C-x b right there, you will see that we've got some bank transaction information. It's got these transactions from a whole bunch of different people into different accounts, whether it's money coming in, money going out, and then a basic description. How's your Latin? But for this little test, what we want to do is we want to find Bob's final bank balance. Let's get on to it. First of all, let's just confirm, let's do some basic stuff. with-current-buffer, find-file-noselect. What's the name of that file? This is pre-organized, so you will just see it right here. t-transduce and t-comp. We don't know what we're going to comp yet. Actually, I'll just pass to show you. And then we will see, let's just do a little t-count just to confirm. What's our source? Well, our source is a buffer, t-buffer-read. And note that because we're using with-current-buffer, if we go like this, if we go current-buffer, this will just work. So now let's... Well, that was odd. I should have done it like that. There we go. So now we should make that a little smaller so you can see what it is. Now if we hit RET, we should get the right result. Okay, so there are 50,001 lines in this file, but the one extra one is the name of the headers, right? We want to process this file in more detail. So how can we do that? Well, let's start by just automatically interpreting the results as CSV. If we do that, okay, well now we only have 50,000 entries as we expected, right? Because it's going to pull out the header line. If we now say we want to just filter out, you know, We only want Bob, right? So if... gethash, it was in the row of name. Each line here is made into, at least by default, is made into a hash map. So if we go like this, we should see that. Okay, so 12,000 of these lines or thereabout belong to Bob. Let's just move that over a little bit. Actually, I suppose we don't even need that anymore. I'll just keep that full size for you. Okay, so all right, there's about 12,000 results for Bob of the 50,000. What's next? Well, we want to confirm, we want to pull out everything, all of the in and the out entries. Thank you. So, string to number, because we know that everything came in as strings. Unfortunately, the from-csv doesn't try to be smart at all, it's just pulling everything in as string values. If you want actual things to be numbers or whatever, that is up to you to do the parsing yourself. Okay, so we have those two values now. We know that we saw from the data just a moment ago that you're only going to have a value in one column or the other. It's either going to be 0 in the empty one, or you're going to have some number in the other. So we know that we can just naively add them. If it was in, it would always be positive. So we'll just add that. But in the negative case, we want to just make it negative really briefly before we add them all together. let's now just prove to ourselves that we are sane here. What we're going to do is we're going to quickly go say take 5 just to convince ourselves, and we'll go cons, and let's see if we get kind of results that make sense. Okay, these sort of make sense. It looks like you know Bob's got some big expenses here. If we take say 15, does it look any better? Okay, looks like he had a payday. All right, good job Bob. Let's get back in there. Now we only really care about adding the final result, right? So there we go. Add that all together and we'll see what we get in a moment. Okay, wow, Bob's rich. Okay, so it looks like in his 12,000 transaction, Bob has an overall net worth of $8.5 million. Looking pretty good. So here's an example of how you can, particularly in Emacs Lisp, how you can very easily just get a file, consider it the current buffer, and then just do whatever you want to it. Note that there is sort of first-class support for both CSV and JSON, and then you have, and both of those bring in their values as hash maps, and then you're just free to do whatever you want and process them, potentially both writing them back out as CSV or JSON once again.
[00:26:03.240] Issues and next steps
Some issues with transducers that can come up is that one, a zip operator is missing, but I'm working on it. Two is that performance, particularly in Emacs Lisp, isn't that great. It could be due to the sort of nested lambda calls that have to occur internally, but the common Lisp implementation is quite good. and there's yet no support for parallelism. You can imagine that a lot of those steps you could potentially perform in parallel depending on the platform, but research has not yet gotten that far. Okay, that's all. Thank you very much. If you have any questions, please contact me.

Captioner: sachac

Q&A transcript (unedited)

Hopefully the internet goes well. It's a nice Monday morning here in Tokyo. Are we connected all right? Okay, I seem to be struggling still with my audio. 1 2nd calling. Yeah, you were muted for a moment there. Okay, there we are. Okay. All right. Sorry about that. I got a mute out my, my back office chatter. That's kind of distracting me a little bit. All right. Sorry. I may have lost the plot a little bit. I think I did. However, find the 1st question. I got pretty distracted by conversation backstage. Yeah,
[00:01:09.920] Q: When I tried comparing transducers.el to cl-lib and dash (benchmark-compiled), I got the following results
no problem. So the first question here, someone's asking, when they first tried comparing transducers.el, the cl-lib and Dash bookmark compiled, and they give some detailed results we're sharing on the stream. Um, they expected transducers to be slower than CL loop, but faster than CL lib or dash. However, this isn't the case, any idea why. And so I'll, I'll come back into their data to show there's they're showing, um, you know, there's not a lot of detail on the, on the, on the use case here. We could certainly click through it, do it. Oh, I should've waited to zoom until I find my spot here. There we are. All right, so there's our example. Looks like we are doing a simple map and a sum. Mm-hmm. Yeah, that's right. Yeah, question about performance. So a case like this, a simple, I just want to rip through a collection of numbers and sum them all. That's a case where basically loop is always going to win because loop is optimized. This is true in both Emacs Lisp and in Common Lisp. For a case like this where you're not really doing two nested of chained calls, like you don't have many sort of what I was compositional steps. If you're just ripping through a collection of numbers, loop is always going to win. Transducers kind of shines when you have to do things that loop can't in terms of expressing yourself. So there are lots of different transducers that you can chain together. And in that case, you're kind of prioritizing developer time and developer happiness because you're able to yourself more clearly, whereas sometimes those kind of algorithms can get very hairy if you're just using loop. Now that sounds like I'm moving the goalposts, and there's really no excuse for these things not being as performant as possible. In this specific case, my guess is that the transducers is slower because it has to do a whole bunch of like inner function calls in order to actually do the adding and the collecting. So there's a lot of stuff that just the raw loop doesn't have to do, which transducers does. And so in this case, that's why it would be slower. All right, makes sense. Um... I cannot comment against Dash. And also a reminder that transducers both in CL and in Emacs Lisp here doesn't attempt to do any, you know, fun, you know, inner rewriting or, you know, what's called an Haskell fusion. Like if you have two different map steps, like in a row, it's not gonna see that and somehow fuse them internally. It's a fairly, in that sense, the implementation is just as is. to make it you know as raw fast as possible. The idea being that ergonomics is more important up front. Yeah, that's kind of a whole fascinating sub-panel, right? My theme this conference has been, oh, all these different things we should try to get sub-panels going for and use that. Maybe fill in the dev track or even have a third track or whatever. I'm not that concerned about the logistics of squeezing into the schedule so much. But anyway, interesting, I mean, to say.
[00:05:40.840] Q: Do you know of any theoretical texts on transducers?
Did we already speak to theoretical texts? No, right? No, let's continue. Okay, so another question from the group. Do you know of any theoretical texts on transducers? My readme, particularly of the Common Lisp implementation, is the theoretical text on transducers. Rich Hickey has some YouTube videos which also come close. I mean, he invented the things. But in terms of having a full explanation of everything, it's my readme and it's also the... The info manual of Guile Scheme, their documentation on Surfy 171 is what I used to learn transducers and to re-implement them in other LISPs. So if you just want like a document explaining them, MyReadMe is actually the clearest that I've found. Awesome. Okay, next question. And I'm sorry, you gave a name, you referred to somebody's videos. Rich Hickey, the inventor of Clojure. Rich Hickey, thank you. Hope I got the spelling right, and maybe somebody can catch that and fix it. If not, I'll reach on. Thank you.
[00:07:04.720] Q: Did you think about [compiler features, macros] viz your cl, fennel, elisp, porting of your transducers?
Reach on to the next question. Waters (Lazy Series in Lisp, late 70s) said this should have been done as an additional compiler feature in compilers, but if not, must be a macro package. Do you think about that vis your CL, Fennel, Elisp, porting of transducers? I think that there's definitely some Galaxy Brain Lisp author out there is probably smart enough to turn a bunch of this stuff into macros. I believe that's how the common Lisp library series works. It sees that you were calling map or whatever, and it actually knows that that's a special macro key. in order to be fast. I did not do that. The implementation as I have it is very simple and simplicity shouldn't be underestimated. I love it. What a nice succinct answer. Even I can manage to type that out as I scroll us to the next question.
[00:08:16.579] Q: Does t-buffer-read provide a lazy stream that's linewise, or charwise, or do something else entirely?
So, does t-buffer-read provide a lazy stream that's line-wise or character-wise or do something else entirely? Okay, there are two functions. I showed t-buffer-read. There's also one called t-file-read, which does that. You actually have the buffer open, it's much more clever. t-buffer-read, I believe, is simpler. As long as you have an Emacs list, what is called the current buffer active. I'm fairly sure you're able to just call next-line on it. I don't believe that I'm doing anything fancy there, looking for line ends. I believe I'm just grabbing the next line and then processing that line-wise. Very good.
[00:09:09.424] Q: Can the Elisp library be combined with the stream.el API or seq in general?
Can the Elisp library be combined with the stream.el API or seq in general? I would say that these libraries are completely orthogonal. You saw that everything was prefixed by t-. Basically, transducer is its own zone. However, one thing that I do in the common lisp, which is theoretically possible for the Emacs Lisp as well, is kind of like little shim libraries. So I provide, at least for Common Lisp, for a number of, you know, popular sort of third-party collection types, I provide an ability to use them as sources. Maybe that's what you mean. Like the built-in containers for Emacs Lisp are already supported. So, you know, a vector hash table and so on. make sense so i think what i heard there is yeah go ahead please sorry in terms of mixing like you know like for instance you know like seq-map used in transducers we'll put it that way i was just gonna say i think it um it it sounds like you're saying Yeah, probably they are actually. We don't know yet about any places where they don't play nicely together. So quite possibly so. We can use sequence and transducers together, for example. As a source potentially, yeah. It's very easy because that just uses defgeneric. As long as you have a new, like if you have a new collection type, as long as you implement a def method for it somewhere, it'll just magically work with this library. That's the magic of... Yeah, as an Emacs user enjoying, you know, sort of the renaissance of new features it's had, or sorry, Emacs ERC user for chat. I've seen a lot of awesome stuff get done in the last couple of years with generic set. JP never was working on that. And like, that's just making me my eyes pop and go, wow, that does make a whole lot of things simpler, doesn't it? I think we're a lot of us running into generics and how that solves problems in Emacs.
[00:11:47.543] Q: How does one debug a t-comp expression? Can you single step and see intermediate results of the different statements you declare?
How does one debug a t-comp expression? Can you talk in terms of single step, step-by-step, intermediate results of the different statements you declare? Yes. So in Common Lisp, this is and sly stickers and things like that. In Emacs Lisp, it's a little bit, shall we say, more difficult. For step debugging, so what comp does is comp internally, it should be a macro, but currently it's not, although there's work to improve that. It's doing an internal reduce and it's turning into one giant kind of composed lambda inside. So I don't know if step debugging would work there. However, we do have one function called log, which lets you inspect intermediate results. So you could technically use that to inject yourself somewhere into the transduction chain and, you know, halt or, you know, inspect the current value, et cetera. So you get a bunch of questions lined up. I think we're coming up, uh, within our last five minutes, uh, before some declared, uh, reset time that we have internally to just roll our closing credits, so to speak. Um, not that I would want to cut the question and answer short, but I might have to step away personally. But, um, as we discussed before, you can just kind of run the QA, however you want here. Um, or, or take questions offline if you'd like to answer them off the pad. And I just want to say one more time. Kitt said it managed later. Thanks again for your talk for dedicating the time to this live QA. And I think we can see by the many questions that are here. So I'll try to kind of flip us through as many of them as I can with our last couple of minutes, if that sounds good. Alternately, this might be a good time if you have kind of wrap it up, final thoughts, as Leo Sopanda saying. By all means, have at. Sure, thanks a lot. I'd say that if you are still curious, check out the read-me's because those have a lot of information, including a full description of the API and everything that's available. Otherwise, just give them a shot. Using these things is the best way to learn them, of course. I use them everywhere, basically, all across my Emacs list and all across my common list now. They get a lot of mileage. All right. You're speaking our language now. As Emacs users, all our ears poke up when you say, I'm getting a lot of mileage. I'm using it across everything. Every Emacs user has a story that harmonizes with that, I think.
[00:14:42.495] Q: Is there a path for transducers to enable elisp processing of otherwise overly large datasets as if just normal Emacs \"buffers\" (i.e. just pulling one thing at a time so essentially stream-like under the hood but buffer-like in interface), with none of the usual perf issues with a traditional buffer structure?
So our next question, is there a path for transducers to enable Elisp processing or otherwise overly large data sets as if just normal Emacs buffers, i.e. just pulling one thing at a time. So essentially stream like under the hood, but buffer like an interface. I think that makes sense to me. with none of the usual performance issues, like as if, you know, the history with long files is what that brings to mind, I guess. Yes, so as you saw before, the withBufferRead sort of stream function does have to have the actual buffer in memory, and then you can go really fast. But there's another one with file read. Now, again, I haven't tried to optimize that yet. But in theory, it is able to read right from the underlying file without having to open it as a buffer first. Awesome. Ari, the performance issues mentioned, and that popped up recently in the list and forums, to what extent does tail call optimization and other mechanisms like inlining, garbage collection friendliness, and so on, could these alleviate issues, enable their use at little to no extra costs? I feel like we're leading the witness here, but I'm sure you see where we're going. Yeah, no problem. So in terms of tail optimization, that's already happening because the internal loop mechanism is using CL labels. And in Emacs Lisp, CL labels is just a macro that is like extremely tail recursive. So that's very, very fast. It's not tail recursive, but it's using like goto. So it's extremely, extremely fast, like the raw looping of it. So, okay, well then where does the slowness come from? It's probably coming from those lambdas and it's probably coming from, uh, like extra consing, extra allocation somewhere, which is, um, sort of what you were, what you're referring to with the GC friendliness. So perhaps there's some, um, um, yeah, some, like some fusion that I can do to speed it up. Yeah, that just sounds fascinating endlessly.
[00:16:51.200] Q: Is there an option to read a csv/json and produce an alist or plist instead of a hash table for an entry?
Are there options to like read from a CSV, JSON, produce an alist or plist instead of hash table? Absolutely. Yes, I need to double check that, but we can read both CSV and JSON, and you should be able to just turn on the plist option. I will double check, but there's fairly free conversion between those three types because hash table is not always what you want. And actually, I suspect that slowness that we saw in the demo before was because it was allocating hash tables for every, like, all of the 50,000 lines. And had it been a plist, it would have been faster. Interesting, so maybe there's opportunities even if you end up with hash lists, but then they're shared strategically and you pay the cost of a little extra layer in there that buckets them together the way that we might group files by the first four characters in the file name once we've got a million files.
[00:17:50.520] Q: Is the common lisp version ready for 'production' use? Is it complete enough and the API stable enough?
Anyway, is the Common Lisp version ready for production use? Do you want to comment on API stability? I use it all the time. I'm writing a game in Common Lisp right now, and I'm using transducers everywhere in there, and it doesn't even make a dent in the frame rate, and I'm using them extensively. Okay, well, I'll just read from chat. Thanks so much for the answers.
[00:18:17.477] Q: Do we need a pre-written \"t-\" version for every already existing reducing function like + or is there a function to construct them from already defined reducer 2-arg functions?
Do we need a pre-written or t-minus version for every already existing reducing function, plus, as an example? Or is there a function that constructs, in my, I'm going to add the word, auto-visualifies them already, auto-defines or something, or just generically wraps function calls some way? already defined. This is basically fold. Some built-in functions like plus already function like reducers. It's a coincidence that they do that. But there's an example in the README. Max is one that does not act like that. For instance, maybe I could screen share later, but if you just type in plus one, If you call plus one in Emacs or Common Lisp, you get back one. It actually only needs one argument. If you only type plus, it actually gives you zero. Plus and multiple satisfy the API of reducers. But if you have one that doesn't, like the max function, and similarly, just type in plus as a function call, just plus with nothing else, and you'll see. No, as a function. zero will come out. This basically means it satisfies the reducer API. But a function like max does not. If you just type in max and then one, it won't work. Pardon me, it did. But if you type in max with nothing else, it wouldn't work. Hence, we have to wrap it in something like fold. I would say go look at the fold function. Right, which that I won't do. I'm not that well enough prepped. Darn it. Leo would have been here, but oh, well, you got me. Yeah, no problem. But fold is sort of the ultimate reducer function. Great. So is there, where was I? Here we go. We're way past this, right? So
[00:20:26.320] Q: Is the compelling argument for transducers is that it's a better abstraction?
is the compiling argument for transducers that it's a better abstraction? It seems like there are concerns, objections, while problematically valid focused on implementation. Can this abstraction allow for advances in implementation? Yes, what I've basically done is mostly followed the pattern of usage that exists in Clojure and in Scheme's SERP 171. In theory, the service level API is the same no matter where you're using this, and that's the idea. If you learn them in one list, you should be able to use them everywhere. Then what it's actually doing under the hood is free for us to change around. My implementations are mostly based on the scheme with a few alterations here and there. And in the Common Lisp case, like adding some Common Lisp isms to improve usage like UX a little bit. But overall, we are free to do whatever we want internally to speed up performance. I just haven't done that work. Awesome. Awesome. So here's where I have to, where we're getting the hook. We've just been pulled off the stream. The viewers just saw the crawl by as it sent us over to the other pad where I get to jump on and get involved with that now. But I can't thank you enough, Colin. Would you like me to stop the recording here? Any other comments you'd like to make? Uh, yeah, sure. Like, I mean, I'll stick around for any more live questions. I'm looking at both IRC and, and, um, uh, big blue button here. So if people have more questions, I'll hang around for a bit. I'm going to leave the channel open. I see you do have a few people in here, so I'm just going to go ahead and leave the recording. We can always trim it. Um, trim it up. If you, uh, let us know, Hey, the last 10 minutes weren't anything, you know, or whatever. No, no pressure, no worries, no mistakes. Thank you. Really appreciate you. Yep. Thanks a lot.
[00:22:31.960] Q: Question about how the transducers video was made? Did you use Reveal.js? Do you have a pointer to the html hosted presentation? How did you generate the content for Reveal?
OK, does anyone else have some questions? I see Mohsen in the BigBlueButton chat is asking how I made the video. So the presentation itself was done with RevealJS from Org Mode. So as you saw, I had a raw Org Mode buffer, which was which was the presentation itself, which I then just exported with a few certain settings, a few customizations. And then for screen recording, I used OBS, which worked flawlessly on Arch Linux. I used Sway, Wayland, and all of that. So all of that just worked, which was very impressive. Where do the HTML host the presentation? I don't have that presentation hosted anywhere. I'll look at the. I don't see that. Here it is. So we've got the file here as well. Looks like that's it for questions, basically. Yep, and it looks like everyone's moved on for now. Let's see. I mean, it would be so this is answering lounge 81 on IRC.
[00:24:20.160] Q: From your investigations and tests so far, do you think there would be the necessity of transducers to eventually go down into the C level code for things like using them to solve "infinitely-big" buffer-like interfaces and such?
Yeah, like, if we really wanted to go that hardcore, maybe there's some like C level stuff that we could you know, significant demand for such a thing. You know, so far there hasn't been such demand, but maybe there will be in the future. Yeah, perhaps there's some custom stuff we could do. And otherwise, magic one. Well, it looks like some people are quite happy with this. All right. That's about what I've seen. So why don't we end it here? I think I can control the recording from my end. If I pause it, will that work? All right. Thank you, everyone.

Questions or comments? Please e-mail emacsconf-org-private@gnu.org

Back to the talks Previous by track: So you want to be an Emacs-fluencer? Next by track: Gypsum: my clone of Emacs and ELisp written in Scheme Track: General