Parallel Text Replacement

Lovro, Valentino Picotti - IRC: hokomo, hokomo@disroot.org

Format: 15-min talk ; Q&A: BigBlueButton conference room
Status: Q&A to be extracted from the room recordings

Talk

00:00.000 Introduction 00:23.440 Problem: Goal 01:12.360 Problem: Naive Multi-pass 01:34.200 Problem: Clever Multi-pass 01:57.720 Problem: Terminology 03:04.440 Problem: Scaling Multi-pass 03:55.920 Solution: Single-pass 04:18.240 Solution: Existing 06:29.080 Solution: query-replace-parallel 06:55.240 Demonstration: Swap 07:53.970 Demonstration: LaTeX 08:48.700 Demonstration: Regex 09:36.320 Demonstration: Order 10:54.440 Demonstration: Fun 12:29.120 Implementation 14:18.740 End

Duration: 14:46 minutes

Q&A

Listen to just the audio:
Duration: 10:16 minutes

Description

We present our Emacs package for performing parallel text replacement.

"Parallel" in this context does not refer to improving efficiency through parallelism, but to the concept of performing more than one text replacement without them interfering with each other. This is in line with the usage of the term in the Lisp community when contrasting the behaviors of LET and LET*, SETQ and PSETQ, etc. (e.g. http://www.lispworks.com/documentation/lw60/CLHS/Body/s_let_l.htm).

We will present the package's features and its integration with Emacs' query-replace system, a comparison with previous solutions, and a few notes on our implementation. We will describe some common use-cases and showcase how the package is used.

The package is currently not yet published in a package archive, but the code is already publicly available at https://github.com/hokomo/query-replace-parallel. The name "query-replace-parallel" is not yet final and we are thinking of alternatives. Our current best candidate is "replace-parallel" (similar to the built-in "replace.el"), but suggestions are welcome.

Discussion

Questions and answers

  • Q: This looks great, and was very well-presented.  Do you have plans to upstream this functionality into Emacs?
    • A: Would require some refactoring upstream, so not suitable for upstreaming as-is.
  • Q: Did you use pair-programming while developing it, or did you work independently, alternating and reviewing?
    • A: Yes, we did! I was at the keyboard, Valentino was at the whiteboard, and we kept bouncing ideas back and forth, trying out prototypes, coming up with various tests, checking the edge cases, etc.
  • Q: What is your background in programming? Was it difficult to implement following the same API and architecture as what is already in Emacs?
    • A: Both Valentino and I are PhD students in computer science, but a PhD or similar is definitely not a requirement. It wasn't too difficult because we could reuse the interactive functionality from query-replace's internals. Figuring out what and how to reuse is what took a bit of creativity, but a lot of the necessary knowledge for that came from just reading and poking around Emacs' replace.el. Don't be afraid to go and read the source!
  • Q: What did you learn about Emacs programming or programming in general while working on this project?
    • A: That Emacs is so flexible that you can even advise its message function. Similarly, being able to prototype functionality so quickly and immediately integrate it into the rest of Emacs is so fun and so satisfying!

Notes

  • One usecase could be character names in a novel manuscript, if one has named a character and want to now rename it to some other character names or swap it with another one.
  • Nice, I was wondering if it utilized rx
  • package installed, ready to use!
  • excellent talk, and also such a cool package
  • great talk, very clever concept
  • that SRE "paper" you linked to is interesting
  • just saw the "Parallel Text Replacement" talk - 👏 great talk!

Transcript

[00:00:00.000] Introduction
Hi everyone! Welcome to our talk on Parallel Text Replacement. My name is Lovro, and I'll be telling you about an interesting problem that my friend Valentino and I set out to solve one afternoon. We will describe the problem, take a look at some of the existing work and then present our solution. Afterwards, we will show some demos and conclude with a quick overview of the implementation. Let's get straight into it!
[00:00:23.440] Problem: Goal
Here is a problem that most of us have dealt with at some point. Assume we have a piece of code such as the following. We use a code example here, but in general what we're about to discuss can be applied to any piece of text. After a bit of thinking, we decide that the names of the two variables, "foo" and "bar", should actually be swapped. That is, "foo" should be replaced with "bar", and "bar" should be replaced with "foo". The question is: what is a good way to achieve this? We could perform the edits manually if the code is small enough, and we might even be done reasonably quickly. However, consider two things. Imagine the usual case where there's just too much code to edit by hand. We have no other option than to automate the task. More importantly though, we have a whole programmable text editor right at our fingertips. We should object to doing things that the computer can do for us.
[00:01:12.360] Problem: Naive Multi-pass
So, one way to automate it is by using our old friend query-replace (M-%) multiple times in a sequence. We first do a pass where we replace "foo" with "bar", then we do another pass where we replace "bar" with "foo". But that's clearly not right. We all know that this naive multi-pass approach doesn't work because it results in interference between the two replacements.
[00:01:34.200] Problem: Clever Multi-pass
Instead, we have to be a bit more clever. We should first replace "foo" with a temporary string, in this case "oof", that we will call a "token". To avoid interference, we must be careful to ensure that the token does not contain whatever we're about to replace next. Then we do a second pass to replace "bar" with "foo", and finally a third pass to replace the token with "bar". This gives us the result we want.
[00:01:57.720] Problem: Terminology
Putting the implementation aside for a moment, this style of text replacement, where we replace multiple sources with their targets, without running into interference issues between replacement pairs, is what we call a "parallel replacement". This is the essence of the problem we're trying to solve. The examples with swapping that we've shown so far are really just one of the many use cases that are supported by a general parallel replacement utility. To avoid confusion, let us clarify that the word "parallel" is not in reference to hardware parallelization, but rather comes from analogy with the Lisp let operator, where the bindings of variables are performed in parallel, rather than sequentially as in let*. Parallel in this context means that none of the bindings are in scope within any of the initial value forms. In other words, just like a let's initialization form cannot refer to any of the earlier bindings, a replacement pair's source should not be able to replace the previously substituted targets of any other pair. This is what we mean by "no interference".
[00:03:04.440] Problem: Scaling Multi-pass
However, manually invoking multiple carefully chosen query-replace commands gets old very quickly. Say we scaled up the problem and wanted to perform n swaps instead of just two, e.g. to swap, or rather, rotate, "foo" to "bar", "bar" to "baz", "baz" to "quux" and "quux" to "foo". We would first have to perform n - 1 additional replacements to introduce the necessary tokens, effectively doubling the number of steps. Even if we tried to automate this, think about what tokens the code would have to generate if we had no prior knowledge of the replacement pairs given by the user. We would have to program defensively and use long randomly-generated strings that, one, hopefully do not interfere with any of the replacement pairs, and two, might slow down the search if they're overly long. Can we do better?
[00:03:55.920] Solution: Single-pass
Yes we can! We can actually perform just a single pass. The trick is to alternate between the replacement pairs, replacing whichever source occurs the earliest, and making sure to continue scanning after the end of the substituted target in order to avoid interference. This interleaving of replacements is not something that's easy to do by hand with query-replace.
[00:04:18.240] Solution: Existing
Since this is Emacs we're talking about, of course there already exist solutions that implement this idea. Here are few that we could find. The EmacsWiki has a page dedicated to this problem. Stack Overflow has an old post where a couple of users provided their solutions. Mastering Emacs also gives a method along with other interesting query-replace-regexp (C-M-%) patterns. More recently, Tony Zorman made a blogpost providing a solution with an interface based on query-replace. I encourage you to take a look at these solutions if you're interested in the details. But while a step in the right direction, these solutions are not satisfactory because they all lack one or more of the following. One, they are not completely automated and require the user to come up with a relatively complicated and verbose query-replace-regexp invocation. Two, they are restricted to performing only 2-element swaps rather than general parallel replacements. Three, they don't provide any sort of interactivity during replacement and instead perform it in one shot. Four, they don't attempt to integrate with the familiar query-replace interface, which supports skipping, undo, history and more advanced features like Lisp expressions and recursive query edits. Most importantly however, five, none of them were designed with regular expressions in mind and instead only ever consider literal strings. In fact, the only one that comes close is the half-automated solution that invokes query-replace-regexp with a specially crafted replacement. As an example, here's how you would use this technique to perform a 3-element parallel regex replacement. It uses the backslash-comma Lisp expression feature in order to choose the appropriate target to substitute. Aside from being very clumsy and tedious to write out, this approach makes it really hard to use more complex regular expressions that make use of capture groups themselves. This was the biggest limitation that we wanted to get rid of and the main motivation for our work. So, as an alternative to the existing zoo of 80% solutions, we aim to provide a 100% solution, one that handles regexes and consolidates all of the existing ideas into a single package.
[00:06:29.080] Solution: query-replace-parallel
We call it query-replace-parallel. The package is free and open-source and can currently be found on GitHub under hokomo/query-replace-parallel. The name is not yet finalized and we're open to any suggestions. We hope to get it published on an Elisp package archive in the near future, but for now you can just download and load the main Elisp file manually. With all of that said, let's go through a few demos to illustrate some use cases and see how to use the package.
[00:06:55.240] Demonstration: Swap
Our first demo is a simple swap, like the one we showed at the beginning of the presentation. This chunk of text is actually one of the tests from our package's code. Assuming we have loaded the package, we can execute the query-replace-parallel command, a parallel version of the standard query-replace. This command works with literal strings and will ask for each source and target in turn. Our goal is to replace "foo" with "bar" and "bar" with "foo". After inputting our replacements, we terminate the prompt by pressing enter with empty input. At this point, everything functions the same as in a standard query-replace invocation. The echo area shows the match and the replacement we're about to make. We can perform replacements, undo them, skip them, execute them until the end, and so on.
[00:07:53.970] Demonstration: LaTeX
The second demo shows our first regex use case. Imagine we have the following LaTeX code. We realize that we haven't been completely consistent in our use and naming of macros, so we decide to fix the problem. This time we execute query-replace-parallel-regexp because we want to work with regex instead of literal strings. We want to achieve two things. First, we want to wrap all usages of the variable n with the natvar macro. Using the backslash-less-than and blackslash-greater-than constructs allows us to only match letters n not appearing as part of a larger word. Second, we want to rename natvar to intvar because the variables a, b and c are integers and not natural numbers. We enter empty input to terminate the prompt and can now perform the replacements. There we go, the fixes are done and we didn't have to think about in which order to apply them.
[00:08:48.700] Demonstration: Regex
We now take a look at a more complicated regex example to demonstrate that even advanced query-replace features are supported. Each "foo" and "bar" in this example is followed by a number. The goal is to not only swap "foo" and "bar", but also increase or decrease the corresponding number. We first match "foo" and capture the number that follows it. For the target, we make use of the backslash-comma Lisp expression feature in order to replace the match with "bar" followed by the number's successor. We do the same thing for "bar", except that we replace the number with its predecessor. Performing the replacements, we can see how each number is incremented or decremented appropriately.
[00:09:36.320] Demonstration: Order
We haven't covered it explicitly so some of you may be wondering how parallel replacement deals with overlapping matches and whether the order of the replacement pairs is significant. This demo will clarify the exact behavior. The first example has the sources "watch" and "stopwatch". Conceptually, the matches overlap, but the rule is that matches are always processed earliest first, regardless of their length or the ordering of the pairs. Therefore it is "stopwatch" that gets replaced, and not its substring "watch". The second example uses the sources "watch" and "watchword". Both of the matches now conceptually start at the same position. In situations like these the order of the pairs does matter, and ties are broken by prefering the pair that was entered first, which is behavior that is inherited from the Elisp regex engine. So, the substring "watch" in "watchword" is what gets replaced in this case. Situations where the order of the pairs is significant are not very common however, so the user generally doesn't have to worry about this edge case. The order only matters when two or more sources share the same prefix, as in this example.
[00:10:54.440] Demonstration: Fun
The final demo tests the limits of the package and shows that it fully integrates with query-replace. It is really just for fun and can even serve as a small Emacs brainteaser. See if you can keep up! We open a directory and enter Writable Dired mode in order to rename the directories "foo" and "bar". Instead of doing it quickly by hand, we decide to show off and use query-replace-parallel-regexp. We enter our pairs and make use of the backslash-question-mark query edit feature. Now whenever we perform a replacement, the query edit makes Emacs stop and prompt us for additional input to use as the target. We confirm the renames and now enter the "bar-lib" directory in order to perform the same kind of replacement on "baz" and "quux". Rather than save time, we decide to be extra lazy and take the long route. We recall the first pair and initiate a recursive invocation of query-replace-parallel-regexp. We are now replacing the replacement. We apply our fixes and then do the same thing again with the second pair. Recall and recurse. We confirm the prompt and finally rename our directories. Wow, that really paid off.
[00:12:29.120] Implementation
Before we finish, a few quick words about the implementation for the curious. Both query-replace-parallel and query-replace-parallel-regexp delegate to the complex perform-replace function which is the workhorse of query-replace's interactive mechanism. The way we achieve multiple interleaved replacements is by providing perform-replace with a big "matcher regex" and a special replacement function. Essentially, a complex parallel replacement like this is transformed into a standard replacement like this. This is similar to the trick shown earlier in the presentation. Each source is put in its own capture group to allow the replacement function to determine which one matched and return the appropriate target. However, we now take care to support arbitrary regular expressions as sources. We achieve this by converting each source regex into an equivalent one for which we can guarantee that its capture groups will not clash with our matcher regex. Information about this conversion is stored, and once the replacement function is called it has enough data to apply the replacement from the viewpoint of the original regex. The regex transformation is reliable because it uses the rx library, allowing us to treat regexes as s-expressions and avoid any nasty manual parsing. In fact, rx itself is based on one of Olin Shivers' 100% solutions: SRE, or the S-expression regex notation. We all stand on the shoulders of many giants, so let's strive to design good solutions that we can all benefit from, many years into the future! Finally, because query-replace's core is not completely customizable, we did have to sprinkle in some advice to get certain things working. This concerns only minor cosmetic fixes and not the core replacement functionality, but we have nontheless tried to do it in the simplest and least intrusive way possible. In conclusion, go download and play with the package. Even if you're not performing overlapping replacements, you can still use query-replace-parallel for the peace of mind knowing that things won't go wrong if you perform more than one replacement at a time. Feel free to let us know about any interesting or crazy use cases you might come up with, as well as improvements or bugs that make it only a 99% solution. Thanks for listening and have a great EmacsConf!

Q&A transcript (unedited)

Hello again, everyone. And hi, Lovro. How are you doing? technically in the backstage right now you're just in Big Blue Button with us. thought there were 2 different rooms. 1 is the backstage and the other, stream, so don't worry too much about which is the backstage and which is the front page. great. Okay, yeah. Yeah, I'm doing great, just to answer your question. well splendid. So, I've pasted a link again on IRC if you want to ask your questions, and I'd invite you to do so, because we have about 9 minutes of laborious time to answer as many of them as possible. And I'm going to start with the first 1. This looks great and was very well-presented. Do you have plans to upstream this functionality into Emacs? thought about as well. Currently, we haven't really contacted anyone to do this. Also, the current implementation, so as I mentioned in the presentation towards the end, so we use a little bit of advice to sort of patch some functionality of query replace because not everything was easy to implement. The core functionality luckily was, But there's a couple of fixes we need to apply to the message function in order to display a nice message in the echo buffer because this doesn't happen on its own when we're using this trick with this big regex and whatnot. So I don't think that the code as it is would be upstreamable. I think probably if we wanted to upstream it, we would have to do some proper work on refactoring query place itself in order to integrate all of this functionality just directly without any patching left and right. But yeah, definitely something I've given some thought, but so far no progress on it. I haven't actually started doing anything about it. you developed the feature and then you moved on to the presentation or did you want to do a presentation for EmacsConf and then you worked on something like this? Which was it first, the chicken or the egg? So this is a problem I've been aware of for, I mean, probably a couple of years. And, you know, I talked to my friend Valentino about it and we had like a little discussion, you know, how would we do this? And then I remember back when I was researching about this problem and the various Emacs Lisp solutions, all I could find were these solutions that would, you know, just shy away from implementing the RegEx case, which is a really complicated 1. And, after some discussion, my friend and I decided, okay, what the hell? Let's, let's try and implement this. How hard can it be? And yeah, basically in 1 afternoon, the idea, our little trick and the whole implementation was born. And then I think that was maybe around a year ago, maybe a bit less. And then through the months, we just thought, oh yeah, maybe we could present this, maybe it would be interesting for people to see and that's how we came up with the idea to present at EmacsConf. questions. So people, it's nice if I ask questions but you know, the point is kind of for you to ask the questions. I see someone who's joined us on BBB. Peter, would you like to ask a question maybe? Otherwise I see another person writing a question on the pad, so we can either move for this 1. So I'll leave Peter to figure out if they want to ask a question. So I'm moving on to the next question. and you really clearly laid out the problem and the solution there. While I was watching it, I was thinking maybe the nice way to name it is just to name it query replace and query replace regext, you know, overloading the original functions and then using a prefix number, like control number to indicate how many replacements you're going to do. But maybe that doesn't work with the recursive editing stuff, which I don't use much. So I don't have a good method. Well, the question is, if we just overwrite the definitions, then, oh, well, I guess we could do that. Nothing stops us. I mean, we're in Emacs. We could definitely do that. And then if you give, like, a prefix argument, maybe it just drops you back to the original query replace. Yeah, that's an idea. For now, we decided, OK, let's just keep everything explicitly separate just to avoid any confusion. for now. What I'm actually thinking is that when you do query replace, it just does the regular query replace. And if you're going to do, say, 3 parallel replacements, then you do Control-U, query replace. Sorry. Control-3, query replace. And then that way you don't have The final prompt that you give nothing to. I think I like that. Yeah, that's not a bad idea. argument or to use the universal argument. When you're working with Emacs and especially the UX side of things in the package, it's so complicated to figure out which 1 you want to do. In this particular case, I think it's the better option to use the universal argument or any kind of argument with a control number before. All right, we have about 3 more minutes of questions. Peter, if you don't mind, I'll keep reading the questions in the chat. Did you use pair programming while developing it, it being a package, or did you work independently, alternating and reviewing with Valentino? thing. So if I remember correctly, I was sitting at the computer and Valentino was in front of a whiteboard and we were just dissecting this regex and a bunch of examples and trying to get these capture groups and stuff that we have to remap internally to get these offsets right and avoid off by 1 error and stuff like that. So yeah, definitely a team effort. What is your background in programming? Was it difficult to implement following the same API and architecture as what is already in Emacs? Both Valentino and I are actually PhD students in computer science, and we literally share an office. So that's how we even started talking about this whole thing. And we both use Emacs, of course. But I don't think this was too hard to implement because luckily all of the interactive functionality like this complicated undo, skipping, execute until the end and so on, all of this is really just already provided by the Emacs queer replace implementation. So sort of what we do is we just invoke it as a function and delegate to it. And we came up with this clever trick to basically delegate this multi-replacement to this 1 single function that's already there. So it wasn't too complicated. for the last question. What did you learn about Emacs programming or programming in general while working on this project? A very wide question for me. previous just answer is I don't want to say like you know we're PhDs, a PhD is required for this or anything, not at all. It's mostly just for a little bit of context, but I think obviously, even if you're not a PhD, I mean, you don't even require like university, you know, education or anything. It wasn't overly difficult to implement, sort of just read some code that's already there and you know follow what you see and poke Emacs a little bit and do a little bit of debugging on the internals and you can definitely get it. So definitely not a prerequisite to have a degree or anything to do any of this stuff. Okay so Coming back to question because we only have 1 minute. So just 1 thing in 10 seconds, Emacs programming? That Emacs is so flexible that I can go and I can patch literally its message function. And that is how we achieve the nice message function in the echo buffer. So I can literally go and patch something as crucial as message. And I think, again, we're going back to the philosophy of Emacs. Everything is programmable and even changing the message function is great. All right, well, thank you so much, Lovro, and thanks to Valentino as well, who's not here, but who's contributed to this talk. Any last word? solutions, try to make them as foolproof and as 100% as possible so we get more of these goodies that are nice and robust for everybody to use. thank you so much, Lover, for your presentation and your answer. We'll be moving on to the next talk in just about 5 seconds, and I'll see you after. Bye, Lovro! little slow. Okay, we switch to the next talk. All right, Lover, I'm gonna need to go get ready now. Yep. Bye-bye, and thanks for your talk.

Questions or comments? Please e-mail hokomo@disroot.org