We present our Emacs package for performing parallel text
replacement.
"Parallel" in this context does not refer to improving
efficiency through parallelism, but to the concept of performing
more than one text replacement without them interfering with
each other. This is in line with the usage of the term in the
Lisp community when contrasting the behaviors of LET and LET*,
SETQ and PSETQ, etc. (e.g.
http://www.lispworks.com/documentation/lw60/CLHS/Body/s_let_l.htm).
We will present the package's features and its integration with
Emacs' query-replace system, a comparison with previous
solutions, and a few notes on our implementation. We will
describe some common use-cases and showcase how the package is
used.
The package is currently not yet published in a package archive,
but the code is already publicly available at
https://github.com/hokomo/query-replace-parallel. The name
"query-replace-parallel" is not yet final and we are thinking of
alternatives. Our current best candidate is "replace-parallel"
(similar to the built-in "replace.el"), but suggestions are
welcome.
Discussion
Questions and answers
Q: This looks great, and was very well-presented. Do you have plans
to upstream this functionality into Emacs?
A: Would require some refactoring upstream, so not suitable for
upstreaming as-is.
Q: Did you use pair-programming while developing it, or did you work
independently, alternating and reviewing?
A: Yes, we did! I was at the keyboard, Valentino was at the
whiteboard, and we kept bouncing ideas back and forth, trying
out prototypes, coming up with various tests, checking the edge
cases, etc.
Q: What is your background in programming? Was it difficult to
implement following the same API and architecture as what is already
in Emacs?
A: Both Valentino and I are PhD students in computer science,
but a PhD or similar is definitely not a requirement. It wasn't
too difficult because we could reuse the interactive
functionality from query-replace's internals. Figuring out what
and how to reuse is what took a bit of creativity, but a lot of
the necessary knowledge for that came from just reading and
poking around Emacs' replace.el. Don't be afraid to go and
read the source!
Q: What did you learn about Emacs programming or programming in
general while working on this project?
A: That Emacs is so flexible that you can even advise its
message function. Similarly, being able to prototype
functionality so quickly and immediately integrate it into the
rest of Emacs is so fun and so satisfying!
Notes
One usecase could be character names in a novel manuscript, if one
has named a character and want to now rename it to some other
character names or swap it with another one.
Nice, I was wondering if it utilized rx
package installed, ready to use!
excellent talk, and also such a cool package
great talk, very clever concept
that SRE "paper" you linked to is interesting
just saw the "Parallel Text Replacement" talk - 👏 great talk!
Hi everyone!Welcome to our talk on Parallel Text Replacement.My name is Lovro, and I'll be telling you about aninteresting problem that my friend Valentino and Iset out to solve one afternoon.We will describe the problem, take a look at someof the existing work and then present our solution.Afterwards, we will show some demos and concludewith a quick overview of the implementation.Let's get straight into it!
Here is a problem that most of us have dealt withat some point.Assume we have a piece of code such as the following.We use a code example here, but in general what we'reabout to discuss can be applied to any piece of text.After a bit of thinking, we decide that the names ofthe two variables, "foo" and "bar", should actually beswapped.That is, "foo" should be replaced with "bar", and "bar"should be replaced with "foo".The question is: what is a good way to achieve this?We could perform the edits manually if the code issmall enough, and we might even be done reasonablyquickly.However, consider two things.Imagine the usual case where there's just too muchcode to edit by hand.We have no other option than to automate the task.More importantly though, we have a whole programmabletext editor right at our fingertips.We should object to doing things that the computercan do for us.
So, one way to automate it is by using our old friendquery-replace (M-%) multiple times in a sequence.We first do a pass where we replace "foo" with "bar",then we do another pass where we replace "bar" with "foo".But that's clearly not right.We all know that this naive multi-pass approachdoesn't work because it results in interferencebetween the two replacements.
Instead, we have to be a bit more clever.We should first replace "foo" with a temporary string,in this case "oof", that we will call a "token".To avoid interference, we must be careful to ensurethat the token does not contain whatever we're aboutto replace next.Then we do a second pass to replace "bar" with "foo",and finally a third pass to replace the token with "bar".This gives us the result we want.
Putting the implementation aside for a moment, this styleof text replacement, where we replace multiple sourceswith their targets, without running into interferenceissues between replacement pairs, is what we calla "parallel replacement".This is the essence of the problem we're trying to solve.The examples with swapping that we've shown so farare really just one of the many use cases that aresupported by a general parallel replacement utility.To avoid confusion, let us clarify that the word "parallel"is not in reference to hardware parallelization, butrather comes from analogy with the Lisp let operator,where the bindings of variables are performed in parallel,rather than sequentially as in let*.Parallel in this context means that none of the bindingsare in scope within any of the initial value forms.In other words, just like a let's initialization formcannot refer to any of the earlier bindings, areplacement pair's source should not be able to replacethe previously substituted targets of any other pair.This is what we mean by "no interference".
However, manually invoking multiple carefully chosenquery-replace commands gets old very quickly.Say we scaled up the problem and wanted to perform nswaps instead of just two, e.g. to swap, or rather,rotate, "foo" to "bar", "bar" to "baz", "baz" to "quux"and "quux" to "foo".We would first have to perform n - 1 additionalreplacements to introduce the necessary tokens,effectively doubling the number of steps.Even if we tried to automate this, think about whattokens the code would have to generate if we had noprior knowledge of the replacement pairs given by theuser.We would have to program defensively and use longrandomly-generated strings that, one, hopefully donot interfere with any of the replacement pairs,and two, might slow down the search if they're overly long.Can we do better?
Yes we can!We can actually perform just a single pass.The trick is to alternate between the replacementpairs, replacing whichever source occurs the earliest,and making sure to continue scanning after the endof the substituted target in order to avoid interference.This interleaving of replacements is not somethingthat's easy to do by hand with query-replace.
Since this is Emacs we're talking about, of coursethere already exist solutions that implement this idea.Here are few that we could find.The EmacsWiki has a page dedicated to this problem.Stack Overflow has an old post where a couple ofusers provided their solutions.Mastering Emacs also gives a method along with otherinteresting query-replace-regexp (C-M-%) patterns.More recently, Tony Zorman made a blogpost providinga solution with an interface based on query-replace.I encourage you to take a look at these solutions ifyou're interested in the details.But while a step in the right direction, these solutionsare not satisfactory because they all lack one ormore of the following.One, they are not completely automated and requirethe user to come up with a relatively complicatedand verbose query-replace-regexp invocation.Two, they are restricted to performing only 2-elementswaps rather than general parallel replacements.Three, they don't provide any sort of interactivityduring replacement and instead perform it in one shot.Four, they don't attempt to integrate with the familiarquery-replace interface, which supports skipping, undo,history and more advanced features like Lisp expressionsand recursive query edits.Most importantly however, five, none of them weredesigned with regular expressions in mind and insteadonly ever consider literal strings.In fact, the only one that comes close is thehalf-automated solution that invokes query-replace-regexpwith a specially crafted replacement.As an example, here's how you would use this techniqueto perform a 3-element parallel regex replacement.It uses the backslash-comma Lisp expression featurein order to choose the appropriate target to substitute.Aside from being very clumsy and tedious to write out,this approach makes it really hard to use more complexregular expressions that make use of capture groupsthemselves.This was the biggest limitation that we wantedto get rid of and the main motivation for our work.So, as an alternative to the existing zoo of 80% solutions,we aim to provide a 100% solution, one that handlesregexes and consolidates all of the existing ideasinto a single package.
We call it query-replace-parallel.The package is free and open-source and can currentlybe found on GitHub under hokomo/query-replace-parallel.The name is not yet finalized and we're open to anysuggestions.We hope to get it published on an Elisppackage archive in the near future, but for now youcan just download and load the main Elisp file manually.With all of that said, let's go through a few demosto illustrate some use cases and see how to use the package.
Our first demo is a simple swap, like the one weshowed at the beginning of the presentation.This chunk of text is actually one of the testsfrom our package's code.Assuming we have loaded the package, we can executethe query-replace-parallel command, a parallel versionof the standard query-replace.This command works with literal strings and willask for each source and target in turn.Our goal is to replace "foo" with "bar"and "bar" with "foo".After inputting our replacements, we terminate theprompt by pressing enter with empty input.At this point, everything functions the same as ina standard query-replace invocation.The echo area shows the match and the replacementwe're about to make.We can perform replacements,undo them,skip them,execute them until the end,and so on.
The second demo shows our first regex use case.Imagine we have the following LaTeX code.We realize that we haven't been completely consistentin our use and naming of macros, so we decide tofix the problem.This time we execute query-replace-parallel-regexpbecause we want to work with regex instead of literalstrings.We want to achieve two things.First, we want to wrap all usages of the variable nwith the natvar macro.Using the backslash-less-than and blackslash-greater-thanconstructs allows us to only match letters n notappearing as part of a larger word.Second, we want to rename natvar to intvar becausethe variables a, b and c are integers and not naturalnumbers.We enter empty input to terminate the prompt and cannow perform the replacements.There we go, the fixes are done and we didn't haveto think about in which order to apply them.
We now take a look at a more complicated regexexample to demonstrate that even advanced query-replacefeatures are supported.Each "foo" and "bar" in this example is followed bya number.The goal is to not only swap "foo" and "bar", butalso increase or decrease the corresponding number.We first match "foo" and capture the number thatfollows it.For the target, we make use of the backslash-commaLisp expression feature in order to replace thematch with "bar" followed by the number's successor.We do the same thing for "bar", except that wereplace the number with its predecessor.Performing the replacements, we can see how eachnumber is incremented or decremented appropriately.
We haven't covered it explicitly so some of you maybe wondering how parallel replacement deals withoverlapping matches and whether the order of thereplacement pairs is significant.This demo will clarify the exact behavior.The first example has the sources "watch" and "stopwatch".Conceptually, the matches overlap, but the rule isthat matches are always processed earliest first,regardless of their length or the ordering of the pairs.Therefore it is "stopwatch" that gets replaced,and not its substring "watch".The second example uses the sources "watch" and "watchword".Both of the matches now conceptually start at the sameposition.In situations like these the order of the pairs doesmatter, and ties are broken by prefering the pair thatwas entered first, which is behavior that is inheritedfrom the Elisp regex engine.So, the substring "watch" in "watchword" is what getsreplaced in this case.Situations where the order of the pairs is significantare not very common however, so the user generallydoesn't have to worry about this edge case.The order only matters when two or more sourcesshare the same prefix, as in this example.
The final demo tests the limits of the package andshows that it fully integrates with query-replace.It is really just for fun and can even serve as asmall Emacs brainteaser.See if you can keep up!We open a directory and enter Writable Dired modein order to rename the directories "foo" and "bar".Instead of doing it quickly by hand, we decide toshow off and use query-replace-parallel-regexp.We enter our pairs and make use of thebackslash-question-mark query edit feature.Now whenever we perform a replacement, the queryedit makes Emacs stop and prompt us for additionalinput to use as the target.We confirm the renames and now enter the "bar-lib"directory in order to perform the same kind ofreplacement on "baz" and "quux".Rather than save time, we decide to be extra lazyand take the long route.We recall the first pair and initiate a recursiveinvocation of query-replace-parallel-regexp.We are now replacing the replacement.We apply our fixes and then do the same thing againwith the second pair.Recall and recurse.We confirm the prompt and finally rename our directories.Wow, that really paid off.
Before we finish, a few quick words about theimplementation for the curious.Both query-replace-parallel and query-replace-parallel-regexpdelegate to the complex perform-replace functionwhich is the workhorse of query-replace's interactivemechanism.The way we achieve multiple interleaved replacementsis by providing perform-replace with a big "matcher regex"and a special replacement function.Essentially, a complex parallel replacement like thisis transformed into a standard replacement like this.This is similar to the trick shown earlier in thepresentation.Each source is put in its own capture group to allowthe replacement function to determine which one matchedand return the appropriate target.However, we now take care to support arbitraryregular expressions as sources.We achieve this by converting each source regex intoan equivalent one for which we can guarantee that itscapture groups will not clash with our matcher regex.Information about this conversion is stored, andonce the replacement function is called it hasenough data to apply the replacement from theviewpoint of the original regex.The regex transformation is reliable because ituses the rx library, allowing us to treat regexesas s-expressions and avoid any nasty manual parsing.In fact, rx itself is based on one of Olin Shivers'100% solutions:SRE, or the S-expression regex notation.We all stand on the shoulders of many giants, solet's strive to design good solutions that we canall benefit from, many years into the future!Finally, because query-replace's core is not completelycustomizable, we did have to sprinkle in some adviceto get certain things working.This concerns only minor cosmetic fixes and not thecore replacement functionality, but we have nonthelesstried to do it in the simplest and least intrusive waypossible.
In conclusion, go download and play with the package.Even if you're not performing overlapping replacements,you can still use query-replace-parallel for thepeace of mind knowing that things won't go wrong ifyou perform more than one replacement at a time.Feel free to let us know about any interesting orcrazy use cases you might come up with, as well asimprovements or bugs that make it only a 99% solution.Thanks for listening and have a great EmacsConf!
Q&A transcript (unedited)
Hello again, everyone.And hi, Lovro. How are you doing?technically in the backstage right now you'rejust in Big Blue Button with us.thought there were 2 different rooms.1 is the backstage and the other,stream, so don't worry too much about whichis the backstage and which is the front page.great. Okay, yeah. Yeah,I'm doing great, just to answer yourquestion.well splendid. So, I've pasted a link againon IRC if you want to ask your questions,and I'd invite you to do so,because we have about 9 minutes of laborioustime to answer as many of them as possible.And I'm going to start with the first 1.This looks great and was very well-presented.Do you have plans to upstream thisfunctionality into Emacs?thought about as well.Currently, we haven't really contacted anyoneto do this. Also, the current implementation,so as I mentioned in the presentation towardsthe end, so we use a little bit of advice tosort of patch some functionality of queryreplace because not everything was easy toimplement. The core functionality luckilywas, But there's a couple of fixes we need toapply to the message function in order todisplay a nice message in the echo bufferbecause this doesn't happen on its own whenwe're using this trick with this big regexand whatnot. So I don't think that the codeas it is would be upstreamable.I think probably if we wanted to upstream it,we would have to do some proper work onrefactoring query place itself in order tointegrate all of this functionality justdirectly without any patching left and right.But yeah, definitely something I've givensome thought, but so far no progress on it.I haven't actually started doing anythingabout it.you developed the feature and then you movedon to the presentation or did you want to doa presentation for EmacsConf and then youworked on something like this?Which was it first, the chicken or the egg?So this is a problem I've been aware of for,I mean, probably a couple of years.And, you know, I talked to my friendValentino about it and we had like a littlediscussion, you know, how would we do this?And then I remember back when I wasresearching about this problem and thevarious Emacs Lisp solutions,all I could find were these solutions thatwould, you know, just shy away fromimplementing the RegEx case,which is a really complicated 1.And, after some discussion,my friend and I decided,okay, what the hell? Let's,let's try and implement this.How hard can it be? And yeah,basically in 1 afternoon,the idea, our little trick and the wholeimplementation was born.And then I think that was maybe around a yearago, maybe a bit less.And then through the months,we just thought, oh yeah,maybe we could present this,maybe it would be interesting for people tosee and that's how we came up with the ideato present at EmacsConf.questions. So people, it's nice if I askquestions but you know,the point is kind of for you to ask thequestions. I see someone who's joined us onBBB. Peter, would you like to ask a questionmaybe? Otherwise I see another person writinga question on the pad,so we can either move for this 1.So I'll leave Peter to figure out if theywant to ask a question.So I'm moving on to the next question.and you really clearly laid out the problemand the solution there.While I was watching it,I was thinking maybe the nice way to name itis just to name it query replace and queryreplace regext, you know,overloading the original functions and thenusing a prefix number,like control number to indicate how manyreplacements you're going to do.But maybe that doesn't work with therecursive editing stuff,which I don't use much.So I don't have a good method.Well, the question is,if we just overwrite the definitions,then, oh, well, I guess we could do that.Nothing stops us. I mean,we're in Emacs. We could definitely do that.And then if you give, like,a prefix argument, maybe it just drops youback to the original query replace.Yeah, that's an idea. For now,we decided, OK, let's just keep everythingexplicitly separate just to avoid anyconfusion.for now. What I'm actually thinking is thatwhen you do query replace,it just does the regular query replace.And if you're going to do,say, 3 parallel replacements,then you do Control-U,query replace. Sorry. Control-3,query replace. And then that way you don'thave The final prompt that you give nothingto.I think I like that. Yeah,that's not a bad idea.argument or to use the universal argument.When you're working with Emacs and especiallythe UX side of things in the package,it's so complicated to figure out which 1 youwant to do. In this particular case,I think it's the better option to use theuniversal argument or any kind of argumentwith a control number before.All right, we have about 3 more minutes ofquestions. Peter, if you don't mind,I'll keep reading the questions in the chat.Did you use pair programming while developingit, it being a package,or did you work independently,alternating and reviewing with Valentino?thing. So if I remember correctly,I was sitting at the computer and Valentinowas in front of a whiteboard and we were justdissecting this regex and a bunch of examplesand trying to get these capture groups andstuff that we have to remap internally to getthese offsets right and avoid off by 1 errorand stuff like that. So yeah,definitely a team effort.What is your background in programming?Was it difficult to implement following thesame API and architecture as what is alreadyin Emacs?Both Valentino and I are actually PhDstudents in computer science,and we literally share an office.So that's how we even started talking aboutthis whole thing. And we both use Emacs,of course. But I don't think this was toohard to implement because luckily all of theinteractive functionality like thiscomplicated undo, skipping,execute until the end and so on,all of this is really just already providedby the Emacs queer replace implementation.So sort of what we do is we just invoke it asa function and delegate to it.And we came up with this clever trick tobasically delegate this multi-replacement tothis 1 single function that's already there.So it wasn't too complicated.for the last question.What did you learn about Emacs programming orprogramming in general while working on thisproject? A very wide question for me.previous just answer is I don't want to saylike you know we're PhDs,a PhD is required for this or anything,not at all. It's mostly just for a little bitof context, but I think obviously,even if you're not a PhD,I mean, you don't even require likeuniversity, you know, education or anything.It wasn't overly difficult to implement,sort of just read some code that's alreadythere and you know follow what you see andpoke Emacs a little bit and do a little bitof debugging on the internals and you candefinitely get it. So definitely not aprerequisite to have a degree or anything todo any of this stuff. Okay so Coming back toquestion because we only have 1 minute.So just 1 thing in 10 seconds,Emacs programming? That Emacs is so flexiblethat I can go and I can patch literally itsmessage function. And that is how we achievethe nice message function in the echo buffer.So I can literally go and patch something ascrucial as message.And I think, again, we're going back to thephilosophy of Emacs. Everything isprogrammable and even changing the messagefunction is great. All right,well, thank you so much,Lovro, and thanks to Valentino as well,who's not here, but who's contributed to thistalk. Any last word?solutions, try to make them as foolproof andas 100% as possible so we get more of thesegoodies that are nice and robust foreverybody to use.thank you so much, Lover,for your presentation and your answer.We'll be moving on to the next talk in justabout 5 seconds, and I'll see you after.Bye, Lovro!little slow. Okay, we switch to the nexttalk. All right, Lover,I'm gonna need to go get ready now.Yep. Bye-bye, and thanks for your talk.