Enhancing productivity with voice computing
Blaine Mooers (he/him/his) - Pronunciation: like "moors", blaine-mooers(at)ouhsc.edu, https://basicsciences.ouhsc.edu/bmb/Faculty/bio_details/mooers-blaine-hm-phd, https://twitter.com/BlaineMooers, https://github.com/MooersLab, https://codeberg.org/MooersLab, mastodon(at)bhmooers
Format: 19-min talk; Q&A: BigBlueButton conference room
Status: TO_INDEX_QA
Talk
Duration: 18:49 minutes
- 00:00.000 Introduction
- 00:37.400 Three activities in voice computing
- 01:02.560 Talk is not about ... and about ...
- 01:53.520 Motivations
- 03:33.240 Data
- 03:58.680 Voice In in the Chrome Store
- 04:25.628 Works in web pages with text areas
- 05:16.880 Built-in commands in Voice In Plus
- 06:41.740 Common errors made by Voice In
- 08:14.760 Custom speech-to-text commands
- 09:59.420 Custom speech-to-commands
- 10:37.540 Introducing Talon Voice
- 12:28.400 Talon GUI
- 14:02.540 Talon file with web scope
- 15:34.015 Terminals on remote and virtual machines
- 16:52.500 Recommendations
- 18:17.720 Acknowledgements
Q&A
Description
Help wanted: Q&A could be indexed with chapter markers
The Q&A session for this talk does not have chapter markers yet. Would you like to help? See help with chapter markers for more details. You can use the vidid="voice-qanda" if adding the markers to this wiki page, or e-mail your chapter notes to emacsconf-submit@gnu.org.
(If you want to work on this and you think it might take you a while, you can reserve this task by editing the page and adding volunteer="your-name date" or by e-mailing emacsconf-submit@gnu.org.)
Voice computing uses speech recognition software to convert speech into text, commands, or code. While there is a venerated program called Emacspeak for converting text into speech, an "EmacsListens" for converting speech into text is not available yet. The Emacs Wiki describes the underdeveloped state of speech-to-text in Emacs. I will explain how two external software packages convert my speech into text and computer commands that can be used with Emacs.
First, I present some motivations for using voice computing. These fall into two categories: productivity improvement and health-related issues. The second category includes the underappreciated cure for "standing desk envy": a large dose of voice computing while standing.
I found one software package, Voice In Plus (https://dictanote.co/voicein/plus/), to be quite accurate for speech-to-text or dictation, but less versatile for speech-to-commands. I have used this package daily, and my daily word count tripled almost immediately. Of course, there are limits here; you can talk for only so many hours per day.
I found a second software package, Talon Voice (http://talon.wiki/), that has a less accurate language model but supports custom commands that can be executed anywhere you can place the cursor, including in virtual machines and on remote servers. Talon Voice will appeal to those who like to tinker with configuration files, yet it is easy to use.
I will explain how I have integrated these two packages into my workflow. I have developed a library of commands that expand 94 English contractions when spoken. This library eliminates tedious downstream editing of formal prose where I do not use contractions. The library is available on GitHub for both Voice In Plus (https://github.com/mooersLab/voice-in-plus-contractions) and Talon Voice (https://github.com/MooersLab/talon-contractions).
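The contraction libraries are configuration for Voice In Plus and Talon, not Python code. As a rough illustration of what they accomplish, here is a minimal Python sketch, assuming the dictated text arrives as a plain string. The lookup table is a small hypothetical subset, not the published list of 94, and it is applied as a whole-word, case-insensitive substitution that keeps a leading capital.

```python
import re

# Hypothetical subset of a contraction table; the published libraries cover
# 94 contractions and live in Voice In Plus or Talon settings, not in Python.
CONTRACTIONS = {
    "can't": "cannot",
    "don't": "do not",
    "i'll": "I will",
    "it's": "it is",
    "we're": "we are",
    "won't": "will not",
}

# One alternation of all keys, longest first, matched as whole words.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(k) for k in sorted(CONTRACTIONS, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def expand_contractions(text: str) -> str:
    """Replace each contraction in dictated text with its expansion."""

    def replace(match):
        word = match.group(0)
        expansion = CONTRACTIONS[word.lower()]
        # Keep a leading capital, e.g. at the start of a sentence.
        if word[0].isupper():
            expansion = expansion[0].upper() + expansion[1:]
        return expansion

    return _PATTERN.sub(replace, text)


print(expand_contractions("I'll check it. It's late, but we can't stop."))
# prints: I will check it. It is late, but we cannot stop.
```

Dictation that produces curly apostrophes would need those forms added to the table as well.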
I also supply interactive quizzes for mastering the basic Voice In commands (https://github.com/MooersLab/voice-in-basics-quiz) and the Talon Voice phonetic alphabet (https://github.com/MooersLab/talon-voice-quizzes/qTalonAlphabet.py). I learned the Talon alphabet in one day by taking the quiz at spaced intervals. Once I was proficient, the quiz took only 60 seconds to complete.
I store my daily writing in a multi-file LaTeX document with one tex file per day. The 365 files are compiled into one PDF per year, which usually runs to about 1000 pages. I am not going to push my luck with a multiyear document. Each month is a chapter. The resulting PDF is a breeze to scroll and search, and it has an autogenerated table of contents and an index. I have posted a blank version for 2023 and another for the upcoming year (https://github.com/MooersLab/diary2024inLaTeX). One could take a similar approach in org-mode by using Bastian Bechtold's org-journal package (https://github.com/bastibe/org-journal).
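A minimal Python sketch of how such a yearly master file could be generated. It assumes one file per day named days/YYYY-MM-DD.tex and the report document class; the directory name, the file naming scheme, and the use of \include are illustrative assumptions, not necessarily how the posted diary template is organized.

```python
import calendar
import datetime


def build_master(year: int, day_dir: str = "days") -> str:
    r"""Return the text of a hypothetical master .tex file for one year.

    Each month becomes a \chapter, and each day is pulled in with \include
    from a file named <day_dir>/YYYY-MM-DD.tex (an assumed naming scheme).
    """
    lines = [
        r"\documentclass{report}",
        r"\usepackage{makeidx}",  # provides \printindex
        r"\makeindex",
        r"\begin{document}",
        r"\tableofcontents",
    ]
    for month in range(1, 13):
        lines.append(rf"\chapter{{{calendar.month_name[month]} {year}}}")
        _, n_days = calendar.monthrange(year, month)  # handles leap years
        for day in range(1, n_days + 1):
            stamp = datetime.date(year, month, day).isoformat()
            lines.append(rf"\include{{{day_dir}/{stamp}}}")
    lines += [r"\printindex", r"\end{document}"]
    return "\n".join(lines) + "\n"


print(build_master(2024).count(r"\include{"))  # 366 days in a leap year
```

Because calendar.monthrange supplies the month lengths, a leap year automatically gets 366 day entries rather than 365.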
I gave a 60-minute talk on this topic to the Oklahoma Data Science Workshop on November 16, 2023 (https://mediasite.ouhsc.edu/Mediasite/Channel/python). This workshop meets once a month and is for people interested in data science and scientific computing. You do not have to be an Oklahoma resident to attend. Send me an e-mail if you want to be added to our mailing list.
About the speaker:
I am an Associate Professor of Biochemistry at the University of Oklahoma Health Sciences Center. I use X-ray crystallography to study the structures of RNA, proteins, and protein-drug complexes. I have been using Python and LaTeX for a dozen years, and Jupyter Notebooks since 2013. I have been using Emacs every day for 2.5 years. I discovered voice computing this summer when my chronic repetitive stress injury flared up while I was entering data in a spreadsheet. I tripled my daily word count by using speech-to-text, and I get a kick out of running remote computers by speech-to-command.
Discussion
Questions and answers
- Q: Comment: there is a text-to-command tool called clipea that would be awesome: https://github.com/dave1010/clipea
- A: SoX (https://sourceforge.net/projects/sox/) is also a good alternative.
- Q: Could you comment on how speaking vs. typing affects your logic/content? Thanks!
A: I find that this is like the difference between writing your thoughts down on a blank piece of printer paper versus on paper bound in a leather notebook. I do not think there is any real difference. I know that some people believe there is a definite difference, but I am using dictation only to generate the first draft. My skill at using my voice to edit text is still not well developed, so I am still more efficient with the keyboard for that stage.
So the hardest part about writing generally is getting the first crappy draft written. I have found that dictation is perfectly fine for that phase. I actually find it very conducive to just getting the text out. The biggest problem that most of us have is applying our internal editor, which inhibits us from generating words in a free-flowing fashion.
I divide my writing into two categories: generative writing (generating the first crappy draft) and rewriting. Rewriting is probably 80-90% of writing; it is where you go back and rework the order of the sentences, the order of paragraphs, the order of words in a sentence, and so forth. It is really hard work that is best done later in the day when I am more awake. I do my generative writing first thing in the morning when I feel horrible. That is when my internal editor is not very awake and I can get more words past that gatekeeper. I can do this sitting down. I can do this standing up. I can do this 20 feet away from my computer, looking out the window to give my eyes a break. I find it very enjoyable to use it in this fashion. The downside is that I wind up generating three times as much text. That makes for three times as much work when it comes to rewriting the text, and that means I am using the keyboard a lot later in the day.
I have not made any progress on recovering from my own repetitive stress injury. I hope to add speech-to-commands for editing the text in the future so that I can eventually give my hands more of a break.
This allows you to separate those two activities, and not only by time. Many professional writers will spend several hours in the morning doing the generative part and then spend the rest of the day rewriting. They have separated these two activities temporally. What most people actually do is start the generative part, write one sentence, and apply that internal editor right away because they want the first draft to be a perfect version, a final draft. That is what slows them down dramatically.
This also allows you to separate these two activities by modality: you do the generative writing by voice and the rewriting by keyboard. I think this is one way that many people can get into using speech-to-text productively.
- A: (not the author, just an audience member): So, for example, when you're talking, you have an immense feeling for the topic. You can close your eyes and use body gestures to manipulate a concept or idea, and you have... I just feel you feel more creative than when just tapping. You definitely have a big speed advantage over tapping, but the more important thing is that you use your body as a whole to interact with those ideas.
[this one is done via voice...]
- But typing is definitely good for accurate control, such as M-x some-command ...
- Q: Have you tried the ChatGPT voice chat interface? If so, how has your experience been? As someone experienced with voice control, I am interested to hear your thoughts, particularly on its performance relative to the open source tools.
- A: I do not have much experience with that particular software. I have used Whisper a little bit, and that is related. Of course, you have the problem of lag. I find that Whisper is good for spitting out a sentence, maybe for a docstring in a programming file. However, I find that it is very prone to hallucinations. I find myself spending half my time deleting the hallucinations, so the net gain is diminished; there is not much of a net gain in terms of what I am getting out of it.
- Q: Are any of these voice command/dictation tools freemium?
- A: Voice In is freemium: to be able to add custom commands, you have to pay $48 a year. The Talon Voice software is free, and the only limitation there is access to the language model. If you want to get the beta version, you need to subscribe to Patreon to support the developer. I did that, and I really did not find much of an improvement, so I do not intend to do that in the future. But otherwise, in Talon Voice, everything is open and free. The Slack community is incredibly welcoming. Its parallels with the Emacs community are pretty striking.
- Q: How good is Talon compared to Whisper?
- A: With Talon, I find that when I am doing dictation, the first part of the sentence will be fairly accurate, and then towards the end, the errors start to accumulate. In general, I think its error rate is about five words out of 100. Whisper is wonderful because it will insert punctuation for you, but its errors are longer: it will hallucinate full sentences for you. So they both have significant error rates; they are just different kinds of errors. Hopefully, both [will improve] over time... [Talon's] errors are generally shorter in extent. It does not hallucinate for as long.
- Q: Are any of those voice command/dictation tools libre? I cannot find that information on the web.
- (not the speaker):
- this FAQ https://talon.wiki/faq/ says that Talon Voice is closed source
- Talon Voice is non-free: https://talonvoice.com/EULA.txt
- Mistral 7B is released under the permissive Apache 2.0 license
Notes
- From the speaker: I really appreciate the high level of accuracy that I am getting from Voice In. I would use Talon Voice for dictation, but at this point, there is a significant difference between the accuracy of Voice In and that of Talon Voice. The difference is large enough that I will probably use Voice In for a while until I can figure out how to get Talon Voice to generate more accurate text.
- When you use Org mode and you have the bullets, it allows you to naturally shard your thoughts in a way that is really easy to edit. ... It has a summarizing capability. It allows you to pull back and get an overview.
- Great stuff, definitely going to test-drive Talon
Transcript
[00:00:00.000] Introduction
Hi, I'm Blaine Mooers. I'm an associate professor of biochemistry at the University of Oklahoma Health Sciences Center in Oklahoma City. My lab studies the role of RNA structure in RNA editing. We use X-ray crystallography to study the structures of these RNAs. We spend a lot of time in the lab preparing our samples for structural studies, and then we also spend a lot of time at the computer analyzing the resulting data. I was seeking ways of using voice computing to try to enhance my productivity.
[00:00:37.400] Three activities in voice computing
I divide voice computing into three activities: speech-to-text or dictation, speech-to-commands, and speech-to-code. I'll be talking about speech-to-text and speech-to-commands today because these are the two activities that are probably most broadly applicable to the workflows of people attending this conference.
[00:01:02.560] Talk is not about ... and about ...
This talk will not be about Emacspeak, a venerated program for converting text to speech. We're talking about the flow of information in the opposite direction, speech-to-text. We need an Emacs Listens. We don't have one, so I had to seek help from outside the Emacs world via Voice In Plus. This runs in the Google Chrome web browser, and it's very good for speech-to-text and very easy to learn how to use. It also has some speech-to-commands. However, Talon Voice is much better with the speech-to-commands, and it's also great at speech-to-code.
[00:01:53.520] Motivations
The motivations are, obviously, as I mentioned already, improved productivity. If you're a fast typist who types faster than you can speak, you might nonetheless benefit from voice computing when you grow tired of using the keyboard. On the other hand, you might be a slow typist who talks faster than you can type. In this case, you're definitely going to benefit from dictation because you'll be able to put more words into text documents in a given day. If you're a coder, then you may get a kick out of opening programs, websites, and coding projects by using your voice. Then there are health-related reasons. You may have impaired use of your hands, eyes, or both due to accident or disease, or you may suffer from a repetitive stress injury. Many of us have a mild but chronic form of it. We can't take a three-month sabbatical from the keyboard without losing our jobs, so these injuries tend to persist. And then you may have learned that it's not good for your health to sit for prolonged periods of time staring at a computer screen. You can actually dictate to your computer from 20 feet away while looking out the window, thereby giving both your lower body and your eyes a break.
[00:03:33.240] Data
I'm not God, so I have to bring data. I have two data points here: the number of words that I wrote in June and July this year, and in September and October. I adopted voice computing in the middle of August. As you can see, I got more than a three-fold increase in my output.
[00:03:58.680] Voice In in the Chrome Store
So this is the Chrome Web Store page for Voice In. It's only available for Google Chrome. You just hit the install button to install it. To configure it, you need to select a language. It supports 40 languages, including about a dozen different dialects of English, such as Australian English.
[00:04:25.628] Works in web pages with text areas
It works on web pages with text areas. I use it regularly on Overleaf and on 750words.com, a distraction-free environment for writing. It also works in webmail. It works in Google. It works in JupyterLab, of course, because that runs in the browser. It also works in Jupyter Notebook and Colab notebooks. It should work in Cloudmacs. I've mapped Option-L to opening Voice In when the cursor is on a web page that has a text area. So [the presence of a text area is] the main limiting factor.
[00:05:16.880] Built-in commands in Voice In Plus
[Voice In] has a number of built-in commands. You can turn it off by saying "stop dictation". It doesn't distinguish between a command mode and a dictation mode. It has an undo command. You use the command "copy that" to copy a selection. The "press" commands are used in the browser. You [say] "press enter" to issue a command or [submit] text that has been written in a web form, and "press tab" will open up the next tab in a web browser. The "scroll up" and "scroll down" commands allow you to navigate a web page. I've put together a quiz about these commands so that you can go through it several times until you get at least 90 percent of the questions correct, which boosts your recall of the commands. I have a Python script version of the quiz that you can probably pound through in less than a minute once you know the commands. I also provide an Elisp version of this quiz, but it's a little slower to operate.
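The published quizzes are separate scripts on GitHub, and their code is not reproduced here. A minimal Python sketch of the same idea, assuming a small hypothetical question bank built from the commands just described: each prompt is asked once in random order, and a score of at least 90 percent counts as a pass.

```python
import random

# Hypothetical question bank drawn from the built-in commands described above;
# it is not the content of the published quiz.
QUESTIONS = {
    "Turn dictation off": "stop dictation",
    "Copy the current selection": "copy that",
    "Submit text written in a web form": "press enter",
    "Scroll the page down": "scroll down",
}

PASS_THRESHOLD = 0.9  # at least 90 percent correct


def normalize(answer: str) -> str:
    """Compare answers case-insensitively, ignoring extra whitespace."""
    return " ".join(answer.lower().split())


def grade(answers: dict) -> float:
    """Return the fraction of prompts answered with the expected command."""
    correct = sum(
        normalize(answers.get(prompt, "")) == normalize(expected)
        for prompt, expected in QUESTIONS.items()
    )
    return correct / len(QUESTIONS)


def run_quiz(ask=input) -> float:
    """Ask every question once in random order, then print and return the score."""
    prompts = list(QUESTIONS)
    random.shuffle(prompts)
    answers = {prompt: ask(f"{prompt}? ") for prompt in prompts}
    score = grade(answers)
    verdict = "pass" if score >= PASS_THRESHOLD else "retake later"
    print(f"Score: {score:.0%} ({verdict})")
    return score
```

Calling run_quiz() asks the questions at the terminal; passing a different ask function lets the same quiz be driven non-interactively. Shuffling on every run keeps the answers from being memorized by position, which matters when the quiz is repeated at spaced intervals.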
[00:06:41.740] Common errors made by Voice In
These are some common errors that I've run into with Voice In. It likes to contract phrases like "I will" into "I'll". Contractions are not used in formal writing, and most of my writing is formal, so this annoys me. I will show you how I corrected for that problem. It also quite often drops the first word in sentences. This might be some speech issue that I have. It inserts the wrong word when the intended word is not in the dictionary that was used to train it. For example, PyMOL is the name of a molecular graphics program that we use in our field. Voice In doesn't recognize PyMOL; instead, it substitutes the word "primal". Since I don't use "primal" very often, I've mapped "primal" to "PyMOL" in some custom commands that I'll talk about in a minute. Then there's the problem that existing commands might get executed when you speak them when, in fact, you wanted to use the words in those commands during your dictation. This is a pitfall of Voice In: it doesn't have a command mode that's separate from a dictation mode.
[00:08:14.760] Custom speech-to-text commands
Through a very easy-to-use GUI, you can set up custom voice commands mapped to what you want inserted, and this is how misinterpreted words can be corrected: you just map the misinterpreted word to the intended word. You can also map contractions to their expansions. I did this for 94 English contractions, and you can find these on GitHub. You can also insert acronyms and expand those acronyms. I apply the same approach to the first names of colleagues. I say "expand Fred", for example, to get Fred's first and last name with the [correct] spelling of his very long German name. You can also insert other trivia like favorite URLs. You can insert LaTeX snippets, and multi-line snippets are handled correctly; you just have to enclose them in double quotes. You can even insert BibTeX cite keys for references that you use frequently. Every field has key references for certain methods or topics.
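Voice In performs these substitutions internally once they are entered in its GUI. To make the two kinds of mapping concrete, here is a minimal Python sketch, assuming each utterance arrives as a string. A whole-utterance trigger (the hypothetical "insert figure" and "expand RSI") expands to stored text, including a multi-line LaTeX snippet; otherwise, individual misheard words such as "primal" are swapped for the intended word. All table entries are illustrative, not the speaker's actual commands.

```python
import re

# Hypothetical whole-utterance triggers and their expansions.
SNIPPETS = {
    "expand rsi": "repetitive stress injury",
    "insert figure": "\n".join([
        r"\begin{figure}",
        r"  \centering",
        r"  \caption{}",
        r"\end{figure}",
    ]),
}

# Misheard word -> intended word, as with "primal" -> "PyMOL" above.
CORRECTIONS = {"primal": "PyMOL"}


def process_utterance(utterance: str) -> str:
    """Expand a trigger phrase, or else fix misheard words in dictated text."""
    key = " ".join(utterance.lower().split())
    if key in SNIPPETS:
        return SNIPPETS[key]
    # Look up each alphabetic word, leaving punctuation and spacing untouched.
    return re.sub(
        r"[A-Za-z]+",
        lambda m: CORRECTIONS.get(m.group(0).lower(), m.group(0)),
        utterance,
    )


print(process_utterance("I rendered the model in primal."))
# prints: I rendered the model in PyMOL.
```

Mapping a real word like "primal" is only safe when, as noted above, that word is rarely used in the writer's own prose.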
[00:09:59.420] Custom speech-to-commands
Then it has a set of commands that you can customize for speech-to-commands, to get the computer to do something like open up a specific website or save the current writing. In this case, we have "press: command-s" for saving the current writing. You can change the language [with "lang:"], and you can change the case of the text [with "case:"].
[00:10:37.540] Introducing Talon Voice
But the speech-to-command repertoire is quite limited in Voice In, so it's now time to pick up Talon Voice. This is an open source project. It's free. It is highly configurable via TalonScript, which is a subset of Python. You can use either TalonScript or Python to configure it, but it's easier to code up your configuration in TalonScript. It has a Python interpreter embedded in it, so you don't have to mess around with installing yet another Python interpreter. It runs on all platforms, and it has a dictation mode that's separate from a command mode. You can activate it, and it'll be in a listening state while asleep. You just bark out "Talon Wake" to wake it up and "Talon Sleep" to put it back into that listening state. It has a very welcoming community in the Talon Slack channel. I also need to point out that there are several packages that others have developed that run on top of Talon, but one of particular note is Cursorless by Pokey Rule. He has some really well-done videos on his website that demonstrate how he uses Cursorless to move the cursor around using voice commands. This, however, runs on VS Code; at least, that's the text editor for which he's primarily developing Cursorless.
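To illustrate why a separate command mode matters (the Voice In pitfall described earlier), here is a minimal Python sketch of a three-state recognizer: asleep, command mode, and dictation mode. It is a simplified toy model, not Talon's implementation. The wake and sleep phrases come from the talk, while the mode-switch phrases "dictation mode" and "command mode" and the tiny command table are hypothetical.

```python
from enum import Enum


class Mode(Enum):
    SLEEP = "sleep"
    COMMAND = "command"
    DICTATION = "dictation"


# Hypothetical command table: spoken phrase -> action name.
COMMANDS = {"copy that": "copy", "press enter": "enter"}


class Recognizer:
    """Toy model of a recognizer with separate command and dictation modes."""

    def __init__(self) -> None:
        self.mode = Mode.SLEEP
        self.executed = []  # actions carried out in command mode
        self.typed = []     # phrases inserted as text in dictation mode

    def hear(self, phrase: str) -> None:
        key = " ".join(phrase.lower().split())
        if key == "talon wake":
            self.mode = Mode.COMMAND
        elif key == "talon sleep":
            self.mode = Mode.SLEEP
        elif self.mode is Mode.SLEEP:
            return  # asleep: listen only for the wake phrase
        elif key == "dictation mode":
            self.mode = Mode.DICTATION
        elif key == "command mode":
            self.mode = Mode.COMMAND
        elif self.mode is Mode.DICTATION:
            self.typed.append(phrase)  # command words become text, not actions
        elif key in COMMANDS:
            self.executed.append(COMMANDS[key])


r = Recognizer()
for phrase in ["copy that", "Talon wake", "copy that", "dictation mode", "copy that"]:
    r.hear(phrase)
print(r.executed, r.typed)  # ['copy'] ['copy that']
```

The same words, "copy that", are ignored while asleep, executed in command mode, and typed as text in dictation mode; a recognizer without the dictation state would have executed them every time.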
[00:12:28.400] Talon GUI
I followed the [install] protocol outlined by Tara Roys. She has a collection of tutorials on YouTube as well as on GitHub that are quite helpful. I followed her tutorial for installing Talon on macOS without any issues, but allow half an hour to an hour to go through the process. When you're done, a Talon icon will appear in the menu bar on the Mac. When it has a diagonal line across it, that means it's in the sleep state. The icon leads to cascading pull-down menus, and that is it for the GUI. One of your first tasks is to select a language model that will be used to interpret the sounds that you generate as words. The other key feature is a view-log item under Scripting that opens a window displaying the log file. Whenever you make a change in a Talon configuration file, that change takes effect immediately; you do not have to restart Talon.
[00:14:02.540] Talon file with web scope
This is an example of a Talon file. It has two components: a header above the dash that describes the scope of the commands, and the commands themselves below the dash. Commands are separated by blank lines. If a voice command is mapped to multiple actions, these are listed separately on indented lines below the first line. Words in square brackets are optional. So, I have mapped the phrase "toggle voice in" to the keyboard shortcut Alt-L in order to toggle Voice In on or off. If I toggle Voice In on, I need to immediately toggle Talon off, and this is done through the key command Control-T, which is mapped to speech toggle. There are a couple of other examples. The header is optional: if no header is present, the commands in the file apply in all situations, in all modes.
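The square-bracket convention means that one rule can match several spoken phrases. A minimal Python sketch of that expansion, assuming non-nested brackets and whitespace-separated words; it mimics the behavior described above and is not Talon's own parser. The example rules are hypothetical.

```python
import itertools
import re


def expand_optional(rule: str) -> list:
    """List every phrase a rule can match when [bracketed] words are optional.

    Assumes brackets are not nested.
    """
    # With a capturing group, re.split alternates required text (even indexes)
    # and the contents of each [...] (odd indexes).
    parts = re.split(r"\[([^\[\]]*)\]", rule)
    # Required text has one choice; an optional part may be present or absent.
    choices = [(part, "") if i % 2 else (part,) for i, part in enumerate(parts)]
    phrases = {" ".join("".join(combo).split()) for combo in itertools.product(*choices)}
    return sorted(phrases)


print(expand_optional("toggle voice [in]"))  # ['toggle voice', 'toggle voice in']
```

Each optional segment doubles the number of accepted phrases, so a rule with n bracketed parts matches up to 2^n spoken variants.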
[00:15:34.015] Terminals on remote and virtual machines
Here we have two restrictions. These commands will only work when using the iTerm2 terminal emulator for the Mac, and then only when the title of the window in iTerm2 has this particular address, which is what appears when I've logged into the supercomputer at the University of Oklahoma. One of the commands in this file is checkjobs. It's mapped to a bash alias called cj, for "check jobs", which in turn is mapped to a script called checkjobs.sh that, when run, returns a listing of the pending and running jobs on the supercomputer in a format that I find pleasing. The \n after cj, the newline character, enters the command, so I don't have to do that as an additional step. Likewise, here's a similar setup for interacting with an Ubuntu virtual machine.
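The two restrictions amount to a check on the active application and the window title before a command's keystrokes are sent. A minimal Python sketch of that logic, in which everything is hypothetical: the context fields, the spoken phrase "check jobs", and the login-address pattern (a placeholder, since the real address is not given here). Talon performs this matching itself from the file header; the code only models the idea.

```python
import re
from dataclasses import dataclass


@dataclass
class WindowContext:
    app: str    # name of the frontmost application
    title: str  # title of its active window


# Hypothetical scope: iTerm2 plus a window title showing a login address.
# The host pattern is a placeholder, not the real supercomputer address.
SCOPE_APP = "iTerm2"
SCOPE_TITLE = re.compile(r"user@login\d*\.example\.edu")

# Spoken phrase -> keystrokes; the trailing newline presses Enter.
COMMANDS = {"check jobs": "cj\n"}


def keystrokes_for(phrase: str, ctx: WindowContext):
    """Return the keystrokes for a phrase, or None when out of scope or unknown."""
    if ctx.app != SCOPE_APP or not SCOPE_TITLE.search(ctx.title):
        return None
    return COMMANDS.get(phrase)


remote = WindowContext("iTerm2", "user@login1.example.edu: ~")
print(repr(keystrokes_for("check jobs", remote)))  # 'cj\n'
```

Scoping by window title is what lets the same spoken phrase type a remote alias on one machine and do nothing, or something else, in a local shell.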
[00:16:52.500] Recommendations
In terms of picking up voice computing, these are my recommendations. You're going to run into more errors than you may like initially, so you need some patience in dealing with them. Also, it'll take you a while to get your head wrapped around Talon and how it works. You'll definitely want to use custom commands to correct the errors or shortcomings of the language models. And you've seen how, by opening up projects by voice commands, you can reduce friction when restarting work on a project. You've seen how Voice In is preferable for more accurate dictation. I think my error rate is about 1 to 2 percent, that is, 1 to 2 out of 100 words are incorrect, versus Talon Voice, where I think the error rate is closer to 5 percent. I have put together [a library of English] contractions [and their expansions] for Talon [too], and they can be found here on GitHub. And I have also [posted] a quiz of 600 questions about some basic Talon commands.
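Error rates like these are conventionally expressed as word error rate: the minimum number of word substitutions, deletions, and insertions needed to turn what was actually said into what was recognized, divided by the number of words actually said. The talk does not say how its percentages were measured, so the following Python sketch only shows the standard calculation, using word-level edit distance.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the word-level edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # reference word deleted
                cur[j - 1] + 1,          # extra word inserted
                prev[j - 1] + (r != h),  # substitution, free when words match
            )
        prev = cur
    return prev[-1] / len(ref)


said = [f"word{i}" for i in range(100)]
heard = list(said)
heard[7], heard[42] = "primal", "typo"  # two misrecognized words
print(word_error_rate(" ".join(said), " ".join(heard)))  # 0.02
```

Two substitutions in a 100-word passage give 0.02, the upper end of the range quoted for Voice In; a dropped first word, one of the errors mentioned earlier, counts as a single deletion.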
[00:18:17.720] Acknowledgements
I'd like to thank the people who've helped me out on the Talon Slack channel and the members of the Oklahoma Data Science Workshop, where I gave an hour-long talk on this topic several weeks ago. I'd like to thank my friends at the Berlin and Austin Emacs Meetups and at the M-x research Slack channel. And I thank these grant funding agencies for supporting my work. I'll be happy to take any questions.
Q&A transcript (unedited)
The stream is here. So folks if you would please post your questions on the pad and we'll take them up here. Thank you. Thanks. little bit, I can provide a live demonstration of the use of this Voice In plugin for Google Chrome. So I have, let's see, say new sentence. I'm on a website that is called 750 words. It provides a text area where without any other distracting icons for the purpose of writing and I'm using it for the purpose of capturing my words that I'm dictating and I have enabled the Voice In plugin by hitting the option L command. New sentence. So it interpreted that command new sentence even though I didn't pronounce it correctly, which is a pretty good demonstration of its accuracy. New sentence. Oops, that didn't work. Undo. New sentence. So new sentence is a combination of 2 commands, period and new line. So I've found it more convenient just to say new sentence than having to say period and new line. You can see that it's able to keep up with most of my speech, and it has to interpret the sounds that I'm making and convert those into words, so there's always going to be a lag. New sentence. But I've found that I can generate about 2,000, up to 2,000 words an hour as I gather my thoughts and talk in my rather slow fashion of speaking. New sentence, if you're a really fast speaker, it might have trouble keeping up. New sentence. I like to write When I'm using the keyboard with 1 sentence per line, so that when I copy my text and paste it into Emacs, for example, I can resort the sentences very easily by just selecting 1 line at a time. I like to keep the sentences unwrapped in that fashion because that greatly eases the rewriting phase. And I'm almost have sort of a hybrid reverse outlining approach by doing that. New sentence. Looks like I have gotten ahead of it a bit and it has not kept up. But generally, it does keep up pretty well. Let's see. I think we have. Yeah, sorry. You can see that it has this EN means English and then dash US. 
There's actually about 40 languages that it supports, including several variants of German and about a dozen English dialects. comments and questions trickling in. So someone is saying that there is a text to command application or utility called Clipia, C-L-I-P-I-A, that they think is awesome. Clipia that they think is awesome. And someone else is also saying that Sox, S-O-X is another good alternative. So thank you very much for the suggestions. page here in the chat and on the big blue button if you'd like to open that up as well. But I'll continue reading the comments and questions. So the first question, I guess, is that could you comment on how speaking versus typing affects your logic or the content, quote unquote, that you write? between writing your thoughts down on a blank piece of printer paper versus paper bound with a leather notebook. I don't think there's any real difference. I know that some people believe there is a solid certain difference, But this is for the purpose, I'm using this for the purpose of generating the first draft because my skills with using my voice to edit my text is still not very well developed. I'm still more efficient using the keyboard for that stage. So the hardest part about writing generally is getting the first crappy draft written. And so I have found that dictation is perfectly fine for that phase. And I find it actually very conducive for just getting the text out. The biggest problem that most of us have is applying our internal editor. And that inhibits us from generating words in a free-flowing fashion. So I generally do my generative writing. So actually I divide my writing into 2 categories, generative writing, generating the first crappy draft, and then rewriting. Rewriting is probably 80, 90% of writing where you go back and rework the order of the sentences, order of paragraphs, the order of words in a sentence and so forth. The really hard work. That's best done later in the day when I'm more awake. 
I do my general writing first thing in the morning when I feel horrible. I'm not very alert. That's when my internal editor is not very awake and I can get more words out, more words past that gatekeeper. And so I can do this sitting down, I can do this standing up, I can do this 20 feet away from my computer looking out the window to give my eyes a break. So I find it's actually very enjoyable to use it in this fashion. And the downside is that I wind up generating 3 times as much text, and that makes for 3 times as much work when it comes to rewriting the text. And that means I'm using the keyboard a lot later on in the day and I haven't made any progress on recovering from my own repetitive stress injury. I hope that I will add the use of voice commands, speech to commands, for editing the text in the future. And I'll eventually give my hands more of a break. flow of sort of being able to get your words out while your internal editor is still not inhibiting things. And then later in the day or days, get back to the actual rewriting and editing. those 2 activities, not only by time. So many professional writers will spend several hours in the morning doing the generative part and then they'll spend the rest of the day rewriting. So they have separated those 2 activities temporally. What most people actually do is, you know, they do the generative part and then they write 1 sentence and they apply that internal editor right away because they want to write the first draft in a perfect, as a perfect version as the final draft And that slows them down dramatically. But this also allows you to separate these 2 activities in terms of modality. You're going to do the generative writing by voice and the rewriting by keyboard. So I think this is 1 way that many people can get into using speech to text in a productive way. Let's see. I think we have about 3 or 4 minutes live. So I think we have time for at least another question. 
Have you tried the ChatGPT voice chat interface? And if so, how has been your experience of it? As someone experienced with voice control, interested to hear your thoughts, performance relative to the free software tools in particular? particular software. I have used Whisper a little bit. And so that's related. And of course you have this problem of lag so I find that it's a Whisper is good for spitting out a sentence you know maybe for a doc string in a programming file. But I find that it's very prone to hallucinations. And I find myself spending half my time deleting the hallucinations, I feel like the net gain is diminished as a result. There's not much of a net gain in terms of what I'm getting out of it. Whereas I really appreciate the high level of accuracy that I'm getting from Voice In. I would use Talon Voice for dictation, but at this point, there's a significant difference between the level of accuracy of Voice In versus Talon Voice. It's large enough of a difference that I'll probably use Voice In for a while until I can figure out how to get Talon Voice to generate more accurate text. another 2 or 3 minutes. So if folks have any other questions Please feel free to post them on the pad and I'll check IRC now as well. Right, so I see 1 question on IRC asking, Are any of these voice command slash dictating dictation tools free Libre software? They cannot find that information Which I think is part of it. You just mentioned There's It's a freemium so The answer is no To be able to add the commands, the custom commands, you have to pay $48 a year. The Talon Voice software is free. And the only limitation there is access to the language model. If you want to get the beta version, you need to subscribe to Patreon to help support the developer. And I found, I did do that and I really didn't find much of an improvement. So I really don't intend to do that in the future.
But otherwise, with Talon Voice everything is open and free, and the Slack community is incredibly welcoming. The parallels with the Emacs community are pretty striking.
I think we have about another minute on the live stream, but I believe the BigBlueButton room here is open and will stay open, so if folks want to join, and if Blaine has a couple of extra minutes... Awesome. Yeah, then you're welcome to join and chat with Blaine, ask any further questions, or just chat generally.
...compared to Whisper? With Talon, I find that when I'm doing dictation, the first part of the sentence will be fairly accurate, and then towards the end the errors start to accumulate. In general, I think its error rate is about five words out of a hundred. Whisper is wonderful because it will insert punctuation for you, but its errors are longer, in that it'll hallucinate full sentences for you. So they both have significant error rates; they're just different kinds of errors.
Right. Let's see. There's a question: "Are the green block the author for this talk?" Not sure what that question means.
...think being generated from voice to text, speech to text. At the top of the pad, I think that's the question.
...this GitHub, on this 750words.com site where I do my generative writing at the start of the day. It just provides a text area that's free of distractions, and you can see the text that's being recorded as I talk. I haven't been saying the command "new sentence", so there isn't any punctuation in our discourse. One thing that I do at the start of the day is write in LaTeX; ultimately, that's how I store my writing. So, new sentence, new sentence. See, "insert start day". This is an example of a chunk of LaTeX code. I have some reflections on, you know, when did I wake up this morning, and how do I feel? I have reflections on the prior day in terms of what I got done yesterday.
Do I remember what I did yesterday? What happened last night? Focus of today: what's to be done today? And so on. I think I have more down here. Then I've set up these lists so that I can expand them easily. If I say "item", the cursor shows up at the start of an item, and I have it coded so that the new phrase I speak will start with a capital letter. As you can see, it capitalized the word "and". So in spite of its rather limited command syntax, it's enough to get started, and maybe in the future they'll add more features.
...you know, doing things like expanding the names of people. You can set up commands that expand the name of a colleague from their first name to their full name, with the proper spelling of their last name, which you can otherwise wind up spending a lot of time looking up. So Voice In with the custom commands enables you to store hard-to-remember information like that.
How good is Talon compared to Whisper? I think you might have answered that already, at least partially, but...
Whisper will carry out hallucinations, so it will generate long tracts of errors, whereas Talon will tend to generate more errors towards the ends of sentences, in my experience. And the errors are generally shorter in extent; it doesn't hallucinate for long tracts.
...that we have on the pad. If folks want to join here on BigBlueButton for a few minutes and chat with Blaine, that also works. Let's see, I'm probably going to have to drop off in a few minutes to catch the next speaker. But many thanks, Blaine, for a great talk, for the interesting demos, and for the question and answer.
...this conference with people from all around the world connected together through web browsers.
...if and when it's working correctly.
...times, but when it's working, it's wonderful. Yep.
[00:21:59.540] Start of section to review
...computers run the same code, so that a lot of people work on the same thing and build upon each other's work.
For journaling, one good compromise I found between editing and stream-of-thought journaling, one that just kind of helps me get my thoughts down as I go, is Org Mode. When you have the bullets, it allows you to naturally chart your thoughts in a way that's really easy to edit and reorder. I saw you kind of did that with your LaTeX macro, where you said "item" and it would put you down to the next item. How much do you do stuff like that, where you use Org Mode headings and then reorder them? I also did that with the Koutline from the Hyperbole package for Emacs.
...Org Mode later on. So I have a lot of snippets for Org Mode. I could have an Org Mode version of my insert-start-day snippet and carry things out in Org Mode. So I use Org Mode from time to time. I often use it for writing README files for projects, to outline the purpose of the project, say for a directory that contains a coding project. The main limitation of Voice In is that it only works in a web page and you have to have an Internet connection, whereas Talon Voice is perfect for something like Org Mode, in that you don't need an Internet connection and it will operate anywhere that you can place a cursor. I haven't found a place where it doesn't work. It's amazing. As you saw in my talk, perhaps, you can run it in a terminal or on a remote computer. You can run it in a virtual machine, and it will work. And as you might imagine, if you use bash aliases: one of the first things I did was map Talon commands to bash aliases, so that I can do all kinds of crazy things inside the terminal.
And there's some support already for using Talon in Emacs. There's some Emacs functionality built into Talon, so when you are in Emacs, some features are automatically available. Others have developed or are developing packages, which I don't think are available yet in ELPA. There's one that does the font locking, or syntax highlighting, of Talon files, and another that adds some additional functionality that I'm regrettably not yet familiar with.
...charting of the thoughts. Let's say: how did my day go? It went well for reasons 1, 2, 3, and badly for reasons A, B, C. And then later on I might think: oh, my day also went well for reasons 4, 5, 6, so I can jump up. I found that Org Mode subheadings, because you're able to jump around and easily reorder them after the fact, give a very streamlined approach to combining stream of thought and editing. Even when you're editing in real time, you think: oh, wait a minute, I thought of another reason that my day went well, even though I'm talking about how it went badly now. So you jump up and add it, and then you have it. You easily summarize your thoughts and whatnot.
...ideal for that kind of interaction. So yeah, I see your point about that blend of generative writing and editing. It's also kind of parallel to mind mapping. I use this mind mapping software called iThoughtsX, where I'll generate all these child items and then drag them around and re-sort them. They can have children of their own, and grandchildren, and so on, in terms of the levels of the nodes. It's pretty much the same sort of nested hierarchy that you can have with Org Mode. I think having several alternate modes or modalities of playing with thoughts is useful. Sometimes I'll hit a wall and just not really generate anything in a text mode.
But if I switch to mind mapping, just seeing it arranged with the connecting lines plays on a different part of the brain, I think, and it can be incredibly stimulating. It can stimulate a lot of new... too much with is the mind mapping software, but...
...have to it in Emacs is Org-roam, in terms of the 3D visualization with Org-roam UI, or diagrams and stuff like that. I think those two things would allow you... stuff like Org-roam or Denote, and then diagrams would be good ways of doing that in Emacs, but they don't match the mind map programs. There are a couple of mind mapping packages, but they're not as advanced.
...it that Emacs interacted with. Very well. And so they kind of worked around it and had a little integration between the two. So when you were jumping around, clicking on the web page, it would point you to different places and buffers. There's the Org-roam node view where it shows what looks like a mind map. You can click and drag the nodes a little bit, so it's a little interactive.
I'll have to look into that. That sounds very interesting.
...though, than Org-roam, so it doesn't... I don't like the feeling of being trapped inside Org Mode documents. Even though I don't really use Markdown and I like Org Mode better, I want to be able to write outside it. For instance, I also use the Koutline from the Hyperbole package; I have a talk on stream-of-thought journaling with Koutline. I just don't like the feeling of being trapped in one document, and Denote has the ability to rename files so you get keywords into, say, a PDF file name, so you can link to it from your notes without it all disappearing because it's not an Org Mode document. Plus there's running it on multiple computers or with multiple people: the database kind of gets screwed up when you try running it under Syncthing.
...sync. More fragile.
How far are you... So are you a regular practitioner of the Zettelkasten approach?
Partly. I spend too much time testing out Org-roam versus Denote to use it much. Part of it is that I just tweak it too much before using it. But I know where the tools are, so whenever I do need them I can use them, even though I don't always use them.
...room. Zettelkasten. It's kind of cool that you can export it and move it into other programs. I have moved it to Obsidian, played with it in Obsidian for a while, maybe added to it in Obsidian, and moved it back to Org-roam. But I'm not convinced. I think Niklas Luhmann was very successful with it because he spent five hours a day or whatever working with it, and I think I would have to put in a similar amount of effort to get the kind of benefits that he gained from it. I'm waiting for somebody to do a scientific study, controlled trials, to prove whether there's a real benefit.
...one of the things where you have 1 for the sections, and then 1.1, you know, that it does that's different. Denote has the ability to use a hierarchy, which Org-roam does everything it can to eliminate. But you can use them both in tandem. They call it signatures. To me, one of the cool features of Denote is being able to use signatures for the things where they make sense. One of the ideas is that if you don't know exactly where a note goes, but you know it goes in that section, you can just use the signature. Maybe don't even have much of a file name. Like, oh, this is just another thought on... well, you wouldn't use it for this, but like "my day went well for reasons 1, 2, 3, 4, 5", and you could just use the Denote signature to do 1, 2, 3, 4, 5 as you have new ideas on a subject. Or: this car is nice because of reasons X, Y, Z, or these types of four-wheelers are nice because of X, Y, Z.
And you could just keep on doing that rather than having to come up with a new name for each one of those files. Or you could choose not to have it, but the ability to have it optionally sounds, to me, like a really nice combination.
Because then you... I've actually imposed a hierarchy on my Zettelkasten in Org-roam. I can't imagine having random ideas; they need some kind of structure. I always have some kind of parent node to attach them to.
...it. Part of it is I'm just trying to optimize the workflow until it feels really, really good and I don't want to tweak it anymore, or I don't know. Or maybe I don't always need the tool. But one of the distinctions I want is a daily journal for your stream of thoughts, and then a separate one for your to-do list, because you want very different properties for each of those. For to-do lists, you want them hierarchical and limited. If you have more than three priority items, you don't really have priority items and it's not a good to-do list; it's just unordered thoughts.
...most of those things done beyond the first three.
...trying to do the other stuff. The stream of thoughts, all that stuff, I probably don't want to go straight into my Zettelkasten, because it has some of those problems: it's noisy, it might be redundant, and you don't know how it fits in because you haven't done that processing on it. It hasn't been refined, and you don't want to refine it at that stage. I find that spell checking is detrimental to me. I don't want spell checking. I don't want syntax highlighting. I just want to talk, or to just write. If I have mistakes, I can turn that on later and fix them, because otherwise it will distract me and interrupt that flow.
...you're doing Getting Things Done. That's why I would want them in separate files: for the to-do lists you want ordered, numbered, smaller lists.
And then with the other one, with stream-of-thought journaling, you'd want it just unordered. Thoughts land wherever they may. Maybe just machine-generated timestamps as names, so you don't even have to worry about naming, as an example. So yeah, very different properties for what you want from those two modalities.
...had that at, you know, working on my to-do list at the start of the day, but in a certain sense that is not the ideal time. In retrospect, I really haven't optimized the timing of assembling the to-do list. It's just lifelong habit that I do it at the beginning of the day, but it would probably be better to do it the night before. That way you prime your brain to just get up and go after those items. Maybe you want to revise the items a little after sleeping on them, after your subconscious has worked on them. Do you have a daily routine that you follow for generating those kinds of lists?
...for this stuff when I want to do it. I enjoy building the scaffolding, and I know where the tools are when I need them. I start using them when I need them, but I'm not very consistent.
...Org-roam, and you're using Koutline. Are there other tools that you've explored?
...and nerd-dictation to do what your talk was about, speech-to-text, to see how that changes things. Because it does change what you think and what you write down when you speak it rather than type it. It's the same as when you eliminate the editing: it changes the way you write. When you have spell checking, it changes the way you write to a much smaller degree. But that's the stuff I really haven't gotten working as well; it's underdeveloped.
I'll move it in. Often I move it into Overleaf, this website for LaTeX documents. I have a plug-in for Writefull, and I use that to clean up my word choices and some grammar. And I use Grammarly; I'll copy and paste.
It just depends on the nature of the writing, how serious it is, how polished it has to be. If it's really vital, like a grant application, I'll paste it into Grammarly and work on getting the writing to the lowest possible grade level, to make it as clear as possible to as wide an audience as possible.
One of the things I kind of wish is that you could say: hey, what would the Zettelkasten person think of what I wrote? What would Einstein think of what I wrote? Because rather than trying to make one uniform way of talking... people talk differently, and that's an advantage. I really wish these GPT programs could do that: help you with the grammar, but maybe also give you thoughts on your notes. What does this person think of your thoughts?
...even through ChatGPT now. I haven't spent time trying that out, but I bet the capabilities are already there.
It would be nice if it was built into Emacs, right? As a package. Yeah. That'd be very cool.
...like the grammar tools that help you with the way you write. For instance, removing redundant words. And yeah, it's supposed to go beyond just spell checking, right?
...package for Emacs, and you get some of the functionality out of it. I've paid for the subscription to get the advanced features, but maybe I don't have my configuration set up correctly. I just found it was easier to copy and paste a paragraph at a time into the desktop application, and it will go through and find those redundancies, junk English.
That was my problem with a lot of the Grammarly-type programs. I want something that would do that; it would be really interesting to see one that writes like Old English, or like the Luhmann person, where it's just: how does this person write? Because it would spit out something a lot different. Just different. Like, yeah, you put in different people.
...completely different thinking and writing style. And so the purpose of doing that would be to stimulate a new way of thinking or writing on your part, I guess. You know, one of the targets for that could be yourself.
So it's like I'd much rather have a comprehensible sentence than a truly correct one. One of those is far more valuable... far more correct English, or true to yourself. Yes.
...one's the other; you're being used by the tool. And they're not the same thing.
...responsible for my writing and being the final judge of it. As a scientist, my mantra is that it's got to be clear, then precise, then concise, in that order. And I claim that's the order in which I go through my revisions. If it's not clear, it's useless. It's got to be clear to me, but it's also got to be clear to a lot of people for whom English is not a first language. After that, I have to worry about precision and then conciseness, but those can't come at the expense of clarity. So it's quite a battle.
...where it's like if you have more than three items... The purpose of a to-do list is to help you choose what you're going to do for the day. Which is why, if you have more than three items, if you have 50 items on there, you're not going to get 50 of those items done. So maybe you pick the easiest ones to do, not necessarily the ones that you want or need done.
So it's about the process of choosing those. I found that a very good rule is up to three priority items. And then when you look back and see that you did those three items, who cares about this? I'd rather get those three items done than any number of secondary tasks.
...very right about that. I used to use a pattern of assigning letters, based on a hierarchy: you've got the urgent and important, of course, and you have to deal with those, and then the next thing down is the important, and so on. But I tend to generate these terribly long lists, and most of those items would go on what is known as a grass catcher list: things that you may get to someday, but there's no way you can get to them today. But I feel compelled to capture them. I may want to do them eventually, so they wind up on my list.
...Zettelkasten, where you have the day thoughts and the day journal, and then you have your Zettelkasten, which I don't think should have too close a connection, because one's a lot more... what's the word? Yeah, that's the word. One's actually much more processed. With the other, you don't want that processing, because you want it to flow from your head with as little friction as possible. The other one you want processed, so that when you look things up, it's more efficient. Same thing with your to-do lists.
So, oh yeah, I guess there's one more category. Rather than priority 1, 2, 3, my favorite way is primary tasks, which generally go up to three, secondary tasks, and then a third category, unplanned tasks. I just have those written down as headings in an Org Mode file, and I put the tasks in there rather than using the agenda too much. I don't know, I just found that that was my favorite way of doing it. Then you have another file that is just your dump of anything you want to do, which you can pull from to plan your day. Or, I guess, something that's actually better than a day is doing it a week at a time. I found that that's a lot nicer, because a week seems like a nicer unit for thinking about what you'll do: you have a week, then you have your day, and then you have the three categories of primary, secondary, and unplanned. At least that's been my favorite iteration.
...planning on a weekly basis, and he would just get his weekly list of things to get done, and he was very good at pounding through that list and getting them done. I have been too much of a day-oriented person, rather than a week-oriented person, to adopt his approach, but I've been considering that too. I think what I don't do enough of is pulling back to the month level, semester level, year level, 5-year level, 10-year level. And...
...is like, you'd have your week, and then maybe one section after Friday, or the last day of the week, that is just your staging area. This is where you stage all the tasks: you write them all down in your staging area, and then use Alt and the arrow keys to quickly reorder them into the days of the week. Then, when you're looking at one day and ordering everything, it makes a lot of sense to just say: I don't really want to do that today. I want this done this week.
I don't necessarily want it done on this day. So that's why I found that the week approach works a lot nicer.
...in your week to do the staging.
Like, these are the things I would like to get done. And then when you schedule them, you just use the Alt-arrow keys: oh, it looks like this would work really well on this day, and this one looks like it would work on this day. I found that it works better without it, at least. Yeah, that's fine. Because that way I also get a log of everything I've done. It seems easier to just make new files for it. You could use it with Org Agenda, but one of the things you want is to look back at it and reflect. If you open up the file with two or three levels of headings, so that you just see the priority tasks, you get a very nice overview: I did my priority tasks this day. You get the numbers next to the headings, so you can easily see what you've done. It would be nice if I could figure out a way for the agenda to give me percentages, but I haven't figured that out. Seeing it at that level, I can easily scan it with my eyes, so I just did it by hand rather than with the agenda.
...times and pretty seriously, but I keep bouncing off it. I think I get too many things built in or scheduled, and I just don't get to them. I feel bad about it and wind up abandoning it. So that's one area where there's probably some potential for optimizing and making it work better. There's a lot of customizing you can do with Agenda. It's amazing.
I wanted there to be a separation between the daily to-do lists and your grab bag; I think the agenda works a lot better for a grab bag. I want a nice way of looking back at my daily to-do logs, so I wanted them separated, and I just kept them separate.
With the agenda, I could never figure out exactly how I wanted it to work, how the files would look, and how all the Emacs settings would interact with it. I'm sure I could, but that's why I opted for weekly files. At least, that's my most refined idea of the process.
...is a little different, in that I'm generating this text on a daily basis and popping it into one document file per day, in a big diary on Overleaf, so each day winds up as a chapter. It compiles quickly enough, even though it's often up to 1,000 pages long by the end of the year. And of course, with the PDF, I can search through it. You can't do the kind of really sophisticated searching that you can do with Org Mode, but it sure has been very helpful in digging up information, like the little protocols on how I accomplish a certain task that I have to do again a year later, or having a record of what I did on a certain day when somebody above me is trying to hold me to account for what got done. I can look that up very quickly. It's documented. I find any kind of thorough documentation system to be very...
...rather than by a weekly file. I ran into trouble once you get a lot of items. I've had Org files with 1,000 headings, and it can be so hard to scroll through. Maybe it's some limitation I'm running into with Emacs being single-threaded. So one of the questions is how exactly you want the information structured, because that can change how it's retrieved.
...logs, and I put it all in by date, then the priority, secondary, and unplanned tasks, and I had it auto-expand to that level by default, so I didn't see the individual tasks. And then it would say something like: I completed 2 of 5 secondary tasks.
And then just being able to quickly scan all the days... the feedback you get from that is worth a lot. It's not something I could figure out how to do in the agenda. I got it done in the text files because you get that view that doesn't expand all the way, so you can quickly see: on this day I did this well, on this day I did this well, all within four lines per day. So it's not very visually verbose; probably about as visually verbose as you want it. They're not super long. You easily see the 2 of 3 and so on that you got done, so you can quickly say: oh well, these are the days where I got my primary tasks done, or this week, and this day I didn't do well. It helps you correlate your feelings with your to-do lists and journals and whatnot.
Because of its summarizing capability, it allows you to pull back and get an overview.
...from that. When I did that, it felt like half the reason, or should be half the reason... And if you use the agenda as it is, I don't know how you would get that, like looking at week-by-week breakdowns. You might be able to get percentages, which would be nice, like "I did this well", or habits. There might be things that could offer you that, but...
Yeah, on various kinds of projects or activities, to get some feedback in that regard. So I define a project as anything that requires work at different points in time, more than one...
...that I made that demonstrates that. I don't know if you... do you have your email in your talk notes or anything?
...slide. There should be my email address there. I can add it to my talk notes. There's a share screen button, right? Can you not share the screen on this? Let's see.
I see some stuff on here. I wonder if I'm still active. It shows share screen. Cancel. I can put my email address in the chat.
...but let's see. Yeah, I think the way they did it in the other videos, if they shared the screen, they just took over the webcam with OBS and shared what they wanted with it.
Yeah, I'll give that to you. Okay. I guess I'll let you go watch the rest of the Emacs videos. Thank you very much. I appreciate your willingness to share your thoughts on this matter. This is vital: time management is a key aspect of life.
...reasons to use Emacs is to use the keyboard. It's not to speed you up. Like, yeah, that's nice, but it keeps you in the stream, keeps you in the flow state, which then just makes you think better. And the thing with that is, I have no idea what the limits of that would be. Because, yes, it's not about speeding up how many words you say a minute. I mean, that's nice and all, but when you start removing all these friction points, all of a sudden the number, quality, and types of thoughts you get start...
Enjoy the rest of the meeting.
Questions or comments? Please e-mail emacsconf-org-private@gnu.org