Captions are great for making videos (especially technical ones!) easier to understand and search.
If you see a talk that you'd like to caption, feel free to download it and start working on it with your favourite subtitle editor. Let me know what you pick by e-mailing me at sacha@sachachua.com so that I can update the backstage index and try to avoid duplication of work. Find talks that need captions here. You can also help by adding chapter markers to Q&A sessions.
You're welcome to work with captions using your favourite tool. We've been using https://github.com/sachac/subed to caption things as VTT or SRT in Emacs, often starting with autogenerated captions from OpenAI Whisper or WhisperX (the .vtt file backstage).
We'll be posting VTT files so that they can be included by the HTML5 video player (demo: https://emacsconf.org/2021/talks/news/), so if you use a different tool that produces another format, any format that can be converted into that one (like SRT or ASS) is fine. subed has a subed-convert command that might be useful for turning WebVTT files into tab-separated values (TSV) and back again, if you prefer a more concise format.
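If you're curious what a flatter, tab-separated layout might look like, here is a minimal Python sketch that turns simple VTT cues into TSV rows. It is not subed-convert and doesn't claim to match its output. The names vtt_to_rows and rows_to_tsv are made up for illustration, and it assumes timestamps are written as HH:MM:SS.mmm.

```python
import re

# Matches a cue timing line such as "00:05:13.880 --> 00:05:20.119".
# Assumption: timestamps always include the hours field.
TIMING = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{3}) --> (\d{2}:\d{2}:\d{2}\.\d{3})")

def vtt_to_rows(vtt_text):
    """Return (start, end, text) tuples for each cue in a simple WebVTT string."""
    rows = []
    # Blocks (header, comments, cues) are separated by blank lines.
    for block in re.split(r"\n\s*\n", vtt_text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = TIMING.match(line)
            if match:
                # Join multi-line cue text into one line for the TSV row.
                text = " ".join(part.strip() for part in lines[i + 1:])
                rows.append((match.group(1), match.group(2), text))
                break
    return rows

def rows_to_tsv(rows):
    """Join rows into tab-separated lines (hypothetical column order)."""
    return "\n".join("\t".join(row) for row in rows)

sample = """WEBVTT

00:05:13.880 --> 00:05:20.119
So yeah, like that's currently the problem.

00:05:20.120 --> 00:05:23.399
So I want to talk about embeddings.
"""
print(rows_to_tsv(vtt_to_rows(sample)))
```

Blocks without a timing line (the WEBVTT header, NOTE comments) are skipped, so this only round-trips the cue text and times.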
You can e-mail me the subtitles when you're done, and then I can merge them into the video.
You might find it easier to start with the autogenerated captions and then refer to the video or any resources provided by the speaker in order to figure out spelling. Sometimes speakers provide pretty complete scripts, which is great, but they also tend to add extra words.
Edit the VTT to fix misrecognized words
The first step is to edit misrecognized words. VTT files are plain text, so you can edit them with regular text-mode if you want to. If you're editing subtitles within Emacs, subed can conveniently synchronize video playback with subtitle editing, which makes it easier to figure out technical words. subed tries to load the video based on the filename, but if it can't find it, you can use C-c C-v (subed-mpv-find-media) to play a file or C-c C-u to play a URL.
Look for misrecognized words and edit them. We also like to change things to follow Emacs keybinding conventions (C-c instead of Control C). We sometimes spell out acronyms on first use or add extra information in brackets. The captions will be used in a transcript as well, so you can add punctuation, remove filler words, and try to make it read better.
Sometimes you may want to tweak how the captions are split. You can use M-j (subed-jump-to-current-subtitle) to jump to the caption if you're not already on it, listen for the right spot, and maybe use M-SPC to toggle playback. Use M-. (subed-split-subtitle) to split a caption at the current MPV playing position and M-m (subed-merge-with-next) to merge a subtitle with the next one.
If you don't understand a word or phrase, add two question marks ([??]) and move on. We'll ask the speakers to review the subtitles and can sort that out then.
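If you want a quick list of the spots you marked before sending the file over, something like this Python sketch could help. The function name unresolved_lines is made up for illustration, and it simply looks for the literal text [??] on each line.

```python
def unresolved_lines(vtt_text):
    """Return (line_number, text) pairs for lines that still contain [??]."""
    return [
        (number, line)
        for number, line in enumerate(vtt_text.splitlines(), start=1)
        if "[??]" in line
    ]

sample = """WEBVTT

00:00:01.000 --> 00:00:04.000
Today we'll look at [??] in Emacs.

00:00:04.001 --> 00:00:07.000
It works with Org Mode.
"""
for number, line in unresolved_lines(sample):
    print(f"{number}: {line}")
```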
If there are multiple speakers, you can indicate switches between speakers with a [speaker-name]: tag, or just leave it plain.
Once you've gotten the hang of things, it might take between 1x and 4x the video time to edit captions.
Subtitle timing
Times don't need to be very precise. If you notice that the times are way out of whack and it's getting in the way of your subtitling, we can adjust the times using the aeneas forced alignment tool.
Splitting and merging subtitles
If you want to split and merge subtitles, you can use M-. (subed-split-subtitle) and M-m (subed-merge-dwim). If the playback position is in the current subtitle, splitting will use the playback position. If it isn't, it will guess an appropriate time based on characters per second for the current subtitle.
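To give a feel for what guessing from characters per second can mean, here is a small Python sketch. It assumes speech is spread evenly over the characters of the subtitle, so the split time is proportional to how far into the text you are. subed's actual calculation may differ, and guess_split_ms is a made-up name.

```python
def guess_split_ms(start_ms, end_ms, text, split_index):
    """Estimate a split time by assuming a constant characters-per-second rate."""
    if not text:
        return start_ms
    # Fraction of the subtitle's characters that come before the split point.
    fraction = split_index / len(text)
    return start_ms + round((end_ms - start_ms) * fraction)

# A 4-second cue split halfway through its text lands halfway through its time.
print(guess_split_ms(10_000, 14_000, "abcdefgh", 4))  # 12000
```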
Splitting with word-level timing data
If there is a .json or .srv2 file with word-level timing data, you can load it with subed-word-data-load-from-file from subed-word-data.el in the subed package. You can then split with the usual M-. (subed-split-subtitle), and it should use word-level timestamps when available.
Playing your subtitles together with the video
MPV should automatically load subtitle files if they're in the same directory as the video. To load a specific subtitle file in MPV, you can use the --sub-file= or --sub-files= command-line argument.
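If you launch MPV from a script, the command could be put together like this. This is only a sketch: the file names are placeholders, and mpv_command is a made-up helper that builds the argument list without starting the player.

```python
import shlex

def mpv_command(video, subtitles):
    """Build an mpv argument list that loads one specific subtitle file."""
    return ["mpv", f"--sub-file={subtitles}", video]

command = mpv_command("talk.webm", "talk-edited.vtt")
print(shlex.join(command))
# To actually start the player, pass the list to subprocess.run(command, check=True).
```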
If you're using subed, the video should autoplay if it's named the same as your subtitle file. If not, you can use C-c C-v (subed-mpv-play-from-file) to load the video file. You can toggle looping over the current subtitle with C-c C-l (subed-toggle-loop-over-current-subtitle), synchronizing player to point with C-c , (subed-toggle-sync-player-to-point), and synchronizing point to player with C-c . (subed-toggle-sync-point-to-player).
Starting from a script
Some talks don't have autogenerated captions, or you may prefer to start from scratch. Whenever the speaker has provided a script, you can use that as a starting point. One way is to start by making a VTT file with one subtitle spanning the whole video, like this:
WEBVTT

00:00:00.000 --> 00:39:07.000
If the speaker provided a script, I usually put the script under this heading.
If you're using subed, you can move point to a good stopping point for a phrase, use M-SPC to toggle playback, and then use M-. (subed-split-subtitle) when the player reaches that point. If it's too fast, use M-j to repeat the current subtitle.
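If you'd rather generate that single-subtitle starting file than type it by hand, here is a minimal Python sketch. The function names format_timestamp and starter_vtt are made up for illustration, and it assumes you already know the video duration in seconds.

```python
def format_timestamp(total_seconds):
    """Format seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    milliseconds = round(total_seconds * 1000)
    hours, remainder = divmod(milliseconds, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    seconds, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"

def starter_vtt(duration_seconds, script_text):
    """Return a WebVTT document with one cue covering the whole video."""
    # Cue text can't contain blank lines, so collapse the script's whitespace.
    cue_text = " ".join(script_text.split())
    end = format_timestamp(duration_seconds)
    return f"WEBVTT\n\n00:00:00.000 --> {end}\n{cue_text}\n"

# 39 minutes and 7 seconds, matching the example duration above.
print(starter_vtt(39 * 60 + 7, "Hello, and welcome to my talk.\n\nToday I will..."))
```

Collapsing the script into one long line is a simplification. You'll be splitting it into subtitles anyway as you listen.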
Starting from scratch
One option is to send us a text file with just the text transcript in it and not worry about the timestamps. We can figure out the timing using aeneas for forced alignment.
If you want to try timing as you go, you might find it easier to start by making a VTT file with one subtitle spanning the whole video (either using the video duration or a very large duration), like this:
WEBVTT

00:00:00.000 --> 24:00:00.000
Use C-c C-p (subed-toggle-pause-while-typing) to automatically pause when typing. Then start playback with M-SPC and type, using M-. (subed-split-subtitle) to split after a reasonable length for a subtitle. If it's too fast, use M-j to repeat the current subtitle or adjust subed-mpv-playback-speed.
Chapter markers
In addition to the captions, you may also want to add chapter markers. An easy way to do that is to add a NOTE comment with the chapter heading (NOTE Chapter heading) before the subtitle that starts the chapter. For example:
...

00:05:13.880 --> 00:05:20.119
So yeah, like that's currently the problem.

NOTE Embeddings

00:05:20.120 --> 00:05:23.399
So I want to talk about embeddings.

...
We can then extract those with emacsconf-subed-make-chapter-file-based-on-comments.
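To illustrate the idea behind that extraction (not the Emacs Lisp function itself), here is a rough Python sketch that pairs each NOTE comment with the start time of the next subtitle. The name chapters_from_notes is made up, and it makes two simplifying assumptions: only the first line of a NOTE is used, and no subtitle text begins with "NOTE ".

```python
import re

# Captures the start time of a cue timing line (assumes HH:MM:SS.mmm).
TIMING = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{3}) --> ")

def chapters_from_notes(vtt_text):
    """Pair each NOTE comment with the start time of the cue that follows it."""
    chapters = []
    pending_title = None
    for line in vtt_text.splitlines():
        if line.startswith("NOTE "):
            pending_title = line[len("NOTE "):].strip()
            continue
        match = TIMING.match(line)
        if match and pending_title is not None:
            chapters.append((match.group(1), pending_title))
            pending_title = None
    return chapters

sample = """WEBVTT

00:05:13.880 --> 00:05:20.119
So yeah, like that's currently the problem.

NOTE Embeddings

00:05:20.120 --> 00:05:23.399
So I want to talk about embeddings.
"""
for start, title in chapters_from_notes(sample):
    print(start, title)
```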
For an example of how chapter markers allow people to quickly navigate videos, see https://emacsconf.org/2021/talks/bindat/ .
Please let us know if you need any help!
Sacha sacha@sachachua.com