Splitting an SRT file for use with a VIDEO_TS directory

My Samsung TV is pretty good at playing videos that sit on my NAS, but it is not very good at recognizing various subtitle formats. Luckily, there are free .SRT files available from various places (like http://www.suby.com, or http://www.opensubtitles.org, or http://www.ondertitel.com) and usually it is enough to drop one into the same directory as the movie file, give it the same name as the movie file (with the extension .SRT), and play.

But a while ago I copied a DVD straight to my hard drive. The movie was in English and I wanted to add Dutch subtitles, but the DVD consists of a VIDEO_TS directory containing more than one .VOB file:

VIDEO_TS.BUP
VIDEO_TS.IFO
VTS_01_0.BUP
VTS_01_0.IFO
VTS_01_1.VOB
VTS_01_2.VOB
VTS_01_3.VOB
VTS_01_4.VOB
VTS_01_5.VOB

Playing it was no problem: I just pointed my Samsung TV to the first .VOB file, VTS_01_1.VOB, and it started playing; and it was even nice enough to automatically start the next movie files, in order, so I could watch the entire movie without touching anything. But what name should the SRT file have?

I wasn’t sure how smart my Samsung TV was, so I tried the following:

  1. Call the file VIDEO_TS.SRT and put it in the directory that contains the VIDEO_TS directory. This didn’t work: no subtitles were shown.
  2. Call the file VIDEO_TS.SRT and put it in the VIDEO_TS directory. No subtitles.
  3. Call the file VTS_01_1.SRT and put it in the VIDEO_TS directory. This worked great for the first file, but no subtitles were played when the second file (VTS_01_2.VOB) started to play.
  4. Could it be that they were smart enough to support this? I copied the SRT file 5 times, and named them VTS_01_1.SRT, VTS_01_2.SRT, VTS_01_3.SRT, VTS_01_4.SRT and VTS_01_5.SRT. After all, the TV knew the files should play one after another. Alas. The subtitles played, but started anew when the second VOB file began to play.

This last experiment gave me an idea how to do it: if I copied the file five times and resynced the subtitles for the last four, it should work. But of course, that is not something you want to do manually.

Just because it was possible, I wrote the following Python program. It reads the start time of each VOB file (using FFmpeg) and creates one copy of the original SRT file per VOB file, with the timestamps shifted back by the start time of that piece of the video. Of course, this took longer than converting the VIDEO_TS directory into, say, an MKV file, but the next time it will not…
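
To make the resync step concrete, here is a minimal sketch of shifting a single SRT timestamp by the start offset of one part. The helper name shift_timestamp and the 1200.5 second offset are made up for illustration; the full program below works on whole records rather than single timestamps, but it clamps times that fall before the start of a part to zero in the same way.

```python
import datetime

def shift_timestamp(stamp, offset_seconds):
    """Shift an SRT timestamp ("HH:MM:SS,mmm") back by offset_seconds.

    Hypothetical helper for illustration; results that would fall
    before 00:00:00,000 are clamped to zero.
    """
    hours, minutes, rest = stamp.split(":")
    seconds, millis = rest.split(",")
    t = datetime.timedelta(hours=int(hours), minutes=int(minutes),
                           seconds=int(seconds), milliseconds=int(millis))
    t -= datetime.timedelta(seconds=offset_seconds)
    if t < datetime.timedelta(0):
        t = datetime.timedelta(0)
    # Integer arithmetic avoids float rounding in the milliseconds.
    total_ms = (t.days * 86400 + t.seconds) * 1000 + t.microseconds // 1000
    h, rem = divmod(total_ms, 3600000)
    m, rem = divmod(rem, 60000)
    s, ms = divmod(rem, 1000)
    return "%02d:%02d:%02d,%03d" % (h, m, s, ms)

# A subtitle shown 20 minutes and 30 seconds into the movie, in a part
# that is assumed to start 1200.5 seconds in:
print(shift_timestamp("00:20:30,000", 1200.5))
```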

Share and enjoy.

This program needs FFmpeg (or, more precisely, ffprobe) to read the video information, and assumes it is available in your path. FFmpeg is available at https://www.ffmpeg.org/
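
For reference, this is roughly how the start time of one VOB file can be requested from ffprobe. The flags are the ones the program below uses; the function names and the example path are only illustrative, and the sketch assumes ffprobe prints the start time in seconds on its first line of output.

```python
import subprocess

def ffprobe_command(vob_path):
    """Build the ffprobe argument list that prints a file's start time."""
    return ["ffprobe", "-v", "error",
            "-show_entries", "format=start_time",
            "-of", "default=noprint_wrappers=1:nokey=1",
            vob_path]

def vob_start_seconds(vob_path):
    """Run ffprobe (assumed to be on the path) and return the start time."""
    output = subprocess.check_output(ffprobe_command(vob_path))
    # First line only, in case ffprobe prints more.
    return float(output.decode().splitlines()[0])

# Passing a list (and no shell) keeps paths with spaces in one piece.
print(" ".join(ffprobe_command("VIDEO_TS/VTS_01_2.VOB")))
```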

I learned some things from:

  • https://somethingididnotknow.wordpress.com/2012/05/02/fix-subtitles-offset-with-python/
  • https://github.com/wting/srt-resync/blob/master/srt-resync

Thanks for that.

"""
NAME
    srt2video_ts.py - split an SRT file for use with VTS_01_*.VOB files

SYNOPSIS
    srt2video_ts.py [-h] [--version] [-s SRT_FILE] [video_ts]

DESCRIPTION
    srt2video_ts splits an SRT file according to the durations of a 
    set of VOB files. You can use it to see subtitles with a copied
    DVD on a Samsung TV. It will probably be useful with other video
    applications as well.

    This program needs FFmpeg to read the video information (or, more
    precisely, ffprobe,) and assumes it is available in your path. 
    FFmpeg is available at https://www.ffmpeg.org/
    
    I learned some things from:
        https://somethingididnotknow.wordpress.com/2012/05/02/fix-subtitles-offset-with-python/
        https://github.com/wting/srt-resync/blob/master/srt-resync

    Thanks.

ARGUMENTS
    video_ts
        The VIDEO_TS directory containing the VTS_01_*.VOB files.
        Default is the current directory.

OPTIONS
    -h, --help            
        Show this help message and exit.
    --version             
        Show version information and quit.
    -s SRT_FILE, --srt-file SRT_FILE
        The SRT file to split. The SRT file will be split into a set of SRT
        files with the same names as the VOB files, but with the extension 
        SRT. Many video players use this convention to show subtitles 
        automatically.

AUTHOR 
    Dion Nicolaas

LICENSE
    zlib License:

    Copyright (c) 2016 Dion Nicolaas

    This software is provided 'as-is', without any express or implied
    warranty.  In no event will the authors be held liable for any
    damages arising from the use of this software.

    Permission is granted to anyone to use this software for any
    purpose, including commercial applications, and to alter it and
    redistribute it freely, subject to the following restrictions:

    1. The origin of this software must not be misrepresented; you must
       not claim that you wrote the original software. If you use this
       software in a product, an acknowledgement in the product
       documentation would be appreciated but is not required.
    2. Altered source versions must be plainly marked as such, and must
       not be misrepresented as being the original software.
    3. This notice may not be removed or altered from any source
       distribution.
"""

import argparse
import datetime
import math
import os
import re
import subprocess


VERSION = "1.0"
DESCRIPTION = "Split an SRT file for use with VTS_01_*.VOB files"

# The ffprobe command to retrieve the vobfile's start time. %s is the vobfile.
# This assumes ffprobe is in your path (or your current dir.)
FFPROBE="ffprobe -v error -show_entries format=start_time -of default=noprint_wrappers=1:nokey=1 %s"

# Arbitrary date, we need to use datetimes with timedeltas.
BASE_DATE = datetime.datetime(2016,1,1)


def vob_list(dirname):
    """Return a list of vobfiles in dirname. Called from parse_options."""
    dirlist = os.listdir(dirname)
    # Escape the dot so that only a real ".vob" extension matches.
    voblist = [os.path.join(dirname, fname)
               for fname in dirlist
               if re.match(r".*\.vob$", fname, re.IGNORECASE)]
    return voblist


def parse_options():
    """Parse the command line options and return the parsed namespace."""
    parser = argparse.ArgumentParser(description = DESCRIPTION)
    parser.add_argument('--version',
                        action = "version",
                        version = "%(prog)s " + VERSION,
                        help = "show version information and quit")

    parser.add_argument('-s', '--srt-file',
                        type=argparse.FileType('r'),
                        help="the SRT file to split.")

    parser.add_argument('video_ts',
                        type=vob_list,
                        nargs='?',
                        default=vob_list("."),
                        help="The VIDEO_TS directory containing the VTS_01_*.VOB files. Default is the current directory.")
    return parser.parse_args()


def timestamp(time_string):
    """Turn a SRT file timestring into a datetime (with arbitrary date.)"""
    ts = time_string.replace(':', ',')
    tlist = [int(num) for num in ts.split(',')]
    return BASE_DATE + datetime.timedelta(days=0,
                                         hours=tlist[0],
                                         minutes=tlist[1],
                                         seconds=tlist[2],
                                         microseconds=tlist[3] * 1000)


def read_srt(srt_file):
    """Read an SRT file and return it as a list."""
    print "Reading SRT file..."
    rec_idx_expected = 0
    rec = []
    records = []
    for line in srt_file:
        # Check for empty lines
        m = re.match("^\\s*$", line)
        if m:
            # Empty line: end of record. Only keep complete records (times
            # and text), so that repeated empty lines do no harm.
            if len(rec) == 2:
                records.append(rec)
            rec = []
            rec_idx_expected = 0
        elif rec_idx_expected == 0:
            # Check for record number. We don't use the number itself, as we
            # don't expect out of order records (which is hardly supported
            # anyway.)
            m = re.match(r'(\d+)', line)
            if m:
                rec_idx_expected = 1
            else:
                print "Error, record number expected (%s)" % line
        elif rec_idx_expected == 1:
            # Check for the time stamps.
            m = re.match(r'^(\d+:\d+:\d+,\d+)\s+--\>\s+(\d+:\d+:\d+,\d+)', line)
            if m:
                if len(rec) == 0:
                    rec.append((timestamp(m.group(1)), timestamp(m.group(2))))
                    rec_idx_expected = 2
                else:
                    print "Error: time before index!"
            else:
                print "Error, time expected (%s)" % line
        elif rec_idx_expected == 2:
            # Anything next is the subtitle. Just store them.
            if len(rec) == 2:
                rec[1] += line
            else:
                rec.append(line)
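    # Keep the last record if the file does not end with an empty line.
    if len(rec) == 2:
        records.append(rec)
    return records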


def get_vob_starts(files):
    """Use ffprobe to make a list of the vobfiles' start times.
    
    Return a dictionary vobfilename -> starttime.
    """
    print "Reading VOB file offsets..."
    vob_starts = {}
    for fname in files:
        print " Probing %s..." % fname
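        # Build an argument list; substituting the file name after the
        # split keeps names containing spaces in one piece.
        ffprobe = [fname if arg == "%s" else arg for arg in FFPROBE.split()]
        # No shell: on POSIX, shell=True with a list would pass only the
        # first item to the shell as the command.
        output = subprocess.check_output(ffprobe,
                                         stderr=subprocess.STDOUT)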
        # first line only, in case ffprobe spits out more
        time = output.splitlines()[0]
        # timedelta accepts fractional seconds directly.
        timediff = datetime.timedelta(seconds=float(time))
        vob_starts[fname] = timediff
    return vob_starts


def format_time(timestamp):
    """Format a timestamp in SRT file format."""
    formatted = timestamp.strftime('%H:%M:%S,%f')
    return formatted[:-3]


def write_record(f, index, record, start):
    """Write out one SRT record, times offset with start."""
    f.write("%d\n" % index)
    # The end time falls within this video part, but the subtitle may have
    # started in the previous part. Start it at 0 in that case.
    start_time = record[0][0] - start
    if start_time < BASE_DATE:
        start_time = BASE_DATE
    f.write("%s --> %s\n" % (format_time(start_time),
                             format_time(record[0][1] - start)))
    f.write(record[1])
    f.write("\n")


def write_srt(srtname, start, srt):
    """Write an SRT file starting after time start. 
    
    Subtract start from timestamps. Leave everything after end, as it does 
    no harm.
    """
    print " Writing %s..." % srtname
    with open(srtname, "w") as f:
        index = 0
        for record in srt:
            # Check if the end-time belongs to this video
            offset_time = record[0][1] - start
            if offset_time < BASE_DATE:
                continue
            index += 1
            write_record(f, index, record, start)


def write_srts(vob_starts, srt):
    """Write SRT files for all VOB files."""
    print "Writing SRT files..."
    for (fname, start) in vob_starts.iteritems():
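        # Pass flags by keyword: the fourth positional argument of re.sub
        # is count, so the original call was case-sensitive and could
        # leave srtname equal to the VOB file's own name.
        srtname = re.sub(r'\.VOB$', '.SRT', fname, flags=re.IGNORECASE)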
        write_srt(srtname, start, srt)


if __name__ == "__main__":
    options = parse_options()
    srt = read_srt(options.srt_file)
    vob_starts = get_vob_starts(options.video_ts)
    write_srts(vob_starts, srt)
    print "Done."