Tool to Combine PDF Files
PDF Toolkit (pdftk) is a great PC tool for splitting and merging PDF files. It is free and comes in both GUI and CLI versions.
I use it frequently to merge PDFs with the command below.
pdftk.exe *.pdf cat output combined.pdf
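If the merge order matters, the same command can be assembled from a script. This is only a rough Python sketch: the helper names build_merge_command and merge_pdfs are my own, and it assumes pdftk is on the PATH and that alphabetical order is the order you want.

```python
import glob
import subprocess


def build_merge_command(pattern="*.pdf", output="combined.pdf", exe="pdftk"):
    """Return the pdftk argument list that merges every file matching pattern."""
    # Sort for a predictable page order, and skip the output file itself
    # in case it is left over from an earlier run.
    inputs = sorted(p for p in glob.glob(pattern) if p != output)
    if not inputs:
        raise FileNotFoundError(f"no files match {pattern!r}")
    return [exe, *inputs, "cat", "output", output]


def merge_pdfs(pattern="*.pdf", output="combined.pdf"):
    """Run pdftk on the matching files (assumes pdftk is installed and on PATH)."""
    subprocess.run(build_merge_command(pattern, output), check=True)
```

Calling merge_pdfs() from the folder that holds the PDFs should give the same result as the one-liner, just with an explicit file order.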
I recently tinkered with Torbjørn Pedersen’s (National Library of Norway) Python script video-ocr2srt to extract burnt-in English subtitles from a digital video. The script performs optical character recognition (OCR) on video files and generates a .srt subtitle file along with a detailed JSON file.
The script uses the EAST text detector model for text detection and the Pytesseract library for OCR. I achieved decent results with it, and they may improve with a better-quality video file. I suspect the extremely poor transfer of the film is the cause of the numerous duplicate lines and stray special characters in the subtitles. But what it does so well is the heavy lifting of creating the in and out points for the subtitle lines ╰(°▽°)╯! It processed a 110-minute video in under 40 minutes; however, users will need to ‘clean’ the .srt file for spelling, grammar, punctuation, and timing afterwards.
python video-ocr2srt.py -v input -m frozen_east_text_detection.pb -l eng -f 10 -p
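Part of the clean-up can be scripted. The following is a minimal Python sketch of my own, not part of video-ocr2srt, that merges back-to-back cues with identical text into one longer cue. It assumes the output follows the standard SRT layout (cue number, timing line, then text), and the function names are made up for illustration.

```python
import re


def parse_srt(text):
    """Split SRT text into (timing, caption) pairs, dropping the cue numbers."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        # A valid cue has a number, a timing line with "-->", and some text.
        if len(lines) >= 3 and "-->" in lines[1]:
            cues.append((lines[1].strip(), "\n".join(lines[2:]).strip()))
    return cues


def merge_duplicates(cues):
    """Merge consecutive cues with identical text, keeping the first start
    time and the last end time."""
    merged = []
    for timing, caption in cues:
        start, end = [t.strip() for t in timing.split("-->")]
        if merged and merged[-1][2] == caption:
            merged[-1][1] = end  # extend the previous cue instead of repeating it
        else:
            merged.append([start, end, caption])
    return merged


def to_srt(merged):
    """Write the cues back out as SRT text, renumbering from 1."""
    blocks = [
        f"{i}\n{start} --> {end}\n{caption}"
        for i, (start, end, caption) in enumerate(merged, 1)
    ]
    return "\n\n".join(blocks) + "\n"
```

This only catches exact repeats. Near-duplicates caused by OCR noise (a stray character here and there) still need a manual pass or a fuzzy comparison.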
Pytesseract is a widely used open-source Python wrapper for the Tesseract OCR engine, which reads and recognizes text in images. Tesseract determines which text lines are fixed pitch and slices the words into characters based on the pitch. While it is known for its accuracy and versatility, it can be challenging to install in a Windows environment.
1. Download and install Tesseract
2. Add TESSDATA_PREFIX in the System Environment Variables:
Variable Name - TESSDATA_PREFIX
Variable Value - C:\Program Files (x86)\Tesseract-OCR\tessdata
3. Add another environment variable named tesseract:
Variable Name - tesseract
Variable Value - C:\Program Files (x86)\Tesseract-OCR\tesseract.exe
4. Add the install folder to the PATH environment variable:
Variable Value - C:\Program Files (x86)\Tesseract-OCR
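Once the variables are set, a quick check from Python can confirm they were picked up (open a new terminal first so the changes are loaded). This is a small sketch using only the standard library. The function name check_tesseract_setup is my own, and it only looks at PATH and TESSDATA_PREFIX, not at whether OCR actually works.

```python
import os
import shutil


def check_tesseract_setup(env=None):
    """Report whether the Tesseract settings from the steps above are visible."""
    env = os.environ if env is None else env
    tessdata = env.get("TESSDATA_PREFIX")
    return {
        # True if a tesseract executable can be found through PATH
        "tesseract_on_path": shutil.which("tesseract", path=env.get("PATH")) is not None,
        # True if the TESSDATA_PREFIX variable exists at all
        "tessdata_prefix_set": tessdata is not None,
        # True if TESSDATA_PREFIX points at a folder that really exists
        "tessdata_dir_exists": bool(tessdata) and os.path.isdir(tessdata),
    }
```

Running print(check_tesseract_setup()) should show True for all three entries if the setup went through. A False value points to the step that needs another look.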
Here are two simple and fun tools to browse the web from the terminal! The first is lynx, which may be set up with the following commands:
sudo apt install lynx
lynx examplewebsite.com
links2 is a similar tool and may be set up with the following commands:
sudo apt install links2
links2 examplewebsite.com
I had the great pleasure of organizing and producing posters for two Home Movie Day events! Check them out!