Installing Tesseract on Windows

Tesseract is a widely used open-source OCR engine that reads and recognizes text in images; PyTesseract is the Python wrapper around it. Tesseract detects text lines that are fixed pitch and slices their words into characters based on that pitch. While it is known for its accuracy and versatility, installing it in a Windows environment can be challenging.

Installation steps

1. Download and install Tesseract

2. Add TESSDATA_PREFIX to the System Environment Variables (adjust the path if Tesseract was installed elsewhere):

Variable Name - TESSDATA_PREFIX
Variable Value - C:\Program Files (x86)\Tesseract-OCR\tessdata

3. Add another environment variable named tesseract:

Variable Name - tesseract
Variable Value - C:\Program Files (x86)\Tesseract-OCR\tesseract.exe

4. Add the Tesseract install directory to the PATH environment variable:

Variable Value - C:\Program Files (x86)\Tesseract-OCR
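
To confirm that the settings from the steps above are visible to Python, a small check along these lines may help. This is only a sketch: check_tesseract_setup is a hypothetical helper name, not part of Tesseract or PyTesseract, and it only reports what the environment contains; it does not run any OCR. Open a new terminal first, since environment variable changes usually do not reach windows that were already open.

```python
import os
import shutil


def check_tesseract_setup(env=None):
    """Report the three settings from the installation steps.

    Hypothetical helper for illustration; `env` defaults to the
    current process environment.
    """
    env = os.environ if env is None else env
    return {
        # Step 2: directory holding the language data files
        "TESSDATA_PREFIX": env.get("TESSDATA_PREFIX"),
        # Step 3: full path to tesseract.exe
        "tesseract": env.get("tesseract"),
        # Step 4: resolved location of the executable on PATH, or None
        "on_path": shutil.which("tesseract", path=env.get("PATH")),
    }


if __name__ == "__main__":
    for name, value in check_tesseract_setup().items():
        print(f"{name}: {value if value else 'NOT SET'}")
```

Any line that prints NOT SET points at the step that still needs attention.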

Lynx and Links2 - Terminal Web Browsers

Here are two simple and fun tools for browsing the web from the terminal!

lynx may be set up and launched with the following commands:

sudo apt install lynx
lynx examplewebsite.com

links2 is a similar tool and may be set up with the following commands:

sudo apt install links2
links2 examplewebsite.com
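
If it is unclear which of the two browsers is installed on a machine, a short Python sketch like the one below can pick whichever is available. The names pick_browser and browse are hypothetical helpers made up for this example, and the only assumption is that the executables are called lynx and links2, as in the commands above.

```python
import shutil
import subprocess


def pick_browser(candidates=("lynx", "links2"), which=shutil.which):
    """Return the first candidate found on PATH, or None if neither is installed."""
    for name in candidates:
        if which(name):
            return name
    return None


def browse(url):
    """Open `url` in whichever terminal browser is available."""
    browser = pick_browser()
    if browser is None:
        raise RuntimeError("neither lynx nor links2 was found on PATH")
    # Runs in the foreground; returns once the browser is closed.
    subprocess.run([browser, url])
```

Calling browse("examplewebsite.com") then behaves like typing the matching command by hand.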

Getting .mp4 Files with yt-dlp

yt-dlp is an excellent tool for pulling video files off YouTube; however, its default output is often a .webm file.

The following command prefers a native .mp4 file from YouTube and, if the downloaded file is not an .mp4, transcodes it to .mp4 after downloading.

yt-dlp -S res,ext:mp4:m4a --recode mp4 https://www.youtube.com/watch?v=dQw4w9WgXcQ
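
For use in a script, the same command can be wrapped in Python. This is a minimal sketch that shells out to the yt-dlp executable with exactly the flags shown above, so it assumes yt-dlp is already on PATH; build_mp4_command and download_mp4 are hypothetical names chosen for this example.

```python
import subprocess


def build_mp4_command(url):
    """Return the argument list equivalent to the command above."""
    return ["yt-dlp", "-S", "res,ext:mp4:m4a", "--recode", "mp4", url]


def download_mp4(url):
    """Run yt-dlp for `url`, writing into the current directory."""
    # check=True raises CalledProcessError if yt-dlp exits with an error.
    subprocess.run(build_mp4_command(url), check=True)
```

Calling download_mp4 with the URL from the command above starts the same download. Passing the arguments as a list avoids shell quoting problems with URLs that contain characters such as & or ?.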