Today marks another major milestone for the Ubiquiti home network: the Fibre Termination Point (TP) is finally installed! The OpenNet technicians decided to come two days before our scheduled appointment, but I was more than happy to accommodate their request. I decided to relocate the TP from the living room to the service balcony so that it sits close to the network rack. Most households keep the TP near the TV console, because it is common to use the router-modem supplied by the ISP, which is typically the internet TV provider as well. The relocation involved drilling a new hole from the electrical riser into the kitchen so that a short run of cable could go straight to the desired location. I was initially quite nervous about the drilling as I didn’t want to puncture any pipes, but thankfully it went fine. All of the work was completed within 40 minutes, but I was surprised that the task required a team of five men!
I’ve always wondered what makes up a digital file, and Nicole Martin has written a wonderfully detailed post about the zeros and ones it is made of.
I’ll briefly lay out the key points from the post.
What makes a ‘digital’ file?
A digital file consists of bits (zeros and ones), stored either as plain text or as binary data, that carry the information a computer or program needs to interpret, process, encode or decode it into something humans can understand. A digital file may represent many types of information, such as text, images and audio.
Programs can read plain text data without the aid of file signatures or headers (described below). Common plain text formats include .txt and .html files. Plain text is widely used because it is highly compatible with most computer systems and software.
Binary data can represent more complex information and is read by programs as a linear data ‘stream’, or sequence of bytes. The program must decode the information block by block, as instructed by the file’s signature and header.
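To make the difference concrete, here is a minimal Python sketch (the file name is hypothetical) that writes a tiny file and reads it back twice: once decoded as plain text, and once as the raw byte stream a program would otherwise have to interpret itself.

```python
import os
import tempfile

# Hypothetical example: write a tiny plain text file to a temporary folder.
path = os.path.join(tempfile.mkdtemp(), "note.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("Hi")

# Reading as text: the bytes are decoded straight into characters.
with open(path, "r", encoding="utf-8") as f:
    text = f.read()

# Reading as binary: the program receives only a sequence of byte values.
with open(path, "rb") as f:
    raw = f.read()

print(text)                               # Hi
print(list(raw))                          # [72, 105]
print(" ".join(f"{b:08b}" for b in raw))  # 01001000 01101001
```

The last line shows the actual zeros and ones: each character here occupies one byte of eight bits.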
Useful info
8 bits = 1 byte
Traditionally, a set of 512 bytes is one ‘sector’ (many modern drives use 4,096-byte sectors), and sectors may be grouped into ‘blocks’.
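As a small worked example of these units, the sketch below converts a file size into bits and into the number of whole sectors it would need. It uses a simplified model (my assumption) in which a partly filled sector still counts as a full one, and ignores how a real file system allocates space.

```python
BITS_PER_BYTE = 8

def sectors_needed(size_bytes: int, sector_size: int = 512) -> int:
    """Whole sectors needed to hold size_bytes.

    Simplified model: a partly filled sector still counts as a full sector.
    Integer arithmetic rounds up without floating-point error.
    """
    return (size_bytes + sector_size - 1) // sector_size

size = 1_000                        # a hypothetical 1,000-byte file
print(size * BITS_PER_BYTE)         # 8000 (bits)
print(sectors_needed(size))         # 2 (512-byte sectors)
print(sectors_needed(size, 4096))   # 1 (4,096-byte sector)
```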
Parts of a file
File signature
A short sequence of bytes, usually at the very start of the file, that identifies the file’s format.
File header
A sequence of data that describes how to interpret the information. Machines and software follow the instructions in the header to process and interpret the data the file contains.
File body
The main bulk of the information, which the machine or program renders into human-understandable form. It can be stored as plain text or binary data.
End
The final section, which signals the end of the file. Not every format has one.
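To illustrate signatures and end markers, the Python sketch below compares the first bytes of some data against a small, hand-picked table of well-known signatures, and checks for the End Of Image marker that closes a well-formed JPEG. The table is illustrative rather than exhaustive; tools such as the Unix `file` command (libmagic) recognise thousands of formats.

```python
# A few well-known file signatures ("magic numbers") and the formats they mark.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"\xff\xd8\xff": "JPEG image",
    b"%PDF": "PDF document",
    b"II*\x00": "TIFF image (little-endian)",
    b"MM\x00*": "TIFF image (big-endian)",
    b"SDPX": "DPX image (big-endian)",
    b"XPDS": "DPX image (little-endian)",
}

def identify(data: bytes) -> str:
    """Return the format whose signature matches the start of data."""
    for magic, name in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return "unknown"

def jpeg_has_end_marker(data: bytes) -> bool:
    """A well-formed JPEG normally ends with the End Of Image marker FF D9."""
    return data.endswith(b"\xff\xd9")

# In practice you would pass the first bytes of a real file, e.g.
# open(path, "rb").read(16); the bytes below are stand-ins.
print(identify(b"SDPX" + b"\x00" * 12))  # DPX image (big-endian)
```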
I have been trying to figure out the best way to play and manage film scans, and learned a bag of new tricks from reto.ch!
Playing a DPX sequence
The pattern %06d (a printf-style placeholder rather than a true regex) matches six-digit numbers, possibly with leading zeroes. This lets the player read the full sequence inside one folder in ascending order, one image after the other. Of course, the command must match the naming convention actually used.
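A quick way to see what %06d does is to expand it in Python, which uses the same printf-style formatting. The naming convention (000000.dpx, 000001.dpx and so on) is hypothetical, and the ffplay command at the end is only a sketch of how such a sequence might be played, assuming ffplay is installed.

```python
# %06d pads a frame number with leading zeroes to six digits: 7 -> 000007.
# Hypothetical naming convention: 000000.dpx, 000001.dpx, ...
PATTERN = "%06d.dpx"

frames = [PATTERN % n for n in (0, 7, 1234)]
print(frames)  # ['000000.dpx', '000007.dpx', '001234.dpx']

# Fixed-width names also sort correctly in a file browser,
# whereas unpadded names sort character by character.
unpadded = sorted(["10.dpx", "9.dpx", "100.dpx"])
padded = sorted(PATTERN % n for n in (10, 9, 100))
print(unpadded)  # ['10.dpx', '100.dpx', '9.dpx']
print(padded)    # ['000009.dpx', '000010.dpx', '000100.dpx']

# Sketch of a playback command (assumes ffplay is installed and the
# frames follow the pattern above); adjust the frame rate to your scans.
play_cmd = ["ffplay", "-f", "image2", "-framerate", "24", "-i", PATTERN]
print(" ".join(play_cmd))
```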
Make a ProRes file from the scan sequence
-f image2 forces the image file demuxer for single image files
-framerate sets the frame rate to 24
NOTE: The previous two parameters must come before the input file, because they apply to the input file.
-i path, name and extension of the input file
As when playing the sequence, %06d matches the six-digit frame numbers with leading zeroes, and the pattern must match the naming convention actually used.
-c:v chooses the ProRes video codec
-profile:v 3 selects the ProRes 422 HQ flavour, which is video profile 3
-filter:v filters the video stream: it scales the image to the correct size using the Lanczos scaling algorithm (slower, but better than the default bicubic algorithm), then pads the 4:3 image into the 16:9 HD format with a pillarbox
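The command itself is not reproduced above, so here is only a sketch of how those parameters might fit together, written as a small Python helper that builds the ffmpeg argument list. The encoder name prores_ks, the 1440x1080 scale size and the 1920x1080 pad size are my assumptions for 4:3 scans going into a 16:9 HD frame; adjust them to your material.

```python
from typing import List

def prores_command(input_pattern: str, output: str, fps: int = 24) -> List[str]:
    """Build an ffmpeg argument list that turns an image sequence into ProRes 422 HQ.

    Assumptions (not from the original post): 4:3 scans, a 1920x1080 output
    and ffmpeg's prores_ks encoder.
    """
    video_filter = (
        # Scale to 1440x1080, i.e. 4:3 at HD height, with the Lanczos algorithm...
        "scale=1440:1080:flags=lanczos,"
        # ...then pillarbox by padding to 1920x1080 with the picture centred.
        "pad=1920:1080:(ow-iw)/2:(oh-ih)/2"
    )
    return [
        "ffmpeg",
        "-f", "image2",             # input options must come before -i
        "-framerate", str(fps),
        "-i", input_pattern,
        "-c:v", "prores_ks",
        "-profile:v", "3",          # 3 = ProRes 422 HQ
        "-filter:v", video_filter,
        output,
    ]

print(" ".join(prores_command("%06d.dpx", "output.mov")))
```

Passing the list to `subprocess.run(cmd, check=True)` avoids shell quoting issues with the parentheses in the filter; if you paste the printed command into a terminal instead, wrap the filter string in quotes.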
Make an H.264 file from the scan sequence
This makes an H.264 access file directly from the TIFF conservation files.
-c:v chooses the H.264 codec by using the libx264 library
-preset chooses the veryslow preset, which trades encoding speed for better compression
-qp a quantisation parameter of 18 is generally considered “visually lossless”
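Likewise, here is a hedged sketch of the H.264 access-file command built from the parameters listed above. The TIFF naming pattern and the `-pix_fmt yuv420p` option are my assumptions (the latter is a common choice for compatibility with everyday players), not part of the original recipe.

```python
from typing import List

def h264_command(input_pattern: str, output: str, fps: int = 24) -> List[str]:
    """Build an ffmpeg argument list for an H.264 access file from an image sequence."""
    return [
        "ffmpeg",
        "-f", "image2",             # input options must come before -i
        "-framerate", str(fps),
        "-i", input_pattern,        # e.g. "%06d.tif" (hypothetical naming)
        "-c:v", "libx264",          # H.264 via the libx264 library
        "-preset", "veryslow",      # slower encoding, better compression
        "-qp", "18",                # constant quantiser; 18 ~ "visually lossless"
        "-pix_fmt", "yuv420p",      # assumption: widest player compatibility
        output,
    ]

print(" ".join(h264_command("%06d.tif", "access.mp4")))
```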
Managing digital data has become increasingly challenging, and the sheer volume that can build up in a short period can make it a daunting experience. Digital data is vulnerable and can easily be lost if not properly cared for: hard drives fail, files suffer bit rot, and data gets deleted by accident. Data loss is painful, and recovery (if it is even possible) is expensive and stressful. With the advancement of born-digital filmmaking technologies, we are generating more data than ever before, and our films are at serious risk of loss unless we start caring for the physical longevity of a work as it is being created.
In the film archive, many of the titles we receive in born-digital formats take our staff hours to sift through and declutter. This is usually because the data is not properly organised or lacks sufficient documentation (for instance, because of poor file-naming conventions), making it difficult to know what has been filed.
Since many people are working from home as a result of COVID-19, this is a great opportunity to take stock, review the backups of your files and check that you can still access your digital films and related materials. No one knows your work better than you, so ensuring the integrity of your materials is a task best done by you.
We have compiled some practical tips and basic data management concepts that are easy to follow for anyone planning to organise and back up their personal files at home. There is no single right way to do this, and you may find other solutions that better suit your needs.
For the long-term preservation of your film works, consider sending them to the Asian Film Archive. We will assess whether they fall within our acquisition policy. Click here to provide us with some information.
Let’s start!
Consolidate your files
Tracking and managing your files gets challenging when they are kept on multiple devices. It’s easy to lose track of where they are if some sit on hard drives and others on online platforms such as Dropbox, Google Drive and OneDrive. Identifying where your files are is the first step in taking stock of what you have.
Once you have located your files, you could centralise them on your computer or an external hard drive. Remember to choose a storage medium with enough capacity for all of them.
At this stage, everything will look like a huge mess. Take the time to survey what you have, because this will inform how you organise and name your files later.
By the end of this process, you will have a good sense of how much data you have, which will help you allocate the necessary storage capacity for your backup strategy.
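If you are comfortable with a little scripting, the sketch below is one way to measure that volume: it walks a folder (the path you pass in is up to you) and adds up the size of every file, then prints the total in binary units. It skips symbolic links so that linked files are not counted twice; that choice is my assumption about what you want.

```python
import os

def total_size(root: str) -> int:
    """Sum the sizes, in bytes, of all regular files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):  # skip symlinks to avoid double counting
                total += os.path.getsize(path)
    return total

def human(n: float) -> str:
    """Format a byte count using binary units (1 KiB = 1,024 bytes)."""
    units = ["B", "KiB", "MiB", "GiB", "TiB"]
    i = 0
    while n >= 1024 and i < len(units) - 1:
        n /= 1024
        i += 1
    return f"{n:.1f} {units[i]}"

# Usage (hypothetical path): print(human(total_size("/path/to/consolidated")))
```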
Keep only what you need
Select what you need and delete whatever is unimportant. For example, if you encounter multiple copies of a file, consider how many are worth keeping; you might keep only the latest version.
This quality check frees up space and keeps your data volume to a minimum. A smaller volume keeps costs low and makes migration and backup easier down the road.
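Exact duplicates are easy to find automatically. The Python sketch below groups files whose contents are byte-for-byte identical by comparing SHA-256 checksums. It only reports duplicates and deletes nothing, and it will not spot different versions of the same work, so deciding what to keep remains your call.

```python
import hashlib
import os
from collections import defaultdict
from typing import Dict, List

def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in 1 MiB chunks so large video files fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def find_duplicates(root: str) -> List[List[str]]:
    """Group files under root whose contents are byte-for-byte identical."""
    by_hash: Dict[str, List[str]] = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            by_hash[file_digest(path)].append(path)
    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]
```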
Organise your files
Organise the files you have selected by creating a directory structure and a file naming convention. Whatever system you decide on should be easily understood by you and by others accessing the files, to ensure easy access and quick identification.
File Directory Structure
This is an example of how files can be organised and structured:
I find it useful to first determine the top-level folder. In this example I used [YEAR], which branches into sub-directories by [CATEGORY]. In the screenshot above, the year is 2020 and the categories are AFA Work, External Projects and Personal Documents. The lower-level directories might be based on projects or events, but try to keep them as consistent as possible.
There is no ‘perfect’ format, and this example gives you an idea of how to start. Your directory should be intuitive and logical, and guided by how you work.
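If you set up folders often, a short script keeps the structure identical every time. This sketch creates [YEAR]/[CATEGORY] folders under a root of your choosing. The category names are adapted from the screenshot, with underscores instead of spaces in line with the tips below; they are only examples.

```python
from pathlib import Path
from typing import Iterable, List

def make_structure(root: Path, year: str, categories: Iterable[str]) -> List[Path]:
    """Create root/[YEAR]/[CATEGORY] folders; existing folders are left untouched."""
    created = []
    for category in categories:
        folder = root / year / category
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created

# Usage (hypothetical root folder):
# make_structure(Path("Archive"), "2020",
#                ["AFA_Work", "External_Projects", "Personal_Documents"])
```

Because of `exist_ok=True`, running it again for a new category adds that folder without touching the ones already there.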
Here are some tips on designing your own directory.
Draw it out!
Draw your directory out on a piece of paper before implementation
Keep it simple
Avoid complex and deep-layered designs
Consistency
Keep a consistent structure across folders
Be precise
Use plain language and keep it short
Avoid spaces, punctuation and symbols
Some operating systems and programs handle spaces and symbols poorly, so avoid them, especially in a mixed operating system environment
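The last two tips can be checked automatically. The sketch below treats a name as safe if it uses only letters, digits, underscores and hyphens, plus one optional extension, and suggests a cleaned-up alternative otherwise. That character set is my own conservative assumption, not a formal standard.

```python
import re

# Assumed safe set: letters, digits, underscores and hyphens, plus one extension.
SAFE_NAME = re.compile(r"[A-Za-z0-9_-]+(\.[A-Za-z0-9]+)?")

def is_safe_name(name: str) -> bool:
    return SAFE_NAME.fullmatch(name) is not None

def suggest_name(name: str) -> str:
    """Replace runs of spaces and symbols in the stem with a single underscore."""
    stem, dot, ext = name.rpartition(".")
    if not dot:  # no extension at all
        stem, ext = name, ""
    cleaned = re.sub(r"[^A-Za-z0-9-]+", "_", stem).strip("_")
    return f"{cleaned}.{ext}" if ext else cleaned

print(is_safe_name("My Film (final).mov"))  # False
print(suggest_name("My Film (final).mov"))  # My_Film_final.mov
```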
File Naming Convention
Descriptive file names are essential for quick identification and retrieval of files. Poor naming conventions are frustrating and waste a lot of time because the names give no useful information (“Best practices for file naming”, 2020).
A guiding principle is to include basic information in file names, such as object type, dates and important remarks. These indicators can be crucial in distinguishing one file from another when there are multiple variations of a given file. In essence, an effective file name should tell you what the file is without you having to open it (Antin, 2020).
In the screenshot above, the assets of a project are organised into folders: Logos, Mock_Up_Thumbnail and Watermarked_Stills. The names of the three files clearly indicate that they are three image stills (Still) from the film Sunshine Singapore (SS), watermarked (WM) and in .png format.
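A convention is easiest to follow when names are generated rather than typed. The helper below is a hypothetical sketch: it joins project, object type and remark with underscores, adds a two-digit index, and optionally prefixes an ISO 8601 date (YYYYMMDD) so that names sort chronologically. The exact order of the parts is my assumption; the screenshot only shows the abbreviations.

```python
from datetime import date
from typing import Optional

def build_name(project: str, obj_type: str, remark: str, index: int,
               ext: str, when: Optional[date] = None) -> str:
    """Join name parts with underscores, e.g. SS_Still_WM_01.png.

    When a date is given it is prefixed as YYYYMMDD, so names sort by date.
    """
    parts = [project, obj_type, remark]
    if when is not None:
        parts.insert(0, when.strftime("%Y%m%d"))
    return "_".join(parts) + f"_{index:02d}.{ext}"

print(build_name("SS", "Still", "WM", 1, "png"))                    # SS_Still_WM_01.png
print(build_name("SS", "Still", "WM", 2, "png", date(2020, 10, 11)))  # 20201011_SS_Still_WM_02.png
```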
Now that your files are all in place and organised, it is time to back them up! Consider the classic 3-2-1 data protection strategy, a model widely adopted by professionals in content and media production.
The 3-2-1 strategy
Keep 3 copies of your data
The 3-2-1 strategy recommends keeping three copies of your data, because one copy is simply not enough. Having only one copy is dangerous, and the more copies you have, the lower the chance of complete data loss.
Store 2 copies on 2 different storage devices
Drives will eventually fail through mechanical failure or wear and tear. Hence the 3-2-1 strategy recommends keeping your first two copies on two separate storage devices at your primary location. Keeping the copies on different devices provides added insurance for data restoration in the event that one of them fails.
There are various backup storage solutions available, but which do you choose? Here are two solutions you can consider.
A) External hard drive (HDD & SSD)
This is the most common solution because it is affordable, easy to use and widely available. An external hard drive requires little set-up: just plug it into your computer via USB and it is ready to use. External drives usually come in two variants, hard disk drives (HDDs, which commonly use the SATA interface) and solid-state drives (SSDs); their differences are explained in the links provided. HDDs are much cheaper, but SSDs are faster and less prone to mechanical failure since they have no moving parts. However, SSDs have a limited number of write cycles, even though they are less susceptible to physical wear (“SATA vs SSD vs NVMe: Types of Hard Drives”, 2020).
Pros: Easy to use, portable, affordable, widely available
Cons: Can’t easily share files
You can also deploy multiple hard drives and rely on software to duplicate data from one drive to another.
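As an illustration of what such software does, here is a minimal Python sketch that copies every file from one drive (or folder) to another, keeps the folder structure, and verifies each copy with a SHA-256 checksum. It is a teaching sketch, not a replacement for dedicated tools such as rsync, which only copy what has changed.

```python
import hashlib
import shutil
from pathlib import Path
from typing import List

def sha256(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def mirror(src: Path, dst: Path) -> List[Path]:
    """Copy every file from src to dst, keeping folders, and verify each copy.

    Returns the source files whose checksums did not match after copying.
    """
    mismatched = []
    for file in src.rglob("*"):
        if file.is_file():
            target = dst / file.relative_to(src)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(file, target)  # copy2 also preserves timestamps
            if sha256(file) != sha256(target):
                mismatched.append(file)
    return mismatched

# Usage (hypothetical mount points): mirror(Path("/Volumes/Drive_A"), Path("/Volumes/Drive_B"))
```

An empty result means every copy matched its original at the time of copying.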
B) Network Attached Storage (NAS)
A NAS storage device is connected to a computer network and acts as a central location where multiple users can write and access data (“What is NAS (Network Attached Storage) and Why is NAS Important for Small Businesses? | Seagate UK”, 2020). Depending on the model, a NAS houses multiple hard drives and can be scaled for increased capacity according to its number of available bays. You can think of a NAS as an array of hard drives put together to form a larger storage unit.
It can be set up with a RAID configuration, which combines the drives so that they behave as one cohesive storage volume. Depending on the RAID level, this lets you balance redundancy, capacity and performance according to your needs. There are many NAS solutions available, but Synology and QNAP are two popular brands.
Pros: Allows multiple users to access, scalable, allows customisation, status monitoring
Cons: High upfront cost, requires basic technical knowledge
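To get a feel for the redundancy versus capacity trade-off, the sketch below computes usable capacity and guaranteed drive-failure tolerance for the common RAID levels, assuming all drives are the same size. It ignores file-system overhead and vendor-specific schemes such as Synology Hybrid RAID (SHR), so treat the numbers as rough planning figures.

```python
from typing import Tuple

def raid_usable(level: str, drives: int, drive_tb: float) -> Tuple[float, int]:
    """Usable capacity (TB) and guaranteed failure tolerance for equal-sized drives.

    Simplified: ignores file-system overhead and vendor schemes such as SHR.
    """
    minimum = {"0": 2, "1": 2, "5": 3, "6": 4, "10": 4}
    if level not in minimum:
        raise ValueError(f"unsupported RAID level: {level}")
    if drives < minimum[level] or (level == "10" and drives % 2):
        raise ValueError(f"RAID {level} needs more drives (RAID 10 needs an even count)")
    if level == "0":
        return drives * drive_tb, 0             # striping only, no redundancy
    if level == "1":
        return drive_tb, drives - 1             # every drive holds a full mirror
    if level == "5":
        return (drives - 1) * drive_tb, 1       # one drive's worth of parity
    if level == "6":
        return (drives - 2) * drive_tb, 2       # two drives' worth of parity
    return drives / 2 * drive_tb, 1             # RAID 10: mirrored pairs, striped

print(raid_usable("5", 4, 4.0))  # (12.0, 1)
```

Keep in mind that RAID protects against a drive failing, not against accidental deletion or a fire, so a NAS still counts as only one location in the 3-2-1 strategy.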
Keep 1 copy off-site
The last component of the 3-2-1 strategy is to keep one copy of your data off-site, away from your primary location. This is a key component of a robust backup strategy, as on-site storage can be compromised by hardware failure, theft or fire. You can consider cloud storage as your off-site solution, where your files are hosted by a cloud service for a fee (“Backup Strategies: Why the 3-2-1 Backup Strategy is the Best”, 2020).
There are many cloud plans to consider, some costing as little as USD6 per month for unlimited storage (as of 2020). Many of them also keep data restoration straightforward by offering several ways to get your data back: direct download, USB flash drive or external hard drive. The method you pick will depend on the volume of data you are retrieving.
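To pull the three rules together, here is a small, hypothetical checklist in Python that reviews a backup plan against 3-2-1 as described above. Each copy is described by a device label of your choosing (the labels are illustrative) and whether it is stored off-site; the function returns a list of problems, and an empty list means the plan meets all three rules.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Copy:
    device: str    # hypothetical label for a physical device, e.g. "laptop", "nas", "cloud"
    offsite: bool  # stored away from your primary location?

def check_321(copies: List[Copy]) -> List[str]:
    """Return a list of problems; an empty list means the plan meets 3-2-1."""
    problems = []
    if len(copies) < 3:
        problems.append(f"only {len(copies)} copies; keep at least 3")
    if len({c.device for c in copies if not c.offsite}) < 2:
        problems.append("keep copies on at least 2 different storage devices")
    if not any(c.offsite for c in copies):
        problems.append("keep at least 1 copy off-site")
    return problems

plan = [Copy("laptop", False), Copy("external_hdd", False), Copy("cloud", True)]
print(check_321(plan))  # []
```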
Conclusion
There is no one way to manage and back up your data, since it depends on your data volume and, importantly, your budget. When designing your backup system, a general rule of thumb is to keep multiple copies and spread them across different storage solutions. If one source fails, you can rely on the others for data restoration.
Managing your data requires continuous effort, and it is easy to overlook amid competing responsibilities. Consider setting a regular backup schedule, or simply back up as you go. By doing so, you not only reduce the risk of data loss but also gain peace of mind that your files are safe and readily accessible when needed.
Reference List
Antin, K. (2020). File naming conventions: why you want them and how to create them. HURIDOCS. Retrieved 11 October 2020, from https://www.huridocs.org/2016/07/file-naming-conventions-why-you-want-them-and-how-to-create-them/.
Backup Strategies: Why the 3-2-1 Backup Strategy is the Best. Backblaze Blog | Cloud Storage & Cloud Backup. (2020). Retrieved 11 October 2020, from https://www.backblaze.com/blog/the-3-2-1-backup-strategy/.
Best practices for file naming. Stanford Libraries. (2020). Retrieved 10 October 2020, from https://library.stanford.edu/research/data-management-services/data-best-practices/best-practices-file-naming.
SATA vs SSD vs NVMe: Types of Hard Drives. Pluralsight.com. (2020). Retrieved 11 October 2020, from https://www.pluralsight.com/blog/it-ops/types-of-hard-drives-sata-ssd-nvme.
What is NAS (Network Attached Storage) and Why is NAS Important for Small Businesses? | Seagate UK. Seagate.com. (2020). Retrieved 10 October 2020, from https://www.seagate.com/sg/en/tech-insights/what-is-nas-master-ti/.