Later engineers screwing up a design of necessity... (history rant)
I'll try to keep a lot of information short and sweet (Why are you laughing?). Red, some of this you should already know, so some of this info is for others who may read.
H.264, aka "Advanced Video Coding" (AVC), often called "mp4," is a *delivery codec*. (A codec - COMpressor/DECompressor - is how the video data is actually encoded. The "file type," like MOV or MP4, is a *container*, and a container may hold any number of codecs. MP4 files can hold non-AVC-encoded data, but that's a different issue.)
Delivery codecs like AVC are intended for final distribution, NOT capture or editing. When you worked at EFX you may have encountered this, although, being an audio house, it's likely the sound designers/mixers were working off low-res MPEG-2 files, not the original D1 files. Point being, AVC was designed as the file compression for Blu-ray/HD-DVD delivery, not for initial capture and editing. Like JPEG, MPEG-2, or H.265/HEVC, AVC (officially "MPEG-4 Part 10" - too many fugging names, stick with AVC) is a *lossy* codec, which throws away image data to save file space.
Uncompressed 1920x1080 video (24-bit color) is 5.93 megabytes per frame, or 142.38 MB/sec at 24fps film rate (PAL and NTSC frame rates are higher - 25 and 29.97), or 500.56 gigabytes per hour. So, obviously, we need to compress the crap out of the data to put a 2-hour movie on a 25GB Blu-ray. 50:1 compression ratios are common; Netflix streams at close to a 100:1 compression ratio.
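If you want to check those numbers yourself, here's the back-of-the-envelope math in Python (assuming 24-bit color, i.e. 3 bytes per pixel, and proper binary megabytes/gigabytes - 1,048,576 and 1,073,741,824 bytes):

```python
BYTES_PER_PIXEL = 3            # 24-bit color: 8 bits each for R, G, B
MiB = 1024 ** 2                # a real megabyte
GiB = 1024 ** 3                # a real gigabyte

width, height, fps = 1920, 1080, 24

bytes_per_frame = width * height * BYTES_PER_PIXEL
bytes_per_sec = bytes_per_frame * fps
bytes_per_hour = bytes_per_sec * 3600

print(round(bytes_per_frame / MiB, 2))   # 5.93 MB per frame
print(round(bytes_per_sec / MiB, 2))     # 142.38 MB per second
print(round(bytes_per_hour / GiB, 2))    # 500.56 GB per hour
```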
Now, the AVC spec was finalized in early 2004. The state of hardware then has a lot to do with the rest of this little history lesson/rant. SD card storage would cost you a good $100 for 32MB. A hard drive could cost $200 for 200 GB. An SSD? Not on the market. Optical disc and tape were your cheap storage media. YouTube wasn't a thing yet. Broadband internet might have been a mighty 5 megabits/second (a POX on the marketing bastards who started using bits instead of bytes to inflate their numbers for ad copy. A second POX on the marketing bastards who started rounding down binary number progressions to inflate their numbers for ad copy. 1 billion bytes isn't a gigabyte - 1,073,741,824 bytes is a gigabyte!).
Enter VFR - Variable Frame Rate, a method of encoding frames within the AVC codec. Remember, we're trying to save space on a limited optical disc, an expensive storage drive, or a slow internet connection. The super-simplified version: MPEG-type compression stores one complete frame every few seconds and stores delta information - changed pixels - for everything else. Effectively, the codec is guesstimating the image from fragmented data (now you know why YouTube/streaming video and DVD/Blu-ray video have so much image banding - high compression and interpolation from missing data). VFR was a space saver. If you're doing something like showing a Roll20 screen while talking, and nothing on screen changes at all for 30 seconds, you can save a lot of bandwidth by shifting from 30 frames/sec playback down to, say, 10 fps.
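To make the keyframe/delta idea concrete, here's a toy sketch in Python. (This is nothing like the real AVC bitstream - real codecs work on motion-compensated blocks, not individual pixels - but it shows why a static Roll20 screen compresses down to almost nothing.)

```python
def encode(frames):
    """Toy MPEG-style encoder: store the first frame whole (an "I-frame"),
    then store only the changed pixels for each following frame (a delta)."""
    encoded = [("I", list(frames[0]))]
    prev = frames[0]
    for frame in frames[1:]:
        delta = [(i, new) for i, (old, new) in enumerate(zip(prev, frame))
                 if old != new]
        encoded.append(("P", delta))
        prev = frame
    return encoded

def decode(encoded):
    """Rebuild every frame by patching the previous frame with its delta."""
    frames = [list(encoded[0][1])]
    for _, delta in encoded[1:]:
        frame = list(frames[-1])
        for i, value in delta:
            frame[i] = value
        frames.append(frame)
    return frames

# A "video" of four tiny 4-pixel frames where almost nothing moves:
clip = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 9, 0, 0], [0, 9, 0, 0]]
enc = encode(clip)
print([len(delta) for _, delta in enc[1:]])  # [0, 1, 0] - static frames cost nothing
assert decode(enc) == clip                   # round-trips losslessly
```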
So VFR had a legit purpose based on hardware limitations at the time. And I repeat, because this is quite important: the AVC encoding, including any use of VFR, is supposed to be done LAST, once the final program is edited and ready for distribution.
Enter the camera phone and DSLR video.
Remember, it was only a bit over a decade ago that both the smartphone launched and DSLR cameras started recording video. Back then, the top write speed for an SD card was about 10 MB/s, while Blu-ray video runs around 20 megabits/s (2.5 MB/s) and Netflix streams at about 14 megabits/s. Here come all these new devices trying to shoot video that can't shift the data fast enough for a capture/edit codec. Solution: record directly to AVC. And use VFR. Another advantage of VFR for the hardware guys is that the 1/6" sensor used in phones has less than 1% of the surface area of a 35mm sensor. A lower frame rate on recording means that tiny little sensor has more time to gather light, so you can actually shoot in low light.
So all that background comes down to this: the phone and camera engineers made a compromise to overcome the limits of their storage capacity and transfer bandwidth, and VFR became widely used.
Enter the GPU engineers - Intel, Nvidia, and AMD all built hardware encode/decode for AVC into their chips. Well, hell! When someone is recording a screen capture, obviously they want most of their computer's resources devoted to what's being captured, not to running the recorder! Them TikTok kids gotta record their first-person shooters for other kids who would rather watch a game than play one (this is incomprehensible to me - and that includes professional sports. Watching someone else play a game - unless it's a tutorial - is dumb. No offense, guys)! Well, gosh, there's a hardware encoder for the recording, and we want to keep drive access down in case the recorded game needs to load in data, and we want to keep file sizes low... Let's use VFR!
And no, the people who made these decisions didn't think about editing at all. Edit a phone video or a screen capture? Surely you jest!
Thus, really bad decisions were made, and millions of people wonder why the video they're trying to edit has audio drift. It's because a couple of hundred smart engineers came up with a clever idea to get around hardware limits but didn't consider where else it would cause problems.
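Here's the drift arithmetic, sketched in Python for a hypothetical recording. (The assumption here is that the editor conforms a VFR clip to a single constant frame rate on import - a common behavior, though editors vary in how they handle VFR.)

```python
def drift_seconds(segments, assumed_fps):
    """How far video drifts from audio when a VFR clip is conformed to a
    constant frame rate. segments: (duration_in_seconds, actual_fps) pairs."""
    total_frames = sum(dur * fps for dur, fps in segments)
    real_length = sum(dur for dur, _ in segments)   # the audio plays this long
    conformed_length = total_frames / assumed_fps   # the video plays this long
    return real_length - conformed_length

# 30 seconds of a static screen captured at 10 fps, then 60 seconds of action
# at 30 fps, imported as if the whole clip were a constant 30 fps:
print(drift_seconds([(30, 10), (60, 30)], assumed_fps=30))  # 20.0 seconds of drift
```

Only 2,100 frames were recorded for 90 seconds of real time, so played back at a flat 30 fps the video finishes 20 seconds before the audio does.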
Now that SD cards can top 100 MB/s transfer speeds and are reaching terabyte capacities, phone sensors are getting larger, and SSD media is cheap, there is no fugging reason for anyone to still be recording in VFR. The teams of engineers still designing products that do are no longer clever people making the best of a bad situation - they are ossified thinkers who have failed to recognize that the VFR hack is no longer needed and causes more issues than it solves.
Trust me, this really is the simplified version.