Damn cat... This really is the shorter answer...
I typed out a long detailed answer to Conrad's question, left my phone for a second to grab more tea and the cat walked across my phone and closed the response tab.
It took me well over an hour.
So, here's the short form, with a "you're gonna have to trust me here, because I'm not typing it all out again."
AVC video isn't a "set standard." It's a "family of standards" with 14 major revisions between 2004 and 2021.
We will ignore the dozen or so other encode variables and flags besides CFR/VFR and focus on only one - "Interleaved Audio." Interleaved audio writes pointers into the audio file to make certain audio sample >X< lines up with video frame >Y<. Without interleaved audio, the audio runs on its own clock. I guarantee Red's screen recorder doesn't interleave audio.
There are almost a dozen different AVC encoders available from different companies - not every encoder supports every possible AVC flag or variation.
The Microsoft AVC encoder in Windows 11 is fundamentally broken, and every major NLE issued a bug fix in 2022 specific to the broken M$ encoder. Red is using an M$ screen recorder on Win 11. He's starting with broken files.
A media player only streams and discards frames on the fly. An NLE has to decode, hold and manipulate each individual frame for processing.
An NLE will perform some sort of cache/proxy creation when importing media to speed up display of frames on the Timeline. This cache is not used on export renders - rather the original media file is used.
There are multiple - near infinite - ways for the Editor to parse the ever-changing frame rate of VFR footage into a 29.97 fps Timeline. Let's assume for a moment we have an entire second of footage at 13 fps on a 29.97 Timeline. 13 does not divide evenly into 29.97 - each 13fps frame takes up about 2.31 frames of the 29.97 Timeline. There is no such thing as a fractional frame in an NLE. How would you like to parse out that 2.31 into integers? Where do you hold for 2 frames and where do you hold for 3? There are multiple solutions for this, and that's nothing compared to the constantly changing frame rate of VFR footage.
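To make the "where do you hold for 2 and where do you hold for 3" problem concrete, here's a rough Python sketch. It is NOT how any particular NLE does it. The two picking rules (floor and nearest) and every name in it are my own assumptions, purely for illustration. It takes one second of 13 fps footage, steps through the 29.97 Timeline frames that fall inside that second, and counts how many Timeline frames each source frame gets held for under each rule.

```python
from fractions import Fraction
from math import floor

# Assumed values for illustration: a 29.97 fps Timeline and one second of 13 fps source.
TIMELINE_FPS = Fraction(30000, 1001)
SOURCE_FPS = Fraction(13)
SOURCE_FRAMES = 13


def timeline_ticks():
    """Yield the start time of every Timeline frame that begins inside the source clip."""
    clip_length = Fraction(SOURCE_FRAMES) / SOURCE_FPS
    k = 0
    while Fraction(k) / TIMELINE_FPS < clip_length:
        yield Fraction(k) / TIMELINE_FPS
        k += 1


def holds(pick):
    """Count how many Timeline frames each source frame is held for under a picking rule."""
    counts = [0] * SOURCE_FRAMES
    for t in timeline_ticks():
        # Position of this Timeline frame measured in source frames (usually fractional).
        position = t * SOURCE_FPS
        index = min(pick(position), SOURCE_FRAMES - 1)  # clamp to the last real frame
        counts[index] += 1
    return counts


def pick_floor(position):
    """Rule 1 (hypothetical): show the most recent source frame."""
    return floor(position)


def pick_nearest(position):
    """Rule 2 (hypothetical): show whichever source frame is closest in time."""
    return floor(position + Fraction(1, 2))


floor_holds = holds(pick_floor)
nearest_holds = holds(pick_nearest)

print("floor:  ", floor_holds)
print("nearest:", nearest_holds)
print("same pattern?", floor_holds == nearest_holds)
```

Both rules fill the same number of Timeline frames and both only ever hold a frame for 2 or 3, but they put the 3s in different places. Both are "correct." That's with a steady 13 fps. Real VFR footage changes the rate constantly.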
Because the NLE caches on ingest, the frames are parsed one way for the edit. Because the cache isn't used on export the original media is re-parsed during render. Since it's a multiple solution problem, it's unlikely the computer will parse the frames the same way again.
The VFR file isn't intended to be edited to begin with. A bad metaphor: editing a VFR file is like trying to find a correctly sized flat head screwdriver to jam into a hex-head bolt and hoping you get the leverage to turn the bolt without the screwdriver slipping and slamming your hand on the table. Using CFR footage to begin with is like having a proper set of allen wrenches for that hex-head bolt.
AVC files aren't intended for editing in general. There are better codecs for editing. Using AVC in an NLE is like squirting an epoxy around a washer at the end of that hex bolt and hoping it holds. Using an edit codec is having the proper nut.
In short, Conrad is asking why there isn't a singular solution for a variable solution problem that starts with using the incorrect tool in the first place, and wondering why a jury-rig isn't as solid as something correctly built.
My prior post, above, gave the two viable solutions - use a screen recorder that records CFR with interleaved audio in the first place, or use a third party tool to transcode the VFR to CFR with interleaved audio before editing so the NLE doesn't have to make multiple guesses in near real time. Wanting the NLE coders to all figure out a single solution to a near infinitely variable input problem isn't going to happen. Use an allen wrench and proper nut with that hex bolt, not a flat head, washer and blob of epoxy.
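Here's a rough sketch of why transcoding to CFR first helps. Again, this is an illustration under my own assumptions, not the algorithm of any real transcoder. The timestamps are made up and `conform_to_cfr` is a hypothetical helper. The idea: the frame-picking decision gets made ONCE, ahead of time, and baked into the file, so the NLE sees one frame per Timeline tick on both ingest and export.

```python
from bisect import bisect_right
from fractions import Fraction


def conform_to_cfr(source_pts, duration, fps=Fraction(30000, 1001)):
    """Pick, for every constant-rate output tick, the latest source frame at or before it.

    source_pts: sorted presentation times (seconds) of the VFR frames.
    Returns a list of source frame indexes, one per output frame.
    """
    picked = []
    k = 0
    while Fraction(k) / fps < duration:
        tick = Fraction(k) / fps
        # bisect_right counts the source frames with a timestamp <= tick.
        index = max(bisect_right(source_pts, tick) - 1, 0)
        picked.append(index)
        k += 1
    return picked


# Made-up VFR timestamps: the gaps between frames are all over the place.
vfr_pts = [Fraction(s) for s in ("0", "0.05", "0.21", "0.25", "0.6", "0.9")]

cfr_frames = conform_to_cfr(vfr_pts, duration=Fraction(1))

print(len(cfr_frames), "output frames")
print(cfr_frames)
```

Once that list is written out as an actual CFR file, there's nothing left for the NLE to guess at. Every Timeline frame already has exactly one source frame assigned to it.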
Final/different comparison, since Conrad mentioned Notepad.
Notepad is fugging simple. Pure ASCII encoding with a word-wrap flag to let Notepad size text to fill the display window. A media player is Notepad.
Word processors are much trickier. Now we're talking about ASCII AND Unicode. Gotta have Unicode for emojis! 🙄 Now we're embedding page size data, margins, fonts, font sizes, font formatting (bold, ital, underline, strikethrough, etc), font colors, number of columns, column margins, column wrapping, table sizes, cell sizes, cell borders, cell shading, embedded images, and a whole host of other things just to print out a page of text. An NLE is a word processor.
Now, when I sent out all my custom character sheets I sent out DOC, DOCX, *and* RTF versions. Why? Because Fred can't open one, Rusty can't open the second, and Conrad can't open the third format without their word processor fugging up my formatting. Can't these programmers all figure out how to properly open a static file like a page of text in a table correctly? Why do we even NEED doc, docx and rtf file types? Oh, right, first of all so M$ can charge money to license three different proprietary formats, and also because docx is the one that's openXML based and can use Unicode along with ASCII while RTF was the one developed to be compatible with whatever proprietary crap Apple invented for their Word Processor.
It's Capitalism at work! Proprietary products instead of something that works! Like Thunderbolt literally being inverted USB-C! Need to make that patent money and change things up every few years to force customers to upgrade!
There's your Age of Shoddy.