I think anyone who is mixing narrative content that will end up on a streaming or broadcast service must own some part of the responsibility for dialog being hard to understand, especially if we remove from the equation the very modern trend in acting to “mumble”. Most movies aren’t mumbled, which leaves the majority of a film’s dialog eminently capable of being understood. But it often is not.
A contributing factor might be that the stereo mix, the one that most viewers listen to at home, is most often not mixed at all. Hence, it is not optimized for the rigors of the home environment, which is where most of the problems lie. Let me explain.
Most of the studios and streamers mandate that we final mix in the highest speaker format, which usually means Home ATMOS...even if only a fraction of the audience will view it in that format. We have numbers from the streamers that look something like this: 75-80% of their audiences watch in Left/Right Stereo, 20-24% in 5.1 and 7.1. And, hold on to your hats, 1% in ATMOS... and 1% of that 1% have discrete overhead speakers.
In spite of those numbers, most studios and streamers still mandate an ATMOS near-field as the “hero” mix and allocate minimal or no time for the Stereo mix, the mix that will be heard by most listeners in the home. The near-field mix is meant to accommodate home viewing and the added challenges it creates for dynamic range and dialog intelligibility and, very often, it sounds very good...because it was mixed. The Stereo mix that 75-80% of the audience will consume? It is most often rendered in the background, automatically, by an algorithm, and might never actually be manually mixed or, god forbid, listened back to. This paradigm seems patently backwards and counterproductive.
On my last streaming project, I asked that we chop the final mix days down for our 5.1 “near-field” hero master mix and add those days back into making a proper stereo mix, one where we actually monitored in stereo and…mixed it. The studio agreed. And you know what we spent most of our time on in that stereo mix...the one most people will listen to? Getting the dialog clear, controlling dynamics, and making it a seamless experience for a viewer.
We who are making these mixes often aren’t, through fiat or ignorance or laziness, making great mixes that stand up to listening in stereo. In some cases, the stereo mix that you hear at home is not even a sound asset that mixers had anything to do with. It is derived at the transmission point with a single-ended, streamer-created algorithm that down-mixes the hero mix (Home ATMOS, 7.1 or 5.1) without any oversight or quality control. Yes, computers are mixing what we listen to.
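For readers who have never seen one, here is a minimal sketch of what a fixed-coefficient fold-down from 5.1 to stereo can look like. It is only an illustration, not any streamer's actual algorithm, which I have no visibility into. The coefficients (center and surrounds at about -3 dB, LFE discarded) are an assumption based on the common ITU-style Lo/Ro convention, and the function names and the peak-scaling step are hypothetical. The thing to notice is that dialog usually lives in the center channel, so its level against music and effects is set by one constant and nobody ever listens to the result.

```python
import math

# Assumed gains: about -3 dB (0.7071) for center and surrounds, LFE dropped.
# These follow the common ITU-style Lo/Ro convention; they are NOT taken
# from any specific streamer's pipeline.
CENTER_GAIN = math.sqrt(0.5)
SURROUND_GAIN = math.sqrt(0.5)


def downmix_51_to_stereo(frame):
    """Fold one 5.1 sample frame (L, R, C, LFE, Ls, Rs) down to (Lo, Ro)."""
    l, r, c, _lfe, ls, rs = frame  # LFE is discarded in this sketch
    lo = l + CENTER_GAIN * c + SURROUND_GAIN * ls
    ro = r + CENTER_GAIN * c + SURROUND_GAIN * rs
    return lo, ro


def downmix_stream(frames):
    """Fold a list of 5.1 frames and scale the result if it exceeds full scale.

    The peak scaling is a crude stand-in for whatever level management a
    real pipeline applies; it lowers everything, dialog included.
    """
    out = [downmix_51_to_stereo(f) for f in frames]
    peak = max((max(abs(lo), abs(ro)) for lo, ro in out), default=0.0)
    if peak > 1.0:
        out = [(lo / peak, ro / peak) for lo, ro in out]
    return out
```

A frame with signal only in the center, such as `(0, 0, 1, 0, 0, 0)`, lands equally in both output channels at the fixed center gain. Nothing in the process asks whether the dialog is still intelligible once music and surrounds are summed on top of it.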
I am not making a blanket statement about all mixes, just a provincial statement about what I see here in Los Angeles. There ARE great mixes on Netflix and Amazon etc. And perhaps these were the consequence of a conscientious mixer complaining to the studio about how backwards their deliverables process is and...how limited their budgeting is for doing the work that is quite essential...to make dialog intelligible.
If you have heard a great stereo mix and, by extension, understood the dialog, it’s probably because someone spent time on it and safeguarded it. Sadly, I think that is the exception, not the rule. While I have good equipment in my home and my wife understands sound better than most, she still turns on the subtitles because she knows what dialog is supposed to sound like and most of the time, even on my movies, she can’t hear it well enough to turn those subtitles off.
Mea culpa, my dear...