How necessary is the AI for this? At least the targeting of sounds in the line of sight should be fairly easy to do without AI, but I don’t know about the human voice identification.
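For the line-of-sight part, here's a minimal sketch of the classical, non-AI approach: estimate the time difference of arrival between two mics with GCC-PHAT and steer toward the peak. Toy example, all names and numbers made up:

```python
import numpy as np

def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate the time delay between two microphone signals
    with GCC-PHAT cross-correlation (classical DSP, no ML)."""
    n = sig.shape[0] + ref.shape[0]
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    R = SIG * np.conj(REF)
    R /= np.abs(R) + 1e-12            # PHAT weighting: keep phase only
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / fs

# Toy check: the same source, arriving 0.5 ms later at mic 2.
fs = 16000
src = np.random.randn(fs)
delay = int(0.0005 * fs)              # 8 samples at 16 kHz
mic1 = src
mic2 = np.concatenate((np.zeros(delay), src[:-delay]))
print(gcc_phat(mic2, mic1, fs))       # ~ +0.0005 s: mic2 hears it later
```

From that delay and the mic spacing you get the source direction with basic trigonometry.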
> but I don’t know about the human voice identification.
> The headphones send that signal to an on-board embedded computer, where the team’s machine learning software learns the desired speaker’s vocal patterns
Yeah, I'm not really sure what's going on here. Sonar has been using ML classifiers for decades, but afaik stream splitting with 100% confidence is still considered magic. So what did they apply, or what advance did they make? Afaict they threw some audio into a GPT blender without looking too closely at what it's actually doing.
Edit: I found the link to the paper. It isn't stream splitting so much as it is GPT-assisted beamforming estimation. Good stuff for sure.
I think one could build quite a good system with two directional microphones and some beamforming (or whatever the technique is called) to isolate the depth one wants to perceive; see the sketch below.
But that gets expensive, since you need calibrated mics, etc.
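Roughly what I mean, as a hedged sketch: classical delay-and-sum with two mics, where one channel is delayed so the target direction adds coherently and off-axis sound partially cancels. The geometry and parameter names are made up for illustration:

```python
import numpy as np

def delay_and_sum(mic1, mic2, fs, mic_spacing, angle_deg, c=343.0):
    """Steer a two-mic array toward angle_deg (0 = straight ahead)
    by time-aligning the far microphone, then averaging."""
    # Extra path length to the far mic for a source at angle_deg.
    tau = mic_spacing * np.sin(np.deg2rad(angle_deg)) / c
    # Apply the (fractional-sample) advance in the frequency domain.
    n = len(mic1)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    mic2_aligned = np.fft.irfft(
        np.fft.rfft(mic2) * np.exp(2j * np.pi * freqs * tau), n=n)
    return 0.5 * (mic1 + mic2_aligned)

# e.g. 16 kHz audio, mics 8 cm apart, listening 30 degrees to the right:
fs = 16000
left, right = np.random.randn(fs), np.random.randn(fs)
out = delay_and_sum(left, right, fs, mic_spacing=0.08, angle_deg=30.0)
```

With only two mics the beam is quite wide; real arrays use more elements, and the depth selectivity comes from combining direction with near-field effects.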
The biggest advantage of neural nets in this field is that you can use a dirt-cheap microphone and post-process the signal so well that the result is good enough, or even very good, to human ears.
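Roughly the shape of those systems, as a toy sketch: predict a 0..1 gain per time-frequency bin and apply it to the noisy spectrogram. This one is untrained and every name is made up; real systems are much bigger and are trained on (cheap-mic, clean) pairs:

```python
import torch
import torch.nn as nn

class TinyMaskDenoiser(nn.Module):
    """Toy spectral-mask denoiser: STFT -> per-bin gain mask -> iSTFT."""
    def __init__(self, n_fft=512):
        super().__init__()
        n_bins = n_fft // 2 + 1
        self.n_fft, self.hop = n_fft, n_fft // 4
        self.window = torch.hann_window(n_fft)
        self.net = nn.Sequential(
            nn.Linear(n_bins, 256), nn.ReLU(),
            nn.Linear(256, n_bins), nn.Sigmoid(),   # mask in [0, 1]
        )

    def forward(self, noisy):                        # noisy: (samples,)
        spec = torch.stft(noisy, self.n_fft, self.hop,
                          window=self.window, return_complex=True)
        mag = spec.abs().T                           # (frames, bins)
        mask = self.net(mag).T                       # (bins, frames)
        cleaned = spec * mask                        # attenuate noisy bins
        return torch.istft(cleaned, self.n_fft, self.hop,
                           window=self.window, length=noisy.shape[-1])

# Untrained, so this only shows the pipeline runs end to end:
model = TinyMaskDenoiser()
out = model(torch.randn(16000))
```

Train something like this on recordings from the actual cheap mic and the network effectively learns to undo that mic's flaws along with the noise.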