Remastering speech using Adobe Enhance Speech AI?

jjcoomber · Post by **jjcoomber** » Tue Dec 20, 2022 10:16 am

I recently found Adobe Podcasts AI 'Enhance Speech', and my first thought was to try using it to 'enhance' the speech files from old adventure games. I decided to try it out with Sam & Max Hit the Road! but I'm not so technically minded (codewise) so I've run into a few problems...

First of all, finding a reliable way to extract the speech audio from monster.sou - I found a few ancient posts here in the forum, but no real answers. I tried both ScummvmEx and Scummvm Revisited, and whilst I was able to dump files individually I couldn't find a way to do it in bulk (and there are 3377 files). I know it should be possible to edit the source code of Scummvm Tools compress_scumm_sou so it doesn't delete its temp files and labels them incrementally, but I'm afraid that is beyond me...

I was able to find a more generic file ripper (X-Ripper 1.5), which seems to work. Though when I tried it with monster.sou it produced absolutely huge (but functioning) voc files - I didn't even get through 10% of the files and it was already hitting 22gb! I managed to get around this by instead using the compressed monster.sog and the ripper was able to read and dump the contained ogg files (.sof didn't work).
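For anyone wanting to script the bulk dump, a generic ripper can be approximated in a few lines of Python. This is only a minimal sketch, not how X-Ripper or ScummVM Tools actually work: it assumes each clip inside the container is a standard Creative Voice File that starts with the usual `Creative Voice File` signature and ends with a terminator block. The function name `find_voc_clips` is made up for illustration.

```python
VOC_MAGIC = b"Creative Voice File\x1a"


def find_voc_clips(blob: bytes) -> list:
    """Return (start, end) byte ranges of VOC clips embedded in blob."""
    clips = []
    pos = blob.find(VOC_MAGIC)
    while pos != -1:
        # Bytes 20-21 of the VOC header give the offset of the first data block.
        cursor = pos + int.from_bytes(blob[pos + 20:pos + 22], "little")
        while cursor < len(blob):
            if blob[cursor] == 0:
                # Terminator block: a single byte with no length field.
                cursor += 1
                break
            # Every other block: 1 type byte + 3-byte little-endian payload size.
            size = int.from_bytes(blob[cursor + 1:cursor + 4], "little")
            cursor += 4 + size
        end = min(cursor, len(blob))
        clips.append((pos, end))
        # Always move forward, even if a header was malformed.
        pos = blob.find(VOC_MAGIC, max(end, pos + 1))
    return clips
```

Each returned range can then be written out as `blob[start:end]` into a numbered .voc file. Anything the container stores between clips (for example lip-sync data) is skipped by this approach.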

So, I finally have the 3377 ogg files for each voc track (labeled based on the original identifying number from Scummvm Rev), which I converted back to wav and joined together into three continuous files - I did this last step as Adobe Podcast Enhance Speech processes one file at a time, with a limit of 1 hour or 1 GB per file, and only handles wav or mp3.
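The joining step can be sketched with Python's standard `wave` module. This is an illustration under assumptions, not the exact workflow used here: it assumes all clips are already uncompressed PCM wav files with the same channel count, sample width and sample rate, and `join_wavs` is a made-up name. The returned list plays the role of a cue file, recording where each clip starts and how many frames it has.

```python
import os
import wave


def join_wavs(paths, out_path):
    """Concatenate WAV clips into out_path.

    Returns a cue list of (file_name, start_frame, frame_count) tuples.
    """
    if not paths:
        raise ValueError("no input files given")
    cue = []
    fmt = None
    position = 0
    with wave.open(str(out_path), "wb") as out:
        for path in paths:
            with wave.open(str(path), "rb") as clip:
                clip_fmt = (clip.getnchannels(), clip.getsampwidth(),
                            clip.getframerate())
                if fmt is None:
                    # The first clip decides the output format.
                    fmt = clip_fmt
                    out.setnchannels(fmt[0])
                    out.setsampwidth(fmt[1])
                    out.setframerate(fmt[2])
                elif clip_fmt != fmt:
                    raise ValueError(f"{path} has a different format: {clip_fmt}")
                count = clip.getnframes()
                out.writeframes(clip.readframes(count))
            cue.append((os.path.basename(str(path)), position, count))
            position += count
    return cue
```

Keeping the cue in frames rather than seconds avoids rounding drift when the joined file is cut apart again later.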

After the AI has done its thing I downloaded the 'enhanced' wav, and it seems to have worked pretty well. The new 48khz files aren't perfect and could probably do with some tweaking, but there's now a lot more 'headroom' in the files and they don't sound as 'crispy'. It fails to correctly clean singing, so for now Bumpus' song will have to wait. The answering phone messages are completely cleaned up too, so they would need downgrading again to correctly sound like 'phone quality'.

Now the question is, what can actually be done with these files? Once I've worked out the best way to resplit the wav, how would I go about recreating the sou (or the compressed sog/sof)? And most importantly, does ScummVM actually accept speech files that are more than 22.05khz (I couldn't find any info in the wiki)?

Going forward I'd love to try this out on CMI and Broken Sword as well, but again I'm not sure what method to use to extract the audio files. I realize too that distributing the 'enhanced' files is not really possible, as it's legally murky waters (but I know that Revolution gave permission for ScummVM to distribute the original cutscenes, so they may be open to discussion).

Maybe it would be possible to eventually create a tool that does this on the user end, so no copyrighted files are distributed?

I know that there's also Nvidia RTX voice, but I'm not sure how that would compare to Adobe's AI.

rzil · Post by **rzil** » Tue Dec 20, 2022 8:17 pm

Hi, thanks for trying this.

For building a compressed monster file (sof/sog/so3), it is possible to tweak the remonstered tool (I'm the author):
https://github.com/BLooperZ/remonstered/tree/develop
viewtopic.php?t=14506
(I also personally used it for adding hebrew fandub in Sam & Max and DOTT - still WIP)

For extracting VOC files from uncompressed monster.sou, you could try this script:
https://github.com/BLooperZ/remonstered ... ore/sou.py
(it will also produce some files to use with the tool above: offset mapping and lip-sync tags)

As you said, there is also a problem with distributing the enhanced samples (see here: https://github.com/scummvm/scummvm/pull/4543).
It's best if the samples can be enhanced on the user end (otherwise, I think the next best option is probably recording all speech samples from scratch).
To see if it is feasible to create the enhanced samples that way: from your experience, do you have a general idea of the required system specification, the time it takes to enhance the samples, and whether it can be automated?

Thanks, feel free to contact me if you'd like help using the above tools

jjcoomber · Post by **jjcoomber** » Tue Dec 20, 2022 10:06 pm

Thanks for the response, I'll definitely check out remonstered - b̶u̶t̶ ̶i̶s̶ ̶i̶t̶ ̶s̶t̶i̶l̶l̶ ̶s̶u̶p̶p̶o̶r̶t̶e̶d̶ ̶i̶n̶ ̶S̶c̶u̶m̶m̶V̶M̶ ̶2̶.̶6̶ ̶o̶n̶w̶a̶r̶d̶?̶ ̶A̶s̶ ̶I̶ ̶s̶e̶e̶ ̶i̶n̶ ̶t̶h̶e̶ ̶[̶i̶]̶r̶e̶m̶o̶n̶s̶t̶e̶r̶e̶d̶[̶/̶i̶]̶ ̶g̶i̶t̶ ̶'̶i̶s̶s̶u̶e̶s̶'̶ ̶t̶h̶a̶t̶ ̶S̶c̶u̶m̶m̶V̶M̶ ̶s̶u̶p̶p̶o̶s̶e̶d̶l̶y̶ ̶d̶r̶o̶p̶p̶e̶d̶ ̶s̶u̶p̶p̶o̶r̶t̶ ̶f̶o̶r̶ ̶M̶P̶3̶/̶F̶L̶A̶C̶/̶O̶G̶G̶ ̶d̶u̶e̶ ̶t̶o̶ ̶a̶ ̶r̶e̶w̶r̶i̶t̶e̶ ̶o̶f̶ ̶t̶h̶e̶ ̶i̶M̶u̶s̶e̶ ̶e̶n̶g̶i̶n̶e̶?̶ - Seems like it is still supported, and support for re-compressed audio files was only dropped in Full Throttle, The Dig and The Curse of Monkey Island.

Unfortunately, the whole process is not super user-friendly right now. Adobe Enhance Speech is a browser-based tool, which only supports one file at a time, and takes a while to actually process each file (wav seems to take a lot longer than mp3). I'm not even sure how long it will remain available/free, as it seems like their Podcast suite is still in beta. It will also probably need manual tweaking to get the best out of the AI results - it produces something more akin to a 'vocal booth', so it's a little bit flat and could also do with some reverb to match the game scenes.

Luckily the AI didn't mess with the duration of the clips at all, so I was able to use a cue file I made of the original files to resplit the "enhanced" version. I'll see what I can do with remonstered, to take the 'extra' data (lipsync etc) from the original monster.sou and merge it with the new files.
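The resplit step can be sketched in Python too. This is a rough illustration with assumptions: the cue is taken to be a list of (file_name, start_frame, frame_count) entries measured at the original sample rate, the enhanced file is a PCM wav at a different rate (48 kHz here), and `resplit` is a made-up name. Because durations are unchanged, each boundary only has to be scaled by the ratio of the two sample rates.

```python
import os
import wave


def resplit(enhanced_path, cue, original_rate, out_dir):
    """Cut the enhanced WAV back into clips; return the written paths.

    cue holds (file_name, start_frame, frame_count) at original_rate.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    with wave.open(str(enhanced_path), "rb") as src:
        new_rate = src.getframerate()
        total = src.getnframes()
        for name, start, count in cue:
            # Scale both boundaries so neighbouring clips stay contiguous.
            first = min(start * new_rate // original_rate, total)
            last = min((start + count) * new_rate // original_rate, total)
            src.setpos(first)
            frames = src.readframes(last - first)
            out_path = os.path.join(out_dir, name)
            with wave.open(out_path, "wb") as clip:
                clip.setnchannels(src.getnchannels())
                clip.setsampwidth(src.getsampwidth())
                clip.setframerate(new_rate)
                clip.writeframes(frames)
            written.append(out_path)
    return written
```

Scaling the start and end of each clip separately, instead of scaling its length, means rounding errors do not accumulate across thousands of clips.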

Overall the results with Sam & Max Hit the Road aren't a massive improvement (and maybe not worth all of the effort), much like some of the (not yet implementable) 'graphics upscales'. But it might be more noticeable on other games - Sam has a pretty monotone delivery, so the changes aren't huge with him, but other human lines do sound cleaner. I suppose I'm just fiddling more for fun and as a 'proof of concept'.

I think the best use case is going to be Broken Sword 1, as that's only 11.025khz, and even the Director's Cut version still has the bad speech quality (so I guess the original recordings were lost). I'm working on the extraction side of that now (which is a different approach than .sou), but annoyingly the GOG version of the original game only includes an already ogg-compressed clv rather than the original clu (and my disk versions are somewhere in storage back in my home country). I think I might have it on steam or humble bundle too, so I'll give those a check in the meantime. I've also found fre:ac to be unreliable, as random files don't convert correctly, so I've resorted to using VLC, which gives much more accurate and error-free conversions but is painfully slow...

One thing I've noticed with Broken Sword 1 is that there are a lot of duplicate lines from the CD1 file in the CD2 file. I'm guessing that it's mainly for inventory items that are carried over between the two discs? But it seems to be even more than just that. S&MHtR had around 2.5hrs of audio, whereas I'm getting closer to 9hrs on BS1 including the dupes.
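One way to count the dupes before spending hours enhancing them twice is to hash every extracted clip and group identical ones. This is only a sketch with a made-up function name, and it assumes duplicates are byte-identical files; the same line stored with different encoding or padding would not be caught.

```python
import hashlib
from collections import defaultdict


def group_duplicates(paths):
    """Return lists of paths whose file contents are byte-for-byte identical."""
    groups = defaultdict(list)
    for path in paths:
        with open(path, "rb") as handle:
            digest = hashlib.sha256(handle.read()).hexdigest()
        groups[digest].append(path)
    # Only groups with more than one member are duplicates.
    return [names for names in groups.values() if len(names) > 1]
```

Only one clip per group would then need to go through the enhancer, with the result copied to the other names afterwards.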

invwar · Post by **invwar** » Fri Dec 23, 2022 11:28 pm

Thank you for this tip, the idea itself is great!
I immediately had to test it with some audio files.
First I tested it on the German version of Simon the Sorcerer 2. Back then the recordings were really badly compressed and are thus missing a lot of pitches. Furthermore the original files are confirmed lost, so this is all we will ever have. A perfect candidate, if this worked.
Unfortunately I can tell you directly, it doesn't. Of course I can't share the files, as they are not public. The first example just sounded clearly different, neither better nor worse - just suboptimal in a different way. But the second test already resulted in an absorbed sound - so an absolute no-go. Sad, as the audio is in one single audio file, so very convenient for conversion.
Next attempt was Bud Tucker in Double Trouble in German. In itself a bad game with a terrible voice cast, but they made good use of the available space on the CD and the audio is already in OK quality. The conversion sounded much better, but I am quite convinced that in the end there is some mechanical noise in the voice which wasn't there before - so also not optimal, but much better.

So what I think I learned from it is: this tool is not a magic enhancer, it just works like any modern active noise cancellation headset, identifying noises and removing them. Thanks to post-processing it is of course more effective than any Sony can be, but it won't magically improve your audio quality. Badly compressed/low-frequency audio will still be as it is.
That said, I am sure we might see tools in the future that can do what you want to achieve. The industry has an interest in improving AI for remastering, e.g. taking "crappy" 2.0 sound to full 5.1 mastering. Even LucasFilm had their problems with it when they released the original Star Wars on DVD.

jjcoomber · Post by **jjcoomber** » Sat Dec 24, 2022 10:06 am

Unfortunately, from what I've read it doesn't work as well for non-English languages - it even tries to 'invert' English words from other languages' audio. So that might be why your tests weren't as fruitful.

I've finished running through Broken Sword 1 (original and DC), but I've got to listen through everything to see if there are any major errors*. Then I'm gonna try and reach out to Revolution, and maybe they'll add it as a patch for the DC version (as I doubt they'll patch the original). It may then be possible to create a tool that converts that new DC patch to the original files, so everyone (who also owns DC) can legally patch their original versions too - or maybe I'm just getting carried away!

*I'm not sure if I've also discovered a weird Easter egg in the DC voice files, as I can't find it mentioned elsewhere (it may be in the DC version and I just missed it). It's Rolf Saxon impersonating the goat and, in a fourth-wall-breaking monologue, talking about what's so bad about the iconic goat puzzle.

Veda · Post by **Veda** » Wed Dec 28, 2022 8:41 pm

Maybe this could be useful:

https://github.com/mindslab-ai/nuwave2

And here's a (maybe-not-so-crazy) idea: training it using original and remastered voice files from Day of the Tentacle...

Ideally that could be used to get rid of noise/compression artifacts and upsample voices from games using the same compression algorithm (Fate of Atlantis? Sam & Max?)

MusicallyInspired · Post by **MusicallyInspired** » Fri Dec 30, 2022 7:06 am

I tried this out with some Sierra games and had stellar results with King's Quest 5 (possibly the Sierra talkie with the worst quality dialogue recordings). Couldn't believe it. It doesn't work on everything though, as its model is not really trained to enhance audio with lower sampling rates and bitrates. But some of them really get by quite well regardless! AI is really the future in multiple ways.
