Idea - Speech synthesis with emotion in Blender

So text-to-speech synthesis is a field that has matured recently,

could we start generating dialogue with emotion in Blender?

Could this be a good thesis subject for some kind coder?

http://habla.dc.uba.ar/gravano/ith-2014/presentaciones/Schroeder_2009.pdf

Perhaps it might find some use as an animation tool, but to be really useful it may also require an automated lip-sync system (one that works on more than generic human meshes). That means this would, at the least, be a really involved project, and I don’t think the core developers have any time for something like this when there are a large number of more pressing to-do items to get through first.

If you can find a developer that wants to do this, good for you, but I wouldn’t hold my breath on it.

Developers have priorities…

Although you are more than welcome to hire a dev to work on this

It was more of a someday
than a request.

If the fruit hangs low,
pluck it so that you may be full.

Yet there are other trees.

Natural sounding speech synthesis is not a low-hanging fruit. It is difficult enough that companies focused purely on natural sounding speech synthesis (without emotion) can charge huge sums for others to use their technology. Having it sound natural with emotion is near the top of the tree, with thorns and brambles around it, even for them.

I would love for natural speech synthesis with emotion to be available for Blender, and by extension to games & films created using Blender. It’s not going to happen anytime soon. Not because of any fault with the Blender Foundation, open-source development, or motivation… simply because it’s a super difficult job that isn’t something someone can work on in their basement.

I think people sometimes find they can do hard things when they try.

I don’t think that person is me,
but I know clever people sometimes solve most of the problems they encounter by testing, thinking, changing and testing again.

I know there are open-source ones, but I have not really messed with creating sound in real time myself, just with using premade FX (except Mega Man style charge sound effects, made by layering really short sounds and changing the pitch).
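For that charge-effect aside, here is a minimal sketch using Blender’s older (2.7x-era) `aud` Python module. The sample file name and the numbers are made up, just to show the idea of stacking one tiny sample at rising pitches:

```python
import aud

device = aud.device()
# "blip.wav" is a hypothetical very short sample.
blip = aud.Factory("blip.wav")

# Layer the same tiny sample at rising pitches, each slightly delayed,
# to fake a charge-up effect.
for i in range(8):
    device.play(blip.pitch(1.0 + i * 0.15).delay(i * 0.12))
```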

I know that people can do hard things when they try. That’s not the point.

The thing is that “natural speech synthesis with emotion” is still an unsolved problem in the academic arena. It’s not as if someone just needs to implement an existing theory, as was done with Least Squares Conformal Maps. They need to work out how it is done in the first place.

Could it be done? Sure could. Likelihood of it being done in Blender before academia manages to work out how it’s done? Got more luck buying a lottery ticket :wink:

I wonder if this is one of those pattern recognition problems quantum computing can solve :smiley:

Google, can I borrow your D-Wave 2 for a few days?

No, it’s not. Much as I would love to throw a problem at Google’s computing resources - this isn’t one amenable to pattern recognition. Especially without a recording of someone reading War & Peace… without emotion, angrily, sadly, happily, and whilst depressed. For patterns to be recognised, you need the data for the patterns to appear in.

That is what I was talking about:

you would need a human to label speech correctly.

I think the same sentence read angry, sad, happy, etc. would be good, and read by many people,

and then patterns could be observed for each category.
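To make that “label, then look for patterns” idea a bit more concrete, here is a rough sketch of what a first pass could look like. The folder layout, file-naming convention, the choice of librosa, and the two crude prosodic features are all my assumptions, nothing more than an illustration:

```python
import glob
import os
from collections import defaultdict

import librosa
import numpy as np

# Assumed layout: clips/<emotion>_<speaker>.wav, all recordings of the same sentence.
features = defaultdict(list)

for path in glob.glob("clips/*.wav"):
    emotion = os.path.basename(path).split("_")[0]      # e.g. "angry"
    y, sr = librosa.load(path, sr=None)

    # Two very crude prosodic features: median pitch and mean loudness.
    f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr)
    rms = librosa.feature.rms(y=y)
    features[emotion].append((np.nanmedian(f0), float(rms.mean())))

# "Observe the patterns": average the features per emotion category.
for emotion, vals in sorted(features.items()):
    pitch = np.mean([v[0] for v in vals])
    energy = np.mean([v[1] for v in vals])
    print(f"{emotion:>8}: pitch ~{pitch:5.1f} Hz, energy ~{energy:.4f}")
```

Of course real emotional speech synthesis would need far richer features than pitch and energy, which is rather the point being made above.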

In some cases the software has actually gotten pretty close for certain situations, but the sophistication and complexity of speech means you have a rather large number of variables to account for (and I wouldn’t want to be the coder who has to try to improve things in the dozens of corner cases that could crop up).

Just what if …

Besides the basic shapes (cube, tube, etc.), what if there were a natural human model (and maybe a few more, there are enough artists here)?
These models would come rigified.
And then we just extend the standard Rigify rig to have bones to use for lip-sync.

I think it’s really a short development path… lip-sync is open source too
(there is even a Python lip-sync project, Papagayo: http://www.lostmarble.com/papagayo/ ).

After we agree on a standard (MakeHuman?) and a few more models,
it’s only the lip-sync part that needs a bit of coding, and people have done that before with Blender.
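For what it’s worth, Papagayo can export its timing as a MOHO switch file (.dat): a header line followed by “frame phoneme” pairs. A minimal sketch of how such an export could be keyframed from Blender Python, assuming an object named "Face" with one shape key per Papagayo phoneme (the object name, shape-key names, and file path are all assumptions):

```python
import bpy

DAT_PATH = "/tmp/dialogue.dat"    # hypothetical Papagayo "MOHO switch" export
obj = bpy.data.objects["Face"]    # mesh with one shape key per phoneme
keys = obj.data.shape_keys.key_blocks

def read_moho(path):
    """Yield (frame, phoneme) pairs from a Papagayo .dat export."""
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) == 2 and parts[0].isdigit():
                yield int(parts[0]), parts[1]

prev = None
for frame, phoneme in read_moho(DAT_PATH):
    # Close the previous mouth shape and open the new one at this frame.
    if prev and prev in keys:
        keys[prev].value = 0.0
        keys[prev].keyframe_insert("value", frame=frame)
    if phoneme in keys:
        keys[phoneme].value = 1.0
        keys[phoneme].keyframe_insert("value", frame=frame)
    prev = phoneme
```

Shape keys are used here just to keep the sketch short; the same keyframing idea would apply to the extra lip-sync bones mentioned above, via the pose bones’ keyframe_insert.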