How to enable Neural Text-to-Speech (NTTS) in Java using Amazon Polly

Question

I am trying to use Amazon Polly to convert text to speech through the Java API. According to Amazon's voice list, several US English voices support the Neural engine: <https://docs.aws.amazon.com/polly/latest/dg/voicelist.html>

This is the code I am running in my Java application:

package com.amazonaws.demos.polly;

import java.io.IOException;
import java.io.InputStream;

import com.amazonaws.ClientConfiguration;
import com.amazonaws.auth.DefaultAWSCredentialsProviderChain;
import com.amazonaws.regions.Region;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.polly.AmazonPollyClient;
import com.amazonaws.services.polly.model.DescribeVoicesRequest;
import com.amazonaws.services.polly.model.DescribeVoicesResult;
import com.amazonaws.services.polly.model.OutputFormat;
import com.amazonaws.services.polly.model.SynthesizeSpeechRequest;
import com.amazonaws.services.polly.model.SynthesizeSpeechResult;
import com.amazonaws.services.polly.model.Voice;

import javazoom.jl.player.advanced.AdvancedPlayer;
import javazoom.jl.player.advanced.PlaybackEvent;
import javazoom.jl.player.advanced.PlaybackListener;

public class PollyDemo {

    private final AmazonPollyClient polly;
    private final Voice voice;
    private static final String JOANNA = "Joanna";
    private static final String KENDRA = "Kendra";
    private static final String MATTHEW = "Matthew";
    private static final String SAMPLE = "Congratulations. You have successfully built this working demo of Amazon Polly in Java. Have fun building voice enabled apps with Amazon Polly (that's me!), and always look at the AWS website for tips and tricks on using Amazon Polly and other great services from AWS";

    public PollyDemo(Region region) {
        // create an Amazon Polly client in a specific region
        polly = new AmazonPollyClient(new DefaultAWSCredentialsProviderChain(),
                new ClientConfiguration());
        polly.setRegion(region);

        // Create describe voices request.
        DescribeVoicesRequest describeVoicesRequest = new DescribeVoicesRequest();

        // Synchronously ask Amazon Polly to describe available TTS voices.
        DescribeVoicesResult describeVoicesResult = polly.describeVoices(describeVoicesRequest);
        // voice = describeVoicesResult.getVoices().get(0);
        voice = describeVoicesResult.getVoices().stream().filter(p -> p.getName().equals(MATTHEW)).findFirst().get();
    }

    public InputStream synthesize(String text, OutputFormat format) throws IOException {
        SynthesizeSpeechRequest synthReq =
                new SynthesizeSpeechRequest().withText(text).withVoiceId(voice.getId())
                        .withOutputFormat(format);
        SynthesizeSpeechResult synthRes = polly.synthesizeSpeech(synthReq);

        return synthRes.getAudioStream();
    }

    public static void main(String args[]) throws Exception {
        // create the test class
        PollyDemo helloWorld = new PollyDemo(Region.getRegion(Regions.US_WEST_1));
        // get the audio stream
        InputStream speechStream = helloWorld.synthesize(SAMPLE, OutputFormat.Mp3);

        // create an MP3 player
        AdvancedPlayer player = new AdvancedPlayer(speechStream,
                javazoom.jl.player.FactoryRegistry.systemRegistry().createAudioDevice());

        player.setPlayBackListener(new PlaybackListener() {
            @Override
            public void playbackStarted(PlaybackEvent evt) {
                System.out.println("Playback started");
                System.out.println(SAMPLE);
            }

            @Override
            public void playbackFinished(PlaybackEvent evt) {
                System.out.println("Playback finished");
            }
        });

        // play it!
        player.play();
    }
}

By default it uses the Standard engine for the Matthew voice. What do I need to change to get Neural speech from the Matthew voice?

Thanks

Answer 1

Score: 2

Thanks @ASR for your feedback.

I was able to find the engine parameter as you suggested.

Here is how I solved it:

  1. Update the aws-java-sdk-polly version in pom.xml from 1.11.77 (the version used in their documentation) to 1.11.762 (the latest at the time) and rebuild the Maven project. This brings in the current definition of the SynthesizeSpeechRequest class. With 1.11.77 I could not find the withEngine method in its definition.
<dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>aws-java-sdk-polly</artifactId>
    <version>1.11.762</version>
</dependency>
  2. Add withEngine("neural") to the request:
SynthesizeSpeechRequest synthReq =
        new SynthesizeSpeechRequest().withText(text).withVoiceId(voice.getId())
                .withOutputFormat(format).withEngine("neural");
  3. As described at https://docs.aws.amazon.com/polly/latest/dg/NTTS-main.html, neural voices are only available in specific regions, so I changed the region from US_WEST_1 to US_WEST_2:
PollyDemo helloWorld = new PollyDemo(Region.getRegion(Regions.US_WEST_2));

After these changes, the neural voice worked perfectly.
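
One possible refinement of the steps above, offered as a sketch rather than as part of the original answer: instead of hard-coding "neural", the engine could be chosen from what the voice reports as supported, with a fallback to "standard". The example below is plain Java with no AWS dependency. VoiceInfo and EnginePicker are hypothetical stand-ins invented for illustration; in real code the names and supported engines would come from the SDK's describeVoices result (I am assuming the Voice class in recent 1.11.x releases exposes its supported engines). It also avoids the bare findFirst().get() from the question, which throws if the voice is not found.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical stand-in for the SDK's Voice class: a name plus the engines it supports. */
final class VoiceInfo {
    final String name;
    final List<String> supportedEngines;

    VoiceInfo(String name, List<String> supportedEngines) {
        this.name = name;
        this.supportedEngines = supportedEngines;
    }
}

/** Hypothetical helper that decides which engine string to pass to withEngine(...). */
final class EnginePicker {
    private EnginePicker() {}

    /** Looks a voice up by name and returns an empty Optional when it is missing. */
    static Optional<VoiceInfo> findVoice(List<VoiceInfo> voices, String name) {
        return voices.stream().filter(v -> v.name.equals(name)).findFirst();
    }

    /**
     * Returns "neural" when the named voice lists it as supported, otherwise "standard".
     * Throws IllegalArgumentException when the voice is not in the list at all.
     */
    static String pickEngine(List<VoiceInfo> voices, String name) {
        VoiceInfo voice = findVoice(voices, name)
                .orElseThrow(() -> new IllegalArgumentException("No such voice: " + name));
        return voice.supportedEngines.contains("neural") ? "neural" : "standard";
    }
}
```

The returned string could then be passed to withEngine(...) in place of the literal. Falling back silently is a design choice; throwing instead would surface the problem the question describes, where Standard speech is produced without any warning.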

Answer 2

Score: 0

I am assuming you are using AWS Java SDK 1.11.

The AWS documentation states that you need to set the engine parameter of the speech synthesis request to neural. The AWS Java SDK documentation describes the withEngine method, which you can use to set it.

PS: The documentation page doesn't seem to provide per-method URLs, so you will have to search the page for withEngine.
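
If it is unclear whether the SDK version on the classpath already includes withEngine, a small reflection check can answer that without opening the documentation. The sketch below uses only the Java standard library; SdkMethodCheck and hasPublicMethod are hypothetical names made up for this example and are not part of the AWS SDK.

```java
import java.lang.reflect.Method;
import java.util.Arrays;

/** Hypothetical diagnostic helper: checks whether a class exposes a public method by name. */
final class SdkMethodCheck {
    private SdkMethodCheck() {}

    /** True if the class declares or inherits at least one public method with this name. */
    static boolean hasPublicMethod(Class<?> type, String methodName) {
        return Arrays.stream(type.getMethods())
                .map(Method::getName)
                .anyMatch(methodName::equals);
    }

    /** Same check by class name; returns false when the class cannot be loaded. */
    static boolean hasPublicMethod(String className, String methodName) {
        try {
            // initialize=false: load the class without running its static initializers
            Class<?> type = Class.forName(className, false, SdkMethodCheck.class.getClassLoader());
            return hasPublicMethod(type, methodName);
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }
}
```

With the Polly SDK on the classpath, calling SdkMethodCheck.hasPublicMethod("com.amazonaws.services.polly.model.SynthesizeSpeechRequest", "withEngine") should return false for an SDK version that lacks the method, which is the symptom Answer 1 reports for 1.11.77.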

huangapple
  • Posted on April 10, 2020, 06:42:23
  • When reposting, please keep the link to this article: https://go.coder-hub.com/61131383.html