Watson Speech to Text Integration
Available since version 3.3.0
This feature is not AEM as a Cloud Service compatible, and can only be used on AEM 6.5.
This feature is AEM 6.2 ONLY!
Purpose
Use IBM’s Watson Speech to Text transcription service to extract transcriptions from Assets.
How to Use
When a video or audio file is run through the TranscriptionProcess workflow process, it first generates a FLAC file containing just the audio, then passes that file to the Watson web service and gets a Job ID back. The workflow process then polls until the job is complete and finally saves the result of the process into a rendition of the asset named transcription.txt.
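The polling step described above can be sketched roughly as follows. This is only an illustration of the submit-then-poll pattern, not the actual TranscriptionProcess implementation: the function name wait_for_transcription, the fetch_result callable, and the interval and attempt limits are all hypothetical.

```python
import time
from typing import Callable, Optional


def wait_for_transcription(
    job_id: str,
    fetch_result: Callable[[str], Optional[str]],
    interval_seconds: float = 5.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the job yields a transcript, then return it.

    fetch_result is a hypothetical callable standing in for the call to the
    transcription service: it returns None while the job is still running
    and the transcript text once the job is complete.
    """
    for _ in range(max_attempts):
        result = fetch_result(job_id)
        if result is not None:
            return result
        # Job not finished yet: wait before asking again.
        sleep(interval_seconds)
    raise TimeoutError(
        f"job {job_id} did not complete after {max_attempts} attempts"
    )
```

Passing the sleep function as a parameter keeps the loop easy to exercise without real delays; bounding the number of attempts avoids a workflow step that waits forever on a job that never finishes.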
To use this, first create a configuration for com.adobe.acs.commons.http.impl.HttpClientFactoryImpl (with some unique name) with the factory.name property set to watson-speech-to-text, along with the credentials and hostname for Watson.
Note: This OSGi configuration factory is named "ACS AEM Commons - Http Components Fluent Executor Factory" in /system/console/config.
For example, on the hosted BlueMix platform, this might look like:
Then, add the Transcription Process to a workflow (either the DAM Update Asset Workflow or a separate workflow).
Watson automatically chunks the audio based on pauses and attaches a timecode to each chunk. These timecodes are used in the transcription.txt rendition like this:
[0.66s]: hello hello hello
[3.27s]: hello hello hello hello
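If the rendition needs to be consumed by other code, the timecoded lines shown above can be split into a start time and the chunk text. The following is a minimal sketch that assumes every line follows the "[<seconds>s]: <text>" pattern from the example; the name parse_transcription is hypothetical and not part of ACS AEM Commons.

```python
import re
from typing import List, Tuple

# Matches lines such as "[0.66s]: hello hello hello".
LINE_PATTERN = re.compile(r"^\[(\d+(?:\.\d+)?)s\]:\s*(.*)$")


def parse_transcription(text: str) -> List[Tuple[float, str]]:
    """Return (start_seconds, chunk_text) pairs for each timecoded line."""
    chunks = []
    for line in text.splitlines():
        match = LINE_PATTERN.match(line.strip())
        if match:  # Lines that do not fit the pattern are skipped.
            chunks.append((float(match.group(1)), match.group(2)))
    return chunks


sample = "[0.66s]: hello hello hello\n[3.27s]: hello hello hello hello"
print(parse_transcription(sample))
# prints [(0.66, 'hello hello hello'), (3.27, 'hello hello hello hello')]
```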
FFmpeg Configuration
In order for this process to work, FFmpeg must be installed and must be on your path. See the AEM documentation for general instructions for installing FFmpeg.
FFmpeg must also be capable of creating FLAC files. To confirm this, you can run ffmpeg -formats and check that flac is listed like this:
DE flac raw FLAC
You can also go to the flacmono Video Profile page in AEM (e.g. http://localhost:4502/etc/dam/video/flacmono.html) and upload a test file.
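The ffmpeg -formats check described above can also be scripted, for example as part of an environment health check. The sketch below assumes the listing puts the capability flags in the first column and the format name in the second, as in the "DE flac raw FLAC" line; the function names are hypothetical.

```python
import subprocess


def can_encode_flac(formats_output: str) -> bool:
    """Return True if the listing shows flac with the E (muxing) flag."""
    for line in formats_output.splitlines():
        parts = line.split()
        # Format rows look like "DE flac raw FLAC": flags, name, description.
        if len(parts) >= 2 and parts[1] == "flac":
            return "E" in parts[0]
    return False


def ffmpeg_can_encode_flac(ffmpeg: str = "ffmpeg") -> bool:
    """Run `ffmpeg -formats` and inspect its output.

    Requires ffmpeg to be on the PATH (or the full path passed in).
    """
    completed = subprocess.run(
        [ffmpeg, "-formats"],
        capture_output=True,
        text=True,
        check=True,
    )
    return can_encode_flac(completed.stdout)
```

Keeping the parsing separate from the subprocess call means the check can be tried against saved output from a server where FFmpeg cannot be run directly.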