Transcription is the process of converting spoken words into written form. It is crucial for documentation in legal, medical, academic, and other professional settings, so these users need to capture speech and transcribe it into written form as efficiently as possible. Many software programs perform transcription, but the best choice for a particular task depends on the user’s specific needs.
So let’s take a look at the solutions currently available on the market. There are many vendors and platforms, each with its own characteristics, pros, and cons. Automatic transcription software is built on artificial intelligence, and specifically on automatic speech recognition techniques. Vendors therefore compete on customer experience, cost, accuracy, and compatibility with other tools.
Main Features of Transcription Software
Below is an overview of the main capabilities of transcription software. Keep these points in mind while walking through the solutions that follow:
- Ability to process audio/video files that the user provides and generate the transcripts asynchronously. The transcripts are available after some processing time, which varies with the length of the audio.
- Ability to generate live captions and real-time transcripts from streaming audio. This usually applies to live events such as business meetings, interviews, legal trials, and video streams, where the transcription software processes the streaming audio and generates the transcripts immediately.
- Ability to provide the highest level of accuracy. This depends on many factors, including the background noise, the quality of the audio, the subject matter of the speech, and more.
- Ability to provide multilingual support: not only transcription of English speech, but of as many languages as possible, including accents and dialects.
Verbatim vs Non-Verbatim Transcriptions
Another important aspect regarding Transcriptions is whether they are Verbatim or Non-Verbatim (or Clean-Verbatim).
Verbatim Transcriptions are “word for word”: they document all speech and utterances, so everything from the original source is included in the transcript.
In this way, the transcription exactly reflects how people speak. For example, stuttering is recorded, as well as repetitions, background noises, coughing, and anything else.
Scripted media, news outlets, and court trials usually adopt verbatim transcripts because they require the exact speech to be documented.
On the other side, Non-Verbatim Transcription, also called a cleaned-up transcript, cuts all superfluous speech to provide a more readable transcript while keeping the original meaning and structure.
Business and university audiences are the typical consumers of cleaned transcriptions (and they often also need advanced text interpretation features that synthesize and summarize large content by extracting only the meaningful parts; but this is another segment of AI).
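To make the difference concrete, here is a minimal sketch of how a verbatim transcript could be turned into a clean one. It is only an illustration: the filler list and the `clean_verbatim` function are hypothetical, and real services apply far richer rules (a naive rule like this one would also drop a legitimate repetition such as “that that”).

```python
import re

# Hypothetical filler list for illustration; real services use richer rules.
FILLERS = {"um", "uh", "er", "erm", "hmm"}


def _normalize(token: str) -> str:
    """Lowercase a token and strip punctuation so "Um," matches "um"."""
    return re.sub(r"[^\w']", "", token).lower()


def clean_verbatim(text: str) -> str:
    """Drop filler words and immediate word repetitions from a verbatim transcript."""
    cleaned = []
    for token in text.split():
        key = _normalize(token)
        if key in FILLERS:
            continue  # filler word: not part of the clean transcript
        if cleaned and _normalize(cleaned[-1]) == key:
            continue  # stutter or repeated word: keep only the first occurrence
        cleaned.append(token)
    return " ".join(cleaned)


print(clean_verbatim("Um, I I think we we should uh start the the meeting now."))
```

The verbatim version keeps every “um” and repeated word, while the cleaned version reads like ordinary written text with the same meaning.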
One Transcriber is a new transcription solution whose unique value proposition is real-time automatic transcription and captioning of live streaming events. Almost no one offers this as a universally applicable feature: live captioning is generally provided by collaboration and communication suites (like Microsoft Teams), but only within their own context (i.e. you can enable live captions only during meetings held within MS Teams itself, and often under some organizational restrictions).
One Transcriber runs as PC software, currently available for Windows, that you download (it’s free) and install locally. The installation file is small, about 26 MB, so it is quick to download and launch.
The locally installed software works together with a server component, so it requires an internet connection (as most transcription tools do).
You need to select two options: the transcription language and the audio device that is used for the audio/video play. Languages available are: English US, English UK, French, German, Italian, Portuguese, Spanish, Japanese, Chinese Simplified, and Korean.
The distinctive feature of One Transcriber is that you select the PC output device to use as the audio source: the audio to transcribe is captured directly from the output speakers, unlike almost all other solutions, which are essentially dictation software capturing audio from the microphone. When launched, the software automatically finds all output devices active on the PC (the integrated speakers, Jabra or other headsets, etc.), and you select the one you will use during the live event or the audio/video playback.
Hence, this feature makes One Transcriber compatible with any other software or application that plays audio.
For example, suppose you want live captions and a real-time transcription of a business meeting or a conference call. While the software you’re using (e.g. Microsoft Teams, Zoom, Google Meet, Webex, Skype, or a browser for web meetings) plays the audio through the output speakers (that is, the speech from all participants in the call), you launch One Transcriber, select the same output device and the language, and get the transcription and live captions in real time. Each finalized sentence is saved with its timestamp and appended progressively. At the end, you get the full transcript of the conversation with all timestamps, including any corrections you applied manually during the live transcription.
One Transcriber is free to download and launch: you only need to register with a valid email, and no credit card is required.
A free trial is offered, with some minutes of free transcription for one week.
Three pricing plans are offered, which are quite affordable compared to competitors:
- Small – 24.99€/Month with 500 minutes of transcription
- Medium – 39.99€/Month with 1000 minutes of transcription
- Top – 74.99€/Month with 2000 minutes of transcription
The transcription minutes are renewed every month and do not carry over. The larger plans also cost less per minute: about 0.040€/minute for the Medium plan and 0.037€/minute for the Top plan, compared with about 0.050€/minute for the Small plan.
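As a quick check, the per-minute cost of each plan can be recomputed from the listed prices. This is a small sketch that assumes the full monthly quota is used; the `cost_per_minute` helper is just an illustrative name.

```python
# Plan prices (EUR per month) and included minutes, as listed above.
PLANS = {
    "Small": (24.99, 500),
    "Medium": (39.99, 1000),
    "Top": (74.99, 2000),
}


def cost_per_minute(price_eur: float, minutes: int) -> float:
    """Effective price of one included minute, assuming the full quota is used."""
    return price_eur / minutes


for name, (price, minutes) in PLANS.items():
    print(f"{name}: {cost_per_minute(price, minutes):.3f} EUR/min")
```

If you use fewer minutes than the plan includes, the effective cost per transcribed minute is of course higher.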
Trint is an online platform that provides a collaboration-tool-like user experience for transcription services, and it offers three pricing solutions: Individual, Teams, and Enterprise.
With Trint, you create video or audio transcripts by uploading your media file (from the local PC or from an online drive) and waiting for the transcription process to complete and give back the transcripts with start/end timestamps.
Furthermore, you can then create “Stories”, which are containers where you can assemble and arrange, like LEGO building blocks, your own written text and pieces of audio/video transcripts.
So the Story is the core of your work. Here you can combine different parts of videos with the transcriptions you generated in the previous step, and adjust the transcripts by manually correcting any speech recognition mistakes; you can then help Trint remember the correction by using the “add word to vocabulary” feature.
In addition, within the Story, you can translate the transcript (after it has been generated) into many languages, as well as create captions by “connecting” the transcripts to the video.
Prices range from 44 to 60 € per month for the Individual and Teams options. The Enterprise level requires custom pricing that is negotiated offline.
A free trial is available for one week.
In conclusion, Trint is a great platform, mostly for users who need to create, compose, and arrange videos with captions (Stories). The transcriptions are generated from the video or audio files that the user uploads to the Trint online platform itself.
HappyScribe offers both Automatic Transcriptions and Manual Transcriptions done by native Transcriptionists, for those who need the highest level of accuracy.
The Automatic Transcription, based on an AI algorithm, offers two choices: Transcriptions and Subtitles. As with most competitors, the user uploads an audio/video file, which is processed after some time (depending of course on the length of the audio).
You can create a personal vocabulary with a list of specific words (e.g. acronyms, proper nouns) that the tool uses during processing to improve accuracy.
Pricing is based on purchased hours, in a pay-as-you-go model. The base cost is 12€ per hour, with a discount depending on the number of hours you buy: 11€/hour if you buy more than 25 hours, and 9.8€/hour from 50 to 75 hours (for more than 75 hours, custom pricing must be requested from the sales team).
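The tiered rates can be sketched as a small lookup. Note the assumptions: the tiers as described do not say which rate applies at exactly 25 hours, so the boundary handling below is a guess, and the function names are illustrative only.

```python
from typing import Optional


def hourly_rate(hours: float) -> Optional[float]:
    """Return the EUR/hour rate for a purchase, or None when custom pricing applies.

    Boundary handling is an assumption: the description does not say which
    rate applies at exactly 25 hours.
    """
    if hours > 75:
        return None  # custom quote from the sales team
    if hours >= 50:
        return 9.8
    if hours > 25:
        return 11.0
    return 12.0


def total_cost(hours: float) -> Optional[float]:
    """Total purchase cost in EUR, or None when a custom quote is needed."""
    rate = hourly_rate(hours)
    return None if rate is None else round(rate * hours, 2)


for h in (10, 30, 60, 80):
    print(h, "hours ->", total_cost(h))
```

With these assumptions, buying 60 hours at once is noticeably cheaper per hour than buying 10 hours six times.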
Several languages are available for transcription, meaning the spoken language to be transcribed (not translated). For automatic transcription, the declared accuracy is 85%, but in practice it varies with many factors, including the audio quality, the context and subject of the speech, the speakers’ accents, and how many people are speaking.
Another useful feature is the integration, via Zapier, with other applications and online tools, like Gmail, OneDrive, AWS S3, and more, which lets you load media files from other online platforms.
In conclusion, HappyScribe is a good tool for transcribing uploaded audio/video files. It doesn’t offer streaming or real-time event transcription (which is a rare capability), but it does let you choose a more professional service through manual transcriptionists in case the automatic AI-based solution doesn’t meet your needs.
Descript is a software suite that offers four categories of features: Podcast Editor, Video Editor, Screen/Webcam Recording, and Transcription. Here we focus on the last one, although you can combine the four areas based on your needs, e.g. by creating videos, captured via screen recording, that are merged with the transcription of the audio.
Descript provides both an automatic transcription service, based as usual on AI tools, and manual transcriptions through professional native transcriptionists.
The Automatic Transcription is provided in batch mode: for instance, you upload a video/audio file and you get the transcripts, with quite fast turnaround times, even if we know that it could depend on the length of the audio/video file to transcribe. There is no Real-Time or live captioning.
22 transcription languages are supported, and a useful feature is the ability to add speaker labels to each part of the transcript with the help of automatic detection.
Collaboration and Integration features are available, like saving transcriptions documents in other cloud locations, sharing the work with other users, or importing data from external sources.
As said, Descript provides a PC software suite, so you need to download and install it on either Windows or macOS (El Capitan); consider also that the installation file is quite large (more than 150 MB). The UI is nice and well organized, giving you access to the different features and to the files you need to upload or work on.
Descript offers a free trial with 3 hours of transcription. Then there are two paid profiles: “Creator”, at 15€/month with 10 base hours included, and “Pro”, at 30€/month with 30 base hours included. If needed, you can add more hours to the base at an additional cost of 2€ per hour; e.g., if you select the Pro plan with 30 base hours at 30€ per month, you can add 10 more hours for 20€ more, and the final plan will be 50€/month with 40 hours. Finally, you can choose annual billing instead of the monthly plan and save 20% on the price.
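The add-on arithmetic can be sketched as follows. One assumption is made explicit in the code: the description does not say whether the 20% annual saving also applies to add-on hours, so this sketch applies it to the whole monthly amount. The function names are illustrative.

```python
# Figures taken from the plan description above (EUR).
PLANS = {"Creator": (15.0, 10), "Pro": (30.0, 30)}  # (monthly price, base hours)
EXTRA_HOUR_EUR = 2.0
ANNUAL_DISCOUNT = 0.20


def monthly_cost(plan: str, extra_hours: int = 0, annual: bool = False) -> float:
    """Monthly cost for a plan plus add-on hours.

    Assumption: the 20% annual saving applies to the whole monthly amount,
    add-on hours included; the description does not specify this.
    """
    base_price, _ = PLANS[plan]
    cost = base_price + extra_hours * EXTRA_HOUR_EUR
    if annual:
        cost *= 1 - ANNUAL_DISCOUNT
    return round(cost, 2)


def total_hours(plan: str, extra_hours: int = 0) -> int:
    """Base hours of the plan plus any add-on hours."""
    return PLANS[plan][1] + extra_hours


print(monthly_cost("Pro", extra_hours=10), total_hours("Pro", extra_hours=10))
```

This reproduces the example of the Pro plan with 10 extra hours, and lets you compare it with the same setup under annual billing.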
Otter is a SaaS platform (like many others), which means that the functionality and the work are mainly executed online, even if Chrome extensions are necessary for some features. The user experience has recently been redesigned, and new collaboration features have been introduced.
Automatic Transcriptions are available in two basic ways:
- transcription of audio/video files uploaded by the user. You can define custom vocabularies to improve transcription accuracy for your own needs.
- recording your voice from the microphone, by enabling the browser (e.g. Chrome) to access your device, which is substantially a dictation feature.
As a result, real-time transcription of live meetings also happens by capturing the audio through the microphone, which is not a sophisticated approach. Otter’s help guide suggests how to do it: either via the browser (Chrome or Firefox, which can capture the microphone) where you are logged in to Otter, opened on the same PC where you run the meeting, or via Otter’s mobile app by “putting your smartphone close to the PC speakers where the meeting is held”.
In all honesty, this approach seems quite rough. You could use any free dictation software, place it close to the speakers, and get the transcription. Accuracy is affected by noise, volume, and physical setup, and you get no separation of audio channels and no easy speaker identification.
By the way, with the Pro subscription you can also record meetings and calls and transcribe them later.
So, in order to get “canonical” real-time transcriptions, you need to integrate Otter with other platforms and get the transcript of the meetings held in those systems. Two integrations are currently available:
- Integration with Zoom: the Business plan is required. Both the “native” Zoom integration and the “Otter’s assistant” solution (where an Otter user dials into the meeting as a separate participant) allow you to get real-time transcripts, but only if you’re the host, not if you’re a participant. This is yet another limitation.
- Integration with Google Meet: you download and install the Otter.ai Chrome extension to integrate with Google Meet meetings, and by enabling the microphone recording button you get the transcription. It’s also available in the free tier.
Quick note: for the Calendar management, it’s possible to integrate with Google Calendar and Gmail and import data from there.
The free tier offers 600 minutes of dictation transcriptions via the input microphone per month, and 3 videos/audio files to upload and transcribe.
Pro is 8.33$ per month, and offers more features still for individual use, like custom vocabulary, video/call recordings, and more file transcriptions.
The business plan is 20$ per month and includes the Zoom integration plus more collaboration features for Teams and working groups.
Finally, there’s also a custom plan for enterprise, where the price and features are defined ad-hoc.
Verbit is a large, well-known transcription platform mostly oriented to the B2B world, including large enterprises in specific industry segments, like Legal and Court Depositions, Media, Education, and Online Learning.
Verbit leverages a huge network of native professional transcriptionists, so its primary business targets companies that need contextualized accuracy and human-supervised, reviewed transcriptions (the legal business being a prime example).
Here we don’t dive into the transcriptionist service; let’s look instead at the automatic transcription services offered.
First of all, the services (including the automatic, software-based ones) are only available to business users: you need to provide a business email so that the sales team can contact you to discuss the services you need and the price.
This makes Verbit less accessible, since it targets businesses only rather than the mass market.
In short, the main features are:
- Automatic Transcriptions of Audio/Video files that are uploaded by the Users.
- Automatic real-time transcription with Zoom, based on a specific ad-hoc integration. The limitations we’ve seen in other cases apply here too, such as host vs participant restrictions.
- Video captioning, again by integrating with Zoom, LMS, and video hosting platforms.
There is no standard, fixed-price, subscription, or pay-as-you-go plan available upfront. Once you provide your business contact details, a Verbit team member reaches out to enable the desired services and agree on the price.
Sonix provides automated transcriptions in what we call “batch mode”. That means, as usual, that you upload the audio/video file and get the transcription in minutes, depending of course on the length of the file.
For meetings and conference calls, Sonix states that it provides transcriptions for almost all meeting platforms (like MS Teams, Zoom, Skype, etc.), but the process still consists of uploading the recording of the meeting (as with any other audio file) and then getting the transcription after the usual processing time.
Currently, there is no real-time transcription capability for live events, although Sonix says it is coming soon; if you’re interested in trying it as a beta user, you can leave your (company) contact details to be reached offline for more information.
There are many transcription languages, about 30 or more, and the automatic translation service is also available on the transcriptions previously generated.
Another feature is the Subtitling, which you can achieve by working on merging the videos/audio files and the transcripts, and adjusting where needed.
30 Minutes of Free Transcription are available.
With the “Standard” plan, you get a pay-as-you-go price of 10$ per Hour. The “Premium” offers 5$ per Hour plus 22$ per user monthly, and additional features are available like custom dictionaries and others. Finally, the custom plan is for enterprise and dedicated services, volumes, and features.
Rev.com is among the largest (if not the largest) online transcription service companies. The services target individuals and enterprises, the latter with dedicated pricing and contracts. They are grouped into:
- Human Transcriptions, done by professionals with the highest level of accuracy (a huge network of more than 70000 transcriptionists).
- Automatic Transcription, through audio/video file upload (or alternatively a URL or a YouTube link) and batch transcription processing. The declared turnaround time is 5 minutes, but it obviously increases with the audio length.
- On-screen caption generation for audio/video, available only in English and done by human transcribers.
- On-screen subtitle translation, provided by human professionals.
- Live Captions for Zoom Meetings: you need to download and install the Rev Live Captions application, which integrates with the Zoom desktop application. Note: you need to have a paid or enterprise/business account with Zoom. This is an important constraint, and many enterprises will probably not accept it because the existing company account will be transferred and assigned to another Zoom account. Sensitive and confidential information from meetings cannot be shared outside the company, so this could be a hurdle for employees of many companies.
Let’s focus on the Automatic Transcription Services Prices:
- Automatic Transcriptions Price is 0.25$ / Minute (which is really expensive). The first 45 minutes are free.
- Zoom Live Captioning Price is 20$ / Month per User. But consider all the potential limitations highlighted above related to authorization and business Zoom accounts. However, a 7-day trial is available.
Temi is simple and easy to use: the core service is the transcription of Audio/Video files of any type that the user uploads online. The transcripts are ready in minutes and delivered to the email provided by the user. Turnaround times are from 5 minutes and increase with the size of the file.
A few other functionalities are available: speaker identification, transcript editing, media playback speed adjustment, and timestamp customization. That is not many features, but interestingly the AI algorithm is proprietary, which is certainly a valuable asset for the company. However, the transcription accuracy depends heavily on the quality of the source audio; this is true in general, but some very well-trained algorithms perform well even when the audio quality is less than ideal.
A free trial is available for 45 minutes and one single file. The paid model is fully pay-as-you-go, so no subscription plan is required.
The cost is 0.25$ per minute of transcription, with no volume discounts and no monthly or annual payments or credits to manage, which could be a strong point and a good fit for many users.
Transcribe by Wreally offers the following Transcription methods:
- Uploading media files, so that you get transcription automatically after some turnaround time depending on the size of the file. Accuracy also depends on the quality of the audio. The processing time, however, is less than the length of the media (and this is valid for all other solutions).
- Dictation through your microphone. In this case, you can also play an audio file, slow it down if you need it, and dictate what you hear to get the transcription. It’s basically a manual way to transcribe an audio file by re-dictating it.
- Manual processing of a video or audio file to produce the transcription step by step. You can play the audio, pause it, slow it down, and auto-loop it, while using the text editor to write the transcription and edit the text, timestamps, and more.
- Integration with foot-pedal devices. These connect via USB to your PC, and by installing a Chrome extension you can integrate them with Wreally’s Transcribe. In short, the foot pedal lets you control the playback speed of the media file that you imported for transcription.
- Transcribing meetings from Zoom or MS Teams is possible by uploading the recordings, therefore no real-time capability is provided.
Two pricing options are available:
- Self Transcription at 20$ / year, which is really cheap, but the core feature is only dictation, which is quite common and already available for free elsewhere (even in Windows). A 1-week free trial is available.
- Automatic Transcriptions at 20$ / year plus 6$ / hour of transcription. 1 minute is free as a preview, which seems quite low for a trial.
In conclusion, the capabilities and functionalities offered don’t seem distinctive and the user experience is also not superior. Turnaround times are not the best in the market and dictation is actually something that is largely provided for free by many others.
SpeechText.ai provides Video/Audio File batch transcription with variable but generally short turnaround times.
The platform relies on the open-source LibriSpeech dataset, as also stated on the website, which consists of a large amount of English training speech (English being the most accurately transcribed language). It claims greater accuracy than other proprietary speech-to-text services (like AWS, Google, Microsoft), but no data or facts are provided to confirm that.
A few predefined categories of models are available that you can select upfront to improve the accuracy and adherence of the transcript to the specific industry or domain of the speech. You can choose among many transcription languages when uploading the file, but the accuracy, as in many other cases, is higher for English.
Subtitle generation and speaker identification can be applied to the uploaded file. As with other vendors, the accuracy depends strongly on the number of speakers and the quality of the audio.
After processing, you can export and save the transcript, as well as edit it as you prefer.
There are currently four subscription plans, from Starter at 10$/month, through Personal (19$/month) and Standard (49$/month, the most promoted), to Business at 99$/month; the main variables are the number of transcription minutes and the total size of the files you can upload. The Standard plan offers 990 minutes, which is a bit expensive compared to similar solutions (at 49$/month, it’s about 0.0495$ per minute, higher than the roughly 0.037€ per minute of One Transcriber’s Top plan). Finally, the feature set varies across plans; for example, domain selection is available only from the Personal plan upward.
oTranscribe is not actually automatic transcription software. It’s a free online text tool that you can use to manually write the transcription of audio files or YouTube videos while you listen.
That is, you upload an audio or video file in the web interface, or paste a YouTube video link, and play it in the browser UI. You get nothing more than an editor where you can manually write the transcription while listening.
You can pause, fast-forward, or rewind, and the text can be edited and formatted as you prefer. You can save your work and access the history of the documents you previously worked on.
It’s free, and no payment is requested.
Colibri’s solution focuses on meeting transcription and provides real-time captions, which is an uncommon feature. It integrates with several native business meeting applications like MS Teams, Zoom, and others, as well as with Chrome for web meetings attended via browser.
Colibri does not provide one single solution or software: the automatic transcription is handled differently depending on the application and platform used for the meeting. This affects the user experience, which is not uniform across scenarios.
For Zoom meetings, you’ll need to use a dedicated Colibri App from the Zoom Marketplace. It will connect and integrate with the Zoom app, and you get the meeting captions in real-time. We can say that it covers a sort of gap in Zoom, which could also directly provide such capability.
For other Windows-native applications, like Microsoft Teams, Webex, Google Meet, BlueJeans, etc., you need to provide the dial-in instructions: meeting ID or code, URL or phone number to connect to, participant names, etc. Colibri then dials into the meeting from a fixed US phone number and appears in the participant list as an additional attendee joining by phone; of course, the organizer of the meeting must admit it. This can be a limitation because in many cases the organizer will reject any non-authorized user trying to join a meeting in “shadow mode”, especially in business contexts. Once admitted, a small popup browser window displays the live captions.
The third scenario is web-based meetings, which you join via browser (only Google Chrome seems to be supported). In this case, you select the Chrome tab where the meeting or the audio is played, authorize Colibri to access your microphone and to record the audio played in the tab, click on “live transcription”, and get a new popup Chrome window with the real-time transcripts. Afterward, you can save and edit the transcripts by entering the speaker for each portion (even though you might have two speakers in the same sentence) or making any other correction.
Finally, the fourth scenario is the classic audio/video file upload, where you get the transcription of a previously recorded audio in the usual batch mode. The Calls recording functionality is available to record any audio from Calls or Podcast or similar, which you can then use for such batch-transcription.
One limitation: only English is available as a transcription language across the different real-time or file-based scenarios.
Free Trial is available with 40 minutes of call recording (done via the web online or other tools) and 300 minutes of Transcription (with any scenario).
The paid subscriptions are: “Starter” at 16$/month with 20 hours of transcription and 90 minutes of recordings; “Pro” at 40$/month with 100 hours of transcription and 4 hours of recordings. The cost per minute of transcription is competitive, since 100 hours of transcription is definitely a lot for a single user (that would mean a person holding 6000 minutes of meetings in a month, or about 4.5 hours per working day assuming roughly 22 working days… hard even for call center agents :)).
Colibri is certainly very interesting because it offers real-time live captioning, which competitors rarely provide. Only One Transcriber offers it too.
However, compared to One Transcriber, the user experience is less simple: depending on the meeting platform, the user has to choose a different solution.
Additionally, apart from Zoom, which requires installing a dedicated component, real-time meeting transcription is handled by having the external “Colibri participant” dial into the meeting room of each platform via a phone number, which is not applicable in many cases (since a “spy transcriber” probably won’t be accepted by others in business contexts).
On the other side, One Transcriber works seamlessly and independently of the application used, whether a Windows-native app, a browser, or a video/media player (e.g. VLC).
GoTranscript is a solution for those looking for transcriptions produced by professional transcriptionists: there is no automatic transcription via online service or software, only human-based work.
In short, the transcription service works in three steps: first, you upload the file (audio or video) that you want transcribed; second, you pay directly online; third, you receive by email the transcription done by the human transcriber, with a turnaround time of 6 hours for fast projects (it could be longer for long audio or non-critical projects).
The cost is 0.73€/minute of transcription. About 50 transcription languages are available.
Streamr by VidToon (provided by Atlas Web Solutions) lets you transcribe audio/video files or a YouTube video via its URL.
You need to download the desktop app and authenticate with your account.
From the app, you can create projects to work with video and audio files. You can either load files from your local PC or search directly for YouTube videos, which you then have to download to your PC (the app provides the download functionality, where you can also select the resolution). Whether you download the video from YouTube or use a local file, you need to specify the source language of the video.
At this point, you can run the video transcription (again, on the local file, not in real time) and get the captions after some turnaround time. You can work on the video to edit or adjust the script and generate subtitles synchronized with the video’s timestamps. Once you have generated the transcript, you can also translate the video; translation is not applied directly to the video, so you first need to generate the transcript, and the translation is then produced from the transcript itself.
The transcription algorithm adopted is the Google Speech-to-Text API, which is also used for the translation. So, it basically integrate the Google Cloud APIs to get the job done, no proprietary language (and this is also common, very few are the well-trained proprietary machine learning algorithms for speech to text). Given the Google API integration, also the available languages are many, for both transcription and translation.
And another big thing: you need a Google Cloud Account to use this software, obviously because it uses the Google APIs as AI algorithm for transcription and translation, and the dedicated cost of the google cloud account are in addition.
It is currently in a launch phase, with beta customers. The one-time price currently advertised is 49$, which suggests that it is still trying to acquire customers and build traffic volume. There is no mention of any subscription plan of the kind “X $ per month including Y minutes of transcription”, or similar. Therefore the solution is probably not mature yet.
Maestra provides an online web application to get transcriptions and translations from audio and video files that you upload from your local PC or from other locations (Dropbox, Instagram, Facebook, ..).
After uploading the file, you select the number of speakers in the audio or video, and then the functions you want to apply (one or more together): transcription (selecting the original spoken language), translation (selecting the target language), a custom dictionary (in case you have one), and your own subtitles if you already have them (SRT or VTT, so that you only get the translation).
Turnaround times, when selecting both transcription and translation, are in the order of 70% of the video duration (in our case, a video of 2’52” took about 2 minutes).
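As a quick check of that figure, here is a minimal Python sketch of the arithmetic. The helper name `turnaround_ratio` is only illustrative; the two durations are the ones from our test above.

```python
def turnaround_ratio(processing_seconds: float, media_seconds: float) -> float:
    """Processing time as a fraction of the media duration."""
    return processing_seconds / media_seconds


# Figures from our test: a 2'52" video processed in about 2 minutes.
video_seconds = 2 * 60 + 52   # 172 seconds
processing_seconds = 2 * 60   # 120 seconds

ratio = turnaround_ratio(processing_seconds, video_seconds)
print(f"{ratio:.0%}")  # 70%
```

A ratio below 100% means the transcript is ready in less time than it would take to play the file.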
The results are fairly good: the transcription in the original language is accurate and the speaker identification is correct (we tested with good audio quality and two speakers).
You can play the audio again, skip ahead, and pause, and on the left side you have the transcription divided into portions/sentences with timestamps (start and end time in seconds) and the number of characters per second. For every numbered portion (= sentence), you get both the transcription and the translation in the target language you chose earlier, and the transcribed sentences are highlighted and scrolled in sync with the video as it plays. You can look at the entire script, edit it wherever you like, and export the transcription in doc, txt, and pdf format; you can export the subtitles as well, in formats like SBV, SRT, and VTT.
Another functionality is the Voiceover: you can apply it to the video/audio file and get the transcription and the translation spoken, rather than written as subtitles, by a synthetic voice in sync with the video or audio. It’s an interesting feature, although not very useful, since it offers no real advantage over subtitles (a robotic voice speaking without expression can make the content harder to understand).
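To make the subtitle export more concrete, here is a minimal Python sketch of how timestamped sentences like the ones shown in the editor could be written out as SRT. The `segments` data and the helpers `srt_timestamp` and `to_srt` are hypothetical and are not Maestra’s export code; only the SRT layout itself (a counter, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` line, the text, and a blank line) is standard.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments) -> str:
    """Serialize (start_seconds, end_seconds, text) tuples as SRT text."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    # Joining with a newline leaves one blank line between subtitle blocks.
    return "\n".join(blocks)


# Hypothetical segments, with start and end times in seconds.
segments = [
    (0.0, 2.5, "Welcome to the meeting."),
    (2.5, 6.04, "Let's review the agenda."),
]
print(to_srt(segments))
```

Since the editor already stores start and end times in seconds, an export like this is essentially a formatting step, which is probably why most tools offer several subtitle formats.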
A free trial with 15 minutes of transcription and translation is available. There are two pricing models:
- pay-as-you-go at 10$ per hour of transcription, with no subscription; that means 0.167 $ per minute, which is quite expensive and above the average
- subscription plan at 5$ per hour, plus 29$ per month per user (or 19$ if you pay annually). That is still expensive: for example, 10 hours of transcription in a month, assuming a single user, cost [(5 * 10) + 29] $ / (10 * 60) minutes = 0.13167 $ per minute (other competitors offer 0.04 per minute).
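The per-minute figures above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the function names are illustrative, and it assumes a single user billed monthly, as in the example.

```python
def payg_cost_per_minute(hourly_rate: float) -> float:
    """Per-minute cost of a plain pay-as-you-go hourly rate."""
    return hourly_rate / 60


def subscription_cost_per_minute(hours: float, hourly_rate: float,
                                 monthly_fee: float, users: int = 1) -> float:
    """Per-minute cost when a monthly fee is added to a reduced hourly rate."""
    total = hours * hourly_rate + monthly_fee * users
    return total / (hours * 60)


def break_even_hours(payg_rate: float, sub_rate: float, monthly_fee: float) -> float:
    """Monthly hours above which the subscription becomes cheaper."""
    return monthly_fee / (payg_rate - sub_rate)


print(round(payg_cost_per_minute(10), 3))                  # 0.167
print(round(subscription_cost_per_minute(10, 5, 29), 5))   # 0.13167
print(break_even_hours(10, 5, 29))                         # 5.8
```

Under these assumptions, the subscription only pays off above roughly 5.8 hours of transcription per month; below that, pay-as-you-go is the cheaper of the two.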
Murf.ai is primarily a video and audio editor for automatic voice-over generation. The transcription service is offered on video/audio files that you upload (only mp3 and mp4 are supported), and, as usual, you get the timestamps of the sentences as well as the possibility to edit the scripts.
However, the value proposition is based on video/audio content creation with voice-overs generated automatically through AI. Therefore Murf.ai is oriented more to text-to-speech than to speech-to-text.
You access the “Open Studio” suite, which allows you to create videos for e-Learning, advertising, or even podcasts and demo videos, by applying voice-over, background music, text, and more effects. The voice-over is available in many flavors and languages; it promises to be “not robotic” but natural and “artistical”, since it is derived from the real voices of professional speakers.
10 minutes of free usage are available as a trial. After that, 3 subscription plans are offered:
- Basic at 13$ / Month, including 24 hours of voice generation and transcription.
- Pro at 26$ / Month, with 96 hours of voice generation and 48 hours of transcription
- Enterprise at 166$ / Month, with unlimited voice generation/transcription and multiple users.
Amberscript’s Automatic Transcription works like that of most competitors: you upload the media file (audio or video) and, after processing, you get the transcription, which you can edit and export in txt, srt, vtt, or other formats.
Looking in more detail, Amberscript provides 6 main solutions, including automatic and manual transcription services:
- Automatic Transcription: as said, you get the transcription in batch mode from a media file that you have to upload.
- Automatic Subtitles: similar to above, the captions are generated on top of a video file
- Data Annotation: this solution is interesting for businesses or individuals that want to obtain training data sets in specific languages and semantic areas like manufacturing, media, automotive, energy, telecom, etc. This is useful when you need to improve the accuracy of a machine learning algorithm for speech-to-text, since the accuracy of Artificial Intelligence algorithms in general depends strongly on the amount and quality of the training data you feed them. For this service, you need to contact the sales department directly to get a quote.
- Manual Transcription: native transcriptionists provide the transcripts with higher accuracy
- Manual Subtitle: same as point #2 but done by a human
- API availability: intended for developers who build their own software and want to integrate it with the speech-to-text APIs to get transcriptions
Here we focus only on the pricing for Automatic Transcription and Automatic Subtitles, which is the same for both (the manual services are, of course, much more expensive):
- Free Plan available with 10 minutes of audio/video Transcription.
- Pre-Paid with pay-as-you-go: 10€ per hour of transcription. It’s prepaid since you decide the number of hours and pay in advance, with no expiration date or recurring payment. It’s quite expensive: 0.167 € per minute, the same figure as Maestra’s pay-as-you-go rate (which is in dollars)
- Subscription: 32€ per month including 5 hours of audio/video transcription. Again, it’s similar to Maestra but a bit more expensive
SpeedScriber is a desktop application that you download and install locally, and it’s only available for macOS.
It works on video and audio files with very fast turnaround times: the website claims it is faster than real time, because you get the transcription of a 60-minute audio file in just 10 minutes.
The software lets you select the files, but those files are then transferred to a server, where the transcription is actually performed.
In practice, apart from the convenience of the macOS native App features, the core of the work, i.e. the transcription, is done server-side, which basically means that the App is mostly a way to upload the audio/video files to the cloud server. So why couldn’t it simply be offered online as a SaaS service? That would have been better, since there is no complex local processing that requires macOS native capabilities.
However, the transcripts received can be edited, reviewed, and exported in many usual formats, like doc, txt, pdf, and many more.
15 Free minutes of transcriptions are offered.
Then it’s a pay-as-you-go service, with no subscription or minutes expiration. Pricing rates are:
- 0.50$ per minute up to 120 minutes of transcription
- As you buy more minutes, the cost per minute drops, from 0.47$ / min with 300 minutes bought to 0.37$ / min with 3000 minutes.
In any case, the Price is definitely super expensive when compared to all others.
isLucid integrates with Microsoft Teams for real-time transcription of the meetings and calls held there.
It’s a plug-in installed on top of Microsoft Teams, so its additional icon and buttons appear directly in the Microsoft Teams user interface.
As we know, real-time transcription and live captions for meetings are already natively provided by Microsoft Teams. Hence, on the transcription side, isLucid only lets you view the entire script, modify it, and download/export it. Its core functionality is creating Tasks on top of the transcripts recorded in Microsoft Teams.
So, isLucid allows you to create Tasks, derived for example from meeting minutes, in Jira, Microsoft Planner, or Azure DevOps, with an assignee and an owner, as in all the usual Task management and collaboration tools. It works by selecting a sentence within the transcript, clicking the “three dots” typical of Microsoft Teams, and choosing the action to execute from the small drop-down menu (e.g. Add to Planner, Add to Jira, etc.).
That’s all: a simple plug-in for MS Teams, mainly a Task planner creation and integration tool.
Although Tasks creation, assigning, and planning are definitely hard jobs, the available prices look higher than average.
We mention just the “Team” option, a monthly subscription plan at 190$/Month that includes 60 meetings in the same month.
So, it’s not really a solution for those looking for pure transcription services, but people who work a lot with Microsoft Teams and Task planning should probably take a look at it.
Ebby’s automatic transcription service applies to audio/video files that you upload to the online platform from your local PC or from Google Drive, Dropbox, or Box.
You can set a few preferences for the transcription: the transcription language (many are available, and accuracy is higher with well-trained languages like US English), a multiple-speaker flag (which helps the algorithm better separate the sentences), an email flag (you also receive the transcript via email), and two word-filter options, one for profanity (replaced with *** when identified) and one for fillers (like uhh/uhm/ahh).
Accuracy is a function of the audio quality and of the language used (whether or not the algorithm is well trained on it). It was high for the audio we tested, which was of high quality, with two speakers (correctly identified with labels) who did not overlap and spoke in a plain, linear way; each sentence came with timestamps.
Turnaround time is variable, since it depends on the size (it takes less than one minute for a 2-minute audio) and on the quality of the audio (i.e. it takes longer if the quality is low).
You can edit, review, replay, adjust words and timestamps in order to sync better with the video or audio file, and export/download the subtitles in various formats.
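To give an idea of what the two word filters do, here is a minimal Python sketch. It is not Ebby’s implementation: the word lists, the function name `clean_transcript`, and its two flags are assumptions made up for illustration, and a real service would presumably use much larger dictionaries.

```python
import re

# Hypothetical word lists, for illustration only.
FILLERS = {"uhh", "uhm", "ahh"}
PROFANITY = {"darn"}  # mild placeholder word


def _word_pattern(words) -> str:
    """Build a regex that matches any of the given words as whole words."""
    return r"\b(" + "|".join(re.escape(w) for w in sorted(words)) + r")\b"


def clean_transcript(text: str, mask_profanity: bool = True,
                     drop_fillers: bool = True) -> str:
    """Mask profanity with *** and remove filler words from a transcript."""
    if mask_profanity:
        text = re.sub(_word_pattern(PROFANITY), "***", text, flags=re.IGNORECASE)
    if drop_fillers:
        # Also swallow a comma or period and the spaces that follow the filler.
        text = re.sub(_word_pattern(FILLERS) + r"[,.]?\s*", "", text,
                      flags=re.IGNORECASE)
    return text.strip()


print(clean_transcript("Uhm, that darn meeting was, uhh, long."))
# that *** meeting was, long.
```

The output shows a typical side effect of naive filtering: leftover punctuation and a lowercase sentence start, which is one reason manual review in the editor remains useful.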
Free Trial includes 2 minutes of transcription preview for 2 audio files.
There are two pricing options:
- pay-as-you-go with the purchase of credits. The base price is 15$ per hour of transcription, with volume discounts applied progressively: e.g. 10% for 3 and 4 hours (i.e. 13.5$/hour), 20% for 5 to 9 hours, and up to 50% above 40 hours (7.50$/hour, corresponding to 0.125$/Minute). The price is quite expensive even with the maximum discount, which is an edge case, because 40 hours of transcription is a lot and not what everyone needs.
- pay a one-time fee of 30$ and always get a fixed 60% discount on the credits, i.e. 6$/hour. This results in 0.1$/Minute, which is still not cheap compared to others; but if you have a constant need for transcriptions over time, it’s more convenient, as long as you transcribe enough to amortize the initial 30$.
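How many hours it takes to recover the 30$ fee can be estimated with a short Python sketch. It assumes that the fee is paid once, that hours are counted cumulatively, and that the 60% discount replaces the volume discounts instead of stacking with them; the constant and function names are illustrative. Only the discount tiers quoted above are used.

```python
BASE_RATE = 15            # $ per hour, list price
MEMBER_FEE = 30           # $ one-time fee
MEMBER_DISCOUNT_PCT = 60  # fixed discount after paying the fee


def hourly_rate(discount_pct: int) -> float:
    """Hourly price after applying a percentage discount to the list price."""
    return BASE_RATE * (100 - discount_pct) / 100


def break_even_hours(volume_discount_pct: int) -> float:
    """Hours needed to recover the fee, compared with a given volume tier."""
    saving_per_hour = hourly_rate(volume_discount_pct) - hourly_rate(MEMBER_DISCOUNT_PCT)
    return MEMBER_FEE / saving_per_hour


for pct in (0, 10, 20, 50):
    print(f"{pct}% tier: {hourly_rate(pct):.2f} $/h, "
          f"fee recovered after {break_even_hours(pct):.1f} h")
```

Under these assumptions, the fee is recovered after about 3.3, 4, and 5 hours against the 0%, 10%, and 20% tiers, and after 20 hours against the 50% tier.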
Go-Transcribe offers the automatic transcription service on audio/video files uploaded online. The transcription is processed and delivered with turnaround times depending on the size and the quality of the audio. Then the online tool also provides document editing functionalities in order to correct or modify the transcript, highlight key sentences or parts of the transcripts, pause and play the audio to enhance the transcript manually, etc.
You can add a custom vocabulary, manually specifying a list of specific words or acronyms that are present in the files to transcribe.
Languages available are more than 30, and English (with three different accents US, UK, AUS) is the most accurate (since all algorithms, in general, are better trained in English).
In summary, it’s quite similar to other online tools, with only a basic set of features and functionalities.
Also, the UX is similar to that of other vendors or offers fewer functionalities, e.g. in file upload and in transcription editing/revision, and the same goes for the pricing structure (see Amberscript, Maestra, and Ebby above).
10 minutes of transcription are free for Trial. Paid versions are:
- Pay-as-you-go at 12€/Hour. More expensive than other pay-as-you-go offers, also because there’s no volume discount
- Subscription plans: Standard at 36€/Month with 4 hours of transcription (and additional hours at 9€ each); and Business at 90€/Month with 10 hours included (and additional hours at 8€ each).
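To see which option is cheapest for a given monthly volume, the price list above can be put into a small Python sketch. It assumes that extra hours are billed linearly on top of the included ones; the names `PLANS`, `monthly_costs`, and `cheapest` are illustrative.

```python
PAYG_RATE = 12  # € per hour, pay-as-you-go

# Monthly fee, included hours, and price per additional hour, as listed above.
PLANS = {
    "Standard": {"fee": 36, "included": 4, "extra": 9},
    "Business": {"fee": 90, "included": 10, "extra": 8},
}


def monthly_costs(hours: float) -> dict:
    """Monthly cost in € of each option for a given number of hours."""
    costs = {"pay-as-you-go": PAYG_RATE * hours}
    for name, plan in PLANS.items():
        extra_hours = max(0, hours - plan["included"])
        costs[name] = plan["fee"] + extra_hours * plan["extra"]
    return costs


def cheapest(hours: float) -> str:
    """Name of the cheapest option for a given number of hours."""
    costs = monthly_costs(hours)
    return min(costs, key=costs.get)


for hours in (2, 4, 12):
    print(hours, monthly_costs(hours), cheapest(hours))
```

With these numbers, pay-as-you-go wins at 2 hours, Standard at 4 hours, and Business only at around 12 hours per month (106€ against 108€ for Standard).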
Conclusion: basic/lean solution, quite expensive, recent solution in the market.