How I used GCP to create the transcripts for my Podcast

I’m currently working on a series of episodes for a Podcast I’ll be publishing soon. The Podcast will be in Italian and I wanted to make sure to publish the episode transcripts together with the audio episodes.

The idea of manually typing out the text of every episode wasn’t really appealing to me, so I started looking around.

What are the tools out there?

From a quick Google search, it seems that some companies are offering a mix of automated and human-driven transcription services.

I wasn’t really interested in that for now: I just wanted an API I could push my audio to and get some text back in a reasonable amount of time.

For this reason, I started looking for speech-to-text APIs and, of course, the usual suspects were among the first results.

To be quite honest, I didn’t spend too much time investigating the solutions above. I probably spent more time reading about them to write this blog post.

I decided to go with Google Cloud because I had never used GCP before and wanted to give it a try. Its documentation seemed quite straightforward, as did its support for Italian as the language to transcribe from (the podcast is in Italian). I also had a few free credits available, since this was my first personal use of GCP.

Setting up

If you want to try transcribing your episodes too, follow this quick setup guide to get started.

Head over to Google Cloud and set up an account. Make sure you create a project and enable the Speech-to-Text API. If you forget to do so, gcloud can take care of it for you later.

Google Cloud Speech-to-Text

The second thing I did was install gcloud, the CLI Google Cloud provides for interacting with its APIs. Since I was only interested in testing the API, this tool seemed like the quickest way to get started.

Additionally, there’s not much you can do from the Google Cloud Web Console if you want to deal with Speech-to-Text APIs.
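
If you prefer doing the whole setup from the terminal, here’s a minimal sketch of what that looks like once gcloud is installed (my-podcast-project is just a placeholder project ID, pick your own):

$ gcloud init                                   # log in and pick a default project
$ gcloud config set project my-podcast-project  # or set the project explicitly
$ gcloud services enable speech.googleapis.com  # enable the Speech-to-Text API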

Get your file ready for transcription

The sampling rate of your audio file should be at least 16 kHz for better results. Additionally, GCP recommends a lossless codec. I only had an mp3 of my episode handy at the time, so I gave it a try anyway and it worked well enough.

Make sure you know the sample rate of your file, though, because specifying a wrong one might lead to poor results.

You can usually verify the sample rate by getting info on your file from your Mac’s Finder:

File's context menu
Just click Get Info in your file’s context menu
File's Finder Info section with sample rate
There’s the sample rate
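
If you prefer the terminal over Finder, afinfo (which ships with macOS) prints the same information; the file name below is just a placeholder:

$ afinfo episode.mp3   # the sample rate, in Hz, appears in the output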

You can read more about the recommended settings in the Best Practices section of the documentation.

Upload your episode to the bucket

GCP needs your file to be available in a Storage Bucket, so go ahead and create one.

Storage Bucket creation example

You’ll be able to upload your episode from there.
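
If you’d rather not click through the console, gsutil (which comes with the Cloud SDK) can create the bucket and upload the file too. A quick sketch with placeholder names:

$ gsutil mb gs://my-podcast-episodes               # create the bucket
$ gsutil cp episode.mp3 gs://my-podcast-episodes/  # upload the episode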

Time to transcribe

Once you have your episode file up there in the cloud, go back to the terminal on your local machine where you configured the gcloud tool.

Gcloud used to trigger the speech-to-text transcription

If your episode lasts longer than 60 seconds (😬) you’ll want to use recognize-long-running and most likely specify --async.

As I said before, make sure you specify the right --sample-rate: in my case 44100. This will help GCP transcribe your file with better results.

The --async switch makes gcloud return immediately instead of waiting for the long-running operation to finish. In my case, the operation took around 5 minutes to complete.
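
For reference, the full invocation looks roughly like this (the bucket and file names are placeholders; it-IT because the episode is in Italian):

$ gcloud ml speech recognize-long-running gs://my-podcast-episodes/episode.mp3 \
    --language-code=it-IT --sample-rate=44100 --async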

Oddly, I wasn’t able to find any reference to the asynchronous operation in the Google Cloud Console. So, if you want to check on your transcription job, make sure you take note of the operation identifier: you’ll need it to query the speech operations API.
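
Checking on the job is just a matter of passing that identifier to the describe command:

$ gcloud ml speech operations describe <your-transcription-operation-id>

While the job is still running you’ll only get back metadata about its progress; once it’s done, the response field contains the transcription results.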

The speech operation metadata

The transcribed data

Once your transcription operation is complete, the describe command will return the transcript excerpts together with their confidence scores.

The speech transcript excerpt

I wasn’t particularly interested in the confidence scores: I only wanted a big blob of text that I could review, use for SEO purposes, and include with the episode. For this reason, jq to the rescue!

I love jq: you can achieve so much with it when it comes to manipulating JSON.

In my case, I only wanted to concatenate all the transcript fields and save them to a file. Here’s how I did it:

$ ./bin/gcloud ml speech operations describe <your-transcription-operation-id> | jq -r '.response.results[].alternatives[].transcript' > my-transcript.txt

And that’s it!

Conclusion

I thought I’d share the steps above because they’ve been useful to me in producing the transcripts. I think GCP Speech-to-Text works quite well with Italian but, of course, the transcript is not usable as it is unless your accent is perfect. Mine wasn’t 😅.

If you want to know more about my journey towards publishing my first podcast, follow me on Twitter where I’ll be sharing more about it.

Photo by Malte Wingen on Unsplash

How I used Chisel to pull Gitlab pipelines stats

I built chisel.cloud in my spare time to automate something I was doing by hand to derive insights about my Gitlab pipeline times.

In this blog post I’m going to show you how I did it in the hope that it might be useful to you too.

chisel app main screen

As you can see from the picture above, Chisel is still pretty early stage. I decided to publish it anyway because I’m curious to know whether or not something like this could be useful to you too.

Understanding deployment time

The goal of this exercise was for me to better understand the deployment time (from build to being live in production) of my project and have a data-driven approach as to what to do next.

Since the project in question uses Gitlab CI/CD, I thought of taking advantage of its API to pull down this kind of information.

Gitlab Pipelines API

The Gitlab pipelines API is pretty straightforward, but a few differences between the /pipelines and the /pipelines/:id endpoints mean that you have to do a little composition work to pull down interesting data.

Here’s how I did it.

1. Pull down your successful pipelines

The first thing I did was fetch the successful pipelines for my project.
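
If you want to see what that call looks like outside of Chisel, the equivalent raw request is something like this (token and project ID are placeholders):

$ curl --header "PRIVATE-TOKEN: <your-access-token>" \
    "https://gitlab.com/api/v4/projects/<project-id>/pipelines?status=success"

The status=success parameter takes care of returning only the successful pipelines.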

chisel app first screen

As you can see, this API returns minimal information about each pipeline. What I needed to do next in order to understand pipeline times was to fetch further details for each pipeline.

Chisel – Transform

Chisel provides a handy transformation tool that uses JMESPath to help you manipulate the JSON returned by the API you are working with. I used it to extract the pipeline IDs from the returned response.

chisel app second screen

Chisel shows you a live preview of your transformation. Something as simple as [*].id is enough for now. The result is an array of pipeline IDs.

Right after obtaining all the IDs I need, I can apply another transformation to turn them into pipeline objects with all the relevant information for my stats.

Chisel has another transformation type, called Fetch, that helps you turn the selected values into the result of something fetched from a URL.

chisel app third screen

In particular, you can use the ${1} placeholder to pass in the mapped value. In my case, each ID is being mapped to the /pipelines/${1} API.
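
Outside of Chisel, that’s equivalent to hitting the single-pipeline endpoint once per ID, roughly like this (placeholders again):

$ curl --header "PRIVATE-TOKEN: <your-access-token>" \
    "https://gitlab.com/api/v4/projects/<project-id>/pipelines/<pipeline-id>"

Unlike the list endpoint, this one returns the full pipeline object, including fields like created_at, finished_at and duration.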

The result is pretty straightforward.

chisel app fourth screen

2. Filter out what you don’t need

As you can see, some of the returned pipelines have a before_sha value of 0000000000000000000000000000000000000000. Those are pipelines triggered outside of merges into master, so I’m not interested in them.

Filtering those out is as simple as [?before_sha != '0000000000000000000000000000000000000000'].

chisel app fifth screen

The transformation history

As you can see, on the right of the screen there’s a little widget that shows the transformations you have applied. You can use it to go back and forth in the transformation history and roll back or reapply the modifications to your data.

chisel app transformation history

3. The last transformation

The last transformation I need before I can start pulling out useful information is one that turns my output into a set of records.

chisel app sixth screen

I’m selecting only a few fields and turning the result into an array of arrays. This is the right format for exporting it as a CSV.

chisel app csv download screen

Google Sheets

Finally, I can upload my CSV export to Google Sheets and plot the information I need.

google sheets import

Conclusion

Chisel is still at a very early stage of development and it is pretty much tailored to my specific use case, but if you think this tool could be useful to you too, please head to the Github repo and suggest the improvements you’d like to see.

If you liked this post and want to know more about Chisel, follow me on Twitter!


Featured image by Dominik Scythe on Unsplash