A Data Set of YouTube Audio Transcriptions
May 1, 2024
A French startup has built YouTube Commons, a data set of openly licensed audio transcripts from over 2 million videos, as featured in Jeremy Singer-Vine’s newsletter “Data is Plural”. From the newsletter: “The dataset indicates each video’s YouTube ID, title, channel, and date, as well as each transcript’s original language, translated language, word count, and character …