Developing Aviation ASR and NLP Datasets and Tools
PI Jianhua Liu
The goal is to create an open-access air traffic control (ATC) ASR dataset. We have obtained 300 hours of audio data and processed 30 hours using a bootstrap approach: Whisper provides the initial transcripts, a hired team of transcribers corrects them, and the corrected transcripts are then reviewed.
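The three bootstrap steps can be sketched roughly as follows. This is only an illustrative outline, not the project's actual tooling: the `Utterance` record, the `bootstrap` function, and the `correct` and `review` callbacks are hypothetical names introduced here, and the human steps are represented as plain functions. The Whisper call assumes the open-source `openai-whisper` Python package and its `load_model` / `transcribe` functions.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Utterance:
    """Hypothetical record tracking one audio clip through the workflow."""
    audio_path: str
    draft: Optional[str] = None       # step 1: machine-generated transcript
    corrected: Optional[str] = None   # step 2: transcript after human correction
    reviewed: bool = False            # step 3: whether the correction passed review


def whisper_transcribe(audio_path: str, model_name: str = "base") -> str:
    """Produce a draft transcript with Whisper (assumes `openai-whisper`)."""
    import whisper  # imported lazily so the rest of the sketch runs without it

    # Loading the model on every call is wasteful; a real pipeline would cache it.
    model = whisper.load_model(model_name)
    return model.transcribe(audio_path)["text"].strip()


def bootstrap(
    utt: Utterance,
    transcribe: Callable[[str], str],
    correct: Callable[[str], str],
    review: Callable[[str], bool],
) -> Utterance:
    """Run one clip through draft -> correction -> review."""
    utt.draft = transcribe(utt.audio_path)   # step 1: initial transcript
    utt.corrected = correct(utt.draft)       # step 2: transcriber fixes the draft
    utt.reviewed = review(utt.corrected)     # step 3: reviewer accepts or rejects
    return utt


if __name__ == "__main__":
    # Stand-in callbacks so the sketch runs without audio files or Whisper.
    demo = bootstrap(
        Utterance("clip_0001.wav"),  # hypothetical file name
        transcribe=lambda path: "delta one two tree climb",
        correct=lambda draft: draft.replace("tree", "three"),
        review=lambda text: True,
    )
    print(demo)
```

Passing `whisper_transcribe` as the `transcribe` argument would swap the stand-in for a real Whisper draft; keeping the draft alongside the corrected text also makes it possible to measure later how much the transcribers changed.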