Developing Aviation ASR and NLP Datasets and Tools

PI: Jianhua Liu

The goal is to create an open-access air traffic control (ATC) automatic speech recognition (ASR) dataset. We have obtained 300 hours of audio data and have processed 30 hours of it using a bootstrap approach: Whisper generates the initial transcripts, a hired team of transcribers corrects them, and the corrected transcripts are then reviewed.
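The bootstrap workflow can be sketched as a record that moves through three stages (Whisper draft, human correction, review). This is a minimal sketch under stated assumptions, not the project's actual tooling: the `TranscriptRecord` class, its stage names, the `word_error_rate` helper, and the sample phrase are hypothetical illustrations. `draft_with_whisper` uses the `load_model` and `transcribe` calls of the open-source `openai-whisper` Python package; the `"base"` model size is an arbitrary assumption. Word error rate between the Whisper draft and the corrected transcript is one common way to gauge how much correction each clip needed.

```python
from dataclasses import dataclass
from typing import Optional


def draft_with_whisper(audio_path: str, model_name: str = "base") -> str:
    """Stage 1: produce an initial transcript with the open-source Whisper model.

    Assumes the `openai-whisper` package is installed. It is imported lazily
    so the rest of this sketch runs without it. The model size is an assumption.
    """
    import whisper  # pip install openai-whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path)
    return result["text"].strip()


@dataclass
class TranscriptRecord:
    """One audio clip moving through a hypothetical bootstrap pipeline."""
    audio_id: str
    whisper_draft: str                # Stage 1: Whisper output
    corrected: Optional[str] = None   # Stage 2: filled in by a transcriber
    reviewed: bool = False            # Stage 3: set once a reviewer signs off

    def stage(self) -> str:
        """Return the furthest stage this clip has reached."""
        if self.reviewed:
            return "reviewed"
        if self.corrected is not None:
            return "corrected"
        return "draft"


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # At the start of iteration i, prev[j] is the edit distance
    # between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[len(hyp)] / max(len(ref), 1)


# Usage with a hypothetical clip (the Whisper call is skipped here).
rec = TranscriptRecord("clip_0001",
                       whisper_draft="climb and maintain flight level three nine zero")
rec.corrected = "climb and maintain flight level three five zero"
print(rec.stage())                                          # corrected
print(word_error_rate(rec.corrected, rec.whisper_draft))    # 0.125
```

Here the transcriber fixed one of eight words, so the draft's word error rate against the corrected reference is 1/8. Tracking this per clip is one way to check whether Whisper drafts are actually saving transcriber effort.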

Researchers

  • Jianhua Liu
    Department
    Electrical Engineering and Computer Science Department
    Degrees
    Ph.D., University of Florida
  • Andrew Schneider
    Department
    Flight Department
    Degrees
    M.A., University of Massachusetts-Boston
    B.F.A., Southern Methodist University