Section 2.1 Speech and Language laboratory
The speech and language research group in SCSE was founded in 2007 by Chng Eng Siong and Prof Li Haizhou (now at CUHK-Shenzhen, China). The group is now situated within HESL Lab (N4-B2b-05) in SCSE. We also founded the AISG Speech Lab, funded by NRF from 2018 to the present.
Subsection 2.1.1 Research Focus
Our research focuses primarily on speech and language processing and on classification using machine learning:
- ASR and LLM
  - Using LLM to improve ASR: see Hyporadise
  - Code-switch multilingual speech recognition: see Audio to Byte
  - Robust large-vocabulary continuous speech recognition: joint end-to-end ASR with a speech enhancement module, wav2vec 2.0, speaker extraction
  - Speech enhancement: speaker extraction, denoising, feature enhancement, overlapping speech extraction
- Classification
  - Deep Fake Detection (and generation): Link
  - Speaker identification and speaker diarization: diarization, VAD, and speaker extraction issues, see Microsoft diarization approach
- Towards Speech Understanding
  - Some aspects of NLP, such as depression classification, summarization, named entity recognition, and text normalization. See a demo of our ASR for ATC speech with NER highlighting: ATC with NER
Subsection 2.1.2 Demos
Some of our previous work:
- YouTube recordings: our code-switch speech recognition in action
- Source separation: separating Hillary Clinton's and Trump's voices from a YouTube recording, from Chenglin's Demo slide (Oct 2018)
- Speech indexing using our MAGOR system (code-switch English/Mandarin and Malay system)
- See a demo of our ASR for ATC speech with NER highlighting: ATC with NER
Subsection 2.1.3 Our past demos using our speech engine
2020 FYPs demo:
Subsection 2.1.4 Some of our past works in git
PhD student Hou Nana's work at NTU (2018~2021), single-channel speech enhancement: github
PhD student Xu Chenglin's work at NTU (2015~2020), single-channel speech separation/extraction: github
Intern GeMeng's work (intern from Tianjin, 2020~2021), tutorial on speech separation: github
Intern Shangeth's work (intern from BITS, Aug 2020 to June 2021), accent, age, and height classification: Pdf link
MSAI student Samuel Samsudin (2020~2021), emotion detection: github repository, Kaggle IEMOCAP
Language identification by EEE's PhD student Liu Hexin (2021): github link
Intern Shashank Shirol's work (Jan to June 2020), using GAN to create noisy speech: github repository