Welcome to the BUET CSE NLP Group. We are a group of researchers tackling problems in natural language processing and machine learning, specifically machine translation, multilingual NLP, and the adaptation of NLP techniques to programming language and natural language understanding.



Publications


XL-Sum: Large-Scale Multilingual Abstractive Summarization for 44 Languages

In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021


PDF Code

CoDesc: A Large Code–Description Parallel Dataset

In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021


PDF Code

Preprints

CrossSum: Beyond English-Centric Cross-Lingual Abstractive Text Summarization for 1500+ Language Pairs

arXiv preprint, 2022


PDF Code


BERT2Code: Can Pretrained Language Models be Leveraged for Code Search?

arXiv preprint, 2021


PDF

Text2App: A Framework for Creating Android Apps from Text Descriptions

arXiv preprint, 2021


PDF Code

Ongoing Projects

  1. Adapting XL-Sum for Cross-Lingual Summarization

    The target language of a multilingual model for cross-lingual summarization is limited to the languages it is fine-tuned on, and we have observed that fine-tuning on multiple languages without cross-lingual supervision cannot control the language of the generated summaries.

    In this work, we aim to generate summaries in any target language for a given article by fine-tuning multilingual models with explicit (albeit limited) cross-lingual signals. By aligning identical articles across languages via cross-lingual retrieval on the XL-Sum dataset, coupled with a multi-stage sampling technique, we intend to perform large-scale cross-lingual summarization for 45 languages (a sketch of the retrieval-based alignment step appears after this list).

  2. Paraphrase Generation via Knowledge Distillation from Machine Translation Models

    Synthetic paraphrase datasets are typically generated with round-trip machine translation, since these back-translation-based data generation approaches have been shown to produce appropriate paraphrases.

    In this work, we are trying to directly distill the knowledge of translation models into a paraphrase generation model. We aim to use two teachers, a forward translation model and a backward translation model, to distill two types of knowledge into the paraphrase model: the cross-attention distribution and the output distribution. In contrast to traditional knowledge distillation, here we have two teacher models instead of one, and the task of the student model differs from that of the teachers (a sketch of the combined distillation loss appears after this list).
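As a rough illustration of the alignment step in the first project, the Python sketch below pairs articles across two languages by nearest-neighbor retrieval over multilingual sentence embeddings. The encoder choice (LaBSE via sentence-transformers), the similarity threshold, and the align_articles helper are illustrative assumptions, not the project's actual configuration.

    # Hypothetical sketch: align articles across languages by embedding
    # similarity. LaBSE and the 0.8 threshold are assumptions, not the
    # project's actual setup.
    import numpy as np
    from sentence_transformers import SentenceTransformer

    encoder = SentenceTransformer("sentence-transformers/LaBSE")

    def align_articles(src_texts, tgt_texts, threshold=0.8):
        """Pair each source-language article with its most similar
        target-language article when the similarity clears a threshold."""
        src_emb = encoder.encode(src_texts, normalize_embeddings=True)
        tgt_emb = encoder.encode(tgt_texts, normalize_embeddings=True)
        sims = src_emb @ tgt_emb.T  # cosine similarities (unit vectors)
        pairs = []
        for i, row in enumerate(sims):
            j = int(np.argmax(row))
            if row[j] >= threshold:  # keep only confident alignments
                pairs.append((i, j, float(row[j])))
        return pairs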
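Similarly, for the second project, the following sketch shows one plausible form of the dual-teacher objective under stated assumptions: both terms are KL divergences, the teacher outputs are precomputed, and the loss weights and temperature are illustrative.

    # Hypothetical sketch of a per-teacher distillation loss; the alphas
    # and temperature are illustrative, not the project's actual values.
    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits,
                          student_attn, teacher_attn,
                          alpha_out=1.0, alpha_attn=1.0, temperature=2.0):
        """Combine output-distribution and cross-attention-distribution
        terms from one (forward or backward) translation teacher."""
        t = temperature
        # Output distribution: KL(teacher || student), temperature-smoothed.
        out_loss = F.kl_div(
            F.log_softmax(student_logits / t, dim=-1),
            F.softmax(teacher_logits / t, dim=-1),
            reduction="batchmean",
        ) * (t * t)
        # Cross-attention: attention weights are already normalized over
        # source positions, so compare the two distributions directly.
        attn_loss = F.kl_div(
            torch.log(student_attn.clamp_min(1e-9)),
            teacher_attn,
            reduction="batchmean",
        )
        return alpha_out * out_loss + alpha_attn * attn_loss

With two teachers, one such loss would be computed against each teacher and the two summed.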

Meet the team


Dr. Rifat Shahriyar

Professor, Dept. of CSE, BUET



Dr. M. Sohel Rahman

Professor, Dept. of CSE, BUET



Dr. Anindya Iqbal

Professor, Dept. of CSE, BUET



Dr. Wasi Uddin Ahmad

Research Scientist, AWS AI



Madhusudan Basak

Asst. Professor, Dept. of CSE, BUET



Md. Saiful Islam

Asst. Professor, Dept. of CSE, BUET



Tahmid Hasan

Lecturer, Dept. of CSE, BUET



Abhik Bhattacharjee

RA, Dept. of CSE, BUET



Kazi Samin Mubasshir

ML Engineer, ACI



Kazi Sajeed Mehrab

Lecturer, Dept. of CSE, UIU



Ajwad Akil

Undergraduate student



Nazrin Shukti

Undergraduate student

Alumni

Masum Hasan

RA, Dept. of CSE, ROC



Md. Mahim Anjum Haque

TA, Dept. of CSE, VTech



Abdullah Al Ishtiaq

TA, Dept. of CSE, PSU



Tanveer Muttaqueen

Software Engineer, TigerIT