Welcome to the BUET CSE NLP Group. We are a group of researchers focusing on tackling problems in natural language processing and machine learning, specifically machine translation, multilingual NLP, and adapting NLP techniques for programming language and natural language understanding.
The rapid advancement of large language models, such as OpenAI's GPT-3, has raised concerns about their potential to generate biased or harmful outputs. Ensuring that these models align with human values is crucial for their responsible deployment in various domains. This research aims to investigate and develop techniques for aligning large language models with human values. The BUET CSE NLP Group is actively pretraining billion-scale GPT and T5 models in Bangla, which serve as the primary testbeds for the proposed research. The project explores directions such as instruction fine-tuning and reinforcement learning from human feedback (RLHF) to enhance the models' alignment with specific human values in the Bangla language context. It also focuses on developing interpretability frameworks that shed light on the decision-making processes of these models, enabling better understanding of and control over their outputs. In addition, we are exploring ideas from game theory, cognitive science, and behavioral economics to design objective/reward functions aligned with human rationales.
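As a rough illustration of the RLHF direction, the sketch below shows the pairwise (Bradley-Terry) ranking loss commonly used to train a reward model on human preference data; the function name and toy reward values are placeholders, not part of our implementation.

```python
import torch
import torch.nn.functional as F

def reward_ranking_loss(reward_chosen: torch.Tensor,
                        reward_rejected: torch.Tensor) -> torch.Tensor:
    """Pairwise preference loss: push the score of the human-preferred
    response above the score of the rejected response."""
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Toy example: scalar reward scores for a batch of 4 (chosen, rejected) pairs.
reward_chosen = torch.tensor([1.2, 0.3, 0.8, -0.1])
reward_rejected = torch.tensor([0.4, 0.5, -0.2, -0.3])
print(reward_ranking_loss(reward_chosen, reward_rejected).item())
```

The trained reward model then scores candidate generations during policy optimization; instruction fine-tuning, by contrast, uses the standard sequence-to-sequence cross-entropy on (instruction, response) pairs.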
This research explores the potential of retrieval-augmented large language models, aiming to enhance their performance in natural language understanding and generation tasks. Retrieval-based methods have shown promising results in improving the quality and relevance of generated responses in conversational AI systems. This project proposes techniques that combine the strengths of large language models with effective retrieval mechanisms to generate more contextually relevant and coherent responses. The research involves designing and training retrieval models that can efficiently retrieve relevant information from large knowledge bases or corpora to support the language model's generation process. Furthermore, the project explores methods for fine-tuning large language models with retrieval-based objectives, enabling them to leverage retrieved information for more accurate and informed responses. Potential applications include open-domain and/or cross-lingual question answering.
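A minimal retrieve-then-generate sketch is shown below, using TF-IDF similarity in place of a learned retriever; the corpus and question are toy placeholders, and the resulting prompt would be passed to a seq2seq language model for generation.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy knowledge base; in practice this would be a large corpus or knowledge base.
corpus = [
    "The Padma Bridge spans the Padma River in Bangladesh.",
    "BanglaT5 is a sequence-to-sequence model pretrained on Bangla text.",
    "Retrieval-augmented generation conditions a language model on retrieved passages.",
]
question = "What does retrieval-augmented generation condition on?"

# Retrieve the top-k passages most similar to the question.
vectorizer = TfidfVectorizer().fit(corpus + [question])
scores = cosine_similarity(vectorizer.transform([question]),
                           vectorizer.transform(corpus))[0]
top_k = scores.argsort()[::-1][:2]

# Concatenate retrieved passages with the question as the generator's input.
context = " ".join(corpus[i] for i in top_k)
prompt = f"context: {context} question: {question}"
print(prompt)  # fed to a T5-style model for answer generation
```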
This research aims to investigate the integration of vision in large language models to enhance their capabilities in visual understanding and generation tasks. While large language models have achieved remarkable success in natural language processing, they often lack the ability to comprehend and generate content related to visual information. This project proposes to explore techniques that combine the power of large language models with computer vision methodologies, enabling the models to understand and generate text based on visual input. The research involves designing and training models that can effectively process and analyze images, extracting meaningful visual features to enrich the language model's understanding. Furthermore, the project explores methods for fine-tuning the language model using visual-based objectives, allowing it to generate text that is coherent and aligned with the visual content. The proposal has potential applications in areas such as visual question answering and multimodal summarization.
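One common way to integrate vision, sketched below under assumed dimensions, is to project features from a frozen vision encoder into the language model's embedding space and prepend them as soft "visual tokens"; the dimensions and variable names here are illustrative, not a description of our models.

```python
import torch
import torch.nn as nn

# Illustrative dimensions: a vision encoder emitting 768-d patch features,
# a language model with a 1024-d embedding space.
vision_dim, text_dim, num_patches, seq_len = 768, 1024, 49, 16

# A learned linear projection maps visual features into the text embedding space.
vision_to_text = nn.Linear(vision_dim, text_dim)

image_features = torch.randn(1, num_patches, vision_dim)   # from a frozen vision encoder
text_embeddings = torch.randn(1, seq_len, text_dim)        # embedded text prompt

# Prepend the projected visual tokens to the text tokens; the combined sequence
# is then processed by the language model's transformer layers.
visual_tokens = vision_to_text(image_features)
multimodal_input = torch.cat([visual_tokens, text_embeddings], dim=1)
print(multimodal_input.shape)  # torch.Size([1, 65, 1024])
```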
Synthetic paraphrase datasets are typically generated with round-trip machine translation, and these back-translation-based data generation approaches have been shown to produce appropriate paraphrases.
In this work, we directly distill the knowledge of translation models into a paraphrase generation model. We use two teachers, a forward translation model and a backward translation model, to distill two types of knowledge into the paraphrase model: the cross-attention distribution and the output distribution. In contrast to traditional knowledge distillation, here we have two teacher models instead of one, and the task of the student model differs from that of the teachers.
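For intuition, the sketch below shows the standard temperature-scaled KL term for distilling an output distribution from a teacher to a student, combined over two teachers by simple averaging; the shapes, weighting, and the analogous cross-attention term are illustrative assumptions, not our exact objective.

```python
import torch
import torch.nn.functional as F

def distill_kl(student_logits: torch.Tensor,
               teacher_logits: torch.Tensor,
               temperature: float = 2.0) -> torch.Tensor:
    """KL divergence between softened teacher and student output
    distributions (standard knowledge-distillation term, scaled by T^2)."""
    s = F.log_softmax(student_logits / temperature, dim=-1)
    t = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(s, t, reduction="batchmean") * temperature ** 2

# Toy shapes: batch of 2, target length 5, vocabulary of 100.
student_logits = torch.randn(2, 5, 100)
forward_teacher_logits = torch.randn(2, 5, 100)   # forward translation teacher
backward_teacher_logits = torch.randn(2, 5, 100)  # backward translation teacher

# Average the two teachers' distillation terms; an attention-distribution
# term would follow an analogous KL formulation over attention weights.
loss = 0.5 * (distill_kl(student_logits, forward_teacher_logits)
              + distill_kl(student_logits, backward_teacher_logits))
print(loss.item())
```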