
Psychological Understanding of Textual Journals Using Natural Language Processing Approaches

Recent artificial intelligence models that accept textual inputs have become increasingly accurate. However, because these models operate very differently from human cognition, their outputs are becoming harder for humans to interpret. This project aims to apply leading natural language processing models to extract meaningful insight from real-world psychological documents with complex structure. The project comprises two main chapters, each built on a different dataset: the first chapter addresses binary classification on a personality detection dataset, while the second covers sentiment analysis and topic modeling of sleep-related reports.

Interpretable Representation Learning for Personality Detection

In this paper, several models are introduced and evaluated on their ability to capture psychological context for personality detection, yielding a new state of the art in this field.

Multitask Learning for Emotion and Personality Detection

Our computationally efficient CNN-based multitask model achieves state-of-the-art performance across several widely used personality and emotion datasets, even outperforming language-model-based approaches.

Bottom-up and top-down: Predicting personality with psycholinguistic and language model features

A novel deep-learning model that integrates traditional psycholinguistic features with language model embeddings, achieving state-of-the-art personality prediction on the Essays dataset (Big Five) and the Kaggle dataset (MBTI).

Personality Detection Using Bagged SVM over BERT Word Embedding Ensembles

A novel model that feeds contextualized embeddings, together with psycholinguistic features, to a bagged-SVM classifier for personality trait prediction. This model outperforms the previous state of the art by 1.04% while being significantly more computationally efficient to train.
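The bagged-SVM architecture described above can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the random vectors stand in for BERT sentence embeddings and psycholinguistic (e.g. lexicon-derived) features, and the dimensions, dataset size, and SVM hyperparameters are placeholder assumptions.

```python
# Sketch: concatenate document embeddings with psycholinguistic features,
# then train a bagging ensemble of SVMs on bootstrap samples of documents.
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_docs = 200
bert_dim, psych_dim = 768, 10  # 768 matches the BERT-base hidden size

# Placeholder features: in the real model these would come from a BERT
# encoder and a psycholinguistic feature extractor, respectively.
bert_emb = rng.normal(size=(n_docs, bert_dim))
psych_feats = rng.normal(size=(n_docs, psych_dim))
X = np.hstack([bert_emb, psych_feats])
y = rng.integers(0, 2, size=n_docs)  # one binary personality trait label

# Bagged SVM: each base SVC is fit on a bootstrap resample of the rows.
clf = BaggingClassifier(SVC(kernel="rbf"), n_estimators=10, random_state=0)
clf.fit(X, y)
preds = clf.predict(X)
print(preds.shape)  # one binary prediction per document
```

In practice one such binary classifier would be trained per trait (e.g. each of the Big Five), with predictions aggregated across the ensemble by majority vote, which is what `BaggingClassifier.predict` does internally.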