Detecting Mental Health Disorders Using NLP
The concept of mental fitness refers to a sound and adaptable mind, characterized by good health and resilience. It encompasses the capacity to navigate challenges and setbacks in a constructive and flexible manner. Consequently, cultivating and sustaining mental fitness contribute significantly to individual well-being and overall quality of life. In 2019, the WHO estimated that close to a billion people around the world were living with a mental disorder, with depression and anxiety the most common. Various NLP projects classify a specific mental health disorder based on analysis of text or speech, but very few deal with multi-level classification. To address this, we create a composite framework for multi-level mental health disorder classification, combining the text data of different classes from a dataset of Reddit posts. Our experiments show that fine-tuned domain-specific transformer models, as well as hybrid models combining transformer models with recurrent neural networks such as LSTM and Bi-LSTM, can detect mental health disorders with an F1-score of 88% and above.
Mental health conditions, such as depression, anxiety disorders, autism spectrum disorder (ASD), suicidal thoughts, schizophrenia, and bipolar disorder, are highly prevalent globally and pose significant public health challenges. Timely detection of mental illness is crucial for effective management and treatment. However, identifying these conditions without the individual's active participation can be time-consuming and resource-intensive, highlighting the importance of early detection for improved outcomes.
People express their feelings in many textual forms nowadays: social media platforms, user-generated content platforms like Reddit, text messages, and, in the medical setting, a patient's electronic health record (EHR) or clinical notes describing a patient's mental state. Traditional methods for detecting mental illness, such as face-to-face interviews and self-report questionnaires, are reactive and time-consuming. They require ongoing observation or assessments and may involve expensive sensors. Additionally, the accuracy of these methods relies on the assumption that the individual's responses accurately reflect their mindset.
However, with the rise of technology, people's lives have changed. Many individuals share their thoughts and ideas on social media by posting statuses, comments, or discussing current events. This information can provide insight into whether a person may be experiencing a mental disorder. Cambria et al. make a valid argument in their book [5] that sentiment analysis requires a next generation of sentiment-mining techniques. A summary of research on early detection of mental disorders in social media is available in Crestani et al. (2022) [8].
Natural language processing (NLP), a branch of artificial intelligence (AI), has become increasingly important in analyzing and managing large amounts of text data. NLP can be used for a variety of tasks, including sentiment analysis [25], information extraction, emotion detection, and mental health surveillance [20] [24]. The detection of mental illness in text data can be classified under text or sentiment analysis. Early identification of mental illness could lead to improved prevention and treatment strategies. In this paper we use NLP to detect and classify various mental illnesses based on text data.
According to Zhang [30], the identification of mental illnesses such as depression and suicidality has numerous applications. Other disorders, such as stress, eating disorders, anorexia, PTSD, ASD, bipolar disorder, schizophrenia, and anxiety, have also been examined using NLP techniques. The most common features used in NLP tasks are linguistic patterns, such as part-of-speech (POS) tags, and semantic features, such as bag-of-words (BoW) and term frequency-inverse document frequency (TF-IDF). These features can be easily extracted from text using text processing tools, and they are widely used in a variety of NLP tasks. Until recently, most research in mental health disorder detection revolved around feature engineering: extracting lexical features and identifying what is unique to a particular disorder, using NLP methods such as BoW, TF-IDF, POS tagging, and Linguistic Inquiry and Word Count (LIWC). This further evolved into the use of topic modelling techniques like Latent Dirichlet Allocation and clustering methods like GloVe word clusters. Together with other attributes like posting frequency, age, gender, and retweet/repost rates, these techniques offered significant improvements in the detection of mental health disorders.
Feature engineering and extraction are essential for traditional machine learning models, but deep learning frameworks have shown that they can learn important features from text data without the need for feature engineering. This has led to significant improvements in the performance of deep learning models for tasks such as mental illness detection. Orabi et al. [31] optimized the word embeddings and conducted binary classification with 5-fold cross-validation for depression. Their comparative evaluations on different models (CNN, RNN, and BiLSTM) showed that CNN models perform better, with an F1 score of 86.967 on the CLPsych 2015 dataset of Twitter posts.
Mohsen Ghorbani et al. (2020) [14] used deep learning techniques with word embeddings for sentiment analysis and opinion mining, combining a CNN with the LSTM algorithm. Their proposed model, ConvLSTMConv, was developed for binary classification of mood categories with positive or negative connotations. Borah et al. [4] utilized multiple classifiers to identify mental emotions, including stress, from a predetermined set of inputs. They were able to predict five emotion categories: 'Angry', 'Surprised', 'Happy', 'Sad', and 'Fear'. Mental health experts can use this information to determine whether an individual's mental state is stressed or relaxed. The combination of LSTM and BERT outperformed other algorithms such as Logistic Regression, SVM, Random Forest, LSTM, and BERT, with an accuracy and F1 score of 93.28%.
Deep learning frameworks for text analysis typically have two layers: an embedding layer and a classification layer. The embedding layer converts the text input into a dense vector that preserves semantic and syntactic information, allowing the deep learning model to be better trained. Several embedding techniques are available, such as word2vec, GloVe-style word embeddings, ELMo, and contextual language encoder representations such as BERT and ALBERT. For example, Kim et al. [32] developed six binary classification models, one for each category of mental health disorder, namely depression, anxiety, bipolar, schizophrenia, BPD, and autism.
They used XGBoost and CNN and observed F1 scores ranging from 38.07 to 79.49. In another example, a study by Peng et al. (2019) [26] found that a pre-trained BERT model trained on PubMed abstracts and clinical admission notes from MIMIC-III outperformed state-of-the-art models on ten datasets. They aimed to explore the potential of deep learning techniques in NLP to develop a classification model that could improve the accuracy of psychiatric evaluations and diagnoses, even with a small dataset. Dinu et al. [11] used 50,000 posts for each mental health group. They ran binary classifiers for each and discovered that RoBERTa and XLNet offered better results, with F1 scores between 0.70 and 0.81 for the different mental health classifications. They also discovered that depression has the lowest F1 score because it is difficult to identify depression in linguistic forms. It is also evident that larger datasets do not necessarily mean better training and hence better classification.
Mental health disorder detection from social media posts involves sentiment and content analysis of the text in these posts using NLP techniques. Depression detection has been widely and successfully explored as a binary classification problem. However, multi-level classification of mental health illnesses has its challenges: the classification boundaries are not well defined. Certain mental conditions, like schizophrenia, need a second person to confirm a symptom such as hallucination, and PTSD requires prior information about a traumatic experience or incident. One of the earliest works on multi-class classifiers for mental health detection was the research by Murarka et al. [33]. They used the sequence-pair classification technique to give more importance to the title. In addition, they used subreddits such as music and politics to obtain posts with no mental health illness.
The RoBERTa classifier gave them an average F1 score of 89%. This was achieved after collecting around 300 posts for each mental illness, considering only posts with a minimum token length of 30 and more than 10 upvotes. They also conducted experiments with synonym replacement and masking, concluding that model performance drops under the synonym replacement test, meaning the model depends on the presence of the root word in the input text. The researchers also conducted extensive preprocessing of the data. They ensured that the general-topic subreddits did not have high similarity with the mental health subreddit posts: a bag-of-words (BoW) analysis confirmed that the classes have a good distance between them with respect to these high-frequency words.
Ameer et al. [3] also compared different models for multi-class classification of mental health disorders. Among deep learning algorithms, they found the best results with Bi-LSTM, while the pre-trained RoBERTa performed better overall with an average F1 score of 0.83. For their experiment, they compared a RoBERTa-based classifier with LSTM-based and BERT-based classifiers. The sequence length was 512, and the classification layer used a dropout of 0.5. Runs of 10 to 15 epochs were conducted with a learning rate of 1e-5 and Adam as the optimizer. They also concluded that the label alone is not sufficient for classification; a deeper understanding of the post and its context is needed to predict mental illness.
The researchers also indicate a high possibility of a post pointing to different labels, e.g., a post showing symptoms of both PTSD and depression, so multi-label classification is seen as a possible future development. Dionysis Goularas et al. (2019) [15] tested sentiment analysis of Twitter data using several deep learning configurations based on CNN and LSTM networks. They found that CNN and LSTM networks in combination gave better results than either alone, and that using multiple CNN and LSTM networks improved system performance. The authors emphasized the limitations of CNN and LSTM networks in this area. They also used a multilayer perceptron (MLP) with a hyperbolic tangent activation function for classification.
The model provided F1 scores between 58.58 and 94.23 across all datasets. Bishal Lamichhane [23] used the LLM-based ChatGPT (with a GPT-3.5 Turbo backend) for detecting stress, depression, and suicidal tendencies. F1 scores reached 0.73, 0.86, and 0.37, compared to a baseline model predicting the dominant class with scores of 0.35, 0.60, and 0.19. This experiment used zero-shot classification and can be further improved. In their paper [22], Kant et al. describe how Transformer models outperform LSTM models with fine-tuning on emotion classification tasks. Their conclusion is that unsupervised language modeling can be performed on general text datasets without the need for labels; however, domain-specific labeled data is required for downstream tasks to perform well.
We considered various pre-trained models, most of which are domain-specific models based on BERT. Microsoft created BLURB, a benchmark for biomedical language understanding and reasoning [17]. Several pre-trained models relevant to health, medicine, and biomedical fields are available for various NLP tasks, such as relation extraction, document classification, question answering, named entity recognition, sentence similarity, and PICO (population, intervention, comparison, outcome) extraction.
Each model is tested and benchmarked against these NLP tasks. PubMedBERT, a model trained from scratch on PubMed text, is compared against other pre-trained models: BERT, RoBERTa, SciBERT (pre-trained on scientific texts from semanticscholar.org), ClinicalBERT (trained on clinical data and notes, performing better on targeted readmission tasks), and BlueBERT (trained on PubMed abstracts). PubMedBERT has the highest BLURB score, 81.1. The model performed well on named entity recognition and sentence similarity, but not on machine-comprehension tasks.
This paper suggests that for biomedical NLP applications, domain-specific pre-training from scratch is more effective than adapting a general-domain language model, showing how important it is to develop domain-specific models. In 2021, Ji, Shaoxiong, et al. [21] published MentalBERT and MentalRoBERTa as pre-trained language models for mental health care. The models were evaluated against multiple datasets including Reddit (SWMH, Dreaddit, CLPsych, eRisk18 T1, DepressionReddit, UMD), Twitter (T-SID), and SMS (SAD). On these datasets, the models performed well in detecting depression, stress, anorexia, and suicidal ideation. They are specifically designed to detect mental disorders in social media.
Prompt tuning was an important part of our model training. Prompts can be created manually or automatically. Manual prompts involve prior knowledge of the domain and implementing domain-based rules. One recommended approach [18] is to create several simple sub-prompts and then combine them into task-specific prompts according to logic rules. Compared to creating prompts automatically, applying logic rules to create prompts is more efficient and easier to interpret, and is the approach followed in our architecture.
A. Dataset There are several mental health databases. Harrigian, Keith et al. [19] compiled 35 proprietary mental health datasets, categorized by accessibility: DUA (datasets requiring a data use agreement), API (accessible via a publicly available application programming interface), AUTH (available with the upstream author's permission), FREE (hosted on publicly accessible servers), and SMDH (Self-Reported Mental Health Examination). To achieve our goal, we considered textual data from Reddit posts made in several mental health-related subreddits, including 'depression', 'Anxiety', 'bipolar', 'ADHD', and 'ptsd'. These 5 classes make up most of the data and were collected by extracting Reddit's publicly available data. The remaining data acts as a control group and was pulled from subreddits related to happy conversations, family, or friends, including happy, smiles, Humans Being Bros, and happycrowds. This control group is labeled 'NoMentalIllness' to separate it from the other 5 classes, each of which relates to a specific illness. The downloaded dataset is summarized below:
TABLE 1 DOWNLOADED DATASET DETAILS
B. Preprocessing The following steps were taken during preprocessing:
Remove duplicate posts
Combine title and post content into a single field
Remove HTML, Reddit markdown formatting, URLs, and references to other posts
Remove posts made by bots and replace emojis with their inferred meaning
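The cleaning steps above can be sketched with standard-library regular expressions. This is a minimal illustration, not the exact pipeline used in our experiments; in particular, the emoji map here is a tiny hand-made stand-in for a full emoji-to-meaning dictionary.

```python
import re
import html

# Illustrative stand-in for a full emoji-to-meaning dictionary (assumption).
EMOJI_MEANINGS = {"🙂": " happy ", "😢": " sad "}

def clean_post(title: str, body: str) -> str:
    """Combine title and body, then strip markup as described above."""
    text = f"{title} {body}"
    text = html.unescape(text)                            # HTML entities
    text = re.sub(r"\[([^\]]*)\]\([^)]*\)", r"\1", text)  # markdown links -> link text
    text = re.sub(r"https?://\S+", " ", text)             # bare URLs
    text = re.sub(r"<[^>]+>", " ", text)                  # leftover HTML tags
    text = re.sub(r"/r/\w+|/u/\w+", " ", text)            # references to other posts/users
    for emoji, meaning in EMOJI_MEANINGS.items():
        text = text.replace(emoji, meaning)
    return re.sub(r"\s+", " ", text).strip()              # collapse whitespace
```

Duplicate posts would additionally be dropped by keeping a set of already-seen cleaned texts.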
TABLE II CLEANED DATA EXAMPLES FOR EACH CLASS
Sample posts and the data size after preprocessing are shown above. We used a 9:1 ratio for splitting our dataset into train and test sets:
TABLE III DISTRIBUTION OF TRAIN AND TEST IN RATIO 9:1
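A seeded shuffle-and-cut sketch of the 9:1 split (the exact splitting code and seed used in our experiments are not shown here; this is an illustrative version):

```python
import random

def split_9_1(records, seed=42):
    """Shuffle and split a list of (text, label) records 9:1 into
    train and test sets."""
    rng = random.Random(seed)            # fixed seed for reproducibility
    shuffled = records[:]                # leave the caller's list untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * 0.9)       # 90% train, 10% test
    return shuffled[:cut], shuffled[cut:]

data = [(f"post {i}", i % 6) for i in range(100)]  # toy records with 6 labels
train, test = split_9_1(data)
```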
C. Proposed Architecture The proposed architecture for this project is shown in Fig. 1. The diagram covers the two approaches we use on top of MentalBERT: 1) MentalBERT with prompt tuning and 2) MentalBERT with a stacked ensemble (LSTM/Bi-LSTM).
D. Model Explorations The following approaches were explored:
Plain vanilla MentalBERT
Plain vanilla MentalRoBERTa
MentalBERT with fine-tuning
MentalBERT with prompt tuning
MentalRoBERTa with prompt tuning
MentalBERT with ensemble (LSTM/Bi-LSTM)
MentalBERT with ensemble (LSTM/Bi-LSTM) and prompt tuning
1) Vanilla classification: Our first step was to check the transformer models with basic fine-tuning. The maximum length for tokenization was set at 512 since some of the posts are quite long. We used the Adam optimizer with a learning rate of 1e-5 and eps (epsilon value for numerical stability) of 1e-8. The optimizer was set on a linear schedule with an initial warmup of 0. We trained using MentalBERT and MentalRoBERTa and got macro-average F1 scores of 0.85 and 0.80 respectively. We proceeded with MentalBERT, as MentalRoBERTa did not satisfy our requirements.
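The linear schedule mentioned above can be captured in a few lines of plain Python. This sketch shows the learning-rate multiplier only, not the full optimizer setup; with warmup = 0 (as in our runs), the rate simply decays linearly from the base value to zero over training.

```python
def linear_schedule_factor(step, total_steps, warmup_steps=0):
    """Multiplier applied to the base learning rate at a given step.
    Rises linearly during warmup, then decays linearly to zero."""
    if warmup_steps > 0 and step < warmup_steps:
        return step / warmup_steps
    remaining = total_steps - warmup_steps
    return max(0.0, (total_steps - step) / remaining)

base_lr = 1e-5
# With warmup = 0, the per-step learning rates decay linearly.
lrs = [base_lr * linear_schedule_factor(s, total_steps=10) for s in range(10)]
```

In practice this is what a library scheduler (e.g. a linear schedule with warmup in a transformers training loop) computes per optimizer step.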
2) Finetuning MentalBERT: Fine-tuning a pre-trained Transformer model refers to the process of adapting the model to a specific downstream task. Usually, this is done by adding another layer after the last layer of the model and training the entire network for a few epochs. The Adam optimizer is the most popular choice, with learning rates from 1e-5 to 5e-5. Fine-tuning primarily makes the model learn task-specific patterns: the early layers remain stable, and it is mainly the last layers that change during fine-tuning. Many authors have based their work on a study [9] that designed two experiments to evaluate the effectiveness of fine-tuning the BERT model [10] on a dataset of text-based mental health assessments. From the literature, many authors use a multi-layer perceptron (MLP) with hyperbolic tangent activation as the classification layer, with a learning rate of 3e-5. We implemented a similar MLP for the classification layer. Further, we set the following hyperparameters:
Dropout: 0.1 and 0
Learning rate: 1e-5, 2e-5, 3e-5
Linear warmup schedule with warmup = 0 and warmup = 0.1 of total steps
For all layers, the weight decay was set at 0.1 and the bias decay at 0. With the fine-tuning parameters no dropout, learning rate = 3e-5, warmup schedule = 0.1, weight decay = 0.1, and bias decay = 0, we obtained an F1-score of 0.88.
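A PyTorch sketch of the tanh-activated MLP classification head described above. The hidden size (768, BERT-base) and the use of an intermediate layer of the same width are assumptions for illustration; the head maps the pooled [CLS] output to logits over our six classes.

```python
import torch
import torch.nn as nn

NUM_CLASSES = 6      # 5 disorders + NoMentalIllness
HIDDEN_SIZE = 768    # BERT-base hidden size (assumption)

# MLP head with hyperbolic tangent activation; dropout 0.1 matches
# one of the fine-tuning configurations explored above.
classifier = nn.Sequential(
    nn.Dropout(0.1),
    nn.Linear(HIDDEN_SIZE, HIDDEN_SIZE),
    nn.Tanh(),
    nn.Dropout(0.1),
    nn.Linear(HIDDEN_SIZE, NUM_CLASSES),
)

# Stand-in for a batch of 4 pooled [CLS] outputs from MentalBERT.
pooled = torch.randn(4, HIDDEN_SIZE)
logits = classifier(pooled)  # shape: (batch, NUM_CLASSES)
```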
3) Prompt Tuning: Prompt learning is a powerful mechanism for improving performance on NLP tasks. Prompts can be created manually or automatically. Manual prompts involve prior knowledge of the domain and implementing domain-based rules. The choice of the pre-trained Transformer model is crucial for prompt learning; the BERT class of transformers is well suited since these models already implement masked language modeling (MLM) [13]. To create prompts manually, we extracted prompts from mental health questionnaires available in different medical health datasets and explored classifying mental health problems based on the responses [6]. For prompt tuning, we used the OpenPrompt library with ManualTemplate and ManualVerbalizer. The ManualTemplate is where the prompting takes place. Example of adding a prompt to our dataset:
INPUT STATEMENT: this made me smile maybe it will make you smile too
PROMPT: The feeling is like ?
ANSWER: 4 (index 4 stands for NoMentalIllness)
These prompts are fed to our language model to aid it in the classification.
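The template logic can be sketched in plain Python. This is an illustrative stand-in for OpenPrompt's ManualTemplate, not its actual API: the template appends the prompt with a mask slot where the masked language model predicts a label word.

```python
CLASSES = ["depression", "anxiety", "bipolar", "adhd", "ptsd", "NoMentalIllness"]

def apply_template(post_text: str) -> str:
    """Wrap a raw post in the manual prompt; [MASK] is the slot the
    MLM fills with a label word mapped to one of CLASSES."""
    return f"{post_text} The feeling is like [MASK]."

prompted = apply_template("this made me smile maybe it will make you smile too")
# The verbalizer then maps the predicted label word back to a class index,
# e.g. a "happy"-like word -> index 4 (NoMentalIllness).
```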
Another attribute we passed to our language model is a list of words that are pivotal in identifying the category of the mental health issue. This attribute is the verbalizer. In our experiments we got the best results with the ManualVerbalizer. The following label words were used for the ManualVerbalizer (Fig. 2):
Fig. 2. Prompt Tuning OpenPrompt with Manual Verbalizer
We were able to achieve an F1-score of 0.85 on MentalBERT with prompt-tuned text.
4) Ensemble:
An ensemble of classifiers was planned to replace the classification layer following the MentalBERT layer. This experiment was conducted with an ensemble of Random Forest, KNearestNeighbours, and GaussianNB, with final stacking done using LogisticRegression, similar to the experimentation in [1] [2] [27]. The classifiers in this ensemble cannot perform well on sequential data, and hence the classification results did not meet expectations; LSTM and Bi-LSTM, however, offer considerable advantages due to their ability to handle long-distance dependencies [28]. Hence we implemented a hybrid stacked model of MentalBERT and LSTM/Bi-LSTM. We set lstm_hidden_size=128, num_layers=2, batch_first=True, and bidirectional=True (for Bi-LSTM only) as the parameters for the LSTM/Bi-LSTM. BERT's sequence output (last_hidden_state) was passed to the LSTM model. The F1-scores of the LSTM and Bi-LSTM variants were 0.87 and 0.88 respectively.
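A PyTorch sketch of the stacking described above, using a random tensor as a stand-in for MentalBERT's last_hidden_state (batch, sequence length, hidden size); classifying from the final time step is one common choice and an assumption here, not necessarily the exact head used.

```python
import torch
import torch.nn as nn

HIDDEN_SIZE = 768   # MentalBERT (BERT-base) hidden size
NUM_CLASSES = 6

# Bi-LSTM with the parameters from our experiments:
# hidden size 128, 2 layers, batch_first, bidirectional.
lstm = nn.LSTM(input_size=HIDDEN_SIZE, hidden_size=128, num_layers=2,
               batch_first=True, bidirectional=True)
head = nn.Linear(2 * 128, NUM_CLASSES)  # 2x for the two directions

# Stand-in for last_hidden_state from MentalBERT: (batch, seq_len, hidden).
sequence_output = torch.randn(4, 512, HIDDEN_SIZE)
lstm_out, _ = lstm(sequence_output)          # (batch, seq_len, 2 * 128)
logits = head(lstm_out[:, -1, :])            # classify from the final time step
```

For the plain LSTM variant, bidirectional=False and the head input width drops to 128.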
Precision, recall, and F1 scores are common metrics for NLP classification models, defined in terms of true-positive, true-negative, false-positive, and false-negative counts [29]. Performance metrics used for scoring include accuracy, precision, recall, the confusion matrix, AUC, macro-average, and micro-average. In addition to basic summary statistics [16], probabilistic approaches and soft precision, recall, and F1 scores [12] can also be used to evaluate models, as they give more meaningful results than their crisp variants. A comparison of the results for the various experimental runs is presented in Table IV, with the top-scoring items for each category highlighted in bold. The comparison indicates minimal variation between the model configurations: all provided weighted-average F1-scores between 85% and 88%, except MentalRoBERTa. Average accuracy also lies in the range of 86% to 88%.
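The crisp per-class metrics and their macro-average can be computed directly from confusion counts; the counts below are hypothetical, chosen only to exercise the formulas.

```python
def precision_recall_f1(tp, fp, fn):
    """Crisp precision/recall/F1 for one class from true-positive,
    false-positive, and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

def macro_f1(per_class_counts):
    """Macro-average: unweighted mean of per-class F1 scores."""
    f1s = [precision_recall_f1(tp, fp, fn)[2] for tp, fp, fn in per_class_counts]
    return sum(f1s) / len(f1s)

# Hypothetical (tp, fp, fn) counts for three classes.
score = macro_f1([(90, 10, 10), (80, 20, 20), (85, 15, 15)])  # 0.85
```

The micro-average would instead pool tp/fp/fn across classes before applying the same formulas, weighting classes by support.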
1) Analysis of classification results by mental health disorder: Among the various classes, No Mental Illness and ADHD provide the top F1-scores. The PTSD and Anxiety F1-scores are close to the average, and Depression is 1-2% below the average. The lowest F1-score is seen for Bipolar. Our understanding is that bipolar disorder is temporal and characterised by mood swings, so a single post will not suffice to determine it. A deeper analysis (refer to Fig. 3) of the input data during the exploratory phase of our research also shows that the bipolar dataset has a high number of short sentences (< 10 words in a sentence), which may not convey the full sentiment. Since Bipolar is an outlier in this set of classes, we also analyzed the average F1 scores for the models excluding Bipolar: the fine-tuned version of MentalBERT with a learning rate of 3e-5 then gives an average F1 score of 90%. Depression spans a multitude of sentiments, many of which overlap with other mental disorders. As a result, there is a high chance of False Positives (FPs) and False Negatives (FNs) where the emotions overlap. This is perhaps also why the literature survey shows depression handled in binary classification models rather than multi-class ones. The "No Mental Illness" class provides a high F1 score, indicating very few FPs and FNs: the distinction between a text indicating a mental health disorder and one without is significant. Within mental health disorders, ADHD is also easily identifiable. For all other disorders, it is clear that further effort needs to be put into data augmentation techniques, either in the form of better prompts or lexically stronger datasets.
Fig. 3. Frequency distribution of post length (number of words) by mental disorder
2) Prompt tuning + MentalBERT: The run for prompt tuning followed by MentalBERT did not give good results in comparison to the plain MentalBERT runs. We used the ManualTemplate and ManualVerbalizer with high-probability words indicating each mental health disorder. We tried multiple approaches: Automatic and Knowledgeable verbalizers and different manual templates for prompting. However, the results were not very encouraging. For hard prompts, the label words were picked from the symptom scales for various disorders described in the subject-matter-expert compilation, the CRIS NLP service library [7]. Further analysis of the word cloud and the addition of more suitable label words can be explored. Research [32] has shown that when label words from different classes appear in a text, the classification accuracy drops. For example, if a post says "Post the accident, I feel very sad", the prompt tuner would most likely classify it under depression (use of "sad") though the post is clearly related to PTSD. So the hypothesis behind the labeling also needs to be explored, as does prompt engineering with different verbalizations covering different aspects of the context.
3) MentalBERT + LSTM/Bi-LSTM: Our second approach involved stacking an LSTM or Bi-LSTM on MentalBERT. The idea was to leverage the contextual embeddings provided by MentalBERT and combine them with the sequential dependencies that the LSTM or Bi-LSTM captures. The performance did not meet expectations. LSTM and Bi-LSTM help capture long-distance dependencies; however, we are considering social media posts, where the average post length is much lower. A self-attention layer after the LSTM or Bi-LSTM needs to be explored: self-attention, with its ability to weigh tokens independent of their distance, can help.
The summary of our observations from the experiments conducted:
MentalBERT (mental-bert-base-uncased) performed better than MentalRoBERTa (mental-roberta-base)
Dropouts are not making a significant impact
Learning rates of 2e-5 and above improve performance
Prompt Tuning and Classification did not provide the expected outcomes
Exploring prompt generation in multiple ways is a task that can be taken up as a future direction
Hybrid model of MentalBERT + LSTM and MentalBERT + Bi-LSTM also did not provide a significant improvement over the base model
We demonstrated that, using the various fine-tuning and model configurations, we can achieve F1-scores of 0.88 and above for multi-class classification of mental health disorders, a good improvement over prior work. In their research, [3] demonstrated an average F1 score of 82.67, with per-class F1 scores ranging between 0.70 and 0.98. In the original MentalBERT work, depression on the Dreaddit dataset was classified using a binary MentalBERT model with an F1 score of 0.94; in comparison, we get an F1-score of 0.86 for depression in multi-class mode. For the multi-class exercise involving more than 5 mental health disorders, MentalBERT achieved an F1 score of 0.80. Thus we conclude that our experiments yielded average F1 scores of 0.86 and above.
As future activities, we look forward to working on the following areas:
Deeper audit of the dataset to weed out outliers and short posts.
Experiment with binary classification for depression.
Consider further pre-processing techniques and increase the hidden-size parameter to 256 for the hybrid MentalBERT + LSTM/Bi-LSTM model.
Try MentalRoBERTa with higher hardware specifications (CPU/GPU/memory).
Explore Autoprompt generation.
Consider images attached to posts in the dataset as well.
REFERENCES
- N. AlGhamdi, S. Khatoon, and M. Alshamari, "Multi-aspect oriented sentiment classification: Prior knowledge topic modelling and ensemble learning classifier approach," Applied Sciences, 12(8), 2022. DOI: 10.3390/app12084066.
- D. R. Amancio, C. H. Comin, D. Casanova, G. Travieso, O. M. Bruno, F. A. Rodrigues, and L. da Fontoura Costa, "A systematic comparison of supervised classifiers", PLOS ONE, 9(4):1-14, Apr. 2014. DOI: 10.1371/journal.pone.0094137.
- I. Ameer, M. Arif, G. Sidorov, H. Gomez-Adorno, and A. Gelbukh, "Mental illness classification on social media texts using deep learning and transfer learning", 2022.
- T. Borah and S. Ganesh Kumar, "Application of NLP and machine learning for mental health improvement", in International Conference on Innovative Computing and Communications: Proceedings of ICICC 2022, Volume 3, pages 219-228. Springer, 2022.
- E. Cambria, D. Das, S. Bandyopadhyay, and A. Feraco, "A Practical Guide to Sentiment Analysis", Springer, New York City, 2017.
- N. M. B. R. Centre, "CRIS NLP service library of production-ready applications", Mar. 2023.
- A. Cohan, B. Desmet, A. Yates, L. Soldaini, S. MacAvaney, and N. Goharian, "SMHD: a large-scale resource for exploring online language usage for multiple mental health conditions", in Proceedings of the 27th International Conference on Computational Linguistics, pages 1485-1497, 2018.
- F. Crestani, D. E. Losada, and J. Parapar, "Early Detection of Mental Health Disorders by Social Media", Springer, New York City, 2022.
- H.-J. Dai, C.-H. Su, Y.-Q. Lee, Y.-C. Zhang, C.-K. Wang, C.-J. Kuo, and C.-S. Wu, "Deep learning-based natural language processing for screening psychiatric patients", Frontiers in Psychiatry, 11:533949, 2021.
- J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding", 2019.
- A. Dinu and A.-C. Moldovan, "Automatic detection and classification of mental illnesses from general social media texts", in Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021), pages 358-366, 2021.
- P. Fränti and R. Mariescu-Istodor, "Soft precision and recall", Pattern Recognition Letters, 167:115-121, 2023.
- GeeksForGeeks, "Understanding BERT NLP", 2020.
- M. Ghorbani, M. Bahaghighat, Q. Xin, and F. Özen, "ConvLSTMConv network: a deep learning approach for sentiment analysis in cloud computing", Journal of Cloud Computing, 9(1):16, Mar. 2020.
- D. Goularas and S. Kamis, "Evaluation of deep learning techniques in sentiment analysis from Twitter data", in 2019 International Conference on Deep Learning and Machine Learning in Emerging Applications (Deep-ML), pages 12-17, 2019.
- C. Goutte and E. Gaussier, "A probabilistic interpretation of precision, recall and F-score, with implication for evaluation", in Advances in Information Retrieval: 27th European Conference on IR Research, ECIR 2005, pages 345-359.
- Y. Gu, R. Tinn, H. Cheng, M. Lucas, N. Usuyama, X. Liu, T. Naumann, J. Gao, and H. Poon, "Domain-specific language model pretraining for biomedical natural language processing", ACM Transactions on Computing for Healthcare, 3(1):1-23, Oct. 2021.
- X. Han, W. Zhao, N. Ding, Z. Liu, and M. Sun, "PTR: Prompt tuning with rules for text classification", 2021.
- K. Harrigian, C. Aguirre, and M. Dredze, "On the state of social media data for mental health research", 2021.
- J. Ive, N. Viani, J. Kam, L. Yin, S. Verma, S. Puntis, R. N. Cardinal, A. Roberts, R. Stewart, and S. Velupillai, "Generation and evaluation of artificial mental health records for natural language processing", NPJ Digital Medicine, 3(1):69, 2020.
- S. Ji, T. Zhang, L. Ansari, J. Fu, P. Tiwari, and E. Cambria, "MentalBERT: Publicly available pretrained language models for mental healthcare", 2021.
- N. Kant, R. Puri, N. Yakovenko, and B. Catanzaro, "Practical text classification with large pre-trained language models", 2018.
- B. Lamichhane, "Evaluation of ChatGPT for NLP-based mental health applications", 2023.
- S. S. Mukherjee, J. Yu, Y. Won, M. J. McClay, L. Wang, A. J. Rush, and J. Sarkar, "Natural language processing-based quantification of the mental state of psychiatric patients", Computational Psychiatry, 4, 2020.
- P. M. Nadkarni, L. Ohno-Machado, and W. W. Chapman, "Natural language processing: an introduction", Journal of the American Medical Informatics Association, 18(5):544-551, 2011.
- Y. Peng, S. Yan, and Z. Lu, "Transfer learning in biomedical natural language processing: an evaluation of BERT", 2019.
- T. Pranckevičius and V. Marcinkevičius, "Comparison of naive Bayes, random forest, decision tree, support vector machines, and logistic regression classifiers for text reviews classification", Balt. J. Mod. Comput., 5, 2017.
- N. Rai, D. Kumar, N. Kaushik, C. Raj, and A. Ali, "Fake news classification using transformer based enhanced LSTM and BERT", International Journal of Cognitive Computing in Engineering, 3:98-105, 2022.
- M. Wankhade, A. C. S. Rao, and C. Kulkarni, "A survey on sentiment analysis methods, applications, and challenges", Artificial Intelligence Review, 55(7):5731-5780, 2022.
- T. Zhang, A. M. Schoene, S. Ji, and S. Ananiadou, "Natural language processing applied to mental illness detection: a narrative review", NPJ Digital Medicine, 5(1):46, 2022.
- A. Husseini Orabi, P. Buddhitha, M. Husseini Orabi, and D. Inkpen, "Deep learning for depression detection of Twitter users", in Proceedings of the Fifth Workshop on Computational Linguistics and Clinical Psychology, pages 88-97, 2018.
- J. Kim, J. Lee, E. Park, and J. Han, "A deep learning model for detecting mental illness from user content on social media", Scientific Reports, 10, Jul. 2020. DOI: 10.1038/s41598-020-68764-y.
- A. Murarka, B. Radhakrishnan, and S. Ravichandran, "Detection and classification of mental illnesses on social media using RoBERTa", 2020.