Dartmouth College researchers have developed an artificial intelligence (AI) model that can be used to predict mental disorders using data from conversations on Reddit, according to a university post.
Researchers Xiaobo Guo, Yaojia Sun, and Soroush Vosoughi presented a paper titled “Emotion-Based Modeling of Mental Disorders on Social Media” at the 20th International Conference on Web Intelligence and Intelligent Agent Technology.
According to the article, most existing AI models of this kind rely on psycho-linguistic content analysis of user-generated text. Although they perform well, content-based representation models are affected by domain and topic bias.
Vosoughi gave a Dartmouth science writer an example: if a model learns to correlate the word “COVID” with “sadness” or “anxiety”, it will automatically assume that a scientist researching COVID and posting about it suffers from depression or anxiety.
The new model avoids these topic-specific biases by relying entirely on emotional states and learning nothing about the subject matter discussed in the messages.
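To make the idea concrete, here is a minimal sketch of an emotion-only representation. It is not the researchers' model: the tiny lexicon, the four emotion labels, and the function name `emotion_profile` are all hypothetical, and a real system would presumably use a trained emotion classifier. The point is only that topic words such as “COVID” never enter the features.

```python
from collections import Counter

# Hypothetical toy lexicon (an assumption for illustration only).
TOY_EMOTION_LEXICON = {
    "sad": "sadness",
    "hopeless": "sadness",
    "worried": "fear",
    "scared": "fear",
    "happy": "joy",
    "excited": "joy",
    "furious": "anger",
}
EMOTIONS = ("anger", "fear", "joy", "sadness")


def emotion_profile(posts):
    """Return the share of each emotion across a user's posts.

    Words that are not in the emotion lexicon (including topic words)
    are ignored, so the profile carries no information about subject matter.
    """
    counts = Counter()
    for post in posts:
        for token in post.lower().split():
            word = token.strip(".,!?;:\"'")
            emotion = TOY_EMOTION_LEXICON.get(word)
            if emotion is not None:
                counts[emotion] += 1
    total = sum(counts.values())
    if total == 0:
        return {emotion: 0.0 for emotion in EMOTIONS}
    return {emotion: counts[emotion] / total for emotion in EMOTIONS}
```

Under this sketch, a scientist who posts neutrally about COVID gets an all-zero profile, while a user who writes about feeling sad or worried gets a profile weighted toward sadness and fear, whatever the topic.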
To train the model, the researchers collected two datasets between 2011 and 2019: the first contained users with one of the three emotional disorders of interest (major depressive, anxiety, and bipolar disorders), and the second contained users without known mental disorders, who served as a control group.
The first dataset was built from self-reported mental disorders: the researchers looked for users who had written posts or comments saying something similar to “I received a diagnosis of bipolar disorder/depression/anxiety”. Only posts made before the self-report were used, because previous work had shown that users who know they have a disorder change their online behavior, which would introduce bias.
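The selection step described above can be sketched roughly as follows. The regular expression and the function name `posts_before_self_report` are assumptions made for illustration; the article does not give the researchers' actual matching rules.

```python
import re
from datetime import datetime

# Illustrative pattern only (an assumption, not the researchers' rules).
SELF_REPORT = re.compile(
    r"\bI (?:was diagnosed with|have been diagnosed with|received a diagnosis of) "
    r"(bipolar(?: disorder)?|bipolarity|depression|anxiety)\b",
    re.IGNORECASE,
)


def posts_before_self_report(posts):
    """Find a user's first self-report and keep only the posts made before it.

    posts: list of (timestamp, text) tuples for one user.
    Returns (disorder, earlier_posts), or (None, []) if no self-report is found.
    """
    ordered = sorted(posts, key=lambda post: post[0])
    for timestamp, text in ordered:
        match = SELF_REPORT.search(text)
        if match:
            disorder = match.group(1).lower()
            earlier = [post for post in ordered if post[0] < timestamp]
            return disorder, earlier
    return None, []


history = [
    (datetime(2015, 1, 1), "Rough week."),
    (datetime(2015, 3, 1), "I was diagnosed with depression last month."),
    (datetime(2015, 5, 1), "Therapy is helping."),
]
disorder, earlier = posts_before_self_report(history)
```

Here only the January post is kept for the user, since the March post is the self-report itself and the May post comes after it.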
The researchers then ensured that the data in the four classes (one for each of the three disorders of interest, plus the control group) had similar temporal distributions, meaning that posts in each class were spread over time in a similar way. The datasets were also balanced, with 1,997 users in each class.
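The article does not say how the classes were brought to the same size. One common approach, shown here purely as an assumption, is to randomly downsample every class to a fixed number of users; the function name `balance_classes` and its parameters are hypothetical, and the sketch does not attempt the temporal matching.

```python
import random


def balance_classes(users_by_class, per_class, seed=0):
    """Randomly downsample each class to exactly `per_class` users.

    users_by_class: dict mapping a class label to a list of user IDs.
    Raises ValueError if any class has fewer than `per_class` users.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    balanced = {}
    for label, users in users_by_class.items():
        if len(users) < per_class:
            raise ValueError(f"class {label!r} has only {len(users)} users")
        # Sort first so the result does not depend on input order.
        balanced[label] = rng.sample(sorted(users), per_class)
    return balanced
```

With the real data, `per_class` would be 1,997 and there would be four labels.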
After that, the researchers divided the data into training (70%), validation (15%), and test (15%) sets. After training and testing the model, they found that their emotion-based representation model predicted disorders more accurately than the content-based TF-IDF (term frequency-inverse document frequency) method. TF-IDF measures the importance of a keyword by how often it appears in a post and how rare it is across all posts.
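Both steps in this paragraph are easy to sketch. The code below shows a shuffled 70/15/15 split and the classic, unsmoothed TF-IDF weighting. The function names and the seed are assumptions for illustration, and libraries typically use smoothed variants of the formula, so this should not be read as the researchers' exact baseline.

```python
import math
import random
from collections import Counter


def split_70_15_15(items, seed=0):
    """Shuffle items and split them into train (70%), validation (15%), test (rest)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 70 // 100
    n_val = len(shuffled) * 15 // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


def tf_idf(documents):
    """Compute unsmoothed TF-IDF weights.

    documents: list of token lists (one per post).
    Returns one {term: weight} dict per document, where
    weight = (term count / document length) * log(N / document frequency).
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


train, val, test = split_70_15_15(range(100))
docs = [["covid", "sad"], ["covid", "lab"]]
weights = tf_idf(docs)
```

In the two-document example, “covid” appears in every document and so gets a weight of zero, while “sad” and “lab” each get a positive weight. This is the sense in which TF-IDF features are tied to the vocabulary, and therefore the topics, of the posts.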