Research Publications

"Hold on honey, men at work": A semi-supervised approach to detecting sexism in sitcoms


Television shows play an important role in propagating societal norms. Owing to its popularity, the situational comedy (sitcom) genre contributes significantly to the overall development of society. In an effort to analyze the content of television shows belonging to this genre, we present a dataset of dialogue turns from popular sitcoms annotated for the presence of sexist remarks. We train a text classification model to detect sexism using domain-adaptive learning. We apply the model to our dataset to analyze the evolution of sexist content over the years. We propose a domain-specific semi-supervised architecture for the aforementioned detection of sexism. Through extensive experiments, we show that our model often yields better classification performance than generic deep-learning-based sentence classification that does not employ domain-specific training. A quantitative analysis along with a detailed error analysis presents the case for our proposed methodology.

#Whydidyoustay? Using NLP to analyze causes of long lasting domestically abusive relationships

The pandemic has caused an increase in domestic violence cases and has made it even more difficult for victims to leave. Although many resources exist to help people walk away from domestically abusive relationships, there is a large gap between the resources available and those actually used. This project, still in its infancy, aims to answer the research question: how can we use NLP to examine the reasons behind long-lasting domestically abusive relationships? We aim to analyze multiple sources of data and compare several modeling approaches to answer this question.

Toward the early detection of child predators using deep learning

Due to the pandemic, children have more time and unfettered access to electronic devices. This, combined with the desire to interact with new people, has given child predators more opportunities to seduce children. Our work aims to leverage the PAN12 dataset and deep learning methods such as BERT-based approaches and Bi-LSTMs to develop a model that can identify child predators within minutes of a conversation being initiated.

Personal Projects

Polly - An AI-driven platform for PCOS Awareness

Feb 2021 - June 2021

To give young women around the world the opportunity to assess their likelihood of having polycystic ovary syndrome (PCOS), I developed a web application for PCOS that includes a conversational agent named Polly. Polly is a retrieval-based chatbot that considers factors like menstrual cycle regularity, visible symptoms of excess androgens, body mass index, period pains and family history to determine whether a user is prone to having PCOS. The application is currently in the deployment phase, and we have also developed an Android application for the same cause. This project was a team effort, and my responsibility was developing Polly and integrating it with the rest of the application. Polly is powered by the Google Dialogflow API.

Analyzing Sentiment with the IMDb Dataset

Feb 2021 - June 2021

I implemented a research paper published at IEEE CICN in 2020. The paper used the IMDb Dataset to examine how certain supervised machine learning algorithms could be leveraged to classify movie reviews as negative or positive. I also investigated whether additional supervised models could outperform the classifiers in the original paper. All models were evaluated against five metrics, and the support vector classifier performed best, with an accuracy of 90%. The libraries used include Pandas, NumPy, Seaborn, Matplotlib and scikit-learn.

Review Bay: A sentiment analysis platform

Dec 2019 - Feb 2020

For the first round of the Smart India Hackathon 2020, we developed a sentiment analysis platform to meet the requirements of a problem statement given by ISRO. The platform can aggregate sentiment, provide in-depth product analytics and classify product reviews. It was developed using Django, Flask, TensorFlow, Keras, Pandas, NumPy and Matplotlib.

My Flair Detector: A BERT-based approach to text classification

April 2020 - June 2020

I developed an end-to-end web application that detects the flair of a Reddit post given the post URL. The tools used for this project include Flask, Heroku, BERT, NLTK, Pandas, NumPy and WordNet. The code is available here

© 2021 by Smriti Singh.