Detecting Sarcasm with AI: Building an LSTM Model for News Headlines

March 2025

Description: Developed an LSTM-based sarcasm detection model that leverages deep learning and NLP techniques to classify news headlines, highlighting AI’s ability to interpret linguistic nuances.

Achieved an 87.84% validation accuracy, a 0.3389 validation loss, and 95.55% accuracy when evaluated on the entire dataset.

Code, Notebook and Results:

I've included detailed commentary explaining each step and how to interpret the code and results. Even if you're new to Python or machine learning, I break down each section clearly.

Github: mmsohh/sarcasm_detector_lstm
Jupyter Notebook: “Sarcasm Detector ML.ipynb”
HTML: “Sarcasm Detector ML.html”

Motivation & Project Overview

AI systems, text generators in particular, can summarize and reword news articles, but how well can they pick up on the nuances of underlying sentiment?

Sarcasm is unique because it often contradicts the literal meaning of the words used. Unlike traditional sentiment analysis models, which classify emotions as positive, negative, or neutral, sarcasm detection requires contextual awareness and a deeper understanding of language to recognize when a statement is meant to be ironic.

This project focuses on developing an NLP sarcasm detection model using the “News Headlines Dataset For Sarcasm Detection”, a Kaggle dataset uploaded by Rishabh Misra that contains 26,000+ labeled news headlines. The dataset is sourced from:

The Onion (satirical news, sarcastic headlines)
HuffPost (non-satirical, factual headlines)

For this project, I implemented a Long Short-Term Memory (LSTM) network, a specialized type of Recurrent Neural Network (RNN) designed to process sequential data such as text. Unlike standard feedforward networks, LSTMs retain memory from previous inputs, making them effective for understanding sentence structure and context, both of which are crucial for sarcasm detection.
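
To make the "memory" idea concrete, here is a minimal sketch of a single LSTM unit in plain Python. It is not the model from the notebook: the one-unit cell, the `lstm_step` and `run_sequence` names, and the weight values are illustrative assumptions. It shows how the cell state `c` carries information from earlier inputs forward through a sequence.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    """One time step of a single-unit LSTM cell.

    `w` maps each gate name to (input weight, recurrent weight, bias).
    """
    def gate(name, activation):
        w_x, w_h, b = w[name]
        return activation(w_x * x + w_h * h_prev + b)

    f = gate("forget", sigmoid)       # how much old cell state to keep
    i = gate("input", sigmoid)        # how much new information to write
    g = gate("candidate", math.tanh)  # the new information itself
    o = gate("output", sigmoid)       # how much of the cell state to expose
    c = f * c_prev + i * g            # updated cell state (the "memory")
    h = o * math.tanh(c)              # updated hidden state (the output)
    return h, c

def run_sequence(xs, w):
    """Feed a sequence through the cell, starting from zero states."""
    h, c = 0.0, 0.0
    states = []
    for x in xs:
        h, c = lstm_step(x, h, c, w)
        states.append((h, c))
    return states

# Hypothetical weights, identical for every gate, chosen only for illustration.
weights = {name: (0.5, 0.5, 0.0)
           for name in ("forget", "input", "candidate", "output")}

with_signal = run_sequence([1.0, 0.0], weights)
without_signal = run_sequence([0.0, 0.0], weights)
```

Both sequences end with the same input (0.0), yet their final states differ, because the first input of `with_signal` is still held in the cell state. That carried-over context is what lets an LSTM relate the end of a headline to its beginning.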

Project Overview

Data Analysis & Preprocessing
- Loaded and analyzed the dataset, examining the distribution of sarcastic vs. non-sarcastic headlines.
- Extracted the most frequently used words in each category to see if specific patterns emerged.
- Created a word cloud visualization to highlight the most common words in sarcastic vs. non-sarcastic headlines.
Building the LSTM Model
- LSTM Layers to retain sequential context and capture sentence structure.
- Dropout to prevent overfitting by randomly deactivating neurons during training.
- Dense Layers to refine extracted features for classification.
- ReLU Activation in Dense layers for efficient learning.
- Sigmoid Activation in the output layer to produce a probability score for sarcasm detection.
Hyperparameter Tuning. This project wasn’t just about optimizing accuracy, but also about understanding how different hyperparameters impact the LSTM model by testing:
- Train/Test Split Ratios → Finding the best balance of training vs. validation data.
- Dropout Rates → Preventing overfitting while maintaining model performance.
- L1 & L2 Regularization → Controlling complexity and reducing reliance on specific features.
- Number of LSTM Layers → Testing if increasing depth improves performance.
- Dense Layers & Neurons → Exploring the trade-off between model complexity and efficiency.
Combining/Optimizing the Model: Once I understood how each parameter affected performance, I combined my findings to construct an optimized model.
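
The word-frequency step from the preprocessing stage can be sketched with the standard library alone. This is an illustrative version, not the notebook's code: the `top_words` helper, the tokenizing regex, the stopword set, and the sample headlines are assumptions, and the labels assume the encoding 1 = sarcastic, 0 = non-sarcastic.

```python
import re
from collections import Counter

def top_words(headlines, labels, target_label, n=10, stopwords=frozenset()):
    """Return the n most common words among headlines with the given label."""
    counts = Counter()
    for text, label in zip(headlines, labels):
        if label != target_label:
            continue
        # Lowercase, then keep runs of letters and apostrophes as tokens.
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(tok for tok in tokens if tok not in stopwords)
    return counts.most_common(n)

# Made-up sample rows for illustration only (not taken from the dataset).
sample_headlines = [
    "Local man shocked by Monday",
    "Local man wins argument with nobody",
    "Senate passes budget bill",
    "Budget talks continue in Senate",
]
sample_labels = [1, 1, 0, 0]
stop = frozenset({"by", "with", "in"})

sarcastic_top = top_words(sample_headlines, sample_labels, 1, n=2, stopwords=stop)
factual_top = top_words(sample_headlines, sample_labels, 0, n=2, stopwords=stop)
```

The same per-category counts could also feed a word cloud, since a word cloud is essentially a visual rendering of these frequencies.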

Model Overview:

The best-performing LSTM sarcasm detection model achieved a validation accuracy of 87.84%. Its configuration combines the hyperparameter settings that generalized best during tuning.

Model Configuration:

Dropout: 0.6
L1 Regularization: 0.01
Train/Test Split: 5/95
Architecture: Maintained the original LSTM and Dense layers
Saved Model (when you run the notebook): best_sarcasm_LSTM_model3.keras
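
For readers new to these settings, the sketch below shows in plain Python what a dropout rate of 0.6 and an L1 factor of 0.01 mean numerically. It is a simplified illustration, not the notebook's code: deep learning libraries apply both inside their layers, and the `dropout` and `l1_penalty` helpers and the example values are assumptions.

```python
import random

def dropout(activations, rate, rng):
    """Inverted dropout: zero each value with probability `rate` and
    scale the survivors by 1 / (1 - rate) so the expected sum is unchanged."""
    keep = 1.0 - rate
    return [a / keep if rng.random() >= rate else 0.0 for a in activations]

def l1_penalty(weights, factor):
    """L1 regularization term added to the loss: factor * sum(|w|)."""
    return factor * sum(abs(w) for w in weights)

rng = random.Random(0)  # fixed seed so the run is repeatable
activations = [1.0] * 10_000
dropped = dropout(activations, rate=0.6, rng=rng)

zero_fraction = dropped.count(0.0) / len(dropped)  # roughly 0.6
mean_after = sum(dropped) / len(dropped)           # roughly 1.0

# Hypothetical weights; the penalty grows with their absolute size.
penalty = l1_penalty([0.5, -1.5, 2.0], factor=0.01)
```

With a rate of 0.6, about 60% of the activations are silenced on each training step, which is a fairly aggressive setting. The L1 term pushes weights toward zero, which discourages reliance on any single feature.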

Training & Validation Metrics:

Training Accuracy: 92.46%
Training Loss: 0.2845
Validation Accuracy: 87.84%
Validation Loss: 0.3389
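
As a reference for reading these numbers, the sketch below computes accuracy and loss by hand. Binary cross-entropy is assumed here because it is the usual loss for a sigmoid output; this write-up does not name the loss function, and the labels and probabilities in the example are made up.

```python
import math

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

def accuracy(y_true, y_prob, threshold=0.5):
    """Share of examples whose thresholded probability matches the label."""
    hits = sum(int(p >= threshold) == y for y, p in zip(y_true, y_prob))
    return hits / len(y_true)

# Made-up labels (1 = sarcastic) and sigmoid outputs, for illustration only.
labels = [1, 0, 1, 0]
probs = [0.9, 0.2, 0.4, 0.1]

loss = binary_cross_entropy(labels, probs)
acc = accuracy(labels, probs)  # 3 of 4 correct, since 0.4 falls below 0.5
```

Accuracy only counts right and wrong answers, while the loss also reflects how confident each prediction was, which is why the two metrics are reported side by side.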

When evaluated on the entire dataset, the model reached 95.55% accuracy. A higher figure is expected here because the dataset includes the headlines the model was trained on, so it should not be read as a measure of generalization.

Real-World Headline Evaluation

To further validate the model, I tested it on previously unseen headlines to evaluate real-world performance. These headlines include both sarcastic and non-sarcastic examples from current events and manually written statements.

Results:

"A new hurricane is approaching East Atlantic." (Non-Sarcastic — Correct)
"Breaking: Local Man Shocked to Discover Monday Comes Every Week." (Sarcastic — Correct)
"Experts Warn That Doing Nothing Will Definitely Fix the Economy." (Sarcastic — Correct)
"Donald Trump executes tariffs for the U.S." (Non-Sarcastic — Correct)
"Study Finds 100% of People Eventually Die." (Sarcastic — Correct)
"Brilliant Political Plan Solves Everything, Announces Nobody." (Sarcastic — Correct)
"New York Times columnist admits scientists ‘badly misled’ public on COVID-19: ‘Five years too late’." - Yahoo News (Non-Sarcastic — Misclassified)
"Greenpeace must pay over $660M in case over Dakota Access protest activities, jury finds." - AP News (Non-Sarcastic — Correct)
"Trump administration says it's cutting $175 million in funding to the University of Pennsylvania." - CBS News (Non-Sarcastic — Correct)
"Forgetful Man Playing Fast And Loose With Free Trials." - The Onion (Sarcastic — Correct)

The model correctly classified 9 out of 10 headlines. The only misclassification occurred with: "New York Times columnist admits scientists ‘badly misled’ public on COVID-19: ‘Five years too late’."

The model labeled this headline as Sarcastic. The error is interesting: the headline is meant as straightforward news, but its tone could plausibly be read as sarcastic, which may explain the mistake.
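
The 9-out-of-10 tally can be reproduced from the results list. In this sketch the labels are transcribed from that list using an encoding chosen here for illustration (1 = sarcastic, 0 = non-sarcastic), the predictions are reconstructed from the Correct/Misclassified notes, and the `score` helper is hypothetical.

```python
def score(expected, predicted):
    """Return accuracy and the positions where prediction and label differ."""
    misses = [i for i, (e, p) in enumerate(zip(expected, predicted)) if e != p]
    return (len(expected) - len(misses)) / len(expected), misses

# Order follows the results list: 1 = sarcastic, 0 = non-sarcastic.
expected = [0, 1, 1, 0, 1, 1, 0, 0, 0, 1]
# Same as expected except the seventh headline, which was flagged as sarcastic.
predicted = [0, 1, 1, 0, 1, 1, 1, 0, 0, 1]

acc, misses = score(expected, predicted)
# Misses where a non-sarcastic headline was predicted as sarcastic.
false_positives = [i for i in misses if predicted[i] == 1]
```

The single error is a false positive: a factual headline flagged as sarcastic. With only ten headlines, this is a spot check rather than a reliable estimate of real-world accuracy.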

Skills Applied:

Deepened Neural Network Knowledge: Expanded expertise in deep learning by working with LSTM networks, understanding sequential data processing, and fine-tuning hyperparameters for optimal performance.
Natural Language Processing (NLP): Applied text tokenization, stopword analysis, and word embeddings to prepare and process textual data for sarcasm detection.
Model Architecture Design: Implemented stacked LSTM layers, Dense layers, and Dropout to enhance the model’s ability to capture sarcasm in text.
Overfitting Prevention & Regularization: Experimented with dropout rates, L1 and L2 regularization, learning rates, and train/test split sizes during hyperparameter tuning to improve generalization.

End Result: Showcases the intersection of AI and linguistics, demonstrating how LSTM-based machine learning models can effectively classify sarcasm and enhance sentiment analysis in real-world text processing.