What is Market Basket Analysis?
Market Basket Analysis is a data mining technique used to identify patterns of co-occurrence among items in transactional data. It uncovers which products are frequently purchased together and is widely applied in retail for product bundling, personalized recommendations, cross-selling strategies, and inventory optimization.
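To make the idea of co-occurrence concrete, here is a minimal sketch of the three usual pair-level measures (support, confidence, and lift). The baskets and the `pair_metrics` helper are hypothetical and exist only for illustration; they are not taken from the project's data or code.

```python
from collections import Counter
from itertools import combinations

# Toy baskets (hypothetical, for illustration only)
baskets = [
    {"laptop", "mouse", "case"},
    {"laptop", "mouse"},
    {"laptop", "case"},
    {"mouse", "headphones"},
    {"laptop", "mouse", "headphones"},
]


def pair_metrics(baskets):
    """Compute support, confidence and lift for every item pair (a, b)."""
    n = len(baskets)
    # Number of baskets containing each single item
    item_counts = Counter(item for basket in baskets for item in basket)
    # Number of baskets containing each (alphabetically ordered) pair
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    metrics = {}
    for (a, b), count in pair_counts.items():
        support = count / n                      # P(a and b)
        confidence = count / item_counts[a]      # P(b | a)
        lift = support / ((item_counts[a] / n) * (item_counts[b] / n))
        metrics[(a, b)] = {
            "support": support,
            "confidence": confidence,
            "lift": lift,
        }
    return metrics


metrics = pair_metrics(baskets)
print(metrics[("laptop", "mouse")])
```

A lift above 1 suggests two items appear together more often than independence would predict. Note that these measures ignore purchase order, which is why the project itself relies on sequential pattern mining instead.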
In this project, I applied Market Basket Analysis to a context that merges data science with behavioural economics. The objective was to test and validate the Diderot Effect, a consumer behaviour bias that suggests a single purchase can trigger a cascade of related purchases. This effect has long been discussed in qualitative studies; here, I took a data-driven approach using publicly available data and sequential pattern mining.
Project Overview

Objective
The project set out to validate the Diderot Effect using transaction-like data inferred from online reviews. The specific goals were to:
- Determine whether certain products (e.g., laptops) trigger follow-up purchases
- Quantify the volume, timing, and cost of those complementary purchases
- Demonstrate how machine learning and consumer psychology can be combined to drive marketing insights
Key Technologies
> Python (Pandas, NumPy, Matplotlib, Seaborn)
> Web scraping: Selenium, BeautifulSoup
> Cloud infrastructure: Google Cloud VM, Microsoft Azure ML Studio
> Data mining: Generalized Sequential Pattern Algorithm (GSP)
> Data source: Amazon.com reviews
Context Understanding
The study was grounded in the Diderot Effect theory, which posits that acquiring a single new item can lead to a sequence of purchases to restore aesthetic or functional consistency. 
From a marketing perspective, identifying these “departure products” and their complementary sequences offers valuable insights for product recommendation, bundling, and customer segmentation.
Research question:
Can we empirically detect patterns consistent with the Diderot Effect in consumer behaviour using sequential analysis of publicly available review data?

Data Understanding
Due to the lack of access to proprietary transaction data, I used verified purchase reviews on Amazon.com as a proxy. I focused on laptops, hypothesizing that they could serve as “departure products” due to their high cost, visibility, and potential for symbolic value.
- Data source: Reviews from Amazon’s best-selling laptops
- Sample: 11,678 unique users
- Observations: 118,582 verified purchase reviews
- Time window: 6 months before and after each laptop purchase
Data Preparation
A custom scraping tool was developed to collect product reviews and metadata. 
Key tasks included:
- Filtering for verified purchases
- Extracting product categories, prices, and timestamps
- Creating derived features such as time between purchases, purchase sequence rank, and a label marking each purchase as antecedent, consequent, or laptop
- Cleaning the data: handling missing values (especially for price), resolving category inconsistencies, and filtering out invalid product links
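The feature derivation described above could look roughly like the following pandas sketch. The column names (`user_id`, `category`, `verified`, `review_date`), the sample rows, the `derive_features` helper, and the 183-day approximation of the six-month window are all assumptions made for illustration; the original scraper's schema is not shown in this write-up.

```python
import pandas as pd

# Hypothetical review records; column names are assumptions
reviews = pd.DataFrame({
    "user_id": ["u1", "u1", "u1", "u2", "u2"],
    "category": ["Mouse", "Laptop", "Case", "Laptop", "Headphones"],
    "verified": [True, True, True, True, False],
    "review_date": pd.to_datetime(
        ["2023-01-05", "2023-02-01", "2023-02-08", "2023-03-10", "2023-03-12"]
    ),
})


def derive_features(df, window_days=183):
    """Keep verified purchases and add rank, gap and role features."""
    df = df[df["verified"]].sort_values(["user_id", "review_date"]).copy()

    # Rank of each purchase within the user's history (1 = earliest)
    df["seq_rank"] = df.groupby("user_id").cumcount() + 1
    # Days since the same user's previous purchase (NaN for the first one)
    df["days_since_prev"] = df.groupby("user_id")["review_date"].diff().dt.days

    # Date of each user's first laptop purchase; drop users without one
    laptop_dates = (
        df[df["category"] == "Laptop"].groupby("user_id")["review_date"].min()
    )
    df["laptop_date"] = df["user_id"].map(laptop_dates)
    df = df[df["laptop_date"].notna()].copy()

    # Keep only purchases inside the window around the laptop purchase
    df["days_from_laptop"] = (df["review_date"] - df["laptop_date"]).dt.days
    df = df[df["days_from_laptop"].abs() <= window_days].copy()

    # Label each purchase relative to the laptop
    # (same-day non-laptop purchases are treated as consequents here)
    df["role"] = "consequent"
    df.loc[df["days_from_laptop"] < 0, "role"] = "antecedent"
    df.loc[df["category"] == "Laptop", "role"] = "laptop"
    return df.reset_index(drop=True)


features = derive_features(reviews)
print(features[["user_id", "category", "seq_rank", "days_from_laptop", "role"]])
```

Using the review date as a stand-in for the purchase date is itself an approximation, since a review can be posted some time after the item was bought.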
Modelling
To identify purchase sequences and assess the presence of the Diderot Effect, I applied the Generalized Sequential Pattern (GSP) algorithm.
- Sequences were mined across 4 levels of product taxonomy (e.g., Electronics → Laptop Accessories → Cases → Basic Cases)
- Minimum support thresholds were adjusted so that meaningful but infrequent patterns could still surface
- Metrics such as days between purchases and sequence frequency were computed
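As a rough illustration of the mining step, the sketch below implements a simplified, GSP-style level-wise search: frequent sequences of length k-1 are joined into candidates of length k, and candidates below the minimum support are pruned. It assumes one item per purchase event and omits the time constraints and itemset elements of the full GSP algorithm, so it should not be read as the project's actual implementation. The category paths, `at_level`, and `mine_sequences` are hypothetical names.

```python
def at_level(path, level):
    """Truncate a category path to its first `level` entries."""
    return " > ".join(path[:level])


def is_subsequence(pattern, sequence):
    """True if `pattern` occurs in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def support(pattern, sequences):
    """Share of sequences that contain `pattern` as a subsequence."""
    return sum(is_subsequence(pattern, s) for s in sequences) / len(sequences)


def mine_sequences(sequences, min_support, max_len=3):
    """Level-wise search for frequent sequences (simplified GSP-style)."""
    items = sorted({item for s in sequences for item in s})
    level = [(i,) for i in items if support((i,), sequences) >= min_support]
    frequent = {}
    length = 1
    while level and length <= max_len:
        for pattern in level:
            frequent[pattern] = support(pattern, sequences)
        # Join step: a and b combine when a without its first item
        # equals b without its last item
        candidates = {
            a + (b[-1],) for a in level for b in level if a[1:] == b[:-1]
        }
        # Prune step: keep only candidates that meet the support threshold
        level = sorted(
            c for c in candidates if support(c, sequences) >= min_support
        )
        length += 1
    return frequent


# Hypothetical category paths and user histories, for illustration only
LAPTOP = ("Electronics", "Computers", "Laptops")
CASE = ("Electronics", "Laptop Accessories", "Cases", "Basic Cases")
MOUSE = ("Electronics", "Computer Accessories", "Mice")
HEADPHONES = ("Electronics", "Headphones", "Over-Ear")

histories = [
    [LAPTOP, CASE, MOUSE],
    [LAPTOP, MOUSE],
    [MOUSE, LAPTOP, CASE],
    [LAPTOP, HEADPHONES],
]

# Mine at taxonomy level 3; other levels only change the truncation depth
level3 = [[at_level(p, 3) for p in history] for history in histories]
patterns = mine_sequences(level3, min_support=0.5)
for pattern, sup in patterns.items():
    print(" -> ".join(pattern), round(sup, 2))
```

Truncating the category path before mining is one simple way to run the same search at each of the four taxonomy levels; coarser levels tend to yield higher support because more purchases collapse into the same label.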
Evaluation
The results confirmed the existence of strong sequential patterns:
> Over 10% of users who purchased a laptop also bought a case, mouse, or headphones within 10 purchases of the laptop
> The first complementary item was often bought within 10 days
> Median total spending on accessories was $26, with significant outliers above $3,000
Frequent sequences included:
(Laptop → Bag)
(Laptop → Keyboard)
(Laptop → Headphones)
This supported the hypothesis that laptops can act as departure products, triggering the formation of new Diderot unities.
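Summary figures of the kind reported above (share of laptop buyers with a follow-up accessory, days until the first one, accessory spend) could be computed along these lines. This is a sketch on made-up rows: the column names, the `ACCESSORIES` set, and the `follow_up_summary` helper are assumptions, and the printed numbers describe only the toy data, not the study's results.

```python
import pandas as pd

# Hypothetical purchase records, for illustration only
purchases = pd.DataFrame({
    "user_id": ["u1", "u1", "u1", "u2", "u2", "u3"],
    "category": ["Laptop", "Case", "Mouse", "Laptop", "Headphones", "Laptop"],
    "price": [900.0, 20.0, 15.0, 1200.0, 60.0, 700.0],
    "date": pd.to_datetime([
        "2023-01-01", "2023-01-04", "2023-02-01",
        "2023-03-01", "2023-03-11", "2023-04-01",
    ]),
})

ACCESSORIES = {"Case", "Mouse", "Headphones"}  # assumed accessory categories


def follow_up_summary(df, max_rank_gap=10):
    """Summarise accessory purchases made shortly after a laptop."""
    df = df.sort_values(["user_id", "date"]).copy()
    df["rank"] = df.groupby("user_id").cumcount()

    # Position and date of each user's first laptop purchase
    laptops = (
        df[df["category"] == "Laptop"]
        .groupby("user_id")[["rank", "date"]]
        .first()
        .rename(columns={"rank": "laptop_rank", "date": "laptop_date"})
    )
    merged = df.merge(laptops, left_on="user_id", right_index=True)

    # Accessories bought after the laptop, within `max_rank_gap` purchases
    gap = merged["rank"] - merged["laptop_rank"]
    after = merged[
        merged["category"].isin(ACCESSORIES) & (gap > 0) & (gap <= max_rank_gap)
    ]

    days = (after["date"] - after["laptop_date"]).dt.days
    first_gap = days.groupby(after["user_id"]).min()
    spend = after.groupby("user_id")["price"].sum()
    return {
        "share_with_follow_up": after["user_id"].nunique() / len(laptops),
        "median_days_to_first": first_gap.median(),
        "median_accessory_spend": spend.median(),
    }


summary = follow_up_summary(purchases)
print(summary)
```

Reporting medians rather than means keeps the spend figure robust to the kind of large outliers mentioned above.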
Deployment / Communication
The insights have direct applications in marketing:
- Recommendation engines can prioritize accessories immediately after major purchases
- Bundling strategies can be tailored based on likely complementary items
- Temporal data suggests that promotional windows should target the initial 10-day post-purchase period
- Segmentation models can be enriched with behavioural bias indicators
The methodology and findings were originally documented as an academic study. This case study was written to show how behavioural economics, consumer psychology, and data science can intersect to generate measurable business value.
Why This Project Matters
This work goes beyond standard association rule mining. It’s an example of how data science can be used to validate behavioural theory, offering actionable insights for real-world marketing strategies. It also demonstrates how to creatively use public data as a proxy for transactional databases, an essential skill when working with limited access to proprietary datasets.