Email Harvester 🕵️‍♂️
📋 Overview
Email Harvester is a full‑stack web application that discovers email addresses under a given domain, checks each one against public data‑breach APIs to flag compromised accounts, and presents the results in a simple, interactive interface. It combines a Flask‑powered Python backend for scraping and breach lookups with a Node.js/EJS frontend, all orchestrated via Docker Compose for one‑step deployment.
🎯 Key Features
- Domain email scraping: Crawl websites and public resources to extract email addresses.
- Asynchronous processing: Backend tasks run in parallel for high throughput.
- Secure vs. predicted: Classify emails found directly (“secure”) versus those inferred by pattern (“predicted”).
- Breach verification: Integrate with HaveIBeenPwned (or similar) to highlight compromised addresses.
- Intuitive UI: Built with Node.js and EJS templates for real‑time feedback.
- Persistent storage: Store results in MongoDB for later analysis and export.
- Containerized deployment: A single Docker Compose command brings up all services (backend, frontend, database).
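To make the scraping and "secure" vs. "predicted" features more concrete, here is a minimal Python sketch of how addresses might be extracted from page text and tagged. It is an illustration under assumptions, not the project's actual code: the regular expression, the function names (`extract_emails`, `predict_emails`, `classify`) and the `first.last@domain` pattern are all hypothetical.

```python
import re

# Simplified pattern (assumption); real address syntax (RFC 5322) is broader.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text, domain):
    """Return lowercase addresses in `text` under `domain` or one of its subdomains."""
    domain = domain.lower()
    found = set()
    for match in EMAIL_RE.findall(text):
        address = match.lower()
        host = address.rsplit("@", 1)[1]
        if host == domain or host.endswith("." + domain):
            found.add(address)
    return found


def predict_emails(names, domain):
    """Guess addresses as first.last@domain (one hypothetical naming pattern)."""
    return {f"{first}.{last}@{domain}".lower() for first, last in names}


def classify(found, predicted):
    """Tag each address: 'secure' if seen directly, 'predicted' if only inferred."""
    tags = {address: "predicted" for address in predicted}
    # Directly observed addresses override any prediction for the same address.
    tags.update({address: "secure" for address in found})
    return tags


page = "Contact alice.smith@example.com or BOB@mail.example.com; not eve@other.org"
found = extract_emails(page, "example.com")
predicted = predict_emails([("Alice", "Smith"), ("Carol", "Jones")], "example.com")
tags = classify(found, predicted)
```

In this sketch an address that is both scraped and predicted is tagged "secure", since direct evidence is stronger than a pattern guess.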
🏗️ Architecture
- Backend (Flask/Python)
  - Exposes REST endpoints to initiate domain scans, retrieve stored results and perform breach checks.
  - Utilizes asynchronous HTTP requests and background workers to handle large crawls.
- Frontend (Node.js/EJS)
  - Renders search form and results pages.
  - Interacts with the backend API to start scans and fetch updates.
- Database (MongoDB)
  - Persists scraped email addresses, classification tags and breach‑status metadata.
- Orchestration
  - Docker Compose defines three services: backend, frontend and mongodb, ensuring consistent environments and easy scaling.
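The backend's asynchronous breach checks could look roughly like the Python sketch below. It assumes the HaveIBeenPwned v3 `breachedaccount` endpoint, which expects an `hibp-api-key` header and a user agent. The helper names (`build_hibp_request`, `check_all`), the user-agent string and the concurrency limit are hypothetical and not taken from the project. The checker is passed in as a parameter, so the concurrency logic can run without network access.

```python
import asyncio
import urllib.parse
import urllib.request

# HaveIBeenPwned v3 endpoint for looking up a single account.
HIBP_URL = "https://haveibeenpwned.com/api/v3/breachedaccount/"


def build_hibp_request(email, api_key):
    """Build (but do not send) a breach-lookup request for one address."""
    account = urllib.parse.quote(email, safe="")  # '@' becomes '%40'
    return urllib.request.Request(
        HIBP_URL + account,
        headers={
            "hibp-api-key": api_key,
            "User-Agent": "emailHarvester-sketch",  # hypothetical client name
        },
    )


async def check_all(emails, check, limit=5):
    """Run `check(email)` for every address, at most `limit` at a time."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(email):
        async with semaphore:
            return email, await check(email)

    pairs = await asyncio.gather(*(guarded(email) for email in emails))
    return dict(pairs)


async def fake_check(email):
    """Stand-in checker so the example runs offline."""
    await asyncio.sleep(0)
    return email.startswith("a")


request = build_hibp_request("alice@example.com", "dummy-key")
result = asyncio.run(
    check_all(["alice@example.com", "bob@example.com"], fake_check, limit=1)
)
```

A real checker might send the request with `urllib.request.urlopen` inside `asyncio.to_thread`, treating an HTTP 404 as "not found in any breach". The semaphore keeps the number of in-flight lookups bounded, which matters because the breach API is rate limited.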
🔧 Prerequisites
- Git, to clone the repository.
- Docker and Docker Compose, to build and run the services.
🚀 Getting Started
- Clone the repo
    git clone https://github.com/AlBovo/emailHarvester.git
    cd emailHarvester
- Configure
  - Copy .env.example to .env and set your MongoDB URI and breach‑API key.
- Build & run
    docker-compose up -d --build
- Access the app
  Open http://localhost:3000 in your browser.
⚙️ Usage
- Enter the target domain in the search field.
- Click “Harvest” to start scraping and breach verification.
- View the results table, with each email’s classification and breach status.
- Export or clear results as needed via the UI controls.