AI Data Scraper
Minimal AI web scraper
Year
2025
Type of Project
Web scraping & data preprocessing utility
My Role
Frontend Engineer & AI Integrator

Case Study
Objective
Create a lightweight, browser-based tool to responsibly scrape and preprocess website text for AI training, supporting both raw text extraction and structured JSON output driven by custom prompts. Provide non-technical users with an easy way to batch-process multiple URLs and export AI-ready text or JSON data while enforcing basic ethical checks.
https://github.com/ujjwalredd?tab=repositories
Process
Implemented a React + TypeScript single-page application with Tailwind CSS for a clean, responsive UI that works on desktop and mobile.
Integrated the Google Gemini API to transform scraped content into either cleaned text or structured JSON, based on user-defined prompts.
Added dual modes (Text and JSON), batch URL input via textarea or .txt upload, concurrent processing for multiple URLs, and real-time status display per URL.
Built utilities to copy results to clipboard or download them as .txt/.json files, and wired an initial copyright “respect-first” check that blocks scraping when restrictions are detected.
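To make the batch input and concurrent processing steps more concrete, here is a minimal TypeScript sketch. It is not the project's actual code: the names `UrlResult`, `parseUrlList`, and `processAll`, the status values, and the concurrency limit are all assumptions chosen for illustration, and the `worker` callback stands in for whatever fetches a page and sends it to Gemini.

```typescript
// Hypothetical status values for a per-URL status display.
type UrlStatus = "pending" | "processing" | "done" | "error";

interface UrlResult {
  url: string;
  status: UrlStatus;
  output?: string;
  error?: string;
}

// Split textarea or .txt content into unique http(s) URLs, one per line.
function parseUrlList(raw: string): string[] {
  const seen = new Set<string>();
  for (const line of raw.split(/\r?\n/)) {
    const candidate = line.trim();
    if (!candidate) continue;
    try {
      const parsed = new URL(candidate);
      if (parsed.protocol === "http:" || parsed.protocol === "https:") {
        seen.add(parsed.href);
      }
    } catch (_err) {
      // Not a parseable URL; skip the line.
    }
  }
  return Array.from(seen);
}

// Run `worker` over all URLs with at most `limit` requests in flight,
// calling `onUpdate` whenever a URL changes status.
async function processAll(
  urls: string[],
  worker: (url: string) => Promise<string>,
  limit: number,
  onUpdate: (result: UrlResult) => void = () => {},
): Promise<UrlResult[]> {
  const results: UrlResult[] = urls.map(
    (url): UrlResult => ({ url, status: "pending" }),
  );
  let next = 0;

  // Each lane pulls the next unclaimed index until the list is exhausted.
  async function lane(): Promise<void> {
    while (next < urls.length) {
      const index = next++;
      const url = urls[index];
      results[index] = { url, status: "processing" };
      onUpdate(results[index]);
      try {
        const output = await worker(url);
        results[index] = { url, status: "done", output };
      } catch (err) {
        const message = err instanceof Error ? err.message : String(err);
        results[index] = { url, status: "error", error: message };
      }
      onUpdate(results[index]);
    }
  }

  const laneCount = Math.max(1, Math.min(limit, urls.length));
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}
```

In a React component, `onUpdate` could be wired to a state setter so each row re-renders as its status changes. A failure on one URL is recorded on that row only, so the rest of the batch keeps running.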
Outcome
Delivered an interactive web app that can process multiple URLs at once and return AI-ready text or structured JSON suitable for building training corpora or small domain-specific datasets.
Improved data collection workflows by combining scraping, prompt-based structuring, and export options into a single, minimal interface requiring only a browser and an API key.
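As a rough illustration of the export step, the sketch below builds the contents of a downloadable .txt or .json file from a set of results. The names `buildExport`, `ScrapeOutput`, and `ExportFile`, the default file name, and the record layout are assumptions made for this example and are not taken from the project.

```typescript
type Mode = "text" | "json";

// Hypothetical shape of one processed URL.
interface ScrapeOutput {
  url: string;
  output: string;
}

// Everything a download helper needs to create the file.
interface ExportFile {
  filename: string;
  mimeType: string;
  content: string;
}

function buildExport(
  results: ScrapeOutput[],
  mode: Mode,
  baseName: string = "scrape-results",
): ExportFile {
  if (mode === "json") {
    const records = results.map(({ url, output }) => {
      let data: unknown;
      try {
        // Keep structured output as real JSON where the model returned it.
        data = JSON.parse(output);
      } catch (_err) {
        // Fall back to the raw string if the output is not valid JSON.
        data = output;
      }
      return { url, data };
    });
    return {
      filename: `${baseName}.json`,
      mimeType: "application/json",
      content: JSON.stringify(records, null, 2),
    };
  }

  // Text mode: one block per URL, separated by a blank line.
  const content = results
    .map(({ url, output }) => `# ${url}\n${output.trim()}`)
    .join("\n\n");
  return { filename: `${baseName}.txt`, mimeType: "text/plain", content };
}
```

In the browser, the returned `content` and `mimeType` could be wrapped in a `Blob` and offered through a temporary link created with `URL.createObjectURL`, or passed to `navigator.clipboard.writeText` for the copy action.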
Standout Features
Dual-mode, prompt-driven extraction
Batch Processing
Ethical Copyright Check
Concurrent Processing
Download & Copy
Clean & Minimalist UI
Responsive Design