SchemaScribe-AI
Link to open source: https://github.com/Anoop-singh225/SchemaScribe-AI
Link to Live Project: https://schemascribeai.netlify.app/
Manual data profiling, messy schemas, and LLM math hallucinations make data analysis a tedious and error-prone process for data teams. SchemaScribe-AI solves this by introducing an enterprise-grade, agentic approach to data management and analytics.
Instead of overwhelming the LLM with millions of raw rows (which exceeds token limits and invites hallucinated logic), SchemaScribe-AI acts as an Intelligent Agentic Planner. It automatically profiles large datasets, generates AI-powered data dictionaries, imputes missing values, and renders multi-file Entity-Relationship Diagrams (ERDs) using semantic key matching.
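To make the key-matching idea concrete, here is a minimal sketch of how primary/foreign key candidates might be inferred and turned into Mermaid ERD text. It is not taken from the SchemaScribe-AI codebase: the function names `infer_relationships` and `to_mermaid_erd` are hypothetical, and exact column-name matching plus value containment stands in for whatever semantic matching the project actually uses.

```python
import pandas as pd


def infer_relationships(tables: dict[str, pd.DataFrame]) -> list[tuple[str, str, str]]:
    """Return (parent_table, child_table, column) triples.

    Assumed heuristic (not the project's actual rule): a column is a
    primary-key candidate if it has no nulls and all values are unique; a
    same-named column in another table is treated as a foreign key if every
    non-null value it holds appears in that primary key.
    """
    relationships = []
    for parent_name, parent in tables.items():
        for column in parent.columns:
            keys = parent[column]
            if keys.isna().any() or not keys.is_unique:
                continue  # not a primary-key candidate
            for child_name, child in tables.items():
                if child_name == parent_name or column not in child.columns:
                    continue
                values = child[column].dropna()
                if len(values) > 0 and values.isin(keys).all():
                    relationships.append((parent_name, child_name, column))
    return relationships


def to_mermaid_erd(relationships: list[tuple[str, str, str]]) -> str:
    """Render the triples as Mermaid erDiagram text (one-to-many edges)."""
    lines = ["erDiagram"]
    for parent, child, column in relationships:
        lines.append(f'    {parent} ||--o{{ {child} : "{column}"')
    return "\n".join(lines)


# Tiny illustrative tables (made-up data).
customers = pd.DataFrame({"customer_id": [1, 2, 3], "name": ["Ana", "Ben", "Cy"]})
orders = pd.DataFrame(
    {"order_id": [10, 11, 12], "customer_id": [1, 1, 2], "amount": [9.5, 20.0, 7.25]}
)

relationships = infer_relationships({"customers": customers, "orders": orders})
erd = to_mermaid_erd(relationships)
print(erd)
```

The resulting text can be passed to Mermaid.js for rendering. A real matcher would likely also need to handle differently named columns (for example `id` versus `customer_id`), which this sketch does not attempt.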
With its built-in secure Agentic Sandbox, users can chat with their datasets in plain natural language. The AI translates each query into vectorized Python/Pandas code, executes it on the backend, and returns business insights and analytical charts whose figures are computed deterministically by that code rather than generated by the LLM.
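The execute-then-answer flow described above could look roughly like the sketch below. The names `run_generated_code` and `SAFE_BUILTINS`, and the convention that generated code assigns its answer to `result`, are assumptions for illustration and not the project's actual interface. Restricting builtins with `exec` is also not a real security boundary on its own; a production sandbox would typically add process or container isolation.

```python
import pandas as pd

# Deliberately small allow-list of builtins exposed to generated code
# (hypothetical; chosen only for this example).
SAFE_BUILTINS = {
    "len": len, "min": min, "max": max, "sum": sum,
    "round": round, "abs": abs, "sorted": sorted,
}


def run_generated_code(code: str, df: pd.DataFrame):
    """Execute model-generated pandas code and return its `result` variable."""
    # The code sees only `pd`, a copy of the data, and the allow-listed builtins.
    namespace = {"__builtins__": SAFE_BUILTINS, "pd": pd, "df": df.copy()}
    exec(code, namespace)
    if "result" not in namespace:
        raise ValueError("generated code must assign its answer to `result`")
    return namespace["result"]


# Made-up data and a made-up "generated" snippet standing in for LLM output.
sales = pd.DataFrame(
    {"region": ["north", "south", "north"], "revenue": [120.0, 80.0, 30.0]}
)
generated = "result = df.groupby('region')['revenue'].sum().to_dict()"

answer = run_generated_code(generated, sales)
print(answer)
```

Because the totals come from pandas rather than from the model's text output, the arithmetic is as reliable as the generated code is correct; the model can still write the wrong query, so the code itself remains worth inspecting.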
Key Achievements / Highlights:
- 100% Accurate Analytics: The vectorized data engine prevents LLM math hallucinations by executing deterministic Python code in a secure sandbox.
- Semantic Multi-File Mapping: Automatic primary/foreign key detection renders interactive Mermaid.js ERDs.
- Smart Data Studio: One-click IQR-based outlier mitigation and missing-data profiling.
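As an illustration of the Smart Data Studio item, the following sketch shows one common way to profile missing values and mitigate outliers with the IQR rule. The helper names `missing_profile` and `clip_outliers_iqr` are hypothetical, the multiplier `k = 1.5` is the conventional Tukey fence rather than a value confirmed by the project, and clipping is only one possible mitigation (the project may drop or replace outliers instead).

```python
import pandas as pd


def missing_profile(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column count and share of missing values."""
    missing = df.isna().sum()
    return pd.DataFrame({"missing": missing, "share": missing / len(df)})


def clip_outliers_iqr(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR] to the nearest fence."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return series.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)


# Made-up column with one extreme value and one missing value.
prices = pd.DataFrame({"price": [10.0, 12.0, 11.0, 13.0, 500.0, None]})

profile = missing_profile(prices)
cleaned = clip_outliers_iqr(prices["price"])
print(profile)
print(cleaned)
```

Here the quartiles are 11 and 13, so the fences are 8 and 16; the value 500 is pulled down to the upper fence while the missing entry is left untouched for a separate imputation step.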
This build was submitted as a hackathon project.



