DataSage AI - Intelligent Metadata & Data Quality Copilot
Open-source repository: https://github.com/kunals2007-jpg/DataSage-AI.git
Overview
DataSage AI is an AI-powered metadata intelligence platform that automatically extracts database schema, evaluates data quality, and generates business-friendly documentation. It bridges the gap between technical database structures and business understanding.
In many organizations, database documentation is outdated or too technical, making it difficult for business users to understand what data means and how reliable it is. DataSage AI solves this by combining automated profiling with AI-driven interpretation and conversational querying.
Problem We Solve
Enterprise databases often lack:
- Updated documentation
- Business context for technical schema
- Clear data reliability indicators
- Easy accessibility for non-technical users
This leads to slow analytics, low data trust, and inefficient decision-making.
Our Solution
DataSage AI automatically:
- Connects to PostgreSQL databases
- Extracts complete schema metadata (tables, columns, keys)
- Performs intelligent data profiling (null %, uniqueness, DQ score)
- Generates AI-powered business summaries
- Enables natural language chat with the database schema
- Generates SQL queries from user questions
We don't just document databases; we make them understandable.
Key Features
🔹 Automated Metadata Extraction
Extracts tables, columns, primary keys, and relationships automatically.
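As a rough illustration of what this extraction step produces, the sketch below walks a database catalog and collects columns, primary keys, and foreign-key relationships per table. It is not the project's code: to stay runnable without a PostgreSQL server it uses Python's built-in `sqlite3` module and SQLite `PRAGMA` statements as a stand-in, whereas the project targets PostgreSQL (presumably through SQLAlchemy). The function name `extract_schema`, the output layout, and the sample tables are assumptions made for this example.

```python
import sqlite3


def extract_schema(conn: sqlite3.Connection) -> dict:
    """Return {table: {"columns": [...], "primary_key": [...], "foreign_keys": [...]}}."""
    schema = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        cols = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        # PRAGMA foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
        fks = conn.execute(f'PRAGMA foreign_key_list("{table}")').fetchall()
        schema[table] = {
            "columns": [
                {"name": c[1], "type": c[2], "nullable": not c[3]} for c in cols
            ],
            # pk > 0 gives the column's position within the primary key
            "primary_key": [c[1] for c in sorted(cols, key=lambda c: c[5]) if c[5] > 0],
            "foreign_keys": [
                {"column": fk[3], "ref_table": fk[2], "ref_column": fk[4]} for fk in fks
            ],
        }
    return schema


# Hypothetical sample tables, only to exercise the function.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        amount REAL
    );
    """
)
schema = extract_schema(conn)
print(schema["orders"]["foreign_keys"])
```

The nested dictionary is one convenient shape for the later steps (profiling, documentation, chat), since each of them needs table and column names plus key relationships.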
🔹 Intelligent Data Profiling
Calculates:
- Null percentage
- Distinct count
- Completeness
- Uniqueness
- Data Quality Score
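The metrics above can be sketched for a single column as follows. This is a dependency-free illustration in plain Python rather than the project's Pandas-based profiling, and the exact definitions are assumptions: completeness is taken as the share of non-null values, uniqueness as distinct values over non-null values, and the Data Quality Score as the plain average of the two. The README does not state the real scoring formula, so treat `dq_score` here as a placeholder.

```python
def profile_column(values: list) -> dict:
    """Profile one column given as a list of Python values (None marks a null)."""
    total = len(values)
    non_null = [v for v in values if v is not None]
    distinct = len(set(non_null))

    null_pct = 100.0 * (total - len(non_null)) / total if total else 0.0
    completeness = 100.0 - null_pct if total else 0.0
    uniqueness = 100.0 * distinct / len(non_null) if non_null else 0.0

    # ASSUMPTION: the score is the unweighted mean of completeness and
    # uniqueness; the project's actual formula is not documented here.
    dq_score = (completeness + uniqueness) / 2

    return {
        "null_pct": round(null_pct, 2),
        "distinct_count": distinct,
        "completeness": round(completeness, 2),
        "uniqueness": round(uniqueness, 2),
        "dq_score": round(dq_score, 2),
    }


# Hypothetical column: five rows, one null, one repeated value.
result = profile_column([10, 20, 20, None, 30])
print(result)
```

An unweighted mean penalizes low-cardinality columns such as status flags, so a real score would likely weight the metrics per column role (key, category, free text).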
🔹 AI-Generated Business Documentation
Converts technical schema into:
- Business summary
- Use cases
- Risk insights
- Data quality recommendations
🔹 Conversational Schema Intelligence
Users can ask:
- "Which table contains revenue?"
- "Is customer data reliable?"
- "Which columns have high null values?"
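Questions like these are typically answered by first retrieving the schema descriptions most relevant to the question and then passing them to the language model. The sketch below shows only the retrieval idea, with a toy bag-of-words cosine similarity standing in for the embedding-based vector search that ChromaDB provides in the project's stack. The table descriptions and the helper names (`tokenize`, `cosine`, `retrieve`) are made up for this example.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lowercase the text and count alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question: str, docs: dict[str, str], k: int = 1) -> list[str]:
    """Return the names of the k descriptions most similar to the question."""
    q = tokenize(question)
    ranked = sorted(docs, key=lambda name: cosine(q, tokenize(docs[name])), reverse=True)
    return ranked[:k]


# Hypothetical per-table descriptions, e.g. produced by the documentation step.
docs = {
    "orders": "orders table: order id, customer id, revenue amount, order date",
    "customers": "customers table: customer id, name, email, signup date",
    "products": "products table: product id, title, category, unit price",
}
top = retrieve("Which table contains revenue?", docs)
print(top)
```

Word overlap misses synonyms ("sales" vs. "revenue"), which is the main reason to use embeddings and a vector store for the real retrieval step.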
🔹 SQL Generation
Converts a natural-language question into a PostgreSQL query.
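One way this step could be wired up is to build a prompt from the extracted schema and the user's question, then check the model's answer before running it. The sketch below covers only those two pieces and makes no call to the Gemini API. The prompt wording, the function names `build_sql_prompt` and `is_read_only`, and the idea of restricting output to read-only statements are assumptions for illustration, not documented behavior of DataSage AI.

```python
import re


def build_sql_prompt(question: str, schema: dict[str, list[str]]) -> str:
    """Assemble an LLM prompt from table/column names and the user's question."""
    lines = [f"- {table}({', '.join(cols)})" for table, cols in schema.items()]
    return (
        "You are a PostgreSQL assistant. Using only the tables below, "
        "write one SELECT statement that answers the question.\n"
        "Schema:\n" + "\n".join(lines) + f"\nQuestion: {question}\nSQL:"
    )


# Keywords that should never appear in a read-only query (heuristic list).
_FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|truncate|create|grant)\b", re.IGNORECASE
)


def is_read_only(sql: str) -> bool:
    """Heuristic check that the text is a single SELECT (or WITH ... SELECT)."""
    statement = sql.strip().rstrip(";").strip()
    if ";" in statement:  # more than one statement
        return False
    if not re.match(r"(?i)^(select|with)\b", statement):
        return False
    return not _FORBIDDEN.search(statement)


prompt = build_sql_prompt(
    "Which table contains revenue?", {"orders": ["id", "customer_id", "amount"]}
)
print(prompt)
print(is_read_only("SELECT sum(amount) FROM orders;"))
```

The keyword check is deliberately conservative: it also rejects harmless queries that mention a word such as "update" in a string literal or column name. Running generated SQL under a database role with read-only privileges is a sturdier safeguard than text matching.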
Tech Stack
Backend:
- Flask
- SQLAlchemy
- Pandas
AI Layer:
- Gemini API
- RAG (Vector Search using ChromaDB)
Database:
- PostgreSQL
Frontend:
- Bootstrap Dashboard
Future Scope
- Multi-database support (Snowflake, SQL Server)
- Real-time schema monitoring
- Data lineage visualization
- Enterprise SaaS deployment
- Role-based access control
- ML-based anomaly detection
Impact
DataSage AI:
- Reduces manual documentation effort by up to 80%
- Improves data trust across teams
- Empowers non-technical users
- Accelerates analytics workflows
This project has strong potential to evolve into a scalable enterprise data intelligence SaaS platform.
This build was uploaded as a hackathon project.







