DataLoom Intelligent Data Dictionary
Open-source repository: https://github.com/Happy829/DataLoom-Intelligent-Data-Dictionary
🌐 DataLoom: Intelligent Data Dictionary Agent
Grounded AI. Zero Hallucinations. Fact-Based Business Context. Built by Team BrainBots
DataLoom is a software-only, AI-powered platform designed to eliminate the enterprise data bottleneck. By connecting directly to your existing relational databases (Snowflake, PostgreSQL, SQL Server), DataLoom automatically generates, maintains, and serves comprehensive, business-friendly data dictionaries. It bridges the gap between cryptic technical schemas and actionable business intelligence.
⚠️ The Core Problem
Modern data teams are paralyzed by undocumented systems and silent failures. DataLoom targets these critical bottlenecks:
- The "Tribal Knowledge" Trap: Cryptic schemas make data discoverability a nightmare, forcing analysts to rely on word-of-mouth to understand if "Revenue" includes freight costs.
- The Data Engineering Bottleneck: Non-technical users (Marketing, Sales) cannot write SQL, creating endless request backlogs.
- Complex Lineage Labyrinths: Answering simple questions requires traversing undocumented foreign keys and complex joins.
- Silent Data Quality Failures: Dashboards fail quietly when underlying data degrades (e.g., unexpected nulls).
- Schema Drift: Manual documentation becomes instantly obsolete the moment a database updates.
💡 The Solution: Hybrid Contextualization Architecture
DataLoom solves these issues through a unique four-step "Thought Process" that guarantees zero schema hallucination by grounding Generative AI in empirical statistical data.
The 4-Step Logic Flow
- Discovery (The "What"): Scans the database catalog via our SQLAlchemy Universal Connector to map raw structures (tables, columns, explicit and inferred keys).
- Profiling (The "Health"): Queries the data to calculate mathematical distributions (min/max, null percentage, distinct count, variance).
- Inference (The "Why"): The Hybrid AI Context Engine merges the schema "skeleton" with the profiling "health" to generate the "flesh": accurate, context-aware business descriptions.
- Interaction (The "How"): Exposes this knowledge via a conversational UI (RAG), translating natural language directly into accurate SQL lookups.
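The first three steps can be illustrated with a minimal sketch. This is not DataLoom's actual implementation: it uses Python's built-in `sqlite3` module as a dependency-free stand-in for the SQLAlchemy connector, and the function names (`discover`, `profile`, `build_context`) and the sample tables are hypothetical. It shows how catalog metadata and measured statistics could be merged into a fact-only context string that is handed to a language model.

```python
import sqlite3

def discover(conn):
    """Step 1 (Discovery): read tables, columns, and declared foreign keys."""
    schema = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = [
            {"name": row[1], "type": row[2], "primary_key": bool(row[5])}
            for row in conn.execute(f'PRAGMA table_info("{table}")')
        ]
        # PRAGMA foreign_key_list rows: (id, seq, table, from, to, ...)
        foreign_keys = [
            {"column": row[3], "ref_table": row[2], "ref_column": row[4]}
            for row in conn.execute(f'PRAGMA foreign_key_list("{table}")')
        ]
        schema[table] = {"columns": columns, "foreign_keys": foreign_keys}
    return schema

def profile(conn, table, column):
    """Step 2 (Profiling): compute basic statistics for one column."""
    total, non_null, distinct, lo, hi = conn.execute(
        f'SELECT COUNT(*), COUNT("{column}"), COUNT(DISTINCT "{column}"), '
        f'MIN("{column}"), MAX("{column}") FROM "{table}"'
    ).fetchone()
    null_pct = 0.0 if total == 0 else round(100.0 * (total - non_null) / total, 2)
    return {"rows": total, "null_pct": null_pct, "distinct": distinct,
            "min": lo, "max": hi}

def build_context(schema, stats):
    """Step 3 input: merge schema and statistics into grounded, fact-only lines."""
    lines = []
    for table, meta in schema.items():
        for col in meta["columns"]:
            s = stats[(table, col["name"])]
            lines.append(
                f"{table}.{col['name']} ({col['type']}): "
                f"null_pct={s['null_pct']}, distinct={s['distinct']}, "
                f"min={s['min']}, max={s['max']}"
            )
    return "\n".join(lines)

# Hypothetical sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    freight REAL
);
INSERT INTO customers VALUES (1, 'São Paulo'), (2, 'Recife');
INSERT INTO orders VALUES (1, 1, 12.5), (2, 1, NULL), (3, 2, 30.0), (4, 2, 12.5);
""")

schema = discover(conn)
stats = {
    (table, col["name"]): profile(conn, table, col["name"])
    for table, meta in schema.items()
    for col in meta["columns"]
}
context = build_context(schema, stats)
print(context)
```

In a setup like this, the model would only be asked to describe columns that appear in `context`, so every generated description can be traced back to a catalog entry and a measured statistic.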
✨ Key Features
- Automated Contextualization: Instantly generates clear, business-friendly descriptions for tables and columns. (Look for the Caribbean Green "AI Generated" badges.)
- Conversational SQL Generation: Allows non-technical users to ask questions in plain English (e.g., "What is the average delivery time for electronics in São Paulo?") and outputs executable SQL.
- Continuous Health Profiling: Monitors completeness scores, freshness, and data distributions to flag anomalies before they break downstream reports.
- Visual Data Lineage: Provides intuitive, interactive node-based relationship diagrams mapped out with React Flow.
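One way to keep generated SQL grounded in the real schema is to compile each candidate statement against an empty copy of that schema before showing it to the user. The sketch below is an assumption about how such a check could work, not a description of DataLoom's code: `validate_sql` and the `orders` table are hypothetical, and SQLite stands in for the target database, so a real deployment would need to validate against its own dialect.

```python
import sqlite3

# Hypothetical schema used only for this illustration.
SCHEMA_DDL = """
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    category TEXT,
    city TEXT,
    delivery_days REAL
);
"""

def validate_sql(schema_ddl, candidate_sql):
    """Return (True, None) if the SELECT compiles against the schema,
    otherwise (False, reason). No user data is read: the scratch database
    contains only empty tables."""
    if not candidate_sql.lstrip().lower().startswith("select"):
        return False, "only SELECT statements are allowed"
    scratch = sqlite3.connect(":memory:")
    try:
        scratch.executescript(schema_ddl)
        # EXPLAIN prepares the statement, so unknown tables or columns
        # raise an error without the query itself being run.
        scratch.execute("EXPLAIN " + candidate_sql)
        return True, None
    except (sqlite3.Error, sqlite3.Warning) as exc:
        return False, str(exc)
    finally:
        scratch.close()

good = validate_sql(
    SCHEMA_DDL,
    "SELECT AVG(delivery_days) FROM orders "
    "WHERE category = 'electronics' AND city = 'São Paulo'",
)
bad = validate_sql(SCHEMA_DDL, "SELECT AVG(shipping_days) FROM orders")
print(good)
print(bad)
```

A statement that references a column the catalog does not contain is rejected with the database's own error message, which could then be fed back to the model for a retry.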
This build was uploaded as a hackathon project.








