Feb 22, 2026

DataSense AI - Intelligent Data Dictionary Agent

datadictionary devcommunity dataquality aichatbot gdprcompliance hackfest2 piidetection

What It Does:
DataSense AI automatically generates comprehensive data dictionaries from databases and datasets using a three-agent AI pipeline. Upload a CSV or connect a PostgreSQL/MySQL database, and within 60 seconds you receive complete documentation: field descriptions, data type detection, relationship mapping, quality scores, and PII identification, all without manual effort.

How It Helps Others:

  • Data Analysts spend 1-2 weeks manually documenting each new database they encounter. DataSense reduces this to 60 seconds, letting them focus on actual analysis instead of writing field definitions.
  • Data Engineers onboarding to legacy systems waste hours deciphering undocumented schemas. DataSense provides instant context with AI-generated business descriptions and relationship diagrams.
  • Compliance Teams struggle to identify PII across hundreds of database fields. DataSense automatically flags sensitive data (emails, SSNs, addresses) with GDPR risk levels, enabling instant compliance audits.
  • Team Leads can't enforce documentation standards across projects. DataSense ensures every dataset has consistent, high-quality documentation with quality scoring and export capabilities.

Key Features

1. Three-Agent AI Pipeline
Three agents work in sequence: Atlas analyzes schemas and detects relationships, Sage generates business descriptions using Llama 3.3 70B, and Guardian evaluates data quality and assigns confidence scores. Together they produce complete documentation in under 60 seconds regardless of database size, because only schema metadata and small row samples are processed.
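The hand-off between the three agents can be pictured with a minimal Python sketch. The write-up does not publish code, so everything here is an assumption: the dataclasses, the `_id` naming heuristic Atlas uses to guess relationships, the `describe` callable that stands in for the Llama 3.3 70B call inside Sage, and Guardian's placeholder confidence rule are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ColumnDoc:
    table: str
    name: str
    dtype: str
    description: str = ""
    confidence: float = 0.0


@dataclass
class TableSchema:
    name: str
    columns: dict  # column name -> declared SQL type


def atlas(tables):
    """Schema agent (sketch): flatten columns and guess foreign keys from names."""
    table_names = {t.name for t in tables}
    docs, relationships = [], []
    for t in tables:
        for col, dtype in t.columns.items():
            docs.append(ColumnDoc(t.name, col, dtype))
            # Hypothetical heuristic: "customer_id" points at "customer" or "customers".
            if col.endswith("_id"):
                base = col[: -len("_id")]
                for target in (base, base + "s"):
                    if target in table_names and target != t.name:
                        relationships.append((t.name, col, target))
    return docs, relationships


def sage(docs, describe: Callable[[ColumnDoc], str]):
    """Description agent (sketch): `describe` stands in for the LLM call."""
    for d in docs:
        d.description = describe(d)
    return docs


def guardian(docs):
    """Quality agent (sketch): placeholder confidence rule, not the project's scoring."""
    for d in docs:
        echo = d.name.replace("_", " ").lower()
        if not d.description:
            d.confidence = 0.0
        elif d.description.strip().lower() == echo:
            d.confidence = 0.5  # description merely restates the column name
        else:
            d.confidence = 0.9
    return docs


def run_pipeline(tables, describe):
    docs, relationships = atlas(tables)
    return guardian(sage(docs, describe)), relationships


tables = [
    TableSchema("customers", {"id": "integer", "email": "text"}),
    TableSchema("orders", {"id": "integer", "customer_id": "integer", "total": "numeric"}),
]
docs, rels = run_pipeline(tables, lambda c: f"{c.name.replace('_', ' ')} of {c.table}")
print(rels)  # [('orders', 'customer_id', 'customers')]
```

Passing the model call in as a function keeps the sketch runnable offline; in a real deployment `describe` would wrap whatever LLM client the project actually uses.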

2. Automated PII Detection & GDPR Compliance
Identifies 11 types of sensitive data, such as email addresses, phone numbers, SSNs, and postal addresses, assigns each a risk level and GDPR category, and generates exportable compliance reports with visual warnings.
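A rough sketch of how sampled column values might be matched against PII rules, assuming a regex-per-type design. Only three of the 11 types are shown, and the patterns, risk levels, and GDPR category labels are illustrative choices, not the project's actual rule set. Free-text types such as postal addresses usually need column-name hints or a classifier, which this sketch leaves out.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class PiiRule:
    pii_type: str
    pattern: "re.Pattern[str]"
    risk: str
    gdpr_category: str


# Illustrative rules only. Order matters: the phone pattern is deliberately loose
# and would also match SSN-formatted strings (and dates or numeric IDs).
RULES = [
    PiiRule("email", re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)+"), "high", "contact data"),
    PiiRule("ssn", re.compile(r"\d{3}-\d{2}-\d{4}"), "critical", "national identifier"),
    PiiRule("phone", re.compile(r"\+?[\d\s().-]{7,}"), "medium", "contact data"),
]


def detect_pii(values, min_match_rate=0.8):
    """Return the first rule that fully matches at least `min_match_rate` of non-empty samples."""
    samples = [str(v).strip() for v in values if v is not None and str(v).strip()]
    if not samples:
        return None
    for rule in RULES:
        hits = sum(1 for s in samples if rule.pattern.fullmatch(s))
        if hits / len(samples) >= min_match_rate:
            return rule
    return None


print(detect_pii(["ana@example.com", "bo@test.org", None]).pii_type)  # email
```

Scoring a whole column against a match-rate threshold, rather than flagging on a single hit, is one simple way to keep stray values from marking a column as sensitive.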

3. Intelligent Quality Scoring & Anomaly Detection
Calculates field quality scores using null rates, uniqueness, and data patterns while automatically flagging anomalies like low uniqueness or primary key violations with clear visual indicators.
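One way to turn null rate, uniqueness, and type consistency into a 0-100 score, with the two anomaly checks named above. This is a sketch under stated assumptions: the 60/40 weighting, the 20-point penalty per anomaly, and the 5% low-uniqueness threshold are hypothetical values chosen for illustration, and values are assumed to be hashable scalars.

```python
from collections import Counter


def profile_column(values, is_primary_key=False, low_uniqueness=0.05):
    """Score one column 0-100 and list anomalies (weights and thresholds are illustrative)."""
    total = len(values)
    non_null = [v for v in values if v is not None]
    completeness = len(non_null) / total if total else 0.0
    distinct = len(set(non_null))
    uniqueness = distinct / len(non_null) if non_null else 0.0
    # Pattern consistency: share of values whose Python type matches the most common one.
    type_counts = Counter(type(v).__name__ for v in non_null)
    consistency = type_counts.most_common(1)[0][1] / len(non_null) if non_null else 0.0

    anomalies = []
    # A primary key must be fully populated and free of duplicates.
    if is_primary_key and (len(non_null) < total or distinct < len(non_null)):
        anomalies.append("primary key violation")
    if not is_primary_key and len(non_null) > 1 and uniqueness < low_uniqueness:
        anomalies.append("low uniqueness")

    score = 100 * (0.6 * completeness + 0.4 * consistency) - 20 * len(anomalies)
    return {
        "score": max(0, round(score)),
        "null_rate": 1 - completeness,
        "uniqueness": uniqueness,
        "anomalies": anomalies,
    }


print(profile_column([1, 2, 2, None], is_primary_key=True))
```

A fixed uniqueness threshold will also flag legitimately low-cardinality columns (status codes, booleans), so a real scorer would likely take the declared type or column role into account.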

4. Multi-Source Data Integration
Supports CSV, Excel, PostgreSQL, and MySQL. For databases, DataSense reads schema metadata plus a small sample of 10–20 rows per table, so even terabyte-scale databases can be analyzed without noticeable load on the source system. Uploaded files of up to 50 MB are processed with real-time progress tracking.
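The sampling idea can be sketched with Python's built-in `sqlite3` module so that it runs without a database server; against PostgreSQL or MySQL the same two steps would typically query `information_schema.columns` through a driver instead of `PRAGMA table_info`. The function names and the 20-row default are assumptions based on the 10–20 rows cited above.

```python
import sqlite3

SAMPLE_ROWS = 20  # upper end of the 10-20 rows per table cited in the write-up


def quote_ident(name: str) -> str:
    """Quote an SQL identifier so table names cannot inject SQL."""
    return '"' + name.replace('"', '""') + '"'


def sample_database(conn: sqlite3.Connection, sample_rows: int = SAMPLE_ROWS) -> dict:
    """Collect column metadata plus a few rows per table without scanning whole tables."""
    result = {}
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' AND name NOT LIKE 'sqlite_%'")]
    for table in tables:
        q = quote_ident(table)
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        columns = [{"name": r[1], "type": r[2], "pk": bool(r[5])}
                   for r in conn.execute(f"PRAGMA table_info({q})")]
        rows = conn.execute(f"SELECT * FROM {q} LIMIT ?", (sample_rows,)).fetchall()
        result[table] = {"columns": columns, "sample": rows}
    return result


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
conn.executemany("INSERT INTO orders (total) VALUES (?)", [(i * 1.5,) for i in range(1000)])
meta = sample_database(conn)
print(len(meta["orders"]["sample"]))  # 20
```

Because the sample query has no `ORDER BY`, the database can typically stop after the first rows it reads instead of scanning the table, which keeps the cost roughly independent of table size; the trade-off is that the sample is whatever rows come first, not a random draw.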

5. AI Chat Assistant & Export Tools
A context-aware assistant answers questions about the dataset and generates SQL queries. Dictionaries can be exported as JSON, Markdown, or HTML, with searchable fields and filters.
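A minimal sketch of the export side, assuming each dictionary entry is a flat dict with `name`, `type`, `description`, and `pii` keys (a hypothetical schema, not the project's format). It uses only the standard library: `json` for JSON output, `html.escape` so descriptions cannot inject markup, and a simple case-insensitive `search` helper as a stand-in for the searchable-fields feature.

```python
import html
import json

KEYS = ("name", "type", "description", "pii")  # hypothetical entry schema
HEADERS = ("Field", "Type", "Description", "PII")


def search(entries, term):
    """Case-insensitive filter over field names and descriptions."""
    term = term.lower()
    return [e for e in entries if term in f"{e['name']} {e['description']}".lower()]


def to_json(entries):
    return json.dumps(entries, indent=2)


def to_markdown(entries):
    lines = ["| " + " | ".join(HEADERS) + " |", "|" + "---|" * len(HEADERS)]
    for e in entries:
        # Escape pipes so a description cannot break the table layout.
        cells = [str(e.get(k) or "-").replace("|", "\\|") for k in KEYS]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)


def to_html(entries):
    header = "".join(f"<th>{h}</th>" for h in HEADERS)
    body = "".join(
        "<tr>" + "".join(f"<td>{html.escape(str(e.get(k) or '-'))}</td>" for k in KEYS) + "</tr>"
        for e in entries
    )
    return f"<table><thead><tr>{header}</tr></thead><tbody>{body}</tbody></table>"


entries = [
    {"name": "email", "type": "text", "description": "Customer contact email", "pii": "email"},
    {"name": "total", "type": "numeric", "description": "Order total incl. tax & shipping", "pii": None},
]
print(to_markdown(search(entries, "order")))
```

Keeping one in-memory list of entries and rendering it three ways means search and filtering only need to be implemented once, before any format is chosen.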

This build was uploaded as a hackathon project.

Hackathon

HackFest 2.0
