DataMorphix
Source code (open source): https://github.com/YugPatel11/DataMorphix/
DataMorphix is an AI-powered intelligent data dictionary platform that automatically analyzes and documents datasets. Users upload CSV, Excel, or JSON files, and the system uses Google Gemini to analyze column structures, generate descriptions, suggest clean column names, compute dataset health scores, and track data flow and lineage. The frontend is built with React and Tailwind CSS, the backend is a Django REST API, and pandas handles data manipulation.
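To make the column analysis and health score more concrete, here is a minimal pandas sketch. It is not taken from the DataMorphix codebase: the function names (to_snake_case, profile_columns, health_score) are hypothetical, the name suggestion is a simple rule-based stand-in for the Gemini step, and the scoring formula (completeness multiplied by the share of non-duplicate rows) is an assumption chosen only for illustration.

```python
import re

import pandas as pd


def to_snake_case(name: str) -> str:
    """Rule-based stand-in for the AI name suggestion (hypothetical helper)."""
    # Split camelCase boundaries, then collapse non-alphanumeric runs to "_".
    name = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", str(name).strip())
    name = re.sub(r"[^0-9a-zA-Z]+", "_", name)
    return name.strip("_").lower()


def profile_columns(df: pd.DataFrame) -> list[dict]:
    """Collect basic structural facts about each column."""
    profile = []
    for col in df.columns:
        series = df[col]
        profile.append({
            "column": col,
            "suggested_name": to_snake_case(col),
            "dtype": str(series.dtype),
            "null_ratio": float(series.isna().mean()),
            "unique_count": int(series.nunique(dropna=True)),
        })
    return profile


def health_score(df: pd.DataFrame) -> float:
    """Assumed formula: 100 * completeness * (1 - duplicate-row ratio)."""
    if df.empty:
        return 0.0
    completeness = 1.0 - float(df.isna().to_numpy().mean())
    duplicate_ratio = float(df.duplicated().mean())
    return 100.0 * completeness * (1.0 - duplicate_ratio)


# Small illustrative dataset with messy headers, missing values, and a duplicate row.
df = pd.DataFrame({
    "Cust ID": [1, 2, 2, 4],
    "orderTotal": [10.0, None, None, 7.5],
})
columns = profile_columns(df)
score = health_score(df)
```

A profile like this could also be sent to an LLM as context when asking it to write column descriptions, which keeps the prompt small compared with sending the raw data.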
DataMorphix helps data analysts, teams, and enterprises by automating the tedious, error-prone work of documenting data by hand. Instead of spending hours deciphering cryptic column names or working around undocumented schemas, users get an AI-generated data dictionary, quality alerts, and cross-dataset relationship mappings. Users can also query their data in natural language, so non-technical stakeholders can get charts and insights without writing code.
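One plausible way to detect cross-dataset relationships is to compare the distinct values of columns across two datasets and flag pairs that overlap heavily. The sketch below illustrates that idea with pandas; the function candidate_relationships and the min_overlap threshold are hypothetical, and DataMorphix itself may use a different approach.

```python
import pandas as pd


def candidate_relationships(left: pd.DataFrame, right: pd.DataFrame,
                            min_overlap: float = 0.8) -> list[dict]:
    """Flag column pairs whose distinct values largely overlap (hypothetical helper)."""
    matches = []
    for lcol in left.columns:
        lvals = set(left[lcol].dropna().unique())
        if not lvals:
            continue
        for rcol in right.columns:
            rvals = set(right[rcol].dropna().unique())
            if not rvals:
                continue
            # Share of the smaller column's distinct values found in the other column.
            overlap = len(lvals & rvals) / min(len(lvals), len(rvals))
            if overlap >= min_overlap:
                matches.append({"left": lcol, "right": rcol, "overlap": overlap})
    return matches


# Illustrative data: orders.cust_id refers to customers.id.
orders = pd.DataFrame({"order_id": [100, 101, 102], "cust_id": [1, 2, 2]})
customers = pd.DataFrame({"id": [1, 2, 3], "name": ["Ann", "Bo", "Cy"]})
links = candidate_relationships(orders, customers)
```

Value overlap alone can produce false positives (for example, two unrelated columns of small integers), so results like these are best treated as suggestions for a human or a later validation step to confirm.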
I built DataMorphix to address a common industry problem: undocumented datasets and poor data governance. By automating metadata generation and validation checks, I wanted to create a tool that keeps documentation up to date and reliable. The project also let me integrate LLM capabilities with data processing frameworks in a full-stack application that helps users move from raw data to a clear understanding of it.
This build was uploaded as a hackathon project.