Software Engineering Daily

Context-Aware SQL and Metadata with Shinji Kim

41 min episode · 2 min read

Topics

Science & Discovery

AI-Generated Summary

Key Takeaways

  • Automated metadata collection: SelectStar parses SQL query logs to track which tables are joined together, on what conditions, and how often across users. The result is a knowledge graph, built without manual documentation, that reveals actual data relationships and surfaces trust signals from behavioral patterns.
  • Three-layer metadata architecture: Physical assets form layer one; usage signals such as popularity and lineage make up layer two; and business context, including semantic models and metric definitions, makes up layer three. This structure enables AI to find the correct datasets and generate accurate queries.
  • Cost optimization through usage tracking: Organizations reduce cloud warehouse billing by identifying unused tables and unviewed BI dashboards through popularity metrics. Combining lineage with usage data reveals which data models consume resources without delivering value to end users or downstream systems.
  • MCP server for AI workflows: SelectStar's Model Context Protocol server provides four tools (metadata search, asset details, lineage traversal, and impact analysis). These give AI agents in Claude and Cursor access to popularity scores and example queries, which lets them generate queries with higher accuracy.
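
To make the cost-optimization takeaway concrete, here is a minimal Python sketch of combining lineage with usage data. It is not SelectStar's implementation: the asset names, the `lineage` and `usage` dictionaries, and the `delivers_value` helper are all hypothetical. The idea is that an asset is worth keeping if it, or anything downstream of it, is actually used.

```python
from collections import deque

# Hypothetical lineage graph: each asset maps to the assets that read from it.
lineage = {
    "raw.orders": ["staging.orders"],
    "staging.orders": ["marts.revenue", "marts.orders_legacy"],
    "marts.revenue": ["dashboard.revenue_weekly"],
    "marts.orders_legacy": [],
    "dashboard.revenue_weekly": [],
}

# Hypothetical usage counts (direct user queries or dashboard views in some window).
usage = {
    "raw.orders": 0,
    "staging.orders": 0,
    "marts.revenue": 0,
    "marts.orders_legacy": 0,
    "dashboard.revenue_weekly": 37,
}


def delivers_value(asset, lineage, usage):
    """Return True if the asset or any downstream asset has recorded usage."""
    seen = {asset}
    queue = deque([asset])
    while queue:
        current = queue.popleft()
        if usage.get(current, 0) > 0:
            return True
        for child in lineage.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return False


# Assets that nobody uses, directly or through anything built on top of them.
unused = sorted(a for a in lineage if not delivers_value(a, lineage, usage))
print(unused)
```

In this toy data, `raw.orders` has no direct usage but is kept because a viewed dashboard depends on it, while `marts.orders_legacy` is flagged because nothing uses it or reads from it.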

What It Covers

SelectStar founder Shinji Kim explains how automated metadata platforms solve data discovery challenges by analyzing query logs to build knowledge graphs, enabling AI agents to generate accurate SQL through popularity scores, lineage tracking, and semantic models.
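
As a rough illustration of deriving relationships from query logs, the Python sketch below counts which tables appear together in logged queries and which users wrote them. It rests on simplifying assumptions: the log format, the table names, and the regex are all hypothetical, every pair of tables in the same statement is treated as joined, and a production system would need a real SQL parser rather than a regex.

```python
import re
from collections import Counter

# Hypothetical query log entries: (user, SQL text).
query_log = [
    ("ana", "SELECT * FROM orders o JOIN customers c ON o.customer_id = c.id"),
    ("ben", "SELECT c.name FROM customers c JOIN orders o ON o.customer_id = c.id"),
    ("ana", "SELECT * FROM orders o JOIN payments p ON p.order_id = o.id"),
]

# Naive pattern: the identifier that follows FROM or JOIN is taken as a table name.
TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def joined_pairs(sql):
    """Return sorted, de-duplicated pairs of tables referenced in one statement."""
    tables = sorted({name.lower() for name in TABLE_REF.findall(sql)})
    return [(a, b) for i, a in enumerate(tables) for b in tables[i + 1:]]


join_counts = Counter()  # how often each pair of tables is queried together
join_users = {}          # which distinct users queried each pair

for user, sql in query_log:
    for pair in joined_pairs(sql):
        join_counts[pair] += 1
        join_users.setdefault(pair, set()).add(user)

for pair, count in join_counts.most_common():
    print(pair, count, sorted(join_users[pair]))
```

The pair counts act as edge weights in a small graph of tables, and the number of distinct users per edge is one possible stand-in for the behavioral trust signal the summary mentions.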

Notable Moment

Kim explains that foundation models trained on general world data fail against real enterprise databases. Messy data, with similar table names, denormalized structures, and multi-level calculations, causes hallucinations; supplying example queries and popularity context prevents them.
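
One way to picture how popularity and example queries help with similarly named tables is the Python sketch below. It is an assumption-laden illustration, not SelectStar's scoring or API: the catalog entries, the `popularity` field, and the `build_context` helper are hypothetical. It ranks matching tables by popularity and formats the top ones, with an example query each, as context that could be passed to a model.

```python
# Hypothetical catalog entries with similar names; all values are made up.
catalog = [
    {
        "name": "analytics.revenue_daily",
        "popularity": 91,
        "example_query": "SELECT day, SUM(amount) FROM analytics.revenue_daily GROUP BY day",
    },
    {
        "name": "tmp.revenue_daily_backup",
        "popularity": 2,
        "example_query": "SELECT * FROM tmp.revenue_daily_backup LIMIT 10",
    },
    {
        "name": "analytics.revenue_daily_v1",
        "popularity": 14,
        "example_query": "SELECT day, total FROM analytics.revenue_daily_v1",
    },
]


def build_context(term, catalog, top_n=2):
    """Pick the most popular tables whose names contain `term` and format them as context."""
    matching = [t for t in catalog if term in t["name"]]
    ranked = sorted(matching, key=lambda t: t["popularity"], reverse=True)[:top_n]
    lines = [
        f"{t['name']} (popularity {t['popularity']}): {t['example_query']}"
        for t in ranked
    ]
    return "\n".join(lines)


context = build_context("revenue_daily", catalog)
print(context)
```

Here the rarely used backup table is dropped from the context, so a model choosing among three near-identical names sees only the two that people actually query.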
