Software Engineering Daily

Foundation Models for Structured Data

44 min episode · 2 min read
Jura Leskiewicz

Topics

Productivity, Relationships, Startups

AI-Generated Summary

Key Takeaways

  • Relational Deep Learning Architecture: Treat an enterprise relational database as a graph in which rows are nodes (typed by their table) and foreign-key links are edges, then apply graph transformer attention across that structure. This removes the need to manually join tables or predefine SQL aggregations before training, so the model learns which data combinations matter for a given prediction task.
  • Foundation Model for Tabular Data (Kumo RFM): A single pre-trained foundation model can connect to any structured database schema and answer predictive queries without task-specific retraining. Users define predictions in a structured "predictive query" language (for example, specifying churn as zero transactions in the next 30 days), and the frozen model fetches the relevant subgraphs and returns probability scores on demand.
  • Feature Engineering Cost Reduction: Traditional predictive modeling requires approximately two full-time data scientists per production model, making 10 simultaneous models a 20-person commitment. Relational deep learning eliminates manual feature engineering by attending directly over raw database rows, columns, and linked tables, compressing that multi-person effort into a single reusable model infrastructure.
  • Model Size and Inference Efficiency: Relational deep learning models have fewer than one billion parameters, significantly fewer than large language models, which makes inference faster and cheaper. However, they require a specialized graph engine backend that converts the database into an optimized graph representation and supports subgraph sampling at scale, across tens to hundreds of billions of nodes and edges.
  • Synthetic Pre-Training Data via Plural Method: A Stanford research paper called Plural demonstrates that synthetic structured tabular data can be generated at scale to pre-train relational foundation models, solving the data scarcity problem that would otherwise limit training. For high-throughput production use cases like real-time fraud scoring at millions of transactions per second, fine-tuning a smaller distilled version of the pre-trained model delivers speed and cost efficiency.
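
The database-as-graph idea and the subgraph fetching described in the takeaways can be sketched in a few lines of plain Python. This is a minimal illustration, not Kumo's graph engine or API: the two-table schema, the build_graph and extract_subgraph helpers, and all values are hypothetical. It assumes each row becomes a node and each foreign-key reference becomes an undirected edge, and it expands every neighbor instead of sampling, which a production engine at the scale mentioned above would not do.

```python
from collections import defaultdict

# Toy tables (hypothetical schema): each row is a dict keyed by column name.
customers = [
    {"customer_id": 1, "country": "US"},
    {"customer_id": 2, "country": "DE"},
]
transactions = [
    {"txn_id": 10, "customer_id": 1, "amount": 25.0},
    {"txn_id": 11, "customer_id": 1, "amount": 40.0},
    {"txn_id": 12, "customer_id": 2, "amount": 15.0},
]


def build_graph(tables, primary_keys, foreign_keys):
    """Turn rows into nodes keyed by (table, primary key) and
    foreign-key references into undirected edges."""
    nodes = {}
    adjacency = defaultdict(set)
    for table, rows in tables.items():
        pk = primary_keys[table]
        for row in rows:
            nodes[(table, row[pk])] = row
    for (child, column), parent in foreign_keys.items():
        child_pk = primary_keys[child]
        for row in tables[child]:
            src = (child, row[child_pk])
            dst = (parent, row[column])
            if dst in nodes:  # skip dangling references
                adjacency[src].add(dst)
                adjacency[dst].add(src)
    return nodes, adjacency


def extract_subgraph(adjacency, seed, hops):
    """Collect every node within `hops` edges of `seed` (breadth-first).
    A real engine would cap or sample neighbors at each hop."""
    seen = {seed}
    frontier = {seed}
    for _ in range(hops):
        frontier = {n for node in frontier for n in adjacency[node]} - seen
        seen |= frontier
    return seen


nodes, adjacency = build_graph(
    tables={"customers": customers, "transactions": transactions},
    primary_keys={"customers": "customer_id", "transactions": "txn_id"},
    foreign_keys={("transactions", "customer_id"): "customers"},
)
subgraph = extract_subgraph(adjacency, seed=("customers", 1), hops=1)
```

Here subgraph holds customer 1 and that customer's two transaction rows; a model would attend over the raw column values stored in nodes for exactly those rows.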

What It Covers

Stanford professor and Kumo AI co-founder Jure Leskovec explains how relational deep learning applies transformer-based graph attention directly to structured enterprise database tables, enabling foundation models that produce fraud detection, churn prediction, and recommendation outputs without manual feature engineering.
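
To make the churn prediction target concrete, the definition used in the summary (zero transactions in the next 30 days) can be written as a label computed over a transaction log. This is a plain Python sketch of that definition only, not Kumo's predictive query language; the churn_labels function, the tuple layout, and the dates are hypothetical.

```python
from datetime import date, timedelta

# Hypothetical transaction log: (customer_id, transaction_date).
transactions = [
    (1, date(2024, 1, 5)),
    (1, date(2024, 2, 10)),
    (2, date(2024, 1, 20)),
]


def churn_labels(customer_ids, transactions, anchor, horizon_days=30):
    """Label a customer 1 (churned) if they have no transaction in the
    window (anchor, anchor + horizon_days], otherwise 0."""
    window_end = anchor + timedelta(days=horizon_days)
    active = {cid for cid, day in transactions if anchor < day <= window_end}
    return {cid: int(cid not in active) for cid in customer_ids}


labels = churn_labels([1, 2], transactions, anchor=date(2024, 2, 1))
```

With an anchor date of 2024-02-01, customer 1 transacts inside the window and is labeled 0, while customer 2 does not and is labeled 1. A model would be asked to predict this label using only data from before the anchor date.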

Notable Moment

Leskovec draws a direct parallel to computer vision: just as hand-engineering elephant-detection features became obsolete once neural networks learned directly from pixels, manually engineering database features for churn or fraud models will become equally obsolete as graph transformers learn directly from raw relational data.
