Ankur Goyal
@ankrgyl
Thoughts on the idea that GPT-3/LLMs are a "better database", from someone who has worked on relational databases for over a decade and AI for half that. tl;dr I think they have the potential to be (1/n)
Ankur Goyal
@ankrgyl
First, a quick example. Let's query some knowledge about the S&P 500. Assuming you had growth rates saved in a database, this would be a SQL query like: > SELECT "year" FROM sp_growth ORDER BY "growth_rate" DESC LIMIT 1 https://i.imgur.com/zOpUdoM.jpg
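For reference, here is a minimal sketch of the database side of that comparison, using Python's built-in sqlite3 module. The table and column names come from the query above; the column types and the rows are assumptions, and the rows are placeholders, not real S&P 500 figures.

```python
import sqlite3


def year_of_max_growth(rows):
    """Load (year, growth_rate) rows into an in-memory table and run the query."""
    conn = sqlite3.connect(":memory:")
    # Column types are an assumption; the thread only names the columns.
    conn.execute('CREATE TABLE sp_growth ("year" INTEGER, "growth_rate" REAL)')
    conn.executemany("INSERT INTO sp_growth VALUES (?, ?)", rows)
    row = conn.execute(
        'SELECT "year" FROM sp_growth ORDER BY "growth_rate" DESC LIMIT 1'
    ).fetchone()
    conn.close()
    return row[0] if row else None


# Placeholder rows only (NOT real market data): "years" 1..3 with made-up rates.
placeholder_rows = [(1, 0.10), (2, 0.30), (3, -0.05)]
print(year_of_max_growth(placeholder_rows))
```

The database answer is exact for whatever rows were loaded, which is the guarantee the LLM version trades away.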
Ankur Goyal
@ankrgyl
One way of thinking about this pattern is "LLMs are an index". This is limiting b/c LLMs support unstructured data, but let's start here. Indexes are measured by query perf, storage size, and update cost.
Ankur Goyal
@ankrgyl
LLM queries are remarkably fast for almost any query: latency depends on prompt and output length, not on how much data the model was trained on (w/ tunable cost via max length). The tradeoff of course is accuracy, since you cannot guarantee correctness. Note in my prompt I asked for citations. Verifying truthfulness is an active area of research.
Ankur Goyal
@ankrgyl
Storage size/compression is also exciting. Emad has a great line about how Stable Diffusion is Pied Piper because of how efficiently it compresses ~5B images into ~4GB of weights. Columnar compression is usually 4-10x, not 1000x.
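A quick back-of-the-envelope check of those numbers, as a rough sketch. The only inputs are the approximate figures quoted above (~4GB, ~5B images); the helper name is made up for illustration. Keep in mind this is lossy: the weights cannot reproduce each image exactly, so it is not directly comparable to a lossless columnar codec.

```python
def bytes_per_item(total_bytes, n_items):
    """Average storage budget per item."""
    return total_bytes / n_items


weights_bytes = 4e9  # ~4GB of weights (approximate figure from the thread)
n_images = 5e9       # ~5B images (approximate figure from the thread)

per_image = bytes_per_item(weights_bytes, n_images)
print(f"{per_image:.1f} bytes of weights per image")

# How small would each raw image have to be for a given compression
# ratio to reach the same budget?
for ratio in (4, 10, 1000):
    print(f"{ratio}x -> raw image of {per_image * ratio:.1f} bytes")
```

Under a byte of weights per image means that even at 1000x the raw images would have to be under a kilobyte each, so the effective ratio is far beyond anything columnar compression gets.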
Ankur Goyal
@ankrgyl
However, data loading is very, very difficult. Database indexes incrementally update w/ new data. With LLMs, you need to fine-tune a model with both a representative set of input questions and the underlying data jointly.
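To make "incrementally update" concrete, here is a toy sketch of an ordered index where adding one row is a single local operation. The class and method names are hypothetical, and a real database would use a B-tree or similar rather than a Python list (insertion here still shifts list elements), but the point stands: existing entries are never rebuilt.

```python
import bisect


class SortedIndex:
    """Toy ordered index over (key, row_id) pairs, kept sorted by key."""

    def __init__(self):
        self._entries = []

    def insert(self, key, row_id):
        # One new row touches only its own slot in the index;
        # nothing that was already indexed is recomputed.
        bisect.insort(self._entries, (key, row_id))

    def max_row(self):
        """Row id with the largest key, i.e. ORDER BY key DESC LIMIT 1."""
        return self._entries[-1][1] if self._entries else None


# Placeholder values only, not real data.
index = SortedIndex()
for row_id, growth_rate in [(1, 0.10), (2, 0.30), (3, -0.05)]:
    index.insert(growth_rate, row_id)
print(index.max_row())

index.insert(0.40, 4)  # new data arrives: a single insert, no rebuild
print(index.max_row())
```

There is no comparable per-row operation on model weights today, which is why new data means another round of fine-tuning.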
Ankur Goyal
@ankrgyl
Memory transformers (https://arxiv.org/abs/2006.11527) are one attempt at solving for this. Generally speaking, I think splitting out the data from the reasoning capabilities will be a requirement for this use case.
Ankur Goyal
@ankrgyl
I'm personally very optimistic about this intersection. Imagine a database that can answer SQL queries, natural language questions, or a combination of both! Lots of challenges to solve around accuracy & perf, but the basic pieces are all there.