
clun.eth
Are any teams working on having multimodal models “drive” a UI? For instance, I show Bing screenshots of a calendar UI and ask it which buttons to press and what to put in each text field. It clearly knows what to do, so it should be possible to have it output structured data that can be turned into button clicks, text entry, etc.
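
To make the "structured data" idea concrete, here is a rough sketch of what that could look like, not a description of any existing product. It assumes the model is prompted to reply with a JSON list of actions. The schema (`action`, `target`, `text`), the `UIAction` type, and the `LoggingDriver` class are all hypothetical names made up for illustration; a real setup would swap the driver for an actual automation backend.

```python
import json
from dataclasses import dataclass
from typing import List

# Hypothetical schema: the model replies with a JSON list such as
# [{"action": "click", "target": "New event"}, ...]. Nothing here is a real API.
ALLOWED_ACTIONS = {"click", "type"}


@dataclass(frozen=True)
class UIAction:
    action: str      # "click" or "type"
    target: str      # visible label of the button or text field
    text: str = ""   # text to enter; only used by "type"


def parse_actions(model_output: str) -> List[UIAction]:
    """Validate the model's JSON reply and convert it to UIAction objects."""
    raw = json.loads(model_output)
    if not isinstance(raw, list):
        raise ValueError("expected a JSON list of actions")
    actions = []
    for item in raw:
        if not isinstance(item, dict):
            raise ValueError("each action must be a JSON object")
        kind = item.get("action")
        if kind not in ALLOWED_ACTIONS:
            raise ValueError(f"unsupported action: {kind!r}")
        target = item.get("target")
        if not isinstance(target, str) or not target:
            raise ValueError("each action needs a non-empty 'target'")
        text = item.get("text", "")
        if not isinstance(text, str):
            raise ValueError("'text' must be a string")
        actions.append(UIAction(kind, target, text))
    return actions


class LoggingDriver:
    """Stand-in for a real automation backend; it only records what it would do."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def click(self, target: str) -> None:
        self.log.append(f"click {target}")

    def type_text(self, target: str, text: str) -> None:
        self.log.append(f"type {text!r} into {target}")


def run(actions: List[UIAction], driver: LoggingDriver) -> None:
    """Replay validated actions, in order, against the driver."""
    for a in actions:
        if a.action == "click":
            driver.click(a.target)
        else:
            driver.type_text(a.target, a.text)
```

Validating the reply before acting on it matters here, since the model's output is untrusted text: anything outside the small allowed action set is rejected rather than executed.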

manansh ❄️
Also curious to know if anyone is building here.