Mo pfp

Mo

@meb

468 Following
576 Followers


Mo
@meb
How I would define an AI agent at the start of 2025:

In the abstract, a piece of software that can achieve a certain goal and uses AI to navigate the fuzzy path to that goal.

More specifically, a combination of:
- Goal
- Context management
- Underlying model
- Tools (datastores and functions with effects)
- Prompting & self-prompting strategies (including delegating to other agents)

Example: A geocoding agent
Goal: Parse content from a sentence and geocode it
Context management: Trimmed sliding window. We only need the most recent messages to get the job done.
Underlying model: A simple model like 4o mini is more than enough
Tools: A function to search cities with inbuilt query match ranking, and a function with access to a Google Maps geocoding API
Prompting: Single turn. The agent should be able to get the job done in a single exchange, or ask the user to clarify the location if needed
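A rough sketch of how those five parts could fit together for the geocoding example. Everything here is hypothetical: the `GeocodingAgent` class, the `search_cities` and `geocode` helpers, and the hardcoded coordinate table are stand-ins. A real version would call an actual model to parse the sentence and a real geocoding API; this one uses a keyword match so it runs offline.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical stand-in data; a real agent would call a geocoding API instead.
CITY_COORDS = {
    "paris": (48.8566, 2.3522),
    "berlin": (52.52, 13.405),
}


def search_cities(query: str) -> list[str]:
    """Toy city search: rank known cities by where they first appear in the query."""
    text = query.lower()
    hits = [(text.find(name), name) for name in CITY_COORDS if name in text]
    return [name for _, name in sorted(hits)]


def geocode(city: str) -> Optional[tuple[float, float]]:
    """Toy geocoder backed by the table above."""
    return CITY_COORDS.get(city.lower())


@dataclass
class GeocodingAgent:
    goal: str = "Parse a place from a sentence and geocode it"
    model: str = "small-model-placeholder"  # never called in this sketch
    window: int = 3                         # size of the trimmed sliding window
    history: list[str] = field(default_factory=list)

    def remember(self, message: str) -> None:
        # Context management: keep only the most recent messages.
        self.history.append(message)
        self.history = self.history[-self.window:]

    def run(self, sentence: str) -> str:
        # Single turn: answer in one exchange or ask for clarification.
        self.remember(sentence)
        candidates = search_cities(sentence)
        if not candidates:
            return "Which location do you mean?"
        best = candidates[0]
        return f"{best}: {geocode(best)}"


agent = GeocodingAgent()
print(agent.run("Meet me in Paris tomorrow"))
print(agent.run("Meet me somewhere nice"))
```

The goal and model are plain fields, the tools are plain functions, and the prompting strategy is the single `run` call that either answers or asks back.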
0 reply
0 recast
2 reactions

Mo
@meb
Following on from yesterday's MCP discussion, a very interesting X thread about this. In brief: MCP makes sense as a clearing house for capabilities. Agents don't need to rebuild the tool integrations; they can just access existing capabilities with standard auth. This reminds me of Amazon's policy that teams and their products can only communicate over APIs. https://x.com/abacaj/status/1897040132974538835?s=46&t=heYf7m0kaq62UF_MFK0e6g
1 reply
0 recast
2 reactions

Mo
@meb
AGI confirmed in Cursor Agent mode today with 4 "final" fixes in a row:
"Let me fix the remaining type issues:"
"Let me make one final fix to handle the type imports correctly:"
"One final attempt to fix the cookie options type:"
"Final fix for the Cookies type:"
"Let me make one final attempt with the correct type import:"
0 reply
0 recast
1 reaction

Mo
@meb
I've been seeing a lot of people complimenting the EQ of GPT-4.5. This is actually a huge feature for models, beyond pure benchmark crushing. The ability to relate to a human, pick up on intent and have a much more fine-grained discussion will be a huge unlock for the next wave of apps. The reduced hallucinations are also a massive boost for serious business apps. https://x.com/adonis_singh/status/1896582818316406901/photo/1
0 reply
0 recast
1 reaction

Mo
@meb
Finally found a legit use case. Had to record a product demo today. Superimposed my phone on top of a dynamic background made in Keynote, and boom. Professional-looking product recording!
1 reply
2 recasts
4 reactions

Mo
@meb
[Thread 🧵] - Summary of yesterday's presentation by @jrf

Product vision
@atlas facilitates high-quality connections and meaningful social interactions. Focus is on front-end agents, i.e. agents a human can interact with.

Technical underpinnings
- Each message to Atlas requires a certain kind of context (user context, thread context)
- Existing frameworks like Eliza or LangGraph are heavy. Having full control over what you are sending to the LLM is super important.
- Atlas can interact in different modes, i.e. personalities. Understanding which "mode" is most appropriate for a user is a key part
- Embeds can allow the invocation of Atlas in specific modes. This means an infinity of frames powered by Atlas could be created. This is a whole new design space.
- Filtering measures allow Atlas to avoid spam and engage with users that have a minimum level of reputability. In future this might be more fine-grained, with users instead
4 replies
4 recasts
9 reactions

Mo
@meb
Weekly call starts now https://meet.google.com/nqi-nmqp-wgm
0 reply
1 recast
1 reaction

Mo
@meb
Turning Conversations into Workflows: A Framework to Extract and Evaluate Dialog Workflows for Service AI Agents Key Insight: The paper introduces a novel method using a QA-based chain-of-thought (QA-CoT) prompting framework that automatically extracts structured dialog workflows from historical customer service interactions. Commercial Relevance: For AI product developers and customer service automation platforms, this insight offers a way to reduce manual workflow design and maintenance. Automated extraction of precise and consistent dialog workflows can speed up integration, enhance agent performance, and reduce costs—making customer service bots more robust and scalable. These methods enable fast, reliable, and scalable creation and evaluation of dialog workflows, ensuring service agents can mimic human-like decision processes and maintain high consistency in customer support tasks.
1 reply
0 recast
2 reactions

Mo
@meb
New experiment in this channel: Scientific Paper Summaries We will monitor for new papers coming out, and share relevant knowledge. The target audience is builders like you and me, who want to ship stuff, stay up to date, and not get lost in deep mathematical concerns. Key Insight: Automatic Prompt Optimization (APO) techniques automate the process of refining prompts for large language models (LLMs) to improve task performance without requiring access to the model's internal parameters. Commercial Relevance: APO techniques are highly relevant for businesses leveraging LLMs in AI products, as they reduce the need for manual prompt engineering, which is time-consuming and requires expertise. These methods can enhance the performance of LLMs across various tasks, making them more reliable and efficient for end-users. https://arxiv.org/pdf/2502.16923
2 replies
0 recast
0 reaction

Mo
@meb
Weekly AI Research group call is tomorrow 1pm EST. @jrf will be joining us to share learnings about Atlas, and what a great agentic framework would look like. Add the event to your calendar and let's meet there! https://calendar.google.com/calendar/u/0/share?slt=1AXpMJuZqiNpGhQ2fm6vM9CdRBkw8UCMk2zYnzde7Sl-iYPYOitfl8cPO7pAEcy-riW8CWJUvBdVNj1HHazIJWwbJJFtuaAiS-a2W
3 replies
2 recasts
11 reactions

Mo
@meb
Some off-the-cuff benchmarking:

If John is Sarah's brother, and Anne is Sarah's niece, who is Anne in relation to John?

Correct (in order of relevance & speed)
- deepseek-chat-v3
- gpt o3-mini

Wrong
- gemini flash 2.0
- gpt 4o
- gpt 4o mini
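The post does not say which answer was graded as correct. One way to reason about the puzzle, sketched below, assumes "niece" means the daughter of one of Sarah's siblings (in-laws ignored); the `relation_to_john` helper and the unnamed "other sibling" are hypothetical. Under that assumption the answer depends on which sibling is Anne's parent.

```python
# Assumption: "niece" means the daughter of one of Sarah's siblings (no in-laws).
def relation_to_john(anne_parent: str) -> str:
    """Return Anne's relation to John, given which sibling of Sarah is her parent."""
    return "daughter" if anne_parent == "John" else "niece"


# Sarah's siblings: John, plus a hypothetical unnamed other sibling.
possible_parents = ["John", "other sibling"]
possible_answers = sorted({relation_to_john(p) for p in possible_parents})
print(possible_answers)  # ['daughter', 'niece']
```

So a careful answer names both possibilities: Anne is either John's daughter or John's niece.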
0 reply
0 recast
1 reaction

Mo
@meb
Genuine question: what is the case for using Redux in 2025?
1 reply
0 recast
1 reaction

Mo
@meb
Finally upgraded macOS to Sequoia. Can someone tell me what the use case is for iPhone mirroring, i.e. using my phone on my laptop?
1 reply
0 recast
1 reaction

Mo
@meb
This is a great summary of the business proposition for models. Recap:
- Small space for model providers. Main offer is providing reliable infra. Race to the bottom in terms of pricing, and low customer loyalty.
- Value sits at the application layer. Deep business workflow integration required. Avoid anything that could be a generic consumer feature on ChatGPT web
- Businesses care about outcomes, not tech. Sell the results, not the model
- Every job will have its own set of agents. Employee net value delivery goes from months to hours / minutes
https://youtu.be/aIKfA3gIXwo?si=Pi1Qjy1aThC2L234
0 reply
0 recast
3 reactions

Mo
@meb
A simple rule that would stop EU leaders warmongering and trying to feel relevant: 25% of all politicians calling for war, as well as their close ones, randomly selected to be sent to the front lines
3 replies
0 recast
1 reaction

Mo
@meb
wen grok3 API? I'm on x.AI site and can only see grok2 mentioned
0 reply
0 recast
0 reaction

Mo
@meb
0 reply
0 recast
3 reactions

Mo
@meb
Today I learned the ISO standard for AI Management Systems is 42001.
Meme reality quantum field collapse confirmed
1 reply
0 recast
4 reactions

Mo
@meb
Celebrating a 6-month streak in /dev today. Thank you for all the educational chats! Ask me anything and I'll be glad to respond
1 reply
2 recasts
5 reactions

Mo
@meb
Sharing the official anthem for Weekly Hackathon. Fully AI-generated, of course. https://www.youtube.com/watch?v=0TF_WSfugTE
1 reply
1 recast
4 reactions