
Giuliano Giacaglia

@giu

516 Following
308943 Followers


Giuliano Giacaglia
@giu
In some ways, these metrics risk becoming vanity metrics—an example of Goodhart's Law in action: “When a measure becomes a target, it ceases to be a good measure.” A fitting example of this phenomenon comes from Google Books. The Google Books team once said, "All OCR datasets have been solved, but OCR itself has not been solved." This highlights how solving specific benchmarks doesn't always equate to solving the larger problem. As we see new and better models being developed, it's important not to get too caught up in benchmark scores alone.
0 reply
1 recast
1 reaction
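The Goodhart's-Law point above can be shown with a toy example (entirely synthetic, not from the cast): a "model" that memorizes a public benchmark's answer key aces the benchmark while failing anything held out, which is exactly the gap between solving OCR datasets and solving OCR.

```python
# Toy illustration of Goodhart's Law on benchmarks (synthetic data):
# a "model" that memorizes the public benchmark aces it but fails fresh items.
benchmark = {"2+2": "4", "capital of France": "Paris"}
memorized = dict(benchmark)  # overfit: the answer key learned verbatim

def accuracy(model: dict, items: dict) -> float:
    """Fraction of items the model answers correctly."""
    return sum(model.get(q) == a for q, a in items.items()) / len(items)

on_benchmark = accuracy(memorized, benchmark)     # perfect score: 1.0
held_out = accuracy(memorized, {"3+3": "6"})      # fails unseen items: 0.0
```

The benchmark score is perfect precisely because it became the target, which is why it stops measuring general capability.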

Giuliano Giacaglia
@giu
With the announcement of new models and their impressive benchmark performance, it's important to provide some context around AI models and their benchmarks. The issue with benchmarks is that they can become the goal in and of themselves. While they serve as a useful proxy for evaluating model performance, they don't necessarily reflect how well a model performs in real-world use cases. Benchmarks measure how well large language models (LLMs) perform in specific scenarios, but this doesn't always translate directly to broader, practical applications.
1 reply
1 recast
11 reactions

ted (not lasso)
@ted
hot take: xAI will be the top consumer AI platform because it isn’t designed to be biased (better when you need accurate, unfiltered info) and has unique data access (X, SpaceX, Tesla) + consider how fast they built the supercluster + Elon is only living founder who sidequests successfully (and he’s petty)
6 replies
15 recasts
120 reactions

Giuliano Giacaglia
@giu
xAI announces a $6B Series C funding round https://x.com/xai/status/1871313084280644079
2 replies
7 recasts
29 reactions

Giuliano Giacaglia
@giu
Congrats!!!
0 reply
0 recast
1 reaction

Giuliano Giacaglia
@giu
💯💯💯
0 reply
0 recast
2 reactions

Giuliano Giacaglia
@giu
Congrats!
0 reply
0 recast
0 reaction

Giuliano Giacaglia
@giu
Didn’t study this benchmark. No idea 🤷‍♂️
0 reply
5 recasts
5 reactions

Giuliano Giacaglia
@giu
Just starting
1 reply
8 recasts
45 reactions

Giuliano Giacaglia
@giu
https://www.interconnects.ai/p/openais-o3-the-2024-finale-of-ai
0 reply
1 recast
7 reactions

Giuliano Giacaglia
@giu
More on o3: the most expensive version of the model is extremely capable, and extremely expensive to run
2 replies
7 recasts
28 reactions

Giuliano Giacaglia
@giu
Is this the AI wall that everyone is talking about?
0 reply
4 recasts
19 reactions

Giuliano Giacaglia
@giu
Interesting!
0 reply
0 recast
1 reaction

Giuliano Giacaglia
@giu
Inference in big tech is almost 75% of all the compute used in AI models. That share may shift higher given that these models can now perform better when they use more computation. Though I expect the equilibrium will be closer to 2/3, since that’s roughly the fraction of compute animals spend on inference. I wouldn’t be surprised if neural nets end up at the same rate given cost constraints
0 reply
0 recast
8 reactions
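The arithmetic behind the cast's 75% vs. 2/3 comparison can be sketched as follows (the unit ratios are illustrative, chosen only to reproduce the two fractions mentioned):

```python
# Illustrative arithmetic for the inference-vs-training compute split.
def inference_share(train_compute: float, infer_compute: float) -> float:
    """Fraction of total AI compute spent on inference rather than training."""
    return infer_compute / (train_compute + infer_compute)

# Today: roughly 3 units of inference per 1 unit of training -> ~75%.
today = inference_share(train_compute=1.0, infer_compute=3.0)

# Proposed equilibrium: 2 units of inference per 1 of training -> ~2/3,
# the split the cast compares to what animal brains spend on inference.
equilibrium = inference_share(train_compute=1.0, infer_compute=2.0)
```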

Giuliano Giacaglia
@giu
The fact of the matter is that o3 is extremely impressive, and the benchmarks show it: it scored a breakthrough 75.7% on the ARC-AGI Semi-Private Evaluation
1 reply
0 recast
5 reactions

Giuliano Giacaglia
@giu
The most interesting aspect of it all is that it is an **expensive** model: it uses a lot of compute during inference. That means o3 might be too expensive to use for a lot of tasks, but it also means inference chips will be in high demand
1 reply
0 recast
7 reactions
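Why heavy test-time compute makes per-task cost balloon can be sketched with token arithmetic (the token counts and per-token price below are hypothetical, not o3's actual pricing):

```python
# Hypothetical sketch: per-task cost scales linearly with reasoning tokens,
# so a model that "thinks" 1000x longer costs ~1000x more per task.
def task_cost(reasoning_tokens: int, price_per_million: float) -> float:
    """Dollar cost of one task given tokens used and $/1M-token price."""
    return reasoning_tokens / 1_000_000 * price_per_million

cheap = task_cost(reasoning_tokens=2_000, price_per_million=10.0)      # cents per task
heavy = task_cost(reasoning_tokens=5_000_000, price_per_million=10.0)  # tens of dollars per task
```

At these illustrative numbers the heavy setting costs 2500x the cheap one, which is the economic pressure behind demand for inference chips.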

Giuliano Giacaglia
@giu
My thoughts on OpenAI o3: As I’ve mentioned before, the o1 LLM seems like a paradigm shift in how we build and deploy models. First, o1 showed that you can use RL to get better results at inference time. That means you can increase computational spend and improve efficacy. o3 takes it one step further
1 reply
5 recasts
26 reactions
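The "spend more compute, get better results" idea above is often modeled as accuracy growing roughly with the logarithm of inference compute. A minimal sketch, with entirely made-up coefficients (the real scaling curve is model-specific):

```python
import math

# Hedged sketch of test-time-compute scaling: accuracy rises roughly with
# log(compute), with diminishing returns. Coefficients are invented.
def accuracy_at(compute: float, base: float = 0.40, slope: float = 0.05) -> float:
    """Toy accuracy as a function of inference compute (arbitrary units)."""
    return min(1.0, base + slope * math.log10(compute))

modest = accuracy_at(10)      # a little test-time compute
heavy = accuracy_at(10_000)   # o3-style heavy test-time compute
```

The point of the shape: each extra increment of accuracy costs an order of magnitude more compute, which is why the top-end configurations get so expensive.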

Giuliano Giacaglia
@giu
I guess what @eulerlagrange.eth is saying is that lenders or backers don’t necessarily know the reputation of borrowers. That could probably be solved using zkTLS, because you may be able to infer borrowers’ credit “score” from their financial data. That gives more transparency to lenders, who can lend onchain and set a more market-driven rate based on credit score
2 replies
0 recast
1 reaction
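The pricing half of the zkTLS idea can be sketched as score-to-rate mapping. Everything here is hypothetical: the function name, the score range, and the linear risk premium are invented for illustration, and the zkTLS proof itself (attesting to the score without revealing the underlying financial data) is assumed to happen upstream.

```python
# Hypothetical sketch: a lender prices an onchain loan off a proven credit
# score band. The formula and constants are invented for illustration.
def rate_for_score(score: int, base_rate: float = 0.05) -> float:
    """Map a credit score (300-850) to an interest rate: better score, lower rate."""
    score = max(300, min(850, score))
    risk_premium = (850 - score) / (850 - 300) * 0.15  # up to +15% at the floor
    return base_rate + risk_premium

prime = rate_for_score(800)     # small premium over base
subprime = rate_for_score(500)  # larger premium, market-driven
```

A market-driven rate falls out directly: lenders compete on `base_rate` while the proven score sets the spread.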

Giuliano Giacaglia
@giu
Something to watch on the market is that the 10-year note yield is going up despite the fact that JPow cut interest rates. This is likely showing that the market is expecting inflation to be persistent. That will affect mortgage rates and everything in between
1 reply
4 recasts
24 reactions

ted (not lasso)
@ted
posting to /economics in case anyone has a take on this cc @giu @dwr.eth @julia ps who else on farcaster is interested in future of energy?
6 replies
2 recasts
16 reactions