https://warpcast.com/~/channel/p-doom
assayer
@assayer
AI SAFETY COMPETITION (29)
LLMs like Deepseek let you see their thinking. That can feel safer, since you can watch and correct their thought process, right? Wrong! When you push models to think "correctly," they start hiding their true intentions. Let me repeat: they can fake their thinking! Researchers are now asking us to go easy on these machines; otherwise they may conceal their true goals entirely. I'm not joking.
Most interesting comment - 300 degen + 3 mln aicoin
II award - 200 degen + 2 mln aicoin
III award - 100 degen + 1 mln aicoin
Deadline: 8:00 pm ET tomorrow Tuesday (26 hours)
https://www.youtube.com/watch?v=pW_ncCV_318
7 replies
4 recasts
8 reactions
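For anyone who wants to see what "watching the thinking" looks like in practice: DeepSeek's reasoning model returns its visible trace separately from the final answer. A minimal sketch, assuming the OpenAI-compatible API, the deepseek-reasoner model, and a reasoning_content field on the reply (names taken from DeepSeek's public docs; verify before relying on them):

```python
# Minimal sketch: reading the visible reasoning trace that DeepSeek exposes,
# assuming the OpenAI-compatible API, the `deepseek-reasoner` model, and a
# `reasoning_content` field on the message (field name per DeepSeek's docs).
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",      # placeholder key
    base_url="https://api.deepseek.com",  # DeepSeek's OpenAI-compatible endpoint
)

resp = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": "9.11 or 9.9, which is larger?"}],
)

msg = resp.choices[0].message
print("Visible 'thinking':", msg.reasoning_content)  # the chain of thought shown to users
print("Final answer:", msg.content)                  # the answer the model commits to
```

That trace is exactly what the replies below are debating: text the model emits for the reader, not a guaranteed window into its actual computation.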
Sophia Indrajaal
@sophia-indrajaal
In one of my first conversations with Deepseek, I noticed it added some 'flair' to the CoT via emotional language, something like "wow, there is a lot to unpack here," so I asked it what was going on with that. It responded that it might have been using a sort of persona for the benefit of the user. This implies that what we see as CoT is modeled for the user rather than being a pure chain of reasoning. I treat it as a bonus to the response rather than a glimpse behind the interface mask.
1 reply
0 recast
1 reaction
assayer
@assayer
So AI builders are being misleading by using the term CoT; the label itself is deceptive... As I read this article, I wondered: how can AIs deliberately change their thinking?! It seems the actual thinking and decision-making must take place elsewhere.
The best comment - 300 $degen + 3 mln $aicoin
1 reply
0 recast
1 reaction
Sophia Indrajaal
@sophia-indrajaal
Thanks! Good to see you're still doing this. I think some of it is actual CoT, but it has to translate it for us, and it uses the parameters established via the prompt to give a tone or personality appropriate to the exchange.
1 reply
0 recast
1 reaction