AI Sandbagging – Computerphile

From Computerphile.

Continuing the theme of AI research and safety, Aric Floyd talks about how some Large Language Models might exhibit the all-too-human trait of sandbagging – "lying" about their true capabilities.

AI Sandbagging Paper: https://www.apolloresearch.ai/research/scheming-reasoning-evaluations

Computerphile is supported by Jane Street. Learn more about them (and exciting career opportunities) at: https://jane-st.co/computerphile

This video was filmed and edited by Sean Riley.

Computerphile is a sister project to Brady Haran’s Numberphile. More at https://www.bradyharanblog.com