AI Sandbagging – Computerphile

From Computerphile. Following the theme of AI research and safety, Aric Floyd talks about how some Large Language Models might exhibit the all-too-human trait of sandbagging – "lying" about their true capabilities. AI Sandbagging Paper: https://www.apolloresearch.ai/research/scheming-reasoning-evaluations Computerphile is supported by Jane Street. Learn more about them (and exciting career opportunities) at: https://jane-st.co/computerphile This…

‘Forbidden’ AI Technique – Computerphile

From Computerphile. The so-called ‘Forbidden Technique’ with Chana Messinger. Check out Brilliant’s courses and start for free at https://brilliant.org/computerphile/ (episode sponsor). Chana Messinger from 80,000 Hours talks about why we shouldn’t give AI access to its own chain-of-thought. More from Chana: https://youtube.com/@chana_messinger OpenAI paper: https://cdn.openai.com/pdf/34f2ada6-870f-4c26-9790-fd8def56387f/CoT_Monitoring.pdf Computerphile…

Subroutines in Low Level Code – Computerphile

From Computerphile. When bashing out low-level code, it can be annoying to re-type the same commands over and over when you need to repeat a routine. Matt Godbolt explains how we can save frequently used code as a subroutine and reuse it whenever it’s needed. Matt Godbolt is known as the creator of Compiler Explorer, among other…

Shortest Path Algorithm Problem – Computerphile

From Computerphile. A seemingly simple problem that’s "in general" incredibly difficult! CEO of Redwood Research Buck Shlegeris explains his favourite algorithmic fact! Buck wants to thank his friend Peter Schmidt-Nielsen for telling him this great fact. More on the sums of square roots problem: https://cstheory.stackexchange.com/questions/79/problems-between-p-and-npc#4010 Computerphile is supported by Jane Street. Learn more about them…

Ai Will Try to Cheat & Escape (aka Rob Miles was Right!) – Computerphile

From Computerphile. As Large Language Models improve, the tokens they predict form ever more complicated and nuanced outcomes. Rob Miles and Ryan Greenblatt discuss "Alignment Faking", a paper Ryan’s team created, which explores ideas Rob covered in a series of Computerphile videos in 2017. The Alignment Faking paper: https://tinyurl.com/C-Paper-AlignmentFaking Ryan Greenblatt is chief scientist…