THIS is the REAL DEAL 🤯 for local LLMs

This is the stack that gets me over 4000 tokens per second locally.
Download Docker Desktop here: https://dockr.ly/4mOdGMO to get up and running with Docker Model Runner quickly.

🛒 Gear Links 🛒
💻☕ Thunderbolt 5 external SSD: https://amzn.to/3XqetZO
💻☕ Favorite 15″ display with magnet: https://amzn.to/3zD1DhQ
🎧⚡ Great 40Gbps T4 enclosure: https://amzn.to/3JNwBGW
🛠️🚀 My NVMe SSD: https://amzn.to/3YLEySo
📦🎮 My gear: https://www.amazon.com/shop/alexziskind

🎥 Related Videos 🎥
🏆 Skip M3 Ultra & RTX 5090 for LLMs | NEW 96GB KING – https://youtu.be/bAao58hXo9w
💻 Smallest RTX Pro 6000 rig | OVERKILL – https://youtu.be/JbnBt_Aytd0
🔧 Cheap mini runs a 70B LLM 🤯 – https://youtu.be/xyKEQjUzfAk
🌙 RAM torture test on Mac – https://youtu.be/l3zIwPgan7M
🚀 FREE Local LLMs on Apple Silicon | FAST! – https://youtu.be/bp2eev21Qfo
🪞 REALITY vs Apple’s Memory Claims | vs RTX4090m – https://youtu.be/fdvzQAWXU7A
📦 Set up Conda – https://youtu.be/2Acht_5_HTo
🤖 INSANE Machine Learning on Neural Engine – https://youtu.be/Y2FOUg_jo7k

- Julia Turk’s FP4 video: https://www.youtube.com/watch?v=-cRedoYETzQ
- NVIDIA post on quantization: https://developer.nvidia.com/blog/optimizing-llms-for-performance-and-accuracy-with-post-training-quantization/

🛠️ Developer Productivity Playlist – https://www.youtube.com/playlist?list=PLPwbI_iIX3aQCRdFGM7j4TY_7STfv2aXX
🔗 AI for Coding Playlist – https://www.youtube.com/playlist?list=PLPwbI_iIX3aSlUmRtYPfbQHt4n0YaX0qw

— — — — — — — — —

❤️ SUBSCRIBE TO MY YOUTUBE CHANNEL 📺
Click here to subscribe: https://www.youtube.com/@AZisk?sub_confirmation=1

— — — — — — — — —

Join this channel to get access to perks:
https://www.youtube.com/channel/UCajiMK_CY9icRhLepS8_3ug/join

— — — — — — — — —

📱 ALEX ON X: https://twitter.com/digitalix

#rtxpro6000 #llm #macbook

Alex Ziskind