A recruiter at a mid-size AI startup told me she reviewed 312 ML engineer applications in January 2026 and scheduled exactly 19 first-round calls. That’s a 6% pass rate before any technical screen. The resume filter wasn’t even an ATS. It was her, skimming GitHub links and looking for one thing: evidence you’ve shipped something with a model attached.
That detail stuck with me because most ML interview prep advice focuses entirely on what happens inside the interview loop. The filter before you get there is a different problem. This guide covers both.
What the ML interview loop actually looks like in 2026
The standard loop at most companies with a dedicated ML team runs four to six rounds: a recruiter screen, a take-home or timed coding challenge, a technical deep-dive on fundamentals, a system design round, and sometimes a research presentation. At larger or more research-focused companies (Google DeepMind, Meta FAIR, Cohere, Mistral) you’ll also hit a “research fit” conversation where they probe whether you can read a paper and critique it on the spot.
Startups compress this. A Series B company might do one 90-minute technical call and a quick team fit chat. The compressed format feels friendlier but it’s actually harder, because you have less room to recover from a weak answer in one area.
One thing I see candidates consistently underestimate: the system design round is often weighted more heavily than the coding round for senior roles. I’ve seen strong LeetCode performers get passed over because they couldn’t walk through a realistic feature store architecture without hand-waving.
The fundamentals questions that still show up everywhere
Despite all the noise about LLMs, interviewers at most companies still probe classical ML concepts in depth. The questions that trip people up aren’t the hard ones. They’re the ones candidates assume they can answer without actually practicing the explanation out loud.
- Explain the bias-variance tradeoff. Then explain it again as if the listener is a backend engineer who hasn’t touched ML since university.
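- Why would you use AUC-ROC over F1 for a fraud detection model? (Hint: class imbalance is usually part of the answer, but interviewers want to hear you reason about threshold sensitivity: F1 is computed at a single decision threshold, while AUC-ROC summarizes ranking quality across all thresholds.)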
- Walk through gradient descent. What goes wrong at very high and very low learning rates? What does a loss curve that oscillates tell you?
- Describe attention in transformers without using the word “attention” in your first sentence. If you can’t, you probably don’t have the mental model yet.
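For the last question in the list, one way to build the mental model is to write the computation out by hand. The sketch below is a minimal single-head, unmasked version in plain Python: each output is a weighted average of the value vectors, and the weights come from a softmax over scaled query-key dot products. The function names and the toy vectors are assumptions made for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product mixing for one head, no masking, plain lists."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        sims = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(sims)
        # The output is the weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
    return outputs


# Toy inputs (made up): two keys, each paired with one value vector.
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
# The first query points strongly at the first key; the second matches neither.
queries = [[5.0, 0.0], [0.0, 0.0]]

result = attend(queries, keys, values)
print(result)
```

The first query pulls its output almost entirely from the first value vector, while the all-zero query has no preference and returns the plain average of the two. Being able to narrate that difference is most of the explanation.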
The Stack Overflow Developer Survey 2024 found that 62% of developers now work with or adjacent to AI/ML tools at their job, but only about 28% felt confident explaining model architecture decisions. That gap is where most interview failures happen.
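One way to practice the fraud detection question from the list above is to compute both metrics yourself on a tiny example. The sketch below uses made-up scores for 20 transactions, 2 of them fraudulent, and assumed helper names (`roc_auc`, `f1_at`). It is only meant to show that F1 changes with the threshold you pick while AUC-ROC is a single threshold-free number.

```python
def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_at(labels, scores, threshold):
    """F1 for the positive class when predicting 1 for scores >= threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


# Made-up toy data: 2 fraud cases among 20 transactions.
labels = [1, 1] + [0] * 18
scores = [0.90, 0.40] + [0.60, 0.35, 0.30] + [0.10] * 15

auc = roc_auc(labels, scores)
f1_by_threshold = {t: f1_at(labels, scores, t) for t in (0.3, 0.5, 0.7)}

print(f"AUC-ROC: {auc:.3f}")
for t, f1 in f1_by_threshold.items():
    print(f"F1 at threshold {t}: {f1:.3f}")
```

On this data the model ranks almost every fraud case above every legitimate one, so AUC-ROC is high, yet F1 moves noticeably across the three thresholds. That is the threshold sensitivity the interviewer is listening for.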
ML system design: the questions and a framework for answering them
The canonical ML system design question format is: “Design a recommendation system for [product].” Or: “How would you build a real-time fraud detection model at scale?” The trap is jumping straight to model architecture when the interviewer actually wants to see you think about the problem before the solution.
A structure that works:
- Problem definition (roughly 5 minutes). Clarify the objective, success metrics, and constraints. “What’s the acceptable latency?” and “Is this online or batch?” may sound too basic to ask, but the answers are rarely obvious and they shape every later decision. Ask them.
- Data layer (10 minutes). What signals exist? What’s the ground truth? How is training data generated? Where does label noise come from?
- Modeling decision (10 minutes). Start simple and explain why you’d iterate. A gradient boosted tree on tabular features is often the right first answer before moving to neural approaches.
- Infrastructure and serving (10 minutes). Feature stores, model versioning, A/B testing framework, latency requirements, fallback logic.
- Monitoring and drift (5 minutes). How do you know the model degraded six months after launch?
I’ve sat in on enough mock system design sessions to say honestly: most candidates skip the fifth step, monitoring and drift, entirely. That’s where senior interviewers probe hardest, because it reveals whether you’ve run models in production or only trained them in notebooks.
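One concrete answer to the "six months after launch" question is to compare the distribution of a feature or model score at training time against what the model sees live. The sketch below uses the population stability index (PSI) on made-up data. The function name, the bin count, and the 0.25 alert level are assumptions (0.25 is a commonly quoted rule of thumb, not a standard), and a real setup would track many features plus label-based metrics once ground truth arrives.

```python
import math


def psi(expected, actual, n_bins=10, eps=1e-6):
    """Population stability index between a reference sample and a live sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins

    def bin_fractions(sample):
        counts = [0] * n_bins
        for x in sample:
            idx = int((x - lo) / width) if width > 0 else 0
            # Clamp values outside the reference range into the edge bins.
            counts[min(max(idx, 0), n_bins - 1)] += 1
        # eps avoids log(0) and division by zero for empty bins.
        return [max(c / len(sample), eps) for c in counts]

    e, a = bin_fractions(expected), bin_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


ALERT_THRESHOLD = 0.25  # assumed rule-of-thumb alert level

# Made-up data: training scores spread evenly over [0, 1).
train_scores = [i / 100 for i in range(100)]
live_same = train_scores[::2]                        # same distribution, fewer rows
live_drifted = [(i / 100) ** 0.5 for i in range(100)]  # mass shifted toward 1

stable = psi(train_scores, live_same)
drifted = psi(train_scores, live_drifted)
print(f"stable PSI:  {stable:.3f}, alert: {stable > ALERT_THRESHOLD}")
print(f"drifted PSI: {drifted:.3f}, alert: {drifted > ALERT_THRESHOLD}")
```

In an interview, the code matters less than the surrounding decisions: which signals you monitor, how often you compute them, and what happens when the alert fires (retrain, roll back, or fall back to a simpler model).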
LLM and GenAI questions in 2026
This area is moving fast enough that anything I write here could be dated by the time you read it, so I’ll focus on the underlying concepts that are likely to stay relevant.
Expect questions about fine-tuning tradeoffs (LoRA vs full fine-tuning and when each makes economic sense), RAG architecture (when retrieval beats fine-tuning), hallucination mitigation patterns, and how you’d evaluate a generative system when ground truth is subjective.
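To make the "economic sense" part of the LoRA question concrete, it helps to do the parameter arithmetic. For a frozen weight matrix of shape d_out x d_in, a rank-r LoRA adapter trains two small matrices holding r * (d_in + d_out) parameters in total, against d_in * d_out for full fine-tuning of that matrix. The hidden size of 4096 and rank of 8 below are assumed values for illustration, not taken from any specific model.

```python
def full_params(d_in, d_out):
    """Trainable parameters when the whole weight matrix is updated."""
    return d_in * d_out


def lora_params(d_in, d_out, rank):
    """Trainable parameters for a LoRA adapter: B (d_out x r) plus A (r x d_in)."""
    return rank * (d_in + d_out)


hidden = 4096  # assumed hidden size for one square projection matrix
rank = 8       # assumed LoRA rank

full = full_params(hidden, hidden)
lora = lora_params(hidden, hidden, rank)
ratio = lora / full

print(f"full fine-tuning: {full:,} trainable parameters")   # 16,777,216
print(f"LoRA (rank {rank}):    {lora:,} trainable parameters")  # 65,536
print(f"LoRA trains {ratio:.2%} of the matrix")              # 0.39%
```

Fewer trainable parameters means less optimizer state and smaller checkpoints per task, which is usually where the cost argument starts. The quality tradeoff is a separate question that this arithmetic does not answer.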
The LinkedIn Economic Graph reported in late 2024 that job postings mentioning “LLM” or “large language model” grew 5x between 2022 and 2024 in engineering roles. Companies aren’t always clear in those postings about whether they want someone who builds LLMs or someone who builds products on top of them. The interview will tell you which one they actually want, usually in the first 15 minutes.
A study plan that doesn’t assume you have 8 free hours a day
Most prep timelines I see online are written for someone on a leave of absence. Most real candidates have jobs. Here’s a compressed 6-week version for someone fitting prep into evenings:
Weeks 1-2: Fundamentals refresh. Work through one textbook section per night (Bishop’s Pattern Recognition and Machine Learning or Goodfellow, Bengio, and Courville’s Deep Learning both work fine). Don’t try to read linearly. Jump to the concepts you’re fuzzy on and read those sections.
Weeks 3-4: System design practice. One design problem per session, talked out loud, ideally with someone else in the loop to push back. Craqly’s mock interview mode lets you run through AI/ML system design prompts with real-time feedback on your reasoning structure, which is faster than trying to find a study partner with the same prep schedule.
Weeks 5-6: Company-specific prep and behavioral rounds. Look at recent papers from the team you’re interviewing with. LinkedIn and Semantic Scholar both index author affiliations. If you can mention a paper from the team in the research fit round, that signals you’re genuinely interested and not just cycling through interview loops.
The parts of ML interviews nobody writes about
Coding expectations are lower than in pure SWE interviews, but not zero. Candidates who struggle to implement a binary search tree from scratch usually also struggle to implement a gradient descent loop from scratch. The underlying skill is the same: translating a concept you understand abstractly into working code under pressure.
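As a reference point for what "a gradient descent loop from scratch" can look like, here is a minimal sketch that fits a line by batch gradient descent on mean squared error, with no libraries. The data, function name, and learning rates are assumptions chosen so the three runs behave differently: one converges, one is too slow to finish in the step budget, and one uses a step size large enough that every update overshoots and the loss grows.

```python
def fit_line(xs, ys, lr, steps):
    """Fit y = w*x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    losses = []
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        losses.append(sum(e * e for e in errors) / n)
        # Gradients of the MSE with respect to w and b.
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, losses


# Made-up data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b, good = fit_line(xs, ys, lr=0.05, steps=2000)
_, _, slow = fit_line(xs, ys, lr=0.0001, steps=2000)
_, _, diverged = fit_line(xs, ys, lr=0.2, steps=50)

print(f"lr=0.05:   w={w:.4f}, b={b:.4f}, final loss={good[-1]:.2e}")
print(f"lr=0.0001: final loss={slow[-1]:.2e} (still far from the minimum)")
print(f"lr=0.2:    final loss={diverged[-1]:.2e} (growing, not shrinking)")
```

Practicing this until you can write it without notes, and then explain why the third run blows up, covers both the coding bar and the learning-rate question interviewers like to attach to it.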
Behavioral questions matter more than candidates expect at senior levels. “Tell me about a time a model you built performed worse than expected in production” is a real question that real interviewers ask. The answer they want isn’t a story where you heroically fixed everything. They want to see how you reasoned through failure.
And the thing I’m least confident about: I don’t know how much the bar has shifted in the past six months at companies that have gone all-in on internal AI tooling for their own engineering workflows. It’s possible the expectations for what you need to know without looking it up have compressed. Or it’s possible they’ve expanded because everyone’s assumed to know more. I’d ask the recruiter directly.