The most common ML interview mistake I’ve seen described, across dozens of engineering blogs and postmortems, isn’t failing to know an algorithm. It’s jumping to a solution before understanding what problem is actually being solved. Someone asks “how would you handle an imbalanced dataset?” and the candidate immediately goes to SMOTE, when the interviewer was hoping to see whether the candidate would first ask about the class imbalance ratio, the business cost of false positives versus false negatives, and whether resampling is even the right intervention.
That’s the frame worth keeping in mind for everything below.
Fundamentals questions (entry level, 0-2 years)
These show up in phone screens and first-round technical interviews. The interviewers aren’t looking for textbook definitions. They want to see that you can explain things clearly to a non-specialist.
- What’s the difference between supervised and unsupervised learning? Can you give an example of a problem where you’d use each?
- Explain the bias-variance tradeoff in plain language.
- What does overfitting mean and how would you detect it in practice?
- Why would you use regularization? What’s the difference between L1 and L2?
- What’s cross-validation and why does it matter?
On the bias-variance tradeoff: the answer most people give is technically correct but abstract. A stronger answer ties it to something real. “High bias means the model is making simplifying assumptions that don’t fit the data, like fitting a line to something that’s actually quadratic. High variance means the model is memorizing noise from the training set and won’t generalize.” That’s still the textbook answer, but grounded in an image.
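If you want to make that image concrete for yourself, here is a minimal sketch using only the Python standard library. The data, the noise level, and the two toy models (a least-squares line and a 1-nearest-neighbor lookup) are illustrative assumptions of mine, not a standard benchmark. The line underfits the quadratic (high bias), while the nearest-neighbor model memorizes every training point, noise included (high variance).

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b, using the closed-form solution."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return lambda x: a * x + b

def fit_one_nearest_neighbor(xs, ys):
    """Predict the label of the closest training point (pure memorization)."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def mse(model, xs, ys):
    """Mean squared error of a model on a dataset."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(0)  # fixed seed so the run is repeatable

def noisy_quadratic(x):
    # Assumed ground truth: y = x^2 plus Gaussian noise with standard deviation 1.
    return x * x + rng.gauss(0.0, 1.0)

train_x = [-3 + 6 * i / 39 for i in range(40)]   # 40 evenly spaced points in [-3, 3]
test_x = [x + 0.07 for x in train_x[:-1]]        # unseen points between training points
train_y = [noisy_quadratic(x) for x in train_x]
test_y = [noisy_quadratic(x) for x in test_x]

line = fit_line(train_x, train_y)
knn = fit_one_nearest_neighbor(train_x, train_y)

line_train, line_test = mse(line, train_x, train_y), mse(line, test_x, test_y)
knn_train, knn_test = mse(knn, train_x, train_y), mse(knn, test_x, test_y)

print(f"line  train MSE={line_train:.2f}  test MSE={line_test:.2f}")
print(f"1-NN  train MSE={knn_train:.2f}  test MSE={knn_test:.2f}")
```

The line is wrong on both splits in the same way, which is what bias looks like. The 1-nearest-neighbor model is perfect on the training set and worse on new points, which is what variance looks like. The gap between training and test error is also the practical overfitting check from the question list above.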
Algorithms and feature engineering (mid-level, 2-5 years)
At this level the questions get into why you’d choose one approach over another, not just what an approach is.
- When would you use a random forest versus a gradient boosting model?
- What’s the curse of dimensionality and how does it affect distance-based algorithms?
- How do you handle missing data? Walk through your actual decision process.
- What’s target encoding and when does it create data leakage?
- How would you evaluate a model on an imbalanced dataset where accuracy is misleading?
The imbalanced dataset question reliably separates senior candidates from mid-level ones. Someone who goes straight to comparing accuracy with F1 is showing they understand metrics. Someone who first asks “what does a false negative cost the business versus a false positive?” is showing they understand the problem. Those are different things.
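A small worked example shows why the cost question changes the answer. This is a sketch with made-up numbers: the 1% fraud rate, the detector’s hit counts, and the per-error costs (`cost_fp`, `cost_fn`) are all hypothetical values chosen for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def summarize(y_true, y_pred, cost_fp, cost_fn):
    """Standard metrics plus a total cost based on assumed per-error costs."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "cost": fp * cost_fp + fn * cost_fn,
    }

# Hypothetical data: 1,000 transactions, 10 of them fraudulent (1% positive rate).
y_true = [1] * 10 + [0] * 990

# Baseline that never flags anything.
always_negative = [0] * 1000

# Hypothetical detector: catches 8 of the 10 frauds and raises 30 false alarms.
detector_pred = [1] * 8 + [0] * 2 + [1] * 30 + [0] * 960

# Assumed costs: a false alarm costs 5 (manual review), a missed fraud costs 500.
baseline = summarize(y_true, always_negative, cost_fp=5, cost_fn=500)
detector = summarize(y_true, detector_pred, cost_fp=5, cost_fn=500)

for name, result in [("always negative", baseline), ("detector", detector)]:
    print(name, {k: round(v, 3) for k, v in result.items()})
```

The do-nothing baseline wins on accuracy and loses badly on recall and total cost. Flip the two cost values and the ranking can change, which is why the cost question has to come before the metric choice.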
Deep learning questions
These vary wildly by role. A computer vision role will go deep on CNNs and augmentation. An NLP role will go into transformers and tokenization. Generalist ML roles often stay at the conceptual level.
- How does backpropagation work? What are vanishing and exploding gradients?
- What’s the difference between batch normalization and layer normalization?
- When would you choose a transformer over an LSTM for a sequence task?
- What is attention, and why did it change NLP more than any single architectural change before it?
To be honest, most candidates I’ve heard about who failed deep learning rounds knew the architecture but couldn’t explain why the architectural choice mattered for the specific problem. “We used a transformer” is less interesting than “we switched from an LSTM because the LSTM couldn’t capture long-range dependencies in the input sequences.”
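To see why attention helps with long-range dependencies, it can be useful to write the core operation out by hand. The sketch below is scaled dot-product attention in plain Python with toy vectors I made up. It leaves out learned projection matrices, multiple heads, and masking, so treat it as an illustration of the mechanism rather than a transformer layer.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def scaled_dot_product_attention(queries, keys, values):
    """For each query, return a weighted average of values and the weights used."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [dot(q, k) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors, one output dimension at a time.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Toy sequence of 6 positions. Position 0 carries a distinctive key;
# the query belongs to the last position and is built to match it.
keys = [[4.0, 0.0]] + [[0.0, 1.0]] * 5
values = [[1.0, 0.0]] + [[0.0, 1.0]] * 5
last_position_query = [[4.0, 0.0]]

outputs, weights = scaled_dot_product_attention(last_position_query, keys, values)
print("attention weights:", [round(w, 4) for w in weights[0]])
print("output:", [round(x, 4) for x in outputs[0]])
```

The last position reads from position 0 in a single step, no matter how many positions sit in between. A recurrent model has to carry that information through every intermediate step, which is where the long-range signal tends to fade.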
Model evaluation and production systems
This is where senior and staff-level interviews set themselves apart. Anyone can train a model. Fewer people have thought carefully about what happens after.
- How do you monitor a production model for performance degradation?
- What’s concept drift and how is it different from data drift?
- How would you design an A/B test to measure whether a new model is actually better?
- What does a feature store do and when does it matter?
- How would you handle a model that performs well offline but poorly in production?
The offline/online performance gap question is genuinely interesting and doesn’t have one right answer. The useful things to discuss: training-serving skew (the model was trained on data that doesn’t match what it sees at inference time), distribution shift over time, latency constraints that force model simplification, and feedback loops where the model’s own predictions affect future training data. Mentioning any two of these seriously is better than covering all four superficially.
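If you do pick distribution shift as one of your two, it helps to name a concrete check. One common choice is the Population Stability Index (PSI), which compares how a feature is distributed in training data versus serving data. The sketch below is a minimal version under my own assumptions: equal-width bins taken from the training range, a small `eps` to avoid dividing by zero, and synthetic Gaussian data standing in for a real feature.

```python
import math
import random

def psi(expected, actual, n_bins=10, eps=1e-6):
    """Population Stability Index between a reference sample and a new sample."""
    lo, hi = min(expected), max(expected)
    if hi == lo:
        raise ValueError("reference sample has no spread; cannot build bins")
    width = (hi - lo) / n_bins

    def bin_fractions(sample):
        counts = [0] * n_bins
        for v in sample:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), n_bins - 1)  # clamp out-of-range values into edge bins
            counts[idx] += 1
        # Floor each fraction at eps so the logarithm below is always defined.
        return [max(c / len(sample), eps) for c in counts]

    e = bin_fractions(expected)
    a = bin_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

rng = random.Random(42)  # fixed seed so the run is repeatable

# Synthetic stand-ins for one numeric feature.
train = [rng.gauss(0.0, 1.0) for _ in range(5000)]
serving_same = [rng.gauss(0.0, 1.0) for _ in range(5000)]      # same distribution
serving_shifted = [rng.gauss(1.0, 1.0) for _ in range(5000)]   # mean moved by 1

psi_identical = psi(train, train)
psi_same = psi(train, serving_same)
psi_shifted = psi(train, serving_shifted)

print(f"identical sample:  PSI={psi_identical:.4f}")
print(f"same distribution: PSI={psi_same:.4f}")
print(f"shifted mean:      PSI={psi_shifted:.4f}")
```

A commonly quoted rule of thumb treats PSI below 0.1 as stable and above 0.25 as a shift worth investigating, but those cutoffs are conventions, not guarantees. In an interview, the more valuable point is that you would run a check like this per feature on a schedule, not the exact threshold.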
The PROBLEM framework (for system design rounds)
ML system design interviews have become more common at senior levels. The questions are open-ended: “design a recommendation system for a streaming platform” or “how would you build a fraud detection system.” Having a consistent approach matters more than the specific answer.
One framework that works: Problem definition first (what are we actually trying to predict?), Requirements (latency, throughput, freshness), Options and tradeoffs, Best approach given constraints, Limitations of that approach, Evaluation metrics, and Monitoring plan. The initials spell PROBLEM, if you want a mnemonic.
The BLS Occupational Outlook Handbook projects 36% employment growth for data scientists from 2023 to 2033, much faster than the average for all occupations. The Stack Overflow 2024 Developer Survey found that ML/AI was the fastest-growing specialty area by developer interest, with 38% of respondents saying they were learning it or actively working in it.
The volume of candidates means companies are doing more filtering rounds, not fewer. The practical effect is that knowing the material well isn’t enough if you can’t explain it clearly under pressure. Practicing with an AI tool like Craqly that provides real-time prompting during mock ML sessions can help with the delivery side, though there’s no substitute for actually working through the problems until the reasoning is second nature.
What question type trips you up most? For most people it’s either the open-ended system design rounds or the “explain your tradeoff reasoning” behavioral-technical hybrid. Both get easier with repetition.