ML engineer data structures interview questions

Six questions on data structures for ML engineer candidates. Each entry has the question as asked, a sample answer outline, common follow-ups, and a reference implementation where applicable.

As asked

Databricks catalog and metadata services often cache recently accessed entries to avoid repeated lookups. Implement a least-recently-used cache that supports get(key) returning -1 if not found, and put(key, value) which inserts or updates a key and evicts the LRU entry when the cache is at capacity. Both operations must run in O(1) time.

Sample answer outline

The canonical solution uses a hash map for O(1) key lookup combined with a doubly linked list to track recency order. get moves the accessed node to the head. put adds a new node to the head and removes the tail node if over capacity. Candidates should implement the linked list manually (not rely on Python's OrderedDict unless they explain it) and handle edge cases like capacity 1 and updating an existing key.

Reference implementation (python)

Python

class Node:
    """Doubly linked list node holding one cache entry."""
    def __init__(self, key=0, val=0):
        self.key, self.val = key, val
        self.prev = self.next = None

class LRUCache:
    def __init__(self, capacity: int):
        self.cap = capacity
        self.map = {}  # key -> Node
        # Sentinel head and tail nodes avoid None checks when linking.
        self.head, self.tail = Node(), Node()
        self.head.next = self.tail
        self.tail.prev = self.head

    def _remove(self, node):
        # Unlink node from the list in O(1).
        node.prev.next = node.next
        node.next.prev = node.prev

    def _insert(self, node):
        # Link node right after head, the most recently used position.
        node.next = self.head.next
        node.prev = self.head
        self.head.next.prev = node
        self.head.next = node

    def get(self, key: int) -> int:
        if key not in self.map:
            return -1
        # Move the accessed node to the front to mark it as most recent.
        self._remove(self.map[key])
        self._insert(self.map[key])
        return self.map[key].val

    def put(self, key: int, value: int) -> None:
        if key in self.map:
            # Unlink the stale node; a fresh node replaces it below.
            self._remove(self.map[key])
        self.map[key] = Node(key, value)
        self._insert(self.map[key])
        if len(self.map) > self.cap:
            # The node just before the tail sentinel is the least recently used.
            lru = self.tail.prev
            self._remove(lru)
            del self.map[lru.key]
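
The outline mentions OrderedDict as a shortcut that candidates should be able to explain. As a point of comparison, here is a minimal sketch of that variant; the class name OrderedDictLRU is illustrative and not part of the question. OrderedDict is itself backed by a hash map plus a doubly linked list, which is why move_to_end and popitem run in O(1).

```python
from collections import OrderedDict


class OrderedDictLRU:
    """LRU cache backed by OrderedDict; most recently used keys sit at the end."""

    def __init__(self, capacity: int):
        self.cap = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return -1
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value) -> None:
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.cap:
            self.data.popitem(last=False)  # drop the least recently used key


cache = OrderedDictLRU(2)
cache.put(1, 1)
cache.put(2, 2)
cache.get(1)     # key 1 becomes the most recent
cache.put(3, 3)  # over capacity: key 2 is evicted
```

After these calls, key 2 is gone while keys 1 and 3 remain.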

Expect these follow-ups

How would you make this cache thread-safe in a multi-threaded Python environment?
How would you extend this to an LFU (least-frequently-used) cache?
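
For the thread-safety follow-up, the simplest answer is a single lock around every operation, because even get mutates the recency order. The sketch below assumes that coarse-grained approach and uses OrderedDict for brevity; ThreadSafeLRU and worker are hypothetical names. Finer-grained schemes (sharding by key hash, for example) trade this simplicity for less contention.

```python
import threading
from collections import OrderedDict


class ThreadSafeLRU:
    """One lock guards both the lookup and the recency update."""

    def __init__(self, capacity: int):
        self.cap = capacity
        self.data = OrderedDict()
        self.lock = threading.Lock()

    def get(self, key):
        with self.lock:
            if key not in self.data:
                return -1
            self.data.move_to_end(key)
            return self.data[key]

    def put(self, key, value) -> None:
        with self.lock:
            if key in self.data:
                self.data.move_to_end(key)
            self.data[key] = value
            if len(self.data) > self.cap:
                self.data.popitem(last=False)


def worker(cache, start):
    # Each worker writes and reads its own range of 100 keys.
    for i in range(start, start + 100):
        cache.put(i, i)
        cache.get(i)


shared = ThreadSafeLRU(10)
threads = [threading.Thread(target=worker, args=(shared, n * 100)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Whatever the interleaving, the cache never holds more than its capacity once all workers finish.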

Tags: company:databricks, lru-cache, hash-map, linked-list, data-structures, algorithms

As asked

Cache management is central to Intel's CPU microarchitecture and to many software systems built on Intel hardware. Implement a Least Recently Used cache that supports get and put operations both in O(1) time. The cache has a fixed capacity and should evict the least recently used entry when full. After your implementation, explain how you would adapt the eviction policy for a hardware prefetcher hint scenario where recently used is not always the best predictor of future use.

Sample answer outline

A correct answer uses a hashmap for O(1) lookup combined with a doubly linked list to maintain recency order. The head of the list holds the most recently used entry and the tail holds the least recently used. On each get or put, the accessed node is moved to the head. On eviction, the tail node is removed and its key is deleted from the hashmap. The candidate should discuss Python's OrderedDict as a shortcut but be ready to implement from scratch.

Reference implementation (python)

Python

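class Node:
    def __init__(self, key, val):
        self.key, self.val = key, val
        self.prev = self.next = None

class LRUCache:
    def __init__(self, capacity):
        self.cap = capacity
        self.cache = {}  # key -> Node
        # Sentinel nodes: head.next is most recent, tail.prev is least recent.
        self.head, self.tail = Node(0,0), Node(0,0)
        self.head.next = self.tail
        self.tail.prev = self.head

    def _remove(self, node):
        node.prev.next = node.next
        node.next.prev = node.prev

    def _insert_front(self, node):
        node.next = self.head.next
        node.prev = self.head
        self.head.next.prev = node
        self.head.next = node

    def get(self, key):
        if key not in self.cache:
            return -1
        node = self.cache[key]
        self._remove(node)
        self._insert_front(node)  # mark as most recently used
        return node.val

    def put(self, key, val):
        if key in self.cache:
            self._remove(self.cache[key])
        node = Node(key, val)
        self.cache[key] = node
        self._insert_front(node)
        if len(self.cache) > self.cap:
            lru = self.tail.prev  # least recently used entry
            self._remove(lru)
            del self.cache[lru.key]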

Expect these follow-ups

How would you make this thread-safe for concurrent access without a global lock?
How would you extend this to support a TTL per key?
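
For the TTL follow-up, one possible design stores an expiry timestamp next to each value and checks it lazily on get. The sketch below assumes that design; TTLLRU, the ttl parameter, and the injectable clock are hypothetical, and OrderedDict stands in for the hand-written linked list to keep the example short.

```python
import time
from collections import OrderedDict


class TTLLRU:
    def __init__(self, capacity, clock=time.monotonic):
        self.cap = capacity
        self.clock = clock         # injectable so tests can control time
        self.data = OrderedDict()  # key -> (value, expires_at)

    def get(self, key):
        item = self.data.get(key)
        if item is None:
            return -1
        value, expires_at = item
        if self.clock() >= expires_at:
            del self.data[key]     # lazily drop the expired entry
            return -1
        self.data.move_to_end(key)
        return value

    def put(self, key, value, ttl):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = (value, self.clock() + ttl)
        if len(self.data) > self.cap:
            self.data.popitem(last=False)


now = [0.0]  # fake clock, in seconds
ttl_cache = TTLLRU(2, clock=lambda: now[0])
ttl_cache.put("a", 1, ttl=5)
ttl_cache.put("b", 2, ttl=50)
fresh = ttl_cache.get("a")        # still valid at t=0
now[0] = 10.0
expired = ttl_cache.get("a")      # past its 5 second TTL
still_there = ttl_cache.get("b")  # 50 second TTL not yet reached
```

With lazy expiry, a dead entry keeps occupying a slot until it is read or evicted; a periodic sweep or a min-heap keyed on expiry time would be needed to reclaim space proactively.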

Tags: company:intel, data-structures, hashmap, linked-list, caching, coding

As asked

Implement an LRU cache that supports get and put in O(1) time. The cache has a fixed capacity. When full, it evicts the least recently used entry before inserting a new one.

Sample answer outline

A strong answer uses a doubly linked list plus a hash map. The list maintains order by recency (most recent at head, least recent at tail). The hash map stores key to list node for O(1) lookup. On get, move the accessed node to the head. On put, if the key exists, update its value and move it to the head; if not, insert at the head and remove the tail node if the cache is now over capacity. The candidate should handle the edge case where capacity is 0 or 1, and discuss why a singly linked list is not sufficient: unlinking an arbitrary node in O(1) requires its predecessor, which a singly linked list cannot provide without a traversal, even with a tail pointer. In Python, OrderedDict achieves this with move_to_end but the interviewer may want the explicit implementation.

Reference implementation (python)

Python

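class Node:
    def __init__(self, key, val):
        self.key, self.val = key, val
        self.prev = self.next = None

class LRUCache:
    def __init__(self, capacity):
        self.cap = capacity
        self.cache = {}  # key -> node
        self.head, self.tail = Node(0,0), Node(0,0)  # sentinels
        self.head.next, self.tail.prev = self.tail, self.head

    def _remove(self, node):
        node.prev.next = node.next
        node.next.prev = node.prev

    def _insert_front(self, node):
        node.next = self.head.next
        node.prev = self.head
        self.head.next.prev = node
        self.head.next = node

    def _move_to_front(self, node):
        self._remove(node)
        self._insert_front(node)

    def get(self, key):
        if key in self.cache:
            self._move_to_front(self.cache[key])
            return self.cache[key].val
        return -1

    def put(self, key, val):
        if key in self.cache:
            self.cache[key].val = val
            self._move_to_front(self.cache[key])
        else:
            node = Node(key, val)
            self.cache[key] = node
            self._insert_front(node)
            if len(self.cache) > self.cap:
                lru = self.tail.prev  # least recently used entry
                self._remove(lru)
                del self.cache[lru.key]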

Expect these follow-ups

How would you make this thread-safe for concurrent access?
How would you extend this to an LFU cache?
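
For the LFU follow-up, a common O(1) design groups keys into one recency-ordered bucket per use count and tracks the smallest count in use. The sketch below assumes that design, with ties inside a bucket broken by least recent use; the attribute names (vals, freq, buckets, min_freq) are illustrative.

```python
from collections import defaultdict, OrderedDict


class LFUCache:
    def __init__(self, capacity: int):
        self.cap = capacity
        self.vals = {}                           # key -> value
        self.freq = {}                           # key -> use count
        self.buckets = defaultdict(OrderedDict)  # use count -> keys, oldest first
        self.min_freq = 0

    def _touch(self, key):
        # Move key from its current count bucket to the next one up.
        f = self.freq[key]
        del self.buckets[f][key]
        if not self.buckets[f]:
            del self.buckets[f]
            if self.min_freq == f:
                self.min_freq = f + 1
        self.freq[key] = f + 1
        self.buckets[f + 1][key] = None

    def get(self, key):
        if key not in self.vals:
            return -1
        self._touch(key)
        return self.vals[key]

    def put(self, key, value):
        if self.cap <= 0:
            return
        if key in self.vals:
            self.vals[key] = value
            self._touch(key)
            return
        if len(self.vals) >= self.cap:
            # Evict the oldest key among those with the smallest use count.
            victim, _ = self.buckets[self.min_freq].popitem(last=False)
            if not self.buckets[self.min_freq]:
                del self.buckets[self.min_freq]
            del self.vals[victim]
            del self.freq[victim]
        self.vals[key] = value
        self.freq[key] = 1
        self.buckets[1][key] = None
        self.min_freq = 1  # a brand-new key always has the smallest count


lfu = LFUCache(2)
lfu.put(1, 1)
lfu.put(2, 2)
lfu.get(1)     # key 1 now has count 2, key 2 has count 1
lfu.put(3, 3)  # evicts key 2, the least frequently used
```

Unlike LRU, the eviction here removes key 2 because of its lower use count, not because of when it was last touched.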

Tags: company:nvidia, data-structures, hash-map, linked-list, caching

As asked

Cycle detection is used in dependency graph validation inside Intel's build and package management tooling. Given the head of a singly linked list that may contain a cycle, return the node where the cycle begins. If there is no cycle, return null. Solve it in O(n) time and O(1) extra space, then prove why the second phase of Floyd's algorithm always finds the exact cycle entry node.

Sample answer outline

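A correct answer uses Floyd's tortoise and hare algorithm: slow moves one step and fast moves two steps until they meet, which confirms a cycle; if fast reaches the end of the list first, there is no cycle. Then reset slow to head and advance both slow and fast one step at a time until they meet again. This second meeting point is the cycle start. The candidate should prove why the second phase finds the cycle entry. Let a be the distance from head to the cycle start, b the distance from the cycle start to the first meeting point, and c the cycle length. Slow has travelled a + b steps and fast 2(a + b), and the difference must be a whole number of laps, so a + b = nc for some n ≥ 1. Hence a = (n - 1)c + (c - b): walking a steps from the head and a steps from the meeting point both end exactly at the cycle start.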
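
A minimal Python sketch of the two phases described above. ListNode and detect_cycle are illustrative names, not part of the question, and None plays the role of null.

```python
class ListNode:
    def __init__(self, val):
        self.val = val
        self.next = None


def detect_cycle(head):
    # Phase 1: advance slow by one and fast by two until they meet.
    slow = fast = head
    while fast and fast.next:
        slow = slow.next
        fast = fast.next.next
        if slow is fast:
            break
    else:
        return None  # fast ran off the end of the list: no cycle

    # Phase 2: restart slow at head; both now move one step at a time
    # and meet at the cycle entry.
    slow = head
    while slow is not fast:
        slow = slow.next
        fast = fast.next
    return slow


# Build 3 -> 2 -> 0 -> -4, with -4 pointing back to the node holding 2.
nodes = [ListNode(v) for v in (3, 2, 0, -4)]
for a, b in zip(nodes, nodes[1:]):
    a.next = b
nodes[-1].next = nodes[1]
entry = detect_cycle(nodes[0])
```

Here entry is the node holding 2. The function uses only two pointers, so the extra space is O(1).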

Expect these follow-ups

What is the proof that the second phase always finds the entry node and not just any node in the cycle?
How would you find the length of the cycle once you have detected it?
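
For the cycle-length follow-up, one straightforward approach is to walk a single pointer around the cycle from the first meeting point and count steps until it returns. The sketch below assumes that approach; cycle_length and ListNode are illustrative names.

```python
class ListNode:
    def __init__(self, val):
        self.val = val
        self.next = None


def cycle_length(head):
    slow = fast = head
    while fast and fast.next:
        slow = slow.next
        fast = fast.next.next
        if slow is fast:
            # The meeting point lies inside the cycle: count one full lap.
            length = 1
            probe = slow.next
            while probe is not slow:
                probe = probe.next
                length += 1
            return length
    return 0  # no cycle


# Build 3 -> 2 -> 0 -> -4, with -4 pointing back to the node holding 2.
chain = [ListNode(v) for v in (3, 2, 0, -4)]
for a, b in zip(chain, chain[1:]):
    a.next = b
chain[-1].next = chain[1]
lap = cycle_length(chain[0])
```

The cycle in this example contains the nodes 2, 0, and -4, so lap is 3.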

Tags: company:intel, linked-list, algorithms, two-pointers, coding

As asked

Given an array of integers and a window size k, return an array of the maximum value in each window of size k as it slides from left to right. What data structure do you use and why?

Sample answer outline

A strong answer describes a monotonic deque that holds indices whose values are in decreasing order. For each element, remove indices from the back whose values are less than or equal to the current element (they can never be a window max again), append the current index, and remove the front index if it has fallen out of the current window. The front of the deque is then always the index of the window maximum. Each index is pushed and popped at most once, so this gives O(n) time and O(k) space. The candidate should walk through an example with a small array to demonstrate the deque state at each step, and handle edge cases like k greater than the array length.

Reference implementation (python)

Python

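from collections import deque

def maxSlidingWindow(nums, k):
    if k <= 0:
        return []  # no valid window
    dq, result = deque(), []  # dq holds indices; their values are decreasing
    for i, n in enumerate(nums):
        # Drop smaller-or-equal values from the back; they cannot be a max again.
        while dq and nums[dq[-1]] <= n:
            dq.pop()
        dq.append(i)
        # Drop the front index once it slides out of the window [i - k + 1, i].
        if dq[0] < i - k + 1:
            dq.popleft()
        # The first full window ends at index k - 1.
        if i >= k - 1:
            result.append(nums[dq[0]])
    return result  # empty when k > len(nums)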
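
The outline asks candidates to walk through a small array and show the deque state at each step. One way to rehearse that is a tracing variant of the same loop; trace_max_window is a hypothetical helper that assumes k >= 1 and records the deque indices after every element.

```python
from collections import deque


def trace_max_window(nums, k):
    """Return (index, deque indices, window max or None) after each step."""
    dq, steps = deque(), []
    for i, n in enumerate(nums):
        while dq and nums[dq[-1]] <= n:
            dq.pop()
        dq.append(i)
        if dq[0] < i - k + 1:
            dq.popleft()
        window_max = nums[dq[0]] if i >= k - 1 else None
        steps.append((i, list(dq), window_max))
    return steps


for step in trace_max_window([1, 3, -1, -3, 5, 3, 6, 7], 3):
    print(step)
# (0, [0], None)
# (1, [1], None)
# (2, [1, 2], 3)
# (3, [1, 2, 3], 3)
# (4, [4], 5)
# (5, [4, 5], 5)
# (6, [6], 6)
# (7, [7], 7)
```

The deque collapses to a single index whenever a new maximum arrives (steps 1, 4, 6, and 7), which is the behaviour that keeps the total work linear.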

Expect these follow-ups

How would you find the minimum of each window instead?
Can you solve this with a segment tree and when would that approach be preferable?
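
For the window-minimum follow-up, the same deque technique works with the comparison flipped so that values are kept in increasing order. A minimal sketch, assuming k >= 1; min_sliding_window is an illustrative name.

```python
from collections import deque


def min_sliding_window(nums, k):
    dq, result = deque(), []  # dq holds indices; their values are increasing
    for i, n in enumerate(nums):
        # Drop larger-or-equal values from the back; they cannot be a min again.
        while dq and nums[dq[-1]] >= n:
            dq.pop()
        dq.append(i)
        if dq[0] < i - k + 1:
            dq.popleft()
        if i >= k - 1:
            result.append(nums[dq[0]])
    return result


mins = min_sliding_window([1, 3, -1, -3, 5, 3, 6, 7], 3)
# mins == [-1, -3, -3, -3, 3, 3]
```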

Tags: company:nvidia, deque, sliding-window, arrays, algorithms

As asked

An online monitoring service at OpenAI tracks response latencies arriving in real time and needs to report the median latency at any point. Design a data structure that supports addNum(num) to ingest a new latency value and findMedian() to return the current median. Aim for O(log n) per addNum and O(1) per findMedian.

Sample answer outline

The candidate should reach for two heaps: a max-heap for the lower half and a min-heap for the upper half. After each insertion, rebalance so the heaps differ in size by at most 1. findMedian either returns the top of the larger heap (odd total) or averages the two tops (even total). They should be careful about the heap invariant when rebalancing: always push to the intended heap, then pop back if the new element violates the partition.

Reference implementation (python)

Python

import heapq

class MedianFinder:
    def __init__(self):
        self.lo = []  # max-heap (negate values), lower half
        self.hi = []  # min-heap, upper half

    def addNum(self, num: int) -> None:
        # Route num through lo so the largest of the lower half moves to hi.
        heapq.heappush(self.lo, -num)
        heapq.heappush(self.hi, -heapq.heappop(self.lo))
        # Keep len(lo) >= len(hi), so lo holds the extra element on odd counts.
        if len(self.hi) > len(self.lo):
            heapq.heappush(self.lo, -heapq.heappop(self.hi))

    def findMedian(self) -> float:
        # Assumes at least one addNum call has been made.
        if len(self.lo) > len(self.hi):
            return -self.lo[0]
        return (-self.lo[0] + self.hi[0]) / 2

Expect these follow-ups

What if the data stream can contain duplicates?
How would you adapt this to find the k-th percentile instead of the median?
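
For the percentile follow-up, the two-heap idea carries over if the heaps are balanced by a ratio instead of by halves. The sketch below assumes the nearest-rank definition (the p-th percentile of n values is the ceil(p * n / 100)-th smallest) and a percentile p in (0, 100]; PercentileFinder and findPercentile are hypothetical names chosen to mirror MedianFinder.

```python
import heapq
import math


class PercentileFinder:
    def __init__(self, p: float):
        self.p = p    # percentile in (0, 100]
        self.lo = []  # max-heap (negated): the smallest `rank` values
        self.hi = []  # min-heap: everything larger

    def addNum(self, num) -> None:
        # Place num on the side of the partition where it belongs.
        if self.lo and num > -self.lo[0]:
            heapq.heappush(self.hi, num)
        else:
            heapq.heappush(self.lo, -num)
        # Resize lo to the nearest-rank position for the new count.
        n = len(self.lo) + len(self.hi)
        rank = max(1, math.ceil(self.p * n / 100))
        while len(self.lo) > rank:
            heapq.heappush(self.hi, -heapq.heappop(self.lo))
        while len(self.lo) < rank:
            heapq.heappush(self.lo, -heapq.heappop(self.hi))

    def findPercentile(self):
        # Assumes at least one addNum call has been made.
        return -self.lo[0]


p90 = PercentileFinder(90)
for latency in [5, 1, 9, 3, 7, 2, 8, 4, 10, 6]:
    p90.addNum(latency)
result = p90.findPercentile()  # 9th smallest of 10 values
```

Each insertion changes the target rank and the size of lo by at most one, so at most one element moves between heaps and addNum stays O(log n).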

Tags: company:openai, heap, data-structures, stream-processing, median, algorithms

Questions

Implement an LRU cache with O(1) get and put (Data structures, medium, very common)

Implement an O(1) LRU cache (Data structures, medium, very common)

Design and implement an LRU cache (Data structures, medium, very common)

Detect and find the start of a cycle in a linked list (Data structures, medium, common)

Sliding window maximum (Data structures, medium, common)

Find the median of a data stream in O(log n) per insertion (Data structures, hard, common)
