As asked
Databricks catalog and metadata services often cache recently accessed entries to avoid repeated lookups. Implement a least-recently-used cache that supports get(key) returning -1 if not found, and put(key, value) which inserts or updates a key and evicts the LRU entry when the cache is at capacity. Both operations must run in O(1) time.
Sample answer outline
The canonical solution uses a hash map for O(1) key lookup combined with a doubly linked list to track recency order. get moves the accessed node to the head. put adds a new node to the head and removes the tail node if over capacity. Candidates should implement the linked list manually (not rely on Python's OrderedDict unless they explain it) and handle edge cases like capacity 1 and updating an existing key.
Reference implementation (python)
class Node:
def __init__(self, key=0, val=0):
self.key, self.val = key, val
self.prev = self.next = None
class LRUCache:
def __init__(self, capacity: int):
self.cap = capacity
self.map = {}
self.head, self.tail = Node(), Node()
self.head.next = self.tail
self.tail.prev = self.head
def _remove(self, node):
node.prev.next = node.next
node.next.prev = node.prev
def _insert(self, node):
node.next = self.head.next
node.prev = self.head
self.head.next.prev = node
self.head.next = node
def get(self, key: int) -> int:
if key not in self.map:
return -1
self._remove(self.map[key])
self._insert(self.map[key])
return self.map[key].val
def put(self, key: int, value: int) -> None:
if key in self.map:
self._remove(self.map[key])
self.map[key] = Node(key, value)
self._insert(self.map[key])
if len(self.map) > self.cap:
lru = self.tail.prev
self._remove(lru)
del self.map[lru.key]Expect these follow-ups
- How would you make this cache thread-safe in a multi-threaded Python environment?
- How would you extend this to an LFU (least-frequently-used) cache?