As asked
Design the OTP supervision tree for a Phoenix service that handles chat rooms, user presence, and outbound webhooks. What processes exist and what restarts when something fails?
Sample answer outline
A strong answer starts with process boundaries, not modules. Put room processes under a DynamicSupervisor so rooms can start and stop independently, and isolate webhook delivery workers under their own supervisor so a bad third-party endpoint cannot take down chat state. Choose restart strategies deliberately: one_for_one for independent room workers (the only strategy a DynamicSupervisor supports), rest_for_one only where later children depend on earlier ones, such as a room supervisor that relies on the Registry started before it. Presence should be treated as derived state that can be rebuilt, while durable chat messages belong in the database. Candidates often trip up by making one GenServer own too much state or by restarting a whole subtree for a local failure.
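One way a candidate might sketch the tree described above, using only the Elixir standard library. All module and process names here (Chat.Supervisor, Chat.Rooms.Supervisor, Chat.Room, Chat.RoomRegistry, Chat.RoomSupervisor, Chat.WebhookTasks) are hypothetical, and the Phoenix endpoint, PubSub, and Presence children are left out to keep the example self-contained. Treat it as an illustration of the boundaries, not a definitive layout.

```elixir
defmodule Chat.Room do
  # :transient means a room is restarted after a crash but not after a normal exit.
  use GenServer, restart: :transient

  def start_link(room_id) do
    GenServer.start_link(__MODULE__, room_id, name: via(room_id))
  end

  # Rooms are looked up by id through the (hypothetical) Chat.RoomRegistry.
  def via(room_id), do: {:via, Registry, {Chat.RoomRegistry, room_id}}

  @impl true
  def init(room_id) do
    # Only rebuildable state lives here; durable messages belong in the database.
    {:ok, %{room_id: room_id, members: MapSet.new()}}
  end
end

defmodule Chat.Rooms.Supervisor do
  use Supervisor

  def start_link(opts), do: Supervisor.start_link(__MODULE__, opts, name: __MODULE__)

  def start_room(room_id) do
    DynamicSupervisor.start_child(Chat.RoomSupervisor, {Chat.Room, room_id})
  end

  @impl true
  def init(_opts) do
    children = [
      # Started first because rooms register their names here.
      {Registry, keys: :unique, name: Chat.RoomRegistry},
      # One crashed room restarts alone; its siblings are untouched.
      {DynamicSupervisor, name: Chat.RoomSupervisor, strategy: :one_for_one}
    ]

    # :rest_for_one: if the Registry restarts, the room supervisor after it restarts too.
    Supervisor.init(children, strategy: :rest_for_one)
  end
end

defmodule Chat.Supervisor do
  use Supervisor

  def start_link(opts \\ []), do: Supervisor.start_link(__MODULE__, opts, name: __MODULE__)

  @impl true
  def init(_opts) do
    children = [
      Chat.Rooms.Supervisor,
      # Webhook deliveries run as supervised tasks, isolated from room state.
      {Task.Supervisor, name: Chat.WebhookTasks}
    ]

    # :one_for_one: a failure in the webhook subtree does not restart rooms, and vice versa.
    Supervisor.init(children, strategy: :one_for_one)
  end
end
```

With this layout, Chat.Rooms.Supervisor.start_room("lobby") starts a room that can be found through Chat.Room.via("lobby"), and a delivery started with Task.Supervisor.start_child(Chat.WebhookTasks, fun) is not restarted by default if it crashes, so retry policy stays an explicit decision.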
Expect these follow-ups
- When would you use Registry versus pg for finding room processes?
- How do you avoid a thundering restart loop after a deploy?
- What state is safe to keep only in a process?