Targeted at operations where global navigation satellite system (GNSS) signals are inadequate, simultaneous localization and mapping (SLAM) has been widely applied in robotics and navigation. Using data crowdsourced from cameras, collaborative SLAM offers advantages over single-user SLAM in mapping speed, localization accuracy, and map reuse. To address the lack of real-time collaborative SLAM for forward-looking cameras, this paper presents a client-server framework with the following attributes: (1) multiple users can localize within, and extend, a map merged from the maps of individual users; (2) the map grows only when a new area is explored; (3) a robust stepwise pose graph optimization technique is employed. These attributes are validated on the real-world KITTI benchmark and on datasets crowdsourced with smartphones. The results show that even a server hosted on a consumer-grade computer can process messages arriving concurrently from several clients in real time and create compact, accurate maps.