The text below is from a recent paper on which we've been working:
McKenzie, G., Janowicz, K., Gao, S., Yang, JA., Hu, Y., (2014: Under Review) POI Pulse: A Multi-Granular, Semantic Signatures-Based Approach for the Interactive Visualization of Big Geosocial Data.
The POI Pulse application in action: http://stko-poi.geog.ucsb.edu/prod/lapulse (please use Chromium or Firefox)
The age of Big Data promises to offer access to a plethora of data at a spatial, temporal, and thematic resolution unthinkable just a few years ago. This data revolution is accompanied by the emerging 4th paradigm of science in which synthesis is the new analysis. These changed realities gave rise to visions of information observatories in which complex systems, such as urban spaces, could be observed and better understood by exploiting the variety, volume, and velocity of Big Data. Those who tried to explore these new possibilities, however, often encountered equally big challenges. First, major parts of Big Data still reside in closed proprietary silos with limited API access. Second, the metadata (e.g., provenance) and conceptual schemata required for any serious use by scholars are often missing, opaque, or substantially different from those established in science. Finally, the sheer volume and velocity make interacting with, or even just visualizing, the data difficult, to say the least.
For many of us, an information observatory for urban spaces in which user-generated real-time content reveals spatial, temporal, and thematic patterns and traits of human behavior is a tempting idea, as it aligns well with the Digital Earth vision. Consequently, a posting on Foursquare’s infographics blog in October 2013 attracted a lot of attention. It linked to a series of videos showing the pulse of different cities such as San Francisco. The animations were derived entirely from mining massive amounts of user check-ins to the Foursquare Location-based Social Network and were aggregated to a single virtual day; see Figure 1a.
Fig. 1: (a) The pre-generated Foursquare video and (b) the interactive POI Pulse visualization.
While the visualization itself is absolutely stunning, the Foursquare videos have several shortcomings: (I) The videos are not interactive, e.g., one cannot click on any of the check-in events or places to gain additional insights. (II) The videos are rendered at a fixed geographic scale and focus on a particular part of the city. Thus, one cannot pan or zoom. (III) The millions of check-ins are aggregated to a single non-specific day, thus hiding well-known patterns, e.g., weekdays versus weekends. (IV) Foursquare’s POI taxonomy consists of more than 400 POI types grouped into 9 top-level classes (see Figure 1a). While such generalized classes are necessary and useful, it is not clear how they were derived or why certain POI types are categorized in specific ways. Furthermore, a binary class membership at such a coarse level will necessarily introduce arbitrary decisions and thus will significantly alter the observed temporal pulse of the city. For instance, Cemeteries are categorized under the Great Outdoors category. (V) Similar to other user-generated content (UGC), Foursquare contains data of widely varying quality. For instance, users often type their own houses as Castle or check in to features of the types Road, Trail, or Taxi. While this is a consequence of UGC, it is important to clean the data.
Inspired by Foursquare’s pulse videos and by the theoretical and technical limitations of interacting with and visualizing Big Data, we decided to address the aforementioned restrictions by designing a POI Pulse portal for Los Angeles; see Figure 1b. Naturally, as scientists we are more interested in the theoretical and technical aspects than in the application as such, but we will use it as the joint leitmotif that connects the following research questions, which make up the scientific contribution of this work:
R1: Given the more than 400 POI types defined by Foursquare users, is it possible to derive an alternative top-level classification that is informed by existing and well-established POI schemata (e.g., defined by Ordnance Survey) and still true to the original Foursquare data and user behavior?
R3: Given the legal API limits of closed data silos such as Foursquare, can we generalize check-ins, individual POIs, and their attributes, e.g., tips, to a type-level default behavior that allows us to model the pulse of a city with minimal data requirements? Is it possible to seamlessly switch to a real-time burst mode at zoom scales that do not exceed the daily API limits and thus also give access to real-time data?
R4: Can we improve on the Foursquare baseline by offering a pulse for all hours of the full week instead of a single day? Can we show binary upper-level categories but seamlessly switch to a more nuanced view at a reduced zoom level to show a probabilistic category membership?
In the following, we present a multi-granular, data-driven, and theory-informed approach that addresses these research questions by introducing the theoretical and technical framework to interactively explore the pulse of a city based on social media.
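To make the idea of a type-level default behavior over all hours of the week (R3 and R4) more concrete, here is a minimal sketch. It is not the implementation used in the paper: the function name `weekly_signature`, the `(poi_type, timestamp)` record layout, and the sample data are all hypothetical and chosen only for illustration. The sketch bins check-ins into 168 hourly slots (7 days × 24 hours) per POI type and normalizes each profile so that it sums to 1.

```python
from collections import defaultdict
from datetime import datetime

HOURS_PER_WEEK = 7 * 24  # 168 hourly bins, Monday 00:00 through Sunday 23:00


def weekly_signature(checkins):
    """Aggregate (poi_type, timestamp) check-ins into a normalized
    168-bin hourly profile per POI type (hypothetical record layout)."""
    counts = defaultdict(lambda: [0] * HOURS_PER_WEEK)
    for poi_type, ts in checkins:
        # datetime.weekday() returns 0 for Monday and 6 for Sunday
        counts[poi_type][ts.weekday() * 24 + ts.hour] += 1

    signatures = {}
    for poi_type, bins in counts.items():
        total = sum(bins)  # always >= 1, since a type only appears with a check-in
        signatures[poi_type] = [c / total for c in bins]
    return signatures


# Hypothetical sample data, not taken from Foursquare
sample = [
    ("Bar", datetime(2014, 3, 7, 22, 15)),   # Friday evening
    ("Bar", datetime(2014, 3, 7, 22, 40)),   # Friday evening
    ("Bar", datetime(2014, 3, 8, 23, 5)),    # Saturday evening
    ("Cafe", datetime(2014, 3, 3, 8, 30)),   # Monday morning
]
sig = weekly_signature(sample)
```

Under these assumptions, a map client could draw each POI from the stored profile of its type and would need live API calls only when a real-time view is requested, which is the trade-off R3 asks about.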