Big Data in Geographic Information Science Panel 2012

In conjunction with the Geographic Vocabularies Camp Santa Barbara 2012 (GeoVoCampSB2012).

Presented by the Reginald Golledge Distinguished Lectureship in Geography and spatial@UCSB.

University of California, Santa Barbara, 3rd February 2012, 4:30-6:00 pm, 1930 Buchanan Hall.


The information universe is expanding rapidly, with new data being created faster than we can store it. This calls for improved methods to retrieve, filter, integrate, and share data. The vision of a Big Data science is that the open availability of data at higher spatial, temporal, and thematic resolution will enable us to better address complex scientific and social questions. The downside is that understanding, sharing, and reusing these data becomes more challenging. Big Data is big not only because it involves huge amounts of data but also because of the high dimensionality and interlinkage of the data sets involved. The on-the-fly integration of heterogeneous data from various sources has been named one of the frontiers of Digital Earth research, Bioinformatics, the Digital Humanities, and other emerging research visions. The panel will discuss what role GIScience plays in the Big Data age. We hope to identify the research trends and major challenges behind the buzzword. Big Data is a big topic; rather than technical issues such as those addressed by Hadoop, the panel will focus on the problem of geographic data integration.


The Three V's of Big Geo Data

Big Data is often characterized by three V's: volume, variety, and velocity.

  1. Volume: The volume component of Big Data refers to the size of the data sets involved as well as their interlinkage, which creates a global graph of linked data. For GIScience, such high-volume data sources include Volunteered Geographic Information, Location-based Social Networks, Smart Dust and sensor networks in general, high-resolution remote sensing data, complex transportation simulations, historical records, data made public by governments, and so forth. We are producing more data than can be stored. How do we mine for relevant patterns and reduce the amount of information we need to keep?
  2. Variety: The number of data sources and the types of data are increasing as well. Combining social media with authoritative sources and integrating different formats such as video, audio, photos, and plain text allows a more holistic analysis but raises new issues in data integration. Semantic interoperability, i.e., the meaningful combination of data from highly heterogeneous sources, is the core challenge in creating an interlinked, global graph of data that can feed complex simulations and question-answering systems, such as a Watson-like Digital Earth.
  3. Velocity: Big Data is not only about large amounts of data but also about the speed at which data are created and updated. A rapidly increasing number of data sources deliver near-real-time data, which poses new challenges for stream reasoning and rule systems. This higher temporal resolution also calls for faster processing cycles, i.e., less time to analyze the data and filter out relevant patterns.

Summary of the Panel

We will update this section over time.

Related Activities

Please feel free to contact the organizers with further questions at jano @ geog . ucsb. edu.