Semantics for Big Data

AAAI 2013 Fall Symposium; Westin Arlington Gateway in Arlington, Virginia, November 15-17, 2013.

Workshop Description and Scope

One of the key challenges in making use of Big Data lies in finding ways of dealing with the heterogeneity, diversity, and complexity of the data, while its volume and velocity rule out solutions that work for smaller datasets, such as manual curation or manual integration of data. Semantic Web Technologies are meant to deal with these issues, and indeed since the advent of Linked Data a few years ago, they have become central to mainstream Semantic Web research and development. We can easily understand Linked Data as being a part of the greater Big Data landscape, as many of the challenges are the same. The linking component of Linked Data, however, puts an additional focus on the integration and conflation of data across multiple sources.

Workshop Topics

In this symposium, we will explore the many opportunities and challenges arising from transferring and adapting Semantic Web Technologies to the Big Data quest. Topics of interest focus explicitly on the interplay of Semantics and Big Data, and include:

Symposium Format, Submissions, and Proceedings

The symposium will be highly interactive, with spotlight presentations and small breakout groups interleaved with plenary sessions for reports on the breakout groups and for consolidation of results. To prime and channel discussions and group activities during the event, we call for the submission of position papers or extended abstracts of 2-4 pages, or of technical papers of 6-8 pages (in AAAI format). Paper talks will be 20 min plus 2-3 min for questions. Please address questions to Pascal Hitzler at pascal.hitzler{at}

Submissions shall be made through EasyChair by May 24, 2013.

Keynote Speakers

Michel Dumontier
Title: Generating Biomedical Hypotheses Using Semantic Web Technologies
Abstract: With its focus on investigating the nature and basis for the sustained existence of living systems, modern biology has always been a fertile, if not challenging, domain for formal knowledge representation and automated reasoning. Over the past 15 years, hundreds of projects have developed or leveraged ontologies for entity recognition and relation extraction, semantic annotation, data integration, query answering, consistency checking, association mining and other forms of knowledge discovery. In this talk, I will discuss our efforts to build a rich foundational network of ontology-annotated linked data, discover significant biological associations across these data using a set of partially overlapping ontologies, and identify new avenues for drug discovery by applying measures of semantic similarity over phenotypic descriptions. As the portfolio of Semantic Web technologies continues to mature in terms of functionality, scalability and an understanding of how to maximize their value, increasing numbers of biomedical researchers will be strategically poised to pursue increasingly sophisticated KR projects aimed at improving our overall understanding of the capability and behavior of biological systems.

Peter Fox
Title: Geosemantics for weird data; mediation, integration, heterogeneity and vocabularies
Abstract: Geosciences are in the Big Data era. Earth, as a system, is being observed with greater resolution, frequency, and mode, and across disciplines (bio, geo, chem, ...). As a result, the demand for discovery of relevant data from unknown sources (i.e., not the usual data portal) is increasing at a rate that is both frustrating researchers and confounding data providers. The latter cannot keep up with the diversity of audiences that seek their data. The researchers want to discover, explore, access and integrate a variety of datasets. Yes, this is a job for semantics - but one with a distinct set of characteristics that geoscience engenders. This talk will frame the problem statement from (at least) two viewpoints, producers and consumers, and then indicate the key curation role that geosemantics is playing in bridging and mediating between the two communities. Examples from several geo-communities will indicate the current state of development and future needs, for both geoscience and computer science developments.

Jennifer Golbeck
Title: Computing trust and building trust with users' social media data
Abstract: The data that users provide through social media provides a wealth of opportunities for computer scientists to infer information about users and their relationships, including trust. We have developed remarkably successful models for these kinds of inferences. At the same time, we have found in our research that users have very poor awareness of how their data is being shared and used. In this talk, I will discuss some of the computational models that we have for inferring trust, and also discuss steps that we should take - as big linked data becomes more common - to gain users' trust and consent as we take advantage of their personal information.

Preliminary Program

All paper talks will be 20 min plus 2-3 min for questions.

Friday, November 15

9:00 AM - 9:30 AM: Introduction
9:30 AM - 10:30 AM: Keynote 1 by Jennifer Golbeck on Computing trust and building trust with users' social media data
10:30 AM - 11:00 AM: Coffee Break
11:00 AM - 12:30 PM: Session 1: Crowdsourcing & Cognition (Chair: J. Hendler)
12:30 PM - 2:00 PM: Lunch
2:00 PM - 2:45 PM: Session 2: Integration (Chair: P. Hitzler)
2:45 PM - 3:30 PM: Panel 1: The human in the loop (speakers of sessions 1 & 2 and keynote speaker as panelists) (Chair: T. Narock)
3:30 PM - 4:00 PM: Coffee Break
4:00 PM - 5:00 PM: Keynote 2 by Peter Fox on Geosemantics for weird data; mediation, integration, heterogeneity and vocabularies
5:00 PM - 5:30 PM: Open discussion: Approaching domain scientists -- lessons learned from interdisciplinary research (Chair: K. Janowicz)
6:00 PM - 7:00 PM: Reception (by AAAI)

Saturday, November 16

9:00 AM - 10:30 AM: Session 3: Scale (Chair: F. Van Harmelen)
10:30 AM - 11:00 AM: Coffee Break
11:00 AM - 12:00 PM: Keynote 3 by Michel Dumontier on Generating Biomedical Hypotheses Using Semantic Web Technologies
12:00 PM - 12:30 PM: Joint discussion session with Discovery Informatics (Chair: N. Villanueva-Rosales)
12:30 PM - 2:00 PM: Lunch
2:00 PM - 3:30 PM: Breakout groups (raw notes from the volume group and from the variety group)
3:30 PM - 4:00 PM: Coffee Break
4:00 PM - 4:45 PM: Session 4: Evaluation & Testbed (Chair: P. Hitzler)
4:45 PM - 5:15 PM: Report from breakout groups
6:00 PM - 7:30 PM: Plenary session (by AAAI); Charles Vardeman as S4BD speaker.

Sunday, November 17

9:00 AM - 10:30 AM: Session 5: Deduction & Induction (Chair: F. Van Harmelen)
10:30 AM - 11:00 AM: Coffee Break
11:00 AM - 11:45 AM: Where is the sweet spot for ontologies? Discussion (Chair: K. Janowicz)
11:45 AM - 12:30 PM: Panel 2 by Helen Lippell and Andre Freitas on Managing Data variety in a Context of Heterogeneous and Unstructured Data

Accepted Papers

Important Dates

Submission due: May 24, 2013

Acceptance Notification: June 21, 2013

Camera-ready Copies: June 28, 2013

Symposium: November 15-17, 2013


(in alphabetic order)

Programme Committee


Please feel free to contact the organizers for further questions at jano @ geog . ucsb. edu.