Data Science with OpenStreetMap and Wikidata

Nikolai Janakiev @njanakiev

Outline

Part I: Wikidata and OpenStreetMap

  • Difference between Wikidata and OpenStreetMap
  • Ways to connect data between Wikidata and OpenStreetMap

Part II: Data Science with Wikidata and OSM

  • Libraries and Tools
  • Exhibition of Various Analyses and Results

OpenStreetMap Elements

OSM Elements

Metadata in OpenStreetMap

OSM Key Amenity

OSM Salzburg

Wikidata is a Knowledge Graph

Wikipedia Wikidata Link

Wikidata Data Model

wikidata linked data graph

Querying Wikidata with SPARQL

Wikidata Query

All Windmills in Wikidata

SELECT ?item ?itemLabel ?image ?location ?country ?countryLabel
WHERE {
  ?item wdt:P31 wd:Q38720.
  OPTIONAL { ?item wdt:P18 ?image. }
  OPTIONAL { ?item wdt:P625 ?location. }
  OPTIONAL { ?item wdt:P17 ?country. }
  SERVICE wikibase:label { 
    bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". 
  }
}
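
The same kind of query can also be run from Python. The following is a minimal sketch, assuming the public Wikidata Query Service endpoint at https://query.wikidata.org/sparql and the standard SPARQL JSON results format. The helper names (`run_query`, `flatten_bindings`, `parse_point`), the shortened query, and the User-Agent string are illustrative assumptions and not part of the talk.

```python
import json
import re
import urllib.parse
import urllib.request

# Public Wikidata Query Service endpoint
WDQS_URL = "https://query.wikidata.org/sparql"

# Shortened variant of the windmill query. "[AUTO_LANGUAGE]" is filled in by
# the query web interface, so plain "en" is used here (assumption for scripts).
WINDMILL_QUERY = """
SELECT ?item ?itemLabel ?location WHERE {
  ?item wdt:P31 wd:Q38720.
  OPTIONAL { ?item wdt:P625 ?location. }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 10
"""


def run_query(query, url=WDQS_URL):
    """Send a SPARQL query and return the decoded JSON response (needs network)."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(
        url + "?" + params,
        # Hypothetical User-Agent; replace with your own contact details.
        headers={"User-Agent": "osm-wikidata-example/0.1 (hypothetical)"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def flatten_bindings(result):
    """Turn SPARQL JSON results into a list of plain {variable: value} dicts."""
    return [
        {name: cell["value"] for name, cell in row.items()}
        for row in result["results"]["bindings"]
    ]


def parse_point(wkt):
    """Parse a WKT literal such as 'Point(13.05 47.8)' into (lon, lat)."""
    match = re.fullmatch(
        r"Point\(\s*(-?[\d.]+)\s+(-?[\d.]+)\s*\)", wkt.strip(), flags=re.IGNORECASE
    )
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))
```

With network access, `flatten_bindings(run_query(WINDMILL_QUERY))` should give one dict per result row. Variables inside `OPTIONAL` blocks are simply missing from rows where they have no value, so `row.get("location")` is safer than indexing.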

Query link

Wikidata Windmills

OpenStreetMap and Wikidata in Numbers

OpenStreetMap (Source)

  • Started 2004
  • Number of users: 5,630,923
  • Number of uploaded GPS points: 7,459,170,764
  • Number of nodes: 5,424,072,098
  • Number of ways: 601,538,972
  • Number of relations: 7,038,670

Wikidata (Source)

  • Started 2012
  • Number of active users: 20,798
  • Number of items: 59,218,423
  • Number of edits: 1,000,545,117

Linking OpenStreetMap with Wikidata?

OpenStreetMap to Wikidata

  • wikidata=* tag (stable)

Wikidata to OpenStreetMap

  • OSM relation ID (P402), 97,704 entities in total (unstable)
    Note: P402 should not be used for nodes, ways, or areas
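
Following the stable direction (OpenStreetMap to Wikidata) could look like the sketch below. It assumes OSM elements shaped like Overpass API JSON output (dicts with `type`, `id`, and `tags`) and that a `wikidata` tag may hold several semicolon-separated QIDs. The function `wikidata_links` and the sample elements are hypothetical.

```python
import re

# A Wikidata item ID: "Q" followed by a positive integer
QID_PATTERN = re.compile(r"Q[1-9]\d*")


def wikidata_links(elements):
    """Yield (osm_type, osm_id, qid) for every valid QID in a wikidata=* tag."""
    for element in elements:
        value = element.get("tags", {}).get("wikidata", "")
        for part in value.split(";"):
            qid = part.strip()
            if QID_PATTERN.fullmatch(qid):
                yield element["type"], element["id"], qid


# Made-up sample elements, for illustration only
elements = [
    {"type": "node", "id": 1, "tags": {"amenity": "cafe"}},
    {"type": "way", "id": 2, "tags": {"wikidata": "Q38720"}},
    {"type": "relation", "id": 3, "tags": {"wikidata": "Q1; Q2"}},
    {"type": "node", "id": 4, "tags": {"wikidata": "not-a-qid"}},
]

links = list(wikidata_links(elements))
```

Keeping the OSM element type next to the ID matters because node, way, and relation IDs are numbered independently, so an ID alone does not identify an element.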

Data Science

Used Tools and Libraries

Python Libraries

  • NumPy - numerical and scientific computing
  • Pandas - data analysis library
  • Matplotlib - 2D plotting library
  • Shapely - analysis and manipulation of GEOS features
  • GeoPandas - Pandas extension for spatial operations and geometric types
  • PySAL - spatial analysis library
  • Datashader - graphics pipeline system for large datasets

OpenStreetMap Elements with Wikidata Tag

wikidata europe osm points

osm europe wikidata

osm europe wikidata lisa clusters

Wikidata Instance of (P31) Property

wikidata europe points

wikidata europe most common instances

wikidata europe companies most common instances

wikidata uk companies most common instances
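
Rankings of the most common instances come down to counting `instance of` (P31) values. Here is a minimal sketch using only the standard library; the rows are made-up placeholders rather than data from the talk, and `most_common_instances` is a hypothetical helper.

```python
from collections import Counter


def most_common_instances(rows, n=3):
    """Count P31 labels over (item, instance_label) pairs and return the top n."""
    counts = Counter(label for _item, label in rows)
    return counts.most_common(n)


# Placeholder rows: (item identifier, label of its P31 value)
rows = [
    ("item_a", "windmill"),
    ("item_b", "castle"),
    ("item_c", "windmill"),
    ("item_d", "church"),
    ("item_e", "windmill"),
    ("item_f", "castle"),
]

top = most_common_instances(rows, n=2)
```

An item can carry several P31 values. In this sketch each (item, label) pair counts once, so such an item contributes to more than one class.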

Analyzing Websites Regionally

websites percentage jquery histogram

websites percentage jquery

Classifying Countries and Regions with OSM

OSM Data Science

Castle Dossier Map of Switzerland

castle dossier map switzerland

Conclusion

  • Naming things is hard, meaningfully categorizing even harder
  • Wikidata tends to vary in its definitions between countries but to be consistent within a country (this hypothesis has not been tested)
  • Know Thy Data: data provenance and completeness are crucial for data analysis and prediction

Data Completeness

OpenStreetMap

Wikidata

Resources