Crowdsourcing and Mapping: Pieces of the Puzzle
How human perspectives and motivations can build maps
Today's post is sponsored by OpenCage, makers of a highly available, simple to use, worldwide, geocoding API based on open datasets like OpenStreetMap. Test the API, see the docs or the pricing (hint: it's radically cheaper than Google).
Until just after 10:56pm (EDT - NYC) on July 20, 1969, no human had seen Earth from the surface of the moon. Almost 23 years earlier, a captured German V-2 rocket, launched from White Sands Missile Range in New Mexico (you should visit if you haven’t yet), touched the point where the Earth’s atmosphere fades into space and took the first photograph of our planet from space with its onboard 35mm camera. Only about seven months before the moon landing, another photo was taken (shown above, “Earthrise”) as the Apollo 8 mission orbited the moon, though it might appear to be a photo from the surface.
The photos only served as evidence of what human eyes were experiencing, and these moments led to a transformation of how we perceive the Earth. We finally got a look, contributing new data to the body of human knowledge and experience that we never before had. Most of us have never been to space and probably (maybe) never will do so, but we all now have the common “Pale Blue Dot” in our mind’s eye.
Earth observation is now done by robots, more or less: unmanned spacecraft (satellites) carrying optical and other spectral sensors ceaselessly capture photos of the Pale Blue Dot, up close and in high resolution. The major obstacle to getting a clear view of the world is clouds, but even this can be worked around, for example by imaging with synthetic aperture radar (SAR). In the end, our planet is quite easy to observe thanks to the explosion of technology since that very first photo was taken by hand.
What is much more difficult is to get a constant optical imprint of our planet from the ground, from a human viewpoint. In one sense I am talking about photos and videos, which could include recordings of weather events, geopolitics and war, security camera footage, social events like concerts and festivals, real estate photos, and LiDAR scans of architecture and physical geography. In another sense I mean non-visual data like river flows, temperature, seismic events, or wind speed. Much of this data is generated in an inconsistent way (at random time intervals, or only in reaction to a catalyst) and is dispersed as well. We cannot measure the temperature with a ground based thermometer on every square meter of the earth, so instead we use specific weather stations and interpolate the temperature across distances.
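To make the weather station idea concrete, here is a minimal sketch of one common way to interpolate, inverse distance weighting (IDW), where nearer stations count for more. This is only an illustration under my own assumptions: the function names, the tuple layout for stations, and the `power` parameter are hypothetical, and real meteorological products use more sophisticated methods that account for elevation and terrain.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def idw_temperature(stations, lat, lon, power=2.0):
    """Estimate temperature at (lat, lon) from nearby stations.

    stations: iterable of (station_lat, station_lon, temperature) tuples
              (a hypothetical layout chosen for this sketch).
    power:    how quickly a station's influence fades with distance.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for s_lat, s_lon, temp in stations:
        distance = haversine_km(s_lat, s_lon, lat, lon)
        if distance < 1e-9:
            # The query point sits on a station: use its reading directly.
            return temp
        weight = 1.0 / distance ** power
        weighted_sum += weight * temp
        weight_total += weight
    if weight_total == 0.0:
        raise ValueError("at least one station is required")
    return weighted_sum / weight_total
```

A point exactly halfway between two stations gets the average of their readings, and the estimate drifts toward whichever station is closer as the point moves. The gaps between sensors are filled by assumption, which is exactly why more observations on the ground matter.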
While a single satellite or constellation of them can image and measure the earth as a concerted effort, modeling the earth from the surface means the subject matter is much more vast relative to any sensor. Just like before people went to space, on the ground we can still never see the whole system at once, but only pieces. Decentralized observations, while still rarely able to provide a complete view when summed, are the only way to strive toward a complete picture of the earth and its systems, whether built or natural. A puzzle must be constantly assembled, then reassembled. Patterns for this are developed, both for sourcing the data and for combining it and smoothing out the edges and gaps.
Geospatial data can be described in a few ways that relate to this global puzzle. I like to say that this data is:
Interdependent - each puzzle piece of an earth (or local) model needs its neighboring pieces to form complete context
Dispersed - One sensor can rarely capture the entire subject at once, whether across space or time
Symbolic - Maps get structure and clarity by summarizing or representing reality with symbols like color codes, icons, vector graphics, or labels
Symbiotic - Collections of observations can have a network effect, where a large quantity of data points means each individual piece is more valuable due to the others
This is where volunteered geographic information (VGI) comes in. Of course, this term is extremely academic, and more colloquially we call it crowdsourcing, with the specific data type being map data or geolocated data. Michael F. Goodchild’s 2007 publication talks about “citizens as sensors”, and aligns with other terms we might hear like “citizen science” and “participatory mapping”. From a tech industry perspective this can be seen as “user generated content”. Whatever name is used, this is essentially a strategy to model the world from distributed human observations, content generated as a result of voluntary action (even if that means setting up a passive sensor).
This approach to modeling the earth is not unique to the geography and geospatial domain. Economies and markets are another complex system that is truly impossible to observe from a single vantage point, and is constantly changing in a spatial and temporal sense. Friedrich Hayek’s 1945 essay “The Use of Knowledge in Society” describes the very same problem that mapping faces: while no one person can make an observation of the entire system, everyone who is observing brings a unique piece of knowledge toward assembling a picture of the whole. He calls it “a problem of the utilization of knowledge which is not given to anyone in its totality”, where the data exists “solely as the dispersed bits of incomplete and frequently contradictory knowledge which all the separate individuals possess”.
Hayek also talks about knowledge, not just economic, as being unorganized. This is very much like the results of crowdsourcing, where different observations like mapping of buildings, photos of earthquake damage, drone imagery of construction sites, and so forth are contributed from different people, different devices, different perspectives, different data formats, different timestamps. Some is not even contributed or aggregated anywhere yet, but could be given the right tools and publicity.
One of Wikipedia’s founders, Jimmy Wales, cites Hayek’s essay as inspiration, particularly the statement that “practically every individual has some advantage over all others because he possesses unique information of which beneficial use might be made.” Building a crowdsourced encyclopedia of facts about the world and its history means drawing from the minds of the whole population, if possible—and constant updating and improvement.
Hayek acknowledges that every observer indeed only has partial knowledge—sometimes more important and advantageous parts than others have, changing with time. He makes one particular conclusion that alludes to crowdsourcing as a powerful way to observe a system: he says there is “need for a process by which knowledge is constantly communicated and acquired”.
The perfect earth observation system, from space, would be constantly acquiring and communicating data to a system that makes sense of it. From the human level, this is also a perfect ideal—but perhaps impossible to achieve.
Waze is an interesting case for striving toward this ideal, crowdsourcing within a specific domain. Waze is not concerned about every aspect of the globe, but instead about traffic on roadways. A community of users has rallied around the app. The primary function of the app, provided by the founders and developers, is navigation, but it is also a platform to share important information between users. All drivers on the road have a partial knowledge of road and traffic conditions, and have had for decades.
One of the founders of Waze described the app’s goal as allowing a user to sort of peer ahead on the road using a periscope, seeing where obstacles, jams, speed cameras and more were along the route, and to make decisions accordingly. To gain this knowledge, any one user relies on all the others, and the others rely on this user to also share what they observe. The users have a motivation for using Waze, and develop an incentive to both contribute to and use the information.
The two sides of this exchange, contribution and consumption, sometimes, but not always, involve the same group. Contributors need tools and a platform, and bring their own motivations, incentives, and passion. Consumers often have a more technical view, prioritizing quality, quantity, time, location, and format of data, and sometimes passion too. Consider crowdsourced birdwatching data: contributors may also be consumers, but environmental agencies are consumers too, relying on the crowd to generate data they otherwise lack the capacity to gather, and deriving value from high quantities of good data, dispersed across areas of interest and constantly updated to reveal patterns.
Tools and platforms are huge enablers of crowdsourcing. Decades ago, before smartphones and apps, a vast amount of knowledge was essentially dormant. Today, new tools suddenly allow this knowledge to be, as Hayek describes, constantly communicated and acquired. One example is Mapillary and the projects it inspired, like KartaView and Panoramax, which make crowdsourcing of street view possible at much lower cost than Google or Microsoft have incurred by sending hired vehicles, while producing open data that benefits industry projects, academic research, and government work.
Strava is another prime example, and has some key differences. In the case of Mapillary, there is a tool centered around street view: the contributors want to create street view, and they sometimes consume it for map editing, while other consumers also look for street view or derived map data. The input and output are very symmetrical. With Strava, users are asked to pay for a premium experience, and their primary motivation is not to create a vast database of open map data, but instead to be organized and get visibility into their own activity, and sometimes to share it with friends as on a social network. Strava, as a platform, creates an environment for users to pursue and thrive at their goals (like being a better runner).
With Strava, an asymmetry emerges: map data is created not as a primary goal of the contributor, but as a side effect. This data is anonymized and aggregated into heatmaps, and in cases like the screenshot above you can clearly see a map of the ski pistes in the Alps where OpenStreetMap has missing or wrong data.
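As a rough illustration of how individual activity traces can become an anonymized heatmap, here is a minimal sketch. It is not Strava's actual pipeline, which I have no inside knowledge of. The function names, the `(user_id, lat, lon)` layout, the grid cell size, and the minimum-user threshold are all assumptions made for this example. The idea is to snap GPS points to grid cells, count distinct users per cell, publish only the cells that enough different people have visited, and then compare those cells against what the map already contains.

```python
import math
from collections import defaultdict


def cell_of(lat, lon, cell_deg=0.001):
    """Snap a coordinate to an integer grid cell (roughly 100 m at 0.001 degrees of latitude)."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def build_heatmap(points, cell_deg=0.001, min_users=3):
    """Aggregate (user_id, lat, lon) points into {cell: distinct_user_count}.

    Cells visited by fewer than min_users distinct users are dropped,
    so no single person's route is revealed by the output.
    """
    users_per_cell = defaultdict(set)
    for user_id, lat, lon in points:
        users_per_cell[cell_of(lat, lon, cell_deg)].add(user_id)
    return {
        cell: len(users)
        for cell, users in users_per_cell.items()
        if len(users) >= min_users
    }


def unmapped_hot_cells(heatmap, mapped_cells):
    """Return heatmap cells that the existing map does not cover yet."""
    return sorted(cell for cell in heatmap if cell not in mapped_cells)
```

Here `mapped_cells` stands in for whatever the existing map already knows, for example the set of grid cells crossed by pistes tagged in OpenStreetMap. Cells that are hot in the heatmap but absent from the map are candidates for missing or wrong data.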
OpenStreetMap itself is notable because the motivation of users is actually extremely symmetrical, like Mapillary but more—contributors create map data primarily out of a passion for doing it, and encouraged by a shared communal goal of improving the map. Many of the contributors also use OSM in their day to day life, mainly due to passion (although many also use Google Maps, mainly due to practicality). Billions of people are exposed to OpenStreetMap as consumers only—whether in Facebook/Instagram, PokemonGo, Organic Maps, weather apps, surfaces where MapTiler and Mapbox are used—but it seems rare that this visibility is what motivates OSM contributors. Truly, contributors map for love of the map.
In other crowdsourcing cases, like the briefly hot app StreetCred (acquired by Snap), Geolancer, or Hivemapper, users get some kind of financial incentive to generate map content. This can be an important form of crowdsourcing, because it forgoes the organic approach of letting users decide where to map, and realizes that areas where this approach fails to get any data—or to get a precise quality or format—can be more effectively sourced by offering rewards. When a particular area in OSM has not been mapped in 10 years, a small financial incentive with viable marketing reach can get someone to finally update it in detail within hours.
PokemonGo and other Niantic apps also look to leverage different user motivations. Rather than a passion for mapping alone, contributors to the spatial data that Niantic has are often motivated by interest in a game and the community around it, while passively scanning the world around them, or even editing OSM occasionally for reasons linked to game experience. The developer team behind these gamified apps is then able to produce spatial data in an organized way, along with developer tools on the Niantic Lightship platform, to enable use of a visual positioning system (VPS) and other resources.
In the cases where the contributors rely mainly on tools, and not so much platforms, with symmetric interests for both contributing and consuming the data, the growth of the dataset is often more linear (just by my anecdotal view). More users create more data, in a sort of 1-for-1 relationship. Platforms like Strava and PokemonGo, however, can often have an asymmetric pattern of data generation, alongside asymmetric motivations for users to generate content versus consume it. This, in my view, is the way to achieve parabolic growth of content, because things that are not mapping, such as fitness and gaming, tend to attract far more users, who are far more regularly engaged, than the smaller, though passion driven and dedicated, OSM community.
Furthermore, the growth of the OSM userbase seems to be at a plateau by 2025, which is difficult to change. The number of people who map for the sake of mapping is always small, and always highly effective per capita, although often driven by power users, too. It relies on community structure around the activity, which can mean it is organically very strong somewhere like France and almost nonexistent somewhere like Wyoming (I struggled to start a multi-state community when I was living in my hometown near the Montana-Wyoming border, finding only one user resident for every few hundred kilometers). Surges of interest are stumbled upon at best.
In the past at least one interesting project attempted to replicate the ingenuity of reCaptcha, from an OSM perspective, but it has not materialized at any scale. The reCaptcha approach helps create data about images, often from Google Street View, by having humans validate or label detected entities like traffic lights or bridges or motorcycles. This eventually contributes to map data in some way, as well as computer vision algorithms, and probably multimodal LLMs today. People do not contribute to reCaptcha because they like verifying semantic segmentation of street images but because they are trying to log in or sign up for some other service, and go through the process as a way to filter out bots. This is incredibly asymmetric: the motivations of contributors and consumers are very different, but also there is a massively outsized contribution to the data (potentially billions of contributors) considering its very specific consumer case (a few dozen research engineers building and improving models).
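To show how many casual answers can turn into one trusted label, here is a minimal sketch of a consensus rule. It is not reCaptcha's actual algorithm, which is not public as far as I know. The function name, the minimum vote count, and the agreement threshold are assumptions chosen for illustration. The idea is simply that a label is accepted only once enough people have answered and a clear majority of them agree.

```python
from collections import Counter


def consensus_label(votes, min_votes=3, min_agreement=0.7):
    """Return the agreed label for one image tile, or None if undecided.

    votes:         list of labels submitted by different contributors
    min_votes:     how many answers are needed before deciding at all
    min_agreement: share of votes the top label must reach to be accepted
    """
    if len(votes) < min_votes:
        return None  # not enough answers yet, keep showing the tile
    label, count = Counter(votes).most_common(1)[0]
    if count / len(votes) >= min_agreement:
        return label
    return None  # contributors disagree, the tile stays unresolved
```

For example, four votes for "traffic_light" and one for "bridge" resolve to "traffic_light", while an even split stays undecided and the tile keeps circulating. No single contributor needs to care about the result, yet the aggregate becomes usable data.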
Something like reCaptcha—maybe not a close copy of its UX, but rather something using its asymmetric strategy—could fundamentally accelerate the growth of OSM, but also by departing from the traditional passion of the community as the fuel for growth. Perhaps it is something that is altogether better suited for a project like Overture Maps, that builds on top of and around OSM, with various other motivations.
Crowdsourcing can take many approaches, and will certainly continue to. Geospatial data is particularly able to benefit from crowdsourcing because it is distributed over space and time, and because the picture of any global or local model is never complete, always needing to be refreshed to reflect change over time. Crowdsourcing will likely only continue to grow as a data source, because more people have access to sensor devices (mobile phones and other gadgets), and because other tools are constrained in ways that mean they can never achieve full knowledge of the world in great detail with a single group of centralized sensors.
Finding new data types, new communities and motivations, new ways to create incentives: all these are ways that crowdsourcing for map data will evolve. Technological change may enable platforms to be bigger and better, or even smaller but more specific, while data capture tools can get more powerful and accessible (like LiDAR in iPhones), and new social trends often create new incentives and motivations. Creating niche tools to crowdsource specific pieces of data is a work of art on its own, but often favors passion over scale, while accidental discoveries of asymmetric relationships, where passion for something outside the domain of mapping can build huge community contributions that flow back into maps, remain one of the most compelling opportunities.
Whatever happens, all of us in the geospatial community have roles to play in improving maps via crowdsourcing. For some of us, it means participating in communities, starting new groups, growing the contributor membership, or publicizing group activities to attract the interest of newcomers young and old. For others, it means building platforms and tools with code, design, and great vision. And finally, people from different walks of life, different companies and professions, different communities and interest groups, can forge alliances to help create jobs, products, open data, gatherings, and shared benefit by finding the intersections between different wants, needs, motivations, and problems to solve.
For all the biggest problems in missing data or impossible scaling challenges, it is always worth considering how crowdsourcing, particularly with an asymmetric nature, can change the game.