This is the documentation for the Urban Analyst platform (UA), which provides open source analyses of urban structure and function across the world. The source for this documentation can be found in this GitHub repository.
Urban Analyst provides interactive maps of the properties of cities, including socio-demographic conditions and the structure and function of transport systems. Each property is measured in terms of a "variable". Relationships between individual variables are also analysed and presented, such as between socio-demographic conditions and frequency of transport services, or between distances to nearest schools and access to natural spaces.
UA also provides statistical summaries of all cities, enabling relationships between any pair of variables, such as transport and socio-demographic disadvantage, to be compared across all UA cities.
Urban Analyst presents a variety of statistics for each city analysed, as well as relationships between these statistics. Values for each statistic are derived at every street intersection in each city. These values are then aggregated into the polygons shown in the "Maps" page, and across entire cities for the values shown in the "Stats" page. Aggregations are always weighted by local population densities, to generate values representing equivalent values per person as experienced in each city.
The values presented in Urban Analyst represent the first truly comprehensive routing analyses for each city, derived from estimates of travel times from every point in each city to every other point using any combination of possible modes of transport. The following table summarises numbers of street intersections, public transport ("PT") stops, and calculations for current Urban Analyst cities.
|PT stops||PT calcs|
One way to appreciate the scale of these calculations is through comparison with commercial alternatives. One service, traveltime.com, charges a flat subscription fee of €540 per month for a maximum of 60 requests per minute. That rate would permit 31.5 million queries per year. The city of Hamburg, for example, would then take almost 2,000 years to calculate, and would cost €12 million. Google also offers a commercial routing service, limited to a maximum of 500,000 queries per month, for a total price of US$2,000. At that rate, the analyses for Hamburg would cost US$224 million.
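The arithmetic behind these figures can be checked with a short sketch. The total number of routing queries needed for Hamburg is not stated in the text, so it is inferred here from the quoted Google cost; that inferred figure, and all variable names, are assumptions made purely for illustration.

```python
# Back-of-envelope check of the figures quoted above.
requests_per_minute = 60
queries_per_year = requests_per_minute * 60 * 24 * 365  # 31,536,000

# Assumption: total queries for Hamburg are inferred from the quoted
# Google cost (US$224 million at US$2,000 per 500,000 queries).
google_price_usd = 2_000
google_queries_per_block = 500_000
google_total_usd = 224_000_000
implied_queries = google_total_usd / google_price_usd * google_queries_per_block

# Time and cost at the flat-fee rate of EUR 540 per month.
years_needed = implied_queries / queries_per_year
traveltime_cost_eur = years_needed * 12 * 540

print(f"{queries_per_year:,} queries per year")
print(f"{implied_queries:,.0f} implied queries for Hamburg")
print(f"{years_needed:,.0f} years, about EUR {traveltime_cost_eur / 1e6:.1f} million")
```

Under this assumption the implied query count is consistent with both "almost 2,000 years" and a cost of roughly €12 million.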
The results presented in Urban Analyst are simply not possible using commercial tools, or indeed any other open source tools. These analyses truly are uniquely powerful, and provide a depth of insight into how people move through cities not available in any other way.
Not directly, but feel free to open a GitHub issue to start a discussion about requesting full data sources.
This documentation includes the following five chapters:
- This introduction
- "Example": A walk-through example comparison between Berlin, Germany and Paris, France, illustrating the kinds of comparisons enabled by the Urban Analyst platform.
- "UTA Variables": Providing descriptions of all variables included in the Urban Analyst platform.
- "Data Sources": Providing descriptions of all data sources used to derive these variables.
- "Software and Algorithms": Providing descriptions of, and links to, all software used to generate the UTA variables.
Contributions to, or questions regarding, this documentation, are welcome at this GitHub repository.
This chapter demonstrates most of the capabilities of the Urban Analyst platform through exploring comparisons between the cities of Paris, France, and Berlin, Germany. It is important to remember throughout that lower values in all UTA statistics are always better. Values are also weighted by local population densities. This is important because, for example, public transport systems should be constructed to offer the fastest services to the areas where most people live. Not implementing this weighting would, in contrast, leave measures in some form of times per unit area, so that for example travel times from unpopulated parts of a city would be weighted equally to times from densely populated parts. Weighting travel times, and all other UTA variables, by population density converts them to values as experienced on average by each person in a city.
The comparisons in this chapter between Paris and Berlin are mostly drawn from the "Stats" page, which provides overviews of entire cities, and comparisons with all other UTA cities. The "Maps" page can then be used to examine the actual spatial distributions of particular variables or relationships within any given city.
This comparison starts by stepping through each variable to describe kinds of information able to be extracted, before examining pairwise relationships between these variables, and concluding with a general summary. The Urban Analyst platform currently measures 11 variables, along with strengths of relationship between all paired combinations of these. This amounts to 11 * (11 - 1) / 2 = 55 pairwise combinations. Strengths of relationship are standardised, so are comparable throughout between all pairs of variables, and between different cities.
The following table summarises the values of the individual variables for each city (each measured on its own distinct scale).
|Variable|Berlin|Paris|
|---|---|---|
|Times (abs; min)|40.9|39.5|
|School Dist (m)|338|186|
The "Transport Absolute" variable measures the absolute time (in minutes) required to travel a distance of 10km from each point in a city, using any combination of travel modes except private automobile, including walking, bicycling, and any available public transport options. Travelling 10km in Paris takes under 40 minutes on average, while equivalent journeys in Berlin require almost 1.5 minutes more.
The "Transport Relative" values divide the absolute travel times described above by times for equivalent journeys with private automobile. Ratios of one imply automobile times equal to multi-modal times; ratios of less than one imply that multi-modal transport is faster than private automobile. Paris and Berlin both have comparably low values for this ratio, implying relatively fast multi-modal transport, with Paris notably faster than Berlin. This is likely influenced by Paris's recent introduction of a uniform maximum speed limit of 30km/hour throughout the city, whereas Berlin features a number of "autobahns" with much higher speed limits.
(Note that travel times with private automobiles include estimates of times required to park a vehicle and ultimately walk to any desired destination. Vehicular times calculated here are thus notably longer than with most commercial routing engines, which give vehicular travel times only, and ignore the critical need to park a vehicle and walk to a destination.)
All individual variables also enable comparison in terms of "Variation", rather than "Average" values. Comparing these reveals that Berlin generally has markedly lower variation than Paris. A comparison of these statistics on the "Maps" page reveals that this is largely because Paris is simply much larger than Berlin, and the ranges of both absolute and relative transport times are correspondingly greater. The fact that relative transport in Paris is still better on average than in Berlin is thus even more impressive considering this stark difference in scale.
Travelling in Paris requires notably greater numbers of transfers to travel equivalent distances than Berlin. The values in the "Number of Transfers" layer are for journeys of 10km total distance (including walking or cycling distances at either end). Journeys in Paris require more than 50% more transfers than equivalent journeys in Berlin.
The fourth transport variable, "Interval", measures the time to wait (in minutes) until the next equivalent service. Intervals in Paris are slightly under 5 minutes, whereas values in Berlin are just under 7 minutes.
Finally, the "Compound Transport" variable multiplies absolute travel times by (a logarithmically-transformed version of) intervals between services. Low values of this statistic reflect fast and frequent transport. This statistic also indicates considerably superior service in Paris compared with Berlin.
- "School Distance": Paris has notably shorter distances from each point in the city to the nearest school than does Berlin.
- "Bicycle Index": Paris has very notably better bicycle infrastructure than Berlin. This index is simply one minus the average portion of all bicycle journeys out to 5km from each point which may be taken on dedicated bicycle infrastructure. Around one quarter of all bicycle journeys in Paris may be taken along dedicated bicycle ways, compared to less than one fifth in Berlin. Moreover, comparing the maps for this variable reveals that the bicycle infrastructure in Berlin generally improves with distance away from the city centre, whereas Paris has the best bicycle infrastructure concentrated towards the centre of the city.
- "Nature Index": Access to natural spaces in the two cities is effectively the opposite of the bicycle index. Paris provides relatively little access to natural spaces for anybody not close to one of the two huge parks in the city, whereas Berlin provides an abundance of generally smaller natural spaces dispersed throughout the city. Note that natural space access includes walks alongside water bodies. Both cities include dominant rivers, yet Berlin also provides greater pedestrian access to the banks of its rivers and canals. Comparison of this layer on the maps reveals the comparably greater access in Berlin to walking paths alongside canals and rivers, whereas most of the Seine in Paris is effectively inaccessible to pedestrians.
- "Parking": Finally, both Paris and Berlin offer relatively little opportunity to park private automobiles compared with the other UTA cities, with Berlin notably less than Paris.
This section considers relationships between each individual variable and all other variables. All strengths of relationship shown in the "Stats" page are assessed in standardised ways, so they may be directly compared between cities. Moreover, the scales shown in the "Stats" page may also be directly compared. Values of one or greater indicate very strong relationships, whereas values less than 0.1 or so indicate weak relationships, and values less than around 0.01 should generally be interpreted to indicate no relationship. Pairs of variables with very weak or negligible strengths of relationship are generally not interpreted in the following sub-sections.
The following table summarises the values of the strongest pairwise relationships for each city:
|Variable 1|Variable 2|Berlin|Paris|
|---|---|---|---|
|Times (abs)|Pop. Dens.|-0.15|-0.11|
|Times (abs)|School dist|0.12|0.06|
This sub-section only considers transport times, in both absolute and relative senses. The other transport variables, intervals and numbers of transfers, generally follow similar patterns and are not explicitly considered here. Relative transport times are only very weakly related to most other variables. In contrast, absolute transport times are strongly related to most other variables.
Relative transport times are negligibly associated with population densities, while absolute times are particularly strongly and negatively correlated. These negative relationships indicate that faster transport is associated with higher population densities, more so in Berlin than Paris.
Slightly weaker relationships are manifest between absolute travel times and distances to nearest schools. Relationships in both Berlin and Paris are positive, indicating that fast public transport is positively associated with shorter distances to schools, with the relationship about twice as strong in Berlin as in Paris.
Travel times are very strongly, and positively, correlated with bicycle infrastructure, indicating faster travel times in regions with better bicycle infrastructure. This relationship is much stronger in Paris than in Berlin, for reasons easy to discern by looking at the maps of Berlin for these two variables. Bicycle infrastructure there is much better in the periphery of the city, whereas transport times exhibit more of a systematic discrepancy between the east (fast) and west (slow) portions of the city. In Paris, in contrast, faster transport times and better bicycle infrastructure are both concentrated more towards the centre of the city.
Relationships between transport times and the index of accessibility to natural spaces are also very strong, and negative. This means that faster transport times are associated with lower accessibility to natural spaces, as might be generally expected of most high-density cities. The relationship is stronger in Berlin than Paris, indicating that faster transport times are more strongly associated with poorer access to natural spaces there than in Paris.
Finally, absolute transport times are slightly negatively associated with numbers of automobile parking spaces in Paris, whereas no such relationship exists in Berlin. The negative relationship in Paris indicates that regions with faster public transport also tend to have more automobile parking spaces, reflecting planning decisions that associate use of public transport with the driving of private automobiles.
Shorter school distances are positively associated with the bicycle index in Paris, indicating a positive association between good bicycle infrastructure and short distances to schools. Berlin manifests no such relationship, likely for reasons described above, that bicycle infrastructure in Berlin is generally more peripheral than in Paris.
Although much weaker, relationships between school distances and the index of accessibility to natural spaces are negative, indicating that locations closer to schools are further from nature, and more so in Berlin than in Paris.
Finally, the social variables are more strongly related to all other non-transport variables in Paris than in Berlin, except for the index of bicycle infrastructure. This variable is more strongly, and positively, correlated with the social indicator in Berlin than in Paris, where the relationship is negative. The positive relationship in Berlin indicates that the provision of bicycle infrastructure is positively associated with social advantage, an effect again readily seen in examining the map of Berlin. In contrast, Paris is more effective in providing bicycle infrastructure in areas of relative social disadvantage.
Paris also seems to be more effective in educational provision in areas of social disadvantage, with the strong negative correlation indicating that socially disadvantaged Parisians generally have to travel shorter distances to schools. Although this relationship is also negative in Berlin, it is much weaker.
In contrast, Paris's very strong and positive relationship between social advantage and access to natural spaces indicates the relatively far greater difficulty experienced by less socially advantaged Parisians in accessing natural spaces compared with equivalent inhabitants of Berlin.
Finally, Paris manifests a very strong and negative association between social advantage and numbers of automobile parking spaces, indicating that low social disadvantage is strongly associated with high numbers of automobile parking spaces, or conversely that socially disadvantaged parts of the city offer relatively few automobile parking spaces. The relationship in Berlin is, in contrast, slightly positive.
Paris's transport system is considerably faster and more frequent. Nevertheless, it also involves greater numbers of transfers, suggesting that any attempt to improve the system in Berlin should take care to avoid inadvertently increasing numbers of transfers.
Berlin's average relative speed is also notably higher than Paris's, and at 1.09 likely too high to effectively discourage large numbers of people from opting to travel via private automobile. Examination of the map of relative travel times clearly reveals the effect of the connected ring of autobahns encircling the city. While reducing speeds on these carriageways may not be feasible, a uniform 30km/hour limit as introduced in Paris may nevertheless significantly reduce this ratio, and further incentivise many more people to opt for public transport rather than private automobile.
Although Paris is a far larger city, its average population density is nevertheless very similar to Berlin's. It is then even more striking that Paris offers considerably shorter average distances to schools than Berlin. School distances in Berlin are also only weakly correlated with social conditions, whereas average distances to schools in Paris are shorter in less socially advantaged areas. Both of these factors indicate a need in Berlin for more provision of local schooling in general, and particularly in socially disadvantaged regions, if it is to match the educational opportunities provided in Paris.
Paris's bicycle infrastructure is considerably better than Berlin's, and perhaps even more importantly, becomes better towards the inner city regions. In contrast, Berlin really only offers good bicycle infrastructure in the relatively peripheral, and more affluent, outer regions. Berlin really needs to proactively focus on improving bicycle infrastructure in the inner city regions.
Berlin is fortunately greatly enhanced by an abundance of natural space, including access to the city's rivers and canals, and access to these natural spaces is only weakly related to social advantage. This provides robust evidence for Berlin to appreciate its natural spaces, and to ensure that they remain accessible for everybody.
Paris's transport system is notably better than Berlin's in almost all ways except for the number of transfers necessary to travel equivalent distances. This difference is especially notable given that Paris is much larger than Berlin. Improvements to Paris's public transport system should focus on decreasing numbers of transfers.
Paris's average relative speed is very close to the "magical" value of one, at which point private automobiles are no faster than multi-modal transport including walking and cycling.
Paris has done a great job of providing bicycle infrastructure in the inner city regions, and notably of proactively enhancing or creating bicycle infrastructure in regions of social disadvantage.
Contrasts with Berlin nevertheless emphasise a couple of aspects which Paris could focus on improving. The most notable of these is the index of accessibility to natural spaces, and the relationship of this to other variables. Paris simply has far less natural space than Berlin, and much poorer general accessibility. Moreover, access to natural spaces is positively associated with social advantage, so that it is relatively difficult for socially disadvantaged Parisians to access natural spaces.
All UA variables are measured such that lower or negative values are good, whereas higher values are bad. (Exceptions are neutral variables such as population density which are nevertheless straightforward to interpret.) For example, unemployment or transport times are both better when values are lower. Indices of bicycle infrastructure and access to natural spaces are also transformed so that lower values indicate better or more of either. In these cases, the transformations are simply one minus the respective proportions of journeys out to fixed distance travelled along bicycle infrastructure, or through or alongside natural spaces. Values of 0 then reflect 100% of all journeys spent on bicycle infrastructure or in natural spaces, while values of 1 would represent complete absence of either bicycle infrastructure or natural spaces.
Variables are measured for every way, path, or street intersection within each city. Values for the maps are aggregated within polygons defined by the Socio-demographic variables described in the following sub-section, while values within the statistics page are aggregated across entire cities. Unless explicitly described otherwise, values of all variables are weighted by population density. This means that, for example, distances to nearest schools represent average distances that each person must travel to get to school. Full descriptions of the calculation of all variables are given in the Software and Algorithms chapter.
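The effect of population weighting can be illustrated with a minimal sketch. The function name and the example numbers below are hypothetical, and the sketch shows only a generic weighted mean rather than the aggregation code used by Urban Analyst itself.

```python
def population_weighted_mean(values, densities):
    """Mean of `values`, with each point weighted by local population density."""
    total = sum(densities)
    if total <= 0:
        raise ValueError("population densities must sum to a positive value")
    return sum(v * d for v, d in zip(values, densities)) / total


# Hypothetical distances to the nearest school (metres) at three points,
# with hypothetical local population densities at the same points.
school_distance_m = [150.0, 300.0, 900.0]
population_density = [8000.0, 4000.0, 100.0]

unweighted = sum(school_distance_m) / len(school_distance_m)
weighted = population_weighted_mean(school_distance_m, population_density)

print(f"unweighted mean: {unweighted:.1f} m")
print(f"population-weighted mean: {weighted:.1f} m")
```

In this example the sparsely populated point with the longest distance barely affects the weighted value, which is the intended behaviour: the result approximates the distance experienced by an average person rather than by an average unit of area.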
The extent and structure of each city is defined by its "socio-demographic variable," or "social variable" for short. These are taken from open-source datasets provided by the cities as a series of geographic areas, defined as polygonal shapes, and some corresponding measure of socio-demographic disadvantage. The cities themselves decide the resolution and extent of these polygonal data. These polygons then define the extent and shape of cities analysed in Urban Analyst, and the individual polygons into which the map data are aggregated.
Values of these socio-demographic variables are the only aspect that differs between different cities. One of the simplest versions is unemployment rate, generally measured either as a percentage (0-100), or a proportion (0-1). The UA platform selects the most representative measure of general social disadvantage provided by each city, and defaults to rates of unemployment only where no more comprehensive or integrative measures are openly provided by cities.
Urban Analyst provides highly detailed statistics on transport systems. Many of these are derived from estimates of times required to travel fixed distances of 10km. This value is chosen to capture the general efficiency of public transport systems. Shorter distances do not sufficiently capture the influence of transport modes such as express or long-distance train services, while longer distances unfairly penalise smaller cities in which most journeys are only of shorter distances.
Values at this distance of 10km are obtained by following these three steps, taking the example statistic of travel times:
- Calculate total travel times from all street intersections to all other street intersections within a city.
- Calculate a straight line of "best fit" (a "least squares regression" line) which describes how travel times vary with distance.
- Use that line to obtain the "average" value of travel time at the distance of 10km.
Values shown in the maps are aggregated within each polygon of a chosen city, while values shown in the statistics page are aggregated over entire cities.
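The three steps above can be sketched as follows, using ordinary least squares on a handful of made-up (distance, time) pairs. The data, function name, and variable names are hypothetical; the real analyses fit lines to travel times between all pairs of street intersections.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Step 1 (stand-in): hypothetical travel times (minutes) from one origin
# to destinations at various distances (kilometres).
distances_km = [2.0, 5.0, 8.0, 12.0, 15.0]
times_min = [12.0, 22.0, 33.0, 47.0, 58.0]

# Step 2: straight line of best fit describing time as a function of distance.
intercept, slope = fit_line(distances_km, times_min)

# Step 3: read the fitted line at the reference distance of 10km.
time_at_10km = intercept + slope * 10.0
print(f"estimated travel time at 10km: {time_at_10km:.1f} minutes")
```

Reading a value from the fitted line, rather than averaging only those journeys that happen to be exactly 10km long, lets every observed journey contribute to the estimate.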
The remainder of this section describes the five travel variables:
- Absolute travel times
- Relative travel times
- Numbers of transfers
- Intervals between consecutive services
- Compound travel statistic
Urban Analyst enables comparisons of travel times between two primary modes of transport:
Private Automobile. Travel times with private automobile are used as a benchmark for measures of travel time using other modes. UA generates realistic estimates of private automobile travel times through scaling to empirically observed data on actual vehicular travel times. (Calibration procedures are implemented and documented in this GitHub repository.) Importantly, UA includes an additional, unique aspect of automobile travel times not quantified in any other equivalent system, through an algorithm to accurately estimate the likely time required to park a private vehicle, and then to walk to a desired destination. These parking times are crucial, as direct travel times to many inner-city destinations do not provide realistic estimates of actual journey times to locations where it may be impossible to actually park a private automobile.
Multi-Modal Transport. UA's "multi-modal travel times" represent fastest possible times taken for journeys from every single point in a city to travel 10km using any combination of transport modes excluding private automobile. The primary modes considered are walking, bicycling, and all available modes of public transport within each city. Where it is faster to cycle 10km than to take public transport (such as from locations with very poor public transport connections, or on the top of long hills where downhill cycling may actually be faster), these times will represent the single mode of cycling only, but multi-modal times will generally reflect fastest times formed by combining multiple modes of transport.
Travel times measured these two ways are then combined to generate the following two primary travel time statistics:
Absolute travel times as the multi-modal travel times; that is, using any mode except private automobile.
Relative travel times as the ratios of absolute travel times compared with equivalent travel times with private automobile. Relative travel times of less than one indicate that multi-modal transport is faster than equivalent transport with private automobile, while values greater than one indicate that private automobile transport is faster.
In addition to travel times, UA also includes the following two additional statistics quantifying other aspects of public transport systems. Both are measured for every point of origin within a city, with final values again derived by following the steps described above to obtain average values of each for all trips of 10km distance. Numbers of transfers are thus the average number required for all journeys of 10km, while intervals are the required waiting times for the next equivalent journey out to that distance.
Intervals to Next Service are measured in minutes. For each point of origin in a city, this statistic measures the waiting time necessary before departing to each destination within a city on the service after the one corresponding to the fastest journey. This delayed service may not be as fast as the original, or it may even be faster in some cases, as the UA algorithms also prioritise connections with the fewest possible transfers. It can happen that subsequent services are actually faster, yet involve additional transfers not required in the originally identified "fastest" service.
Numbers of Transfers measure the number of transfers necessary for a minimal-transfer journey out to a distance of 10km. These minimal-transfer journeys are selected to allow for journeys slightly slower than absolute fastest journeys (generally by up to 5 minutes) if they involve fewer transfers.
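One way to read this selection rule is sketched below: among candidate journeys no more than a tolerance slower than the fastest, prefer the one with the fewest transfers. This is an interpretation for illustration only, not the Urban Analyst implementation; the `Journey` type, the function name, and the fixed 5-minute tolerance are assumptions.

```python
from typing import NamedTuple


class Journey(NamedTuple):
    duration_min: float
    transfers: int


def pick_minimal_transfer(journeys, tolerance_min=5.0):
    """Fewest-transfer journey among those within `tolerance_min` of the fastest."""
    fastest = min(j.duration_min for j in journeys)
    eligible = [j for j in journeys if j.duration_min <= fastest + tolerance_min]
    # Prefer fewer transfers; break ties by shorter duration.
    return min(eligible, key=lambda j: (j.transfers, j.duration_min))


# Hypothetical candidates between one origin and one destination.
candidates = [Journey(31.0, 2), Journey(34.0, 1), Journey(45.0, 0)]
chosen = pick_minimal_transfer(candidates)
print(chosen)
```

Here the direct journey is excluded because it is 14 minutes slower than the fastest option, so the single-transfer journey is chosen over the two-transfer one.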
All three of the statistics described above - travel times, intervals, and numbers of transfers - are measured such that lower values are more desirable. Travel times are then directly multiplied by (a logarithmically-transformed version of) intervals between services to generate a "compound travel statistic". Low values of this statistic only arise in locations which have fast travel times and short intervals between services. Low values may accordingly always be interpreted as indicating overall good transport services. In contrast, high values may arise through various combinations of variables, from extremely high values of one single variable, to less extreme combinations of the two variables. It is thus generally not possible to directly discern reasons for high values of this compound travel statistic. Urban Analyst nevertheless provides direct insight into all individual values, as well as all pairwise combinations of values, permitting indirect insight.
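A minimal sketch of such a compound statistic is given below. The text does not specify the exact logarithmic transformation, so `log(1 + interval)` is used here purely as an assumption, and the function name is hypothetical.

```python
import math


def compound_travel_statistic(travel_time_min, interval_min):
    """Travel time multiplied by a log-transformed interval.

    Assumption: log(1 + interval) stands in for the unspecified transformation.
    """
    return travel_time_min * math.log1p(interval_min)


fast_frequent = compound_travel_statistic(30.0, 5.0)
slow_frequent = compound_travel_statistic(45.0, 5.0)
fast_infrequent = compound_travel_statistic(30.0, 20.0)

print(f"fast and frequent:   {fast_frequent:.1f}")
print(f"slow but frequent:   {slow_frequent:.1f}")
print(f"fast but infrequent: {fast_infrequent:.1f}")
```

The example shows why high values are ambiguous: both the slow-but-frequent and the fast-but-infrequent cases score worse than the fast-and-frequent case, and the compound value alone does not say which component is responsible.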
Population density values are taken directly from the European Union Global Human Settlement Layer data, aggregated into polygons for maps, or across entire cities for statistics.
Distances to nearest schools are measured in kilometres, as shortest walking distances from each point to the nearest school. These are network distances, and not simple straight line distances. A single value is ascribed to each point within a city, and all points aggregated after weighting by local population densities.
The bicycle infrastructure index is derived from a measure of the proportion of all possible journeys from each point out to a fixed distance of five kilometres that travel along dedicated bicycle infrastructure. To conform with all other UA variables, the index is one minus this proportion, so that low values reflect high proportions of bicycle infrastructure. Values of zero would then reflect all journeys taken along dedicated bicycle paths, while values of one would mean a complete absence of dedicated bicycle infrastructure.
Travel is calculated using a bicycle-specific algorithm that only extends along ways unsuitable for bicycle travel where no alternatives exist. The weighting scheme used adds total distances for all portions of travel along designated cycleways that are separated from vehicular traffic. Portions of trips extending along other types of ways are added with "half weightings" so that, for example, two kilometres along these types are equivalent to one kilometre on dedicated bicycle ways. These "half-weight" ways include residential or "living" streets, unpaved tracks, and bicycle lanes directly alongside automobile lanes. A third category of ways is weighted at one-quarter, including footpaths and general pedestrian areas which permit bicycle travel. The precise weighting scheme can be viewed in this source code file.
The weighted sums of all distances along these types of ways traversed out to five kilometres from any given point are then divided by the sum of all distances travelled regardless of way type to give a ratio between zero and one. This bicycle infrastructure index is then one minus this value.
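The calculation for a single point might be sketched as follows. The category labels are hypothetical shorthand for the three weighting classes described above, plus an unweighted class for all other ways; the precise assignment of way types to weights is defined in the source code file referred to above, not here.

```python
# Hypothetical labels for the weighting classes described in the text.
WEIGHTS = {
    "dedicated": 1.0,   # cycleways separated from vehicular traffic
    "half": 0.5,        # e.g. residential streets, on-road bicycle lanes
    "quarter": 0.25,    # e.g. footpaths permitting bicycle travel
    "other": 0.0,       # all remaining ways
}


def bicycle_index(segments):
    """One minus the weighted share of distance on bicycle infrastructure.

    `segments` is a list of (way_class, length) pairs for all ways traversed
    out to the fixed distance from a given point.
    """
    total = sum(length for _, length in segments)
    if total <= 0:
        raise ValueError("total distance must be positive")
    weighted = sum(WEIGHTS[way_class] * length for way_class, length in segments)
    return 1.0 - weighted / total


# Hypothetical 5km of travel from one point, split by way class.
segments_km = [("dedicated", 1.0), ("half", 2.0), ("quarter", 1.0), ("other", 1.0)]
index = bicycle_index(segments_km)
print(f"bicycle index: {index:.2f}")
```

Values of zero arise only when every kilometre is on dedicated cycleways, and values of one only when no traversed way carries any weighting.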
Natural space accessibility is measured in a similar way to the bicycle infrastructure variable, except it quantifies proportions of walking distances out to maximal distances of two kilometres that traverse natural spaces. This provides a more realistic measure of natural space than simple aggregations of areas, because it measures the ability of people to directly walk from every point in a city through or alongside nearby natural spaces.
Moreover, aggregate metrics do not generally capture the ability of people to actually access natural spaces. A park may, for example, have restricted or even private access. This would count as a natural space in a simple aggregate metric, yet not in UA because access restrictions are taken into account in the routing algorithms.
The algorithm also measures lengths of ways walked adjacent to water - so-called "blue space", providing a comprehensive metric of the actual ability to access natural spaces from every point in a city. A natural space index of zero would represent an entire city of natural space, with no built structures at all, while a value of one would represent a complete absence of natural spaces.
The parking index is the ratio of numbers of nearby parking spaces to total volumes of nearby buildings. The parking statistic is calculated for each point by adding all nearby parking spaces with a weighting scheme that decreases exponentially with distance, so that nearby parking spaces count more than parking spaces that are farther away. Building volumes are also aggregated using an identical weighting scheme. The parking index at each point is then the ratio of the sum of distance-weighted numbers of parking spaces to the sum of distance-weighted total building volumes.
All publicly accessible parking spaces are counted, including on-street parking, open parking lots, and multi-level parking garages. Building volumes are aggregated regardless of type or purpose.
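The ratio described above can be sketched for a single point as follows. The decay length of 500 metres, the function names, and the example data are all assumptions for illustration; the text states only that weights decrease exponentially with distance.

```python
import math


def distance_weighted_sum(items, decay_m):
    """Sum of amounts, each weighted by exp(-distance / decay_m).

    `items` is a list of (distance_m, amount) pairs.
    """
    return sum(amount * math.exp(-dist / decay_m) for dist, amount in items)


def parking_index(parking, buildings, decay_m=500.0):
    """Ratio of distance-weighted parking spaces to distance-weighted building volume.

    Assumption: `decay_m` is a hypothetical decay length, not a documented value.
    """
    weighted_buildings = distance_weighted_sum(buildings, decay_m)
    if weighted_buildings <= 0:
        raise ValueError("weighted building volume must be positive")
    return distance_weighted_sum(parking, decay_m) / weighted_buildings


# Hypothetical surroundings of one point: (distance in metres, amount).
parking_spaces = [(50.0, 20), (400.0, 150)]
building_volumes_m3 = [(30.0, 12000.0), (250.0, 40000.0)]

print(f"parking index: {parking_index(parking_spaces, building_volumes_m3):.5f}")
```

Applying the same weighting to both numerator and denominator means the index reflects parking supply relative to the built volume it serves, rather than simply the local density of either.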
The Urban Analyst platform aims to be applicable to as much of the world as possible. To achieve this, data sources are chosen which ideally have global coverage, and which are not specific to any one city. The global sources used are:
- Open Street Map for all data on traversable ways, natural spaces, parking infrastructure, and locations of schools.
- Population density data from the European Union Global Human Settlement Layer.
- Elevation data from NASA Earth Observation Data, used to include effects of incline on both pedestrian and bicycle travel times.
The following additional data are then required for each city:
- Data on some measure of socio-demographic inequality or disadvantage, such as rates of unemployment, quantified within a series of polygonal shapes which are also used to define each city.
- Data on public transport timetables in General Transit Feed Specification (GTFS) format, or in other formats able to be converted to GTFS format.
The last of these data requirements is the most restrictive, as most cities of the world do not have or provide public transport data in GTFS format. There are nevertheless thousands of cities which do provide GTFS feeds, as can be seen for example in the GTFS feed aggregation platform transit.land.
The primary software packages used to generate the results presented on Urban Analyst are:
- osmdata for accessing and processing data from Open Street Map.
- dodgr for general network routing queries.
- gtfsrouter for public transport routing queries.
- m4ra for multi-modal routing queries.
- ua-engine for additional algorithms combining aspects of these other routing algorithms for tasks specific to Urban Analyst.
All of this software was primarily developed by the same team responsible for Urban Analyst itself.