Choropleth Maps in Python (2021)

Create a choropleth map with geoviews and geopandas. Working geoviews installation instructions as of May 2021.

Choropleth maps can be used to immediately convey important information about a geographical dataset.

Like heat maps, they show the local variations of a measurement, such as population density. However, while heat maps average measurements in arbitrary bins, choropleth maps aggregate them within predefined boundaries, such as country and state borders.

In this post, you will use state-of-the-art python visualization libraries to draw choropleth maps. You will learn:

  • How to install holoviz, geopandas, and geoviews in a conda environment
  • How to create your first maps with world geographical boundaries
  • Where to find definitions of geographical boundaries to make maps
  • How to connect your own data to geographical boundaries, to plot whatever you like.
[Figure: choropleth map of the French population, by department]

Installation

First, install Anaconda if you haven't already.

Then, create a new conda environment for this exercise, named geoviews:

conda create -n geoviews python=3.7

May 2021: Note that we do want to use python 3.7, to avoid a temporary conda dependency issue with later versions of python.

Activate it:

conda activate geoviews

And install geoviews:

conda install -c pyviz geoviews

This is going to take a few minutes, please be patient.
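Before moving on, you can run a quick sanity check in python to confirm that the main packages were installed. This snippet is not part of the original instructions, just a convenience:

import geoviews as gv
import geopandas as gpd
import holoviews as hv
import cartopy

# Print the installed versions to confirm the environment is usable.
print('geoviews', gv.__version__)
print('geopandas', gpd.__version__)
print('holoviews', hv.__version__)
print('cartopy', cartopy.__version__)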

Finally, here are the files needed for this tutorial (jupyter notebook, input dataset), in case you need them.

The python visualization landscape: orientation

By installing geoviews, we have actually installed a large number of python packages that are (or might be) needed for geographical data analysis and visualization.

Here is a simplified description of the dependencies between some of these packages:

  • geoviews : geographical visualization
    • holoviews : high-level visualization of multidimensional data
      • bokeh : low-level visualization backend, based on JavaScript
    • geopandas : describe and analyze geographical data
      • pandas : describe and analyze data
        • numpy : efficient manipulation of multidimensional data arrays, and fundamental package for scientific computing in python

This might seem a bit complicated, and indeed it is!

The landscape of python visualization is transforming rapidly, and it can be quite difficult to choose and learn the right tools. Personally, here's what I'm looking for:

  1. a concise syntax: I want to do plots without wasting time writing code, to get fast insight on my data.
  2. interactive plots in the browser: this basically calls for JavaScript under the hood.
  3. big data: display lots of information without killing the client browser.
  4. python: so JavaScript should remain mostly hidden from me.

Formalizing these four points helped me a lot in choosing my tools. For example, point 4 rules out pure JavaScript libraries such as D3.js. I still believe that these libraries are the way to go for professional, large-scale display in the browser. But D3, for example, has quite a steep learning curve... I've already spent at least a week on it! When I really need a D3-based display, I'd rather hire an expert than invest more time in learning it.

Bokeh, on the other hand, drives JavaScript from python code (without relying on D3). So that's exactly what I need to address point 2.

I often use bokeh directly, as in Show your Data in a Google Map with Python or Interactive Visualization with Bokeh in a Jupyter Notebook. But making a single plot in bokeh can require a dozen lines of code or more.
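To give an idea of what that looks like, here is a minimal sketch of a bokeh plot with hover tooltips, using made-up data and assuming bokeh 2.x (not taken from this tutorial):

from bokeh.plotting import figure, show
from bokeh.models import ColumnDataSource, HoverTool

# A small hand-made dataset, just for illustration.
source = ColumnDataSource(data=dict(x=[1, 2, 3, 4], y=[3, 7, 5, 9]))

p = figure(plot_width=400, plot_height=300, title='A simple bokeh plot')
p.line('x', 'y', source=source)
p.circle('x', 'y', source=source, size=8)
p.add_tools(HoverTool(tooltips=[('x', '@x'), ('y', '@y')]))
p.xaxis.axis_label = 'x'
p.yaxis.axis_label = 'y'
show(p)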

Holoviews, which can be used as a high-level interface to bokeh or matplotlib, makes it easy to create complex plots in just two lines of code, and thus addresses point 1.
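For comparison, here is roughly the same plot sketched in holoviews, again with made-up data: one line to declare the element, one line for the options.

import holoviews as hv
hv.extension('bokeh')

# Declare the data as a holoviews element, then set a few display options.
scatter = hv.Scatter(([1, 2, 3, 4], [3, 7, 5, 9]))
scatter.opts(width=400, height=300, tools=['hover'])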

When I need to display big data, I use datashader, a library that compresses big data into an image dynamically before sending it to a bokeh plot. Again, bokeh and holoviews are the way to go here.

In June 2019, the holoviz project was launched. The holoviz team is packaging the tools I need in the pyviz conda repo, and seems to share my views on how data visualization in python should evolve. So I'm now using holoviz as my main entry point to visualization. If you want to know more about this project, you can refer to their FAQ.

First maps with geoviews

We import the tools we need, and we initialize geoviews:

In [1]:
import geoviews as gv
import geoviews.feature as gf
import xarray as xr
from cartopy import crs

gv.extension('bokeh')

geoviews.feature contains the geometric description of various features, such as the world land, ocean, and borders. Let's plot this:

In [2]:
(gf.land * gf.ocean * gf.borders).opts(
    'Feature', 
    projection=crs.Mercator(), 
    global_extent=True, 
    width=500
)
Out[2]:

With cartopy, we can easily change the projection:

In [3]:
(gf.land * gf.ocean * gf.borders).opts(
    'Feature', 
    projection=crs.NearsidePerspective(central_longitude=0, 
                                       central_latitude=45), 
    global_extent=True, 
    width=500, height=500
)
Out[3]:

Finding more features

For now, we have extracted the features we wanted to plot (gf.land, gf.ocean, gf.borders) from geoviews.feature. Let's see what else we can find there:

In [4]:
dir(gf)
Out[4]:
['Feature',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 'borders',
 'cf',
 'coastline',
 'grid',
 'lakes',
 'land',
 'ocean',
 'rivers',
 'states']

In fact, these are only pre-defined features for common use cases, and they are quite limited. For example, the states feature is restricted to the US.

Cartopy, which is in fact used under the hood by geoviews to get the land, ocean, and borders features, has a feature interface that provides access to more features from a variety of data sources, like Natural Earth (http://www.naturalearthdata.com).
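For instance, here is a small sketch of how a Natural Earth dataset could be wrapped for geoviews through cartopy's NaturalEarthFeature interface (the 'rivers_lake_centerlines' name and the '50m' scale are Natural Earth dataset identifiers, used here as an assumed example):

import cartopy.feature as cf
import geoviews as gv
import geoviews.feature as gf

# Natural Earth rivers and lake centerlines, at the 50m scale.
rivers_50m = cf.NaturalEarthFeature('physical', 'rivers_lake_centerlines', '50m')

# Wrap the cartopy feature as a geoviews element and overlay it on the coastline.
(gf.coastline * gv.Feature(rivers_50m)).opts('Feature', width=500)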

But if you're looking for something specific, it might be unavailable. For example, I haven't been able to find a geographic description of the French departments to create the map at the top of this article.

In the next section, we will see how to work around this problem.

Loading a GeoJSON file with geopandas

GeoJSON files encode geographic data structures and can easily be read and manipulated with geopandas.

A google search of "france geojson" led me to the france-geojson repository by Grégoire David (thanks!). From this repository, I downloaded departements-version-simplifiee.geojson and saved it as france-geojson/departements-version-simplifiee.geojson.
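If you prefer to fetch the file from python instead of downloading it by hand, something like the following should work. The raw-file URL below is an assumption about the repository layout; check the repository if it has moved:

import os
import urllib.request

# Assumed location of the simplified departments file in the france-geojson repository.
url = ('https://raw.githubusercontent.com/gregoiredavid/france-geojson/'
       'master/departements-version-simplifiee.geojson')

os.makedirs('france-geojson', exist_ok=True)
urllib.request.urlretrieve(url, 'france-geojson/departements-version-simplifiee.geojson')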

After doing that, we can import geopandas and read this file:

In [5]:
import geopandas as gpd
sf = gpd.read_file('france-geojson/departements-version-simplifiee.geojson')
sf.head()
Out[5]:
code nom geometry
0 01 Ain POLYGON ((4.78021 46.17668, 4.79458 46.21832, ...
1 02 Aisne POLYGON ((4.04797 49.40564, 4.03991 49.39740, ...
2 03 Allier POLYGON ((3.03207 46.79491, 3.04907 46.75808, ...
3 04 Alpes-de-Haute-Provence POLYGON ((5.67604 44.19143, 5.69209 44.18648, ...
4 05 Hautes-Alpes POLYGON ((6.26057 45.12685, 6.29922 45.10855, ...

We see that the department geometries are stored as polygons, with the longitude and latitude of each point on the polygon. It's easy to draw a department with geopandas: we just select a row from the dataframe (here the first one for the department Ain), and we evaluate the geometry:

In [6]:
sf.loc[0].geometry
Out[6]:

We're now going to show all French departments, with a color indicating a measured value for each department. For the moment, we don't have a measured value to display, so we generate one randomly:

In [7]:
import numpy as np
sf['value'] = np.random.randint(1, 10, sf.shape[0])

Let's have a closer look at this line of code.

On the right side, we use numpy to generate an array of random integers between 1 and 10, with a length equal to the number of rows in the dataframe, which is sf.shape[0]. Then, we assign this numpy array to a new column in the dataframe, called value.

Plotting geographical data with geoviews

In the holoviews style, we just need two lines to display the departments. The first line declares the set of polygons from our geopandas dataframe. We explicitly choose the department name and its value as value dimensions:

In [8]:
deps = gv.Polygons(sf, vdims=['nom','value'])

Then, we plot the departments, with a few options. The most important one is color=dim('value'). This just means that we are going to map value to a linear color scale for each department. And you can find a full explanation of what dim objects are in the holoviews documentation.

Note that the column used for the color must be in vdims. It's easy to forget!

In [9]:
from geoviews import dim
deps.opts(width=600, height=600, toolbar='above', color=dim('value'),
          colorbar=True, tools=['hover'], aspect='equal')
Out[9]:

We used bokeh as a backend, so we benefit from the usual tools from this library (at the top of the plot, as specified), and from its interactive features. You can try to wheel zoom, pan, and hover on the departments to get detailed information.

Creating a choropleth map with your own data

In the simple example above, we randomly defined a value for each department. But in real life, what you want is to display data that is not already present in the geopandas dataframe. So we need a way to associate your data with the geopandas dataframe before plotting.

That's what we're going to do now, taking as an example the estimated population in each department in 2019. I found the data here in xls format, and I converted it to a tsv file (tab-separated values) to create the file population_france_departements_2019.tsv.

To read this file, we use plain pandas. Indeed, we don't need geopandas here because this dataset does not contain any geographical information. We use the well-known read_csv method, specifying that tabs are used to separate values, and that a comma is used to indicate thousands. You can have a look inside the tsv text file if you want.

In [10]:
import pandas as pd
df = pd.read_csv('population_france_departements_2019.tsv', sep='\t', thousands=',')
df.head()
Out[10]:
depcode nom t_0_19 t_20_39 t_40_59 t_60_74 t_75 t_total h_0_19 h_20_39 h_40_59 h_60_74 h_75 h_total f_0_19 f_20_39 f_40_59 f_60_74 f_75 f_total
0 01 Ain 167720 150949 179476 102788 52755 653688 86359 75242 89278 49523 21526 321928 81361 75707 90198 53265 31229 331760
1 02 Aisne 131435 115046 137405 96000 48130 528016 67391 57430 68197 45907 18308 257233 64044 57616 69208 50093 29822 270783
2 03 Allier 67628 61986 87232 71910 44309 333065 34894 31397 42594 33582 16973 159440 32734 30589 44638 38328 27336 173625
3 04 Alpes-de-Haute-Provence 33883 30028 43039 34523 20507 161980 17721 15097 20941 16884 8510 79153 16162 14931 22098 17639 11997 82827
4 05 Hautes-Alpes 30518 28633 37887 28356 16390 141784 15987 14362 18843 13704 6592 69488 14531 14271 19044 14652 9798 72296

After the first columns, which indicate the department code and name, we have three sets of columns for the number of people in the department. The prefixes mean the following:

  • t_ : total number
  • h_ : number of men
  • f_ : number of women

And in each set, there is one column for each age range: from 0 to 19 years, ... , over 75 years.

Now, we need to associate the data in this pandas dataframe (df) to our geopandas dataframe (sf). That's very easy with the dataframe merge method.

Below, we merge the two dataframes row by row, matching the value in column code of sf with the value in column depcode of df. The suffixes option is for columns that have the same name in both df and sf, like nom (the department name). It determines the names these columns will have after merging. Here, I want the column that came from sf to keep its name, and the one that came from df to get a _y suffix.

In [11]:
jf = sf.merge(df, left_on='code', right_on='depcode', suffixes=('','_y'))
jf.head(2)
Out[11]:
code nom geometry value depcode nom_y t_0_19 t_20_39 t_40_59 t_60_74 ... h_40_59 h_60_74 h_75 h_total f_0_19 f_20_39 f_40_59 f_60_74 f_75 f_total
0 01 Ain POLYGON ((4.78021 46.17668, 4.79458 46.21832, ... 3 01 Ain 167720 150949 179476 102788 ... 89278 49523 21526 321928 81361 75707 90198 53265 31229 331760
1 02 Aisne POLYGON ((4.04797 49.40564, 4.03991 49.39740, ... 3 02 Aisne 131435 115046 137405 96000 ... 68197 45907 18308 257233 64044 57616 69208 50093 29822 270783

2 rows × 24 columns

And we're now ready to plot the total population in each department! We also add the total number of men and women to the tooltips, by adding these columns to vdims. Also, please note how I rescaled the t_total color scale to display it in millions of inhabitants.

In [12]:
# declare how the plot should be done
regions = gv.Polygons(jf, vdims=['nom', 't_total', 'h_total', 'f_total'])
regions.opts(width=600, height=600, toolbar='above', color=dim('t_total')/1e6, 
             colorbar=True, tools=['hover'], aspect='equal')
Out[12]:
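As a variation, you can of course color the map by any column, including one you derive yourself. For example (a hypothetical extension, not part of the original notebook), here is how you could map the share of people aged 75 and over in each department:

# Derive the share of the 75+ age group from the columns loaded above,
# then color the polygons by this new column.
jf['share_75'] = jf['t_75'] / jf['t_total']

regions_75 = gv.Polygons(jf, vdims=['nom', 'share_75'])
regions_75.opts(width=600, height=600, toolbar='above', color=dim('share_75'),
                colorbar=True, tools=['hover'], aspect='equal')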

Conclusion

In this post, you have learnt:

  • About the python visualization ecosystem, and how to get started fast;
  • How to create your first maps with world geographical boundaries;
  • Where to find definitions of geographical boundaries to make maps;
  • How to connect your own data to geographical boundaries, to plot whatever you like.

If you know about open data sources for geographic visualization, please comment! I'd be very interested to learn about them.


Please let me know what you think in the comments! I’ll try and answer all questions.

And if you liked this article, you can subscribe to my mailing list to be notified of new posts (no more than one mail per week I promise.)
