How to use the colors of a colorbar in a bar plot in Matplotlib

The objective is to visualise two characteristics of a dataset: the spatial distribution of values by colour and the density distribution of values by colour. So, given a dataset of coordinates each with a colour value, show a scatter plot with a colour bar and a histogram of the colours. Below is a first attempt:

[Figure: plain output in plot]

The plots would be more useful if they communicated which data points on the Cartesian plot belong to a given bin in the histogram. This means that, instead of a continuous colour range, the colorbar needs to have discrete colours that match the binning of the histogram.

Let's assume the data is uniformly distributed between 0 and 1 (np.random.rand draws from a uniform distribution over [0, 1)).

data = np.random.rand(1000, 3)
x = data[:, 0]
y = data[:, 1]
c = data[:, 2]

First, create a histogram. Note that the range is set explicitly, rather than accepting the default behaviour of using the min and max of the input (see numpy.histogram):

hist, bins = np.histogram(c, bins=7, range=(0, 1))

Note as well the small number of bins. This is to aid the visual matching of colors between the two plots.
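To see what the explicit range changes, the short sketch below compares the bin edges produced with and without it. The seeded generator and the variable names default_edges and fixed_edges are assumptions made for illustration; they are not part of the original listing.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded generator: an assumption, for reproducibility
c = rng.random(1000)            # uniform samples in [0, 1)

# Default behaviour: the edges span the observed min and max of the sample.
_, default_edges = np.histogram(c, bins=7)

# Explicit range: the edges are fixed at 0 and 1 whatever the sample contains.
_, fixed_edges = np.histogram(c, bins=7, range=(0, 1))

print(default_edges[0], default_edges[-1])
print(fixed_edges[0], fixed_edges[-1])
```

With a fixed range the edges, and therefore the colorbar ticks, stay the same from one random sample to the next.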

Next, find out what bin each data point belongs to:

bin_indexed_data = np.digitize(c, bins)

The digitize function returns 0 for values that fall below the leftmost bin edge and N+1 for values that fall at or beyond the rightmost edge, where N is the number of bins. Values inside the range therefore get bin indices from 1 to N:

bin_indices = np.arange(1, len(bins))
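The edge behaviour of digitize can be checked on a handful of hand-picked values. This is a small sketch: the sample values are made up, and the edges are built with np.linspace on the assumption that they equal the seven equal-width bins over (0, 1) used above.

```python
import numpy as np

bins = np.linspace(0, 1, 8)  # 8 edges for 7 equal-width bins on (0, 1)
samples = np.array([-0.1, 0.0, 0.5, 0.99, 1.0, 1.5])

indices = np.digitize(samples, bins)
print(indices)  # -> [0 1 4 7 8 8]
```

Note that a value exactly equal to the rightmost edge (1.0 here) is digitised to N+1, whereas np.histogram counts it in the last bin. Data drawn with np.random.rand never reaches 1.0, so the two functions agree for the dataset in this article.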

Now set up a normalising function according to the bin indices. This function will map a bin index to a normalised colour index in the chosen colour map:

normaliser = Normalize(np.min(bin_indices), np.max(bin_indices))

And choose a color map:

cm = plt.get_cmap('RdYlBu')
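As a quick illustration of what the normaliser and the colour map do together, the sketch below maps the seven bin indices to positions in [0, 1] and then to RGBA colours. The names positions and bar_colours are illustrative, and the Agg backend is selected only so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import Normalize

bin_indices = np.arange(1, 8)  # indices 1..7 for seven bins
normaliser = Normalize(bin_indices.min(), bin_indices.max())
cm = plt.get_cmap('RdYlBu')

positions = normaliser(bin_indices)  # evenly spaced positions from 0 to 1
bar_colours = cm(positions)          # one RGBA row per bin

print(np.round(positions, 3))
print(bar_colours.shape)
```

The first bin lands on the start of the colour map and the last bin on its end, with the remaining bins spread evenly in between.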

We are ready to plot:

fig, axes = plt.subplots(nrows=1, ncols=2)

scatter = axes[0].scatter(x, y, c=bin_indexed_data, cmap=cm, marker='o')

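To control exactly which colours the colorbar shows and where they change, the boundaries and values keyword arguments are used. The boundaries are the edges of the colorbar segments, and values supplies one value per segment (so one fewer than the number of boundaries), which is mapped to that segment's colour: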

cb = plt.colorbar(scatter, ax=axes[0], boundaries=bins, values=bin_indices, ticks=bins)

This configures the color bar with discrete colours matching the bins.

Lastly, we plot the histogram as a bar plot:

width = bins[1] - bins[0]
centers = (bins[:-1] + bins[1:]) / 2
axes[1].bar(centers, hist, align='center', width=width, color=cm(normaliser(bin_indices)))

The result is:

[Figure: plain output in plot]
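One way to check that the bars and the scatter colours really describe the same grouping is to count how many points were digitised into each bin and compare the result with the histogram counts. This is a sketch under the assumption that all values lie in [0, 1); the seeded generator and the name counts are introduced here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded generator: an assumption, for reproducibility
c = rng.random(1000)            # uniform values in [0, 1)

hist, bins = np.histogram(c, bins=7, range=(0, 1))
bin_indexed_data = np.digitize(c, bins)

# Count the points that received each index. Slots 0 and N+1 hold the
# out-of-range indices and are dropped by the slice.
counts = np.bincount(bin_indexed_data, minlength=len(bins) + 1)[1:len(bins)]

print(hist)
print(counts)
```

If the two arrays differ, some value fell outside the bin range or landed exactly on the rightmost edge.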

The full code listing is:

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize

data = np.random.rand(1000, 3)
x = data[:, 0]
y = data[:, 1]
c = data[:, 2]

fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(12, 5))

hist, bins = np.histogram(c, bins=7, range=(0, 1))

bin_indices = np.arange(1, len(bins))
normaliser = Normalize(np.min(bin_indices), np.max(bin_indices))

bin_indexed_data = np.digitize(c, bins)

cm = plt.get_cmap('RdYlBu')
scatter = axes[0].scatter(x, y, c=bin_indexed_data, cmap=cm, marker='o')
cb = plt.colorbar(scatter, ax=axes[0], boundaries=bins, values=bin_indices, ticks=bins)
axes[0].set_aspect('equal')

width = bins[1] - bins[0]
centers = (bins[:-1] + bins[1:]) / 2
axes[1].bar(centers, hist, align='center', width=width, color=cm(normaliser(bin_indices)))
axes[1].set_xticks(bins)

plt.show()

What happens if the data has NaNs?

In the example below, there are five bins and only 10 data points, one of which is a NaN.

[Figure: with nan output in plot]

The output of np.digitize(c, bins) is [6 5 2 2 1 3 5 5 4 1]. Note the number 6: this is the (N+1)th bin, to which the NaN was digitised. Because one of the colorbar values falls outwith the stated boundaries, the colorbar shows unexpected colours and a mismatch occurs between it and the histogram.
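One possible way to avoid the stray index, offered as a sketch rather than as the definitive remedy, is to drop the NaNs before digitising. The data values below are made up; they were chosen only so that the raw indices reproduce the output quoted above, and the edges assume five equal-width bins over (0, 1).

```python
import numpy as np

bins = np.linspace(0, 1, 6)  # 6 edges for 5 equal-width bins (assumed range)
# Hypothetical data: one NaN plus nine values chosen to match the quoted indices.
c = np.array([np.nan, 0.9, 0.25, 0.3, 0.1, 0.5, 0.85, 0.95, 0.7, 0.15])

raw = np.digitize(c, bins)
print(raw)  # -> [6 5 2 2 1 3 5 5 4 1]

# Keep only the finite entries so that every index stays within 1..N.
valid = ~np.isnan(c)
clean = np.digitize(c[valid], bins)
print(clean)  # -> [5 2 2 1 3 5 5 4 1]
```

The same valid mask would also need to be applied to x and y before plotting, so that the coordinates and the colour indices keep the same length.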