Creating a Covid-19 Dashboard with bokeh, pandas, numpy, etc.

There is a plethora of Covid-19 dashboards in the depths of the internet. However, they often do not let you play around with the data and the parameters. So why not just build our own and customize it to our preferences? For interactive plots and visualization I love to work with bokeh. Without having to write JavaScript, you can create everything you need for a dashboard; bokeh generates JavaScript output which renders nicely in your browser. The final product looks like this, and the entire source code can be found at Github:

gif

An interactive version is hosted at Heroku. Before we dive into the bokeh stuff, we first take a look at the data. Afterwards, I also briefly explain how you can use the Rest-Api and host a Python app for free at Heroku.

The Data

The root of all visualization is always the data. Thankfully, the Johns Hopkins University (JHU) offers raw data in CSV format at their Github account. So what is the first thing you do if you have CSV files? You fire up a jupyter notebook (personally I use jupyter lab) and explore the data with pandas and some numpy.

So let’s get started and import these two wonderful libraries:

import pandas as pd
import numpy as np

Then we use the raw URLs from Github.

url_confirmed = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv'
url_death = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv'
url_recovered = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_recovered_global.csv'

These files are updated daily, so we get the latest version. In pandas you can directly paste the URL into the read_csv function and load the remote file into a dataframe.

df = pd.read_csv(url_confirmed) 

With head() we can get the first rows of the dataframe and see how the data is structured.

df.head()

	Province/State	Country/Region	Lat	Long	...	10/14/20	10/15/20	10/16/20	10/17/20
0	NaN	Afghanistan	33.93911	67.709953	...	39994	40026	40073	40141
1	NaN	Albania	41.15330	20.168300	...	15955	16212	16501	16774
2	NaN	Algeria	28.03390	1.659600	...	53584	53777	53998	54203
3	NaN	Andorra	42.50630	1.521800	...	3190	3190	3377	3377
4	NaN	Angola	-11.20270	17.873900	...	6846	7096	7222	7462

5 rows × 274 columns

So we have columns with the “Province/State”, “Country/Region”, the GPS coordinates and then the number of absolute confirmed cases. Each day is one column and JHU will add one column every day.

To get names of the confirmed cases columns we get all column names with df.columns and ignore the first four columns.

case_columns=df.columns[4:]

Now, let's plot the data of a single country. First, we get the index of the wanted row and then select the values with the loc[index,[columns]] access. A little plot in the end and we have our first simple visualization.

german_index = df.loc[df['Country/Region']=='Germany'].index[0]
df.loc[german_index,case_columns].plot()

png

So we see that the data is cumulative and always increasing. To get the new cases for every day we can just use diff(), which subtracts the value of the preceding day from each value.

df.loc[german_index,case_columns].diff().plot()

png

Looks a bit noisy. If we zoomed in, we would see a pattern that repeats every seven days: the well-known weekly seasonality. We can smooth the plot with a rolling window (with a window size of seven) and averaging (we use numpy mean here).

df.loc[german_index,case_columns].diff().rolling(window=7, axis=0).apply(np.mean).plot()

png

This looks much better now. Later in the dashboard, we make the window size and the average function interactive parameters.
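
As a minimal sketch of the two steps above (differencing, then a rolling average), the following uses a short made-up series instead of the JHU data. The function name smoothed_daily and its parameters are my own illustration, not part of the dashboard code; it only shows how window size and average function can become parameters.

```python
import numpy as np
import pandas as pd

# Made-up cumulative case counts for 14 consecutive days (not real data).
cumulative = pd.Series(
    [0, 2, 5, 9, 14, 15, 16, 20, 25, 31, 38, 40, 41, 47],
    index=pd.date_range("2020-03-01", periods=14),
    dtype="float64",
)


def smoothed_daily(series: pd.Series, window: int = 7, func=np.mean) -> pd.Series:
    """Daily new cases, smoothed with a trailing rolling window."""
    daily = series.diff()  # first entry is NaN: there is no previous day
    # raw=True hands each window to func as a plain numpy array
    return daily.rolling(window=window).apply(func, raw=True)


mean_7 = smoothed_daily(cumulative)                  # 7-day rolling mean
median_3 = smoothed_daily(cumulative, 3, np.median)  # 3-day rolling median
print(mean_7.dropna().round(2).tolist())
```

The first seven entries of mean_7 are NaN, because a window only produces a value once it contains seven valid daily differences.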

Now that we know how the numbers can be plotted, we want to inspect the missing data. If we drop all rows containing a NaN and get the remaining unique countries, we see the following:

df.dropna()["Country/Region"].unique()

array(['Australia', 'Canada', 'China', 'Denmark', 'France', 'Netherlands',
       'United Kingdom'], dtype=object)

As all the NaN values are located in the Province/State column, we can also select the rows where this column is not NaN and see the same result.

df[~pd.isna(df['Province/State'])]['Country/Region'].unique()

array(['Australia', 'Canada', 'China', 'Denmark', 'France', 'Netherlands',
       'United Kingdom'], dtype=object)

A further inspection of this column reveals the following:

df['Province/State'].unique()

array([nan, 'Australian Capital Territory', 'New South Wales',
       'Northern Territory', 'Queensland', 'South Australia', 'Tasmania',
       'Victoria', 'Western Australia', 'Alberta', 'British Columbia',
       'Diamond Princess', 'Grand Princess', 'Manitoba', 'New Brunswick',
       'Newfoundland and Labrador', 'Northwest Territories',
       'Nova Scotia', 'Ontario', 'Prince Edward Island', 'Quebec',
       'Saskatchewan', 'Yukon', 'Anhui', 'Beijing', 'Chongqing', 'Fujian',
       'Gansu', 'Guangdong', 'Guangxi', 'Guizhou', 'Hainan', 'Hebei',
       'Heilongjiang', 'Henan', 'Hong Kong', 'Hubei', 'Hunan',
       'Inner Mongolia', 'Jiangsu', 'Jiangxi', 'Jilin', 'Liaoning',
       'Macau', 'Ningxia', 'Qinghai', 'Shaanxi', 'Shandong', 'Shanghai',
       'Shanxi', 'Sichuan', 'Tianjin', 'Tibet', 'Xinjiang', 'Yunnan',
       'Zhejiang', 'Faroe Islands', 'Greenland', 'French Guiana',
       'French Polynesia', 'Guadeloupe', 'Martinique', 'Mayotte',
       'New Caledonia', 'Reunion', 'Saint Barthelemy',
       'Saint Pierre and Miquelon', 'St Martin', 'Aruba',
       'Bonaire, Sint Eustatius and Saba', 'Curacao', 'Sint Maarten',
       'Anguilla', 'Bermuda', 'British Virgin Islands', 'Cayman Islands',
       'Channel Islands', 'Falkland Islands (Malvinas)', 'Gibraltar',
       'Isle of Man', 'Montserrat', 'Turks and Caicos Islands'],
      dtype=object)

A lot of islands and provinces of China:

df[df['Country/Region']=='China']

	Province/State	Country/Region	Lat	Long	...	10/14/20	10/15/20	10/16/20	10/17/20
56	Anhui	China	31.8257	117.2264	...	991	991	991	991
57	Beijing	China	40.1824	116.4142	...	937	937	937	937
58	Chongqing	China	30.0572	107.8740	...	585	586	586	586
59	Fujian	China	26.0789	117.9874	...	416	417	417	417
60	Gansu	China	35.7518	104.2861	...	170	170	170	170
61	Guangdong	China	23.3417	113.4244	...	1873	1875	1877	1881
62	Guangxi	China	23.8298	108.7881	...	260	260	260	260
63	Guizhou	China	26.8154	106.8748	...	147	147	147	147
64	Hainan	China	19.1959	109.7453	...	171	171	171	171
65	Hebei	China	39.5490	116.1306	...	368	368	368	368
66	Heilongjiang	China	47.8620	127.7615	...	948	948	948	948
67	Henan	China	37.8957	114.9042	...	1281	1281	1281	1281
68	Hong Kong	China	22.3000	114.2000	...	5201	5213	5220	5237
69	Hubei	China	30.9756	112.2707	...	68139	68139	68139	68139
70	Hunan	China	27.6104	111.7088	...	1019	1019	1019	1019
71	Inner Mongolia	China	44.0935	113.9448	...	270	275	275	275
72	Jiangsu	China	32.9711	119.4550	...	667	669	669	669
73	Jiangxi	China	27.6140	115.7221	...	935	935	935	935
74	Jilin	China	43.6661	126.1923	...	157	157	157	157
75	Liaoning	China	41.2956	122.6085	...	280	280	280	280
76	Macau	China	22.1667	113.5500	...	46	46	46	46
77	Ningxia	China	37.2692	106.1655	...	75	75	75	75
78	Qinghai	China	35.7452	95.9956	...	18	18	18	18
79	Shaanxi	China	35.1917	108.8701	...	433	433	434	436
80	Shandong	China	36.3427	118.1498	...	845	845	845	845
81	Shanghai	China	31.2020	121.4491	...	1064	1075	1080	1085
82	Shanxi	China	37.5777	112.2922	...	208	208	208	208
83	Sichuan	China	30.6171	102.7103	...	723	723	724	725
84	Tianjin	China	39.3054	117.3230	...	245	247	251	252
85	Tibet	China	31.6927	88.0924	...	1	1	1	1
86	Xinjiang	China	41.1129	85.2401	...	902	902	902	902
87	Yunnan	China	24.9740	101.4870	...	211	211	211	211
88	Zhejiang	China	29.1832	120.0934	...	1283	1283	1283	1283

33 rows × 274 columns

As the listing of states and provinces is very arbitrary (maybe in the beginning of the pandemic a more detailed view on China was useful), I decided on a compromise. I won’t display any regions, just one number per country. For displaying this on the world map, I decided to use the coordinates of the province/state with the most cases. For example, for China this will be Hubei; for France it will be mainland France, as all islands and overseas territories have fewer cases. Therefore, we group by ‘Country/Region’ and then select the row where the maximum was recorded on the last day (case_columns[-1]):

idx = df.groupby('Country/Region')[case_columns[-1]].transform('max') == df[case_columns[-1]]

This gives a boolean mask that is True for the row with the most cases in each country (a tie would select more than one row). We use it to generate a dataframe with the first four columns (province, country name, GPS coordinates):

coord_df = df.loc[idx,df.columns[0:4]]

To validate our operation we take a look at China and see, as expected, that Hubei was chosen as the epicentre of the pandemic in China:

coord_df[coord_df['Country/Region']=='China']

	Province/State	Country/Region	Lat	Long
69	Hubei	China	30.9756	112.2707
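
The selection can be tried out on a toy frame. The sketch below uses made-up numbers in the JHU layout; the variable names are mine. It shows the mask approach from the text next to an idxmax variant, which is an alternative I am adding here because it returns exactly one row per country even if two provinces tie, and it also previews the per-country sum.

```python
import pandas as pd

# Toy frame with made-up numbers, mimicking the JHU column layout.
toy = pd.DataFrame({
    "Province/State": ["Hubei", "Beijing", None, "Reunion", None],
    "Country/Region": ["China", "China", "France", "France", "Germany"],
    "Lat": [30.98, 40.18, 46.23, -21.12, 51.17],
    "Long": [112.27, 116.41, 2.21, 55.54, 10.45],
    "10/17/20": [68139, 937, 900000, 5000, 360000],
})
last_day = "10/17/20"

# Variant 1: boolean mask as in the text (keeps several rows on a tie).
mask = toy.groupby("Country/Region")[last_day].transform("max") == toy[last_day]
coords_mask = toy.loc[mask, toy.columns[:4]]

# Variant 2: idxmax returns exactly one row label per country.
row_labels = toy.groupby("Country/Region")[last_day].idxmax()
coords_idxmax = toy.loc[row_labels, toy.columns[:4]]

# Country totals: sum over all provinces/states of a country.
totals = toy.groupby("Country/Region")[[last_day]].sum()
print(coords_mask)
print(totals)
```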

For the case number, we use the sum over all provinces/states, again using groupby:

df = df.groupby('Country/Region')[case_columns].agg('sum')
df

	1/22/20	1/23/20	1/24/20	1/25/20	...	10/14/20	10/15/20	10/16/20	10/17/20
Country/Region
Afghanistan	0	0	0	0	...	39994	40026	40073	40141
Albania	0	0	0	0	...	15955	16212	16501	16774
Algeria	0	0	0	0	...	53584	53777	53998	54203
Andorra	0	0	0	0	...	3190	3190	3377	3377
Angola	0	0	0	0	...	6846	7096	7222	7462
...	...	...	...	...	...	...	...	...	...
West Bank and Gaza	0	0	0	0	...	45658	46100	46434	46746
Western Sahara	0	0	0	0	...	10	10	10	10
Yemen	0	0	0	0	...	2053	2053	2055	2055
Zambia	0	0	0	0	...	15616	15659	15659	15789
Zimbabwe	0	0	0	0	...	8055	8075	8099	8110

189 rows × 270 columns

To calculate the number of cases relative to a country's population, we need the population numbers. The UN provides some CSV data for download. Again, to transform the data to a format we can use, we have to do some processing. The procedure can be found in a separate page/notebook. To prevent “Unnamed” columns we use index_col=0. This results in a data frame with the country name as index.

df_population = pd.read_csv('data/population.csv',index_col=0)
df_population

	Country/Region	Population
Italy	Italy	60421760
Portugal	Portugal	10283822
World	World	7594270356
Rwanda	Rwanda	12301939
Bulgaria	Bulgaria	7025037
...	...	...
Diamond Princess	Diamond Princess	3600
Holy See	Holy See	825
Taiwan*	Taiwan*	23780000
Western Sahara	Western Sahara	595060
MS Zaandam	MS Zaandam	1829

268 rows × 2 columns

As the countries are the same now we can merge on the index:

df_w_pop = df.merge(df_population, left_index=True,right_index=True)
df_w_pop

	1/22/20	1/23/20	1/24/20	1/25/20	...	10/16/20	10/17/20	Country/Region	Population
Afghanistan	0	0	0	0	...	40073	40141	Afghanistan	37172386
Albania	0	0	0	0	...	16501	16774	Albania	2866376
Algeria	0	0	0	0	...	53998	54203	Algeria	42228429
Andorra	0	0	0	0	...	3377	3377	Andorra	77006
Angola	0	0	0	0	...	7222	7462	Angola	30809762
...	...	...	...	...	...	...	...	...	...
West Bank and Gaza	0	0	0	0	...	46434	46746	West Bank and Gaza	4569087
Western Sahara	0	0	0	0	...	10	10	Western Sahara	595060
Yemen	0	0	0	0	...	2055	2055	Yemen	28498687
Zambia	0	0	0	0	...	15659	15789	Zambia	17351822
Zimbabwe	0	0	0	0	...	8099	8110	Zimbabwe	14439018

189 rows × 272 columns

Validate with Germany:

pop_germany = df_w_pop.loc['Germany',['Population']]
pop_germany

Population    82905782
Name: Germany, dtype: object

Roughly 83 million sounds right. With this number we can now plot the cases divided by population (see the changed y axis):

ax = df_w_pop.loc['Germany',case_columns].apply(lambda x: x/pop_germany).plot()
ax.legend(["Confirmed cases Germany per capita"])

png

And the cases per one million inhabitants:

ax = df_w_pop.loc['Germany',case_columns].apply(lambda x: x/(pop_germany/1e6)).plot()
ax.legend(["Confirmed cases Germany per million"])

png
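
The same per-million calculation can also be done for all countries at once. The following is a small sketch with invented countries and numbers; the names cases, population, and per_million are mine. Note that merge uses an inner join by default, so a country that is missing from the population table silently disappears from the result.

```python
import pandas as pd

# Invented example data: two countries, two days of cumulative cases.
cases = pd.DataFrame(
    {"10/16/20": [100, 50], "10/17/20": [120, 80]},
    index=pd.Index(["Atlantis", "Utopia"], name="Country/Region"),
)
population = pd.DataFrame(
    {"Population": [2_000_000, 500_000]},
    index=["Atlantis", "Utopia"],
)

case_cols = list(cases.columns)
merged = cases.merge(population, left_index=True, right_index=True)

# Divide every date column by the population of the same row
# (axis=0 aligns the divisor on the row index), then scale to one million.
per_million = merged[case_cols].div(merged["Population"], axis=0) * 1e6
print(per_million)
```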

This concludes the inspection and preprocessing of the raw data, and we can jump into the dashboard creation.

Doing the Bokeh

Bokeh enables you to create interactive visualizations in your browser. You can create plots, tables, and other widgets to control the appearance of the plots. In the case of our dashboard, we use two plots on the left (a line plot for the cases and a world map with circles). Further, we use some tables, buttons, sliders, multi-select lists etc.

These elements are arranged as follows:

png

In Python code, we use the layout function, which takes a list of further layout elements. Specifically, we use column and row elements. The first row consists of three columns, where “column 0” contains the two plots. “Column 1” consists of two tables with HTML headings, and the last column holds all the buttons and widgets for control. The bottom row’s only element is a footer text line:

self.layout = layout([
            row(column(tab_plot, world_map),
                column(top_top_14_new_header, top_top_14_new, top_top_14_cum_header, top_top_14_cum),
                column(refresh_button, radio_button_group_df, radio_button_group_per_capita, plots_button_group,
                       radio_button_group_scale, slider, radio_button_average,
                       multi_select),
                ),
            row(footer)
            ])

This footer for example is a simple Div element with HTML inside:

footer = Div(
            text="""Covid-19 Dashboard created by Andreas Weichslgartner in April 2020 with python, bokeh, pandas, 
            numpy, pyproj, and colorcet. Source Code can be found at 
            <a href="https://github.com/weichslgartner/covid_dashboard/">Github</a>.""",
            width=1600, height=10, align='center')

The buttons and selection widgets on the right are also quite easy to implement. Just give a list of labels and a parameter which signals which button is active. Then add a callback function which is triggered when a button is clicked.

        radio_button_group_per_capita = RadioButtonGroup(
            labels=["Total Cases", "Cases per Million"], active=0 if not self.active_per_capita else 1)
        radio_button_group_per_capita.on_click(self.update_capita)

The on_click callback function has one argument: the new value of the active button. In the case of the per_capita button, it is 0 for total numbers and 1 if the per_capita option is activated. As you can see in the function, the current active status is kept in class member variables. At first, I used global variables (which is fine for small bokeh plots), but this gets rather ugly with more state. So, I decided to create a class Dashboard and encapsulate all the state variables as class members. Coming back to the callback function: once we have saved the state, we update the table data (self.generate_table_new() and self.generate_table_cumulative()). Finally, we update the line plot in the upper left with the self.update_data function.

    def update_capita(self, new):
        """
        callback to change between total and per capita numbers
        :param new: 0 if total, 1 if per capita
        :return:
        """
        if new == 0:
            self.active_per_capita = False  # 'total'
        else:
            self.active_per_capita = True  # 'per_capita'
        self.generate_table_new()
        self.generate_table_cumulative()
        self.update_data('', self.country_list, self.country_list)

The second kind of callbacks are onchange functions, like in the case of the multiselect widget:

        multi_select = MultiSelect(title="Option (Multiselect Ctrl+Click):", value=self.country_list,
                                   options=countries, height=500)
        multi_select.on_change('value', self.update_data)

Here, the value is a list with the active countries, and the callback function takes three arguments: the name of the changed attribute (value), the old value of the list, and the new one:

    def update_data(self, attr, old, new):
        """
        repaints the plots with an updated country list
        :param attr:
        :param old:
        :param new:
        :return:
        """
        _ = (attr, old)
        self.country_list = new
        self.source.data = self.get_dict_from_df(self.active_df, self.country_list, self.active_prefix.name)
        self.layout.set_select(dict(name=TAB_PANE), dict(tabs=self.generate_plot(self.source).tabs))

However, we only use the new value (the currently active countries) and discard the other two parameters. Afterwards, we update the data source of the plot and redraw the complete plot. The data source is defined as ColumnDataSource(data=new_dict). In this dict, each key is one plot line type and the value of each key holds the data points. For example, germany_confirmed_daily_raw represents the confirmed cases on a daily basis without averaging. Based on the active member variables, these key/value entries are generated by the self.get_dict_from_df function.

Normally, updating the data source would be enough to update the plot, but we might also change the axis, scaling, etc., which is why we just replace the old plot with a new one. To do this we use layout.set_select, which searches for the element with the given name “TAB_PANE” and then swaps in the tabs of a newly generated tabs element. The tabs element in our case is the line plot in the upper left with the two tabs daily and cumulative. You could also select the layout element by iterating through some children layout elements, e.g. layout.children[0].children[0]. I did this in the beginning, but it is rather ugly, and once you change your layout you have to manually update the path to the correct child. Thankfully I discovered set_select and can now search by a unique name of the element.

To fill the dictionaries of the data source we use the following function:

    def get_dict_from_df(self, df: pd.DataFrame, country_list: List[str], prefix: str):
        """
        returns the needed data in a dict
        :param df: dataframe to fetch the data
        :param country_list: list of countries for which the data should be fetched
        :param prefix: which data should be fetched, confirmed, deaths or recovered (refers to the dataframe)
        :return: dict with for keys
        """
        new_dict = {}
        for country in country_list:
            absolute_raw, absolute_rolling, absolute_trend, delta_raw, delta_rolling, delta_trend = \
                self.get_lines(df, country, self.active_window_size)
            country = replace_special_chars(country)
            new_dict[f"{country}_{prefix}_{TOTAL_SUFF}_{PlotType.raw.name}"] = absolute_raw
            new_dict[f"{country}_{prefix}_{TOTAL_SUFF}_{PlotType.average.name}"] = absolute_rolling
            new_dict[f"{country}_{prefix}_{TOTAL_SUFF}_{PlotType.trend.name}"] = absolute_trend
            new_dict[f"{country}_{prefix}_{DELTA_SUFF}_{PlotType.raw.name}"] = delta_raw
            new_dict[f"{country}_{prefix}_{DELTA_SUFF}_{PlotType.average.name}"] = delta_rolling
            new_dict[f"{country}_{prefix}_{DELTA_SUFF}_{PlotType.trend.name}"] = delta_trend
            new_dict['x'] = x_date  # list(range(0, len(delta_raw)))
        return new_dict

We iterate over all selected countries and get the needed data from the global pandas dataframes. In the key, we encode the country, the kind of data (confirmed, deaths, recovered), cumulative or daily, and the kind of data processing (raw, rolling average, and trend). We replace special characters and whitespace in the country name with digits. This is a bit of a hack: the tooltip function only works with alphanumeric characters, and there are no countries with digits in their name. The x_date is also a global list with the dates for the x-axis, generated as follows:

x_date = [pd.to_datetime(case_columns[0]) + timedelta(days=x) for x in range(0, len(case_columns))]
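
The key scheme described above can be sketched as a round trip. The article does not show replace_special_chars and revert_special_chars_replacement, so the versions below, including the character-to-digit mapping and the suffix strings "total" and "daily", are hypothetical stand-ins that only illustrate the idea.

```python
# Hypothetical mapping: the original helpers are not shown in the article.
_REPLACEMENTS = {" ": "0", "*": "1", "'": "2", "-": "3", ",": "4", "(": "5", ")": "6"}
_REVERT = {digit: char for char, digit in _REPLACEMENTS.items()}


def replace_special_chars(country: str) -> str:
    """Swap every special character for a digit so the key stays alphanumeric."""
    return "".join(_REPLACEMENTS.get(ch, ch) for ch in country)


def revert_special_chars_replacement(encoded: str) -> str:
    """Undo the replacement; safe because no country name contains digits."""
    return "".join(_REVERT.get(ch, ch) for ch in encoded)


def make_key(country: str, prefix: str, suffix: str, plot_type: str) -> str:
    return f"{replace_special_chars(country)}_{prefix}_{suffix}_{plot_type}"


def split_key(key: str):
    # Works because the encoded country never contains an underscore.
    tokens = key.split("_")
    return revert_special_chars_replacement(tokens[0]), tokens[1], tokens[2], tokens[-1]


key = make_key("Korea, South", "confirmed", "total", "average")
print(key)             # Korea40South_confirmed_total_average
print(split_key(key))  # ('Korea, South', 'confirmed', 'total', 'average')
```

Because the first and the last token of the key are enough to rebuild the legend label, the plotting code only needs split('_') to decode it.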

The return value of get_dict_from_df is the base of the central plotting function generate_plot. It decodes back the information from the dict keys, generates the correct y-axis, and plots the lines specified by the current class state.

    def generate_plot(self, source: ColumnDataSource):
        """
        do the plotting based on interactive elements
        :param source: data source with the selected countries and the selected kind of data (confirmed, deaths, or
        recovered)
        :return: the plot layout in a tab
        """
        # global active_y_axis_type, active_tab
        keys = source.data.keys()
        if len(keys) == 0:
            return self.get_tab_pane()
        infected_numbers_new = []
        infected_numbers_absolute = []

        for k in keys:
            if f"{DELTA_SUFF}_{PlotType.raw.name}" in k:
                infected_numbers_new.append(max(source.data[k]))
            elif f"{TOTAL_SUFF}_{PlotType.raw.name}" in k:
                infected_numbers_absolute.append(max(source.data[k]))
        y_range = self.calculate_y_axis_range(infected_numbers_new)
        p_new = figure(title=f"{self.active_prefix.name} (new)", plot_height=400, plot_width=WIDTH, y_range=y_range,
                       background_fill_color=BACKGROUND_COLOR, y_axis_type=self.active_y_axis_type.name)
        y_range = self.calculate_y_axis_range(infected_numbers_absolute)
        p_absolute = figure(title=f"{self.active_prefix.name} (absolute)", plot_height=400, plot_width=WIDTH,
                            y_range=y_range,
                            background_fill_color=BACKGROUND_COLOR, y_axis_type=self.active_y_axis_type.name)

        selected_keys_absolute = []
        selected_keys_new = []
        for vals in source.data.keys():
            if vals == 'x':
                selected_keys_absolute.append(vals)
                selected_keys_new.append(vals)
                continue
            tokenz = vals.split('_')
            name = f"{revert_special_chars_replacement(tokenz[0])} ({tokenz[-1]})"
            color = color_dict[tokenz[0]]
            plt_type = PlotType[tokenz[-1]]
            if (plt_type == PlotType.raw and self.active_plot_raw) or \
                    (plt_type == PlotType.average and self.active_plot_average) or \
                    (plt_type == PlotType.trend and self.active_plot_trend):
                if TOTAL_SUFF in vals:
                    selected_keys_absolute.append(vals)
                    p_absolute.line('x', vals, source=source, line_dash=line_dict[plt_type].line_dash, color=color,
                                    alpha=line_dict[plt_type].alpha, line_width=line_dict[plt_type].line_width,
                                    line_cap='butt', legend_label=name)
                else:
                    selected_keys_new.append(vals)
                    p_new.line('x', vals, source=source, line_dash=line_dict[plt_type].line_dash, color=color,
                               alpha=line_dict[plt_type].alpha, line_width=line_dict[plt_type].line_width,
                               line_cap='round', legend_label=name)
        self.add_figure_attributes(p_absolute, selected_keys_absolute)
        self.add_figure_attributes(p_new, selected_keys_new)

        tab1 = Panel(child=p_new, title=f"{self.active_prefix.name} (daily)")
        tab2 = Panel(child=p_absolute, title=f"{self.active_prefix.name} (cumulative)")
        tabs = Tabs(tabs=[tab1, tab2], name=TAB_PANE)
        if self.layout is not None:
            tabs.active = self.get_tab_pane().active
        return tabs

For the line colors we use a global dict with a country/color relationship. The color scheme is taken from the package colorcet (I use a dark scheme for best contrast against a grey background).

import colorcet as cc
color_dict = dict(zip(unique_countries_wo_special_chars,
                      cc.b_glasbey_bw_minc_20_maxl_70[:len(unique_countries_wo_special_chars)]
                      )
                  )

Also the hover tooltip is generated from data source dict keys:

    def generate_tool_tips(selected_keys) -> HoverTool:
        """
        string magic for the tool tips
        :param selected_keys:
        :return:
        """

        tooltips = [(f"{revert_special_chars_replacement(x.split('_')[0])} ({x.split('_')[-1]})",
                     f"@{x}{{(0,0)}}") if x != 'x' else ('Date', '$x{%F}') for x in selected_keys]
        hover = HoverTool(tooltips=tooltips,
                          formatters={'$x': 'datetime'}
                          )
        return hover
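
To make the string magic easier to follow, here is the same list built step by step in plain Python, without bokeh. The function name tooltip_pairs is my own, and for brevity it skips the special-character revert. In an f-string, a doubled brace {{ produces a literal brace, so the second element becomes a column reference followed by a number format.

```python
def tooltip_pairs(selected_keys):
    """Build (label, value-template) pairs from data source keys."""
    pairs = []
    for key in selected_keys:
        if key == "x":
            # The x column holds dates, formatted as YYYY-MM-DD via %F.
            pairs.append(("Date", "$x{%F}"))
            continue
        tokens = key.split("_")
        label = f"{tokens[0]} ({tokens[-1]})"
        # "@key" references the column, "{(0,0)}" is the number format.
        pairs.append((label, f"@{key}{{(0,0)}}"))
    return pairs


pairs = tooltip_pairs(["x", "Germany_confirmed_daily_raw"])
print(pairs)
# [('Date', '$x{%F}'), ('Germany (raw)', '@Germany_confirmed_daily_raw{(0,0)}')]
```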

The second plot, the world map, is generated with the following function:

    def create_world_map(self):
        """
        draws the fancy world map and do some projection magic
        :return:
        """
        tile_provider = get_provider(Vendors.CARTODBPOSITRON_RETINA)

        tool_tips = [
            ("(x,y)", "($x, $y)"),
            ("country", "@country"),
            ("number", "@num{(0,0)}")

        ]
        world_map = figure(width=WIDTH, height=400, x_range=(-BOUND, BOUND), y_range=(-10_000_000, 12_000_000),
                           x_axis_type="mercator", y_axis_type="mercator", tooltips=tool_tips)
        # world_map.axis.visible = False
        world_map.add_tile(tile_provider)
        self.world_circle_source = ColumnDataSource(
            dict(x=x_coord, y=y_coord, num=self.active_df['total'],
                 sizes=self.active_df['total'].apply(lambda d: ceil(log(d) * 4) if d > 1 else 1),
                 country=self.active_df[ColumnNames.country.value]))
        world_map.circle(x='x', y='y', size='sizes', source=self.world_circle_source, fill_color="red", fill_alpha=0.8)
        return world_map
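
The sizes column in the function above uses a logarithmic scale so that countries with millions of cases do not cover the whole map. Isolated as a small helper (the name circle_size is mine), the mapping looks like this:

```python
from math import ceil, log


def circle_size(cases: float) -> int:
    """Circle size in screen units: 4 * ln(cases), rounded up, at least 1."""
    return ceil(log(cases) * 4) if cases > 1 else 1


sizes = {n: circle_size(n) for n in (0, 1, 10, 1_000, 1_000_000)}
print(sizes)  # {0: 1, 1: 1, 10: 10, 1000: 28, 1000000: 56}
```

A factor of 1000 in case numbers adds roughly 28 units to the size, so the circles stay comparable across several orders of magnitude.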

The tile provider determines the style of the map (an overview of the styles can be found here). Again, the data is stored in a ColumnDataSource and a dict. The circle sizes are a logarithmic representation of the numbers in the selected global dataframe. One thing we have to take care of is the projection of the coordinates. The Johns Hopkins data gives the coordinates in the World Geodetic System notation (a 3D representation with longitude and latitude, as used in GPS). In contrast, for the 2D plot we need a projection of the 3D coordinates onto a 2D plane. The most used projection is the Mercator projection. We can do the transformation with the pyproj package.

from pyproj import Transformer
# Transform lat and long (World Geodetic System, GPS, EPSG:4326) to x and y  (Pseudo-Mercator, "epsg:3857")
transformer = Transformer.from_crs("epsg:4326", "epsg:3857")
x_coord, y_coord = transformer.transform(df_coord[ColumnNames.lat.value].values,
                                         df_coord[ColumnNames.long.value].values)
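
To see what the transformer computes, here is a hand-written sketch of the spherical Web Mercator formulas behind EPSG:3857. It is meant for illustration only, not as a replacement for pyproj; the function name to_web_mercator is mine.

```python
from math import log, pi, radians, tan

EARTH_RADIUS_M = 6_378_137.0  # WGS 84 semi-major axis, used as sphere radius


def to_web_mercator(lat_deg: float, lon_deg: float):
    """Project latitude/longitude in degrees to x/y in metres (EPSG:3857)."""
    x = EARTH_RADIUS_M * radians(lon_deg)
    y = EARTH_RADIUS_M * log(tan(pi / 4 + radians(lat_deg) / 2))
    return x, y


# Hubei coordinates from the table above.
x_hubei, y_hubei = to_web_mercator(30.9756, 112.2707)
print(round(x_hubei), round(y_hubei))
```

The x value grows linearly with the longitude, while y is stretched more and more towards the poles, which is why the projection is usually cut off at about 85 degrees of latitude.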

Adding a Rest-Api

We finished the layout of our dashboard, but each time we start it will have the same countries and plot types selected. If we want to share a specific plot a Rest-Api would be neat. For example, something like the following request:

https://covid-19-bokeh-dashboard.herokuapp.com/dashboard?country=Germany&country=Finland&per_capita=True&plot_raw=False

After the URL, we append a ? and then the key/value pairs. Key and value are joined with =, and the pairs are separated by &. Note that we can have multiple values with the same key (important for the country selection).
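
The structure of such a query string can be inspected with the Python standard library alone. This sketch is independent of the dashboard code and only shows how repeated keys end up as a list:

```python
from urllib.parse import parse_qs, urlsplit

url = ("https://covid-19-bokeh-dashboard.herokuapp.com/dashboard"
       "?country=Germany&country=Finland&per_capita=True&plot_raw=False")

# parse_qs collects all values of a key in a list, in order of appearance.
params = parse_qs(urlsplit(url).query)
print(params)
# {'country': ['Germany', 'Finland'], 'per_capita': ['True'], 'plot_raw': ['False']}
```

Every value is a list, even if the key appears only once. All values arrive as text, so booleans and integers still have to be parsed by hand.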

But how do we get these key/value pairs into our dashboard? Fortunately, bokeh runs on a Tornado webserver which has the needed functionality. We can access them as follows:

args = curdoc().session_context.request.arguments

The keys in this args dict are already strings, but the values are still byte strings, and we need to decode them with the to_basestring function from Tornado.

The overall parsing function looks like this:

def parse_arguments(arguments: dict):
    """
    parse get arguments of rest api
    :param arguments: as the dict given from tornado
    :return:
    """
    arguments = {k.lower(): v for k, v in arguments.items()}
    country_list = ['Germany']
    if 'country' in arguments:
        country_list = [countries_lower_dict[to_basestring(c).lower()] for c in arguments['country'] if
                        to_basestring(c).lower() in countries_lower_dict.keys()]
    if len(country_list) == 0:
        country_list = ['Germany']
    active_per_capita = parse_bool(arguments, 'per_capita', False)
    active_window_size = parse_int(arguments, 'window_size', 7)
    active_plot_raw = parse_bool(arguments, 'plot_raw')
    active_plot_average = parse_bool(arguments, 'plot_average')
    active_plot_trend = parse_bool(arguments, 'plot_trend')
    active_average = Average.median if 'average' in arguments and to_basestring(
        arguments['average'][0]).lower() == 'median' else Average.mean
    active_y_axis_type = Scale.log if 'scale' in arguments and to_basestring(
        arguments['scale'][0]).lower() == Scale.log.name else Scale.linear
    active_prefix = Prefix.confirmed
    if 'data' in arguments:
        val = to_basestring(arguments['data'][0]).lower()
        if val in Prefix.deaths.name:
            active_prefix = Prefix.deaths
        elif val in Prefix.recovered.name:
            active_prefix = Prefix.recovered
    return country_list, active_per_capita, active_window_size, active_plot_raw, active_plot_average, \
           active_plot_trend, active_average, active_y_axis_type, active_prefix
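
The helpers parse_bool and parse_int are not shown in this article. A possible shape for them, written here as a hypothetical sketch with the standard library only, could look as follows. In particular, the default of True for a missing flag and the accepted spellings are my assumptions, and bytes.decode stands in for Tornado's to_basestring.

```python
from typing import Dict, List


def _first_value(arguments: Dict[str, List[bytes]], key: str) -> str:
    """Decode the first value of a key and normalise it."""
    return arguments[key][0].decode("utf-8").strip().lower()


def parse_bool(arguments: Dict[str, List[bytes]], key: str, default: bool = True) -> bool:
    """Return the flag for key, or default if missing or unreadable."""
    if key not in arguments or not arguments[key]:
        return default
    value = _first_value(arguments, key)
    if value in ("true", "1", "yes"):
        return True
    if value in ("false", "0", "no"):
        return False
    return default


def parse_int(arguments: Dict[str, List[bytes]], key: str, default: int) -> int:
    """Return the integer for key, or default if missing or not a number."""
    if key not in arguments or not arguments[key]:
        return default
    try:
        return int(_first_value(arguments, key))
    except ValueError:
        return default


example = {"per_capita": [b"True"], "window_size": [b"14"], "plot_raw": [b"maybe"]}
print(parse_bool(example, "per_capita", False))  # True
print(parse_int(example, "window_size", 7))      # 14
print(parse_bool(example, "plot_raw"))           # True (unreadable, so default)
```

Falling back to a default instead of raising keeps a mistyped URL from breaking the whole dashboard.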

The results are then used to construct the dashboard:

country_list_, active_per_capita_, active_window_size_, active_plot_raw_, active_plot_average_, \
active_plot_trend_, active_average_, active_y_axis_type_, active_prefix_ = parse_arguments(args)

dash = Dashboard(country_list=country_list_,
                 active_per_capita=active_per_capita_,
                 active_window_size=active_window_size_,
                 active_plot_raw=active_plot_raw_,
                 active_plot_average=active_plot_average_,
                 active_plot_trend=active_plot_trend_,
                 active_y_axis_type=active_y_axis_type_,
                 active_prefix=active_prefix_)

The last thing we need to do is to call the layout function:

dash.do_layout()

Now we can start up the dashboard in our console:

bokeh serve dashboard.py

And see the result at http://localhost:5006/dashboard.

Hosting the Dashboard at Heroku

It is nice to check the dashboard on our local machine, but hosting the whole thing on the internet would be great. After some searching I found Heroku, where you can host Python apps for free. The easiest way is to connect your Github repo with Heroku, so that each push deploys the app. Further, you need a requirements.txt with your Python dependencies. You can generate this file with:

pip freeze > requirements.txt

I personally use Anaconda as a package manager, and at the first try my requirements.txt did not work out of the box with Heroku (they use virtualenv and pip). In the end, I created a new local conda environment and installed the requirements.txt through pip to test if everything works.

Further, you need to add a Procfile to your repository. In the dashboard case, it has the following content:

web: bokeh serve --port=$PORT --address=0.0.0.0 --allow-websocket-origin=covid-19-bokeh-dashboard.herokuapp.com --use-xheaders dashboard.py

One final thing to note: the Heroku app is shut down after some time of inactivity and has to restart at the next request. This can take some time, so you can access your app regularly to prevent this.

Conclusion

Bokeh is a really great library to create interactive plots with all the widgets you could imagine. And all this without getting your hands dirty with JavaScript. It is also very good for visualizing streaming data on the fly, e.g., for monitoring systems. It will stay my tool of choice for this kind of task. For other interactive plotting I also like the wonderful Altair package. You can find the complete source code of the dashboard, notebooks, and config files at my Github and the deployed version at Heroku.