Explore Building Data Genome 2 Data with Streamlit, Part I: Metadata

5 min read · Feb 4, 2021

In this series I will demonstrate how to use Streamlit to create a set of data apps for exploring the Building Data Genome 2 (BDG2) data-set. The series is divided into three parts. This first part builds a Streamlit app to explore the metadata and weather data from BDG2. The later parts will demonstrate how to use Streamlit to create interactive machine learning models for unsupervised and supervised learning on BDG2. You can find the source code for this series on GitHub.

Calculating Heating Degree Days/Cooling Degree Days for selected BDG2 sites, made possible (and easy) by Streamlit! Link to app: https://share.streamlit.io/zixiaoshawnshi/buildinggenomeexplorer/src/metadata_explore.py

A little background on Streamlit and the data-set:

Streamlit is an awesome new tool in the Python ecosystem. It lets users easily create data apps for exploration, analysis, and building ML models. What’s even more awesome is that you can ask Streamlit to host your apps for free! No more dealing with app hosting yourself. It is perfect for quick prototyping or small projects. To install Streamlit on your local computer and try it out yourself, simply follow the tutorial in the Streamlit documentation.

The Building Data Genome 2 (BDG2) is a public data-set containing more than 3,000 energy meters from 1,636 buildings. Each energy meter contains hourly readings spanning two years. This is one of the first public building energy data-sets at this scale and resolution.

Now that the introduction is over, we can start writing a Streamlit app! First, import the things we need:

import streamlit as st
import numpy as np
import pandas as pd
import plotly.express as px
import functions as f

In Streamlit, writing text is super easy with functions like these:

st.title("BDG2 Metadata Exploration")
st.header("Author: [Zixiao (Shawn) Shi](zixiao.shawn.shi@gmail.com)")
st.text("In this streamlit explorer we examine the metadata and weather data for the BDG2 dataset.")

Of course, if you prefer Markdown like myself, you can use st.markdown() or simply take advantage of Streamlit magic like this:

'''
[Link to original dataset](https://github.com/buds-lab/building-data-genome-project-2)

Some column names have been remapped for better readability.
'''

I think there are some font inconsistencies mixing st.text() and markdown.

Instead of being a typical block of Python comments, these contents will be rendered as Markdown on the webpage.

Next, read the data-set and perform some transformations:

@st.cache
def load_data():
    df = pd.read_csv("https://media.githubusercontent.com/media/buds-lab/building-data-genome-project-2/master/data/metadata/metadata.csv")
    weather_df = pd.read_csv("https://media.githubusercontent.com/media/buds-lab/building-data-genome-project-2/master/data/weather/weather.csv")

    column_dict = {
        'sqm':                  'Square Meters',
        'sqft':                 'Square Feet',
        'yearbuilt':            'Year Built',
        'eui':                  'EUI',
        'site_id':              'Site Name',
        'primaryspaceusage':    'Primary Space Type',
        'timezone':             'Time Zone',
        'lat':                  'latitude',
        'lng':                  'longitude'
    }
    df = df.rename(columns = column_dict)

    weather_column_dict = {
        'site_id':              'Site Name',
        'airTemperature':       'Air Temperature',
        'cloudCoverage':        'Cloud Coverage',
        'dewTemperature':       'Dew Point Temperature',
        'precipDepth1HR':       'Hourly Percipitation Depth',
        'precipDepth6HR':       '6-Hour Percipitation Depth',
        'seaLvlPressure':       'Air Pressure',
        'windDirection':        'Wind Direction',
        'windSpeed':            'Wind Speed',
    }
    weather_df = weather_df.rename(columns = weather_column_dict)
    return df, weather_df

df, weather_df = load_data()

Note that we applied the @st.cache decorator to the load_data() function, which makes Streamlit cache the result of reading these .csv files. This is pretty handy and prevents the app from loading the data repeatedly.
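Caching matters because Streamlit re-runs the script from top to bottom whenever a widget changes. As a rough analogy only, the idea can be sketched with the standard library's functools.lru_cache. This is not how Streamlit implements its cache, and load_rows is a made-up stand-in for an expensive read such as pd.read_csv:

```python
from functools import lru_cache

call_count = 0  # counts how often the "expensive" function body actually runs


@lru_cache(maxsize=None)
def load_rows(source):
    """Hypothetical stand-in for an expensive read such as pd.read_csv."""
    global call_count
    call_count += 1
    return tuple(f"{source}-row-{i}" for i in range(3))


first = load_rows("metadata.csv")   # runs the function body
second = load_rows("metadata.csv")  # same argument: served from the cache
```

After both calls the body has run only once, and the second call returns the very same object as the first. A cached loader behaves similarly across reruns of the app as long as its inputs stay the same.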

Next use the Streamlit multi-select widget to allow the user to choose which sites to display on the built-in Streamlit map:

'''
## Location of the buildings:'''
map_sites_to_plot = st.multiselect(
    "Choose which site to show on the map:",
    df["Site Name"].unique()
)
map_filter = df["Site Name"].isin(map_sites_to_plot)
st.map(df.loc[map_filter, ['latitude', 'longitude']].dropna())
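To see what the filter does outside of Streamlit, here is a small sketch on a toy DataFrame. The site names and coordinates are made up, and rows_for_map is a hypothetical helper that only mirrors the isin/dropna steps from the snippet above:

```python
import pandas as pd

# Toy stand-in for the BDG2 metadata; site names and coordinates are made up.
toy_df = pd.DataFrame({
    "Site Name": ["Alpha", "Alpha", "Beta", "Gamma"],
    "latitude": [40.0, 40.0, None, 51.5],
    "longitude": [-75.0, -75.0, 10.0, -0.1],
})


def rows_for_map(df, selected_sites):
    """Return coordinate rows for the selected sites, dropping incomplete ones."""
    mask = df["Site Name"].isin(selected_sites)  # boolean mask, one entry per row
    return df.loc[mask, ["latitude", "longitude"]].dropna()


nothing_selected = rows_for_map(toy_df, [])          # empty selection: no rows
two_sites = rows_for_map(toy_df, ["Alpha", "Beta"])  # Beta lacks a latitude
```

With an empty selection the mask is False everywhere, so no rows are passed on to the map. When "Alpha" and "Beta" are selected, only the two "Alpha" rows survive, because dropna() removes the "Beta" row with the missing latitude.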

Built-in Streamlit map; the selected BDG2 sites are shown as red dots. Granted, this is not a perfect geospatial visualization. For now it may be better to use other alternatives, which I will touch on in later parts.

Next, use the Streamlit selectbox widget to let users choose which building attribute to display in a histogram. The user can also decide how to group the buildings. I will be using Plotly for the histogram. Besides its built-in plotting tools, Streamlit currently also supports matplotlib and Bokeh.

'''
## Distribution of building attributes:
'''
hist_option_plot_column = st.selectbox(
    'Choose which building attribute to plot the histogram:',
    ["Square Meters", "Square Feet", "Year Built", "EUI"]
)
hist_option_group_column = st.selectbox(
    'Choose which building attribute to group the histogram:',
    ["No Grouping", "Site Name", "Primary Space Type", "Time Zone"]
)
if hist_option_group_column == "No Grouping":
    hist = px.histogram(df, x=hist_option_plot_column)
else:
    hist = px.histogram(df, x=hist_option_plot_column, color = hist_option_group_column)
st.plotly_chart(hist)

Making the Heating Degree Day (HDD) and Cooling Degree Day (CDD) calculator requires a bit more effort. The first step is to ask users to enter the desired base temperature for HDD and CDD. Fortunately, Streamlit has already thought of this:

hdd_base = st.number_input(
    "Enter the baseline temperature for heating degree days (HDD):",
    value = 10.0,
    step = 0.1
)
cdd_base = st.number_input(
    "Enter the baseline temperature for cooling degree days (CDD):",
    value = 10.0,
    step = 0.1
)

Now calculate HDD/CDD and plot them for each site:

dd_sites_to_plot = st.multiselect(
    "Choose which sites to plot HDD/CDD for:",
    weather_df["Site Name"].unique()
)
if len(dd_sites_to_plot) > 0:
    dd_filter = weather_df["Site Name"].isin(dd_sites_to_plot)
    # .copy() avoids pandas' SettingWithCopyWarning when columns are added below
    dd_df = weather_df.loc[dd_filter, ['timestamp', 'Site Name', 'Air Temperature']].copy()
    dd_df['HDD'] = f.calculate_hdd(dd_df, hdd_base, 'Air Temperature')
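    # Assumes functions.py also provides a calculate_cdd helper (not shown in this
    # article); calling calculate_hdd here would give heating, not cooling, degrees.
    dd_df['CDD'] = f.calculate_cdd(dd_df, cdd_base, 'Air Temperature')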
    dd_df['timestamp'] = pd.to_datetime(dd_df['timestamp'])
    dd_df = dd_df.set_index('timestamp')
    dd_results = (dd_df.groupby('Site Name').resample('m').sum()/24.0).reset_index()
    hdd_fig = px.bar(dd_results, x='timestamp', y='HDD', color='Site Name', barmode='group')
    st.plotly_chart(hdd_fig)
    cdd_fig = px.bar(dd_results, x='timestamp', y='CDD', color='Site Name', barmode='group')
    st.plotly_chart(cdd_fig)
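The helpers in functions.py are not shown in this article, so here is a minimal standard-library sketch of the kind of calculation they presumably perform. It assumes the common hourly definition: degrees below (HDD) or above (CDD) the base temperature, summed per month and divided by 24. The names hourly_hdd, hourly_cdd and monthly_degree_days are hypothetical, and the temperature readings are made up:

```python
from collections import defaultdict
from datetime import datetime, timedelta


def hourly_hdd(temps, base):
    """Degrees below the base temperature for each hourly reading (never negative)."""
    return [max(base - t, 0.0) for t in temps]


def hourly_cdd(temps, base):
    """Degrees above the base temperature for each hourly reading (never negative)."""
    return [max(t - base, 0.0) for t in temps]


def monthly_degree_days(timestamps, hourly_values):
    """Sum hourly values per (year, month), then divide by 24 to get degree days."""
    totals = defaultdict(float)
    for ts, value in zip(timestamps, hourly_values):
        totals[(ts.year, ts.month)] += value
    return {month: total / 24.0 for month, total in totals.items()}


# Two made-up days of hourly readings: one day at 4 degrees, one at 16 degrees.
start = datetime(2016, 1, 1)
timestamps = [start + timedelta(hours=h) for h in range(48)]
temps = [4.0] * 24 + [16.0] * 24

hdd = monthly_degree_days(timestamps, hourly_hdd(temps, base=10.0))
cdd = monthly_degree_days(timestamps, hourly_cdd(temps, base=10.0))
```

With a base of 10, the cold day contributes 6 heating degree days and the warm day contributes 6 cooling degree days, so both dictionaries map (2016, 1) to 6.0. The division by 24 plays the same role as the /24.0 in the resample step above.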

The finished calculator looks like this:

That’s it, this metadata and weather data exploration app for BDG2 was created without too much sweat! As you may have noticed, I haven’t included all the pieces in this article, so you can explore the app and source code on your own.

Furthermore, the layout of the app is not the best, but Streamlit actually offers pretty powerful layout tools and customization. We will take advantage of them in later parts of the series. Until then, cheers!


Written by Zixiao (Shawn) Shi
