How to Implement the Builder Pattern in Python for Automating SQL Dashboard Analytics
In this article, I discuss the design and implementation of an SQL Builder in Python. The main benefit of this pattern is saving plenty of development time in making domain-specific queries. In addition, the builder pattern allows flexibility in where and when information is added to the query.
Most SQL builder libraries are built to support SQL-like statements in different languages. However, builders that can truly automate queries for companies can only be developed in-house because public libraries will always be designed to only solve general problems.
Through this article (source code available below!), you will be able to create your own custom SQL builder with syntax such as the following example where we want to get the number of cars sold in the past 24 hours.
sql = (
AdvancedSQLBuilder().SELECT("NAME").SELECT("COUNT(*)")
.FROM("CARS").WHERE("NAME = 'BMW'")
.ORDER_BY("COUNT(*)").HOURS("SELL_DATE", 24)
.GROUP_BY("NAME")
.BUILD()
)
print(sql)
To output the following:
SELECT NAME, COUNT(*)
FROM CARS
WHERE NAME = 'BMW'
AND SELL_DATE >= DATEADD(HOUR, -24, GETDATE())
GROUP BY NAME
ORDER BY COUNT(*)
The above result uses MS SQL Server, but can be adapted for any SQL database.
Motivation
The naive way to instantiate an object of a class uses the following format:
class MyClass:
def __init__(self, item_a, item_b, item_c):
self.a = item_a
self.b = item_b
self.c = item_c
# Example
obj = MyClass(1, 2, 3)
All is well using this method of object instantiation until we do not need to supply all of the values of a, b, c during instantiation. An example of this is when creating SQL statements.
A similar method of designing an SQL statement object may be the following:
class SQLStatement:
def __init__(self, select_properties, from_tables,
where_conditions, group_by_clause, having_conditions):
self.select_properties = select_properties
self.from_tables = from_tables
self.where_conditions = where_conditions
self.group_by_clause = group_by_clause
self.having_conditions = having_conditions
def __repr__(self):
return f"""SELECT {self.select_properties}
FROM {self.from_tables}
WHERE {self.where_conditions}
GROUP BY {self.group_by_clause}
HAVING {self.having_conditions}
"""
# Example
sql = SQLStatement("NAME", "CARS", "COLOR = 'blue'", "","")
print(sql)
This outputs:
SELECT NAME
FROM CARS
WHERE COLOR = 'blue'
GROUP BY
HAVING
In the case above, we have an SQLStatement class that takes a required 5 arguments of select, from, where, group by, and having clauses. When we do not need to use the other arguments such “group by” and “having” as in the example, we need to put an empty argument. We can also adjust this class to use optional arguments, but that becomes messy very quickly.
The Builder Pattern
The builder pattern can be best described with a simple example:
class MyBuilder:
def __init__(self):
self.a = None
self.b = None
self.c = None
def a(self, arg):
self.a = arg
return self
def b(self, arg):
self.b = arg
return self
def c(self, arg):
self.c = arg
return self
def build(self):
list_of_values = [self.a, self.b, self.c]
list_of_values = [i for i in list_of_values if i is not None]
return list_of_values.join(",")
# Example
myBuilder = MyBuilder().b(1).a(2)
print(myBuilder.build())
# Outputs: 1,2
The above example demonstrates the builder pattern, which has the following properties:
- Functions that set a class variable should return self. This allows chaining commands such as in the example.
- The class should have a build() function (or any other name). This function should consume the information of all the variables that were set before calling it and return the desired output while handling missing information.
- The setter functions can be called in any order.
Let’s build the SQL builder!
Now that we know how to use the builder design pattern, we can use it to create a domain-specific SQL builder.
The goal of this builder is to make the process of aggregating data easier for any type of process. This is particularly useful for building queries for dashboards if we add the ability to restrict results within specific date ranges, segment different groups, etc.
First, we need a baseline SQL builder:
# SQLBuilder.py
newline = "\n"
class SQLBuilder(object):
def __init__(self):
self._DECLARE = ""
self._SELECT = ""
self._FROM_MAP = {}
self._WHERE = ""
self._ORDER_BY = ""
self._GROUP_BY = ""
self._HAVING = ""
def DECLARE(self, key, type, value = ''):
assert(key.startswith('@'))
# Handle keyword starting with declare
s = "\n"
if (self._DECLARE == ""):
s = ""
self._DECLARE += s + f"DECLARE {key} {type}{' = ' if value else ''}{value}"
return self
def SELECT(self, t):
s = ", "
if (self._SELECT == ""):
s = "SELECT "
self._SELECT += s + t
return self
def FROM(self, t, alias=''):
if (not t in self._FROM_MAP):
self._FROM_MAP[t] = ''
if (alias):
self._FROM_MAP[t] = alias
return self
def WHERE(self, t):
if (self._WHERE != ""):
return self.AND(t)
self._WHERE = "WHERE " + t
return self
def AND(self, t):
s = "\n"
if (self._WHERE == ""):
s = "WHERE 1=1\n"
self._WHERE += s + f"AND {t}"
return self
def GROUP_BY(self, t):
s = ",\n"
if (self._GROUP_BY == ""):
s = "GROUP BY "
self._GROUP_BY += s + t
return self
def ORDER_BY(self, t):
s = ",\n"
if (self._ORDER_BY == ""):
s = "ORDER BY "
self._ORDER_BY += s + t
return self
def HAVING(self, t):
s = ",\n"
if (self._HAVING == ""):
s = "HAVING "
self._HAVING += s + t
return self
def JOIN(self, table_one, key_one, table_two, key_two):
return self.FROM(table_one).FROM(table_two).AND(f"{table_one}.{key_one} = {table_two}.{key_two}")
def BUILD(self):
self._FROM = ', '.join([f"{name}{f' {alias}' if alias else ''}" for name, alias in self._FROM_MAP.items()])
return (
f"{self._DECLARE + newline if self._DECLARE else ''}"
+ f"{newline + self._SELECT if self._SELECT else ''}"
+ f"{newline + 'FROM ' + self._FROM if self._FROM else ''}"
+ f"{newline + self._WHERE if self._WHERE else ''}"
+ f"{newline + self._GROUP_BY if self._GROUP_BY else ''}"
+ f"{newline + self._HAVING if self._HAVING else ''}"
+ f"{newline + self._ORDER_BY if self._ORDER_BY else ''}"
)
An example of using this base class is to find the count of entries in a table CARS that has the NAME equal to ‘BMW’.
from SQLBuilder import SQLBuilder
print(
SQLBuilder()
.SELECT("NAME").SELECT("COUNT(*)")
.FROM("CAR_SALES", "C")
.WHERE("C.NAME = 'BMW'")
.ORDER_BY("COUNT(*)")
.BUILD()
)
This outputs:
SELECT NAME, COUNT(*)
FROM CAR_SALES C
WHERE C.NAME = 'BMW'
ORDER BY C.COUNT(*)
We can now add special helper functions
These helper functions can include:
- HOURS(key, n) to limit the range of the key up to n hours ago.
- TODAY(key) to limit the values of key to be within today.
- START_DATE(key, start_date) to limit the lowest value of key to start_date
- END_DATE(key, end_date) to limit the highest value of key to end_date
- LIKE(key, s) to add a WHERE condition that key is a string like s.
- NULL(key) to add a WHERE condition that the value of key is null.
- NOTNULL(key) to add a WHERE condition that the value of key is NOT null.
- etc.
# AdvancedSQLBuilder.py
from SQLBuilder import SQLBuilder
class AdvancedSQLBuilder(SQLBuilder):
def HOURS(self, key, n):
assert(n > 0)
self.AND(f"{key} >= DATEADD(HOUR, -{n}, GETDATE())")
return self
def TODAY(self, key):
self.AND(f"CONVERT(VARCHAR(10), {key}, 102) = CONVERT(VARCHAR(10), GETDATE(), 102)")
return self
def START_DATE(self, key, start_date):
(self.DECLARE('@start_date','datetime',start_date)
.AND(f"{key} is not null")
.AND(f"{key} >= @start_date"))
return self
def END_DATE(self, key, end_date):
self.DECLARE('@end_date','datetime',end_date)
self.AND(f"{key} is not null")
self.AND(f"{key} < @end_date")
return self
def LIKE(self, key, s):
self.AND(f"{key} like '{s}'")
return self
def NULL(self, key):
self.AND(f"{key} is null")
return self
def NOTNULL(self, key):
self.AND(f"{key} is not null")
return self
An example of using this is then to find the number of cars sold last year:
from AdvancedSQLBuilder import AdvancedSQLBuilder
print(
AdvancedSQLBuilder()
.SELECT("NAME").SELECT("COUNT(*)")
.FROM("CAR_SALES")
.START_DATE("SELL_DATE", '2021-11-18 00:00:00')
.END_DATE("SELL_DATE", '2022-11-18 00:00:00')
.ORDER_BY("COUNT(*)")
.GROUP_BY("NAME")
.BUILD()
)
This outputs:
DECLARE @start_date datetime = 2021-11-18 00:00:00
DECLARE @end_date datetime = 2022-11-18 00:00:00
SELECT NAME, COUNT(*)
FROM CAR_SALES
WHERE 1=1
AND SELL_DATE is not null
AND SELL_DATE >= @start_date
AND SELL_DATE is not null
AND SELL_DATE < @end_date
GROUP BY NAME
ORDER BY COUNT(*)
We can achieve even more with this pattern
Going into domain-specific examples, suppose our database uses ID numbers for our car names instead of a string in the following:
# Example database table:
# CARS
+------------+--------+----------------+
| PRIMARY_ID | CAR_ID | SOME_VALUE |
+------------+--------+----------------+
| 1 | 103112 | some_value |
| 2 | 103113 | another value |
| ... | ... | ... |
+------------+--------+----------------+
# CAR_MAP
+--------+---------------+
| ID | CARNAME |
+--------+---------------+
| 103112 | BMW |
| 103113 | Aston Martin |
| ... | ... |
+--------+---------------+
We can add an automatic mapping function so that we can filter by car name:
# DomainSpecificSQLBuilder.py
from AdvancedSQLBuilder import AdvancedSQLBuilder
class DomainSpecificSQLBuilder(AdvancedSQLBuilder):
def CARNAME(self, name, table_two, key_two):
self.JOIN("CAR_MAP","ID", table_two, key_two)
self.AND(f"CAR_MAP.NAME = {name}"
An example of using this is to get all the names of cars with more than 30 sales:
from DomainSpecificSQLBuilder import DomainSpecificSQLBuilder
print(
DomainSpecificSQLBuilder()
.SELECT("CARNAME").SELECT("COUNT(*)")
.FROM("CAR_SALES")
.CARNAME("BMW", "CAR_SALES", "CAR_ID")
.WHERE("COUNT(*) > 30")
.BUILD()
)
This outputs the following:
SELECT CARNAME, COUNT(*)
FROM CAR_SALES, CAR_MAP
WHERE 1=1
AND CAR_MAP.ID = CAR_SALES.CAR_ID
AND CAR_MAP.NAME = BMW
AND COUNT(*) > 30
Conclusion
In this article, I have shown how to use the builder pattern to create a domain-specific SQL builder. This can then be used to programmatically generate SQL queries for different slices of data, allowing swift gathering of data for business dashboards and general analysis.
GitHub Repository
Feel free to check out the source code here: https://github.com/justinsj/sql-builder
If you enjoyed this article or learned something from it, I would appreciate your clap!
If you also have anything to suggest, let me know in the comments or message me on LinkedIn!