Creating a Simple Mongo ORM in Python
Published on Oct 17, 2021 by Diego Rodriguez Mancini.
Introduction
In ALeRCE we have a Python library for connecting to our database from different applications. The library, called DB-Plugins, provides plugins that facilitate the interaction between our applications and the different database engines. Each plugin consists of a set of models and a core connection interface that the user makes queries through.
Currently there is only a SQL plugin, whose models map our database tables to Python objects using SQLAlchemy. Now we need to add a MongoDB plugin to connect to a new database.
This article will go through the process of creating the ORM part of the plugin.
Object Relational Mapper
An Object Relational Mapper maps database tables to Python classes. Libraries like Django and SQLAlchemy ship very complete and complex ORM tools. In our case we are using MongoDB, a NoSQL database engine in which “tables” are called collections, and collections hold documents in a JSON-like format. With Python in particular, we can use PyMongo as the client library and interact with the database using plain dict objects.
So an ORM-like utility for Mongo is not strictly needed, but it can still be useful: for example, when you want to keep some standardization between your applications, or, as in our specific case, when you want to maintain a similar interface between the SQL and NoSQL plugins.
ORM-like tools also serve many additional purposes, such as validation, querying, database initialization/deletion, and so on. This article is about a very simple ORM layer that only aims to cover model declaration and database initialization (collections + indexes). Other functionality, like querying, is delegated to a different module.
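For context, this is roughly what working with PyMongo and plain dict objects looks like without any ORM layer (a quick sketch; the database, collection and field names are just placeholders):

from pymongo import MongoClient

client = MongoClient()        # assumes a local mongod instance
db = client["example_db"]     # hypothetical database name
collection = db["objects"]    # hypothetical collection name

# Documents are plain dictionaries; no model classes are required
collection.insert_one({"name": "example", "value": 42})
doc = collection.find_one({"name": "example"})
print(doc["value"])  # 42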
Implementation
The Base Model Class
In order to have a declarative way of initializing our database, we need to make all models inherit from a Base class. This class will contain the model’s metadata such as the field mappings, indexes, collection name, and any other information we might need. Also, since PyMongo treats database documents as dictionaries, we can make the Base class extend dict, which allows all models to also be dict instances.
class Base(dict, metaclass=BaseMetaClass):
    def __init__(self, *args, **kwargs):
        super(Base, self).__init__(*args, **kwargs)

    def __getattr__(self, key):
        try:
            return self[key]
        except KeyError:
            raise AttributeError(
                "{} has no attribute {}".format(
                    self.__class__.__name__,
                    key,
                )
            )

    def __setattr__(self, key, value):
        self[key] = value

    def __str__(self):
        return dict.__str__(self)

    def __repr__(self):
        return dict.__repr__(self)
So far it’s a pretty simple class. It inherits from dict and calls the superclass constructor to initialize itself just as if it were a dictionary. The key concept of this class is the metaclass argument, which points to a BaseMetaClass class.
The Base Model’s Metaclass
In order to register mappings and overall database metadata we need to use a metaclass. A metaclass is a class that creates other classes: when the Base class is declared (not instantiated), Python calls the metaclass, whose __new__ method builds the new class object.
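As a minimal illustration of the mechanism (unrelated to the plugin code itself), a metaclass’s __new__ runs at class definition time, which is exactly what lets us collect metadata from the class body:

class Meta(type):
    def __new__(cls, name, bases, attrs):
        # Runs once, when the class statement is executed
        print("creating class {} with attributes {}".format(name, list(attrs)))
        return type.__new__(cls, name, bases, attrs)


class Example(metaclass=Meta):  # the print happens here, at declaration time
    x = 1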
Let’s take a look at the metaclass code:
class BaseMetaClass(type):
    """Metaclass for Base model class that creates mappings."""

    metadata = Metadata()  # this class will contain overall database metadata

    def __new__(cls, name, bases, attrs):
        """Create class with mappings and other metadata."""
        if name == "Base":
            return type.__new__(cls, name, bases, attrs)
        fields = {}
        class_dict = {
            "__fields__": {},
            "__tablename__": name
        }
        for k, v in attrs.items():
            if k == "__tablename__":
                class_dict["__tablename__"] = v
            if isinstance(v, Field):
                class_dict["__fields__"][v.name] = v
        cls.metadata.collections[class_dict["__tablename__"]] = {
            "fields": class_dict["__fields__"],
        }
        return type.__new__(cls, name, bases, class_dict)
The __new__ method returns a new class declaration. We use type as the parent class and return its __new__ method, but we customize the attributes of the newly created class through the class_dict dictionary. We add some model metadata to this class_dict: things like __tablename__ and __fields__, and, as we will see later, __indexes__ metadata can be added as well.
Field mapping is very simple with PyMongo. We do not need to create a mapping between an engine-specific syntax and Python types. We will define the Field class later; for now you only need to understand that the actual mapping just takes a name defined by the user and uses that name as the key (the attribute name) inside the database document.
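Jumping ahead a little, the string passed to Field becomes the key used in the stored document, independently of the attribute name on the model class (a small sketch with hypothetical names):

class ExampleModel(Base):
    nickname = Field("nick")  # stored under the "nick" key in the document


# The keyword argument must match the Field's name, not the attribute name
doc = ExampleModel(nick="diego")
print(doc)  # {'nick': 'diego'}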
The last part of the method adds all the model’s information to a single Metadata class that holds the full database metadata as well as methods for initialization and deletion.
The Database Metadata Class
The Metadata class contains metadata about our database, such as the database name, the list of collections and their metadata, etc. Database initialization and deletion are also performed through this class, provided we give it a valid MongoClient.
class Metadata:
    database = "DEFAULT_DATABASE"
    collections = {}

    def create_all(self, client: MongoClient, database: str = None):
        self.database = database or self.database
        db = client[self.database]
        for c in self.collections:
            collection = db[c]

    def drop_all(self, client: MongoClient, database: str = None):
        self.database = database or self.database
        client.drop_database(self.database)
The class’s code is pretty straightforward. It has a dictionary with all collections and their metadata, plus create_all and drop_all methods to initialize and delete the database. The catch is that with PyMongo collections are lazily created: a collection is not actually persisted to the database until a document is inserted or an index is created. This means that, so far, our code does nothing on the database side. What we want is to create the indexes declared on our models. The following sections cover the model definitions and modify the current code to allow proper database initialization.
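You can see this lazy behavior yourself by comparing the server’s collection list before and after creating an index (a quick sketch; the database and collection names are placeholders and a local mongod is assumed):

from pymongo import MongoClient, ASCENDING

client = MongoClient()             # assumes a local mongod instance
db = client["lazy_demo"]           # hypothetical database name
collection = db["my_collection"]   # nothing is persisted yet

print(db.list_collection_names())  # [] -- the collection does not exist yet

collection.create_index([("some_field", ASCENDING)])
print(db.list_collection_names())  # ['my_collection'] -- now it is persisted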
The Field Class
We are going to create only two simple field classes. In a regular ORM you would have to create fields for the different database types, such as integer, float, string, spatial coordinates, etc. For this use case we don’t really need field type definitions, since we can store anything in a Mongo document. So our field class only maps a custom name for the document:
class Field:
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return f"Field<name={self.name}>"

    def __str__(self):
        return f"Field<name={self.name}>"


class SpecialField(Field):
    def __init__(self, name, callback):
        super(SpecialField, self).__init__(name)
        self.callback = callback
The SpecialField class can be used to perform operations on the actual model attributes. For example, we can store a field built from another field’s value. To do that, the user writes a callback function that takes the model’s fields as keyword arguments and returns the processed value, which will be stored as a new field.
The Model Class
from pymongo import IndexModel, TEXT


class MyModel(Base):
    def my_function(**kwargs):
        return str(kwargs["my_field"]).upper()

    my_field = Field("my_field")
    my_special_field = SpecialField("special", my_function)

    __table_args__ = [IndexModel([("my_field", TEXT)])]
We previously defined a simple Field class and a SpecialField class that takes a function which receives the model’s arguments, performs some operation, and creates a new field from the result. In this case we defined a function that converts the my_field field to upper case and stores it in my_special_field.
We are also defining a __table_args__ attribute to declare other metadata supported by the BaseMetaClass.
IndexModel is PyMongo’s way of defining indexes.
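For reference, IndexModel accepts the same options as PyMongo’s create_index, so you can also declare, for example, unique or compound indexes (the field names below are just placeholders):

from pymongo import IndexModel, ASCENDING, DESCENDING

# A unique single-field index
unique_index = IndexModel([("my_field", ASCENDING)], unique=True)

# A compound index over two hypothetical fields, with an explicit name
compound_index = IndexModel(
    [("first_field", ASCENDING), ("second_field", DESCENDING)],
    name="first_second_idx",
)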
Adding Indexes Metadata
BaseMetaClass
class BaseMetaClass(type):
    """Metaclass for Base model class that creates mappings."""

    metadata = Metadata()  # this class will contain overall database metadata

    def __new__(cls, name, bases, attrs):
        """Create class with mappings and other metadata."""
        if name == "Base":
            return type.__new__(cls, name, bases, attrs)
        fields = {}
        class_dict = {
            "__fields__": {},
            "__tablename__": name,
            "__indexes__": []
        }
        for k, v in attrs.items():
            if k == "__tablename__":
                class_dict["__tablename__"] = v
            if isinstance(v, Field):
                class_dict["__fields__"][v.name] = v
            if k == "__table_args__":
                for table_arg in v:
                    if isinstance(table_arg, IndexModel):
                        class_dict["__indexes__"].append(table_arg)
        cls.metadata.collections[class_dict["__tablename__"]] = {
            "fields": class_dict["__fields__"],
            "indexes": class_dict["__indexes__"]
        }
        return type.__new__(cls, name, bases, class_dict)
We add an __indexes__ key to the class_dict and append any IndexModel found in the model’s definition.
Base class
class Base(dict, metaclass=BaseMetaClass):
    def __init__(self, *args, **kwargs):
        model = {}
        for field in self.__fields__:
            try:
                if isinstance(self.__fields__[field], SpecialField):
                    model[field] = self.__fields__[field].callback(**kwargs)
                else:
                    model[field] = kwargs[field]
            except KeyError:
                raise AttributeError(
                    "{} model needs {} attribute".format(
                        self.__class__.__name__, field
                    )
                )
        super(Base, self).__init__(*args, **model)

    def __getattr__(self, key):
        try:
            return self[key]
        except KeyError:
            raise AttributeError(
                "{} has no attribute {}".format(
                    self.__class__.__name__,
                    key,
                )
            )

    def __setattr__(self, key, value):
        self[key] = value

    def __str__(self):
        return dict.__str__(self)

    def __repr__(self):
        return dict.__repr__(self)
We add the call to the SpecialField’s callback to create special fields that depend on other fields.
Updating the create_all Method
from pymongo import MongoClient


class Metadata:
    database = "DEFAULT_DATABASE"
    collections = {}

    def create_all(self, client: MongoClient, database: str = None):
        self.database = database or self.database
        db = client[self.database]
        for c in self.collections:
            collection = db[c]
            collection.create_indexes(self.collections[c]["indexes"])

    def drop_all(self, client: MongoClient, database: str = None):
        self.database = database or self.database
        client.drop_database(self.database)
In order to persist collections in the database we have to at least call an insert operation or the create_index/create_indexes method. In the previous step we updated BaseMetaClass to store all indexes inside the collections attribute of the Metadata class, so now we can simply call create_indexes with the list of indexes stored in the collections dict.
Utilization Example
Basic model instantiation looks like this:

my_model = MyModel(my_field="this is my field")

print(my_model)
print(my_model.__indexes__)
print(my_model.__fields__)
print(my_model.__tablename__)
print(Base.metadata.collections)
{'my_field': 'this is my field', 'special': 'THIS IS MY FIELD'}
[<pymongo.operations.IndexModel object at 0x7f716005ef50>]
{'my_field': Field<name=my_field>, 'special': Field<name=special>}
MyModel
{'MyModel': {'fields': {'my_field': Field<name=my_field>, 'special': Field<name=special>}, 'indexes': [<pymongo.operations.IndexModel object at 0x7f716005ef50>]}}
We can then easily use our models with the PyMongo client:
client = MongoClient()
Base.metadata.create_all(client, "my_database")

db = client["my_database"]
collection = db["MyModel"]
collection.insert_one(my_model)
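Since every model instance is just a dict, documents read back from the database can be wrapped into a model again. Querying itself is delegated to a different module of DB-Plugins, but as a rough sketch with plain PyMongo (the filter value is just the one inserted above):

# Fetch the raw document back with plain PyMongo
document = collection.find_one({"my_field": "this is my field"})

# Rebuild a model instance from it; the SpecialField is recomputed from the
# document's fields and the extra _id key is simply ignored by __init__
retrieved = MyModel(**document)
print(retrieved.my_field)  # "this is my field"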
Conclusion
An ORM tool can be a crucial part of your system if you want to have a unified interface for interacting with your database. In this article we managed to create a simple ORM layer for MongoDB using PyMongo.
Since an ORM is not needed in most MongoDB applications, we implemented a basic tool with just enough features to help us with database creation, deletion, and model definitions.