Detailed usage guide to marshmallow, a lightweight Python serialization and deserialization package (2)

marshmallow official site

First, define a class to test with:

import datetime as dt

class User:
    def __init__(self, name, email):
        self.name = name
        self.email = email
        self.created_time = dt.datetime.now()

1. Schema

Converting between an object and JSON data (that is, serializing and deserializing) requires an intermediate carrier. In marshmallow that carrier is the Schema, which can also be used to validate data.

# This is a simple Schema
from marshmallow import Schema, fields


class UserSchema(Schema):
    name = fields.String()
    email = fields.Email()
    created_time = fields.DateTime()

2. Serializing

The schema's dump() method serializes an object and returns the data as a dict.
The schema's dumps() method serializes the object and returns a JSON-encoded string instead.

user = User("lhh", "[email protected]")
schema = UserSchema()
res = schema.dump(user)
print(res)
# {'email': '[email protected]', 'created_time': '2021-05-28T20:43:08.946112', 'name': 'lhh'}  (a dict)

res2 = schema.dumps(user)
print(res2)
# {"name": "lhh", "email": "[email protected]", "created_time": "2021-05-28T20:45:17.418739"}  (a JSON string)

3. Filter output

When you do not need to output every field, pass the only parameter when instantiating the Schema to name the fields to output:

summary_schema = UserSchema(only={"name", "email"})
res = summary_schema.dump(user)
print(res)

4. Deserializing

The schema's load() method is the opposite of dump(): it deserializes a dict, converting the input dictionary into application-level data structures and validating it along the way.
Similarly, loads() deserializes a JSON string.
By default, load() returns a dictionary, and it raises a ValidationError when an input value does not match its field type.

user_data = {
    "name": "lhh",
    "email": "[email protected]",
    "created_time": "2021-05-28 20:45:17.418739"
}
schema = UserSchema()
res = schema.load(user_data)
print(res)
# {'created_time': datetime.datetime(2021, 5, 28, 20, 45, 17, 418739), 'email': '[email protected]', 'name': 'lhh'}

For deserialization, it usually makes more sense to turn the incoming dict into an object. marshmallow does not do this for you: write the dict -> object method yourself and register it with the post_load decorator.

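from marshmallow import Schema, fields, post_load


class UserSchema(Schema):
    name = fields.String()
    email = fields.Email()
    created_time = fields.DateTime()

    @post_load
    def make_user(self, data, **kwargs):
        # marshmallow 3 passes extra keyword arguments (such as many and partial) to the hook
        return User(**data)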

In this way, each time the load() method is called, a User class object will be returned according to the logic of make_user.

user_data = {
    "name": "lhh",
    "email": "[email protected]"
}

schema = UserSchema()
res = schema.load(user_data)
print(res)
# <__main__.User object at 0x0000027BE9678128>
user = res
print("name: {} email: {}".format(user.name, user.email))
# name: lhh email: [email protected]

5. Handling collections of multiple objects

An iterable collection of objects can be serialized or deserialized directly. Set many=True when instantiating the Schema class.

Alternatively, pass many=True when calling dump() instead of setting it at instantiation.

user1 = User(name="lhh1", email="[email protected]")
user2 = User(name="lhh2", email="[email protected]")
users = [user1, user2]

# The first method
schema = UserSchema(many=True)
res = schema.dump(users)
print(res)

# The second method
schema = UserSchema()
res = schema.dump(users, many=True)
print(res)

6. Validation

When invalid data is passed to Schema.load() or Schema.loads(), a ValidationError is raised. Its messages attribute holds the validation error messages, and its valid_data attribute holds the data that passed validation.
We catch this exception and handle it. First, import ValidationError:

from marshmallow import Schema, fields, ValidationError


class UserSchema(Schema):
    name = fields.String()
    email = fields.Email()
    created_time = fields.DateTime()

try:
    res = UserSchema().load({"name": "lhh", "email": "lhh"})

except ValidationError as e:
    print(f"Error message: {e.messages} Valid data: {e.valid_data}")

# When validating a collection, the error messages are keyed by the index of each invalid item
user_data = [
    {'email': '[email protected]', 'name': 'lhh'},
    {'email': 'invalid', 'name': 'Invalid'},
    {'name': 'wcy'},
    {'email': '[email protected]'},
]


try:
    schema = UserSchema(many=True)
    res = schema.load(user_data)
    print(res)
except ValidationError as e:
    print("Error message: {} Valid data: {}".format(e.messages, e.valid_data))

As the output shows, the invalid email is reported, but the missing attributes are not: nothing so far says that a field must be present.

To make a field mandatory, set required=True on it in the Schema.


6.1 Custom validation

When writing a Schema class, you can pass a validate argument to the built-in fields to customize the validation logic. The value of validate can be a function, an anonymous function (lambda), or an object that defines __call__.

from marshmallow import Schema, fields, ValidationError


class UserSchema(Schema):
    # The lambda returns False for names of 6 or more characters, which fails validation
    name = fields.String(required=True, validate=lambda s: len(s) < 6)
    email = fields.Email()
    created_time = fields.DateTime()

user_data = {"name": "InvalidName", "email": "[email protected]"}
try:
    res = UserSchema().load(user_data)
except ValidationError as e:
    print(e.messages)

Customize exception information in the validation function:

#encoding=utf-8
from marshmallow import Schema, fields, ValidationError

def validate_name(name):
    if len(name) <= 2:
        raise ValidationError("name must be longer than 2 characters")
    if len(name) >= 6:
        raise ValidationError("name must be shorter than 6 characters")


class UserSchema(Schema):
    name = fields.String(required=True, validate=validate_name)
    email = fields.Email()
    created_time = fields.DateTime()

user_data = {"name": "InvalidName", "email": "[email protected]"}
try:
    res = UserSchema().load(user_data)
except ValidationError as e:
    print(e.messages)

NOTE: Validation only happens during deserialization! It will not be validated during serialization!

6.2 Writing the validation function as a Schema method

In Schema, the validation method can be registered using the validates decorator.

#encoding=utf-8
from marshmallow import Schema, fields, ValidationError, validates


class UserSchema(Schema):
    name = fields.String(required=True)
    email = fields.Email()
    created_time = fields.DateTime()

    @validates("name")
    def validate_name(self, value):
        if len(value) <= 2:
            raise ValidationError("name must be longer than 2 characters")
        if len(value) >= 6:
            raise ValidationError("name must be shorter than 6 characters")


user_data = {"name": "InvalidName", "email": "[email protected]"}
try:
    res = UserSchema().load(user_data)
except ValidationError as e:
    print(e.messages)

6.3 Required fields

Custom message for a missing required field:

When required=True, you can customize the message raised when the field is missing by setting the error_messages parameter:

#encoding=utf-8
from marshmallow import Schema, fields, ValidationError, validates


class UserSchema(Schema):
    name = fields.String(required=True, error_messages={"required": "The name field is required"})
    email = fields.Email()
    created_time = fields.DateTime()

    @validates("name")
    def validate_name(self, value):
        if len(value) <= 2:
            raise ValidationError("name must be longer than 2 characters")
        if len(value) >= 6:
            raise ValidationError("name must be shorter than 6 characters")


user_data = {"email": "[email protected]"}
try:
    res = UserSchema().load(user_data)
except ValidationError as e:
    print(e.messages)

Skipping required fields:

Even when a field is declared with required=True, you can still omit it from the input data for a particular load() call by using partial.

#encoding=utf-8
from marshmallow import Schema, fields, ValidationError, validates


class UserSchema(Schema):
    name = fields.String(required=True)
    age = fields.Integer(required=True)

# Method 1: Pass a tuple as the partial parameter of load(); the required check is skipped for the fields named in the tuple.
schema = UserSchema()
res = schema.load({"age": 42}, partial=("name",))
print(res)
# {'age': 42}

# Method 2: Set partial=True directly
schema = UserSchema()
res = schema.load({"age": 42}, partial=True)
print(res)
# {'age': 42}

The two methods give the same result here, but they differ: method 1 skips the required check only for the fields named in partial, while method 2 skips it for every field that is missing from the input data.

6.4 Handling of unknown fields

By default, if an unknown field (a field that is not in the Schema) is passed in, executing the load() method will throw a ValidationError exception. This behavior can be modified by changing the unknown option.

unknown has three values:

  • EXCLUDE: silently drop unknown fields
  • INCLUDE: accept unknown fields and include them in the result
  • RAISE: raise a ValidationError if there are any unknown fields

The default is RAISE. There are two ways to change it:

Method 1: Set it in class Meta when writing the Schema class

from marshmallow import EXCLUDE, Schema, fields

class UserSchema(Schema):
    name = fields.String(required=True, error_messages={"required": "The name field must be filled in"})
    email = fields.Email()
    created_time = fields.DateTime()

    class Meta:
        unknown = EXCLUDE


Method 2: Set the value of the parameter unknown when instantiating the Schema class

from marshmallow import EXCLUDE, Schema, fields

class UserSchema(Schema):
    name = fields.Str(required=True, error_messages={"required": "The name field must be filled in"})
    email = fields.Email()
    created_time = fields.DateTime()

schema = UserSchema(unknown=EXCLUDE)

7. Schema.validate (validating data)

If you only want to validate data with a Schema, without deserializing it into objects, use Schema.validate().
schema.validate() checks the data and returns a dict of error messages. If there are no errors, the dict is empty, so the return value tells you whether validation passed.

#encoding=utf-8
from marshmallow import Schema, fields, ValidationError

class UserSchema(Schema):
    name = fields.Str(required=True, error_messages={"required": "The name field must be filled in"})
    email = fields.Email()
    created_time = fields.DateTime()

user = {"name": "lhh", "email": "2432783449"}
schema = UserSchema()
res = schema.validate(user)
print(res)  # {'email': ['Not a valid email address.']}

user = {"name": "lhh", "email": "[email protected]"}
schema = UserSchema()
res = schema.validate(user)
print(res)  # {}

8. Specifying serialization/deserialization keys

data_key sets the key name used in the external representation. It applies to both serialization and deserialization.

from marshmallow import fields,Schema,ValidationError
import datetime as dt


class User:
    def __init__(self, name, email):
        self.name = name
        self.email = email
        self.created_time = dt.datetime.now()


class UserSchema(Schema):
    name = fields.Str(data_key="name_123")
    email = fields.Email(data_key="email_123")
    created_time = fields.DateTime()

# Serialize a plain dict; the output uses the data_key names
user = {"name": "lhh", "email": "[email protected]"}
schema = UserSchema()
res = schema.dump(user)
print(res)
# {'email_123': '[email protected]', 'name_123': 'lhh'}

# Deserialize; the input must use the data_key names
user = {"name_123": "lhh", "email_123": "[email protected]"}
schema = UserSchema()
res = schema.load(user)
print(res)
# {'email': '[email protected]', 'name': 'lhh'}

9. Refactoring: implicit field creation

When a Schema has many attributes, specifying field types for each attribute can be repetitive, especially when many attributes are already native Python data types. class Meta allows specifying the properties to be serialized, and marshmallow will choose the appropriate field type based on the type of the property.

# Refactor the Schema
class UserSchema(Schema):
    uppername = fields.Function(lambda obj: obj.name.upper())

    class Meta:
        fields = ("name", "email", "created_time", "uppername")

In the above code, name will be automatically formatted as a String, and created_time will be formatted as a DateTime.

Use the additional option instead if you want to list the fields to include on top of the explicitly declared ones:

class UserSchema(Schema):
    uppername = fields.Function(lambda obj: obj.name.upper())

    class Meta:
        # No need to include 'uppername'
        additional = ("name", "email", "created_time")

10. Ordering output

For some use cases, it is useful to preserve the field order of the serialized output. To enable ordering, set the ordered option to True. This instructs marshmallow to serialize the data into a collections.OrderedDict.

from collections import OrderedDict
import datetime as dt
from marshmallow import fields, ValidationError, Schema

class User:
    def __init__(self, name, email):
        self.name = name
        self.email = email
        self.created_time = dt.datetime.now()

class UserSchema(Schema):
    uppername = fields.Function(lambda obj: obj.name.upper())

    class Meta:
        fields = ("name", "email", "created_time", "uppername")
        ordered = True


user = User("lhh", "[email protected]")
schema = UserSchema()
res = schema.dump(user)
print(isinstance(res, OrderedDict))  # check the type of the result
# True
print(res)
# OrderedDict([('name', 'lhh'), ('email', '[email protected]'), ('created_time', '2021-05-29T09:40:46.351382'), ('uppername', 'LHH')])

11. “Read-only” and “Write-only” fields

In the context of Web API, the serialization parameter dump_only and the deserialization parameter load_only are conceptually equivalent to read-only and write-only fields, respectively.

from marshmallow import Schema,fields


class UserSchema(Schema):
    name = fields.Str()
    password = fields.Str(load_only=True)  # equivalent to write-only
    created_at = fields.DateTime(dump_only=True)  # equivalent to read-only

When loading, dump_only fields are treated as unknown fields. If the unknown option is set to INCLUDE, the values for the keys corresponding to these fields will therefore be loaded without validation.

12. Default values for fields when serializing/deserializing

Use default to specify the value to emit when the input value is missing during serialization, and missing to specify the value to use when the input value is missing during deserialization.

#encoding=utf-8
import uuid
import datetime as dt
from marshmallow import fields, ValidationError, Schema


class UserSchema(Schema):
    id = fields.UUID(missing=uuid.uuid1)
    birthday = fields.DateTime(default=dt.datetime(1996,11,17))

# Serialize
res = UserSchema().dump({})
print(res)
# {'birthday': '1996-11-17T00:00:00'}

# Deserialize
res = UserSchema().load({'birthday': '1996-11-17T00:00:00'})
print(res)
# {'id': UUID('751d95db-c020-11eb-83eb-001a7dda7115'), 'birthday': datetime.datetime(1996, 11, 17, 0, 0)}