FanduTech – Data Science Product Development and Consulting:

Machine Learning, Predictive Analytics, Big Data

Unpickling issue in multi-module Python project


Description:

Pickle is a Python module for serializing and de-serializing a Python object structure. Pickling (and unpickling) is also known as serialization, marshalling, or flattening. Pickling converts a Python object hierarchy into a byte stream; unpickling is the inverse operation, converting a byte stream (from a binary file or bytes-like object) back into an object hierarchy.
Not all Python types can be pickled; the pickle module documentation lists the types that can. Some unpickling failures come from the rules that govern how those types are pickled.

For example, functions (built-in and user-defined) are pickled by fully qualified name reference, not by value. Only the function's name is pickled, along with the name of the module in which it is defined. Neither the function's code nor any of its attributes are pickled. The defining module must therefore be importable in the unpickling environment and must contain the named object; otherwise an exception is raised. The same applies to classes defined in a module. As a result, when you pickle an object in one module and then unpickle it from another module (even by calling into the first module), you can get errors like "AttributeError: Can't get attribute 'xxxx' on ...". The cause is that the name recorded at pickling time no longer resolves in the namespace where it is looked up at unpickling time.
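
The name-reference behaviour can be observed directly. The following is a minimal sketch; `json.dumps` is used only as a convenient function from an importable module and is not part of the original example.

```python
import json
import pickle

# Pickle a function from an importable module (json.dumps is just a convenient example).
payload = pickle.dumps(json.dumps, protocol=pickle.HIGHEST_PROTOCOL)

# The byte stream holds the module name and the function name, not the function body.
print(len(payload), "bytes")
print(b"json" in payload, b"dumps" in payload)

# Unpickling imports the module and looks the name up again,
# so the result is the very same function object.
restored = pickle.loads(payload)
print(restored is json.dumps)
```

The payload is only a few dozen bytes because it carries a reference. If the `json` module were not importable where the bytes are loaded, the load would fail.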

Let's take an example.
test1.py: This module pickles the data and has a function to unpickle it.

    import pickle

    class Test(object):
        def __init__(self, data):
            self.data = data

    def save():
        # Pickle a Test instance to disk.
        test = Test('test')
        with open('test_data.pkl', 'wb') as f:
            pickle.dump(test, f, pickle.HIGHEST_PROTOCOL)

    def retrieve():
        # Load the pickled instance back from disk.
        with open('test_data.pkl', 'rb') as f:
            s = pickle.load(f)
        return s

    if __name__=='__main__':
        save()

test2.py: This module calls the retrieve function in the test1 module to unpickle the data.

    import test1
    #from test1 import Test

    if __name__=='__main__':
        test1.retrieve()

Running test1.py as a script and then running test2.py gives "AttributeError: Can't get attribute...". The reason is that when test1.py is run as a script, the Test class is pickled as __main__.Test. When the data is then unpickled by running test2.py (even though the call goes through test1), "__main__" refers to test2. Importing test1 into test2 only makes the class available as test1.Test; pickle looks for __main__.Test, which does not exist in test2. That is why the error is raised.
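
The failure can be reproduced in a single process. The sketch below is an illustration under stated assumptions, not the two-file setup itself: the module name `writer_main` is hypothetical and stands in for "__main__", and the two module objects stand in for the script that pickled the data and the script that later unpickles it.

```python
import pickle
import sys
import types

CLASS_SOURCE = """
class Test:
    def __init__(self, data):
        self.data = data
"""

# Stand-in for the script that pickles: a module that defines Test.
writer = types.ModuleType("writer_main")
exec(CLASS_SOURCE, writer.__dict__)
sys.modules["writer_main"] = writer
payload = pickle.dumps(writer.Test("test"))

# Stand-in for a different top-level script: same module name, but no Test in it.
reader = types.ModuleType("writer_main")
sys.modules["writer_main"] = reader

try:
    pickle.loads(payload)
    error = None
except AttributeError as exc:
    error = exc
finally:
    # Clean up the hypothetical module entry.
    del sys.modules["writer_main"]

print(type(error).__name__, error)
```

The class itself still exists in memory, but pickle resolves it only through the recorded module name and class name, so the lookup fails once that module no longer exposes `Test`.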

The solution is either to make the class or function available in the namespace of the top-level module through an explicit import, like this:

    import test1
    from test1 import Test

    if __name__=='__main__':
        test1.retrieve()

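or to pickle the data within the namespace of the same top-level module as the one where you unpickle it. In the version that follows, test1 is always imported rather than run as a script, so the class is pickled as test1.Test, which pickle can resolve by importing test1: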

test1.py:

    import pickle

    class Test(object):
        def __init__(self, data):
            self.data = data

    def save():
        test = Test('test')
        with open('test_data.pkl', 'wb') as f:
            pickle.dump(test, f, pickle.HIGHEST_PROTOCOL)

    def retrieve():
        with open('test_data.pkl', 'rb') as f:
            s = pickle.load(f)

test2.py:

    import test1
    #from test1 import Test

    if __name__=='__main__':
        test1.save()
        test1.retrieve()
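
To check which name ends up in the pickle when the class lives in an imported module, the following sketch writes a small module to a temporary directory and imports it. The module name `pickle_demo_models` is hypothetical and created only for this demonstration.

```python
import importlib
import pathlib
import pickle
import sys
import tempfile

# Hypothetical module name and source, created on the fly for the demo.
MODULE_NAME = "pickle_demo_models"
MODULE_SOURCE = """
class Test:
    def __init__(self, data):
        self.data = data
"""

with tempfile.TemporaryDirectory() as tmp:
    pathlib.Path(tmp, MODULE_NAME + ".py").write_text(MODULE_SOURCE)
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        demo = importlib.import_module(MODULE_NAME)
        # The class is recorded under the imported module's name, not "__main__".
        payload = pickle.dumps(demo.Test("test"))
        restored = pickle.loads(payload)
    finally:
        sys.path.remove(tmp)

print(demo.Test.__module__)
print(MODULE_NAME.encode() in payload, b"__main__" in payload)
print(restored.data)
```

Because the recorded module name is one that can be imported, the load does not depend on which script happens to be running as "__main__".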
							
						
