Machine Learning, Predictive Analytics, Big Data
Post date Jan. 18, 2017
Author: gopalsharma2001
Description:
pickle is a Python module used for serializing and de-serializing a Python object structure. Pickling (and unpickling) is also known as 'serialization', 'marshalling', or 'flattening' of objects. Pickling is the process whereby a Python object hierarchy is converted into a byte stream, and “unpickling” is the inverse operation, whereby a byte stream (from a binary file or bytes-like object) is converted back into an object hierarchy.
Not all Python types can be pickled; the pickle documentation lists the types that can. This restriction sometimes causes issues when unpickling a binary stream.
For example, functions (built-in and user-defined) are pickled by “fully qualified” name reference, not by value. This means that only the function name is pickled, along with the name of the module the function is defined in. Neither the function’s code nor any of its function attributes are pickled. Thus the defining module must be importable in the unpickling environment, and the module must contain the named object; otherwise an exception is raised. The same applies to classes defined inside a module. Also, when you pickle in one module and then try to unpickle from another module (or through the first module), you can get errors like "AttributeError: Can't get attribute 'xxxx' on ...". This is again a namespace issue: the name recorded in the pickle cannot be found in the module where the unpickler looks for it.
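Pickling by reference can be seen in a minimal sketch. It uses the standard library function json.dumps purely as a stand-in for any importable function; that choice is an assumption made for illustration. The pickled bytes carry the module and function names, and unpickling looks the function up again by those names.

```python
import json
import pickle

# Pickle a module-level function; only a reference to it is stored.
payload = pickle.dumps(json.dumps)

# The byte stream contains the module name and the function name,
# not the function's code.
has_names = b"json" in payload and b"dumps" in payload
print(has_names)

# Unpickling imports the module and fetches the attribute by name,
# so the result is the very same function object.
restored = pickle.loads(payload)
print(restored is json.dumps)
```

If the json module were not importable where the bytes are loaded, or no longer had a dumps attribute, the load step would fail instead.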
Let's take an example.
test1.py: This module pickles the data and has a function to unpickle it.
import pickle

class Test(object):
    def __init__(self, data):
        self.data = data

def save():
    # Pickle a Test instance to disk.
    test = Test('test')
    with open('test_data.pkl', 'wb') as f:
        pickle.dump(test, f, pickle.HIGHEST_PROTOCOL)

def retrieve():
    # Load the pickled instance back from disk.
    with open('test_data.pkl', 'rb') as f:
        s = pickle.load(f)
    return s

if __name__ == '__main__':
    save()
test2.py: This module invokes the retrieve function in the test1 module to unpickle the data.
import test1
#from test1 import Test

if __name__ == '__main__':
    test1.retrieve()
Running test1.py and then test2.py gives "AttributeError: Can't get attribute...". The reason is that test1.py was run as a script, so the Test class was pickled as __main__.Test. When the data is unpickled by running test2.py (even through test1.retrieve), "__main__" refers to test2. Importing test1 in test2 makes test1.Test available, but not __main__.Test, so the lookup fails and the error is raised.
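The failure can be reproduced in a single file with a rough sketch. The module name fake_main is hypothetical and stands in for the "__main__" of the script that did the pickling; registering it by hand in sys.modules is only a device to keep the example self-contained.

```python
import pickle
import sys
import types

# Hypothetical stand-in for the script module that did the pickling.
script_module = types.ModuleType("fake_main")
sys.modules["fake_main"] = script_module


class Test:
    def __init__(self, data):
        self.data = data


# Make the class look as if it had been defined in fake_main.
Test.__module__ = "fake_main"
Test.__qualname__ = "Test"
script_module.Test = Test

# The pickle records the reference fake_main.Test, not the class body.
payload = pickle.dumps(Test("test"))

# Simulate unpickling in a process whose module of that name lacks Test.
del script_module.Test

try:
    pickle.loads(payload)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)

# Remove the stand-in module again.
del sys.modules["fake_main"]
```

The module named in the pickle exists, but it has no attribute Test, which is the same situation test2.py is in when it acts as "__main__".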
The solution is either to make the class or function available within the namespace of the top-level module through an explicit import, like this:
import test1
from test1 import Test

if __name__ == '__main__':
    test1.retrieve()
Or pickle the data within the namespace of the same top-level module as the one where you unpickle it, so that the class is recorded as test1.Test instead of __main__.Test, like this:
test1.py:
import pickle

class Test(object):
    def __init__(self, data):
        self.data = data

def save():
    # Pickle a Test instance to disk.
    test = Test('test')
    with open('test_data.pkl', 'wb') as f:
        pickle.dump(test, f, pickle.HIGHEST_PROTOCOL)

def retrieve():
    # Load the pickled instance back from disk.
    with open('test_data.pkl', 'rb') as f:
        s = pickle.load(f)
    return s
test2.py:
import test1
#from test1 import Test

if __name__ == '__main__':
    test1.save()
    test1.retrieve()
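For pickle files that already exist and cannot be regenerated, a third option not covered above is to redirect the module lookup while unpickling. The sketch below overrides pickle.Unpickler.find_class, which is part of the standard library. The class RenamingUnpickler and the module names old_home and new_home are hypothetical and exist only for this illustration.

```python
import io
import pickle
import sys
import types


class RenamingUnpickler(pickle.Unpickler):
    """Unpickler that redirects lookups from old module names to new ones."""

    def __init__(self, file, module_renames):
        super().__init__(file)
        self._module_renames = module_renames

    def find_class(self, module, name):
        # Swap the recorded module name for its replacement, if any.
        module = self._module_renames.get(module, module)
        return super().find_class(module, name)


# Hypothetical modules standing in for "where the data was pickled"
# and "where the class lives now".
old_module = types.ModuleType("old_home")
new_module = types.ModuleType("new_home")
sys.modules["old_home"] = old_module
sys.modules["new_home"] = new_module


class Test:
    def __init__(self, data):
        self.data = data


# Pickle an instance whose class is recorded as old_home.Test.
Test.__module__ = "old_home"
Test.__qualname__ = "Test"
old_module.Test = Test
payload = pickle.dumps(Test("test"))

# From now on the class is reachable only as new_home.Test.
del sys.modules["old_home"]
new_module.Test = Test

unpickler = RenamingUnpickler(io.BytesIO(payload), {"old_home": "new_home"})
restored = unpickler.load()
print(restored.data)

# Remove the stand-in module again.
del sys.modules["new_home"]
```

For the example in this post, the mapping would presumably be {"__main__": "test1"}, passed together with the open test_data.pkl file. Treat this as a workaround; pickling under the right module name in the first place is simpler.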