Extracting image features using PySpark-OpenCV on RDD data (i.e. a SequenceFile in Hadoop)
Let me explain my experiment. Using Java, I converted the image files on my local drive into a SequenceFile (SequenceWritable) on Hadoop (HDFS). I am now trying to read this SequenceFile from Hadoop using PySpark, and I am able to load the data into an RDD.
However, when I try to pass this RDD to an OpenCV function, the code fails. I need help with this.
Code example:
import cv2
import numpy as np
imageRdd = sc.sequenceFile("/user/GR5017759/Retinopathy/OutputSeq")
R = cv2.imdecode(np.asarray(bytearray(imageRdd), dtype=np.uint8))
===================================
Error:
TypeError: 'RDD' object is not iterable
If you have any idea about this, please help me.
Wow o_o PySpark-OpenCV is not officially supported so people will not be able to help you out here ...
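For what it is worth, the TypeError seems to come from passing the RDD itself to bytearray(): an RDD is a distributed handle, not a local sequence of bytes, so the decoding has to be applied to each record instead. Below is a minimal sketch of that idea, not a tested solution for this dataset. It assumes each SequenceFile record is a (key, value) pair whose value holds the encoded image bytes (JPEG/PNG), and that cv2 and numpy are installed on every worker node. The names decode_image and decode_all are made up for illustration.

```python
def decode_image(raw):
    """Decode one encoded image (e.g. JPEG/PNG bytes) into a NumPy array."""
    # Imported inside the function so the import runs in the worker
    # process that executes it (assumes cv2/numpy exist on the workers).
    import cv2
    import numpy as np

    buf = np.frombuffer(bytes(raw), dtype=np.uint8)
    # cv2.imdecode returns None when the bytes are not a decodable image.
    return cv2.imdecode(buf, cv2.IMREAD_COLOR)


def decode_all(rdd, decoder=decode_image):
    """Apply `decoder` to the value of every (key, value) record.

    Assumes `rdd` is a pair RDD such as the one returned by
    sc.sequenceFile(...). mapValues is lazy, so nothing is decoded
    until an action such as first() or count() runs.
    """
    return rdd.mapValues(decoder)
```

With the SparkContext sc from the question, this would be used as images = decode_all(sc.sequenceFile("/user/GR5017759/Retinopathy/OutputSeq")), followed by an action such as images.first() to get one (key, image array) pair. Feature extraction would then be another function mapped over the values in the same way, rather than a call on the RDD object itself.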