Extracting image features using PySpark-OpenCV on RDD data (i.e. a SequenceFile in Hadoop)

asked 2015-12-06 23:23:20 -0600

Let me explain my experiment. Using Java, I have converted the image files from a local drive into a SequenceFile (SequenceWritable) on Hadoop (HDFS). I am now trying to read this SequenceFile from Hadoop using PySpark, and I am able to load the data into an RDD.

When I try to pass this RDD to an OpenCV function, it fails with the error below. I need help with this.

Code example:

import cv2
import numpy as np
imageRdd = sc.sequenceFile("/user/GR5017759/Retinopathy/OutputSeq")
R = cv2.imdecode(np.asarray(bytearray(imageRdd), dtype=np.uint8))

===================================

Error:

TypeError: 'RDD' object is not iterable

If you have any ideas on this, please help me.
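For context on the error: an RDD is a distributed collection of records, not a byte buffer, so bytearray() cannot iterate over it. One possible direction is to decode each record separately inside map(). The following is only a minimal sketch under assumptions not confirmed above: that each record is a (key, value) pair whose key is the file name and whose value holds the encoded image bytes, and that OpenCV and NumPy are installed on every worker. The helper names split_record, decode_record and load_images are hypothetical.

```python
def split_record(record):
    """Turn one (key, value) SequenceFile record into (name, raw_bytes)."""
    # Assumption: key is the file name, value is the encoded image bytes.
    key, value = record
    return str(key), bytes(value)


def decode_record(record):
    """Decode one record into (name, image); runs on the Spark workers."""
    # Imported here so that each worker process loads the libraries itself.
    import cv2
    import numpy as np

    name, raw = split_record(record)
    buf = np.frombuffer(raw, dtype=np.uint8)
    image = cv2.imdecode(buf, cv2.IMREAD_COLOR)  # None if decoding fails
    return name, image


def load_images(sc, path):
    """Build an RDD of (name, image), dropping records that fail to decode."""
    return (sc.sequenceFile(path)
              .map(decode_record)
              .filter(lambda pair: pair[1] is not None))
```

With an existing SparkContext sc, this could be tried as load_images(sc, "/user/GR5017759/Retinopathy/OutputSeq").take(1). Feature extraction would then be another map() over the decoded images, so OpenCV only ever sees one image at a time.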


Comments

Wow o_o PySpark-OpenCV is not officially supported, so people will not be able to help you out here ...

StevenPuttemans (2015-12-08 04:21:55 -0600)