
cv2.split(img)[3] fails on RGB[A?]

S'up OpenCVers.

Although the title of the question (my first!) is quite sparse, the question itself requires some reasonably detailed background... so here goes.

I'm working with reasonably large (max size 131GB; usual size ~11GB) ECW aerial images. The images have spatial metadata that includes the coordinate system, the coordinates of the top-left and bottom-right corners and the centre, plus details on the colour bands. There is always an Alpha band (which is useful later).

The process uses gdal.Warp() to cut each big image into hundreds of thousands of small images; each cutline is the bounding box of the boundary of an individual property. The boundaries themselves are stored in a PostgreSQL spatial database with a unique identifier (so I don't need to retain the geographical metadata in my output images: I can append them later).
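Each cut is essentially one gdal.Warp() call per property, something like the sketch below (the function name and argument values are placeholders, not the production code; the options doing the work are cutlineDSName / cutlineWhere / cropToCutline / dstAlpha):

```python
try:
    from osgeo import gdal  # GDAL's Python bindings
except ImportError:         # keep the sketch importable without GDAL installed
    gdal = None


def cut_property(src_ecw, dst_path, boundaries_ds, where_clause):
    """Clip one property's bounding box out of the big image.

    All argument values here are placeholders; the real script points
    boundaries_ds at the PostGIS layer holding the boundaries.
    """
    return gdal.Warp(
        dst_path, src_ecw,
        cutlineDSName=boundaries_ds,  # OGR datasource with the property boundary
        cutlineWhere=where_clause,    # select one boundary by its unique identifier
        cropToCutline=True,           # output shrinks to the cutline's bounding box
        dstAlpha=True,                # write a real Alpha band into the output
    )
```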

As you might imagine, not all property boundaries are rectangular - and of those that are rectangular, a lot of them are not oriented 'north-south'. All bounding boxes are rectangular and oriented N-S.

What this means is that the amount of NODATA in each cut image varies with the degree of tilt of the property boundary away from 'vertical'. Generalising this to non-rectangular images, it varies with the difference between the minimum bounding rectangle of the geometry, and the bounding box.

The aim is to get a set of images with minimal NODATA, in order to run them against an image classifier.

So the next stage is to take each of the output 'cut' images, and orient them 'as vertically as possible'.

The second stage uses

  • cv2.split() to extract the alpha layer,
  • cv2.findContours() to find the outline of the alpha layer,
  • cv2.minAreaRect() to get the minimum bounding rectangle (MBR) of the outline.

cv2.minAreaRect()'s output includes the orientation of the MBR (sweet!), so all that's left to do is snip the original image by the MBR, rotate the clipped output by MBR[2], and save the image.

The process is two scripts:

  1. cuts up the large image and stores the cut images;
  2. re-orients the images so that they are facing due[ish] North.

It's two scripts because the job is separable: the cut-up images are produced faster than the 'rotating' happens, and so it makes sense to have one script pumping out 'cut' images and the second script following behind 'rotating' the results (as opposed to doing the cut-and-rotate process to each image).

I've done this same task a dozen times (I wrote it to begin with, before there was a Python API for GDAL - which meant 3 million os or subprocess calls to gdalwarp and gdal_translate) and it's always gone without a hitch.

Today the 'cut' process worked perfectly, but the 'rotate' failed at the cv2.split() step, saying that index '3' was out of range - i.e. cv2.split(img)[3], the alpha layer of an RGBA image, does not exist.
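The failure itself is mechanical: cv2.split() returns one 2-D array per channel, so on a 3-channel array the result has length 3 and index [3] is out of range. A pure-numpy stand-in shows the same thing (per-channel slicing is effectively what cv2.split() does):

```python
import numpy as np

# Stand-in for a cut image that decoded with only three channels (no alpha).
img = np.zeros((4, 4, 3), dtype=np.uint8)

# Equivalent of cv2.split(img): one 2-D array per channel.
channels = [img[:, :, i] for i in range(img.shape[2])]
print(len(channels))  # 3 - so channels[3] raises IndexError, matching the error
```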

When I look at the images in QGIS (a spatial desktop application), it 'sees' the layers - and gives a value for each. The 4th layer is there, and its value is exactly right - '0' outside the property boundary; '255' inside.

When I interrogate the images (using gdalinfo) the metadata for the layers is there, and includes Alpha.

However there is a 'quirk' that I have not seen before: the COLORSPACE variable is 'RGB', and each colour band has Mask Flags: PER_DATASET ALPHA. (Here's the metadata, excluding the PROJCS, which is not part of the problem.)

Metadata:
  COLORSPACE=RGB
  COMPRESSION_RATE_TARGET=9
  VERSION=2
Corner Coordinates:
Upper Left  (  320000.000, 5820000.000) (144d57'24.70"E, 37d44'58.61"S)
Lower Left  (  320000.000, 5800000.000) (144d57' 6.78"E, 37d55'47.14"S)
Upper Right (  340000.000, 5820000.000) (145d11' 1.55"E, 37d45'11.99"S)
Lower Right (  340000.000, 5800000.000) (145d10'45.62"E, 37d56' 0.61"S)
Center      (  330000.000, 5810000.000) (145d 4' 4.66"E, 37d50'29.79"S)
Band 1 Block=256x256 Type=Byte, ColorInterp=Red
  Description = Red
  Overviews: 100000x100000, 50000x50000, 25000x25000, 12500x12500, 6250x6250, 3125x3125, 1562x1562, 781x781, 390x390, 195x195
  Mask Flags: PER_DATASET ALPHA
  Overviews of mask band: 100000x100000, 50000x50000, 25000x25000, 12500x12500, 6250x6250, 3125x3125, 1562x1562, 781x781, 390x390, 195x195
Band 2 Block=256x256 Type=Byte, ColorInterp=Green
  Description = Green
  Overviews: 100000x100000, 50000x50000, 25000x25000, 12500x12500, 6250x6250, 3125x3125, 1562x1562, 781x781, 390x390, 195x195
  Mask Flags: PER_DATASET ALPHA
  Overviews of mask band: 100000x100000, 50000x50000, 25000x25000, 12500x12500, 6250x6250, 3125x3125, 1562x1562, 781x781, 390x390, 195x195
Band 3 Block=256x256 Type=Byte, ColorInterp=Blue
  Description = Blue
  Overviews: 100000x100000, 50000x50000, 25000x25000, 12500x12500, 6250x6250, 3125x3125, 1562x1562, 781x781, 390x390, 195x195
  Mask Flags: PER_DATASET ALPHA
  Overviews of mask band: 100000x100000, 50000x50000, 25000x25000, 12500x12500, 6250x6250, 3125x3125, 1562x1562, 781x781, 390x390, 195x195
Band 4 Block=256x256 Type=Byte, ColorInterp=Alpha
  Description = AllOpacity
  Overviews: 100000x100000, 50000x50000, 25000x25000, 12500x12500, 6250x6250, 3125x3125, 1562x1562, 781x781, 390x390, 195x195

I've seen the 'RGBA image reports that it's RGB' issue before, and usually it can be remedied by using gdal_translate with -expand rgba, but here it does not work: it adds a 5th layer, and the 5th layer is entirely NODATA.

So finally the question[s]...

Has anyone seen anything similar?

Is there some mechanism (preferably in OpenCV, from Python) by which I can expose that Alpha layer?

I thought about trying to treat each layer as if it's a numpy array, and seeing if I could linear-algebra my way around the problem: I'll probably make a start on that later this morning for this particular iteration of the process.
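If I go that route, the obvious (if brittle) numpy move is to declare any pixel that is zero in all three bands to be NODATA and rebuild an alpha layer from that - brittle because a genuinely black pixel inside a property would be misclassified:

```python
import numpy as np

# Toy 3-channel image: NODATA decoded as pure black, property pixels non-zero.
img = np.zeros((4, 4, 3), dtype=np.uint8)
img[1:3, 1:3] = (10, 20, 30)

# Rebuild an alpha layer: 0 where every band is 0 (assumed NODATA), else 255.
alpha = np.where((img == 0).all(axis=2), 0, 255).astype(np.uint8)
rgba = np.dstack([img, alpha])
print(rgba.shape)  # (4, 4, 4)
```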

If this is something that needs specific handling every time, then I can add the fix to the checks that are a normal part of our data curation.

But if there's an easier fix that enables images with this odd Mask Flags: PER_DATASET ALPHA to be processed without pre-treatment, so much the better. Fewer moving parts is always better, I find.
