
How to load training data from the UCI repository into OpenCV?

asked 2018-06-12 13:05:41 -0600

wlq1988

updated 2018-06-12 13:26:10 -0600

I have a character/font dataset from the UCI repository:

https://archive.ics.uci.edu/ml/datase...

Take any of the CSV files as an example, for instance 'AGENCY.csv'. I am struggling to load it into OpenCV using the C++ API. The structure of the dataset seems quite different from what is normally assumed by the function

cv::ml::TrainData::loadFromCSV

Is there a neat way to do this, or do I need to pre-process the CSV files first?

EDIT:

The file contains 20x20 pixel font images. The first row is the header row. The first two lines (including the header) look like this:

font,fontVariant,m_label,strength,italic,orientation,m_top,m_left,originalH,originalW,h,w,r0c0,r0c1,r0c2,r0c3,r0c4,r0c5,r0c6,r0c7,r0c8,r0c9,r0c10,r0c11,r0c12,r0c13,r0c14,r0c15,r0c16,r0c17,r0c18,r0c19,r1c0,r1c1,r1c2,r1c3,r1c4,r1c5,r1c6,r1c7,r1c8,r1c9,r1c10,r1c11,r1c12,r1c13,r1c14,r1c15,r1c16,r1c17,r1c18,r1c19,r2c0,r2c1,r2c2,r2c3,r2c4,r2c5,r2c6,r2c7,r2c8,r2c9,r2c10,r2c11,r2c12,r2c13,r2c14,r2c15,r2c16,r2c17,r2c18,r2c19,r3c0,r3c1,r3c2,r3c3,r3c4,r3c5,r3c6,r3c7,r3c8,r3c9,r3c10,r3c11,r3c12,r3c13,r3c14,r3c15,r3c16,r3c17,r3c18,r3c19,r4c0,r4c1,r4c2,r4c3,r4c4,r4c5,r4c6,r4c7,r4c8,r4c9,r4c10,r4c11,r4c12,r4c13,r4c14,r4c15,r4c16,r4c17,r4c18,r4c19,r5c0,r5c1,r5c2,r5c3,r5c4,r5c5,r5c6,r5c7,r5c8,r5c9,r5c10,r5c11,r5c12,r5c13,r5c14,r5c15,r5c16,r5c17,r5c18,r5c19,r6c0,r6c1,r6c2,r6c3,r6c4,r6c5,r6c6,r6c7,r6c8,r6c9,r6c10,r6c11,r6c12,r6c13,r6c14,r6c15,r6c16,r6c17,r6c18,r6c19,r7c0,r7c1,r7c2,r7c3,r7c4,r7c5,r7c6,r7c7,r7c8,r7c9,r7c10,r7c11,r7c12,r7c13,r7c14,r7c15,r7c16,r7c17,r7c18,r7c19,r8c0,r8c1,r8c2,r8c3,r8c4,r8c5,r8c6,r8c7,r8c8,r8c9,r8c10,r8c11,r8c12,r8c13,r8c14,r8c15,r8c16,r8c17,r8c18,r8c19,r9c0,r9c1,r9c2,r9c3,r9c4,r9c5,r9c6,r9c7,r9c8,r9c9,r9c10,r9c11,r9c12,r9c13,r9c14,r9c15,r9c16,r9c17,r9c18,r9c19,r10c0,r10c1,r10c2,r10c3,r10c4,r10c5,r10c6,r10c7,r10c8,r10c9,r10c10,r10c11,r10c12,r10c13,r10c14,r10c15,r10c16,r10c17,r10c18,r10c19,r11c0,r11c1,r11c2,r11c3,r11c4,r11c5,r11c6,r11c7,r11c8,r11c9,r11c10,r11c11,r11c12,r11c13,r11c14,r11c15,r11c16,r11c17,r11c18,r11c19,r12c0,r12c1,r12c2,r12c3,r12c4,r12c5,r12c6,r12c7,r12c8,r12c9,r12c10,r12c11,r12c12,r12c13,r12c14,r12c15,r12c16,r12c17,r12c18,r12c19,r13c0,r13c1,r13c2,r13c3,r13c4,r13c5,r13c6,r13c7,r13c8,r13c9,r13c10,r13c11,r13c12,r13c13,r13c14,r13c15,r13c16,r13c17,r13c18,r13c19,r14c0,r14c1,r14c2,r14c3,r14c4,r14c5,r14c6,r14c7,r14c8,r14c9,r14c10,r14c11,r14c12,r14c13,r14c14,r14c15,r14c16,r14c17,r14c18,r14c19,r15c0,r15c1,r15c2,r15c3,r15c4,r15c5,r15c6,r15c7,r15c8,r15c9,r15c10,r15c11,r15c12,r15c13,r15c14,r15c15,r15c16,r15c17,r15c18,r15c19,r16c0,r16c1,r16c2,r16c3,r16c4,
r16c5,r16c6,r16c7,r16c8,r16c9,r16c10,r16c11,r16c12,r16c13,r16c14,r16c15,r16c16,r16c17,r16c18,r16c19,r17c0,r17c1,r17c2,r17c3,r17c4,r17c5,r17c6,r17c7,r17c8,r17c9,r17c10,r17c11,r17c12,r17c13,r17c14,r17c15,r17c16,r17c17,r17c18,r17c19,r18c0,r18c1,r18c2,r18c3,r18c4,r18c5,r18c6,r18c7,r18c8,r18c9,r18c10,r18c11,r18c12,r18c13,r18c14,r18c15,r18c16,r18c17,r18c18,r18c19,r19c0,r19c1,r19c2,r19c3,r19c4,r19c5 ...

Comments

it would be helpful if you could add the first 2 lines of the csv to your question, so we don't have to download the full dataset for analysis

please also add how you're trying to invoke TrainData::loadFromCSV()

berak ( 2018-06-12 13:09:11 -0600 )

That is the thing. It seems that TrainData::loadFromCSV() only allows specifying the header line. I don't see how I can get the labels and images out of such a complicated CSV using this function.

wlq1988 ( 2018-06-12 13:18:03 -0600 )

can you try again with better formatting ? (it's really important in this case)

  • paste 2 lines
  • mark them with your mouse
  • press ctrl-k

also, maybe we need some hints about: what are you trying to achieve ? what is the purpose of reading this ? machine-learning ? what are you trying to do with that data ?

berak ( 2018-06-12 13:18:43 -0600 )

I added the column descriptions. Basically, the labels are in the column m_label, and each image is stored as a row vector in the columns r0c0 through r19c19.

wlq1988 ( 2018-06-12 13:25:51 -0600 )

again, which data do you need from there ? pixels & label only ?

berak ( 2018-06-13 01:13:21 -0600 )

1 answer


answered 2018-06-13 02:33:58 -0600

berak

updated 2018-06-13 02:55:32 -0600

there are a couple of problems with TrainData::loadFromCSV here:

  • it does not handle multi-word strings correctly ("AGENCY FB")
  • you cannot selectively choose columns (or throw away unwanted ones)

provided you manage to replace AGENCY FB with AGENCY_FB (and similar items) globally, you could use it like this:

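// assumes <opencv2/core.hpp>, <opencv2/ml.hpp>, <iostream>,
// plus "using namespace cv;" and "using namespace std;"
Ptr<ml::TrainData> td = ml::TrainData::loadFromCSV("uci.csv",
    1,  // headerLineCount: 1 header line
    2,  // responseStartIdx: m_label is the 3rd column (index 2)
    -1  // responseEndIdx: -1 means a single response column
);

Mat pixels = td->getSamples();    // all non-response columns, one row per sample
Mat labels = td->getResponses();  // the m_label column
cout << pixels.size() << " " << pixels.type() << endl;
cout << labels.size() << " " << labels.type() << endl;

cout << labels << endl;
// skip the 11 leading metadata columns (original columns 0,1 and 3..11);
// the remaining 400 columns are the 20x20 pixel values
cout << pixels(Range::all(), Range(11,411)) << endl;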

output for the first two data rows (type 5 is CV_32F):

[411 x 2] 5
[1 x 2] 5
[64258;
 64257]
[1, 1, 1, ... 255, 255, 255;
 1, 1, 1, ... 255, 255, 255]
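
If only the label and the pixels are needed, another option is to bypass loadFromCSV and pick the columns by header name, which sidesteps both limitations. The following is only a sketch under assumptions: `FontData`, `splitCsv`, `isPixelColumn` and `loadFontCsv` are hypothetical names, the file is assumed to contain no quoted fields, and m_label and the pixel values are assumed to be plain numbers (otherwise std::stoi / std::stof will throw).

```cpp
#include <cassert>
#include <cctype>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Labels and pixel rows extracted from the CSV (hypothetical container).
struct FontData {
    std::vector<int> labels;    // one m_label value per sample
    std::vector<float> pixels;  // row-major, pixelCols values per sample
    std::size_t pixelCols = 0;  // 400 for the full 20x20 header
};

// Splits one CSV line at commas; assumes there are no quoted fields.
static std::vector<std::string> splitCsv(const std::string& line) {
    std::vector<std::string> fields;
    std::string field;
    std::istringstream ss(line);
    while (std::getline(ss, field, ',')) fields.push_back(field);
    return fields;
}

// True for header names of the form r<digits>c<digits>, e.g. "r0c0".
static bool isPixelColumn(const std::string& name) {
    if (name.size() < 4 || name[0] != 'r') return false;
    const std::size_t c = name.find('c');
    if (c == std::string::npos || c == 1 || c + 1 == name.size()) return false;
    for (std::size_t i = 1; i < name.size(); ++i)
        if (i != c && !std::isdigit(static_cast<unsigned char>(name[i])))
            return false;
    return true;
}

// Reads the header, locates m_label and the rXcY columns, then keeps
// only those values from every data row. Short or empty rows are skipped.
FontData loadFontCsv(std::istream& in) {
    FontData data;
    std::string line;
    if (!std::getline(in, line)) return data;
    if (!line.empty() && line.back() == '\r') line.pop_back();
    const std::vector<std::string> header = splitCsv(line);

    bool haveLabel = false;
    std::size_t labelIdx = 0;
    std::vector<std::size_t> pixelIdx;
    for (std::size_t i = 0; i < header.size(); ++i) {
        if (header[i] == "m_label") { labelIdx = i; haveLabel = true; }
        else if (isPixelColumn(header[i])) pixelIdx.push_back(i);
    }
    if (!haveLabel) return data;
    data.pixelCols = pixelIdx.size();

    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();
        if (line.empty()) continue;
        const std::vector<std::string> fields = splitCsv(line);
        if (fields.size() < header.size()) continue;  // malformed row
        data.labels.push_back(std::stoi(fields[labelIdx]));
        for (std::size_t idx : pixelIdx)
            data.pixels.push_back(std::stof(fields[idx]));
    }
    return data;
}
```

The resulting buffer could then be wrapped in a Mat header without copying, e.g. `Mat(rows, cols, CV_32F, data.pixels.data())` with `rows = data.labels.size()` and `cols = data.pixelCols`, as long as `data` outlives that Mat.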
