Using a custom splitter in a Hadoop job

Monday, July 7, 2014

I have some huge files in the following format:


Heading1
Data1.1
Data1.2
.
.
Data1.N
Heading2
Data2.1
Data2.2
.
.
Data2.N
.
.
HeadingN


I am trying to write a job such that each HeadingM and its DataM.N records are processed together as a single logical record. The problem is that the number of data records under a single heading can be huge. How can I write a custom splitter so that each split contains a section of the data records and starts with HeadingM, so that the first record passed to map() is the heading and the rest are its related data records?
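
To make the question concrete, here is a minimal sketch of the kind of InputFormat/RecordReader I have in mind (the class and method names are mine, and the pattern ^Heading\d+$ is only an assumption about what marks a heading). It treats a whole section (a heading plus its data lines) as one record: each reader skips to the first heading at or after its split start, and reads past the split end when necessary so a section is never cut in half, the same way TextInputFormat handles lines that cross split boundaries.

import java.io.IOException;
import java.util.regex.Pattern;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.util.LineReader;

public class SectionInputFormat extends FileInputFormat<LongWritable, Text> {

    @Override
    public RecordReader<LongWritable, Text> createRecordReader(
            InputSplit split, TaskAttemptContext context) {
        return new SectionRecordReader();
    }

    public static class SectionRecordReader extends RecordReader<LongWritable, Text> {
        // Hypothetical heading pattern: change to whatever really marks a heading.
        private static final Pattern HEADING = Pattern.compile("^Heading\\d+$");

        private long start, end, pos, pendingPos;
        private FSDataInputStream in;
        private LineReader reader;
        private final Text line = new Text();
        private Text pending; // next heading line, read ahead
        private final LongWritable key = new LongWritable();
        private final Text value = new Text();

        @Override
        public void initialize(InputSplit genericSplit, TaskAttemptContext context)
                throws IOException {
            FileSplit split = (FileSplit) genericSplit;
            Configuration conf = context.getConfiguration();
            start = split.getStart();
            end = start + split.getLength();
            FileSystem fs = split.getPath().getFileSystem(conf);
            in = fs.open(split.getPath());
            in.seek(start);
            reader = new LineReader(in, conf);
            pos = start;
            // The previous split owns any line it can reach, so drop our
            // (possibly partial) first line before scanning for a heading.
            if (start != 0) pos += reader.readLine(line);
            scanToNextHeading();
        }

        // Reads forward until a heading is found (kept in 'pending') or EOF.
        private void scanToNextHeading() throws IOException {
            pending = null;
            int read;
            for (long lineStart = pos; (read = reader.readLine(line)) > 0; lineStart = pos) {
                pos += read;
                if (HEADING.matcher(line.toString()).matches()) {
                    pending = new Text(line);
                    pendingPos = lineStart;
                    return;
                }
            }
        }

        @Override
        public boolean nextKeyValue() throws IOException {
            // A section is ours only if its heading starts at or before the
            // split end; otherwise it is the next split's first record.
            if (pending == null || pendingPos > end) return false;
            key.set(pendingPos);
            StringBuilder section = new StringBuilder(pending.toString());
            pending = null;
            int read;
            for (long lineStart = pos; (read = reader.readLine(line)) > 0; lineStart = pos) {
                pos += read;
                if (HEADING.matcher(line.toString()).matches()) {
                    pending = new Text(line); // read ahead for the next call
                    pendingPos = lineStart;
                    break;
                }
                section.append('\n').append(line.toString()); // may run past 'end'
            }
            value.set(section.toString());
            return true;
        }

        @Override public LongWritable getCurrentKey() { return key; }
        @Override public Text getCurrentValue() { return value; }

        @Override public float getProgress() {
            return end == start ? 1f : Math.min(1f, (pos - start) / (float) (end - start));
        }

        @Override public void close() throws IOException {
            if (in != null) in.close();
        }
    }
}

It would be wired in with job.setInputFormatClass(SectionInputFormat.class), and the mapper would split each value on '\n', so the first line is the heading and the rest are its data records. The part this sketch does not solve is my actual worry: when one section is bigger than a split, a single mapper still has to read the whole section, rather than the heading being repeated at the start of every split.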
