Tuesday, November 15, 2016

Loading Data into HBase from HDFS (MapRFS-mfs)


Step 1: Check the input file in HDFS

[root@localhost mapr]# hadoop fs -cat /user/mapr/sample.txt
100,sachin,Mum,India,Cricket
200,Samantha,Chennai,india,Actress
300,Modi,Ahd,Ind,politician
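
Before loading, it can help to confirm that every line has as many comma-separated fields as the column list used in the import step (five here). The following is a minimal sketch that runs against a local copy of the data. The helper name count_bad_lines and the temporary file are assumptions for illustration and are not part of Hadoop or HBase.

```shell
# Hypothetical helper: print how many lines have a comma-separated
# field count different from the expected one.
count_bad_lines() {
    expected="$1"
    file="$2"
    awk -F',' -v n="$expected" 'NF != n { bad++ } END { print bad + 0 }' "$file"
}

# Recreate the three sample rows in a local temporary file.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
100,sachin,Mum,India,Cricket
200,Samantha,Chennai,india,Actress
300,Modi,Ahd,Ind,politician
EOF

# Prints 0 when every line has exactly 5 fields.
count_bad_lines 5 "$sample"
```

If the count is 0, the file has the shape the import expects; a local file can then be copied into the cluster file system with hadoop fs -put.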

Step 2: Create the HBase table with one column family (cf)

hbase(main):006:0> create 'blkimport','cf'
0 row(s) in 1.2990 seconds

=> Hbase::Table - blkimport
hbase(main):007:0> list
blkimport


hbase(main):010:0> scan 'blkimport'
ROW                                   COLUMN+CELL
0 row(s) in 0.0180 seconds

Step 3: Import the data from HDFS into HBase using ImportTsv

[mapr@localhost hbase-1.1.1]$ /opt/mapr/hbase/hbase-1.1.1/bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator=',' -Dimporttsv.columns='HBASE_ROW_KEY,cf:pname,cf:loc,cf:country,cf:prefession'   blkimport /user/mapr/sample.txt

Here:
    /user/mapr/sample.txt             --> HDFS input file
    blkimport                         --> HBase table
    cf                                --> column family
    HBASE_ROW_KEY                     --> row key (the first field of each line)
    pname, loc, country, prefession   --> column qualifiers
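
The mapping above can be pictured with a small local sketch. It is illustrative only: the function name map_line is hypothetical, and the awk script merely imitates how the column list pairs each field with a row key and a family:qualifier name. It does not run ImportTsv and does not touch HBase.

```shell
# Illustrative only: print the cells that one input line would produce
# under a given column list. This is not ImportTsv itself.
map_line() {
    columns="$1"   # e.g. HBASE_ROW_KEY,cf:pname,cf:loc,cf:country,cf:prefession
    line="$2"      # e.g. 100,sachin,Mum,India,Cricket
    printf '%s\n' "$line" | awk -F',' -v cols="$columns" '
    {
        n = split(cols, name, ",")
        # Find the field that serves as the row key.
        for (i = 1; i <= n; i++)
            if (name[i] == "HBASE_ROW_KEY") key = $i
        # Every other field becomes one cell: row key, family:qualifier, value.
        for (i = 1; i <= n; i++)
            if (name[i] != "HBASE_ROW_KEY")
                printf "%s column=%s value=%s\n", key, name[i], $i
    }'
}

map_line 'HBASE_ROW_KEY,cf:pname,cf:loc,cf:country,cf:prefession' \
         '100,sachin,Mum,India,Cricket'
```

For the first sample row this prints four lines, one per non-key field, starting with "100 column=cf:pname value=sachin". The sketch lists cells in input order, whereas an HBase scan returns them sorted by qualifier.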

Step 4: Scan the HBase table

hbase(main):011:0> scan 'blkimport'
ROW                                   COLUMN+CELL
 100                                  column=cf:country, timestamp=1477509673574, value=India
 100                                  column=cf:loc, timestamp=1477509673574, value=Mum
 100                                  column=cf:pname, timestamp=1477509673574, value=sachin
 100                                  column=cf:prefession, timestamp=1477509673574, value=Cricket
 200                                  column=cf:country, timestamp=1477509673574, value=india
 200                                  column=cf:loc, timestamp=1477509673574, value=Chennai
 200                                  column=cf:pname, timestamp=1477509673574, value=Samantha
 200                                  column=cf:prefession, timestamp=1477509673574, value=Actress
 300                                  column=cf:country, timestamp=1477509673574, value=Ind
 300                                  column=cf:loc, timestamp=1477509673574, value=Ahd
 300                                  column=cf:pname, timestamp=1477509673574, value=Modi
 300                                  column=cf:prefession, timestamp=1477509673574, value=politician
3 row(s) in 0.0570 seconds


