Tuesday, September 15, 2015

Configure Sqoop for Netezza and Import table to Hive Metastore



Our scenario is as below:

- DWH: Netezza (IBM Puredata)
- Hadoop: Cloudera (CDH 5.4), managed with Cloudera Manager.
- Hadoop cluster configuration consideration: all the worker nodes are on a private network and cannot be reached except through our edge/management nodes.
- Objective: Import data from Netezza to a Hive table.

I followed the steps below to meet the objective.
As per our current stats, we can now migrate 700 GB of data in about 1 hour (using 4 channels, i.e. mappers).
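As a back-of-the-envelope check, that figure works out as below (this is just the arithmetic on the stats above, not a measurement):

```shell
# 700 GB in 1 hour, spread over 4 mappers
total_mb=$((700 * 1024))   # 716800 MB
secs=3600
echo "aggregate:  $((total_mb / secs)) MB/s"       # ~199 MB/s
echo "per mapper: $((total_mb / secs / 4)) MB/s"   # ~49 MB/s
```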

1. Install Netezza connectors:

1.1 Install the Netezza connector for Sqoop following the steps in this link: http://cloudera.com/content/cloudera/en/documentation/connectors/latest/Netezza/Cloudera-Connector-for-Netezza/ccnt_topic_3.html

1.2 (For Oozie) Download "sqoop-nz-connector-1.2c5.tar" from the link below and upload sqoop-nz-connector-1.2c5.jar to HDFS under /user/oozie/share/lib/lib_<timestamp>/sqoop/:
http://www.cloudera.com/content/cloudera/en/downloads/connectors/sqoop/netezza/1-2c5-for-cdh-5-x.html

1.3 sudo -su oozie hadoop fs -put /tmp/sqoop-nz-connector-1.2c5.jar /user/oozie/share/lib/lib_<timestamp>/sqoop/


2. In addition to the above:
2.1 The Netezza JDBC driver directory /var/lib/sqoop should have the following permissions:
  the directory itself must be readable and traversable by others (chmod 755 /var/lib/sqoop)
  the full path up to the driver file must be readable by others (chmod 444 /var/lib/sqoop/nzjdbc3.jar)
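A quick way to see the expected permission bits is to reproduce them on a scratch directory (used here instead of the real /var/lib/sqoop so nothing on the system is touched):

```shell
# Demonstrate the permission bits Sqoop needs, on a throwaway directory
dir=$(mktemp -d)
touch "$dir/nzjdbc3.jar"
chmod 755 "$dir"                 # directory: rwxr-xr-x
chmod 444 "$dir/nzjdbc3.jar"     # driver jar: r--r--r--
stat -c '%a' "$dir"              # prints 755
stat -c '%a' "$dir/nzjdbc3.jar"  # prints 444
rm -rf "$dir"
```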
 
2.2 (For Oozie) Upload the Netezza JDBC driver to HDFS: /user/oozie/share/lib/lib_<timestamp>/sqoop/
sudo -su oozie hadoop fs -put /var/lib/sqoop/nzjdbc3.jar /user/oozie/share/lib/lib_<timestamp>/sqoop/

2.3 Restart Oozie.

3. Configure a proxy on the edge node(s) so the data nodes can connect to Netezza (in case your data nodes are in a private network).
[Sqoop launches several map tasks on the data nodes, and each data node connects directly to Netezza.]

3.1 Install HAProxy on the edge node(s):
  yum install haproxy
 
3.2 Configure the proxy for Netezza.
Append the lines below to '/etc/haproxy/haproxy.cfg'. Here the proxy listens on port 25501 of the edge node(s) and forwards incoming connections to the Netezza host:port.

# Proxy for Netezza
listen netezza <edge_node_ip>:25501
    mode tcp
    option tcplog
    balance leastconn

    server netezza <netezza_host>:<netezza_port>

3.3 service haproxy restart
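Before (or after) the restart, the edited file can be syntax-checked, assuming the haproxy binary is on the PATH:

```shell
# Validate the config file; exits non-zero on a syntax error
haproxy -c -f /etc/haproxy/haproxy.cfg
```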

4. To import, use the commands below.
*** The user must have the CREATE EXTERNAL TABLE privilege on the target database on Netezza.

- Use the username in upper case.
- Use --direct (each map task of the Netezza connector's import job works on a subset of the Netezza partitions and transparently creates and uses an external table to transport data).
- Use "-P" for a password prompt.
- We can pass MapReduce parameters such as "-D mapreduce.task.timeout=0" right after the import/export command.
Here:
<netezza_host>: the Netezza host/IP, or the host/IP of the edge node running the proxy for Netezza.
<netezza_port>: the Netezza port (default 5480), or the port listening for Netezza on the edge node running the proxy.

4.1     To import data into an HDFS directory, use:
sqoop import  \
--direct \
--connect jdbc:netezza://<netezza_host>:<netezza_port>/<database> \
--table D_ACCOUNT \
--fields-terminated-by '|' \
--target-dir /user/etl/temp/ \
--username <USERNAME> \
--password '<password>' \
-- --nz-logdir /tmp/

4.2     To import data into a Hive table, use the command below (it will prompt for the password since we used "-P"):
sqoop import  \
--connect jdbc:netezza://<edge_node_host>:25501/<database>  \
--table <schema>.<table>  \
--where "<condition>"  \
--target-dir "<temp_hdfs_dir>"  \
--username <USERNAME>  \
-P \
--hive-import  \
--hive-table "<hive_table>"  \
--hive-overwrite  \
--delete-target-dir  \
--num-mappers 4  \
--verbose \
--direct  \
-- --nz-logdir <log_dir>  --nz-uploaddir <upload_dir>


4.3     Then use Hive to copy the data from the temp table (text or sequence file format) to the final table (Parquet + Snappy) with an INSERT ... SELECT. Sqoop has some limitations when loading data directly into a Parquet table.
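The copy in 4.3 can be sketched as below. The database and table names (dwh.d_account_temp, dwh.d_account_final) are hypothetical; substitute your own, and note the final table is assumed to already exist as STORED AS PARQUET:

```shell
# Copy from the text staging table into the Parquet+Snappy final table
hive -e "
SET parquet.compression=SNAPPY;
INSERT OVERWRITE TABLE dwh.d_account_final
SELECT * FROM dwh.d_account_temp;
"
```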
