Sunday, June 1, 2014

SQOOP - Imports RDBMS data to HDFS and Export HDFS data to RDBMS

TO IMPORT & EXPORT DATA FROM RDBMS (MYSQL,ORACLE, etc) INTO HDFS / HIVE / HBASE
Pre-requisite
  1. Apache Hadoop
  2. Apache Sqoop (compatible with Hadoop version)
  3. Apache Hive (optional)
  4. Apache HBase (optional)
  5. Apache HCatalog (optional)
  6. JDBC/ODBC connector
For all RDBMS, Connection URL changes and remaining all command line arguments remains same. You need to download specific JDBC/ODBC connector JAR and copy it to $SQOOP_HOME/lib

MySQL
Download mysql-connector-java.jar and place in $SQOOP_HOME/lib folder
cp mysql-connector-java-5.1.18-bin.jar /usr/local/hadoop/sqoop-1.4.3-cdh4.4.0/lib/
Expecting you have data in MySQL tables.
Retrieving list of Databases available in MySQL from SQOOP
sqoop list-databases --connect jdbc:mysql://localhost:3306/ --username root -P
MySQL to HDFS Import
Have Primary key:
sqoop import -connect jdbc:mysql://localhost:3306/db1 -username root -password password --table tableName --target-dir /path/to/directoryName
No Primary key:
sqoop import -connect jdbc:mysql://localhost:3306/db1 -username root -password password --table tableName --target-dir /path/to/directoryName -m 1
MySQL to Hive Import:
Have Primary key:
sqoop-import --connect jdbc:mysql://localhost:3306/db1 -username root -password password --table tableName --hive-table tableName --create-hive-table --hive-import --hive-home path/to/hive_home
No Primary key:
sqoop-import --connect jdbc:mysql://localhost:3306/db1 -username root -password password --table tableName  --hive-table TableName --create-hive-table --hive-import --hive-home path/to/hive_home -m 1
MySQL to HBase Import
Have Import All columns:
sqoop import --connect jdbc:mysql://localhost:3306/db1 --username root --password root --table tableName --hbase-table hbase_tableName --column-family hbase_table_col1 --hbase-create-table
HBase import few columns: 
sqoop import --connect jdbc:mysql://localhost:3306/db1 --username root --password root --table tableName --hbase-table hbase_tableName --columns column1,column2 --column-family hbase_table_col1 --hbase-create-table
To HBase with Primary key:
sqoop import --connect jdbc:mysql://localhost:3306/db1 --username root --password root --table tableName --hbase-table hbase_tableName --column-family hbase_table_col1 --hbase-row-key column1 –hbase-create-table
To Hbase with no primary key:
sqoop import --connect jdbc:mysql://localhost:3306/db1 --username root --password root --table tableName --hbase-table hbase_tableName --columns column1,column2 --column-family hbase_table_col --hbase-row-key column1 --hbase-create-table
Export from HDFS to MySQL:
Same for all Hive/HBase/HDFS: Because Hive tables are nothing but directories in HDFS. So you're just exporting a directory to MySQL: 
sqoop export --connect jdbc:mysql://localhost:3306/test_db --table tableName --export-dir /user/hive/warehouse/tableName --username root --password password -m 1 --input-fields-terminated-by '\001'

SQL Server: 
Connection URL:
sqoop import --connect 'jdbc:sqlserver://<IP(or)hostname>;username=dbuser;password=dbpasswd;database=<DB>' --table <table> --target-dir /path/to/hdfs/dir --split-by <KEY> -m 1
Download Connector from Microsoft website:
Place it in $SQOOP_HOME/lib
Oracle:
Connection URL:
sqoop import --connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=myhost)(port=1521))(connect_data=(service_name=myservice)))" 
--username USER --table SCHEMA.TABLE_NAME --hive-import --hive-table SCHEMA.TABLE_NAME \ --num-mappers 1 --verbose -P \

IBM DB2
Download the DB2Driver and place it in $SQOOP_HOME/lib
sqoop import --driver com.ibm.db2.jcc.DB2Driver --connect jdbc:db2://db2.my.com:50000/testdb --username db2user --db2pwd --table db2tbl --split-by tbl_primarykey --target-dir sqoopimports
sqoop export --driver com.ibm.db2.jcc.DB2Driver --connect jdbc:db2://db2.my.com:50000/myDB --username db2user --password db2pwd --table db2tbl --export-dir /sqoop/dataFile.csv

Different Connection Strings for Different RDBMS
Database version --direct support? connect string matches
HSQLDB 1.8.0+ No jdbc:hsqldb:*//
MySQL 5.0+ Yes jdbc:mysql://
Oracle 10.2.0+ No jdbc:oracle:*//
PostgreSQL 8.3+ Yes (import only) jdbc:postgresql://

No comments:

Post a Comment