Sqoop: A Migration Tool between RDBMS and HDFS, Hive, and HBase
Introduction to Sqoop
The Apache Sqoop project is designed to enable efficient bulk data transfer between an RDBMS and Hadoop. With Sqoop's help, users can easily import data from a relational database into Hadoop and its related systems (such as HBase and Hive), and likewise extract data from Hadoop and export it back into a relational database. Beyond these core functions, Sqoop also provides a number of handy utilities, such as listing the tables in a database.
Databases Supported by Sqoop
In principle, Sqoop works with any database that provides a JDBC-compliant driver, such as DB2 or MySQL. Before connecting Sqoop to a relational database, first copy the corresponding JDBC driver jar into the $SQOOP_HOME/lib directory, then pass the database connection URL via the --connect parameter, for example "--connect jdbc:db2://localhost:50000/SAMPLE". For DB2 in particular, Sqoop currently supports the vast majority of DB2 data types, and most of Sqoop's tools run well against DB2.
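As a minimal sketch of that setup (the driver jar name, username, and paths below are illustrative and should be matched to your own installation):

# Copy the database's JDBC driver into Sqoop's lib directory
# (db2jcc4.jar is the usual name of the DB2 JDBC driver jar).
cp db2jcc4.jar $SQOOP_HOME/lib/

# Verify the connection by listing the tables in the sample database;
# -P prompts for the password interactively.
$SQOOP_HOME/bin/sqoop list-tables \
    --connect jdbc:db2://localhost:50000/SAMPLE \
    --username db2inst1 -P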
File Types Supported by Sqoop
Sqoop can import DB2 data into HDFS and store it in a number of file formats; the common ones are delimited text, Avro binary files, and SequenceFiles. This article uses delimited text throughout. Running sqoop help lists the available commands:

Available commands:
  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table
  help               List available commands
  import             Import a table from a database to HDFS
  import-all-tables  Import tables from a database to HDFS
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  version            Display version information

See 'sqoop help COMMAND' for information on a specific command.
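The storage format is chosen at import time. A hedged sketch, assuming a DB2 connection like the one above (the table name and credentials are placeholders; the --as-* flags are standard Sqoop 1 options):

# Import as delimited text, the default and the format used in this article.
./sqoop import --connect jdbc:db2://localhost:50000/SAMPLE \
    --username db2inst1 -P --table EMPLOYEE --as-textfile

# Alternatives: --as-avrodatafile for Avro binary files,
# --as-sequencefile for SequenceFiles.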
List RDBMS databases:
./sqoop list-databases --connect jdbc:mysql://127.0.0.1:3306/ --username root --password root123
List RDBMS tables:
./sqoop list-tables --connect jdbc:mysql://127.0.0.1:3306/feixu --username root --password root123
Import an RDBMS table:
./sqoop import --connect jdbc:mysql://127.0.0.1:3306/feixu --username root --password root123 --table USERS
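In practice an import usually also pins down the output location, parallelism, and field delimiter. A minimal sketch (the target directory, mapper count, and delimiter are illustrative assumptions, not taken from the run below):

# --target-dir: explicit HDFS output directory (the default is the table name
#               under your HDFS home directory)
# -m:           number of parallel map tasks
# --fields-terminated-by: column delimiter for the delimited-text output
./sqoop import --connect jdbc:mysql://127.0.0.1:3306/feixu \
    --username root --password root123 --table USERS \
    --target-dir /user/hadoop/users -m 4 --fields-terminated-by ','

The transcript below shows an actual import of the RATES_BK table: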
hadoop@feixu-master1:/usr/local/sqoop/bin$ ./sqoop import --connect jdbc:mysql://192.168.56.102:3306/feixu --username root --password root123 --table RATES_BK;
Warning: /usr/lib/hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /usr/lib/hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
13/12/20 05:03:51 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
13/12/20 05:03:52 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
13/12/20 05:03:52 INFO tool.CodeGenTool: Beginning code generation
13/12/20 05:03:52 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `RATES_BK` AS t LIMIT 1
13/12/20 05:03:52 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `RATES_BK` AS t LIMIT 1
13/12/20 05:03:52 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/local/hadoop
Note: /tmp/sqoop-hadoop/compile/e8fc42fcc6b9c207b7aaea879c43be95/RATES_BK.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
13/12/20 05:03:53 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/e8fc42fcc6b9c207b7aaea879c43be95/RATES_BK.jar
13/12/20 05:03:53 WARN manager.MySQLManager: It looks like you are importing from mysql.
13/12/20 05:03:53 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
13/12/20 05:03:53 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
13/12/20 05:03:53 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
13/12/20 05:03:53 WARN manager.CatalogQueryManager: The table RATES_BK contains a multi-column primary key. Sqoop will default to the column USER_ID only for this job.
13/12/20 05:03:53 WARN manager.CatalogQueryManager: The table RATES_BK contains a multi-column primary key. Sqoop will default to the column USER_ID only for this job.
13/12/20 05:03:53 INFO mapreduce.ImportJobBase: Beginning import of RATES_BK
13/12/20 05:03:56 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`USER_ID`), MAX(`USER_ID`) FROM `RATES_BK`
13/12/20 05:03:57 INFO mapred.JobClient: Running job: job_201312200120_0010
13/12/20 05:03:58 INFO mapred.JobClient: map 0% reduce 0%
13/12/20 05:04:13 INFO mapred.JobClient: map 25% reduce 0%
13/12/20 05:04:14 INFO mapred.JobClient: map 50% reduce 0%
13/12/20 05:04:15 INFO mapred.JobClient: map 100% reduce 0%
13/12/20 05:04:17 INFO mapred.JobClient: Job complete: job_201312200120_0010
13/12/20 05:04:17 INFO mapred.JobClient: Counters: 18
13/12/20 05:04:17 INFO mapred.JobClient: Job Counters
13/12/20 05:04:17 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=36108
13/12/20 05:04:17 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/12/20 05:04:17 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/12/20 05:04:17 INFO mapred.JobClient: Launched map tasks=4
13/12/20 05:04:17 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
13/12/20 05:04:17 INFO mapred.JobClient: File Output Format Counters
13/12/20 05:04:17 INFO mapred.JobClient: Bytes Written=216525
13/12/20 05:04:17 INFO mapred.JobClient: FileSystemCounters
13/12/20 05:04:17 INFO mapred.JobClient: HDFS_BYTES_READ=440
13/12/20 05:04:17 INFO mapred.JobClient: FILE_BYTES_WRITTEN=268158
13/12/20 05:04:17 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=216525
13/12/20 05:04:17 INFO mapred.JobClient: File Input Format Counters
13/12/20 05:04:17 INFO mapred.JobClient: Bytes Read=0
13/12/20 05:04:17 INFO mapred.JobClient: Map-Reduce Framework
13/12/20 05:04:17 INFO mapred.JobClient: Map input records=10000
13/12/20 05:04:17 INFO mapred.JobClient: Physical memory (bytes) snapshot=152514560
13/12/20 05:04:17 INFO mapred.JobClient: Spilled Records=0
13/12/20 05:04:17 INFO mapred.JobClient: CPU time spent (ms)=3350
13/12/20 05:04:17 INFO mapred.JobClient: Total committed heap usage (bytes)=37519360
13/12/20 05:04:17 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1391800320
13/12/20 05:04:17 INFO mapred.JobClient: Map output records=10000
13/12/20 05:04:17 INFO mapred.JobClient: SPLIT_RAW_BYTES=440
13/12/20 05:04:17 INFO mapreduce.ImportJobBase: Transferred 211.4502 KB in 23.3888 seconds (9.0407 KB/sec)
13/12/20 05:04:17 INFO mapreduce.ImportJobBase: Retrieved 10000 records.
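A few warnings in the transcript deserve a follow-up. Passing --password on the command line is insecure, and -P (an interactive prompt) avoids it; RATES_BK has a multi-column primary key, so Sqoop silently fell back to splitting on USER_ID alone, a choice --split-by makes explicit; and the MySQL manager suggests --direct for a faster, mysqldump-based transfer path. A hedged sketch combining these adjustments (same table and host as the run above; --direct additionally requires the mysqldump binary on the task nodes):

# -P prompts for the password instead of exposing it in the shell history.
# --split-by states the split column explicitly rather than relying on the default.
# --direct switches to the MySQL-specific fast path the log recommends.
./sqoop import --connect jdbc:mysql://192.168.56.102:3306/feixu \
    --username root -P --table RATES_BK \
    --split-by USER_ID --direct

After the job completes, the imported files land in HDFS under the table name by default and can be inspected directly:

hadoop fs -ls RATES_BK
hadoop fs -cat RATES_BK/part-m-00000 | head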