·Course: Hadoop Developer Certification (CCDH)
·Start date: 2024-03-26
【Course Overview】
As the core technology of big data, Hadoop gives enterprises a highly scalable, highly redundant, fault-tolerant, and cost-effective "data-driven" solution. To address the widespread shortage of engineers skilled in handling massive data, Cloudera offers a developer-oriented certification: Cloudera Certified Developer for Apache Hadoop (CCDH). In 青藍(lán)咨詢's CCDH training course you will learn:
* Hadoop core concepts
* How HDFS and MapReduce work
* How to develop MapReduce applications (a minimal sketch follows this list)
* How to unit-test MapReduce applications
* How to use MapReduce combiners, partitioners, and the distributed cache
* How to develop and debug MapReduce applications
* How to implement input and output in MapReduce applications
* Common MapReduce algorithms
* How to join data sets with MapReduce
* How to integrate Hadoop into an existing enterprise computing environment
* How to do machine learning with Mahout
* How to build data-analysis applications quickly with Hive and Pig
* How to create and manage workflows with Oozie
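As a taste of the Java API the course centers on, here is a minimal sketch of the canonical word-count job (the same example the outline below introduces as "Example: WordCount"); input and output paths come from the command line, and everything else follows the stock MapReduce v2 API:

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emits (word, 1) for every token in the input line
  public static class TokenizerMapper
      extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reducer: sums the counts emitted for each word
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) {
        sum += v.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);  // optional: pre-aggregates map output
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

The combiner is optional, but reusing the reducer as a combiner (as above) cuts the volume of intermediate data shuffled to the reducers.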
【Target Audience】
Enterprise managers, CIOs, CTOs, government IT officials, project (development) managers, consultants, IT managers, IT consultants, IT support specialists, systems engineers, data-center administrators, cloud administrators, and anyone looking to join the cloud-computing field; above all, developers who need to build powerful data-analysis applications with Apache Hadoop.
Attendees should have programming experience, ideally including Java skills and background. No prior Hadoop knowledge or experience is required.
【Course Content】
Understand how MapReduce and HDFS fit together to form a powerful, scalable system.
Learn to write programs against Hadoop's API and acquire the core skills needed for more interesting data-processing tasks.
Learn to deploy Hadoop on data-center servers or on Amazon's EC2, and to extend existing systems with Hadoop.
Learn to bring different kinds of data into Hadoop for further analysis, including importing existing databases with Sqoop.
Learn to use Hive, covering data loading, table creation, and queries.
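As one illustration of those three steps (create a table, load data, query), here is a hedged sketch that drives Hive from Java through the HiveServer2 JDBC driver; the host, port, table layout, and HDFS path are all illustrative assumptions, not part of the course material:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQueryExample {
  public static void main(String[] args) throws Exception {
    // HiveServer2 JDBC driver; host/port, table name, and path are illustrative
    Class.forName("org.apache.hive.jdbc.HiveDriver");
    try (Connection conn = DriverManager.getConnection(
             "jdbc:hive2://localhost:10000/default", "hive", "");
         Statement stmt = conn.createStatement()) {
      // Create a table over tab-delimited text files
      stmt.execute("CREATE TABLE IF NOT EXISTS logs (ts STRING, level STRING, msg STRING) "
          + "ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'");
      // Load a file from HDFS into the table (this moves the file)
      stmt.execute("LOAD DATA INPATH '/data/logs.tsv' INTO TABLE logs");
      // Run a query; Hive compiles it into jobs behind the scenes
      try (ResultSet rs = stmt.executeQuery(
               "SELECT level, COUNT(*) FROM logs GROUP BY level")) {
        while (rs.next()) {
          System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
        }
      }
    }
  }
}
```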
Learn best practices that make MapReduce programs easier to debug, along with tools and techniques for testing locally before running at scale.
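The outline below covers this through MRUnit; here is a minimal sketch of a mapper test, assuming the TokenizerMapper from the WordCount sketch above:

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Before;
import org.junit.Test;

public class TokenizerMapperTest {
  private MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;

  @Before
  public void setUp() {
    // Wire the mapper under test into an MRUnit harness; no cluster needed
    mapDriver = MapDriver.newMapDriver(new WordCount.TokenizerMapper());
  }

  @Test
  public void testMapperSplitsAndCounts() throws IOException {
    // One input line should produce one (word, 1) pair per token, in order
    mapDriver.withInput(new LongWritable(0), new Text("cat dog cat"))
        .withOutput(new Text("cat"), new IntWritable(1))
        .withOutput(new Text("dog"), new IntWritable(1))
        .withOutput(new Text("cat"), new IntWritable(1))
        .runTest();
  }
}
```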
Go deeper into the Hadoop API: custom data types and file formats, direct HDFS access, partitioning of intermediate data, and other tools such as the DistributedCache.
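Direct HDFS access, for example, goes through the FileSystem API; a minimal read sketch (the path is illustrative):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadExample {
  public static void main(String[] args) throws Exception {
    // Picks up fs.defaultFS from core-site.xml on the classpath
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path path = new Path("/data/input.txt");  // illustrative path
    try (FSDataInputStream in = fs.open(path);
         BufferedReader reader = new BufferedReader(new InputStreamReader(in))) {
      String line;
      while ((line = reader.readLine()) != null) {
        System.out.println(line);
      }
    }
  }
}
```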
Study graph algorithms, including PageRank; learn strategies for performing joins efficiently, and compare the techniques suited to different data models.
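To make the join strategies concrete, here is a hedged sketch of a reduce-side join between hypothetical customer and order CSV files; the record layouts and class names are assumptions, and the driver (not shown) would attach each mapper to its input with MultipleInputs.addInputPath:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class ReduceSideJoin {

  // Tags each customer record "C" so the reducer can tell sources apart.
  // Assumed input: well-formed CSV lines of the form customerId,name
  public static class CustomerMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      String[] fields = value.toString().split(",", 2);
      context.write(new Text(fields[0]), new Text("C\t" + fields[1]));
    }
  }

  // Tags each order record "O". Assumed input: lines of customerId,orderId
  public static class OrderMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      String[] fields = value.toString().split(",", 2);
      context.write(new Text(fields[0]), new Text("O\t" + fields[1]));
    }
  }

  // All values for one customerId arrive together; pair the customer name
  // with each of that customer's orders (an inner join).
  public static class JoinReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {
      String customer = null;
      List<String> orders = new ArrayList<>();
      for (Text v : values) {
        String[] parts = v.toString().split("\t", 2);
        if ("C".equals(parts[0])) {
          customer = parts[1];
        } else {
          orders.add(parts[1]);
        }
      }
      if (customer == null) return;  // no matching customer record
      for (String order : orders) {
        context.write(key, new Text(customer + "\t" + order));
      }
    }
  }
}
```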
Learn to optimize MapReduce programs for better performance.
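Two common first steps, echoing the combiner and performance topics in the outline below, are pre-aggregating map output with a combiner and compressing intermediate data; a driver-side sketch, assuming the WordCount classes above and Hadoop 2.x property names:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class TuningExample {
  public static Job configure() throws Exception {
    Configuration conf = new Configuration();
    // Compress intermediate map output to cut shuffle I/O (Hadoop 2.x property)
    conf.setBoolean("mapreduce.map.output.compress", true);
    Job job = Job.getInstance(conf, "tuned word count");
    job.setJarByClass(TuningExample.class);
    // A combiner pre-aggregates map output, shrinking data sent to reducers
    job.setCombinerClass(WordCount.IntSumReducer.class);
    return job;
  }
}
```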
Module: The Motivation for Hadoop
* Problems with Traditional Large-Scale Systems
* Introducing Hadoop
* Hadoopable Problems

Module: Hadoop: Basic Concepts and HDFS
* The Hadoop Project and Hadoop Components
* The Hadoop Distributed File System

Module: Introduction to MapReduce V2
* MapReduce Overview
* Example: WordCount
* Mappers
* Reducers

Module: Hadoop Clusters and the Hadoop Ecosystem
* Hadoop Cluster Overview
* Hadoop Jobs and Tasks
* Other Hadoop Ecosystem Components

Module: Writing a MapReduce Program in Java
* Basic MapReduce API Concepts
* Writing MapReduce Drivers, Mappers, and Reducers in Java
* Speeding Up Hadoop Development by Using Eclipse
* Differences Between the Old and New MapReduce APIs

Module: Writing a MapReduce Program Using Streaming
* Writing Mappers and Reducers with the Streaming API

Module: Unit Testing MapReduce Programs
* Unit Testing
* The JUnit and MRUnit Testing Frameworks
* Writing Unit Tests with MRUnit
* Running Unit Tests

Module: Delving Deeper into the Hadoop API
* Using the ToolRunner Class
* Setting Up and Tearing Down Mappers and Reducers
* Decreasing the Amount of Intermediate Data with Combiners
* Accessing HDFS Programmatically
* Using the DistributedCache
* Using the Hadoop API's Library of Mappers, Reducers, and Partitioners

Module: Practical Development Tips and Techniques
* Strategies for Debugging MapReduce Code
* Testing MapReduce Code Locally by Using LocalJobRunner
* Writing and Viewing Log Files
* Retrieving Job Information with Counters
* Reusing Objects
* Creating Map-Only MapReduce Jobs

Module: Partitioners and Reducers
* How Partitioners and Reducers Work Together
* Determining the Optimal Number of Reducers for a Job
* Writing Custom Partitioners (see the sketch after this outline)

Module: Data Input and Output
* Creating Custom Writable and WritableComparable Implementations
* Saving Binary Data Using SequenceFile and Avro Data Files
* Issues to Consider When Using File Compression
* Implementing Custom InputFormats and OutputFormats

Module: Common MapReduce Algorithms
* Sorting and Searching Large Data Sets
* Indexing Data
* Computing Term Frequency-Inverse Document Frequency (TF-IDF)
* Calculating Word Co-Occurrence
* Performing Secondary Sort

Module: Joining Data Sets in MapReduce Jobs
* Writing a Map-Side Join
* Writing a Reduce-Side Join

Module: Integrating Hadoop into the Enterprise Workflow
* Integrating Hadoop into an Existing Enterprise
* Loading Data from an RDBMS into HDFS by Using Sqoop
* Managing Real-Time Data Using Flume
* Accessing HDFS from Legacy Systems with FuseDFS and HttpFS

Module: An Introduction to Hive, Impala, and Pig
* The Motivation for Hive, Impala, and Pig
* Hive Overview
* Impala Overview
* Pig Overview
* Choosing Between Hive, Impala, and Pig

Module: An Introduction to Oozie
* Introduction to Oozie
* Creating Oozie Workflows

Module: Conclusion
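As a companion to the "Writing Custom Partitioners" topic above, here is a minimal custom Partitioner sketch; the first-letter routing scheme is purely illustrative:

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Routes keys to reducers by their first character, so words sharing an
// initial letter land in the same output partition (illustrative scheme)
public class FirstLetterPartitioner extends Partitioner<Text, IntWritable> {
  @Override
  public int getPartition(Text key, IntWritable value, int numPartitions) {
    String s = key.toString();
    int c = s.isEmpty() ? 0 : Character.toLowerCase(s.charAt(0));
    return (c & Integer.MAX_VALUE) % numPartitions;  // always non-negative
  }
}
```

It would be registered in the driver with job.setPartitionerClass(FirstLetterPartitioner.class), typically together with job.setNumReduceTasks(...).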
Note: actual start dates may be adjusted; please follow the official 青藍(lán)咨詢 WeChat account or contact a course advisor for updates!
【Contact 青藍(lán)咨詢】
Address: Room 309, 3rd Floor, Tower B, TCL Building, No. 06 Gaoxin South 1st Avenue, Nanshan District, Shenzhen (Bus stop: Dachong; Metro: Line 1, Hi-Tech Park station, Exit C)
Postal code: 518057
Tel: 0755-86950769
Website: http://www.mycalorietracker.com