Course Information

Course Name: Hadoop Developer (CCDH) Certification

Open-enrollment and customized classes

Start Date: 2024-03-26

Course Introduction

Hadoop Developer (CCDH) Certification


Course Overview

As a core big-data technology, Hadoop gives enterprises a highly scalable, highly redundant, fault-tolerant, and cost-effective "data-driven" solution. To address the widespread shortage of engineers skilled in handling massive data, Cloudera offers a developer-oriented certification: Cloudera Certified Developer for Apache Hadoop (CCDH). In the 青藍(lán)咨詢 CCDH training course, you will learn:

* The Hadoop core

* How HDFS and MapReduce work

* How to develop MapReduce applications

* How to unit test MapReduce applications

* How to use MapReduce combiners, partitioners, and the distributed cache

* How to develop and debug MapReduce applications

* How to implement input and output in MapReduce applications

* Common MapReduce algorithms

* How to join data sets with MapReduce

* How to integrate Hadoop into an existing enterprise computing environment

* How to use Mahout for machine learning

* How to use Hive and Pig for rapid development of data analysis applications

* How to use Oozie to create and manage workflows
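The map/shuffle/reduce flow that these topics build on can be sketched in plain Java, with no Hadoop dependencies. This is an illustrative simulation of the model only; the class and method names below are not the Hadoop API:

```java
import java.util.*;

// A minimal, dependency-free sketch of the MapReduce word-count flow:
// map -> shuffle (group by key) -> reduce. Illustrative names, not Hadoop's API.
public class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in the input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group all emitted values by key, as the framework would
    // before handing each key's value list to a reducer.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the grouped counts for each word.
    static Map<String, Integer> reduce(Map<String, List<Integer>> grouped) {
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) {
                sum += v;
            }
            result.put(e.getKey(), sum);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs =
                map("the quick fox and the lazy dog and the cat");
        Map<String, Integer> counts = reduce(shuffle(pairs));
        System.out.println(counts);  // {and=2, cat=1, dog=1, fox=1, lazy=1, quick=1, the=3}
    }
}
```

In real Hadoop, the shuffle step is performed by the framework between the map and reduce tasks; only the map and reduce logic is written by the developer.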


Target Audience

Enterprise managers, CIOs, CTOs, government information officers, project (development) managers, consultants, IT managers, IT support specialists, systems engineers, data center administrators, cloud computing administrators, and any developer who needs to use Apache Hadoop to build powerful data analysis applications.

Participants should have programming experience, particularly a background in Java. No prior knowledge of or experience with Hadoop is required.


Course Content

Understand how MapReduce and HDFS fit together to provide a powerful, scalable system.

Learn to write programs against Hadoop's APIs, and master the basic skills needed to write more sophisticated data processing jobs.

Learn how to deploy Hadoop on data center servers or on Amazon's EC2, and how to use Hadoop to extend existing systems.

Learn how to import different types of data into Hadoop for further analysis, including importing data from existing databases with Sqoop.

Learn how to use Hive, covering data import, table creation, and writing queries.

Learn best practices for easing the debugging of MapReduce programs, along with local testing tools and techniques for debugging at scale.

Gain an in-depth understanding of the Hadoop API, including custom data types and file formats, direct access to HDFS, intermediate data partitioning, and tools such as the DistributedCache.

Gain an in-depth understanding of graph algorithms such as PageRank, learn strategies for performing joins efficiently, and compare techniques across different data models.

Learn how to optimize MapReduce programs for better performance.
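One concrete example of the partitioning behavior covered here: Hadoop's default HashPartitioner decides which reducer receives each intermediate key using the key's hash code. Its logic can be reproduced in a few lines of plain Java (the class below is an illustrative stand-in, not the Hadoop class itself):

```java
// A sketch of how Hadoop's default HashPartitioner assigns an intermediate
// key to one of N reducers: (key.hashCode() & Integer.MAX_VALUE) % numReducers.
// Illustrative stand-in class; the real one lives in the Hadoop MapReduce library.
public class PartitionerSketch {

    static int getPartition(String key, int numReducers) {
        // Masking with Integer.MAX_VALUE clears the sign bit, so the result
        // is always a non-negative reducer index in [0, numReducers).
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        int numReducers = 4;
        for (String key : new String[] {"apple", "banana", "cherry"}) {
            System.out.println(key + " -> reducer " + getPartition(key, numReducers));
        }
    }
}
```

Because the same key always hashes to the same partition, all values for a given key end up at one reducer, which is what makes grouping in the reduce phase possible; a custom partitioner replaces this function when a different distribution is needed.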


Course Modules

The Motivation for Hadoop

* Problems with Traditional Large-Scale Systems
* Introducing Hadoop
* Hadoopable Problems

Hadoop: Basic Concepts and HDFS

* The Hadoop Project and Hadoop Components
* The Hadoop Distributed File System

Introduction to MapReduce V2

* MapReduce Overview
* Example: WordCount
* Mappers
* Reducers

Hadoop Clusters and the Hadoop Ecosystem

* Hadoop Cluster Overview
* Hadoop Jobs and Tasks
* Other Hadoop Ecosystem Components

Writing a MapReduce Program in Java

* Basic MapReduce API Concepts
* Writing MapReduce Drivers, Mappers, and Reducers in Java
* Speeding Up Hadoop Development by Using Eclipse
* Differences Between the Old and New MapReduce APIs

Writing a MapReduce Program Using Streaming

* Writing Mappers and Reducers with the Streaming API

Unit Testing MapReduce Programs

* Unit Testing
* The JUnit and MRUnit Testing Frameworks
* Writing Unit Tests with MRUnit
* Running Unit Tests

Delving Deeper into the Hadoop API

* Using the ToolRunner Class
* Setting Up and Tearing Down Mappers and Reducers
* Decreasing the Amount of Intermediate Data with Combiners
* Accessing HDFS Programmatically
* Using the Distributed Cache
* Using the Hadoop API's Library of Mappers, Reducers, and Partitioners

Practical Development Tips and Techniques

* Strategies for Debugging MapReduce Code
* Testing MapReduce Code Locally by Using LocalJobRunner
* Writing and Viewing Log Files
* Retrieving Job Information with Counters
* Reusing Objects
* Creating Map-Only MapReduce Jobs

Partitioners and Reducers

* How Partitioners and Reducers Work Together
* Determining the Optimal Number of Reducers for a Job
* Writing Custom Partitioners

Data Input and Output

* Creating Custom Writable and WritableComparable Implementations
* Saving Binary Data Using SequenceFile and Avro Data Files
* Issues to Consider When Using File Compression
* Implementing Custom InputFormats and OutputFormats

Common MapReduce Algorithms

* Sorting and Searching Large Data Sets
* Indexing Data
* Computing Term Frequency-Inverse Document Frequency (TF-IDF)
* Calculating Word Co-Occurrence
* Performing a Secondary Sort

Joining Data Sets in MapReduce Jobs

* Writing a Map-Side Join
* Writing a Reduce-Side Join

Integrating Hadoop into the Enterprise Workflow

* Integrating Hadoop into an Existing Enterprise
* Loading Data from an RDBMS into HDFS by Using Sqoop
* Managing Real-Time Data Using Flume
* Accessing HDFS from Legacy Systems with FuseDFS and HttpFS

An Introduction to Hive, Impala, and Pig

* The Motivation for Hive, Impala, and Pig
* Hive Overview
* Impala Overview
* Pig Overview
* Choosing Between Hive, Impala, and Pig

An Introduction to Oozie

* Introduction to Oozie
* Creating Oozie Workflows

Conclusion



Note: actual start dates may be adjusted; please follow announcements on the official 青藍(lán)咨詢 WeChat account or contact a course advisor.



【Contact 青藍(lán)咨詢】

Address: Room 309, 3/F, Tower B, TCL Building, No. 06 Gaoxin South 1st Road, Nanshan District, Shenzhen (Bus stop: Dachong; Metro: Line 1, Gaoxinyuan Station, Exit C)

Postal code: 518057

Tel: 0755-86950769

Email: peixun@shzhchina.com

Website: http://www.mycalorietracker.com

Scan the QR code to follow us for more course information.