Course Outline
Section 1: Data Management in HDFS
- Various Data Formats (JSON / Avro / Parquet)
- Compression Schemes
- Data Masking
- Labs : Analyzing different data formats; enabling compression
Section 2: Advanced Pig
- User-defined Functions
- Introduction to Pig Libraries (ElephantBird / Data-Fu)
- Loading Complex Structured Data using Pig
- Pig Tuning
- Labs : advanced pig scripting, parsing complex data types
Section 3 : Advanced Hive
- User-defined Functions
- Compressed Tables
- Hive Performance Tuning
- Labs : creating compressed tables, evaluating table formats and configuration
Section 4 : Advanced HBase
- Advanced Schema Modelling
- Compression
- Bulk Data Ingest
- Wide-table / Tall-table comparison
- HBase and Pig
- HBase and Hive
- HBase Performance Tuning
- Labs : tuning HBase; accessing HBase data from Pig & Hive; Using Phoenix for data modeling
Requirements
- comfortable with Java programming language (most programming exercises are in java)
- comfortable in Linux environment (be able to navigate Linux command line, edit files using vi / nano)
- a working knowledge of Hadoop.
Lab environment
Zero Install: There is no need to install hadoop software on students’ machines! A working hadoop cluster will be provided for students.
Students will need the following
- an SSH client (Linux and Mac already have ssh clients, for Windows Putty is recommended)
- a browser to access the cluster. We recommend Firefox browser
Testimonials (5)
Trainer's preparation & organization, and quality of materials provided on github.
Mateusz Rek - MicroStrategy Poland Sp. z o.o.
Course - Impala for Business Intelligence
The VM I liked very much The Teacher was very knowledgeable regarding the topic as well as other topics, he was very nice and friendly I liked the facility in Dubai.
Safar Alqahtani - Elm Information Security
Course - Big Data Analytics in Health
I thought he did a great job of tailoring the experience to the audience. This class is mostly designed to cover data analysis with HIVE, but me and my co-worker are doing HIVE administration with no real data analytics responsibilities.
ian reif - Franchise Tax Board
Course - Data Analysis with Hive/HiveQL
I genuinely enjoyed the many hands-on sessions.
Jacek Pieczątka
Course - Administrator Training for Apache Hadoop
The fact that all the data and software was ready to use on an already prepared VM, provided by the trainer in external disks.