Developing with Spark for Big Data (TTSK7505)

*Looking for a flexible schedule (after hours or weekends)? Please call or email us: 858-208-4141 or sales@ccslearningacademy.com.

Student financing options are available.
Looking for group training? Contact us

Course Description:

Learn advanced Big Data and Spark skills to access disparate databases, integrate Machine Learning (ML), and establish streaming solutions.

Apache Spark is an important component in the Hadoop Ecosystem as a cluster computing engine used for Big Data. Building on top of the Hadoop YARN and HDFS ecosystem, Spark offers faster in-memory processing for computing tasks when compared to Map/Reduce. It can be programmed in Java, Scala, Python, and R along with SQL-based front-ends.

With advanced libraries such as Mahout and MLlib for machine learning, GraphX or Neo4j for rich graph data processing, and access to other NoSQL data stores, rule engines, and components, Spark is a linchpin in modern Big Data and Data Science computing.

This course introduces you to enterprise-grade Spark programming and the components needed to craft complete data science solutions. You’ll learn core big data and Spark development techniques and industry practices. This course is offered in Java and, with some alterations, in Python, Scala, and R.

Format

Instructor-led

Topic

Length

Course Outline

Spark Overview

  • Hadoop Ecosystem
  • Hadoop YARN vs. Mesos
  • Spark vs. Map/Reduce
  • Spark with Map/Reduce: Lambda Architecture
  • Spark in the Enterprise Data Science Architecture

Spark Component Overview

  • Spark Shell
  • RDDs: Resilient Distributed Datasets
  • DataFrames
  • Spark 2 Unified DataFrames
  • Spark Sessions
  • Functional Programming
  • Spark SQL
  • MLlib
  • Structured Streaming
  • SparkR
  • Spark and Python

RDDs: Resilient Distributed Datasets

  • Coding with RDDs
  • Transformations
  • Actions
  • Lazy Evaluation and Optimization
  • RDDs in Map/Reduce
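
The distinction between transformations, actions, and lazy evaluation listed above can be sketched without a cluster. The snippet below is a toy illustration in plain Python, not Spark's API: the `LazyDataset` class and its methods are hypothetical names chosen to mirror the RDD vocabulary. Transformations only record work; nothing runs until an action is called.

```python
from typing import Callable, Iterable, List, Optional


class LazyDataset:
    """Toy stand-in for an RDD (hypothetical, not part of Spark)."""

    def __init__(self, source: Iterable, steps: Optional[List[Callable]] = None):
        self._source = source
        self._steps = steps or []  # recorded transformations, not yet executed

    def map(self, fn: Callable) -> "LazyDataset":
        # Transformation: returns a new dataset with one more recorded step.
        return LazyDataset(self._source, self._steps + [lambda it: map(fn, it)])

    def filter(self, pred: Callable) -> "LazyDataset":
        # Transformation: also lazy, nothing is evaluated here.
        return LazyDataset(self._source, self._steps + [lambda it: filter(pred, it)])

    def collect(self) -> list:
        # Action: runs the whole recorded pipeline and materializes the result.
        it = iter(self._source)
        for step in self._steps:
            it = step(it)
        return list(it)


calls = []  # records every element the mapping function actually sees


def double(x: int) -> int:
    calls.append(x)
    return x * 2


ds = LazyDataset(range(5)).map(double).filter(lambda x: x > 4)
calls_before_action = len(calls)  # still 0: no transformation has run yet
result = ds.collect()             # the action triggers the computation
print(calls_before_action, result, len(calls))
```

In Spark itself the recorded pipeline also gives the engine a chance to optimize and distribute the work before running it, which this single-process sketch does not attempt.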

DataFrames

  • RDDs vs. DataFrames
  • Unified DataFrames (UDF) in Spark 2.0
  • Partitioning

Spark Applications

  • Spark Sessions
  • Running Applications
  • Logging

DataFrame Persistence

  • RDD Persistence
  • DataFrame and Unified DataFrame Persistence

Spark Streaming

  • Streaming Overview
  • Streams
  • Structured Streaming
  • DStreams and Apache Kafka

Accessing NoSQL Data

  • Ingesting data
  • Parquet Files
  • Relational Databases
  • Graph Databases (Neo4j and GraphX)
  • Interacting with Hive
  • Accessing Cassandra Data
  • Document Databases (MongoDB and CouchDB)

Enterprise Integration

  • Map/Reduce and Lambda Integration
  • Camel Integration
  • Drools and Spark

Algorithms and Patterns

  • MLlib and Mahout
  • Classification
  • Clustering
  • Decision Trees
  • Decompositions
  • Pipelines
  • Spark Packages

Spark SQL

  • Spark SQL
  • SQL and DataFrames
  • Spark SQL and Hive
  • Spark SQL and JDBC

GraphX

  • Graph APIs
  • GraphX
  • ETL in GraphX
  • Exploratory Analysis
  • Graph computation
  • Pregel API Overview
  • GraphX Algorithms
  • Neo4j as an alternative

Alternate Languages

  • Using Web Notebooks (Zeppelin and Jupyter)
  • R on Spark
  • Python on Spark
  • Scala on Spark

Clustering Spark for Developers

  • Parallelizing Spark Applications
  • Clustering concerns for Developers

Performance and Tuning

  • Monitoring Spark Performance
  • Tuning Memory
  • Tuning CPU
  • Tuning Data Locality
  • Troubleshooting

Target Audience

Experienced Developers and Architects who seek proficiency in working with Apache Spark in an enterprise data environment.

What You'll Learn

Join an engaging hands-on learning environment, where you’ll learn:

  • The essentials of Spark architecture and applications
  • How to execute Spark programs
  • How to create and manipulate both RDDs (Resilient Distributed Datasets) and UDFs (Unified DataFrames)
  • How to persist and restore DataFrames
  • Essential NoSQL access
  • How to integrate machine learning into Spark applications
  • How to use Spark Streaming and Kafka to create streaming applications

Prerequisites

Before attending this course, you should have:

  • Java programming experience
  • Python programming experience
  • Basic understanding of SQL
  • Comfort with navigating the Linux command line
  • Basic knowledge of Linux editors (such as VI/nano) for editing code

Inclusions

With CCS Learning Academy, you'll receive:

  • Instructor-led training
  • Training seminar student handbook
  • Collaboration with classmates (currently not available for self-paced courses)
  • Real-world learning scenarios and activities
  • Exam scheduling support*
  • Job placement assistance for the first 12 months after course completion
  • This course is eligible for CCS Learning Academy's Learn and Earn program: get a tuition refund of up to 50% if you are placed in a job through CCS Global Tech's Placement Division*
  • Government and private pricing available*

*For more details, call 858-208-4141 or email: training@ccslearningacademy.com; sales@ccslearningacademy.com

 
