Data Science & Big Data Overview: Tools, Tech & Modern Roles in the Data-Driven Enterprise (TTDS6000)
About This Course
The Data Science & Big Data Overview | Tools, Tech & Modern Roles in the Data-Driven Enterprise is an introductory-level course that familiarizes the entire multidisciplinary data science team with the many evolving, related terms in the field, with a focus on Big Data, Data Science, Predictive Analytics, Artificial Intelligence, Data Mining, and Data Warehousing. The overview explores the current state of the art and science, the major components of a modern data science infrastructure, and team roles and responsibilities, and it sets realistic expectations for the outcomes of your investment. The goal of this course is to give students a conversant, baseline understanding of core concepts and technologies.
Course Outline
Please note that this list of topics is based on our standard course offering, evolved from typical industry uses and trends. We will work with you to tune this course and level of coverage to target the skills you need most.
- Foundations
- Grids and Virtualization
- Service-Oriented Architecture
- Enterprise Service Bus
- Enterprise Message Bus
- The Cloud
- The Hadoop Ecosystem
- HDFS: Hadoop Distributed File System
- Resource Negotiators: YARN, Mesos, and Spark; ZooKeeper
- Hadoop MapReduce
- Spark
- Hadoop Ecosystem Distributions: Cloudera, Hortonworks, Open Source
- Big Data, NoSQL, and ETL
- Big Data vs. RDBMS
- NoSQL: Not Only SQL
- Relational Databases: Oracle, MariaDB, DB2, SQL Server, PostgreSQL
- Key/Value Databases: JBoss Infinispan, Terracotta, Dynamo, Voldemort
- Columnar Databases: Cassandra, HBase, Bigtable
- Document Databases: MongoDB, CouchDB/Couchbase
- Graph Databases: Giraph, Neo4j, GraphX
- Apache Hive
- Common Data Formats
- Leveraging SQL and SQL variants
- ETL: Extract, Transform, Load
- Data Ingestion, Transformation, and Loading
- Exporting Data
- Sqoop, Flume, Informatica, and other tools
- Enterprise Integration Patterns and Message Busses
- Enterprise Integration Patterns: Apache Camel and Spring Integration
- Enterprise Message Busses: Apache Kafka, ActiveMQ, and other tools
- An Overview of Developing in Hadoop Ecosystem
- Languages: R, Python, Java, Scala, Pig, and BPMN
- Libraries and Frameworks
- Development, Testing, and Deployment
- Exploring Artificial Intelligence and Business Systems
- Artificial Intelligence: Myths, Legends, and Reality
- The Math
- Statistics
- Probability
- Clustering Algorithms, Mahout, MLlib, scikit-learn, and MADlib
- Business Rule Systems: Drools, JRules, Pegasus
- The Modern Data Team
- Agile Data Science
- NoSQL Data Architects and Administrators
- Developers
- Grid Administrators
- Business and Data Analysts
- Management
- Evolving your Team
- Growing your Infrastructure
Learning Objectives
Pre-requisites
- Attendees should have prior exposure to Enterprise Information Technology, as well as familiarity with Relational Databases.
Target Audience
- This introductory-level primer course is an overview intended for Business Analysts, Data Analysts, Data Architects, DBAs, Network (Grid) Administrators, Developers, or anyone else in the data science realm who needs a baseline understanding of some of the core areas of modern Data Science technologies, practices, and available tools.