Contents

Among other things, the course includes:

Data Science Essentials

To carry out data science, you need to gather data. Extracting, parsing, and scraping data from various sources, both internal and external, is a critical first step in the data science pipeline. In this course, you'll explore examples of practical tools for data gathering.
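
As a minimal sketch of the parsing step, assuming the source is an HTML page that has already been fetched as a string, Python's standard-library `html.parser` can pull out link targets. The `LinkExtractor` class, the `extract_links` helper, and the sample HTML are hypothetical illustrations, not tools taken from the course.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the href attribute of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag's attributes
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html):
    """Parse an HTML string and return the link targets in document order."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


sample_html = """
<html><body>
  <a href="https://example.com/report.csv">Report</a>
  <p>No link here</p>
  <a href="/data/2023.json">Data</a>
</body></html>
"""

print(extract_links(sample_html))
```

In a real scraper the HTML string would come from an HTTP request (for example via `urllib.request.urlopen`), and the extracted links could then be fetched and parsed in turn.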

Implementing Data Access & Governance Policies

This course explores how a DAG (Data Access Governance), a structured data access framework, can reduce the likelihood of data security breaches. Risk and data safety compliance addresses how to identify threats against an organization's digital data assets. You will learn about legal compliance, industry regulations, and compliance with organizational security policies. You will learn how IAM (identity and access management) relates to users, devices, and software components. Learners will then explore how the PoLP (Principle of Least Privilege) determines which permissions users are granted, and to which data. You will learn to create an IAM user and group in AWS (Amazon Web Services), and how to assign file system permissions on a Windows server in accordance with the principle of least privilege. Finally, you will examine how vulnerability assessments are used to identify security weaknesses, and different types of preventive security controls, such as firewalls and malware scanning.
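
The principle of least privilege can be sketched as a deny-by-default check: each role is granted only the specific permissions it needs, and anything not explicitly granted is refused. This is a hypothetical Python illustration with invented role and resource names; it is not how AWS IAM policies or Windows file system permissions are actually expressed.

```python
# Hypothetical role-to-permission mapping: each role is granted only the
# permissions it needs, and anything not listed is denied by default.
ROLE_PERMISSIONS = {
    "analyst": {("sales_data", "read")},
    "data_engineer": {("sales_data", "read"), ("sales_data", "write")},
    "auditor": {("access_logs", "read")},
}


def is_allowed(roles, resource, action):
    """Return True only if one of the user's roles explicitly grants the action."""
    return any((resource, action) in ROLE_PERMISSIONS.get(role, set()) for role in roles)


print(is_allowed(["analyst"], "sales_data", "read"))   # granted explicitly
print(is_allowed(["analyst"], "sales_data", "write"))  # not granted, so denied
print(is_allowed(["intern"], "sales_data", "read"))    # unknown role, so denied
```

The key design choice is that an unknown role or an unlisted action falls through to "deny", so new permissions must be added deliberately rather than removed after the fact.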

Streaming Data Architectures

Learn the fundamentals of streaming data with Apache Spark. During this course, you will discover the differences between batch and streaming data and observe the types of streaming data sources. Learn how to process streaming data, transform the stream, and materialize the results, and how to decouple a streaming application from its data sources with a message transport. Next, learn about the techniques used in Spark 1.x to work with streaming data and how they contrast with batch processing; how Structured Streaming in Spark 2.x eases the task of stream processing for the application developer; and how stream processing works in both Spark 1.x and 2.x. Finally, learn how triggers can be set up to process streaming data periodically, and the key aspects of working with Structured Streaming in Spark.
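
Spark's Structured Streaming models a stream as an unbounded table and, by default, processes it incrementally as a series of micro-batches. The plain-Python sketch below only mimics that idea: it uses a count-based batch size as a stand-in for a trigger (real Spark triggers are typically time-based, such as processing-time intervals), keeps a running aggregate as state, and records the "materialized" result after each batch. None of the names here are Spark APIs.

```python
from collections import Counter
from itertools import islice


def micro_batches(stream, batch_size):
    """Yield successive lists of up to batch_size events from an iterator."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def run_streaming_count(stream, batch_size):
    """Maintain running counts across batches and snapshot the result after each one."""
    totals = Counter()
    snapshots = []
    for batch in micro_batches(stream, batch_size):
        totals.update(batch)             # aggregate only the newly arrived events
        snapshots.append(dict(totals))   # "materialize" the current result table
    return snapshots


events = ["click", "view", "click", "buy", "view", "click", "view"]
for snapshot in run_streaming_count(events, batch_size=3):
    print(snapshot)
```

Each snapshot corresponds roughly to what a "complete" output mode would emit after a trigger fires: the whole updated result, not just the latest batch.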

R for Data Science

R is a programming language for statistical computing and graphics, and knowing it is an essential skill. It is widely used by data science professionals across industries and fields, not only to create reproducible, high-quality analyses but also to take advantage of R's strong graphics and charting capabilities. In this 11-video Skillsoft Aspire course, you will explore the fundamental data structures used in R: vectors, lists, matrices, factors, and data frames. Key concepts in this course include creating, manipulating, and performing operations on vectors; sorting vectors; and using lists, stepping through example code line by line with the run current line command. You will also examine creating matrices and performing matrix operations; creating factors and data frames; and performing data frame operations.

Creating Real Time Dashboards

To become a data science expert, you must master the art of data visualization. This 12-video course explores how to create and use real-time dashboards with Tableau. Begin with an introduction to real-time dashboards and the differences between real-time and streaming data. Next, take a look at different cloud data sources. Learn how to build a dashboard in Tableau and update it in real time. Discover how to organize your dashboard by adding objects and adjusting the layout. Then customize and format different aspects of dashboards in Tableau, and add interactivity by using actions such as filtering. Look at creating a dashboard starter, a prebuilt dashboard that can be used with Tableau Online to connect to cloud data sources. Add extensions to your dashboard by using the Tableau Extensions API (application programming interface). Explore how to put together a simple dashboard story, which consists of a sequence of sheets, each of which is called a story point, and how to share a dashboard in Tableau. In the concluding exercise, learners create a dashboard starter.

Getting Started with Hive

This 9-video Skillsoft Aspire course focuses solely on theory and involves no programming or query execution. Learners begin by examining what a data warehouse is and how it differs from a relational database. This distinction matters because Apache Hive is primarily a data warehouse, even though it provides a SQL-like interface for querying data. Hive facilitates work on very large data sets stored as files in the Hadoop Distributed File System, and lets users operate on the data in these files in parallel by transforming Hive queries into MapReduce operations. Next, you will hear about the types of data and operations that data warehouses and relational databases handle, before moving on to the basic components of the Hadoop architecture. Finally, the course discusses the features of Hive that make it popular among data analysts. The concluding exercise asks learners to recall the differences between online transaction processing and online analytical processing systems, identify Hadoop's three major components, list Hadoop offerings on three major cloud platforms (AWS, Microsoft Azure, and Google Cloud Platform), and list the benefits of Hive for data analysts.

Statistics for Data Science #1

Along the career path to data science, a fundamental understanding of statistics and modeling is required. The goal of all modeling is to generalize as well as possible from a sample to the population as a whole. In this 10-video Skillsoft Aspire course, learners explore the first step in this process. Key concepts covered here include the objectives of descriptive and inferential statistics and how to distinguish between them; the concepts of population and sample and how to distinguish between them; and the objectives of probability and non-probability sampling and how to distinguish between them. Learn to define the average (mean) of a data set and its properties; the median and mode of a data set and their properties; and the range of a data set and its properties. Then study the interquartile range of a data set and its properties; the variance and standard deviation of a data set and their properties; how to differentiate between inferential and descriptive statistics; the two most important types of descriptive statistics; and the formula for standard deviation.
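
The descriptive statistics listed above can be computed with Python's standard `statistics` module. This is a sketch on a small hypothetical data set; the sample standard deviation is also computed directly from the formula s = sqrt(sum((x - mean)^2) / (n - 1)) so it can be compared with the library result. Note that tools use different quartile conventions, so the interquartile range may differ slightly from what R or a spreadsheet reports.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)
data_range = max(data) - min(data)

# Quartiles via Python's default "exclusive" method; other tools may use
# different quartile conventions and give slightly different values.
q1, _, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Sample standard deviation from the formula s = sqrt(sum((x - mean)^2) / (n - 1))
squared_deviations = sum((x - mean) ** 2 for x in data)
sample_sd = math.sqrt(squared_deviations / (len(data) - 1))

# Population standard deviation divides by n instead of n - 1
population_sd = math.sqrt(squared_deviations / len(data))

print(mean, median, mode, data_range, iqr)
print(sample_sd, statistics.stdev(data))
print(population_sd, statistics.pstdev(data))
```

The choice between n and n - 1 in the denominator is the practical face of the sample-versus-population distinction: dividing by n - 1 corrects the bias that arises when a sample's own mean is used in place of the unknown population mean.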

MongoDB for Data Wrangling

This course explores how to use MongoDB, a cross-platform, document-oriented database that has become a popular tool for data wrangling and data science. MongoDB is a NoSQL ("not only SQL") database that stores JSON-like (JavaScript Object Notation) documents with optional schemas. One advantage of MongoDB is the flexibility of how it stores data. You will learn how to perform MongoDB actions related to data wrangling from Python with the PyMongo library, including the basic CRUD (create, read, update, delete) operations on MongoDB documents. Next, learn how to use the find operation to select documents from a collection, and how to use query operators to match document criteria. You will learn how to select documents using a specified criterion, similar to a WHERE clause in an SQL statement. Finally, this course demonstrates how to use the mongoimport tool to import data from JSON or CSV (comma-separated values) files, and mongoexport to export data from a MongoDB collection to JSON or CSV.
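
To illustrate what a query filter such as `{"city": "Aarhus", "age": {"$gt": 30}}` selects (the same filter document is what you would pass to PyMongo's `collection.find()`), the sketch below evaluates a small subset of MongoDB's comparison operators over in-memory Python dictionaries. The `matches` helper is hypothetical and only approximates MongoDB's semantics (for example, it treats missing fields as non-matching); it is not part of PyMongo and needs no database server.

```python
import operator

# A small, hypothetical subset of MongoDB comparison operators
OPERATORS = {
    "$eq": operator.eq,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$in": lambda value, options: value in options,
}


def matches(document, query):
    """Return True if every field condition in query holds for document (implicit AND)."""
    for field, condition in query.items():
        if field not in document:
            return False
        value = document[field]
        if isinstance(condition, dict):
            # e.g. {"$gt": 30, "$lt": 50}: all operator conditions must hold
            for op, operand in condition.items():
                if not OPERATORS[op](value, operand):
                    return False
        elif value != condition:
            # a plain value means equality, like WHERE field = value in SQL
            return False
    return True


people = [
    {"name": "Ana", "age": 28, "city": "Aarhus"},
    {"name": "Ben", "age": 35, "city": "Odense"},
    {"name": "Cai", "age": 42, "city": "Aarhus"},
]

# Roughly: SELECT * FROM people WHERE city = 'Aarhus' AND age > 30
query = {"city": "Aarhus", "age": {"$gt": 30}}
print([p["name"] for p in people if matches(p, query)])
```

Listing several fields in one filter document combines them with AND, which is why this filter behaves like a two-condition WHERE clause.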

Powering Recommendation Engines

This 13-video course explores recommendation engines, systems that suggest items or products users may be interested in, based on their previous purchase, search, and behavior histories. They are used in many industries to help users find or explore products and content, for example movies, news, insurance, and a myriad of other products and services. Learners will examine the three main types of recommendation systems: item-based, user-based (collaborative), and content-based. The course next examines how to collect data for learning, training, and evaluation. You will learn how to use RStudio, an open-source IDE (integrated development environment), to import, filter, and massage data into data sets. Learners will create an R function that scores an item based on other users' ratings and similarity scores. You will learn to use R to create a function called compareUsers to compute an item-to-item similarity or content score. Finally, learn to validate and score results by using RMSE (root mean square error).
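
A user-based recommender of the kind described above can be sketched in a few lines: compute how similar the target user is to each other user, predict a rating as the similarity-weighted average of the others' ratings, and measure accuracy with RMSE. This Python sketch uses cosine similarity over co-rated items as an assumed similarity measure; the `similarity`, `predict_rating`, and `rmse` functions and the ratings data are hypothetical and are not the course's R code (including its compareUsers function).

```python
import math

# Hypothetical user -> {item: rating} data
ratings = {
    "alice": {"matrix": 5, "inception": 4, "titanic": 1},
    "bob": {"matrix": 4, "inception": 5, "titanic": 2, "up": 4},
    "carol": {"matrix": 1, "inception": 2, "titanic": 5, "up": 2},
}


def similarity(user_a, user_b):
    """Cosine similarity computed over the items both users have rated."""
    common = set(ratings[user_a]) & set(ratings[user_b])
    if not common:
        return 0.0
    dot = sum(ratings[user_a][i] * ratings[user_b][i] for i in common)
    norm_a = math.sqrt(sum(ratings[user_a][i] ** 2 for i in common))
    norm_b = math.sqrt(sum(ratings[user_b][i] ** 2 for i in common))
    return dot / (norm_a * norm_b)


def predict_rating(user, item):
    """Similarity-weighted average of other users' ratings for item (None if nobody rated it)."""
    weighted_sum = 0.0
    total_weight = 0.0
    for other, their_ratings in ratings.items():
        if other == user or item not in their_ratings:
            continue
        weight = similarity(user, other)
        weighted_sum += weight * their_ratings[item]
        total_weight += abs(weight)
    return weighted_sum / total_weight if total_weight else None


def rmse(predicted, actual):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


print(round(predict_rating("alice", "up"), 2))
print(rmse([4.0, 2.5], [4, 2]))
```

Because all raw ratings are positive, cosine similarity stays positive even for users with opposite tastes, so carol still pulls alice's prediction downward; mean-centering each user's ratings (as in Pearson correlation) is a common refinement.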

Creating Data APIs for Customers


Deploying Data Tools for All Users

Explore a variety of new data science tools available today, their different uses, and the benefits and challenges of deploying them in this 12-video course. First, examine a data science platform, the nucleus of technologies used to perform data science tasks. You will then explore the analysis process of inspecting, cleaning, transforming, and modeling data. Next, the course surveys integrating and exploring data, coding and building models using that data, deploying the models to production, and delivering results through applications or generated reports. You will see why a good data science platform should be flexible and scalable, and should combine multiple features and capabilities that effectively centralize data science efforts. You will learn the six sequential steps of a typical data science workflow, from defining the project objective to reporting the results. Finally, explore DevOps, an approach that brings together people, processes, and infrastructure so that developers and IT operations can work together in harmony, and its typical functions, including integration, testing, packaging, and deployment.

Trifacta for Data Wrangling

Data wrangling, an increasingly popular practice among today's top firms, has become more complex as data become more unstructured and varied in their sources. In this 13-video Skillsoft Aspire course, you will learn how to simplify the task by organizing and cleaning disparate data and presenting it in the best possible format with Trifacta, which accelerates data wrangling to make data scientists more productive. Learn to reshape, look up, and pivot data. Explore essential methods for wrangling data, including how to use Trifacta to standardize, format, filter, and extract data. Other key topics include splitting and merging columns, using conditional aggregation, applying transforms to reshape data, and combining two data sets into one with join operations. In the concluding exercise, learners load a data set into Trifacta, replace any missing values if necessary, apply a row filter operation, and use a group by operation with an aggregate function.
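
Trifacta performs these steps through its visual interface. The plain-Python sketch below only mirrors the logic of the concluding exercise (replace missing values, filter rows, then group and aggregate) on hypothetical sales data; it does not use any Trifacta API, and the replacement values chosen for missing data are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical sales records; None marks a missing value
rows = [
    {"region": "North", "product": "A", "units": 10},
    {"region": "North", "product": "B", "units": None},
    {"region": "South", "product": "A", "units": 7},
    {"region": "South", "product": "B", "units": 3},
    {"region": None, "product": "A", "units": 5},
]

# 1. Replace missing values: unknown region -> "Unknown", missing units -> 0
cleaned = [
    {
        "region": row["region"] if row["region"] is not None else "Unknown",
        "product": row["product"],
        "units": row["units"] if row["units"] is not None else 0,
    }
    for row in rows
]

# 2. Row filter: keep only rows with at least one unit sold
filtered = [row for row in cleaned if row["units"] > 0]

# 3. Group by region and aggregate units with a sum and a row count
totals = defaultdict(lambda: {"total_units": 0, "orders": 0})
for row in filtered:
    group = totals[row["region"]]
    group["total_units"] += row["units"]
    group["orders"] += 1

print(dict(totals))
```

The order of the steps matters: because missing units are replaced with 0 before filtering, those rows are then dropped by the filter and never reach the aggregation.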

Balancing the Four Vs of Data

The four Vs of big data (volume, variety, velocity, and veracity) are a popular paradigm for extracting meaning and value from massive data sets. In this course, learners discover the four Vs, their purpose and uses, and how to extract value by using them. Key concepts covered here include the four Vs and their roles in big data analytics, the overall principle behind them, and the ways in which they relate to each other. Next, study variety and data structure, and validity and volatility, and how each relates to the four Vs; then learn how the four Vs should be balanced in order to implement a successful big data strategy. Learners are shown various use cases of big data analytics and how the four Vs can be leveraged to extract value from big data. Finally, review the four Vs of big data analytics, their differences, and how balance can be achieved.

Cleaning Data in R

R is a programming language for statistical computing and graphics that is essential for data science. In this 13-video course, learners explore essential methods for wrangling and cleaning data with R. Begin by recognizing types of unclean data and the criteria for ensuring data quality. Next, learners see how to fetch a JSON (JavaScript Object Notation) document over HTTP and load the data into a dplyr table. Learn how to load multiple sheets from an Excel document and how to handle common errors encountered when reading CSV (comma-separated values) data. Read data from a relational database with a SQL (structured query language) query. Explore joining tabular data by combining two related data sets with a join operation, and spreading data, that is, reshaping tabular data by spreading values from rows into columns. Look at summarizing data by applying a summary function with dplyr; imputing data by using mean imputation to replace missing values; and extracting matches by using a regular expression and data wrangling tools from the tidyverse packages. The closing exercise practices data wrangling functions in R.
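
The course performs these steps in R with dplyr; as a language-neutral sketch of two of them, the Python example below reads rows from a relational database with a SQL query (using an in-memory SQLite database as a hypothetical stand-in for a real source) and then applies mean imputation to the missing values. The table and column names are invented for illustration.

```python
import sqlite3
import statistics

# Hypothetical in-memory database standing in for a real relational source
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor TEXT, temperature REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("s1", 21.5), ("s2", None), ("s3", 19.0), ("s4", 23.0), ("s5", None)],
)

# Read data with a SQL query (SQL NULL values come back as None)
rows = conn.execute("SELECT sensor, temperature FROM readings ORDER BY sensor").fetchall()
conn.close()

# Mean imputation: replace each missing temperature with the mean of the observed ones
observed = [temp for _, temp in rows if temp is not None]
mean_temp = statistics.mean(observed)
imputed = [(sensor, temp if temp is not None else mean_temp) for sensor, temp in rows]

print(imputed)
```

Mean imputation keeps the mean of the observed values unchanged but understates their spread, so it is best treated as a simple baseline rather than a general-purpose fix.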

Data Research in Practice

To master data science, you must learn the techniques of data research. In this 10-video course, learners will discover how to apply essential data research techniques, including JMP measurement, and how to evaluate data by using descriptive and inferential methods. Begin by recalling the fundamental concepts of data research that apply to data inference. Then learners look at the steps for drawing conclusions from data hypotheses. Examine the values, variables, and observations associated with data from the perspective of quantitative and classification variables. Next, view the different scales of standard measurement, with a critical comparison between generic and JMP models. Then learn about the key features of nonexperimental and experimental research approaches in real-world scenarios. Compare descriptive and inferential statistical analysis, and explore common uses of different types of inferential tests. Finally, look at the approaches and steps involved in implementing clinical data research and sales data research in real-world scenarios. The concluding exercise involves implementing data research.

Raw Data to Insights

Explore how statistical analysis can turn raw data into insights, and then examine how to use the data to improve business intelligence, in this 10-video course. Learn how to scrutinize and perform analytics on the collected data. The course explores several approaches for identifying value and insights in data by using various standard and intuitive principles, including data exploration and data ingestion, with practical implementations in R. First, you will learn how to detect outliers by using R, and how to compare simple linear regression models with and without outliers to improve the quality of the data. Because today's data come in diverse formats, in large volumes, and at high velocity, the course next demonstrates how to use a variety of technologies to ingest data: Apache Kafka, Apache NiFi, Apache Sqoop, and Wavefront (a cloud-based metrics monitoring and analytics platform). Finally, you will learn how these tools can help users with data extraction, scalability, integration support, and security.

Enterprise Value Through Data

Examine data-driven organizations, how they use data science, and the importance of prioritizing data in this 13-video course. Data-driven organizations are committed to gathering and using, holistically, the data a business needs to gain a competitive advantage. You will explore how to create a data-driven culture within an organization by involving management and training employees. You will examine analytic maturity as a metric for measuring an organization's progress. Next, learn how to analyze data quality; why it is measured in relative rather than absolute terms; and how it should be measured, weighed, and appropriately applied to determine the value or quality of a data set. You will learn the potential business effects of missing data and the three main reasons why data are missing from a collection: missing at random, missing due to data collection, and missing not at random. This course explores the wide range of impacts of duplicate data. You will examine how truncated or censored data can produce inconsistent results. Finally, you will explore data provenance and record-keeping.

Getting Started with Hadoop

In this course, learners will explore the theory behind big data analysis using Hadoop, and how MapReduce enables parallel processing of large data sets distributed across a cluster of machines. Begin with an introduction to big data and the various sources and characteristics of data available today. Look at the challenges involved in processing big data and the options available to address them. Next, get a brief overview of Hadoop, its role in processing big data, and the functions of its components, such as the Hadoop Distributed File System (HDFS), MapReduce, and YARN (Yet Another Resource Negotiator). Explore how Hadoop's MapReduce framework processes data in parallel on a cluster of machines. Recall the steps involved in building a MapReduce application and the specifics of the Map phase in processing each row of the input file's data. Recognize the functions of the Shuffle and Reduce phases in sorting and interpreting the output of the Map phase to produce meaningful output. To conclude, complete an exercise on the fundamentals of Hadoop and MapReduce.
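
The Map, Shuffle, and Reduce phases can be illustrated with the classic word-count example, simulated in a single Python process. In real Hadoop, map tasks run in parallel on different nodes and the framework partitions, sorts, and transfers map output to reducers; this sketch only shows the data flow between the phases, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all values by key; keys are sorted before reducing, as in Hadoop."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


lines = [
    "big data needs parallel processing",
    "Hadoop processes big data",
]

mapped = chain.from_iterable(map_phase(line) for line in lines)
grouped = shuffle_phase(mapped)
result = dict(reduce_phase(key, values) for key, values in grouped.items())
print(result)
```

Because each line is mapped independently and each key is reduced independently, both phases can be spread across many machines; only the shuffle requires moving data between them.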

Data Engineering Fundamentals

Data engineering is the area of data science that focuses on the practical applications of data collection and analysis. This 12-video course helps learners explore distributed systems, batch versus in-memory processing, NoSQL uses, and the various tools available for data management, big data, and the ETL (extract, transform, and load) process. Begin with an overview of distributed systems from a data perspective. Then look at the differences between batch and in-memory processing. Learn about NoSQL stores and their uses, and the tools available for data management. Explore ETL: what it is, the process, and the different tools available. Learn to use Talend Open Studio to showcase the ETL concept. Next, examine data modeling and create a data model in Talend Open Studio. Explore the hierarchy of needs when working with AI and machine learning. In another tutorial, learn how to create a data partition. Then move on to data engineering best practices, with a look at approaches to building and using data reporting tools. Conclude with an exercise in creating a data model.
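
The course demonstrates ETL with Talend Open Studio, a graphical tool. As a code-level sketch of the same three stages, the Python example below extracts records from a hypothetical CSV export, transforms them (trims names, converts currencies, rejects rows with invalid amounts), and loads them into an in-memory SQLite table. The 7.46 DKK-per-EUR exchange rate and all field names are hypothetical placeholders.

```python
import csv
import io
import sqlite3

# Extract: hypothetical CSV export standing in for a source system
raw_csv = """order_id,customer,amount,currency
1, Alice ,100.50,EUR
2,bob,not_a_number,EUR
3,Carol,75,DKK
"""


def extract(text):
    """Read CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(records, dkk_per_eur=7.46):
    """Clean names, convert amounts to EUR, and drop rows that fail validation."""
    cleaned = []
    for rec in records:
        try:
            amount = float(rec["amount"])
        except ValueError:
            continue  # reject rows with unparseable amounts
        if rec["currency"] == "DKK":
            amount = amount / dkk_per_eur  # hypothetical fixed exchange rate
        cleaned.append((int(rec["order_id"]), rec["customer"].strip().title(), round(amount, 2)))
    return cleaned


def load(rows, conn):
    """Write transformed rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount_eur REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(raw_csv)), conn)
print(conn.execute("SELECT * FROM orders ORDER BY order_id").fetchall())
```

Keeping the three stages as separate functions mirrors how ETL tools separate source, transformation, and target components, and makes each stage easy to test on its own.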

Data Wrangling with Python/Pandas

Pandas, a popular Python library, is part of the open-source PyData stack. In this 10-video Skillsoft Aspire course, you will learn that Pandas represents data in a tabular format, which makes it easy and intuitive to perform data manipulation, cleaning, and exploration. You will use the Pandas DataFrame, a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). To take this course, you should already be familiar with the Python programming language; all code is written in Jupyter notebooks. You will work with basic Pandas data structures, such as the Series object, which represents a single column of data and can store numerical values, strings, Booleans, and more complex data types. Learn how to use the Pandas DataFrame, which represents data in table form. Finally, learn to append and sort Series values, work with missing data, add columns, and aggregate data in a DataFrame. The closing exercise involves instantiating a Pandas Series object from both a list and a dictionary, changing the Series index to something other than the default, and sorting Series values in place.
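
The closing exercise can be sketched as follows, assuming pandas is installed. The values and index labels are hypothetical; the calls used (`pd.Series` from a list or dictionary, assigning to `.index`, and `sort_values(inplace=True)`) are standard pandas operations.

```python
import pandas as pd

# Series from a list: pandas assigns a default integer index 0..n-1
from_list = pd.Series([30, 10, 20])

# Series from a dictionary: the keys become the index labels
from_dict = pd.Series({"b": 2, "a": 1, "c": 3})

# Replace the default index with custom labels
from_list.index = ["x", "y", "z"]

# Sort values in place (modifies the Series and returns None)
from_list.sort_values(inplace=True)

print(from_list)
print(from_dict)
```

Sorting by value carries the index labels along with their values, so after sorting the labels appear in the order y, z, x rather than in their original order.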
