Q. What do data engineers do?
A. Data engineers gather all kinds of data from different sources, process and combine this data, and then store the clean data in a place where it can be easily accessed. The goal is to get everything ready so that a data scientist, business analyst, or really anyone at the company can use the data without having to do much wrangling. Data engineers automate this process by writing software programs called data pipelines. They work with large systems that they build, test, and maintain with code. More recently, data engineers are also being asked to automate and scale machine learning algorithms. They're actually much closer to software engineers than they are to data scientists. As a data scientist, it's important that you have at least some data engineering skills. You'll often be communicating with engineers about what data you need and what format it needs to be in. You can't build features and train a machine learning model on messy, poorly formatted data. That's where data engineering comes into play.
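The "gather, process, store" flow described above can be sketched as a tiny extract-transform-load (ETL) pipeline. The source rows, table name, and cleaning rules here are all illustrative assumptions, not part of the course material:

```python
import sqlite3

def extract():
    # In practice this might read from an API, log files, or a CSV dump.
    # Hard-coded rows stand in for a messy upstream source.
    return [("  Alice ", "41"), ("BOB", "n/a"), ("carol", "29")]

def transform(rows):
    # Clean the data: normalize names, coerce ages to int, drop bad rows.
    clean = []
    for name, age in rows:
        try:
            clean.append((name.strip().title(), int(age)))
        except ValueError:
            continue  # skip records whose age can't be parsed
    return clean

def load(rows, conn):
    # Store the cleaned result where analysts can query it easily.
    conn.execute("CREATE TABLE IF NOT EXISTS users (name TEXT, age INTEGER)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract()), conn)
print(conn.execute("SELECT * FROM users").fetchall())
# [('Alice', 41), ('Carol', 29)]
```

A real pipeline would add scheduling, logging, and error handling, but the extract/transform/load shape stays the same.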
Q. Roles of a Data Engineer
A. The roles of a data engineer can vary a lot depending on where you work and the size of your company. At larger companies, there are usually dedicated data engineers who pull data from many sources and pipeline it into a usable form. These sources can include website logs, smartphone app logs, user data, transaction data, sensor data, mechanical data, and much more. Analysts can then use these results for dashboard visualizations, predictive models, business decisions, recommendations, or other data science applications. Smaller companies often don't have the resources for a dedicated data engineer. At these companies, a data scientist might wear more than one hat, taking on at least some of the responsibilities of a data engineer. As data scientists, we spend much of our time building mathematical, machine learning, and domain-specific business skills; improving our engineering skills, however, often gets little attention.
[Objectives]
1. Practice pulling data from various sources, transforming the data, and then storing the results in a database.
2. Create machine learning pipelines to streamline data preparation and model building.
Course Roadmap
- Data Engineering
- Data Pipelines
- ETL (Extract Transform Load) Pipelines
- NLP Pipelines
- Text Processing
- Modeling
- Machine Learning Pipelines
- Scikit-learn pipelines
- Feature Union
- Grid Search
- Data Engineering Project
- Classify disaster response messages
- Skills: data pipelines, NLP pipelines, machine learning pipelines, supervised learning
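The roadmap items on scikit-learn pipelines, Feature Union, and Grid Search fit together in one pattern: a `Pipeline` chains steps, a `FeatureUnion` runs feature extractors in parallel, and `GridSearchCV` tunes any step's parameters. Here is a minimal sketch on made-up toy messages (the labels and parameter grid are illustrative assumptions, not the project's actual data):

```python
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Toy messages and labels; real disaster-response data is far larger.
X = ["need water and food", "fire spreading fast", "water supply is out",
     "trapped by the fire", "send food please", "smoke everywhere"]
y = [0, 1, 0, 1, 0, 1]  # made-up labels: 0 = aid request, 1 = fire-related

pipeline = Pipeline([
    ("features", FeatureUnion([          # run both extractors in parallel
        ("counts", CountVectorizer()),   # raw token counts
        ("tfidf", Pipeline([             # tf-idf weighted counts
            ("vect", CountVectorizer()),
            ("tfidf", TfidfTransformer()),
        ])),
    ])),
    ("clf", RandomForestClassifier(random_state=0)),
])

# Grid search addresses a step's parameter as "stepname__param".
params = {"clf__n_estimators": [10, 20]}
model = GridSearchCV(pipeline, params, cv=2)
model.fit(X, y)
print(model.predict(["the fire is close"]))
```

Because the whole pipeline is one estimator, the grid search cross-validates vectorization and classification together, avoiding leakage from fitting the vectorizer on the full dataset.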