Data Cleaning and Transformation

Description: Test your knowledge on Data Cleaning and Transformation techniques commonly used in Big Data Analytics.
Number of Questions: 15
Tags: data cleaning, data transformation, big data analytics

Which of the following is NOT a common data cleaning technique?

  1. Data Standardization

  2. Data Normalization

  3. Data Integration

  4. Data Augmentation


Correct Option: D (option 4, Data Augmentation)
Explanation:

Data Augmentation enlarges a dataset by creating new data points from existing ones; it does not correct or tidy the data. Data Standardization and Normalization are routine data preparation steps, and Data Integration, although often treated as a separate stage, is commonly grouped with them.
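
As a rough illustration of two of the techniques named above, the sketch below rescales a numeric column. It assumes "standardization" is meant in the statistical sense (z-scores); in data cleaning the term can also mean enforcing consistent formats. The `ages` column and both function names are hypothetical.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale values to zero mean and unit variance (z-scores).

    Assumes the values are not all identical, otherwise sigma is 0.
    """
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    return [(v - mu) / sigma for v in values]


def min_max_normalize(values):
    """Rescale values linearly into the range [0, 1].

    Assumes max(values) > min(values).
    """
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


ages = [20, 30, 40, 50, 60]  # hypothetical sample column
z_scores = standardize(ages)
scaled = min_max_normalize(ages)
```

Both functions keep the order of the input, so the rescaled values can be written back alongside the original rows.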

The process of converting data from one format to another is known as:

  1. Data Cleaning

  2. Data Transformation

  3. Data Integration

  4. Data Mining


Correct Option: B (option 2, Data Transformation)
Explanation:

Data Transformation involves converting data from one format to another, while Data Cleaning, Integration, and Mining are different processes in the data analysis pipeline.
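
A minimal sketch of one such format conversion, CSV text to JSON text, using only the Python standard library. The sample input and the function name `csv_to_json` are made up for illustration.

```python
import csv
import io
import json

csv_text = "id,city\n1,Paris\n2,Lima\n"  # hypothetical input


def csv_to_json(text):
    """Parse CSV text with a header row and return a JSON array of objects."""
    rows = [dict(row) for row in csv.DictReader(io.StringIO(text))]
    return json.dumps(rows)


json_text = csv_to_json(csv_text)
```

Note that `csv` reads every field as a string, so `id` stays `"1"` rather than becoming a number; converting types would be a further transformation step.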

Which of the following is NOT a common data transformation technique?

  1. Data Aggregation

  2. Data Filtering

  3. Data Smoothing

  4. Data Discretization


Correct Option: C (option 3, Data Smoothing)
Explanation:

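Data Smoothing is the intended answer because it is primarily a noise-reduction step, usually covered under data cleaning, although some textbooks also list it among transformation techniques. Data Aggregation, Filtering, and Discretization are all widely used to reshape data for analysis.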
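
To make the four options concrete, here is a small sketch of each applied to the same records. The `sales` data, the thresholds, and the helper names are all assumptions chosen for illustration.

```python
from collections import defaultdict

sales = [  # hypothetical records
    {"region": "north", "amount": 120},
    {"region": "south", "amount": 80},
    {"region": "north", "amount": 200},
    {"region": "south", "amount": 40},
]

# Aggregation: total amount per region.
totals = defaultdict(int)
for row in sales:
    totals[row["region"]] += row["amount"]

# Filtering: keep only rows with an amount of at least 100.
large = [row for row in sales if row["amount"] >= 100]


# Discretization: map a continuous amount onto a small set of labels.
def bucket(amount):
    if amount < 50:
        return "low"
    if amount < 150:
        return "medium"
    return "high"


labels = [bucket(row["amount"]) for row in sales]


# Smoothing: simple moving average over a sliding window.
def moving_average(values, window=3):
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


smoothed = moving_average([row["amount"] for row in sales])
```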

The process of identifying and correcting errors and inconsistencies in data is known as:

  1. Data Cleaning

  2. Data Transformation

  3. Data Integration

  4. Data Mining


Correct Option: A (option 1, Data Cleaning)
Explanation:

Data Cleaning involves identifying and correcting errors and inconsistencies in data, while Data Transformation, Integration, and Mining are different processes in the data analysis pipeline.
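
A minimal sketch of what that can look like for a single record, assuming hypothetical `name` and `age` fields and an arbitrary upper bound of 120 for plausible ages. Repairable problems (stray whitespace, inconsistent capitalization) are corrected; records that cannot be repaired are rejected.

```python
def clean_record(record):
    """Return a cleaned copy of one record, or None if it cannot be repaired."""
    # Correct inconsistencies: trim whitespace and normalize capitalization.
    name = record.get("name", "").strip().title()
    raw_age = str(record.get("age", "")).strip()

    # Identify errors: missing name, or age that is not a non-negative integer.
    if not name or not raw_age.isdigit():
        return None

    age = int(raw_age)
    if age > 120:  # assumed plausibility limit
        return None
    return {"name": name, "age": age}


raw = [  # hypothetical dirty input
    {"name": "  alice ", "age": "30"},
    {"name": "BOB", "age": "-5"},
    {"name": "", "age": "41"},
    {"name": "carol", "age": " 27 "},
]

cleaned = [c for c in (clean_record(r) for r in raw) if c is not None]
```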

Which of the following is NOT a common data integration technique?

  1. Data Warehousing

  2. Data Federation

  3. Data Virtualization

  4. Data Lineage


Correct Option: D (option 4, Data Lineage)
Explanation:

Data Lineage records where data originated and how it has moved and changed over time; it documents integration rather than performing it. Data Warehousing, Federation, and Virtualization are all widely used integration techniques.

The process of combining data from multiple sources into a single, unified view is known as:

  1. Data Cleaning

  2. Data Transformation

  3. Data Integration

  4. Data Mining


Correct Option: C (option 3, Data Integration)
Explanation:

Data Integration involves combining data from multiple sources into a single, unified view, while Data Cleaning, Transformation, and Mining are different processes in the data analysis pipeline.
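
One simple way to picture a unified view is a join on a shared key. The sketch below merges two hypothetical sources, `crm` and `billing`, on `customer_id`; real integration work also has to reconcile schemas and conflicting values, which this example ignores.

```python
crm = [  # hypothetical source 1
    {"customer_id": 1, "name": "Alice"},
    {"customer_id": 2, "name": "Bob"},
]
billing = [  # hypothetical source 2
    {"customer_id": 1, "balance": 50.0},
    {"customer_id": 3, "balance": 12.5},
]


def integrate(left, right, key):
    """Full outer join of two lists of dicts on `key`.

    Fields from `right` overwrite same-named fields from `left`.
    """
    merged = {}
    for row in left + right:
        merged.setdefault(row[key], {}).update(row)
    return [merged[k] for k in sorted(merged)]


unified = integrate(crm, billing, "customer_id")
```

Customers that appear in only one source are kept with whatever fields are available, so missing values show up as absent keys rather than being silently dropped.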

Which of the following is NOT a common data mining technique?

  1. Classification

  2. Clustering

  3. Association Rule Mining

  4. Data Visualization


Correct Option: D (option 4, Data Visualization)
Explanation:

Data Visualization is a way of presenting data and results rather than a mining technique in itself, while Classification, Clustering, and Association Rule Mining are core data mining techniques.
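
As a taste of Association Rule Mining, the sketch below computes the support of every item pair, which is the first step of algorithms such as Apriori. It is not a full rule miner, and the `baskets` data is invented for illustration.

```python
from collections import Counter
from itertools import combinations

baskets = [  # hypothetical transactions
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

# Count how many baskets contain each pair of items.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Support = fraction of all baskets that contain the pair.
support = {pair: count / len(baskets) for pair, count in pair_counts.items()}
```

Pairs with high support are candidates for rules such as "customers who buy bread also buy milk"; a complete miner would go on to compute confidence for each candidate rule.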

The process of discovering patterns and relationships in data is known as:

  1. Data Cleaning

  2. Data Transformation

  3. Data Integration

  4. Data Mining


Correct Option: D (option 4, Data Mining)
Explanation:

Data Mining involves discovering patterns and relationships in data, while Data Cleaning, Transformation, and Integration are different processes in the data analysis pipeline.

Which of the following is NOT a common data visualization technique?

  1. Bar Charts

  2. Pie Charts

  3. Scatter Plots

  4. Heat Maps


Correct Option: D (option 4, Heat Maps)
Explanation:

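Heat Maps are the intended answer, but only in a relative sense: they are widely used, for example for correlation matrices and density displays, yet they are more specialized than Bar Charts, Pie Charts, and Scatter Plots, which are the basic chart types.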

The process of presenting data in a graphical or visual format is known as:

  1. Data Cleaning

  2. Data Transformation

  3. Data Integration

  4. Data Visualization


Correct Option: D (option 4, Data Visualization)
Explanation:

Data Visualization involves presenting data in a graphical or visual format, while Data Cleaning, Transformation, and Integration are different processes in the data analysis pipeline.

Which of the following is NOT a common data cleaning tool?

  1. OpenRefine

  2. Trifacta Wrangler

  3. RapidMiner

  4. Talend Open Studio


Correct Option: C (option 3, RapidMiner)
Explanation:

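RapidMiner is primarily a data mining and machine learning platform rather than a dedicated cleaning tool, although it includes some data preparation features. OpenRefine, Trifacta Wrangler, and Talend Open Studio are all widely used for cleaning and preparing data.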

The process of removing duplicate data points from a dataset is known as:

  1. Data Deduplication

  2. Data Normalization

  3. Data Integration

  4. Data Mining


Correct Option: A (option 1, Data Deduplication)
Explanation:

Data Deduplication involves removing duplicate data points from a dataset, while Data Normalization, Integration, and Mining are different processes in the data analysis pipeline.
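
A minimal sketch of exact-match deduplication, keeping the first occurrence of each record. The `contacts` data and the choice of `email` as the identifying field are assumptions; fuzzy matching of near-duplicates is a harder problem not covered here.

```python
def deduplicate(records, key_fields):
    """Return records with duplicates removed, keeping first occurrences.

    Two records count as duplicates when they agree on every field
    listed in `key_fields`.
    """
    seen = set()
    unique = []
    for record in records:
        key = tuple(record[field] for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


contacts = [  # hypothetical input
    {"email": "a@example.com", "name": "Ann"},
    {"email": "b@example.com", "name": "Ben"},
    {"email": "a@example.com", "name": "Ann B."},
]

unique_contacts = deduplicate(contacts, ["email"])
```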

Which of the following is NOT a common data transformation tool?

  1. SAS

  2. SPSS

  3. R

  4. Tableau


Correct Option: D (option 4, Tableau)
Explanation:

Tableau is primarily a data visualization and dashboarding tool rather than a transformation tool, while SAS, SPSS, and R are statistical packages that are widely used to transform data.

The process of converting categorical data into numerical data is known as:

  1. Data Encoding

  2. Data Normalization

  3. Data Integration

  4. Data Mining


Correct Option: A (option 1, Data Encoding)
Explanation:

Data Encoding involves converting categorical data into numerical data, while Data Normalization, Integration, and Mining are different processes in the data analysis pipeline.
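
Two common encodings, sketched on a hypothetical `colors` column: label encoding assigns each category an integer, and one-hot encoding gives each category its own 0/1 column. The alphabetical ordering of categories is an arbitrary choice made here for reproducibility.

```python
colors = ["red", "green", "blue", "green"]  # hypothetical categorical column

# Fix a stable category order (alphabetical, by assumption).
categories = sorted(set(colors))

# Label encoding: one integer per category.
label_index = {category: i for i, category in enumerate(categories)}
label_encoded = [label_index[c] for c in colors]

# One-hot encoding: one 0/1 indicator per category.
one_hot = [[1 if c == category else 0 for category in categories] for c in colors]
```

Label encoding implies an ordering between categories that may not exist in the data, which is one reason one-hot encoding is often preferred for unordered categories.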

Which of the following is NOT a common data integration tool?

  1. Informatica PowerCenter

  2. Talend Open Studio

  3. Microsoft SSIS

  4. Tableau


Correct Option: D (option 4, Tableau)
Explanation:

Tableau is primarily a data visualization tool rather than an integration tool, while Informatica PowerCenter, Talend Open Studio, and Microsoft SSIS are all widely used for data integration.
