Data blending is the process of merging data from multiple sources into a single dataset. It is typically accomplished through the traditional extract, transform, load (ETL) process or the more recent extract, load, transform (ELT) approach. Activities include:
- Consolidating closely linked datasets from different sources to gain a single view of an issue, e.g., equipment performance,
- Consolidating somewhat disparate datasets to create additional dimensions for modeling,
- Combining datasets and keeping only their intersection to identify shared features,
- Supplementing master data about a particular issue with data from other sources to create additional aspects for modeling, e.g., adding weather data to a dataset about purchases,
- Correcting missing, incomplete, and outdated records,
- Aggregating values using arithmetic functions such as SUM, COUNT, MIN, MAX and AVERAGE, and
- Creating subsets by filtering or sampling records or by splitting the dataset.
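
Several of the activities above can be illustrated with a minimal Python sketch that uses only the standard library. The purchase and weather records, the field names, and the helper functions `supplement`, `intersect`, and `aggregate` are all hypothetical and chosen for illustration; a production pipeline would more likely rely on an ETL tool or a data-frame library.

```python
from collections import defaultdict

# Hypothetical source 1: purchase records (the "master" data).
purchases = [
    {"date": "2024-01-01", "store": "A", "amount": 20.0},
    {"date": "2024-01-01", "store": "B", "amount": 35.0},
    {"date": "2024-01-02", "store": "A", "amount": 15.0},
    {"date": "2024-01-03", "store": "A", "amount": 50.0},
]

# Hypothetical source 2: daily weather, keyed by date.
weather = {
    "2024-01-01": "rain",
    "2024-01-02": "sun",
}


def supplement(records, lookup, key, new_field):
    """Left join: keep every record, add new_field (None when no match)."""
    return [{**r, new_field: lookup.get(r[key])} for r in records]


def intersect(records, lookup, key, new_field):
    """Inner join: keep only records whose key appears in both sources."""
    return [
        {**r, new_field: lookup[r[key]]}
        for r in records
        if r[key] in lookup
    ]


def aggregate(records, group_field, value_field):
    """Compute SUM, COUNT, MIN, MAX and AVERAGE of value_field per group."""
    groups = defaultdict(list)
    for r in records:
        groups[r[group_field]].append(r[value_field])
    return {
        g: {
            "sum": sum(v),
            "count": len(v),
            "min": min(v),
            "max": max(v),
            "average": sum(v) / len(v),
        }
        for g, v in groups.items()
    }


# Supplement master data with weather (all purchases are kept).
blended = supplement(purchases, weather, "date", "weather")

# Keep only the intersection of the two sources.
shared = intersect(purchases, weather, "date", "weather")

# Aggregate purchase amounts per weather condition.
by_weather = aggregate(shared, "weather", "amount")

# Create a subset by filtering records.
rainy = [r for r in blended if r["weather"] == "rain"]
```

In this sketch the purchase on 2024-01-03 has no matching weather record, so it is retained by `supplement` with a missing value but dropped by `intersect`, which is the practical difference between supplementing master data and keeping only an intersection.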