The five data structures described below are sometimes reduced to two groups: structured data and unstructured data.
In practice, the different data structures are often found together. Consider a help desk system. Its relational database management system may store structured data such as time stamps, user IDs, and problem types, as well as unstructured data such as free-form text from chat histories or email communication regarding the problem.
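The help desk scenario can be sketched as a single table that mixes typed columns with a free-form text column. This is only an illustration using Python's built-in `sqlite3` module; the table name `tickets`, its column names, and the sample row are assumptions made for the example, not part of any real system.

```python
import sqlite3

# In-memory database so the sketch leaves nothing on disk.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: four structured columns plus one unstructured text column.
conn.execute(
    """
    CREATE TABLE tickets (
        ticket_id    INTEGER PRIMARY KEY,
        user_id      TEXT NOT NULL,
        problem_type TEXT NOT NULL,
        opened_at    TEXT NOT NULL,  -- ISO 8601 time stamp
        chat_log     TEXT            -- free-form text, no fixed structure
    )
    """
)
conn.execute(
    "INSERT INTO tickets (user_id, problem_type, opened_at, chat_log) "
    "VALUES (?, ?, ?, ?)",
    ("u-1001", "login", "2024-01-15T09:30:00",
     "Hi, I have not been able to log in since this morning."),
)

# Structured columns can be filtered directly with SQL.
rows = conn.execute(
    "SELECT user_id, chat_log FROM tickets WHERE problem_type = ?",
    ("login",),
).fetchall()

# The free-form column comes back as an opaque string that needs further
# processing (for example, text analysis) before it can be queried by content.
for user_id, chat_log in rows:
    print(user_id, len(chat_log.split()), "words in chat log")
```

The structured columns answer questions such as "how many login problems were opened today" with a plain query, while the chat log has to be interpreted before it yields comparable answers.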
Structured Data #
Structured data has a tabular format with a defined data type, format, and structure, e.g., CSV files, simple spreadsheets, and database management systems.
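As a minimal sketch of what "defined data type, format and structure" means in practice, the following reads a small CSV sample with Python's standard `csv` module and converts one column to integers. The column names and values are invented for illustration.

```python
import csv
import io

# Hypothetical CSV content: every row has the same columns in the same order.
CSV_TEXT = """user_id,problem_type,minutes_to_resolve
u-1001,login,12
u-1002,billing,45
"""

def load_rows(text):
    """Parse CSV text into dictionaries, casting the numeric column to int."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {
            "user_id": row["user_id"],
            "problem_type": row["problem_type"],
            "minutes_to_resolve": int(row["minutes_to_resolve"]),
        }
        for row in reader
    ]

rows = load_rows(CSV_TEXT)

# Because the column has a known type, it can be aggregated directly.
total_minutes = sum(row["minutes_to_resolve"] for row in rows)
print(total_minutes)
```

Because every row follows the same layout, the same parsing code applies to the whole file without inspecting individual records.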
Unstructured Data #
Traditional unstructured data: data that exists as a free entity and has no intrinsic structure, e.g., audio files, video files, and PDF documents
Semi-structured data: textual data that has no predefined data model but nonetheless has a noticeable format, e.g., XML and JSON files
Metadata: data about elements in a dataset, e.g., metadata about a photo can show where and when the photo was taken
Quasi-structured data: textual data with inconsistent data values and formats that can be formatted with time, effort and the right tools, e.g., clickstream data
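The difference between semi-structured and quasi-structured data can be illustrated with a short sketch. It uses only Python's standard `json` and `urllib.parse` modules. The JSON records, the clickstream URLs, and the helper `normalize_click` are all hypothetical and chosen only to show the idea.

```python
import json
from urllib.parse import urlparse, parse_qs

# Semi-structured: JSON is self-describing, but records need not share fields.
records = json.loads(
    '[{"id": 1, "tags": ["vpn"]}, {"id": 2, "owner": {"name": "Ana"}}]'
)
keys_per_record = [sorted(record.keys()) for record in records]

# Quasi-structured: clickstream URLs follow loose patterns with
# inconsistent casing and optional parameters.
clicks = [
    "https://example.com/search?q=reset+password&page=2",
    "https://example.com/help/article/42",
    "https://EXAMPLE.com/search?Q=vpn",
]

def normalize_click(url):
    """Turn one raw URL into a consistent record (hypothetical helper)."""
    parts = urlparse(url)
    # Lower-case parameter names so "Q" and "q" are treated alike.
    query = {key.lower(): values[0] for key, values in parse_qs(parts.query).items()}
    return {
        "host": parts.netloc.lower(),
        "path": parts.path,
        "query": query.get("q"),  # None when the click carried no search term
    }

normalized = [normalize_click(url) for url in clicks]
for record in normalized:
    print(record)
```

The JSON records can be parsed in one call even though their fields differ, whereas the clickstream entries need explicit cleanup rules before they line up into consistent columns.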
Why Understanding Data Structures is Important #
The structure of data determines the techniques and tools required to process and analyze it, e.g., unstructured data may require distributed computing environments and massively parallel processing architectures.