Essential Tasks for Machine Learning Projects with Unstructured Data

A concise overview of the six vital tasks in machine learning projects handling unstructured data, covering aspects from feature engineering to workflow optimization.


Summary

A predictive modeling project in machine learning can be divided into six main tasks, listed below and described using Python. These tasks belong to the prototyping phase and are tailored to unstructured data such as text, images, or video.

Task No. 1 | Define Problem

Task No. 2 | Analyze Data

Task No. 3 | Data Preparation

Task No. 4 | Evaluate Candidate Models: Baseline

Task No. 5 | Model Development

Task No. 6 | Model Evaluation and Interpretation

Note that these tasks overlap with the tasks for structured data, but the subtasks and approaches differ because of the characteristics of unstructured data. For example, raw text or pixels must first be converted into numeric representations before a model can use them.
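
To make the six tasks concrete, here is a minimal sketch of how they might map onto a tiny text-classification prototype. Everything in it is an assumption made for illustration: the sentiment problem, the toy sentences, and the function names (analyze, prepare, baseline_predict, fit_word_counts, model_predict, accuracy) are hypothetical and do not come from any library. It uses only the Python standard library; a real project would likely rely on dedicated libraries for each task.

```python
from collections import Counter

# Task No. 1 | Define Problem
# Hypothetical problem: classify short reviews as "pos" or "neg".
# The data below is invented purely for illustration.
TRAIN = [
    ("great movie loved it", "pos"),
    ("terrible plot boring acting", "neg"),
    ("loved the cast great fun", "pos"),
    ("boring and terrible", "neg"),
    ("awful boring mess", "neg"),
]
TEST = [
    ("great fun", "pos"),
    ("awful acting", "neg"),
]


def analyze(data):
    """Task No. 2 | Analyze Data: class balance and mean document length."""
    label_counts = Counter(label for _, label in data)
    mean_len = sum(len(text.split()) for text, _ in data) / len(data)
    return label_counts, mean_len


def prepare(text):
    """Task No. 3 | Data Preparation: lowercase and split into tokens."""
    return text.lower().split()


def baseline_predict(train, tokens):
    """Task No. 4 | Baseline: always predict the most frequent training label."""
    return Counter(label for _, label in train).most_common(1)[0][0]


def fit_word_counts(train):
    """Task No. 5 | Model Development: count how often each token occurs per label."""
    counts = {}
    for text, label in train:
        for token in prepare(text):
            counts.setdefault(token, Counter())[label] += 1
    return counts


def model_predict(counts, default_label, tokens):
    """Predict the label with the most token votes; use the default if no token is known."""
    votes = Counter()
    for token in tokens:
        votes.update(counts.get(token, {}))
    return votes.most_common(1)[0][0] if votes else default_label


def accuracy(predict, data):
    """Task No. 6 | Model Evaluation: fraction of correctly labeled examples."""
    hits = sum(predict(prepare(text)) == label for text, label in data)
    return hits / len(data)


label_counts, mean_len = analyze(TRAIN)
word_counts = fit_word_counts(TRAIN)
majority = baseline_predict(TRAIN, [])

baseline_acc = accuracy(lambda toks: baseline_predict(TRAIN, toks), TEST)
model_acc = accuracy(lambda toks: model_predict(word_counts, majority, toks), TEST)

print("label counts:", dict(label_counts), "mean length:", mean_len)
print("baseline accuracy:", baseline_acc)
print("model accuracy:", model_acc)
```

The point of the sketch is the ordering rather than the model: the baseline from Task No. 4 gives a reference score that the model from Task No. 5 has to beat before it is worth interpreting in Task No. 6. With only two test examples, the resulting numbers say nothing about real-world performance.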