Generating Report Text from Bulleted Notes with ChatGPT3, and the Usage Styles for NotionAI and ChatGPT That Started to Emerge #ChatGPT #NotionAI

2023.04.26







Data-Centric Approach and Amazon SageMaker

Machine learning (ML) is a powerful tool that allows us to build predictive models using large amounts of data. However, traditional approaches to machine learning, such as the model-centric approach, do not always consider the quality of the data being used. In contrast, the data-centric approach focuses on improving the quality of the data, which can lead to significant improvements in model accuracy. In this report, we will explore the benefits of the data-centric approach and how Amazon SageMaker can help data scientists implement it.

Benefits of the Data-Centric Approach
The data-centric approach means cleaning the data and improving its quality, for example by standardizing formats and handling missing values. Research has shown that accuracy can differ by up to 10% between clean and dirty data. The machine learning workflow consists of three stages: data processing, model development, and deployment. Data processing covers data collection, labeling, exploration, and feature engineering. Model development covers preprocessing, model selection, training, tuning, and evaluation. Deployment puts the trained model into service. Machine learning projects commonly fail because of insufficient data quality and a lack of specialists such as data scientists. A successful machine learning project therefore needs machine learning knowledge, the ability to try things and improve quickly, and a focus on differentiation.
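The kind of cleaning described above (standardizing formats and handling missing values) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the records, the field names `age` and `city`, and the choice of median and "Unknown" as fill values are all hypothetical and do not come from the report.

```python
from statistics import median

# Hypothetical raw records with inconsistent formatting and missing values.
raw_rows = [
    {"age": "34", "city": " tokyo "},
    {"age": "", "city": "Osaka"},
    {"age": "51", "city": "TOKYO"},
    {"age": None, "city": ""},
]


def clean_rows(rows):
    """Normalize formatting and fill missing values.

    Assumed policy (illustrative only): a missing age becomes the median
    of the known ages, and a missing city becomes "Unknown".
    """
    known_ages = [int(r["age"]) for r in rows if r["age"]]
    fill_age = median(known_ages)
    cleaned = []
    for r in rows:
        age = int(r["age"]) if r["age"] else fill_age
        # Trim whitespace and unify capitalization before checking for blanks.
        city = (r["city"] or "").strip().title() or "Unknown"
        cleaned.append({"age": age, "city": city})
    return cleaned


cleaned_rows = clean_rows(raw_rows)
for row in cleaned_rows:
    print(row)
```

In a real project the fill policy would depend on the data and the model, and choosing it is part of the data-centric work itself.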

Data Processing and Model Development
Automated machine learning (AutoML) technologies such as SageMaker can help automate and standardize the process from model development to deployment. SageMaker Data Wrangler provides a quick and easy way to prepare data for machine learning: it supports data exploration, data quality improvement, and data enrichment. SageMaker Autopilot automatically creates machine learning models while giving complete visibility into the model development process. It also selects the type of prediction automatically and can be combined with SageMaker Data Wrangler on the way to model deployment. SageMaker Canvas generates accurate predictions without requiring you to write any code.
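As a rough sketch of what starting an Autopilot job can look like from code, the example below assembles a request for the SageMaker `CreateAutoMLJob` API and passes it to boto3's `create_auto_ml_job`. The job name, S3 URIs, target column, and role ARN are hypothetical placeholders, and the helper functions are introduced here for illustration only. `ProblemType` is deliberately left out so that Autopilot chooses the prediction type itself, as described above.

```python
def build_autopilot_request(job_name, input_s3_uri, target_column,
                            output_s3_uri, role_arn):
    """Assemble keyword arguments for the SageMaker CreateAutoMLJob API.

    ProblemType is omitted on purpose so Autopilot infers the prediction
    type from the target column.
    """
    return {
        "AutoMLJobName": job_name,
        "InputDataConfig": [
            {
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": input_s3_uri,
                    }
                },
                "TargetAttributeName": target_column,
            }
        ],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "RoleArn": role_arn,
    }


def start_autopilot_job(request):
    """Submit the request. Needs boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the request can be built without it

    client = boto3.client("sagemaker")
    return client.create_auto_ml_job(**request)


# All values below are hypothetical placeholders.
request = build_autopilot_request(
    job_name="demo-autopilot-job",
    input_s3_uri="s3://example-bucket/train/",
    target_column="churn",
    output_s3_uri="s3://example-bucket/output/",
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
)
print(request["AutoMLJobName"])
```

Calling `start_autopilot_job(request)` would submit the job. It is not called here because it requires an AWS account, an IAM role with SageMaker permissions, and training data already in S3.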

Unstructured Data Pattern and Amazon SageMaker
SageMaker Ground Truth creates high-quality datasets for machine learning, particularly for unstructured data such as images, documents, and speech. SageMaker JumpStart offers pre-built solutions to common machine learning problems.

The data-centric approach offers a significant advantage over the model-centric approach, which does not consider the quality of the data used. By focusing on data quality, the accuracy of machine learning models can be significantly improved. Amazon SageMaker provides a suite of tools to help data scientists automate and standardize the machine learning workflow, from data processing to model development and deployment.


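Machine learning (ML) is a powerful tool that makes it possible to build predictive models using large amounts of data. However, traditional machine learning methods such as the model-centric approach do not always take the quality of the data into account. By contrast, the data-centric approach focuses on improving data quality, which can greatly improve model accuracy. This report explores the benefits of the data-centric approach and how Amazon SageMaker helps data scientists put it into practice.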











© Classmethod, Inc. All rights reserved.