Quantized models can be deployed on Amazon SageMaker AI with just a few lines of code. In this post, we explore why quantization matters: how it enables lower-cost inference, supports deployment on resource-constrained hardware, and reduces both the financial and environmental impact of modern LLMs, while preserving most of their original performance. We also take a deep dive into the principles behind post-training quantization (PTQ) and demonstrate how to quantize the model of your choice and deploy it on Amazon SageMaker AI.
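As a minimal sketch of that "few lines of code" deployment step, the snippet below assumes a prequantized AWQ checkpoint on the Hugging Face Hub served through the SageMaker LMI (DJL) container. The model ID, role ARN, endpoint name, and image tag are illustrative placeholders, not values from the post; check the current LMI container releases for a valid tag.

```python
from sagemaker.model import Model

role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # placeholder role ARN

# Example LMI (DJL) serving image; substitute a current tag for your Region
image_uri = "763104351884.dkr.ecr.us-east-1.amazonaws.com/djl-inference:0.29.0-lmi11.0.0-cu124"

model = Model(
    image_uri=image_uri,
    role=role,
    env={
        "HF_MODEL_ID": "Qwen/Qwen2.5-7B-Instruct-AWQ",  # hypothetical prequantized model
        "OPTION_QUANTIZE": "awq",  # tell the serving stack to load AWQ weights
    },
)

# Creates the endpoint; the quantized model fits on a single-GPU instance
model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    endpoint_name="quantized-llm-demo",  # placeholder name
)
```

Once the endpoint is in service, it can be invoked like any other SageMaker endpoint via the `sagemaker-runtime` client.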
This post provides a detailed architectural overview of how TrueLook built its AI-powered safety monitoring system using SageMaker AI, highlighting key technical decisions, pipeline design patterns, and MLOps best practices. You will gain valuable insights into designing scalable computer vision solutions on AWS, particularly around model training workflows, automated pipeline creation, and production deployment strategies for real-time inference.
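TrueLook's actual pipeline design is detailed in the post itself; purely as a generic illustration of automated pipeline creation with SageMaker Pipelines, the sketch below defines a pipeline with a single training step. The training image, role, bucket paths, and names are hypothetical placeholders.

```python
from sagemaker.estimator import Estimator
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.pipeline_context import PipelineSession
from sagemaker.workflow.steps import TrainingStep

session = PipelineSession()
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # placeholder

estimator = Estimator(
    image_uri="<your-training-image-uri>",  # placeholder training image
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/model-artifacts/",  # hypothetical bucket
    sagemaker_session=session,
)

# With PipelineSession, .fit() returns step arguments instead of starting a job
step_train = TrainingStep(
    name="TrainDetectionModel",
    step_args=estimator.fit({"train": "s3://my-bucket/training-data/"}),
)

pipeline = Pipeline(name="cv-training-pipeline", steps=[step_train], sagemaker_session=session)
pipeline.upsert(role_arn=role)  # create or update the pipeline definition
execution = pipeline.start()    # kick off a run
```

A production pipeline like TrueLook's would add processing, evaluation, and model-registration steps, but the create-or-update-then-start pattern stays the same.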
Observe.ai developed the One Load Audit Framework (OLAF), which integrates with SageMaker to identify bottlenecks and performance issues in ML services, offering latency and throughput measurements under both static and dynamic data loads. In this blog post, you will learn how to use the OLAF utility to test and validate your SageMaker endpoint.
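OLAF itself is Observe.ai's utility, and its interface is covered in the post; purely to illustrate the kind of latency and throughput measurement it automates, here is a minimal boto3 sketch that times repeated sequential invocations of a SageMaker endpoint (a simple static load). The endpoint name and payload are placeholders.

```python
import json
import time

import boto3

runtime = boto3.client("sagemaker-runtime")
ENDPOINT = "my-endpoint"  # placeholder endpoint name
payload = json.dumps({"inputs": "hello"})  # placeholder request body

latencies = []
for _ in range(50):  # static load: fixed request count, one at a time
    start = time.perf_counter()
    runtime.invoke_endpoint(
        EndpointName=ENDPOINT,
        ContentType="application/json",
        Body=payload,
    )
    latencies.append(time.perf_counter() - start)

latencies.sort()
print(f"p50={latencies[len(latencies) // 2] * 1000:.1f} ms  "
      f"p95={latencies[int(len(latencies) * 0.95)] * 1000:.1f} ms  "
      f"throughput={len(latencies) / sum(latencies):.1f} req/s")
```

Dynamic-load testing of the kind OLAF performs would add concurrent clients and varying request rates on top of this basic measurement loop.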
In this post, we demonstrate how to optimize large language model (LLM) inference on Amazon SageMaker AI using BentoML's LLM-Optimizer to systematically identify the best serving configurations for your workload.
In this post, we show you how to unify governance and metadata across Amazon SageMaker Unified Studio and Atlan through a comprehensive bidirectional integration. You’ll learn how to deploy the necessary AWS infrastructure, configure secure connections, and set up automated synchronization to maintain consistent metadata across both platforms.
Today, AWS announces Neuron SDK 2.27.0, introducing support for Trainium3 UltraServer with expanded open source components. Neuron also introduces the Neuron Explorer tools suite, an enhanced Neuron Kernel Interface (NKI) with an open source NKI Compiler built on MLIR (private beta), the NKI Library of optimized kernels, native PyTorch support through TorchNeuron (private beta), and Neuron DRA for Kubernetes-native resource management (private beta). These updates enable standard frameworks to run unchanged on Trainium, removing barriers for researchers to experiment and innovate. For developers requiring deeper control, NKI Beta 2 provides direct access to hardware-level optimizations, enabling customers to scale AI workloads with improved performance. If you're interested in early access to new NKI features and improvements, you can join the Neuron private beta program. The new SDK version is available in all AWS Regions supporting Inferentia and Trainium instances, offering enhanced performance and monitoring capabilities for machine learning workloads. For more details, see What's New in Neuron, the AWS Neuron 2.27.0 Release Notes, and AWS Trainium.
Today, AWS announces SOCI (Seekable Open Container Initiative) indexing support for Amazon SageMaker Studio, reducing container startup times by 30-50% when using custom images. Amazon SageMaker Studio is a fully integrated, browser-based environment for end-to-end machine learning development. SageMaker Studio provides pre-built container images for popular ML frameworks like TensorFlow, PyTorch, and Scikit-learn that enable quick environment setup. However, when data scientists need to tailor environments for specific use cases with additional libraries, dependencies, or configurations, they can build and register custom container images with pre-configured components to ensure consistency across projects. As ML workloads become increasingly complex, these custom container images have grown in size, leading to startup times of several minutes that create bottlenecks in iterative ML development, where quick experimentation and rapid prototyping are essential. SOCI indexing addresses this challenge by enabling lazy loading of container images, downloading only the components needed to start an application and fetching additional files on demand. Instead of waiting several minutes for complete custom image downloads, users can begin productive work in seconds while the environment completes initialization in the background. To use SOCI indexing, create a SOCI index for your custom container image using tools like Finch CLI, nerdctl, or Docker with SOCI CLI, push the indexed image to Amazon Elastic Container Registry (ECR), and reference the image index URI when creating SageMaker Image resources. SOCI indexing is available in all AWS Regions where Amazon SageMaker Studio is available. To learn more about implementing SOCI indexing for your SageMaker Studio custom images, see Bring your own SageMaker image in the Amazon SageMaker Developer Guide.
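The developer guide covers the full flow; as a minimal sketch of the final registration step, assuming you have already built a SOCI index and pushed the indexed image to ECR (for example with Finch or the SOCI CLI), the boto3 calls below create the SageMaker Image resources. The image name, role ARN, and ECR URI are placeholders.

```python
import boto3

sm = boto3.client("sagemaker")
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # placeholder

# Logical image that SageMaker Studio will reference
sm.create_image(ImageName="my-custom-image", RoleArn=role)

# Version pointing at the SOCI-indexed image index URI already pushed to ECR
sm.create_image_version(
    ImageName="my-custom-image",
    BaseImage="123456789012.dkr.ecr.us-east-1.amazonaws.com/my-custom-image:latest",  # placeholder URI
)
```

Before the image appears in Studio, it also needs an AppImageConfig and an attachment to your domain; the linked developer guide walks through those remaining steps.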
In this post, we show how to use Amazon SageMaker Catalog to publish data from multiple sources, including Amazon S3, Amazon Redshift, and Snowflake. This approach enables self-service access while ensuring robust data governance and metadata management.