Quantized models can be deployed on Amazon SageMaker AI with just a few lines of code. In this post, we explore why quantization matters: how it enables lower-cost inference, supports deployment on resource-constrained hardware, and reduces both the financial and environmental impact of modern LLMs, while preserving most of their original performance. We also take a deep dive into the principles behind post-training quantization (PTQ) and demonstrate how to quantize the model of your choice and deploy it on Amazon SageMaker AI.
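As a minimal sketch of that "few lines of code" deployment step, the snippet below assumes a prequantized AWQ checkpoint on the Hugging Face Hub served through the SageMaker LMI (DJL) container. The model ID, role ARN, endpoint name, and image tag are illustrative placeholders, not values from the post; check the current LMI container releases for a valid tag.

```python
from sagemaker.model import Model

role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # placeholder role ARN

# Example LMI (DJL) serving image; substitute a current tag for your Region
image_uri = "763104351884.dkr.ecr.us-east-1.amazonaws.com/djl-inference:0.29.0-lmi11.0.0-cu124"

model = Model(
    image_uri=image_uri,
    role=role,
    env={
        "HF_MODEL_ID": "Qwen/Qwen2.5-7B-Instruct-AWQ",  # hypothetical prequantized model
        "OPTION_QUANTIZE": "awq",  # tell the serving stack to load AWQ weights
    },
)

# Creates the endpoint; the quantized model fits on a single-GPU instance
model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    endpoint_name="quantized-llm-demo",  # placeholder name
)
```

Once the endpoint is in service, it can be invoked like any other SageMaker endpoint via the `sagemaker-runtime` client.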
This post provides a detailed architectural overview of how TrueLook built its AI-powered safety monitoring system using SageMaker AI, highlighting key technical decisions, pipeline design patterns, and MLOps best practices. You will gain valuable insights into designing scalable computer vision solutions on AWS, particularly around model training workflows, automated pipeline creation, and production deployment strategies for real-time inference.
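TrueLook's actual pipeline design is detailed in the post itself; purely as a generic illustration of automated pipeline creation with SageMaker Pipelines, the sketch below defines a pipeline with a single training step. The training image, role, bucket paths, and names are hypothetical placeholders.

```python
from sagemaker.estimator import Estimator
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.pipeline_context import PipelineSession
from sagemaker.workflow.steps import TrainingStep

session = PipelineSession()
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # placeholder

estimator = Estimator(
    image_uri="<your-training-image-uri>",  # placeholder training image
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/model-artifacts/",  # hypothetical bucket
    sagemaker_session=session,
)

# With PipelineSession, .fit() returns step arguments instead of starting a job
step_train = TrainingStep(
    name="TrainDetectionModel",
    step_args=estimator.fit({"train": "s3://my-bucket/training-data/"}),
)

pipeline = Pipeline(name="cv-training-pipeline", steps=[step_train], sagemaker_session=session)
pipeline.upsert(role_arn=role)  # create or update the pipeline definition
execution = pipeline.start()    # kick off a run
```

A production pipeline like TrueLook's would add processing, evaluation, and model-registration steps, but the create-or-update-then-start pattern stays the same.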
Observe.ai developed the One Load Audit Framework (OLAF), which integrates with SageMaker to identify bottlenecks and performance issues in ML services, offering latency and throughput measurements under both static and dynamic data loads. In this blog post, you will learn how to use the OLAF utility to test and validate your SageMaker endpoint.
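OLAF itself is Observe.ai's utility, and its interface is covered in the post; purely to illustrate the kind of latency and throughput measurement it automates, here is a minimal boto3 sketch that times repeated sequential invocations of a SageMaker endpoint (a simple static load). The endpoint name and payload are placeholders.

```python
import json
import time

import boto3

runtime = boto3.client("sagemaker-runtime")
ENDPOINT = "my-endpoint"  # placeholder endpoint name
payload = json.dumps({"inputs": "hello"})  # placeholder request body

latencies = []
for _ in range(50):  # static load: fixed request count, one at a time
    start = time.perf_counter()
    runtime.invoke_endpoint(
        EndpointName=ENDPOINT,
        ContentType="application/json",
        Body=payload,
    )
    latencies.append(time.perf_counter() - start)

latencies.sort()
print(f"p50={latencies[len(latencies) // 2] * 1000:.1f} ms  "
      f"p95={latencies[int(len(latencies) * 0.95)] * 1000:.1f} ms  "
      f"throughput={len(latencies) / sum(latencies):.1f} req/s")
```

Dynamic-load testing of the kind OLAF performs would add concurrent clients and varying request rates on top of this basic measurement loop.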
In this post, we demonstrate how to optimize large language model (LLM) inference on Amazon SageMaker AI using BentoML's LLM-Optimizer to systematically identify the best serving configurations for your workload.
In this post, we show you how to unify governance and metadata across Amazon SageMaker Unified Studio and Atlan through a comprehensive bidirectional integration. You’ll learn how to deploy the necessary AWS infrastructure, configure secure connections, and set up automated synchronization to maintain consistent metadata across both platforms.
Today, AWS announces Neuron SDK 2.27.0, introducing support for Trainium3 UltraServer with expanded open source components. Neuron also introduces the Neuron Explorer tools suite, an enhanced Neuron Kernel Interface (NKI) with an open source NKI Compiler built on MLIR (private beta), the NKI Library of optimized kernels, native PyTorch support through TorchNeuron (private beta), and Neuron DRA for Kubernetes-native resource management (private beta). These updates enable standard frameworks to run unchanged on Trainium, removing barriers for researchers to experiment and innovate. For developers requiring deeper control, NKI Beta 2 provides direct access to hardware-level optimizations, enabling customers to scale AI workloads with improved performance. If you're interested in early access to new NKI features and improvements, you can join the Neuron private beta program. The new SDK version is available in all AWS Regions supporting Inferentia and Trainium instances, offering enhanced performance and monitoring capabilities for machine learning workloads. For more details, see What's New in Neuron, the AWS Neuron 2.27.0 Release Notes, and AWS Trainium.
Today, AWS announces SOCI (Seekable Open Container Initiative) indexing support for Amazon SageMaker Studio, reducing container startup times by 30-50% when using custom images. Amazon SageMaker Studio is a fully integrated, browser-based environment for end-to-end machine learning development. SageMaker Studio provides pre-built container images for popular ML frameworks like TensorFlow, PyTorch, and Scikit-learn that enable quick environment setup. However, when data scientists need to tailor environments for specific use cases with additional libraries, dependencies, or configurations, they can build and register custom container images with pre-configured components to ensure consistency across projects. As ML workloads become increasingly complex, these custom container images have grown in size, leading to startup times of several minutes that create bottlenecks in iterative ML development, where quick experimentation and rapid prototyping are essential. SOCI indexing addresses this challenge by enabling lazy loading of container images, downloading only the components needed to start an application and fetching additional files on demand. Instead of waiting several minutes for complete custom image downloads, users can begin productive work in seconds while the environment completes initialization in the background. To use SOCI indexing, create a SOCI index for your custom container image using tools like Finch CLI, nerdctl, or Docker with SOCI CLI, push the indexed image to Amazon Elastic Container Registry (ECR), and reference the image index URI when creating SageMaker Image resources. SOCI indexing is available in all AWS Regions where Amazon SageMaker Studio is available. To learn more about implementing SOCI indexing for your SageMaker Studio custom images, see Bring your own SageMaker image in the Amazon SageMaker Developer Guide.
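The developer guide covers the full flow; as a minimal sketch of the final registration step, assuming you have already built a SOCI index and pushed the indexed image to ECR (for example with Finch or the SOCI CLI), the boto3 calls below create the SageMaker Image resources. The image name, role ARN, and ECR URI are placeholders.

```python
import boto3

sm = boto3.client("sagemaker")
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # placeholder

# Logical image that SageMaker Studio will reference
sm.create_image(ImageName="my-custom-image", RoleArn=role)

# Version pointing at the SOCI-indexed image index URI already pushed to ECR
sm.create_image_version(
    ImageName="my-custom-image",
    BaseImage="123456789012.dkr.ecr.us-east-1.amazonaws.com/my-custom-image:latest",  # placeholder URI
)
```

Before the image appears in Studio, it also needs an AppImageConfig and an attachment to your domain; the linked developer guide walks through those remaining steps.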
In this post, we show how to use Amazon SageMaker Catalog to publish data from multiple sources, including Amazon S3, Amazon Redshift, and Snowflake. This approach enables self-service access while ensuring robust data governance and metadata management.