Create an Asynchronous Inference Endpoint

Create an asynchronous endpoint the same way you would create an endpoint using SageMaker hosting services:

Create a model in SageMaker with CreateModel.
Create an endpoint configuration with CreateEndpointConfig.
Create an HTTPS endpoint with CreateEndpoint.

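To create an endpoint, you first create a model with CreateModel, where you point to the model artifact and a Docker registry path (image). You then create a configuration with CreateEndpointConfig, where you specify one or more models, created with the CreateModel API, to deploy and the resources that you want SageMaker to provision. Create your endpoint with CreateEndpoint, using the endpoint configuration specified in the request. You can update an asynchronous endpoint with the UpdateEndpoint API. Send inference requests to, and receive results from, the model hosted at the endpoint with InvokeEndpointAsync. You can delete your endpoints with the DeleteEndpoint API.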
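Unlike a real-time request, InvokeEndpointAsync takes the S3 location of a payload you have already uploaded and returns right away with the S3 location where the result will be written. The following is a minimal sketch of that call, assuming a Boto3 `sagemaker-runtime` client; the helper name `invoke_async` and the default `text/csv` content type are illustrative assumptions, not part of the SageMaker API.

```python
from typing import Any, Optional


def invoke_async(runtime_client: Any, endpoint_name: str, input_location: str,
                 content_type: str = "text/csv",
                 inference_id: Optional[str] = None) -> str:
    """Submit one asynchronous inference request (hypothetical helper).

    Returns the S3 URI where SageMaker will write the result.
    """
    # The payload must already be in Amazon S3: InvokeEndpointAsync takes
    # its S3 location (InputLocation), not the request bytes themselves.
    kwargs = {
        "EndpointName": endpoint_name,
        "InputLocation": input_location,
        "ContentType": content_type,  # assumed CSV input for the XGBoost model
    }
    if inference_id is not None:
        # Optional caller-supplied ID for tracking the request.
        kwargs["InferenceId"] = inference_id
    response = runtime_client.invoke_endpoint_async(**kwargs)
    # The call returns immediately; the output object appears at
    # OutputLocation once the model has processed the request.
    return response["OutputLocation"]
```

In practice you would pass `boto3.client("sagemaker-runtime", region_name=aws_region)` as `runtime_client`, then poll the returned S3 location or rely on the SNS notifications configured later in this guide.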

For a full list of the available SageMaker images, see Available Deep Learning Containers Images. See Use Your Own Inference Code for information about how to create your own Docker image.

Create a Model

The following example shows how to create a model using the Amazon SDK for Python (Boto3). The first few lines define:

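sagemaker_client: A low-level SageMaker client object that makes it easy to send requests to and receive responses from AWS services.
sagemaker_role: A string variable containing the Amazon Resource Name (ARN) of the SageMaker IAM role.
aws_region: A string variable containing the name of your AWS Region.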


import boto3

# Specify your AWS Region
aws_region='<aws_region>'

# Create a low-level SageMaker service client.
sagemaker_client = boto3.client('sagemaker', region_name=aws_region)

# Role to give SageMaker permission to access Amazon services.
sagemaker_role= "arn:aws:iam::<account>:role/*"

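Next, specify the location of the pre-trained model stored in Amazon S3. In this example, we use a pre-trained XGBoost model named demo-xgboost-model.tar.gz. The full Amazon S3 URI is stored in the string variable model_url: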


#Create a variable w/ the model S3 URI
s3_bucket = '<your-bucket-name>' # Provide the name of your S3 bucket
bucket_prefix='saved_models'
model_s3_key = f"{bucket_prefix}/demo-xgboost-model.tar.gz"

#Specify S3 bucket w/ model
model_url = f"s3://{s3_bucket}/{model_s3_key}"

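Specify a primary container. For the primary container, you specify the Docker image that contains the inference code, the artifacts (from prior training), and a custom environment map that the inference code uses when you deploy the model for predictions.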

In this example, we specify the XGBoost built-in algorithm container image:


from sagemaker import image_uris

# Specify an AWS container image. 
container = image_uris.retrieve(region=aws_region, framework='xgboost', version='0.90-1')

Create a model in Amazon SageMaker with CreateModel. Specify the following:

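ModelName: A name for your model (in this example, it is stored in a string variable called model_name).
ExecutionRoleArn: The Amazon Resource Name (ARN) of the IAM role that Amazon SageMaker can assume to access model artifacts and Docker images for deployment on ML compute instances or for batch transform jobs.
PrimaryContainer: The location of the primary Docker image that contains the inference code, associated artifacts, and the custom environment map that the inference code uses when the model is deployed for predictions.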


model_name = '<The_name_of_the_model>'

#Create model
create_model_response = sagemaker_client.create_model(
    ModelName = model_name,
    ExecutionRoleArn = sagemaker_role,
    PrimaryContainer = {
        'Image': container,
        'ModelDataUrl': model_url,
    })
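The PrimaryContainer above sets only Image and ModelDataUrl; the custom environment map mentioned earlier goes in the container's optional Environment key, a string-to-string map passed to the container. The following sketch builds the create_model arguments with that map included only when given; the helper name `build_create_model_request` is hypothetical, and any variable names you put in the map depend on what your inference code reads.

```python
from typing import Dict, Optional


def build_create_model_request(model_name: str, role_arn: str, image_uri: str,
                               model_data_url: str,
                               environment: Optional[Dict[str, str]] = None) -> dict:
    """Assemble keyword arguments for sagemaker_client.create_model() (hypothetical helper)."""
    container = {"Image": image_uri, "ModelDataUrl": model_data_url}
    if environment:
        # Environment: string keys and string values that the inference
        # code inside the container can read at runtime.
        container["Environment"] = dict(environment)
    return {
        "ModelName": model_name,
        "ExecutionRoleArn": role_arn,
        "PrimaryContainer": container,
    }
```

With the variables defined earlier, `sagemaker_client.create_model(**build_create_model_request(model_name, sagemaker_role, container, model_url))` is equivalent to the call above.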

For a full list of API parameters, see the CreateModel description in the SageMaker API Reference Guide.

Create an Endpoint Configuration

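Once you have a model, create an endpoint configuration with CreateEndpointConfig. Amazon SageMaker hosting services use this configuration to deploy models. In the configuration, you identify one or more models, created with CreateModel, to deploy and the resources that you want Amazon SageMaker to provision. Specify the AsyncInferenceConfig object and provide an output Amazon S3 location for OutputConfig. You can optionally specify Amazon SNS topics on which to send notifications about prediction results. For more information about Amazon SNS topics, see Configuring Amazon SNS.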

The following example shows how to create an endpoint configuration using the Amazon SDK for Python (Boto3):


from time import gmtime, strftime

# Create an endpoint config name. Here we create one based on the date
# so that we can search endpoints based on creation time.
endpoint_config_name = f"XGBoostEndpointConfig-{strftime('%Y-%m-%d-%H-%M-%S', gmtime())}"

# The name of the model that you want to host. This is the name that you specified when creating the model.
model_name='<The_name_of_your_model>'

create_endpoint_config_response = sagemaker_client.create_endpoint_config(
    EndpointConfigName=endpoint_config_name, # You will specify this name in a CreateEndpoint request.
    # List of ProductionVariant objects, one for each model that you want to host at this endpoint.
    ProductionVariants=[
        {
            "VariantName": "variant1", # The name of the production variant.
            "ModelName": model_name, 
            "InstanceType": "ml.m5.xlarge", # Specify the compute instance type.
            "InitialInstanceCount": 1 # Number of instances to launch initially.
        }
    ],
    AsyncInferenceConfig={
        "OutputConfig": {
            # Location to upload response outputs when no location is provided in the request.
            "S3OutputPath": f"s3://{s3_bucket}/{bucket_prefix}/output",
            # (Optional) specify Amazon SNS topics
            "NotificationConfig": {
                "SuccessTopic": "arn:aws:sns:aws-region:account-id:topic-name",
                "ErrorTopic": "arn:aws:sns:aws-region:account-id:topic-name",
            }
        },
        "ClientConfig": {
            # (Optional) Specify the max number of inflight invocations per instance
            # If no value is provided, Amazon SageMaker will choose an optimal value for you
            "MaxConcurrentInvocationsPerInstance": 4
        }
    }
)

print(f"Created EndpointConfig: {create_endpoint_config_response['EndpointConfigArn']}")

In the preceding example, you specify the following keys for OutputConfig in the AsyncInferenceConfig field:

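S3OutputPath: The location to upload response outputs to when no location is provided in the request.
NotificationConfig: (Optional) The SNS topics that post notifications to you when an inference request succeeds (SuccessTopic) or fails (ErrorTopic).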

You can also specify the following optional argument for ClientConfig in the AsyncInferenceConfig field:

MaxConcurrentInvocationsPerInstance: (Optional) The maximum number of concurrent requests that the SageMaker client sends to the model container.
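Because NotificationConfig and ClientConfig are both optional, it can help to build AsyncInferenceConfig so that unset keys are left out entirely rather than sent with placeholder values. The following is a minimal sketch of that idea; the function name `build_async_inference_config` is hypothetical, and the only validation it performs is a basic positivity check on the concurrency value (the service may impose its own limits).

```python
from typing import Optional


def build_async_inference_config(s3_output_path: str,
                                 success_topic: Optional[str] = None,
                                 error_topic: Optional[str] = None,
                                 max_concurrent: Optional[int] = None) -> dict:
    """Build the AsyncInferenceConfig argument, omitting optional keys that are unset."""
    output_config = {"S3OutputPath": s3_output_path}
    notification = {}
    if success_topic:
        notification["SuccessTopic"] = success_topic
    if error_topic:
        notification["ErrorTopic"] = error_topic
    if notification:
        # Only include NotificationConfig when at least one SNS topic is set.
        output_config["NotificationConfig"] = notification
    config = {"OutputConfig": output_config}
    if max_concurrent is not None:
        if max_concurrent < 1:
            raise ValueError("max_concurrent must be a positive integer")
        # Leaving ClientConfig out lets SageMaker choose the value for you.
        config["ClientConfig"] = {"MaxConcurrentInvocationsPerInstance": max_concurrent}
    return config
```

The result can be passed directly as `AsyncInferenceConfig=build_async_inference_config(...)` in the create_endpoint_config call above.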

Create an Endpoint

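Once you have your model and endpoint configuration, use the CreateEndpoint API to create your endpoint. The endpoint name must be unique within an AWS Region in your AWS account.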

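The following call creates an endpoint using the endpoint configuration specified in the request. Amazon SageMaker uses the endpoint to provision resources and deploy models.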


# The name of the endpoint. The name must be unique within an AWS Region in your AWS account.
endpoint_name = '<endpoint-name>' 

# The name of the endpoint configuration associated with this endpoint.
endpoint_config_name='<endpoint-config-name>'

create_endpoint_response = sagemaker_client.create_endpoint(
                                            EndpointName=endpoint_name, 
                                            EndpointConfigName=endpoint_config_name)
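CreateEndpoint returns before the endpoint is ready; the endpoint stays in the Creating state until its resources are provisioned, and you should not send requests until it reports InService. The following is a minimal polling sketch using DescribeEndpoint; the function name `wait_for_endpoint` and its poll interval and timeout defaults are illustrative assumptions.

```python
import time
from typing import Any


def wait_for_endpoint(sagemaker_client: Any, endpoint_name: str,
                      poll_seconds: float = 30.0,
                      timeout_seconds: float = 3600.0) -> str:
    """Poll DescribeEndpoint until the endpoint is InService (hypothetical helper).

    Raises RuntimeError if the endpoint reaches Failed, and TimeoutError if
    it is still not InService once timeout_seconds has elapsed.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        response = sagemaker_client.describe_endpoint(EndpointName=endpoint_name)
        status = response["EndpointStatus"]
        if status == "InService":
            return status
        if status == "Failed":
            reason = response.get("FailureReason", "unknown reason")
            raise RuntimeError(f"Endpoint {endpoint_name} failed: {reason}")
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Endpoint {endpoint_name} is still {status} after {timeout_seconds} s")
        # Creating, Updating, and similar transitional states: wait and retry.
        time.sleep(poll_seconds)
```

Boto3 also ships a built-in waiter that does the same job: `sagemaker_client.get_waiter("endpoint_in_service").wait(EndpointName=endpoint_name)`. The hand-written loop is shown only to make the status transitions explicit.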

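When you call the CreateEndpoint API, Amazon SageMaker Asynchronous Inference sends a test notification to check that you have configured an Amazon SNS topic. This lets SageMaker check that you have the required permissions. You can simply ignore this notification. The test notification has the following form: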


{
    "eventVersion":"1.0",
    "eventSource":"aws:sagemaker",
    "eventName":"TestNotification"
}
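If a subscriber to your SNS topic (for example, a queue consumer or function) processes every message it receives, it may need to skip this test notification explicitly. The following is a minimal sketch, assuming the subscriber receives the SNS message body as a JSON string; the function name `is_test_notification` is hypothetical, and it checks only the eventSource and eventName fields shown above.

```python
import json


def is_test_notification(message: str) -> bool:
    """Return True if an SNS message body is SageMaker's test notification."""
    try:
        body = json.loads(message)
    except json.JSONDecodeError:
        # Not JSON, so it cannot be the test notification.
        return False
    return (isinstance(body, dict)
            and body.get("eventSource") == "aws:sagemaker"
            and body.get("eventName") == "TestNotification")
```

A handler can call this first and return early when it is True, so that only real inference results reach the rest of the processing logic.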
