@cdklabs/genai-idp-sagemaker-udop-processor

Constructs

BasicSagemakerClassifier

A basic SageMaker-based document classifier for the Pattern 3 document processor.

This construct provides a simple way to deploy a SageMaker endpoint with a document classification model that can categorize documents based on their content and structure. It supports models such as UDOP, including classifiers trained on the RVL-CDIP dataset, for specialized document classification tasks.

The basic classifier includes standard auto-scaling capabilities and sensible defaults for common use cases. For more advanced configurations, consider creating your own SageMaker endpoint and passing it directly to the SagemakerUdopProcessor.

Example

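import { InstanceType, ModelData } from '@aws-cdk/aws-sagemaker-alpha';
import { BasicSagemakerClassifier, SagemakerUdopProcessor } from '@cdklabs/genai-idp-sagemaker-udop-processor';

// `bucket` (an s3.IBucket) and `environment` (an IProcessingEnvironment) are assumed to be defined earlier in the stack.

// Deploy the classification model behind an auto-scaled SageMaker endpoint.
const classifier = new BasicSagemakerClassifier(this, 'Classifier', {
  outputBucket: bucket,
  modelData: ModelData.fromAsset('./model'),
  instanceType: InstanceType.ML_G4DN_XLARGE,
});

// Pass the endpoint to the processor, which calls it for classification.
const processor = new SagemakerUdopProcessor(this, 'Processor', {
  environment,
  classifierEndpoint: classifier.endpoint,
  // ... other configuration (configuration and configurationBucket are also required)
});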

Initializers

import { BasicSagemakerClassifier } from '@cdklabs/genai-idp-sagemaker-udop-processor'

new BasicSagemakerClassifier(scope: Construct, id: string, props: BasicSagemakerClassifierProps)
Name | Type | Description
scope | constructs.Construct | No description.
id | string | No description.
props | BasicSagemakerClassifierProps | No description.

scopeRequired
  • Type: constructs.Construct

idRequired
  • Type: string

propsRequired

Methods

Name | Description
toString | Returns a string representation of this construct.
with | Applies one or more mixins to this construct.

toString
public toString(): string

Returns a string representation of this construct.

with
public with(...mixins: IMixin[]): IConstruct

Applies one or more mixins to this construct.

Mixins are applied in order. The list of constructs is captured at the start of the call, so constructs added by a mixin will not be visited. Use multiple with() calls if subsequent mixins should apply to added constructs.

mixinsRequired
  • Type: ...constructs.IMixin[]

The mixins to apply.
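
The snapshot behavior described above can be illustrated with a small stand-alone model. This is only a sketch: `TreeNode`, `addChild`, and `tag` are hypothetical names, mixins are modeled as plain functions rather than the library's `IMixin` objects, and the loop order is an assumption. It shows why a construct added by one mixin is not visited by a later mixin in the same call, and why a second `with()` call does visit it.

```typescript
// Hypothetical minimal tree node; not the constructs library's Node class.
class TreeNode {
  readonly id: string;
  readonly children: TreeNode[] = [];
  readonly tags: string[] = [];

  constructor(id: string) {
    this.id = id;
  }

  // Returns this node and all descendants as they exist right now.
  findAll(): TreeNode[] {
    const all: TreeNode[] = [this];
    for (const child of this.children) all.push(...child.findAll());
    return all;
  }

  // Applies mixins in order over a list captured once, before any mixin runs.
  with(...mixins: Array<(node: TreeNode) => void>): this {
    const snapshot = this.findAll();
    for (const mixin of mixins) {
      for (const node of snapshot) mixin(node);
    }
    return this;
  }
}

// Adds a child to the root only; the new child is not part of the snapshot.
const addChild = (node: TreeNode): void => {
  if (node.id === 'root') node.children.push(new TreeNode('added'));
};

// Tags every node it visits.
const tag = (node: TreeNode): void => {
  node.tags.push('tagged');
};

// One call: the added child is never tagged.
const single = new TreeNode('root').with(addChild, tag);

// Two calls: the second call takes a fresh snapshot that includes the child.
const chained = new TreeNode('root').with(addChild).with(tag);
```

In this model, `single.children[0].tags` stays empty while `chained.children[0].tags` contains `'tagged'`, which mirrors the advice to use multiple `with()` calls.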


Static Functions

Name | Description
isConstruct | Checks if x is a construct.

isConstruct
import { BasicSagemakerClassifier } from '@cdklabs/genai-idp-sagemaker-udop-processor'

BasicSagemakerClassifier.isConstruct(x: any)

Checks if x is a construct.

Use this method instead of instanceof to properly detect Construct instances, even when the construct library is symlinked.

Explanation: in JavaScript, multiple copies of the constructs library on disk are seen as independent, completely different libraries. As a consequence, the class Construct in each copy of the constructs library is seen as a different class, and an instance of one class will not test as instanceof the other class. npm install will not create installations like this, but users may manually symlink construct libraries together or use a monorepo tool: in those cases, multiple copies of the constructs library can be accidentally installed, and instanceof will behave unpredictably. It is safest to avoid instanceof and use this type-testing method instead.

xRequired
  • Type: any

Any object.
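
The instanceof pitfall explained above can be reproduced without any symlinks. The sketch below simulates two copies of a library by evaluating the same class expression twice, and uses a marker symbol from the global symbol registry as one common way to implement a copy-independent check. `loadLibraryCopy` and the `'example.Construct'` key are hypothetical; this is not the constructs library's source.

```typescript
// Hypothetical marker; Symbol.for returns the same symbol in every module copy.
const MARKER = Symbol.for('example.Construct');

// Simulates loading the same library twice: each call yields a distinct class.
function loadLibraryCopy() {
  return class Construct {
    constructor() {
      // Tag every instance with the shared marker.
      Object.defineProperty(this, MARKER, { value: true });
    }

    // Marker-based check that does not depend on class identity.
    static isConstruct(x: unknown): boolean {
      return x !== null && typeof x === 'object' && MARKER in x;
    }
  };
}

const CopyA = loadLibraryCopy();
const CopyB = loadLibraryCopy();
const fromA = new CopyA();

// false: CopyA and CopyB are different class objects.
const viaInstanceof = fromA instanceof CopyB;

// true: the marker is shared across copies.
const viaMarker = CopyB.isConstruct(fromA);
```

The same reasoning is why `BasicSagemakerClassifier.isConstruct(x)` is preferred over `x instanceof Construct` in code that may run against more than one installed copy of the library.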


Properties

Name | Type | Description
node | constructs.Node | The tree node.
endpoint | @aws-cdk/aws-sagemaker-alpha.IEndpoint | The SageMaker endpoint that hosts the document classification model.
model | @aws-cdk/aws-sagemaker-alpha.IModel | The SageMaker model deployed to the endpoint.

nodeRequired
public readonly node: Node;
  • Type: constructs.Node

The tree node.


endpointRequired
public readonly endpoint: IEndpoint;
  • Type: @aws-cdk/aws-sagemaker-alpha.IEndpoint

The SageMaker endpoint that hosts the document classification model.


modelRequired
public readonly model: IModel;
  • Type: @aws-cdk/aws-sagemaker-alpha.IModel

The SageMaker model deployed to the endpoint.

Exposed so that additional S3 bucket permissions can be granted (e.g. working bucket access when used with the LambdaHook bridge).


SagemakerUdopProcessor

SageMaker UDOP document processor facade over UnifiedDocumentProcessor.

Creates a LambdaHook bridge that uses pageOutputUri to read page artifacts (image, Textract JSON) from S3 and calls the SageMaker endpoint for classification. All other stages delegate to the unified processor.

Initializers

import { SagemakerUdopProcessor } from '@cdklabs/genai-idp-sagemaker-udop-processor'

new SagemakerUdopProcessor(scope: Construct, id: string, props: SagemakerUdopProcessorProps)
Name | Type | Description
scope | constructs.Construct | No description.
id | string | No description.
props | SagemakerUdopProcessorProps | No description.

scopeRequired
  • Type: constructs.Construct

idRequired
  • Type: string

propsRequired

Methods

Name | Description
toString | Returns a string representation of this construct.
with | Applies one or more mixins to this construct.
metricBedrockRequestsFailed | No description.
metricBedrockRequestsSucceeded | No description.
metricBedrockRequestsTotal | No description.
metricInputDocumentPages | No description.
metricInputDocuments | No description.
metricInputTokens | No description.
metricLambdaHookRequestsFailed | No description.
metricLambdaHookRequestsSucceeded | No description.
metricLambdaHookRequestsTotal | No description.
metricOutputTokens | No description.

toString
public toString(): string

Returns a string representation of this construct.

with
public with(...mixins: IMixin[]): IConstruct

Applies one or more mixins to this construct.

Mixins are applied in order. The list of constructs is captured at the start of the call, so constructs added by a mixin will not be visited. Use multiple with() calls if subsequent mixins should apply to added constructs.

mixinsRequired
  • Type: ...constructs.IMixin[]

The mixins to apply.


metricBedrockRequestsFailed
public metricBedrockRequestsFailed(props?: MetricOptions): Metric
propsOptional
  • Type: aws-cdk-lib.aws_cloudwatch.MetricOptions

metricBedrockRequestsSucceeded
public metricBedrockRequestsSucceeded(props?: MetricOptions): Metric
propsOptional
  • Type: aws-cdk-lib.aws_cloudwatch.MetricOptions

metricBedrockRequestsTotal
public metricBedrockRequestsTotal(props?: MetricOptions): Metric
propsOptional
  • Type: aws-cdk-lib.aws_cloudwatch.MetricOptions

metricInputDocumentPages
public metricInputDocumentPages(props?: MetricOptions): Metric
propsOptional
  • Type: aws-cdk-lib.aws_cloudwatch.MetricOptions

metricInputDocuments
public metricInputDocuments(props?: MetricOptions): Metric
propsOptional
  • Type: aws-cdk-lib.aws_cloudwatch.MetricOptions

metricInputTokens
public metricInputTokens(props?: MetricOptions): Metric
propsOptional
  • Type: aws-cdk-lib.aws_cloudwatch.MetricOptions

metricLambdaHookRequestsFailed
public metricLambdaHookRequestsFailed(props?: MetricOptions): Metric
propsOptional
  • Type: aws-cdk-lib.aws_cloudwatch.MetricOptions

metricLambdaHookRequestsSucceeded
public metricLambdaHookRequestsSucceeded(props?: MetricOptions): Metric
propsOptional
  • Type: aws-cdk-lib.aws_cloudwatch.MetricOptions

metricLambdaHookRequestsTotal
public metricLambdaHookRequestsTotal(props?: MetricOptions): Metric
propsOptional
  • Type: aws-cdk-lib.aws_cloudwatch.MetricOptions

metricOutputTokens
public metricOutputTokens(props?: MetricOptions): Metric
propsOptional
  • Type: aws-cdk-lib.aws_cloudwatch.MetricOptions
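
The Failed and Total counters above come in pairs for both Bedrock and LambdaHook requests, which makes a failure rate the natural derived value. The following is a minimal sketch of that arithmetic only; `RequestCounts` and `failureRatePercent` are hypothetical names, and the datapoints are assumed to have already been read from CloudWatch for the same period.

```typescript
// Hypothetical datapoint pairing two counters for one period.
interface RequestCounts {
  total: number;
  failed: number;
}

// Returns the failure rate as a percentage, or 0 when nothing was requested.
function failureRatePercent(counts: RequestCounts): number {
  if (counts.total <= 0) return 0;
  return (counts.failed * 100) / counts.total;
}

const quiet = failureRatePercent({ total: 0, failed: 0 });      // 0
const degraded = failureRatePercent({ total: 200, failed: 5 }); // 2.5
```

In a CDK stack the same ratio would more typically be expressed as a CloudWatch math expression over the two `Metric` objects, so that an alarm can be attached to it.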

Static Functions

Name | Description
isConstruct | Checks if x is a construct.

isConstruct
import { SagemakerUdopProcessor } from '@cdklabs/genai-idp-sagemaker-udop-processor'

SagemakerUdopProcessor.isConstruct(x: any)

Checks if x is a construct.

Use this method instead of instanceof to properly detect Construct instances, even when the construct library is symlinked.

Explanation: in JavaScript, multiple copies of the constructs library on disk are seen as independent, completely different libraries. As a consequence, the class Construct in each copy of the constructs library is seen as a different class, and an instance of one class will not test as instanceof the other class. npm install will not create installations like this, but users may manually symlink construct libraries together or use a monorepo tool: in those cases, multiple copies of the constructs library can be accidentally installed, and instanceof will behave unpredictably. It is safest to avoid instanceof and use this type-testing method instead.

xRequired
  • Type: any

Any object.


Properties

Name | Type | Description
node | constructs.Node | The tree node.
environment | @cdklabs/genai-idp.IProcessingEnvironment | The processing environment that provides shared infrastructure and services.
maxProcessingConcurrency | number | The maximum number of documents that can be processed concurrently.
stateMachine | aws-cdk-lib.aws_stepfunctions.IStateMachine | The Step Functions state machine that orchestrates the document processing workflow.
evaluationFunction | @cdklabs/genai-idp.EvaluationFunction | The evaluation function if evaluation is enabled for this processor.

nodeRequired
public readonly node: Node;
  • Type: constructs.Node

The tree node.


environmentRequired
public readonly environment: IProcessingEnvironment;
  • Type: @cdklabs/genai-idp.IProcessingEnvironment

The processing environment that provides shared infrastructure and services.

Contains input/output buckets, tracking tables, API endpoints, and other resources needed for document processing operations.


maxProcessingConcurrencyRequired
public readonly maxProcessingConcurrency: number;
  • Type: number

The maximum number of documents that can be processed concurrently.

Controls the throughput and resource utilization of the document processing system.


stateMachineRequired
public readonly stateMachine: IStateMachine;
  • Type: aws-cdk-lib.aws_stepfunctions.IStateMachine

The Step Functions state machine that orchestrates the document processing workflow.

Manages the sequence of processing steps and handles error conditions. This state machine is triggered for each document that needs processing and coordinates the entire extraction pipeline.


evaluationFunctionOptional
public readonly evaluationFunction: EvaluationFunction;
  • Type: @cdklabs/genai-idp.EvaluationFunction

The evaluation function if evaluation is enabled for this processor.

The evaluation function is created by the ProcessingEnvironment when an evaluation baseline bucket and an evaluation model are provided.


Structs

BasicSagemakerClassifierProps

Configuration properties for the basic SageMaker-based document classifier.

This classifier uses a SageMaker endpoint to categorize documents based on their content and structure, enabling targeted extraction strategies.

Initializer

import { BasicSagemakerClassifierProps } from '@cdklabs/genai-idp-sagemaker-udop-processor'

const basicSagemakerClassifierProps: BasicSagemakerClassifierProps = { ... }

Properties

Name | Type | Description
instanceType | @aws-cdk/aws-sagemaker-alpha.InstanceType | The instance type to use for the SageMaker endpoint.
modelData | @aws-cdk/aws-sagemaker-alpha.ModelData | The model data for the SageMaker endpoint.
outputBucket | aws-cdk-lib.aws_s3.IBucket | The S3 bucket where classification outputs will be stored.
key | aws-cdk-lib.aws_kms.IKey | Optional KMS key for encrypting classifier resources.
maxInstanceCount | number | The maximum number of instances for the SageMaker endpoint.
minInstanceCount | number | The minimum number of instances for the SageMaker endpoint.
scaleInCooldown | aws-cdk-lib.Duration | The cooldown period after scaling in before another scale-in action can occur.
scaleOutCooldown | aws-cdk-lib.Duration | The cooldown period after scaling out before another scale-out action can occur.
targetInvocationsPerInstancePerMinute | number | The target number of invocations per instance per minute.

instanceTypeRequired
public readonly instanceType: InstanceType;
  • Type: @aws-cdk/aws-sagemaker-alpha.InstanceType

The instance type to use for the SageMaker endpoint.

Determines the computational resources available for document classification. For deep learning models, GPU instances are typically recommended.


modelDataRequired
public readonly modelData: ModelData;
  • Type: @aws-cdk/aws-sagemaker-alpha.ModelData

The model data for the SageMaker endpoint.

Contains the trained model artifacts that will be deployed to the endpoint. This can be a pre-trained document classification model, such as UDOP or a classifier trained on the RVL-CDIP dataset.


outputBucketRequired
public readonly outputBucket: IBucket;
  • Type: aws-cdk-lib.aws_s3.IBucket

The S3 bucket where classification outputs will be stored.

Contains intermediate results from the document classification process.


keyOptional
public readonly key: IKey;
  • Type: aws-cdk-lib.aws_kms.IKey

Optional KMS key for encrypting classifier resources.

When provided, ensures data security for the SageMaker endpoint and associated resources.


maxInstanceCountOptional
public readonly maxInstanceCount: number;
  • Type: number
  • Default: 4

The maximum number of instances for the SageMaker endpoint.

Controls the maximum capacity for document classification during high load.


minInstanceCountOptional
public readonly minInstanceCount: number;
  • Type: number
  • Default: 1

The minimum number of instances for the SageMaker endpoint.

Controls the baseline capacity for document classification.


scaleInCooldownOptional
public readonly scaleInCooldown: Duration;
  • Type: aws-cdk-lib.Duration
  • Default: cdk.Duration.minutes(5)

The cooldown period after scaling in before another scale-in action can occur.

Prevents rapid fluctuations in endpoint capacity.


scaleOutCooldownOptional
public readonly scaleOutCooldown: Duration;
  • Type: aws-cdk-lib.Duration
  • Default: cdk.Duration.minutes(1)

The cooldown period after scaling out before another scale-out action can occur.

Prevents rapid fluctuations in endpoint capacity.


targetInvocationsPerInstancePerMinuteOptional
public readonly targetInvocationsPerInstancePerMinute: number;
  • Type: number
  • Default: 20

The target number of invocations per instance per minute.

Used to determine when to scale the endpoint in or out.
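
To get a feel for how minInstanceCount, maxInstanceCount, and targetInvocationsPerInstancePerMinute interact, the steady-state instance count can be approximated with simple arithmetic. This is a rough sketch under stated assumptions, not the construct's scaling policy: `ScalingSettings` and `estimateInstanceCount` are hypothetical names, the fallback values mirror the documented defaults, and cooldowns, warm-up time, and metric aggregation are ignored.

```typescript
// Scaling bounds and target; fallbacks mirror the documented prop defaults.
interface ScalingSettings {
  minInstanceCount?: number;
  maxInstanceCount?: number;
  targetInvocationsPerInstancePerMinute?: number;
}

// Estimates the steady-state instance count for a given load.
// Simplification: ignores cooldowns, warm-up, and metric aggregation.
function estimateInstanceCount(
  invocationsPerMinute: number,
  settings: ScalingSettings = {},
): number {
  const min = settings.minInstanceCount ?? 1;
  const max = settings.maxInstanceCount ?? 4;
  const target = settings.targetInvocationsPerInstancePerMinute ?? 20;

  // Instances needed so that each one stays at or below the target rate.
  const needed = Math.ceil(invocationsPerMinute / target);

  // Clamp to the configured bounds.
  return Math.min(max, Math.max(min, needed));
}

const idle = estimateInstanceCount(0);      // 1 (held at the minimum)
const moderate = estimateInstanceCount(30); // 2 (ceil of 30 / 20)
const burst = estimateInstanceCount(200);   // 4 (10 needed, capped at the maximum)
```

With the defaults, any sustained load above 80 invocations per minute is capped at four instances, so maxInstanceCount is the setting to raise when classification becomes the bottleneck.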


SagemakerUdopProcessorConfigurationDefinitionOptions

Options for configuring the SageMaker UDOP processor configuration definition.

Initializer

import { SagemakerUdopProcessorConfigurationDefinitionOptions } from '@cdklabs/genai-idp-sagemaker-udop-processor'

const sagemakerUdopProcessorConfigurationDefinitionOptions: SagemakerUdopProcessorConfigurationDefinitionOptions = { ... }

Properties

Name | Type | Description
assessmentModel | @aws-cdk/aws-bedrock-alpha.IBedrockInvokable | Optional model for the assessment stage.
customPromptGeneratorFunction | aws-cdk-lib.aws_lambda.IFunction | Optional custom prompt generator Lambda function.
evaluationModel | @aws-cdk/aws-bedrock-alpha.IBedrockInvokable | Optional model for the evaluation stage.
extractionModel | @aws-cdk/aws-bedrock-alpha.IBedrockInvokable | Optional model for the extraction stage.
summarizationModel | @aws-cdk/aws-bedrock-alpha.IBedrockInvokable | Optional model for the summarization stage.

assessmentModelOptional
public readonly assessmentModel: IBedrockInvokable;
  • Type: @aws-cdk/aws-bedrock-alpha.IBedrockInvokable

Optional model for the assessment stage.


customPromptGeneratorFunctionOptional
public readonly customPromptGeneratorFunction: IFunction;
  • Type: aws-cdk-lib.aws_lambda.IFunction

Optional custom prompt generator Lambda function.


evaluationModelOptional
public readonly evaluationModel: IBedrockInvokable;
  • Type: @aws-cdk/aws-bedrock-alpha.IBedrockInvokable

Optional model for the evaluation stage.


extractionModelOptional
public readonly extractionModel: IBedrockInvokable;
  • Type: @aws-cdk/aws-bedrock-alpha.IBedrockInvokable

Optional model for the extraction stage.


summarizationModelOptional
public readonly summarizationModel: IBedrockInvokable;
  • Type: @aws-cdk/aws-bedrock-alpha.IBedrockInvokable

Optional model for the summarization stage.


SagemakerUdopProcessorProps

Configuration properties for the SageMaker UDOP document processor facade.

Initializer

import { SagemakerUdopProcessorProps } from '@cdklabs/genai-idp-sagemaker-udop-processor'

const sagemakerUdopProcessorProps: SagemakerUdopProcessorProps = { ... }

Properties

Name | Type | Description
environment | @cdklabs/genai-idp.IProcessingEnvironment | The processing environment that provides shared infrastructure and services.
maxProcessingConcurrency | number | The maximum number of documents that can be processed concurrently.
classifierEndpoint | @aws-cdk/aws-sagemaker-alpha.IEndpoint | The SageMaker endpoint used for document classification.
configuration | ISagemakerUdopProcessorConfiguration | Configuration for the SageMaker UDOP document processor.
configurationBucket | aws-cdk-lib.aws_s3.IBucket | The S3 bucket containing configuration files.

environmentRequired
public readonly environment: IProcessingEnvironment;
  • Type: @cdklabs/genai-idp.IProcessingEnvironment

The processing environment that provides shared infrastructure and services.

Contains input/output buckets, tracking tables, API endpoints, and other resources needed for document processing operations.


maxProcessingConcurrencyOptional
public readonly maxProcessingConcurrency: number;
  • Type: number
  • Default: 100 concurrent workflows

The maximum number of documents that can be processed concurrently.

Controls the throughput and resource utilization of the document processing system.


classifierEndpointRequired
public readonly classifierEndpoint: IEndpoint;
  • Type: @aws-cdk/aws-sagemaker-alpha.IEndpoint

The SageMaker endpoint used for document classification.

A LambdaHook bridge reads page artifacts from pageOutputUri and calls this endpoint for classification.


configurationRequired
public readonly configuration: ISagemakerUdopProcessorConfiguration;

Configuration for the SageMaker UDOP document processor.


configurationBucketRequired
public readonly configurationBucket: IBucket;
  • Type: aws-cdk-lib.aws_s3.IBucket

The S3 bucket containing configuration files.


Classes

SagemakerUdopProcessorConfiguration

Configuration management for SageMaker UDOP document processing.

Initializers

import { SagemakerUdopProcessorConfiguration } from '@cdklabs/genai-idp-sagemaker-udop-processor'

new SagemakerUdopProcessorConfiguration(definition: ISagemakerUdopProcessorConfigurationDefinition)
Name | Type | Description
definition | ISagemakerUdopProcessorConfigurationDefinition | The configuration definition.

definitionRequired

The configuration definition.


Methods

Name | Description
bind | Binds the configuration to a processor scope.

bind
public bind(scope: Construct, environment: IProcessingEnvironment): ISagemakerUdopProcessorConfigurationDefinition

Binds the configuration to a processor scope.

Writes the default configuration to the configuration table.

scopeRequired
  • Type: constructs.Construct

environmentRequired
  • Type: @cdklabs/genai-idp.IProcessingEnvironment

Static Functions

Name | Description
fromFile | No description.
rvlCdipPackageSample | No description.

fromFile
import { SagemakerUdopProcessorConfiguration } from '@cdklabs/genai-idp-sagemaker-udop-processor'

SagemakerUdopProcessorConfiguration.fromFile(filePath: string, options?: SagemakerUdopProcessorConfigurationDefinitionOptions)
filePathRequired
  • Type: string

optionsOptional

rvlCdipPackageSample
import { SagemakerUdopProcessorConfiguration } from '@cdklabs/genai-idp-sagemaker-udop-processor'

SagemakerUdopProcessorConfiguration.rvlCdipPackageSample(options?: SagemakerUdopProcessorConfigurationDefinitionOptions)
optionsOptional

Properties

Name | Type | Description
definition | ISagemakerUdopProcessorConfigurationDefinition | The configuration definition.

definitionRequired
public readonly definition: ISagemakerUdopProcessorConfigurationDefinition;

The configuration definition.


SagemakerUdopProcessorConfigurationDefinition

Configuration definition for SageMaker UDOP document processing.

Delegates to UnifiedDocumentProcessorConfigurationDefinition for loading configs from the unified config library. Maps SageMaker-specific options to unified options. Classification is handled by the SageMaker endpoint via LambdaHook, not by a Bedrock model.

Initializers

import { SagemakerUdopProcessorConfigurationDefinition } from '@cdklabs/genai-idp-sagemaker-udop-processor'

new SagemakerUdopProcessorConfigurationDefinition()

Static Functions

Name | Description
fromFile | Creates a configuration from a custom YAML file.
rvlCdipPackageSample | RVL-CDIP package sample preset.

fromFile
import { SagemakerUdopProcessorConfigurationDefinition } from '@cdklabs/genai-idp-sagemaker-udop-processor'

SagemakerUdopProcessorConfigurationDefinition.fromFile(filePath: string, options?: SagemakerUdopProcessorConfigurationDefinitionOptions)

Creates a configuration from a custom YAML file.

filePathRequired
  • Type: string

optionsOptional

rvlCdipPackageSample
import { SagemakerUdopProcessorConfigurationDefinition } from '@cdklabs/genai-idp-sagemaker-udop-processor'

SagemakerUdopProcessorConfigurationDefinition.rvlCdipPackageSample(options?: SagemakerUdopProcessorConfigurationDefinitionOptions)

RVL-CDIP package sample preset.

optionsOptional

Protocols

ISagemakerUdopProcessor

Interface for SageMaker UDOP document processor implementation.

Properties

Name | Type | Description
node | constructs.Node | The tree node.
environment | @cdklabs/genai-idp.IProcessingEnvironment | The processing environment that provides shared infrastructure and services.
maxProcessingConcurrency | number | The maximum number of documents that can be processed concurrently.
stateMachine | aws-cdk-lib.aws_stepfunctions.IStateMachine | The Step Functions state machine that orchestrates the document processing workflow.
evaluationFunction | @cdklabs/genai-idp.EvaluationFunction | The evaluation function if evaluation is enabled for this processor.

nodeRequired
public readonly node: Node;
  • Type: constructs.Node

The tree node.


environmentRequired
public readonly environment: IProcessingEnvironment;
  • Type: @cdklabs/genai-idp.IProcessingEnvironment

The processing environment that provides shared infrastructure and services.

Contains input/output buckets, tracking tables, API endpoints, and other resources needed for document processing operations.


maxProcessingConcurrencyRequired
public readonly maxProcessingConcurrency: number;
  • Type: number

The maximum number of documents that can be processed concurrently.

Controls the throughput and resource utilization of the document processing system.


stateMachineRequired
public readonly stateMachine: IStateMachine;
  • Type: aws-cdk-lib.aws_stepfunctions.IStateMachine

The Step Functions state machine that orchestrates the document processing workflow.

Manages the sequence of processing steps and handles error conditions. This state machine is triggered for each document that needs processing and coordinates the entire extraction pipeline.


evaluationFunctionOptional
public readonly evaluationFunction: EvaluationFunction;
  • Type: @cdklabs/genai-idp.EvaluationFunction

The evaluation function if evaluation is enabled for this processor.

The evaluation function is created by the ProcessingEnvironment when an evaluation baseline bucket and an evaluation model are provided.


ISagemakerUdopProcessorConfiguration

Interface for SageMaker UDOP processor configuration.

Methods

Name | Description
bind | Binds the configuration to a processor scope.

bind
public bind(scope: Construct, environment: IProcessingEnvironment): ISagemakerUdopProcessorConfigurationDefinition

Binds the configuration to a processor scope.

Writes the default configuration to the configuration table.

scopeRequired
  • Type: constructs.Construct

environmentRequired
  • Type: @cdklabs/genai-idp.IProcessingEnvironment

Properties

Name | Type | Description
definition | ISagemakerUdopProcessorConfigurationDefinition | The configuration definition.

definitionRequired
public readonly definition: ISagemakerUdopProcessorConfigurationDefinition;

The configuration definition.


ISagemakerUdopProcessorConfigurationDefinition

Interface for SageMaker UDOP processor configuration definition.

Properties

Name | Type | Description
ocrBackend | string | No description.
assessmentInferenceProvider | @cdklabs/genai-idp.IInvokable | No description.
customPromptGenerator | aws-cdk-lib.aws_lambda.IFunction | No description.
evaluationModel | @aws-cdk/aws-bedrock-alpha.IBedrockInvokable | No description.
extractionInferenceProvider | @cdklabs/genai-idp.IInvokable | No description.
summarizationInferenceProvider | @cdklabs/genai-idp.IInvokable | No description.

ocrBackendRequired
public readonly ocrBackend: string;
  • Type: string

assessmentInferenceProviderOptional
public readonly assessmentInferenceProvider: IInvokable;
  • Type: @cdklabs/genai-idp.IInvokable

customPromptGeneratorOptional
public readonly customPromptGenerator: IFunction;
  • Type: aws-cdk-lib.aws_lambda.IFunction

evaluationModelOptional
public readonly evaluationModel: IBedrockInvokable;
  • Type: @aws-cdk/aws-bedrock-alpha.IBedrockInvokable

extractionInferenceProviderOptional
public readonly extractionInferenceProvider: IInvokable;
  • Type: @cdklabs/genai-idp.IInvokable

summarizationInferenceProviderOptional
public readonly summarizationInferenceProvider: IInvokable;
  • Type: @cdklabs/genai-idp.IInvokable