@cdklabs/genai-idp-sagemaker-udop-processor

Constructs

BasicSagemakerClassifier

A basic SageMaker-based document classifier for the Pattern 3 document processor.

This construct provides a simple way to deploy a SageMaker endpoint with a document classification model that can categorize documents based on their content and structure. It supports specialized document classification models such as UDOP, or classifiers trained on the RVL-CDIP dataset.

The basic classifier includes standard auto-scaling capabilities and sensible defaults for common use cases. For more advanced configurations, consider creating your own SageMaker endpoint and passing it directly to the SagemakerUdopProcessor.

Example

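import { InstanceType, ModelData } from '@aws-cdk/aws-sagemaker-alpha';
import { BasicSagemakerClassifier, SagemakerUdopProcessor } from '@cdklabs/genai-idp-sagemaker-udop-processor';

// `bucket` (an S3 IBucket) and `environment` (an IProcessingEnvironment)
// are assumed to be defined elsewhere in the stack.
const classifier = new BasicSagemakerClassifier(this, 'Classifier', {
  outputBucket: bucket,
  modelData: ModelData.fromAsset('./model'),
  instanceType: InstanceType.ML_G4DN_XLARGE,
});

// Pass the classifier's endpoint to the processor.
const processor = new SagemakerUdopProcessor(this, 'Processor', {
  environment,
  classifierEndpoint: classifier.endpoint,
  // ... other configuration
});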

Initializers

import { BasicSagemakerClassifier } from '@cdklabs/genai-idp-sagemaker-udop-processor'

new BasicSagemakerClassifier(scope: Construct, id: string, props: BasicSagemakerClassifierProps)

| Name | Type | Description |
| --- | --- | --- |
| scope | constructs.Construct | No description. |
| id | string | No description. |
| props | BasicSagemakerClassifierProps | No description. |

scopeRequired
  • Type: constructs.Construct

idRequired
  • Type: string

propsRequired
  • Type: BasicSagemakerClassifierProps

Methods

| Name | Description |
| --- | --- |
| toString | Returns a string representation of this construct. |
| with | Applies one or more mixins to this construct. |

toString
public toString(): string

Returns a string representation of this construct.

with
public with(mixins: ...IMixin[]): IConstruct

Applies one or more mixins to this construct.

Mixins are applied in order. The list of constructs is captured at the start of the call, so constructs added by a mixin will not be visited. Use multiple with() calls if subsequent mixins should apply to added constructs.

mixinsRequired
  • Type: ...constructs.IMixin[]

The mixins to apply.
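
The snapshot behavior described above can be illustrated with a small, self-contained sketch. It does not use the real `constructs` types: `SketchNode`, `SketchMixin`, and `applyMixins` are hypothetical stand-ins that only model "apply in order against a list captured up front".

```typescript
// Simplified stand-ins for illustration; not the real `constructs` types.
interface SketchNode {
  name: string;
  children: SketchNode[];
  tags: string[];
}

type SketchMixin = (node: SketchNode) => void;

// Collects a node and all of its descendants, depth-first.
function collect(root: SketchNode): SketchNode[] {
  const out: SketchNode[] = [root];
  for (const child of root.children) {
    out.push(...collect(child));
  }
  return out;
}

// Applies mixins in order to a snapshot of the tree taken once, up front.
function applyMixins(root: SketchNode, ...mixins: SketchMixin[]): SketchNode {
  const snapshot = collect(root);
  for (const mixin of mixins) {
    for (const node of snapshot) {
      mixin(node);
    }
  }
  return root;
}

// A mixin that adds a child, and a mixin that tags whatever it visits.
const addChild: SketchMixin = (node) => {
  node.children.push({ name: `${node.name}/added`, children: [], tags: [] });
};
const tagAll: SketchMixin = (node) => {
  node.tags.push('tagged');
};

// One call: the child added by `addChild` is not in the snapshot, so it stays untagged.
const single = applyMixins({ name: 'root', children: [], tags: [] }, addChild, tagAll);

// Two calls: the second call takes a fresh snapshot that includes the added child.
const chained = applyMixins(
  applyMixins({ name: 'root', children: [], tags: [] }, addChild),
  tagAll,
);
```

In this model, `single.children[0].tags` is empty while `chained.children[0].tags` contains `'tagged'`, which is why chaining separate `with()` calls matters when a later mixin should reach constructs added by an earlier one.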


Static Functions

| Name | Description |
| --- | --- |
| isConstruct | Checks if x is a construct. |

isConstruct
import { BasicSagemakerClassifier } from '@cdklabs/genai-idp-sagemaker-udop-processor'

BasicSagemakerClassifier.isConstruct(x: any)

Checks if x is a construct.

Use this method instead of instanceof to properly detect Construct instances, even when the construct library is symlinked.

Explanation: in JavaScript, multiple copies of the constructs library on disk are seen as independent, completely different libraries. As a consequence, the class Construct in each copy of the constructs library is seen as a different class, and an instance of one class will not test as instanceof the other class. npm install will not create installations like this, but users may manually symlink construct libraries together or use a monorepo tool: in those cases, multiple copies of the constructs library can be accidentally installed, and instanceof will behave unpredictably. It is safest to avoid instanceof and to use this type-testing method instead.

xRequired
  • Type: any

Any object.
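
A minimal, self-contained sketch of the problem described above. It does not use the real constructs library: `loadLibraryCopy` and the `BRAND` key are hypothetical stand-ins that simulate two independently loaded copies of a class and a brand-based check that keeps working across copies. How the actual `isConstruct` is implemented is not stated here.

```typescript
// Hypothetical brand key, shared through the global symbol registry.
const BRAND = Symbol.for('example.Construct');

// Simulates loading one copy of a library: each call yields a distinct class.
function loadLibraryCopy() {
  return class Construct {
    constructor() {
      // Mark every instance with the shared brand.
      Object.defineProperty(this, BRAND, { value: true });
    }

    // Brand-based check that does not depend on class identity.
    static isConstruct(x: unknown): boolean {
      return typeof x === 'object' && x !== null && BRAND in x;
    }
  };
}

const CopyA = loadLibraryCopy();
const CopyB = loadLibraryCopy();
const fromA = new CopyA();

// Class identity differs between copies, so instanceof fails across them.
const viaInstanceof = fromA instanceof CopyB;

// The brand is shared, so the static check succeeds across copies.
const viaBrand = CopyB.isConstruct(fromA);
```

Here `viaInstanceof` is `false` and `viaBrand` is `true`, which mirrors the advice to prefer the static type-testing method.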


Properties

| Name | Type | Description |
| --- | --- | --- |
| node | constructs.Node | The tree node. |
| endpoint | @aws-cdk/aws-sagemaker-alpha.IEndpoint | The SageMaker endpoint that hosts the document classification model. |

nodeRequired
public readonly node: Node;
  • Type: constructs.Node

The tree node.


endpointRequired
public readonly endpoint: IEndpoint;
  • Type: @aws-cdk/aws-sagemaker-alpha.IEndpoint

The SageMaker endpoint that hosts the document classification model.

This endpoint is invoked during document processing to determine document types and categories.


SagemakerUdopProcessor

SageMaker UDOP document processor implementation that uses specialized models for document processing.

Initializers

import { SagemakerUdopProcessor } from '@cdklabs/genai-idp-sagemaker-udop-processor'

new SagemakerUdopProcessor(scope: Construct, id: string, props: SagemakerUdopProcessorProps)

| Name | Type | Description |
| --- | --- | --- |
| scope | constructs.Construct | No description. |
| id | string | No description. |
| props | SagemakerUdopProcessorProps | No description. |

scopeRequired
  • Type: constructs.Construct

idRequired
  • Type: string

propsRequired
  • Type: SagemakerUdopProcessorProps

Methods

| Name | Description |
| --- | --- |
| toString | Returns a string representation of this construct. |
| with | Applies one or more mixins to this construct. |
| metricClassificationRequestsTotal | Creates a CloudWatch metric for total classification requests. |
| metricInputDocumentPages | Creates a CloudWatch metric for input document pages processed. |
| metricInputDocuments | Creates a CloudWatch metric for input documents processed. |

~~toString~~
public toString(): string

Returns a string representation of this construct.

~~with~~
public with(mixins: ...IMixin[]): IConstruct

Applies one or more mixins to this construct.

Mixins are applied in order. The list of constructs is captured at the start of the call, so constructs added by a mixin will not be visited. Use multiple with() calls if subsequent mixins should apply to added constructs.

mixinsRequired
  • Type: ...constructs.IMixin[]

The mixins to apply.


~~metricClassificationRequestsTotal~~
public metricClassificationRequestsTotal(props?: MetricOptions): Metric

Creates a CloudWatch metric for total classification requests.

propsOptional
  • Type: aws-cdk-lib.aws_cloudwatch.MetricOptions

Optional metric configuration properties.


~~metricInputDocumentPages~~
public metricInputDocumentPages(props?: MetricOptions): Metric

Creates a CloudWatch metric for input document pages processed.

propsOptional
  • Type: aws-cdk-lib.aws_cloudwatch.MetricOptions

Optional metric configuration properties.


~~metricInputDocuments~~
public metricInputDocuments(props?: MetricOptions): Metric

Creates a CloudWatch metric for input documents processed.

propsOptional
  • Type: aws-cdk-lib.aws_cloudwatch.MetricOptions

Optional metric configuration properties.


Static Functions

| Name | Description |
| --- | --- |
| isConstruct | Checks if x is a construct. |

~~isConstruct~~
import { SagemakerUdopProcessor } from '@cdklabs/genai-idp-sagemaker-udop-processor'

SagemakerUdopProcessor.isConstruct(x: any)

Checks if x is a construct.

Use this method instead of instanceof to properly detect Construct instances, even when the construct library is symlinked.

Explanation: in JavaScript, multiple copies of the constructs library on disk are seen as independent, completely different libraries. As a consequence, the class Construct in each copy of the constructs library is seen as a different class, and an instance of one class will not test as instanceof the other class. npm install will not create installations like this, but users may manually symlink construct libraries together or use a monorepo tool: in those cases, multiple copies of the constructs library can be accidentally installed, and instanceof will behave unpredictably. It is safest to avoid instanceof and to use this type-testing method instead.

xRequired
  • Type: any

Any object.


Properties

| Name | Type | Description |
| --- | --- | --- |
| node | constructs.Node | The tree node. |
| environment | @cdklabs/genai-idp.IProcessingEnvironment | The processing environment that provides shared infrastructure resources. |
| maxProcessingConcurrency | number | The maximum number of documents that can be processed concurrently. |
| stateMachine | aws-cdk-lib.aws_stepfunctions.IStateMachine | The Step Functions state machine that orchestrates the document processing workflow. |

~~node~~Required
  • Deprecated: This processor pattern is deprecated and will be removed in v0.5.0. Please migrate to Pattern 1 (BDA Processor) or Pattern 2 (Bedrock LLM Processor).

This processor implements an intelligent document processing workflow that uses specialized models deployed on SageMaker, such as UDOP (Universal Document Processing) or classifiers trained on the RVL-CDIP dataset, for document classification, followed by foundation models for information extraction.

SageMaker UDOP Processor is ideal for specialized document types that require custom classification models beyond what's possible with foundation models alone, such as complex forms, technical documents, or domain-specific content. It provides the highest level of customization for document classification while maintaining the flexibility of foundation models for extraction.

public readonly node: Node;
  • Type: constructs.Node

The tree node.


~~environment~~Required
  • Deprecated: This processor pattern is deprecated and will be removed in v0.5.0. Please migrate to Pattern 1 (BDA Processor) or Pattern 2 (Bedrock LLM Processor).

public readonly environment: IProcessingEnvironment;
  • Type: @cdklabs/genai-idp.IProcessingEnvironment

The processing environment that provides shared infrastructure resources.

Includes buckets, tables, API, encryption, and VPC configuration used by all processing functions within this processor.


~~maxProcessingConcurrency~~Required
  • Deprecated: This processor pattern is deprecated and will be removed in v0.5.0. Please migrate to Pattern 1 (BDA Processor) or Pattern 2 (Bedrock LLM Processor).

public readonly maxProcessingConcurrency: number;
  • Type: number
  • Default: 100

The maximum number of documents that can be processed concurrently.

Controls the parallelism of the Step Functions state machine to balance throughput against resource consumption.


~~stateMachine~~Required
  • Deprecated: This processor pattern is deprecated and will be removed in v0.5.0. Please migrate to Pattern 1 (BDA Processor) or Pattern 2 (Bedrock LLM Processor).

public readonly stateMachine: IStateMachine;
  • Type: aws-cdk-lib.aws_stepfunctions.IStateMachine

The Step Functions state machine that orchestrates the document processing workflow.

Coordinates OCR, classification, extraction, assessment, summarization, and evaluation steps in the correct sequence.


Structs

BasicSagemakerClassifierProps

Configuration properties for the basic SageMaker-based document classifier.

This classifier uses a SageMaker endpoint to categorize documents based on their content and structure, enabling targeted extraction strategies.

Initializer

import { BasicSagemakerClassifierProps } from '@cdklabs/genai-idp-sagemaker-udop-processor'

const basicSagemakerClassifierProps: BasicSagemakerClassifierProps = { ... }

Properties

| Name | Type | Description |
| --- | --- | --- |
| instanceType | @aws-cdk/aws-sagemaker-alpha.InstanceType | The instance type to use for the SageMaker endpoint. |
| modelData | @aws-cdk/aws-sagemaker-alpha.ModelData | The model data for the SageMaker endpoint. |
| outputBucket | aws-cdk-lib.aws_s3.IBucket | The S3 bucket where classification outputs will be stored. |
| key | aws-cdk-lib.aws_kms.IKey | Optional KMS key for encrypting classifier resources. |
| maxInstanceCount | number | The maximum number of instances for the SageMaker endpoint. |
| minInstanceCount | number | The minimum number of instances for the SageMaker endpoint. |
| scaleInCooldown | aws-cdk-lib.Duration | The cooldown period after scaling in before another scale-in action can occur. |
| scaleOutCooldown | aws-cdk-lib.Duration | The cooldown period after scaling out before another scale-out action can occur. |
| targetInvocationsPerInstancePerMinute | number | The target number of invocations per instance per minute. |

instanceTypeRequired
public readonly instanceType: InstanceType;
  • Type: @aws-cdk/aws-sagemaker-alpha.InstanceType

The instance type to use for the SageMaker endpoint.

Determines the computational resources available for document classification. For deep learning models, GPU instances are typically recommended.


modelDataRequired
public readonly modelData: ModelData;
  • Type: @aws-cdk/aws-sagemaker-alpha.ModelData

The model data for the SageMaker endpoint.

Contains the trained model artifacts that will be deployed to the endpoint. This can be a pre-trained document classification model such as UDOP, or a classifier trained on the RVL-CDIP dataset.


outputBucketRequired
public readonly outputBucket: IBucket;
  • Type: aws-cdk-lib.aws_s3.IBucket

The S3 bucket where classification outputs will be stored.

Contains intermediate results from the document classification process.


keyOptional
public readonly key: IKey;
  • Type: aws-cdk-lib.aws_kms.IKey

Optional KMS key for encrypting classifier resources.

When provided, ensures data security for the SageMaker endpoint and associated resources.


maxInstanceCountOptional
public readonly maxInstanceCount: number;
  • Type: number
  • Default: 4

The maximum number of instances for the SageMaker endpoint.

Controls the maximum capacity for document classification during high load.


minInstanceCountOptional
public readonly minInstanceCount: number;
  • Type: number
  • Default: 1

The minimum number of instances for the SageMaker endpoint.

Controls the baseline capacity for document classification.


scaleInCooldownOptional
public readonly scaleInCooldown: Duration;
  • Type: aws-cdk-lib.Duration
  • Default: cdk.Duration.minutes(5)

The cooldown period after scaling in before another scale-in action can occur.

Prevents rapid fluctuations in endpoint capacity.


scaleOutCooldownOptional
public readonly scaleOutCooldown: Duration;
  • Type: aws-cdk-lib.Duration
  • Default: cdk.Duration.minutes(1)

The cooldown period after scaling out before another scale-out action can occur.

Prevents rapid fluctuations in endpoint capacity.
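
As a rough illustration of how the two cooldowns gate repeated scaling actions, here is a self-contained sketch using the documented defaults (5 minutes for scale-in, 1 minute for scale-out). The `cooldownElapsed` helper is hypothetical and deliberately simplified; the actual auto-scaling service applies additional rules that are not modeled here.

```typescript
type ScaleDirection = 'in' | 'out';

// Documented default cooldowns, expressed in milliseconds.
const COOLDOWN_MS: Record<ScaleDirection, number> = {
  in: 5 * 60 * 1000,
  out: 1 * 60 * 1000,
};

// Returns true when enough time has passed since the last action in the same direction.
function cooldownElapsed(direction: ScaleDirection, lastActionMs: number, nowMs: number): boolean {
  return nowMs - lastActionMs >= COOLDOWN_MS[direction];
}

// Two minutes after the previous action of each kind:
const twoMinutes = 2 * 60 * 1000;
const scaleOutAllowed = cooldownElapsed('out', 0, twoMinutes);
const scaleInAllowed = cooldownElapsed('in', 0, twoMinutes);
```

With the defaults, a second scale-out is allowed after two minutes but a second scale-in is not, so capacity is added quickly and removed cautiously.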


targetInvocationsPerInstancePerMinuteOptional
public readonly targetInvocationsPerInstancePerMinute: number;
  • Type: number
  • Default: 20

The target number of invocations per instance per minute.

Used to determine when to scale the endpoint in or out.
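
To get a feel for how the target interacts with `minInstanceCount` and `maxInstanceCount`, the following self-contained sketch estimates a steady-state instance count from total traffic. The `desiredInstances` function is a hypothetical back-of-the-envelope model, not the algorithm the auto-scaling service uses; it only shows the arithmetic implied by the documented defaults.

```typescript
interface ScalingLimits {
  minInstanceCount: number;
  maxInstanceCount: number;
  targetInvocationsPerInstancePerMinute: number;
}

// Documented defaults for BasicSagemakerClassifierProps.
const DEFAULTS: ScalingLimits = {
  minInstanceCount: 1,
  maxInstanceCount: 4,
  targetInvocationsPerInstancePerMinute: 20,
};

// Instances needed to keep per-instance load at or below the target, clamped to the limits.
function desiredInstances(totalInvocationsPerMinute: number, limits: ScalingLimits = DEFAULTS): number {
  const needed = Math.ceil(totalInvocationsPerMinute / limits.targetInvocationsPerInstancePerMinute);
  return Math.min(limits.maxInstanceCount, Math.max(limits.minInstanceCount, needed));
}

const idle = desiredInstances(0);      // 1: never below minInstanceCount
const moderate = desiredInstances(45); // 3: ceil(45 / 20)
const burst = desiredInstances(130);   // 4: ceil(130 / 20) = 7, capped at maxInstanceCount
```

If sustained traffic regularly exceeds `maxInstanceCount * targetInvocationsPerInstancePerMinute` (80 invocations per minute with the defaults), raising `maxInstanceCount` or choosing a larger instance type is worth considering.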


SagemakerUdopProcessorConfigurationDefinitionOptions

Options for configuring the SageMaker UDOP processor configuration definition.

Allows customization of extraction, evaluation, and summarization stages.

Initializer

import { SagemakerUdopProcessorConfigurationDefinitionOptions } from '@cdklabs/genai-idp-sagemaker-udop-processor'

const sagemakerUdopProcessorConfigurationDefinitionOptions: SagemakerUdopProcessorConfigurationDefinitionOptions = { ... }

Properties

| Name | Type | Description |
| --- | --- | --- |
| assessmentModel | @aws-cdk/aws-bedrock-alpha.IBedrockInvokable | Optional invokable model used for evaluating assessment results. |
| customPromptGeneratorFunction | aws-cdk-lib.aws_lambda.IFunction | Optional custom prompt generator Lambda function. |
| evaluationModel | @aws-cdk/aws-bedrock-alpha.IBedrockInvokable | Optional configuration for the evaluation stage. |
| extractionModel | @aws-cdk/aws-bedrock-alpha.IBedrockInvokable | Optional configuration for the extraction stage. |
| summarizationModel | @aws-cdk/aws-bedrock-alpha.IBedrockInvokable | Optional configuration for the summarization stage. |

assessmentModelOptional
public readonly assessmentModel: IBedrockInvokable;
  • Type: @aws-cdk/aws-bedrock-alpha.IBedrockInvokable
  • Default: as defined in the definition file

Optional invokable model used for evaluating assessment results.

Can be a Bedrock foundation model, Bedrock inference profile, or custom model. Used to assess the quality and accuracy of extracted information by comparing assessment results against expected values.


customPromptGeneratorFunctionOptional
public readonly customPromptGeneratorFunction: IFunction;
  • Type: aws-cdk-lib.aws_lambda.IFunction

Optional custom prompt generator Lambda function.

When provided, the function ARN will be injected into the configuration at extraction.custom_prompt_lambda_arn.
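
The injection described above amounts to setting one key under the `extraction` section of the configuration. The sketch below shows that shape on a plain object. `SketchConfig`, `injectCustomPromptArn`, the `model` field, and the ARN value are all hypothetical placeholders; only the `extraction.custom_prompt_lambda_arn` path comes from the description above.

```typescript
// Hypothetical, simplified configuration shape; the real definition file has more fields.
interface SketchConfig {
  extraction?: { [key: string]: unknown };
  [section: string]: unknown;
}

// Returns a copy of the configuration with the ARN set at extraction.custom_prompt_lambda_arn.
function injectCustomPromptArn(config: SketchConfig, functionArn: string): SketchConfig {
  return {
    ...config,
    extraction: { ...(config.extraction ?? {}), custom_prompt_lambda_arn: functionArn },
  };
}

// Placeholder values for illustration only.
const base: SketchConfig = { extraction: { model: 'example-model-id' }, summarization: {} };
const placeholderArn = 'arn:aws:lambda:us-east-1:123456789012:function:example-prompt-generator';

const updated = injectCustomPromptArn(base, placeholderArn);
```

Existing `extraction` settings are preserved and the original object is left untouched, so the same base configuration can be reused with or without a custom prompt generator.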


evaluationModelOptional
public readonly evaluationModel: IBedrockInvokable;
  • Type: @aws-cdk/aws-bedrock-alpha.IBedrockInvokable

Optional configuration for the evaluation stage.

Defines the model and parameters used for evaluating extraction accuracy.


extractionModelOptional
public readonly extractionModel: IBedrockInvokable;
  • Type: @aws-cdk/aws-bedrock-alpha.IBedrockInvokable

Optional configuration for the extraction stage.

Defines the model and parameters used for information extraction.


summarizationModelOptional
public readonly summarizationModel: IBedrockInvokable;
  • Type: @aws-cdk/aws-bedrock-alpha.IBedrockInvokable

Optional configuration for the summarization stage.

Defines the model and parameters used for generating document summaries.


SagemakerUdopProcessorProps

Configuration properties for the SageMaker UDOP document processor.

Initializer

import { SagemakerUdopProcessorProps } from '@cdklabs/genai-idp-sagemaker-udop-processor'

const sagemakerUdopProcessorProps: SagemakerUdopProcessorProps = { ... }

Properties

| Name | Type | Description |
| --- | --- | --- |
| environment | @cdklabs/genai-idp.IProcessingEnvironment | The processing environment that provides shared infrastructure and services. |
| maxProcessingConcurrency | number | The maximum number of documents that can be processed concurrently. |
| classifierEndpoint | @aws-cdk/aws-sagemaker-alpha.IEndpoint | The SageMaker endpoint used for document classification. |
| configuration | ISagemakerUdopProcessorConfiguration | Configuration for the SageMaker UDOP document processor. |
| assessmentGuardrail | @aws-cdk/aws-bedrock-alpha.IGuardrail | Optional Bedrock guardrail to apply to assessment model interactions. |
| classificationGuardrail | @aws-cdk/aws-bedrock-alpha.IGuardrail | Optional Bedrock guardrail to apply to classification model interactions. |
| enableEditSections | boolean | Enable edit sections feature for classification updates. |
| evaluationBaselineBucket | aws-cdk-lib.aws_s3.IBucket | Optional S3 bucket containing baseline documents for evaluation. |
| evaluationEnabled | boolean | Controls whether extraction results are evaluated for accuracy. |
| extractionGuardrail | @aws-cdk/aws-bedrock-alpha.IGuardrail | Optional Bedrock guardrail to apply to extraction model interactions. |
| ocrMaxWorkers | number | The maximum number of concurrent workers for OCR processing. |
| sectionSplittingStrategy | @cdklabs/genai-idp.SectionSplittingStrategy | Section splitting strategy configuration. |
| summarizationGuardrail | @aws-cdk/aws-bedrock-alpha.IGuardrail | Optional Bedrock guardrail to apply to summarization model interactions. |

~~environment~~Required
  • Deprecated: This processor pattern is deprecated and will be removed in v0.5.0. Please migrate to Pattern 1 (BDA Processor) or Pattern 2 (Bedrock LLM Processor).

SageMaker UDOP Processor uses specialized document processing with SageMaker endpoints for document classification, combined with foundation models for extraction. This processor is ideal for specialized document types that require custom classification models for accurate document categorization before extraction.

SageMaker UDOP Processor offers the highest level of customization for document processing, allowing you to deploy and use specialized models for document classification while still leveraging foundation models for extraction tasks. This processor is particularly useful for domain-specific document processing needs.

public readonly environment: IProcessingEnvironment;
  • Type: @cdklabs/genai-idp.IProcessingEnvironment

The processing environment that provides shared infrastructure and services.

Contains input/output buckets, tracking tables, API endpoints, and other resources needed for document processing operations.


~~maxProcessingConcurrency~~Optional
  • Deprecated: This processor pattern is deprecated and will be removed in v0.5.0. Please migrate to Pattern 1 (BDA Processor) or Pattern 2 (Bedrock LLM Processor).

public readonly maxProcessingConcurrency: number;
  • Type: number
  • Default: 100 concurrent workflows

The maximum number of documents that can be processed concurrently.

Controls the throughput and resource utilization of the document processing system.


~~classifierEndpoint~~Required
  • Deprecated: This processor pattern is deprecated and will be removed in v0.5.0. Please migrate to Pattern 1 (BDA Processor) or Pattern 2 (Bedrock LLM Processor).

public readonly classifierEndpoint: IEndpoint;
  • Type: @aws-cdk/aws-sagemaker-alpha.IEndpoint

The SageMaker endpoint used for document classification.

Determines document types based on content and structure analysis using specialized models deployed on SageMaker, such as UDOP or classifiers trained on the RVL-CDIP dataset.

This is a key component of Pattern 3, enabling specialized document classification beyond what's possible with foundation models alone. Users can create their own SageMaker endpoint using any method (CDK constructs, existing endpoints, etc.) and pass it directly to the processor.


~~configuration~~Required
  • Deprecated: This processor pattern is deprecated and will be removed in v0.5.0. Please migrate to Pattern 1 (BDA Processor) or Pattern 2 (Bedrock LLM Processor).

public readonly configuration: ISagemakerUdopProcessorConfiguration;
  • Type: ISagemakerUdopProcessorConfiguration

Configuration for the SageMaker UDOP document processor.

Provides customization options for the processing workflow, including schema definitions, prompts, and evaluation settings.


~~assessmentGuardrail~~Optional
  • Deprecated: This processor pattern is deprecated and will be removed in v0.5.0. Please migrate to Pattern 1 (BDA Processor) or Pattern 2 (Bedrock LLM Processor).

public readonly assessmentGuardrail: IGuardrail;
  • Type: @aws-cdk/aws-bedrock-alpha.IGuardrail
  • Default: No guardrail is applied

Optional Bedrock guardrail to apply to assessment model interactions.

Helps ensure model outputs adhere to content policies and guidelines by filtering inappropriate content and enforcing usage policies.


~~classificationGuardrail~~Optional
  • Deprecated: This processor pattern is deprecated and will be removed in v0.5.0. Please migrate to Pattern 1 (BDA Processor) or Pattern 2 (Bedrock LLM Processor).

public readonly classificationGuardrail: IGuardrail;
  • Type: @aws-cdk/aws-bedrock-alpha.IGuardrail
  • Default: No guardrail is applied

Optional Bedrock guardrail to apply to classification model interactions.

Helps ensure model outputs adhere to content policies and guidelines by filtering inappropriate content and enforcing usage policies.


~~enableEditSections~~Optional
  • Deprecated: This processor pattern is deprecated and will be removed in v0.5.0. Please migrate to Pattern 1 (BDA Processor) or Pattern 2 (Bedrock LLM Processor).

public readonly enableEditSections: boolean;
  • Type: boolean
  • Default: false

Enable edit sections feature for classification updates.

When enabled, allows users to modify document classification through the UI and trigger selective reprocessing of affected sections. This provides flexibility to correct classification errors without reprocessing entire documents.


~~evaluationBaselineBucket~~Optional
  • Deprecated: This processor pattern is deprecated and will be removed in v0.5.0. Please migrate to Pattern 1 (BDA Processor) or Pattern 2 (Bedrock LLM Processor).

public readonly evaluationBaselineBucket: IBucket;
  • Type: aws-cdk-lib.aws_s3.IBucket
  • Default: No evaluation baseline bucket is configured

Optional S3 bucket containing baseline documents for evaluation.

Used as ground truth when evaluating extraction accuracy by comparing extraction results against known correct values.

Required when evaluationEnabled is true.
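
The dependency between `evaluationEnabled` and `evaluationBaselineBucket` can be checked before the props are handed to the construct. The sketch below is a hypothetical pre-flight check on a reduced settings object (the bucket is represented by a name only); whether the construct itself performs an equivalent check is not stated here.

```typescript
// Hypothetical subset of SagemakerUdopProcessorProps; the bucket is reduced to a name.
interface EvaluationSettings {
  evaluationEnabled?: boolean;
  evaluationBaselineBucketName?: string;
}

// Returns a list of human-readable problems; an empty list means the settings are consistent.
function checkEvaluationSettings(settings: EvaluationSettings): string[] {
  const problems: string[] = [];
  if (settings.evaluationEnabled === true && settings.evaluationBaselineBucketName === undefined) {
    problems.push('evaluationBaselineBucket is required when evaluationEnabled is true');
  }
  return problems;
}

const ok = checkEvaluationSettings({ evaluationEnabled: true, evaluationBaselineBucketName: 'baseline-docs' });
const missing = checkEvaluationSettings({ evaluationEnabled: true });
const disabled = checkEvaluationSettings({});
```

Only `missing` reports a problem: evaluation is off by default, so omitting both properties is consistent.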


~~evaluationEnabled~~Optional
  • Deprecated: This processor pattern is deprecated and will be removed in v0.5.0. Please migrate to Pattern 1 (BDA Processor) or Pattern 2 (Bedrock LLM Processor).

public readonly evaluationEnabled: boolean;
  • Type: boolean
  • Default: false

Controls whether extraction results are evaluated for accuracy.

When enabled, compares extraction results against expected values to measure extraction quality and identify improvement areas.


~~extractionGuardrail~~Optional
  • Deprecated: This processor pattern is deprecated and will be removed in v0.5.0. Please migrate to Pattern 1 (BDA Processor) or Pattern 2 (Bedrock LLM Processor).

public readonly extractionGuardrail: IGuardrail;
  • Type: @aws-cdk/aws-bedrock-alpha.IGuardrail
  • Default: No guardrail is applied

Optional Bedrock guardrail to apply to extraction model interactions.

Helps ensure model outputs adhere to content policies and guidelines by filtering inappropriate content and enforcing usage policies.


~~ocrMaxWorkers~~Optional
  • Deprecated: This processor pattern is deprecated and will be removed in v0.5.0. Please migrate to Pattern 1 (BDA Processor) or Pattern 2 (Bedrock LLM Processor).

public readonly ocrMaxWorkers: number;
  • Type: number
  • Default: 20

The maximum number of concurrent workers for OCR processing.

Controls parallelism during the text extraction phase to optimize throughput while managing resource utilization.
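
To illustrate what a worker cap of this kind does, here is a generic bounded-concurrency helper. It is a sketch of the general technique only: `mapWithLimit` is a hypothetical name, and nothing here is taken from the library's OCR implementation.

```typescript
// Illustrative only: run `task` over `items` with at most `maxWorkers`
// tasks in flight at any time, preserving the input order in the result.
async function mapWithLimit<T, R>(
  items: readonly T[],
  maxWorkers: number,
  task: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0; // index of the next unclaimed item

  // Each worker repeatedly claims the next index until none remain.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await task(items[i], i);
    }
  }

  const workerCount = Math.max(1, Math.min(maxWorkers, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

A higher cap shortens wall-clock time for large documents but raises the number of simultaneous OCR requests, so the value is a trade-off between throughput and service limits.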


~~sectionSplittingStrategy~~Optional
  • Deprecated: This processor pattern is deprecated and will be removed in v0.5.0. Please migrate to Pattern 1 (BDA Processor) or Pattern 2 (Bedrock LLM Processor).

public readonly sectionSplittingStrategy: SectionSplittingStrategy;
  • Type: @cdklabs/genai-idp.SectionSplittingStrategy
  • Default: SectionSplittingStrategy.LLM_DETERMINED

Section splitting strategy configuration.

Controls how multi-page documents are divided into sections during classification. This affects how documents of the same type are grouped together and processed.

Options:
  • DISABLED: Entire document treated as single section with first detected class
  • PAGE: One section per page, preventing automatic joining of same-type documents
  • LLM_DETERMINED: Uses LLM boundary detection with "Start"/"Continue" indicators
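
The difference between the three options can be sketched as a small grouping function. This is a conceptual model under stated assumptions, not the library's code: `Strategy`, `PageResult`, `Section`, and `splitIntoSections` are hypothetical names, and the per-page `boundary` value stands in for whatever indicator the classifier actually returns.

```typescript
// Hypothetical stand-in for the three documented strategies.
type Strategy = 'DISABLED' | 'PAGE' | 'LLM_DETERMINED';

interface PageResult {
  page: number;
  docClass: string;
  boundary: 'Start' | 'Continue'; // only consulted for LLM_DETERMINED
}

interface Section {
  docClass: string;
  pages: number[];
}

function splitIntoSections(pages: PageResult[], strategy: Strategy): Section[] {
  if (pages.length === 0) return [];
  switch (strategy) {
    case 'DISABLED':
      // Whole document is one section, labeled with the first detected class.
      return [{ docClass: pages[0].docClass, pages: pages.map((p) => p.page) }];
    case 'PAGE':
      // One section per page; adjacent same-type pages are never joined.
      return pages.map((p) => ({ docClass: p.docClass, pages: [p.page] }));
    case 'LLM_DETERMINED': {
      // "Start" opens a new section; "Continue" extends the current one.
      const sections: Section[] = [];
      for (const p of pages) {
        const current = sections[sections.length - 1];
        if (current === undefined || p.boundary === 'Start') {
          sections.push({ docClass: p.docClass, pages: [p.page] });
        } else {
          current.pages.push(p.page);
        }
      }
      return sections;
    }
  }
}

const samplePages: PageResult[] = [
  { page: 1, docClass: 'invoice', boundary: 'Start' },
  { page: 2, docClass: 'invoice', boundary: 'Continue' },
  { page: 3, docClass: 'invoice', boundary: 'Start' },
  { page: 4, docClass: 'letter', boundary: 'Start' },
];
console.log(splitIntoSections(samplePages, 'LLM_DETERMINED'));
```

In this sample, pages 1 to 3 share a class but contain two separate invoices; only the boundary-based strategy keeps them apart without also splitting the two-page invoice.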


~~summarizationGuardrail~~Optional
  • Deprecated: This processor pattern is deprecated and will be removed in v0.5.0. Please migrate to Pattern 1 (BDA Processor) or Pattern 2 (Bedrock LLM Processor).

public readonly summarizationGuardrail: IGuardrail;
  • Type: @aws-cdk/aws-bedrock-alpha.IGuardrail
  • Default: No guardrail is applied

Optional Bedrock guardrail to apply to summarization model interactions.

Helps ensure model outputs adhere to content policies and guidelines by filtering inappropriate content and enforcing usage policies.


Classes

SagemakerUdopProcessorConfiguration

Configuration management for SageMaker UDOP document processing using SageMaker for classification.

This construct creates and manages the configuration for SageMaker UDOP document processing, including schema definitions, extraction prompts, and configuration values. It provides a centralized way to manage document classes, extraction schemas, and model parameters for specialized document processing with SageMaker.

Initializers

import { SagemakerUdopProcessorConfiguration } from '@cdklabs/genai-idp-sagemaker-udop-processor'

new SagemakerUdopProcessorConfiguration(definition: ISagemakerUdopProcessorConfigurationDefinition)
Name Type Description
definition ISagemakerUdopProcessorConfigurationDefinition The configuration definition instance.

definitionRequired

The configuration definition instance.


Methods

Name Description
bind Binds the configuration to a processor instance.

bind
public bind(processor: ISagemakerUdopProcessor): ISagemakerUdopProcessorConfigurationDefinition

Binds the configuration to a processor instance.

This method applies the configuration to the processor.

processorRequired

The SageMaker UDOP document processor to apply the configuration to.

Static Functions

Name Description
fromFile Creates a configuration from a YAML file.
rvlCdipPackageSample Creates a default configuration with standard settings.

fromFile
import { SagemakerUdopProcessorConfiguration } from '@cdklabs/genai-idp-sagemaker-udop-processor'

SagemakerUdopProcessorConfiguration.fromFile(filePath: string, options?: SagemakerUdopProcessorConfigurationDefinitionOptions)

Creates a configuration from a YAML file.

filePathRequired
  • Type: string

Path to the YAML configuration file.


optionsOptional

Optional configuration options to override file settings.


rvlCdipPackageSample
import { SagemakerUdopProcessorConfiguration } from '@cdklabs/genai-idp-sagemaker-udop-processor'

SagemakerUdopProcessorConfiguration.rvlCdipPackageSample(options?: SagemakerUdopProcessorConfigurationDefinitionOptions)

Creates a default configuration with standard settings.

optionsOptional

Optional configuration options.


SagemakerUdopProcessorConfigurationDefinition

Configuration definition for SageMaker UDOP document processing.

Provides methods to create and customize configuration for SageMaker UDOP processing.

Initializers

import { SagemakerUdopProcessorConfigurationDefinition } from '@cdklabs/genai-idp-sagemaker-udop-processor'

new SagemakerUdopProcessorConfigurationDefinition()

Static Functions

Name Description
fromFile Creates a configuration definition from a YAML file.
rvlCdipPackageSample Creates a default configuration definition for SageMaker UDOP processing.

~~fromFile~~
import { SagemakerUdopProcessorConfigurationDefinition } from '@cdklabs/genai-idp-sagemaker-udop-processor'

SagemakerUdopProcessorConfigurationDefinition.fromFile(filePath: string, options?: SagemakerUdopProcessorConfigurationDefinitionOptions)

Creates a configuration definition from a YAML file.

Allows users to provide custom configuration files for document processing.

filePathRequired
  • Type: string

Path to the YAML configuration file.


optionsOptional

Optional customization for processing stages.


~~rvlCdipPackageSample~~
import { SagemakerUdopProcessorConfigurationDefinition } from '@cdklabs/genai-idp-sagemaker-udop-processor'

SagemakerUdopProcessorConfigurationDefinition.rvlCdipPackageSample(options?: SagemakerUdopProcessorConfigurationDefinitionOptions)

Creates a default configuration definition for SageMaker UDOP processing.

This configuration includes basic settings for extraction, evaluation, and summarization when using SageMaker for document classification.

optionsOptional

Optional customization for processing stages.


SagemakerUdopProcessorConfigurationSchema

Schema definition for SageMaker UDOP processor configuration. Provides JSON Schema validation rules for the configuration UI and API.

This class defines the structure, validation rules, and UI presentation for the SageMaker UDOP processor configuration, including document classes, attributes, extraction parameters, evaluation criteria, and summarization options. It's specialized for use with SageMaker endpoints for document classification.

Initializers

import { SagemakerUdopProcessorConfigurationSchema } from '@cdklabs/genai-idp-sagemaker-udop-processor'

new SagemakerUdopProcessorConfigurationSchema()

Methods

Name Description
bind Binds the configuration schema to a processor instance.

bind
public bind(processor: SagemakerUdopProcessor): void

Binds the configuration schema to a processor instance.

Creates a custom resource that updates the schema in the configuration table.

processorRequired

The SageMaker UDOP document processor to apply the schema to.


Protocols

ISagemakerUdopProcessor

Interface for SageMaker UDOP document processor implementation.

Properties

Name Type Description
node constructs.Node The tree node.
environment @cdklabs/genai-idp.IProcessingEnvironment The processing environment that provides shared infrastructure and services.
maxProcessingConcurrency number The maximum number of documents that can be processed concurrently.
stateMachine aws-cdk-lib.aws_stepfunctions.IStateMachine The Step Functions state machine that orchestrates the document processing workflow.
evaluationFunction any The evaluation function if evaluation is enabled for this processor.

~~node~~Required
  • Deprecated: This processor pattern is deprecated and will be removed in v0.5.0. Please migrate to Pattern 1 (BDA Processor) or Pattern 2 (Bedrock LLM Processor).

SageMaker UDOP Processor uses specialized document processing with SageMaker endpoints for document classification, combined with foundation models for extraction. This processor is ideal for specialized document types that require custom classification models like RVL-CDIP or UDOP for accurate document categorization before extraction.

Use SageMaker UDOP Processor when:
  • You are processing highly specialized or complex document types
  • You need custom classification models beyond what foundation models can provide
  • You have domain-specific document types requiring specialized handling
  • You want to leverage fine-tuned models for specific document domains

public readonly node: Node;
  • Type: constructs.Node

The tree node.


~~environment~~Required
  • Deprecated: This processor pattern is deprecated and will be removed in v0.5.0. Please migrate to Pattern 1 (BDA Processor) or Pattern 2 (Bedrock LLM Processor).

public readonly environment: IProcessingEnvironment;
  • Type: @cdklabs/genai-idp.IProcessingEnvironment

The processing environment that provides shared infrastructure and services.

Contains input/output buckets, tracking tables, API endpoints, and other resources needed for document processing operations.


~~maxProcessingConcurrency~~Required
  • Deprecated: This processor pattern is deprecated and will be removed in v0.5.0. Please migrate to Pattern 1 (BDA Processor) or Pattern 2 (Bedrock LLM Processor).

public readonly maxProcessingConcurrency: number;
  • Type: number

The maximum number of documents that can be processed concurrently.

Controls the throughput and resource utilization of the document processing system.


~~stateMachine~~Required
  • Deprecated: This processor pattern is deprecated and will be removed in v0.5.0. Please migrate to Pattern 1 (BDA Processor) or Pattern 2 (Bedrock LLM Processor).

public readonly stateMachine: IStateMachine;
  • Type: aws-cdk-lib.aws_stepfunctions.IStateMachine

The Step Functions state machine that orchestrates the document processing workflow.

Manages the sequence of processing steps and handles error conditions. This state machine is triggered for each document that needs processing and coordinates the entire extraction pipeline.


~~evaluationFunction~~Optional
  • Deprecated: This processor pattern is deprecated and will be removed in v0.5.0. Please migrate to Pattern 1 (BDA Processor) or Pattern 2 (Bedrock LLM Processor).

public readonly evaluationFunction: any;
  • Type: any

The evaluation function if evaluation is enabled for this processor.

The evaluation function is created by the ProcessingEnvironment when an evaluation baseline bucket and an evaluation model are provided.


ISagemakerUdopProcessorConfiguration

Interface for SageMaker UDOP document processor configuration.

Provides configuration management for specialized document processing with SageMaker.

Methods

Name Description
bind Binds the configuration to a processor instance.

bind
public bind(processor: ISagemakerUdopProcessor): ISagemakerUdopProcessorConfigurationDefinition

Binds the configuration to a processor instance.

This method applies the configuration to the processor.

processorRequired

The SageMaker UDOP document processor to apply the configuration to.


ISagemakerUdopProcessorConfigurationDefinition

Interface for SageMaker UDOP processor configuration definition.

Defines the structure and capabilities of configuration for SageMaker UDOP processing.

Properties

Name Type Description
extractionModel @aws-cdk/aws-bedrock-alpha.IBedrockInvokable The invokable model used for information extraction.
assessmentModel @aws-cdk/aws-bedrock-alpha.IBedrockInvokable Optional invokable model used for document assessment.
customPromptGenerator aws-cdk-lib.aws_lambda.IFunction Optional custom prompt generator Lambda function.
evaluationModel @aws-cdk/aws-bedrock-alpha.IBedrockInvokable Optional invokable model used for evaluating extraction results.
summarizationModel @aws-cdk/aws-bedrock-alpha.IBedrockInvokable Optional invokable model used for document summarization.

~~extractionModel~~Required
  • Deprecated: This processor pattern is deprecated and will be removed in v0.5.0. Please migrate to Pattern 1 (BDA Processor) or Pattern 2 (Bedrock LLM Processor).
public readonly extractionModel: IBedrockInvokable;
  • Type: @aws-cdk/aws-bedrock-alpha.IBedrockInvokable

The invokable model used for information extraction.

Can be a Bedrock foundation model, Bedrock inference profile, or custom model. Extracts structured data from documents based on defined schemas, transforming unstructured content into structured information.


~~assessmentModel~~Optional
  • Deprecated: This processor pattern is deprecated and will be removed in v0.5.0. Please migrate to Pattern 1 (BDA Processor) or Pattern 2 (Bedrock LLM Processor).
public readonly assessmentModel: IBedrockInvokable;
  • Type: @aws-cdk/aws-bedrock-alpha.IBedrockInvokable

Optional invokable model used for document assessment.

Can be a Bedrock foundation model, Bedrock inference profile, or custom model.


~~customPromptGenerator~~Optional
  • Deprecated: This processor pattern is deprecated and will be removed in v0.5.0. Please migrate to Pattern 1 (BDA Processor) or Pattern 2 (Bedrock LLM Processor).
public readonly customPromptGenerator: IFunction;
  • Type: aws-cdk-lib.aws_lambda.IFunction
  • Default: undefined

Optional custom prompt generator Lambda function.

When provided, this function will be invoked during extraction to customize prompts. This is either the function provided via configuration options, or imported from the ARN specified in the configuration file.


~~evaluationModel~~Optional
  • Deprecated: This processor pattern is deprecated and will be removed in v0.5.0. Please migrate to Pattern 1 (BDA Processor) or Pattern 2 (Bedrock LLM Processor).
public readonly evaluationModel: IBedrockInvokable;
  • Type: @aws-cdk/aws-bedrock-alpha.IBedrockInvokable

Optional invokable model used for evaluating extraction results.

Can be a Bedrock foundation model, Bedrock inference profile, or custom model. Used to assess the quality and accuracy of extracted information by comparing extraction results against expected values.


~~summarizationModel~~Optional
  • Deprecated: This processor pattern is deprecated and will be removed in v0.5.0. Please migrate to Pattern 1 (BDA Processor) or Pattern 2 (Bedrock LLM Processor).
public readonly summarizationModel: IBedrockInvokable;
  • Type: @aws-cdk/aws-bedrock-alpha.IBedrockInvokable

Optional invokable model used for document summarization.

Can be a Bedrock foundation model, Bedrock inference profile, or custom model. When provided, enables automatic generation of document summaries that capture key information from processed documents.


ISagemakerUdopProcessorConfigurationSchema

Interface for SageMaker UDOP configuration schema.

Defines the structure and validation rules for SageMaker UDOP processor configuration.

Methods

Name Description
bind Binds the configuration schema to a processor instance.

bind
public bind(processor: SagemakerUdopProcessor): void

Binds the configuration schema to a processor instance.

This method applies the schema definition to the processor's configuration table.

processorRequired

The SageMaker UDOP document processor to apply the schema to.