Minimal Bedrock Document Processing
The smallest working document processing system built with AppMod Catalog Blueprints. This example shows how to deploy a complete AI-powered document processing pipeline with almost no configuration.
What This Example Does
Deploy a complete document processing system with just one construct:
- Document Upload: S3 bucket automatically configured for document uploads
- AI Classification: Automatically identifies document types using Amazon Bedrock
- Data Extraction: Extracts key information from documents using Claude 3.5 Sonnet
- Workflow Orchestration: Step Functions manages the entire processing pipeline
- Results Storage: DynamoDB stores processing results and metadata
Architecture
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   S3 Bucket     │    │  Step Functions │    │   Amazon        │
│   Document      │───▶│   Workflow      │───▶│   Bedrock       │
│   Upload        │    │                 │    │   (Claude 3.5)  │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                                │                      │
                                ▼                      ▼
                       ┌─────────────────┐    ┌─────────────────┐
                       │   DynamoDB      │    │   CloudWatch    │
                       │   Results       │    │   Logs          │
                       └─────────────────┘    └─────────────────┘
The Code
The entire system is deployed with just a few lines:
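import { Bucket, BucketEncryption } from 'aws-cdk-lib/aws-s3';
import { BedrockDocumentProcessing, QueuedS3Adapter } from '@cdklabs/cdk-appmod-catalog-blueprints';

// Inside the stack constructor:
// Create a KMS-encrypted S3 bucket for document storage
const bucket = new Bucket(this, 'DocumentBucket', {
  encryption: BucketEncryption.KMS,
});

// Deploy the complete document processing pipeline
new BedrockDocumentProcessing(this, 'MinimalDocProcessor', {
  // Documents arriving in the bucket are queued for processing
  ingressAdapter: new QueuedS3Adapter({ bucket }),
  // Enable cross-region inference for the classification model
  classificationBedrockModel: {
    useCrossRegionInference: true,
  },
  // Enable cross-region inference for the extraction model
  processingBedrockModel: {
    useCrossRegionInference: true,
  },
});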
This creates:
- S3 bucket with proper event notifications
- Step Functions workflow for processing orchestration
- Lambda functions for classification and extraction
- DynamoDB table for results storage
- IAM roles with least-privilege permissions
- CloudWatch logs for monitoring
Quick Start
Prerequisites
- AWS CLI configured with appropriate permissions
- CDK CLI installed: npm install -g aws-cdk
- Node.js 18+
- Docker Desktop running (required for Python Lambda bundling)
- Amazon Bedrock model access (Claude 3.5 Sonnet)
Deploy
# Install dependencies
npm install
# Deploy the stack
npm run deploy
Deployment time: ~5 minutes
Outputs: After deployment, you'll see:
- DocumentBucketName: S3 bucket name for uploading documents
- StateMachineArn: Step Functions workflow ARN for monitoring
- ProcessingTableName: DynamoDB table name for results
Test Document Processing
After deployment, upload a document to trigger processing:
Option 1: Using the upload script (recommended)
# Make the script executable
chmod +x upload-document.sh
# Upload a test document
./upload-document.sh sample-files/sample-invoice.jpg
Option 2: Using AWS CLI directly
# Get bucket name from deployment outputs
BUCKET_NAME=$(aws cloudformation describe-stacks \
  --stack-name MinimalBedrockDocProcessingStack \
  --query 'Stacks[0].Outputs[?OutputKey==`DocumentBucketName`].OutputValue' \
  --output text)
# Upload document to raw/ prefix to trigger processing
aws s3 cp sample-files/sample-invoice.jpg s3://$BUCKET_NAME/raw/
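If you would rather script Option 2 from Node.js, the two steps above (look up the bucket name in the stack outputs, then target the raw/ prefix) can be sketched in TypeScript. This is a minimal sketch, not part of the example project: the function names findStackOutput and rawUploadUri are hypothetical, the sample response is hand-written, and the input is assumed to be the JSON that describe-stacks prints when run with --output json.

```typescript
// Relevant part of the JSON printed by `aws cloudformation describe-stacks`.
interface StackOutput {
  OutputKey: string;
  OutputValue: string;
}
interface DescribeStacksResult {
  Stacks: { StackName: string; Outputs?: StackOutput[] }[];
}

// Hypothetical helper: same lookup as the JMESPath query
// Stacks[0].Outputs[?OutputKey==`<key>`].OutputValue
function findStackOutput(result: DescribeStacksResult, key: string): string | undefined {
  const outputs = result.Stacks[0]?.Outputs ?? [];
  return outputs.find((o) => o.OutputKey === key)?.OutputValue;
}

// Hypothetical helper: build the destination URI under the raw/ prefix,
// which is where uploads must land to trigger processing.
function rawUploadUri(bucketName: string, localPath: string): string {
  const fileName = localPath.split(/[\\/]/).pop() ?? localPath;
  return `s3://${bucketName}/raw/${fileName}`;
}

// Usage with a hand-written response; 'example-bucket' is a placeholder.
const sample: DescribeStacksResult = {
  Stacks: [
    {
      StackName: 'MinimalBedrockDocProcessingStack',
      Outputs: [{ OutputKey: 'DocumentBucketName', OutputValue: 'example-bucket' }],
    },
  ],
};
const bucketName = findStackOutput(sample, 'DocumentBucketName');
const destination = bucketName
  ? rawUploadUri(bucketName, 'sample-files/sample-invoice.jpg')
  : undefined;
// destination is 's3://example-bucket/raw/sample-invoice.jpg'
```

The resulting URI can be passed to aws s3 cp as the destination, exactly as in the shell version.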
View Results:
- Step Functions Console: Search for "MinimalBedrockDocProcessingStack" to view workflow executions
- DynamoDB Console: Open the processing table (from outputs) to see extracted data
- CloudWatch Logs: Check the /aws/lambda/ log groups for detailed processing logs
Default Capabilities
Out of the box, this system can process:
- Invoices: Invoice numbers, amounts, dates, vendor information
- Receipts: Purchase details, merchant info, totals
- Contracts: Parties, dates, key terms
- Forms: Field extraction based on document structure
- Identity Documents: Names, addresses, ID numbers
What's Included
The BedrockDocumentProcessing construct provides:
- Automatic retries for transient failures
- Error handling with proper error states
- Cost optimization with appropriate resource sizing
- Security with encryption at rest and in transit
- Monitoring with CloudWatch metrics and logs
- Scalability with serverless auto-scaling
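To give a feel for what automatic retries mean in a Step Functions workflow, the sketch below computes the wait before each retry under an exponential backoff policy (interval, maximum attempts, backoff rate). It is illustrative only: this README does not state the retry settings the construct actually uses, so the RetryPolicy shape, the retryDelays helper, and the numbers are all hypothetical.

```typescript
// Hypothetical policy shape, modeled on Step Functions retry fields.
interface RetryPolicy {
  intervalSeconds: number; // wait before the first retry
  maxAttempts: number;     // number of retries after the initial attempt
  backoffRate: number;     // multiplier applied to the wait on each retry
}

// Wait (in seconds) before each retry: interval * backoffRate^(retry index).
function retryDelays(policy: RetryPolicy): number[] {
  const delays: number[] = [];
  for (let retry = 0; retry < policy.maxAttempts; retry++) {
    delays.push(policy.intervalSeconds * Math.pow(policy.backoffRate, retry));
  }
  return delays;
}

// Example values only; not the construct's real configuration.
const exampleDelays = retryDelays({ intervalSeconds: 2, maxAttempts: 3, backoffRate: 2 });
// exampleDelays is [2, 4, 8]
```

With these example values a transient failure is retried after 2, 4, and 8 seconds before the workflow moves to its error state.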
Troubleshooting
Docker Not Running
Error: Cannot connect to the Docker daemon
Solution: Start Docker Desktop on your machine. The construct uses Python Lambda functions that require Docker for bundling dependencies during CDK synthesis.
Deployment Issues
- Bedrock access denied: Verify you have model access in the Bedrock console (us-east-1 or your deployment region)
- No processing triggered: Ensure documents are uploaded under the raw/ prefix of the S3 bucket (check CloudFormation outputs for bucket name)
- Processing timeout: Check document size - very large documents may need timeout adjustments
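Two of the issues above can be caught before uploading. The following is a rough pre-flight check, assuming the raw/ prefix used in the upload step is what triggers processing. The checkUpload helper is hypothetical, and the size limit is supplied by the caller because this README does not specify one.

```typescript
interface UploadCheck {
  ok: boolean;
  problems: string[];
}

// Hypothetical helper: flag uploads that are likely to be ignored or to time out.
// maxBytes is a caller-chosen threshold, not a documented limit.
function checkUpload(key: string, sizeBytes: number, maxBytes: number): UploadCheck {
  const problems: string[] = [];
  if (!key.startsWith('raw/')) {
    problems.push('key is not under the raw/ prefix, so processing may not be triggered');
  }
  if (sizeBytes > maxBytes) {
    problems.push('document exceeds the chosen size limit and may need a longer timeout');
  }
  return { ok: problems.length === 0, problems };
}

// Example with an arbitrary 5 MB limit.
const goodUpload = checkUpload('raw/sample-invoice.jpg', 250000, 5000000);
const badUpload = checkUpload('sample-invoice.jpg', 9000000, 5000000);
// goodUpload.ok is true; badUpload.ok is false with two problems listed
```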
Monitoring
- Step Functions Console: View workflow execution progress and any errors
- CloudWatch Logs: Check Lambda function logs for detailed processing information
- DynamoDB Console: Query the table to see processing results
Next Steps
This example demonstrates how much the AppMod Catalog Blueprints defaults provide out of the box. For more advanced features, see:
- Bedrock Document Processing - Add custom prompts and configuration
- Agentic Document Processing - Advanced AI capabilities
- Full-Stack Web App - Complete application with web interface
Cleanup
npm run destroy