Aspose.OCR & AWS S3: Cloud OCR Integration | File Format Processing Plugins for C# .NET Core

Integrating Aspose.OCR with Amazon S3 lets developers use cloud storage to store Optical Character Recognition (OCR) results efficiently. This integration simplifies OCR data management and also improves scalability and accessibility. In this tutorial, we walk through setting up Aspose.OCR to work seamlessly with AWS S3, with detailed examples and best practices.

Complete Example

Prerequisites

The .NET 8 (or .NET 6+) SDK installed.
An AWS account with access to Amazon S3.
A bucket (for example, my-ocr-demo-bucket) in your preferred region (the examples below use ap-south-1).
(Optional) An Aspose license file, if you want to run beyond evaluation mode.

Step 1: Set up Aspose.OCR for .NET

Create a new console app and add the NuGet packages.

# Create project
dotnet new console -n OcrS3Demo -f net8.0
cd OcrS3Demo

# Add Aspose.OCR (CPU) OR Aspose.OCR-GPU (pick exactly one)
dotnet add package Aspose.OCR
# dotnet add package Aspose.OCR-GPU   # if you prefer GPU build

# Add AWS S3 SDK
dotnet add package AWSSDK.S3

Aspose publishes both the Aspose.OCR (CPU) and Aspose.OCR-GPU packages on NuGet; you need only one of them. (Aspose documentation)

Step 2: Set up the AWS SDK for .NET

Configure an AWS profile and create a bucket (skip the bucket step if you already have one).

# Configure credentials (creates ~/.aws/credentials and config)
aws configure --profile ocr-s3
# AWS Access Key ID: AKIA****************
# AWS Secret Access Key: ************************
# Default region name: ap-south-1
# Default output format: json

# Create a bucket in that region (bucket name must be globally unique)
aws s3api create-bucket \
  --bucket my-ocr-demo-bucket \
  --region ap-south-1 \
  --create-bucket-configuration LocationConstraint=ap-south-1

Recommended minimum IAM policy for this tutorial (attach it to your user or role):

{
  "Version": "2012-10-17",
  "Statement": [
    { "Effect": "Allow", "Action": ["s3:ListBucket"], "Resource": "arn:aws:s3:::my-ocr-demo-bucket" },
    { "Effect": "Allow", "Action": ["s3:GetObject", "s3:PutObject"], "Resource": "arn:aws:s3:::my-ocr-demo-bucket/*" }
  ]
}

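The AWS SDK for .NET uses the default credential chain, so setting AWS_PROFILE=ocr-s3 is enough for it to pick up the profile created above when you run locally. The core S3 patterns (create, upload, download) are documented in AWS's official .NET examples. (AWS documentation)

Step 3: Initialize the Aspose.OCR API

Create a basic Program.cs that initializes OCR. We also set English as the recognition language and enable document layout detection. (All types shown below come from the current Aspose.OCR API surface.) (reference.aspose.com)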

using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;
using Amazon;
using Amazon.S3;
using Amazon.S3.Model;
using Aspose.OCR;

class Program
{
    static async Task Main(string[] args)
    {
        // Optional: load license if you have one
        // new License().SetLicense("Aspose.Total.lic");

        var ocr = new AsposeOcr();

        var settings = new RecognitionSettings
        {
            // pick your language(s); can combine if needed
            Language = Language.Eng,
            DetectAreasMode = DetectAreasMode.DOCUMENT
        };

        // We'll fill in S3 + OCR steps next...
    }
}

We will use the following APIs:

AsposeOcr.RecognizeImage(MemoryStream, RecognitionSettings), which returns a RecognitionResult.
RecognitionResult.RecognitionText / GetJson(bool) / Save(...), to export results as TXT/JSON/PDF/DOCX. (reference.aspose.com)

Step 4: Upload an image to S3

You can upload an image from disk to S3 with PutObjectAsync. (You can also upload a stream; both are supported by the AWS SDK.)

// Configure S3 client (uses your AWS_PROFILE locally)
var region = RegionEndpoint.APSouth1; // change if needed
using var s3 = new AmazonS3Client(region);

// Local image you want to OCR:
string localImagePath = @"D:\samples\invoices\invoice-001.png";
string bucket = "my-ocr-demo-bucket";
string objectKey = "input/invoice-001.png";

// Upload the image to S3
await s3.PutObjectAsync(new PutObjectRequest
{
    BucketName = bucket,
    Key = objectKey,
    FilePath = localImagePath,
    ContentType = "image/png",
    // Optional: enable server-side encryption
    // ServerSideEncryptionMethod = ServerSideEncryptionMethod.AES256
});
Console.WriteLine($"Uploaded {objectKey} to s3://{bucket}.");

See AWS's .NET S3 examples for more upload patterns. (AWS documentation)
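The upload above hard-codes ContentType = "image/png". If you upload other image formats, a small lookup can pick the content type from the file extension. The following is a minimal sketch; ContentTypes and ForPath are hypothetical names introduced here for illustration, not part of the AWS SDK or Aspose.OCR, and the extension list is an assumption you should adjust to the formats you actually process.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper (not part of the AWS SDK or Aspose.OCR):
// maps a file extension to the Content-Type sent with the S3 upload.
public static class ContentTypes
{
    // Extension lookup is case-insensitive, so ".PNG" and ".png" match.
    private static readonly Dictionary<string, string> ByExtension =
        new(StringComparer.OrdinalIgnoreCase)
        {
            [".png"] = "image/png",
            [".jpg"] = "image/jpeg",
            [".jpeg"] = "image/jpeg",
            [".tif"] = "image/tiff",
            [".tiff"] = "image/tiff",
            [".bmp"] = "image/bmp",
            [".gif"] = "image/gif",
        };

    // Returns the mapped content type, or a generic binary type
    // when the extension is missing or unknown.
    public static string ForPath(string path)
    {
        string ext = Path.GetExtension(path);
        return ByExtension.TryGetValue(ext, out var type)
            ? type
            : "application/octet-stream";
    }
}
```

With this in place, the request in the step above could set ContentType = ContentTypes.ForPath(localImagePath) instead of a literal.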

Step 5: Perform OCR on the uploaded image

Stream the S3 object directly into memory and pass the MemoryStream to Aspose.OCR.

// Download S3 object and OCR in-memory (no temp files)
var get = await s3.GetObjectAsync(bucket, objectKey);
await using var s3Stream = get.ResponseStream;
using var ms = new MemoryStream();
await s3Stream.CopyToAsync(ms);
ms.Position = 0;

// Run OCR (with settings → structured result)
RecognitionResult result = ocr.RecognizeImage(ms, settings);

// Or: if you just need plain text and defaults
// string textFast = ocr.RecognizeImage(ms);

string recognizedText = result.RecognitionText;
Console.WriteLine("=== OCR TEXT ===");
Console.WriteLine(recognizedText);

The RecognizeImage overloads and RecognitionResult.RecognitionText are part of the current API reference. (reference.aspose.com)

Step 6: Save OCR results to S3

You can upload plain text, JSON, or even a PDF/DOCX generated by Aspose.OCR.

6.a) Save as plain text

var textKey = "output/invoice-001.txt";
var textBytes = Encoding.UTF8.GetBytes(recognizedText);
await s3.PutObjectAsync(new PutObjectRequest
{
    BucketName = bucket,
    Key = textKey,
    InputStream = new MemoryStream(textBytes),
    ContentType = "text/plain"
});
Console.WriteLine($"Saved OCR text to s3://{bucket}/{textKey}");

6.b) Save detailed JSON

var json = result.GetJson(true); // include additional data
var jsonKey = "output/invoice-001.json";
await s3.PutObjectAsync(new PutObjectRequest
{
    BucketName = bucket,
    Key = jsonKey,
    InputStream = new MemoryStream(Encoding.UTF8.GetBytes(json)),
    ContentType = "application/json"
});
Console.WriteLine($"Saved OCR JSON to s3://{bucket}/{jsonKey}");

6.c) Save a searchable PDF (or DOCX) and put it in S3

// Export to PDF in-memory, then upload
using var outPdf = new MemoryStream();
result.Save(outPdf, SaveFormat.Pdf, "Arial", PdfOptimizationMode.Basic);
outPdf.Position = 0;

var pdfKey = "output/invoice-001.pdf";
await s3.PutObjectAsync(new PutObjectRequest
{
    BucketName = bucket,
    Key = pdfKey,
    InputStream = outPdf,
    ContentType = "application/pdf"
});
Console.WriteLine($"Saved OCR PDF to s3://{bucket}/{pdfKey}");

The export and save method (RecognitionResult.Save) and the supported formats (TXT/PDF/DOCX) are described in the official API reference. ([reference.aspose.com][4])
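Steps 6.a to 6.c hard-code output keys such as output/invoice-001.txt. Once you process more than one image, it may be easier to derive each output key from the input key. The sketch below shows one way to do that with plain string handling; S3Keys and OutputKeyFor are hypothetical names introduced for illustration, and the input/ and output/ defaults simply mirror the prefixes used in this tutorial.

```csharp
using System;

// Hypothetical helper (not part of the AWS SDK or Aspose.OCR):
// derives an output object key from an input object key.
public static class S3Keys
{
    public static string OutputKeyFor(
        string inputKey,
        string extension,
        string inputPrefix = "input/",
        string outputPrefix = "output/")
    {
        // Drop the input prefix if the key starts with it.
        string relative = inputKey.StartsWith(inputPrefix, StringComparison.Ordinal)
            ? inputKey.Substring(inputPrefix.Length)
            : inputKey;

        // S3 keys always use '/', so plain string handling is used here
        // instead of OS-dependent Path APIs.
        int slash = relative.LastIndexOf('/');
        int dot = relative.LastIndexOf('.');

        // Strip the extension only when the dot belongs to the last
        // segment and is not that segment's first character.
        string stem = dot > slash + 1 ? relative.Substring(0, dot) : relative;

        return outputPrefix + stem + extension;
    }
}
```

For example, the text upload in 6.a could use S3Keys.OutputKeyFor(objectKey, ".txt") as its Key, and the JSON and PDF uploads could pass ".json" and ".pdf".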

End-to-end `Program.cs`

Here is a compact end-to-end version that you can drop into Program.cs (it combines steps 3 to 6):

using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;
using Amazon;
using Amazon.S3;
using Amazon.S3.Model;
using Aspose.OCR;

class Program
{
    static async Task Main()
    {
        // new License().SetLicense("Aspose.Total.lic"); // optional

        string bucket = "my-ocr-demo-bucket";
        string regionSystemName = "ap-south-1";
        string localImagePath = @"D:\samples\invoices\invoice-001.png";
        string imageKey = "input/invoice-001.png";

        var ocr = new AsposeOcr();
        var settings = new RecognitionSettings
        {
            Language = Language.Eng,
            DetectAreasMode = DetectAreasMode.DOCUMENT
        };

        using var s3 = new AmazonS3Client(RegionEndpoint.GetBySystemName(regionSystemName));

        // Upload original
        await s3.PutObjectAsync(new PutObjectRequest
        {
            BucketName = bucket,
            Key = imageKey,
            FilePath = localImagePath,
            ContentType = "image/png"
        });

        // Get image as stream
        var get = await s3.GetObjectAsync(bucket, imageKey);
        await using var s3Stream = get.ResponseStream;
        using var ms = new MemoryStream();
        await s3Stream.CopyToAsync(ms);
        ms.Position = 0;

        // OCR
        RecognitionResult result = ocr.RecognizeImage(ms, settings);
        string text = result.RecognitionText;

        // Upload text
        await s3.PutObjectAsync(new PutObjectRequest
        {
            BucketName = bucket,
            Key = "output/invoice-001.txt",
            InputStream = new MemoryStream(Encoding.UTF8.GetBytes(text)),
            ContentType = "text/plain"
        });

        // Upload JSON
        string json = result.GetJson(true);
        await s3.PutObjectAsync(new PutObjectRequest
        {
            BucketName = bucket,
            Key = "output/invoice-001.json",
            InputStream = new MemoryStream(Encoding.UTF8.GetBytes(json)),
            ContentType = "application/json"
        });

        // Upload PDF
        using var outPdf = new MemoryStream();
        result.Save(outPdf, SaveFormat.Pdf, "Arial", PdfOptimizationMode.Basic);
        outPdf.Position = 0;
        await s3.PutObjectAsync(new PutObjectRequest
        {
            BucketName = bucket,
            Key = "output/invoice-001.pdf",
            InputStream = outPdf,
            ContentType = "application/pdf"
        });

        Console.WriteLine("OCR complete and results stored in S3.");
    }
}

Best Practices

Integrating Aspose.OCR with AWS S3 offers several benefits, including easier data management and better scalability. Keep the following practices in mind.

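Security
- Do not hard-code secrets. Use aws configure + AWS_PROFILE locally; use IAM roles in production.
- Consider S3 server-side encryption (AES256 or KMS) for result objects, and a least-privilege policy per bucket (as shown above). (AWS documentation)
Performance
- Use the GPU package (Aspose.OCR-GPU) to accelerate OCR on CUDA-capable hardware; the code stays the same and runs faster. (Aspose documentation)
- Preprocess images for quality (deskew, denoise) using RecognitionSettings or presets if needed, and choose the right DetectAreasMode. The API options are listed in the reference. (reference.aspose.com)
Scalability
- Use S3 prefixes such as input/ and output/ per job, and store the OCR artifacts (TXT/JSON/PDF) together for traceability.
- Enable S3 versioning if you want an audit history and rollback.
- Consider running this flow in containers or serverless (for example, AWS Batch/ECS/Lambda with EFS) to run OCR concurrently at scale.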
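Before moving to containers or serverless, you can already process several objects concurrently inside one process. The sketch below caps the number of keys in flight with a SemaphoreSlim. BoundedRunner, RunAsync, and the processKey delegate are hypothetical names introduced for illustration; processKey stands in for the download, OCR, and upload steps shown earlier. Whether a single AsposeOcr instance can be shared across concurrent calls is not covered by this tutorial, so the sketch assumes each invocation of processKey manages its own OCR and S3 objects.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: runs an async operation for each key with
// at most maxParallel operations in flight at any time.
public static class BoundedRunner
{
    public static async Task<IReadOnlyList<TResult>> RunAsync<TResult>(
        IEnumerable<string> keys,
        Func<string, Task<TResult>> processKey,
        int maxParallel)
    {
        if (maxParallel < 1)
            throw new ArgumentOutOfRangeException(nameof(maxParallel));

        using var gate = new SemaphoreSlim(maxParallel);

        // Start one task per key; each waits for a free slot first.
        var tasks = keys.Select(async key =>
        {
            await gate.WaitAsync();
            try
            {
                return await processKey(key);
            }
            finally
            {
                // Free the slot even if processKey throws.
                gate.Release();
            }
        }).ToList();

        // Results come back in the same order as the input keys.
        return await Task.WhenAll(tasks);
    }
}
```

A reasonable starting value for maxParallel is the number of CPU cores for the CPU package, since OCR is compute-bound; treat that as a guess to be tuned by measurement rather than a recommendation from Aspose.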

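By following these guidelines, you can integrate Aspose.OCR with AWS S3 effectively, simplifying your OCR workflows and improving your application's overall performance.

References

Aspose.OCR NuGet packages and installation options (Aspose.OCR, Aspose.OCR-GPU). (Aspose documentation)
AsposeOcr.RecognizeImage(...) overloads; RecognitionResult.RecognitionText, GetJson, Save(...). (reference.aspose.com)
AWS SDK for .NET: S3 create/upload/download examples. (AWS documentation)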


[4]: https://reference.aspose.com/ocr/net/aspose.ocr/recognitionresult/
