Aspose.OCR and AWS S3: Cloud OCR Integration | File Format Processing Plugins for C# .NET Core

Integrating Aspose.OCR with Amazon S3 lets developers use cloud storage to store Optical Character Recognition (OCR) results efficiently. The integration simplifies the management of OCR data and also improves scalability and accessibility. In this tutorial, we walk through configuring Aspose.OCR to work seamlessly with AWS S3, with detailed examples and best practices along the way.

Complete Example

Prerequisites

The .NET 8 (or .NET 6+) SDK installed.
An AWS account with access to Amazon S3.
A bucket (for example, my-ocr-demo-bucket) in your preferred region (the examples below use ap-south-1).
(Optional) An Aspose license file if you want to run beyond evaluation mode.

Step 1: Set Up Aspose.OCR for .NET

Create a new console application and add the NuGet packages.

# Create project
dotnet new console -n OcrS3Demo -f net8.0
cd OcrS3Demo

# Add Aspose.OCR (CPU) OR Aspose.OCR-GPU (pick exactly one)
dotnet add package Aspose.OCR
# dotnet add package Aspose.OCR-GPU   # if you prefer GPU build

# Add AWS S3 SDK
dotnet add package AWSSDK.S3

Aspose provides both the Aspose.OCR (CPU) and Aspose.OCR-GPU packages through NuGet; you only need one of them. (Aspose documentation)

Step 2: Configure the AWS SDK for .NET

Configure your AWS profile and create a bucket (skip this if you already have one).

# Configure credentials (creates ~/.aws/credentials and config)
aws configure --profile ocr-s3
# AWS Access Key ID: AKIA****************
# AWS Secret Access Key: ************************
# Default region name: ap-south-1
# Default output format: json

# Create a bucket in that region (bucket name must be globally unique)
aws s3api create-bucket \
  --bucket my-ocr-demo-bucket \
  --region ap-south-1 \
  --create-bucket-configuration LocationConstraint=ap-south-1

Recommended minimal IAM policy (attach it to your user or role) for this tutorial:

{
  "Version": "2012-10-17",
  "Statement": [
    { "Effect": "Allow", "Action": ["s3:ListBucket"], "Resource": "arn:aws:s3:::my-ocr-demo-bucket" },
    { "Effect": "Allow", "Action": ["s3:GetObject", "s3:PutObject"], "Resource": "arn:aws:s3:::my-ocr-demo-bucket/*" }
  ]
}

The AWS SDK for .NET uses the standard credential chain; setting AWS_PROFILE=ocr-s3 selects the profile configured above when you run locally. The core S3 patterns (create, upload, download) are documented in the official AWS .NET examples. (AWS documentation)

Step 3: Initialize the Aspose.OCR API

Create a basic Program.cs. We will also set English as the recognition language and enable document layout detection. (All types shown below are from the current Aspose.OCR API surface.) (reference.aspose.com)

using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;
using Amazon;
using Amazon.S3;
using Amazon.S3.Model;
using Aspose.OCR;

class Program
{
    static async Task Main(string[] args)
    {
        // Optional: load license if you have one
        // new License().SetLicense("Aspose.Total.lic");

        var ocr = new AsposeOcr();

        var settings = new RecognitionSettings
        {
            // pick your language(s); can combine if needed
            Language = Language.Eng,
            DetectAreasMode = DetectAreasMode.DOCUMENT
        };

        // We'll fill in S3 + OCR steps next...
    }
}

Key APIs we will use next:

AsposeOcr.RecognizeImage(MemoryStream, RecognitionSettings) returns a RecognitionResult.
RecognitionResult.RecognitionText / GetJson(bool) / Save(...) let you export the results to TXT/JSON/PDF/DOCX. (reference.aspose.com)

Step 4: Upload Images to S3

You can upload images from disk to S3 with PutObjectAsync. (Streams can be uploaded as well; the AWS SDK supports both.)

// Configure S3 client (uses your AWS_PROFILE locally)
var region = RegionEndpoint.APSouth1; // change if needed
using var s3 = new AmazonS3Client(region);

// Local image you want to OCR:
string localImagePath = @"D:\samples\invoices\invoice-001.png";
string bucket = "my-ocr-demo-bucket";
string objectKey = "input/invoice-001.png";

// Upload the image to S3
await s3.PutObjectAsync(new PutObjectRequest
{
    BucketName = bucket,
    Key = objectKey,
    FilePath = localImagePath,
    ContentType = "image/png",
    // Optional: enable server-side encryption
    // ServerSideEncryptionMethod = ServerSideEncryptionMethod.AES256
});
Console.WriteLine($"Uploaded {objectKey} to s3://{bucket}.");

See the AWS .NET S3 examples for upload patterns. (AWS documentation)

Step 5: Run OCR on the Uploaded Images

Stream the S3 object directly into memory and pass the MemoryStream to Aspose.OCR.

// Download S3 object and OCR in-memory (no temp files)
var get = await s3.GetObjectAsync(bucket, objectKey);
await using var s3Stream = get.ResponseStream;
using var ms = new MemoryStream();
await s3Stream.CopyToAsync(ms);
ms.Position = 0;

// Run OCR (with settings → structured result)
RecognitionResult result = ocr.RecognizeImage(ms, settings);

// Or: if you just need plain text and defaults
// string textFast = ocr.RecognizeImage(ms);

string recognizedText = result.RecognitionText;
Console.WriteLine("=== OCR TEXT ===");
Console.WriteLine(recognizedText);

The RecognizeImage overloads and RecognitionResult.RecognitionText are part of the current API reference. (reference.aspose.com)
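The copy-and-rewind pattern from Step 5 can be factored into a small helper. The sketch below uses only the .NET base class library; StreamBuffering and ToMemoryStreamAsync are hypothetical names introduced here for illustration and are not part of Aspose.OCR or the AWS SDK.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// Hypothetical helper (not part of Aspose.OCR or the AWS SDK).
public static class StreamBuffering
{
    // Copies any readable stream (for example, an S3 response stream) into a
    // MemoryStream and rewinds it so the next reader starts at byte 0.
    public static async Task<MemoryStream> ToMemoryStreamAsync(Stream source)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));

        var buffer = new MemoryStream();
        await source.CopyToAsync(buffer);

        // Without this reset, a consumer would start reading at the end
        // of the buffer and see no data.
        buffer.Position = 0;
        return buffer;
    }
}
```

With this helper in place, the download step could be written as `using var ms = await StreamBuffering.ToMemoryStreamAsync(get.ResponseStream);`. Buffering the whole object in memory suits single images; very large inputs may call for a different approach.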

Step 6: Store the OCR Results in S3

You can upload plain text, JSON, or even a PDF/DOCX produced by Aspose.OCR.

6.a) Save as plain text

var textKey = "output/invoice-001.txt";
var textBytes = Encoding.UTF8.GetBytes(recognizedText);
await s3.PutObjectAsync(new PutObjectRequest
{
    BucketName = bucket,
    Key = textKey,
    InputStream = new MemoryStream(textBytes),
    ContentType = "text/plain"
});
Console.WriteLine($"Saved OCR text to s3://{bucket}/{textKey}");

6.b) Save detailed JSON

var json = result.GetJson(true); // include additional data
var jsonKey = "output/invoice-001.json";
await s3.PutObjectAsync(new PutObjectRequest
{
    BucketName = bucket,
    Key = jsonKey,
    InputStream = new MemoryStream(Encoding.UTF8.GetBytes(json)),
    ContentType = "application/json"
});
Console.WriteLine($"Saved OCR JSON to s3://{bucket}/{jsonKey}");

6.c) Save a searchable PDF (or DOCX) and put it in S3

// Export to PDF in-memory, then upload
using var outPdf = new MemoryStream();
result.Save(outPdf, SaveFormat.Pdf, "Arial", PdfOptimizationMode.Basic);
outPdf.Position = 0;

var pdfKey = "output/invoice-001.pdf";
await s3.PutObjectAsync(new PutObjectRequest
{
    BucketName = bucket,
    Key = pdfKey,
    InputStream = outPdf,
    ContentType = "application/pdf"
});
Console.WriteLine($"Saved OCR PDF to s3://{bucket}/{pdfKey}");

Exporting and saving (RecognitionResult.Save) and the supported formats (TXT/PDF/DOCX) are covered in the official API reference. ([reference.aspose.com][4])
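Each upload in Step 6 sets ContentType by hand. As a minimal sketch, assuming you prefer to derive it from the object key, the lookup below maps the file extensions used in this tutorial to their MIME types. OcrContentTypes and ForKey are hypothetical names, not part of either SDK, and unknown extensions fall back to application/octet-stream.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of Aspose.OCR or the AWS SDK).
public static class OcrContentTypes
{
    // Extensions used in this tutorial, matched case-insensitively.
    private static readonly Dictionary<string, string> ByExtension =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            [".png"] = "image/png",
            [".txt"] = "text/plain",
            [".json"] = "application/json",
            [".pdf"] = "application/pdf",
            [".docx"] = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
        };

    // Returns the MIME type for an S3 object key based on its extension.
    public static string ForKey(string objectKey)
    {
        if (objectKey == null) throw new ArgumentNullException(nameof(objectKey));

        int dot = objectKey.LastIndexOf('.');
        string extension = dot >= 0 ? objectKey.Substring(dot) : string.Empty;

        return ByExtension.TryGetValue(extension, out var contentType)
            ? contentType
            : "application/octet-stream";
    }
}
```

A PutObjectRequest could then use `ContentType = OcrContentTypes.ForKey(pdfKey)` instead of a literal string, which keeps the key and its content type from drifting apart.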

Optional: End-to-End `Program.cs`

Here is a compact end-to-end version that you can drop into Program.cs (it combines Steps 3 to 6):

using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;
using Amazon;
using Amazon.S3;
using Amazon.S3.Model;
using Aspose.OCR;

class Program
{
    static async Task Main()
    {
        // new License().SetLicense("Aspose.Total.lic"); // optional

        string bucket = "my-ocr-demo-bucket";
        string regionSystemName = "ap-south-1";
        string localImagePath = @"D:\samples\invoices\invoice-001.png";
        string imageKey = "input/invoice-001.png";

        var ocr = new AsposeOcr();
        var settings = new RecognitionSettings
        {
            Language = Language.Eng,
            DetectAreasMode = DetectAreasMode.DOCUMENT
        };

        using var s3 = new AmazonS3Client(RegionEndpoint.GetBySystemName(regionSystemName));

        // Upload original
        await s3.PutObjectAsync(new PutObjectRequest
        {
            BucketName = bucket,
            Key = imageKey,
            FilePath = localImagePath,
            ContentType = "image/png"
        });

        // Get image as stream
        var get = await s3.GetObjectAsync(bucket, imageKey);
        await using var s3Stream = get.ResponseStream;
        using var ms = new MemoryStream();
        await s3Stream.CopyToAsync(ms);
        ms.Position = 0;

        // OCR
        RecognitionResult result = ocr.RecognizeImage(ms, settings);
        string text = result.RecognitionText;

        // Upload text
        await s3.PutObjectAsync(new PutObjectRequest
        {
            BucketName = bucket,
            Key = "output/invoice-001.txt",
            InputStream = new MemoryStream(Encoding.UTF8.GetBytes(text)),
            ContentType = "text/plain"
        });

        // Upload JSON
        string json = result.GetJson(true);
        await s3.PutObjectAsync(new PutObjectRequest
        {
            BucketName = bucket,
            Key = "output/invoice-001.json",
            InputStream = new MemoryStream(Encoding.UTF8.GetBytes(json)),
            ContentType = "application/json"
        });

        // Upload PDF
        using var outPdf = new MemoryStream();
        result.Save(outPdf, SaveFormat.Pdf, "Arial", PdfOptimizationMode.Basic);
        outPdf.Position = 0;
        await s3.PutObjectAsync(new PutObjectRequest
        {
            BucketName = bucket,
            Key = "output/invoice-001.pdf",
            InputStream = outPdf,
            ContentType = "application/pdf"
        });

        Console.WriteLine("OCR complete and results stored in S3.");
    }
}

Best Practices

Integrating Aspose.OCR with AWS S3 offers numerous benefits, including better data management and improved scalability.

Security
Never hard-code secrets. Use aws configure + AWS_PROFILE locally; use IAM roles in production.
Consider S3 server-side encryption (AES256 or KMS) on the result objects, and per-bucket policies with least privilege (see above). (AWS documentation)
Performance
Use the GPU package (Aspose.OCR-GPU) on CUDA-capable hardware to accelerate OCR; same code, faster execution. (Aspose documentation)
Preprocess images for quality (deskew, denoise) using RecognitionSettings / presets if needed, and choose the right DetectAreasMode for your documents. The API options are listed in the reference. (reference.aspose.com)
Scalability
Use S3 prefixes such as input/ and output/ per job, and store the OCR artifacts (TXT/JSON/PDF) together for traceability.
Enable S3 versioning if you want an auditable history and rollbacks.
Consider running this pipeline in containers or serverless (for example, AWS Batch/ECS/Lambda with EFS) for parallel OCR at scale.
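The input/ and output/ prefix convention can be enforced in code so that every artifact of a job lands under a predictable key. The following is a minimal sketch; OcrKeys and ForOutput are hypothetical names introduced for illustration, and the default prefixes are assumptions that match the keys used in this tutorial.

```csharp
using System;

// Hypothetical helper (not part of Aspose.OCR or the AWS SDK).
public static class OcrKeys
{
    // Derives the output key for an OCR artifact from the key of its input image,
    // for example "input/invoice-001.png" + "txt" -> "output/invoice-001.txt".
    public static string ForOutput(string inputKey, string extension,
        string inputPrefix = "input/", string outputPrefix = "output/")
    {
        if (string.IsNullOrEmpty(inputKey)) throw new ArgumentException("Key is required.", nameof(inputKey));
        if (string.IsNullOrEmpty(extension)) throw new ArgumentException("Extension is required.", nameof(extension));

        // Drop the input prefix when present so only the job-relative path remains.
        string relative = inputKey.StartsWith(inputPrefix, StringComparison.Ordinal)
            ? inputKey.Substring(inputPrefix.Length)
            : inputKey;

        // Remove the file extension, but only if the last dot belongs to the file name
        // and not to a folder segment.
        int dot = relative.LastIndexOf('.');
        int slash = relative.LastIndexOf('/');
        string stem = dot > slash ? relative.Substring(0, dot) : relative;

        return outputPrefix + stem + "." + extension.TrimStart('.');
    }
}
```

For instance, `OcrKeys.ForOutput("input/job-42/page-1.png", "json")` yields `output/job-42/page-1.json`, so the TXT, JSON, and PDF results of one image share a stem and are easy to trace back to the source.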

By following these guidelines, you can integrate Aspose.OCR with AWS S3 effectively, simplify your OCR workflow, and improve the overall performance of your application.

References

Packages and installation options (Aspose.OCR, Aspose.OCR-GPU). (Aspose documentation)
AsposeOcr.RecognizeImage(...) overloads; RecognitionResult.RecognitionText, GetJson, Save(...). (reference.aspose.com)
AWS SDK for .NET: S3 create/upload/download examples. (AWS documentation)

[4]: https://reference.aspose.com/ocr/net/aspose.ocr/recognitionresult/ "RecognitionResult"