Aspose.OCR and AWS S3: Cloud OCR Integration | File Format Processing Plugins for C# .NET Core

Integrating Aspose.OCR with Amazon S3 lets developers use cloud storage to keep optical character recognition (OCR) results in one place. The integration simplifies OCR data management and improves scalability and accessibility. In this tutorial, we walk through configuring Aspose.OCR to work with AWS S3, with detailed examples and best practices along the way.

Complete Example

Prerequisites

.NET 8 (or .NET 6+) SDK installed.
An AWS account with access to Amazon S3.
A bucket (for example, my-ocr-demo-bucket) in your preferred region (the example below uses ap-south-1).
(Optional) An Aspose license file if you want to run beyond evaluation mode.

Step 1: Set Up Aspose.OCR for .NET

Create a new console app and add the NuGet packages.

# Create project
dotnet new console -n OcrS3Demo -f net8.0
cd OcrS3Demo

# Add Aspose.OCR (CPU) OR Aspose.OCR-GPU (pick exactly one)
dotnet add package Aspose.OCR
# dotnet add package Aspose.OCR-GPU   # if you prefer GPU build

# Add AWS S3 SDK
dotnet add package AWSSDK.S3

Aspose provides both the Aspose.OCR (CPU) and Aspose.OCR-GPU packages via NuGet; you only need one. (Aspose Documentation)

Step 2: Configure the AWS SDK for .NET

Configure your AWS profile and create a bucket (skip this if you already have one).

# Configure credentials (creates ~/.aws/credentials and config)
aws configure --profile ocr-s3
# AWS Access Key ID: AKIA****************
# AWS Secret Access Key: ************************
# Default region name: ap-south-1
# Default output format: json

# Create a bucket in that region (bucket name must be globally unique)
aws s3api create-bucket \
  --bucket my-ocr-demo-bucket \
  --region ap-south-1 \
  --create-bucket-configuration LocationConstraint=ap-south-1

Recommended minimal IAM policy for this tutorial (attach it to your user or role):

{
  "Version": "2012-10-17",
  "Statement": [
    { "Effect": "Allow", "Action": ["s3:ListBucket"], "Resource": "arn:aws:s3:::my-ocr-demo-bucket" },
    { "Effect": "Allow", "Action": ["s3:GetObject", "s3:PutObject"], "Resource": "arn:aws:s3:::my-ocr-demo-bucket/*" }
  ]
}
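The policy needs two statements because s3:ListBucket is granted on the bucket ARN, whereas s3:GetObject and s3:PutObject are granted on object ARNs (the bucket ARN followed by /*). As a minimal C# sketch, the hypothetical helpers BucketArn and ObjectsArn below derive both resource strings from one bucket name, so the two statements cannot drift apart if you rename the bucket:

```csharp
using System;

// Hypothetical helpers (not part of any AWS SDK): build the two resource ARNs
// used by the least-privilege policy above.
static string BucketArn(string bucket) => $"arn:aws:s3:::{bucket}";   // target of s3:ListBucket
static string ObjectsArn(string bucket) => $"{BucketArn(bucket)}/*";  // target of s3:GetObject / s3:PutObject

string bucket = "my-ocr-demo-bucket";
Console.WriteLine(BucketArn(bucket));   // arn:aws:s3:::my-ocr-demo-bucket
Console.WriteLine(ObjectsArn(bucket));  // arn:aws:s3:::my-ocr-demo-bucket/*
```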

The AWS SDK for .NET uses the default credential provider chain; set AWS_PROFILE=ocr-s3 locally so it picks up the profile configured above. The core S3 patterns (create, upload, download) are documented in the official AWS SDK for .NET examples. (AWS Documentation)
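The samples in this tutorial hardcode ap-south-1. A minimal sketch, assuming you would rather honor an AWS_REGION environment variable when one is set: ResolveRegion is a hypothetical helper, and only RegionEndpoint.GetBySystemName (left as a comment, since it needs the AWSSDK.S3 package) comes from the AWS SDK.

```csharp
using System;

// Hypothetical helper: prefer the AWS_REGION value when it is set and non-blank,
// otherwise fall back to the tutorial's default region.
static string ResolveRegion(string? fromEnvironment, string fallback = "ap-south-1") =>
    string.IsNullOrWhiteSpace(fromEnvironment) ? fallback : fromEnvironment.Trim();

string region = ResolveRegion(Environment.GetEnvironmentVariable("AWS_REGION"));
string profile = Environment.GetEnvironmentVariable("AWS_PROFILE") ?? "(default)";
Console.WriteLine($"Profile: {profile}, region: {region}");

// In the real program, pass the result to the SDK, e.g.:
// using var s3 = new AmazonS3Client(RegionEndpoint.GetBySystemName(region));
```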

Step 3: Initialize the Aspose.OCR API

Create a basic Program.cs that initializes the OCR engine. We also set English as the recognition language and enable document layout detection. (All types shown below come from the current Aspose.OCR API surface.) (reference.aspose.com)

using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;
using Amazon;
using Amazon.S3;
using Amazon.S3.Model;
using Aspose.OCR;

class Program
{
    static async Task Main(string[] args)
    {
        // Optional: load license if you have one
        // new License().SetLicense("Aspose.Total.lic");

        var ocr = new AsposeOcr();

        var settings = new RecognitionSettings
        {
            // pick your language(s); can combine if needed
            Language = Language.Eng,
            DetectAreasMode = DetectAreasMode.DOCUMENT
        };

        // We'll fill in S3 + OCR steps next...
    }
}

The key APIs we will use are:

AsposeOcr.RecognizeImage(MemoryStream, RecognitionSettings), which returns a RecognitionResult.
RecognitionResult.RecognitionText / GetJson(bool) / Save(...), which let you export the results to TXT/JSON/PDF/DOCX. (reference.aspose.com)

Step 4: Upload Images to S3

You can upload images from disk to S3 with PutObjectAsync. (You can also upload streams; the AWS SDK supports both.)

// Configure S3 client (uses your AWS_PROFILE locally)
var region = RegionEndpoint.APSouth1; // change if needed
using var s3 = new AmazonS3Client(region);

// Local image you want to OCR:
string localImagePath = @"D:\samples\invoices\invoice-001.png";
string bucket = "my-ocr-demo-bucket";
string objectKey = "input/invoice-001.png";

// Upload the image to S3
await s3.PutObjectAsync(new PutObjectRequest
{
    BucketName = bucket,
    Key = objectKey,
    FilePath = localImagePath,
    ContentType = "image/png",
    // Optional: enable server-side encryption
    // ServerSideEncryptionMethod = ServerSideEncryptionMethod.AES256
});
Console.WriteLine($"Uploaded {objectKey} to s3://{bucket}.");

See the AWS SDK for .NET S3 examples for more upload patterns. (AWS Documentation)
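The upload above hardcodes ContentType = "image/png" and the key input/invoice-001.png. When you upload several images, a small mapping keeps both consistent with the actual file. This is a sketch under stated assumptions: ImageContentType and InputKey are hypothetical helpers, and the extension list covers only common image types.

```csharp
using System;
using System.IO;

// Hypothetical helper: choose a MIME type from the file extension (case-insensitive).
static string ImageContentType(string path) =>
    Path.GetExtension(path).ToLowerInvariant() switch
    {
        ".png" => "image/png",
        ".jpg" or ".jpeg" => "image/jpeg",
        ".tif" or ".tiff" => "image/tiff",
        ".bmp" => "image/bmp",
        ".gif" => "image/gif",
        _ => "application/octet-stream", // unknown: store as generic binary
    };

// Hypothetical helper: place the file under the input/ prefix, keeping its name.
static string InputKey(string localPath) => "input/" + Path.GetFileName(localPath);

string localImagePath = "samples/invoices/invoice-001.png";
Console.WriteLine($"{InputKey(localImagePath)} -> {ImageContentType(localImagePath)}");
```

You would then set Key = InputKey(localImagePath) and ContentType = ImageContentType(localImagePath) in the PutObjectRequest.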

Step 5: Perform OCR on the Uploaded Images

Stream the S3 object directly into memory and pass the MemoryStream to Aspose.OCR.

// Download S3 object and OCR in-memory (no temp files)
// Dispose the response (and its underlying HTTP connection) when done
using var get = await s3.GetObjectAsync(bucket, objectKey);
await using var s3Stream = get.ResponseStream;
using var ms = new MemoryStream();
await s3Stream.CopyToAsync(ms);
ms.Position = 0;

// Run OCR (with settings → structured result)
RecognitionResult result = ocr.RecognizeImage(ms, settings);

// Or: if you just need plain text and defaults
// string textFast = ocr.RecognizeImage(ms);

string recognizedText = result.RecognitionText;
Console.WriteLine("=== OCR TEXT ===");
Console.WriteLine(recognizedText);

The RecognizeImage overloads and RecognitionResult.RecognitionText are part of the current API reference. (reference.aspose.com)
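The CopyToAsync call followed by ms.Position = 0 is the important detail here: the S3 response stream is read once from the network, and after copying, the MemoryStream's position sits at the end, so a consumer would read nothing unless it is rewound. A self-contained sketch of that buffering step, where BufferAsync is a hypothetical helper and an in-memory stream stands in for GetObjectResponse.ResponseStream:

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;

// Copy any readable stream into a MemoryStream and rewind it,
// so the consumer (e.g. an OCR call) starts reading at byte 0.
static async Task<MemoryStream> BufferAsync(Stream source)
{
    var buffer = new MemoryStream();
    await source.CopyToAsync(buffer);
    buffer.Position = 0; // without this, a reader would see an empty stream
    return buffer;
}

// Stand-in for the S3 response stream (hypothetical test data).
using var fakeResponse = new MemoryStream(Encoding.UTF8.GetBytes("fake image bytes"));
using var ms = await BufferAsync(fakeResponse);
Console.WriteLine($"{ms.Length} bytes buffered, position {ms.Position}");
```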

Step 6: Save OCR Results to S3

You can upload plain text, JSON, or even a PDF/DOCX produced by Aspose.OCR.
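Each export in this step follows the same convention: the same base name as the input image, under output/, with a matching Content-Type. A minimal sketch of that mapping, assuming the hypothetical helper OutputFor and the four formats named above:

```csharp
using System;
using System.IO;

// Hypothetical helper: derive the output key and Content-Type for an export format,
// e.g. ("input/invoice-001.png", "pdf") -> ("output/invoice-001.pdf", "application/pdf").
static (string Key, string ContentType) OutputFor(string inputKey, string format)
{
    string ext = format.ToLowerInvariant();
    string baseName = Path.GetFileNameWithoutExtension(inputKey);
    string contentType = ext switch
    {
        "txt" => "text/plain",
        "json" => "application/json",
        "pdf" => "application/pdf",
        "docx" => "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
        _ => throw new ArgumentException($"Unsupported format: {format}", nameof(format)),
    };
    return ($"output/{baseName}.{ext}", contentType);
}

foreach (var format in new[] { "txt", "json", "pdf", "docx" })
{
    var (key, type) = OutputFor("input/invoice-001.png", format);
    Console.WriteLine($"{key} ({type})");
}
```

The resulting Key and ContentType plug directly into the PutObjectRequest objects in 6.a to 6.c.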

6.a) Save as plain text

var textKey = "output/invoice-001.txt";
var textBytes = Encoding.UTF8.GetBytes(recognizedText);
await s3.PutObjectAsync(new PutObjectRequest
{
    BucketName = bucket,
    Key = textKey,
    InputStream = new MemoryStream(textBytes),
    ContentType = "text/plain"
});
Console.WriteLine($"Saved OCR text to s3://{bucket}/{textKey}");

6.b) Save detailed JSON

var json = result.GetJson(true); // include additional data
var jsonKey = "output/invoice-001.json";
await s3.PutObjectAsync(new PutObjectRequest
{
    BucketName = bucket,
    Key = jsonKey,
    InputStream = new MemoryStream(Encoding.UTF8.GetBytes(json)),
    ContentType = "application/json"
});
Console.WriteLine($"Saved OCR JSON to s3://{bucket}/{jsonKey}");

6.c) Save a searchable PDF (or DOCX) and upload it to S3

// Export to PDF in-memory, then upload
using var outPdf = new MemoryStream();
result.Save(outPdf, SaveFormat.Pdf, "Arial", PdfOptimizationMode.Basic);
outPdf.Position = 0;

var pdfKey = "output/invoice-001.pdf";
await s3.PutObjectAsync(new PutObjectRequest
{
    BucketName = bucket,
    Key = pdfKey,
    InputStream = outPdf,
    ContentType = "application/pdf"
});
Console.WriteLine($"Saved OCR PDF to s3://{bucket}/{pdfKey}");

The export methods (RecognitionResult.Save) and formats (TXT/PDF/DOCX) are documented in the official API reference. ([reference.aspose.com][4])

Optional: complete end-to-end `Program.cs`

Here is a compact end-to-end version you can drop into Program.cs (it combines Steps 3-6):

using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;
using Amazon;
using Amazon.S3;
using Amazon.S3.Model;
using Aspose.OCR;

class Program
{
    static async Task Main()
    {
        // new License().SetLicense("Aspose.Total.lic"); // optional

        string bucket = "my-ocr-demo-bucket";
        string regionSystemName = "ap-south-1";
        string localImagePath = @"D:\samples\invoices\invoice-001.png";
        string imageKey = "input/invoice-001.png";

        var ocr = new AsposeOcr();
        var settings = new RecognitionSettings
        {
            Language = Language.Eng,
            DetectAreasMode = DetectAreasMode.DOCUMENT
        };

        using var s3 = new AmazonS3Client(RegionEndpoint.GetBySystemName(regionSystemName));

        // Upload original
        await s3.PutObjectAsync(new PutObjectRequest
        {
            BucketName = bucket,
            Key = imageKey,
            FilePath = localImagePath,
            ContentType = "image/png"
        });

        // Get image as stream
        using var get = await s3.GetObjectAsync(bucket, imageKey); // GetObjectResponse is IDisposable
        await using var s3Stream = get.ResponseStream;
        using var ms = new MemoryStream();
        await s3Stream.CopyToAsync(ms);
        ms.Position = 0;

        // OCR
        RecognitionResult result = ocr.RecognizeImage(ms, settings);
        string text = result.RecognitionText;

        // Upload text
        await s3.PutObjectAsync(new PutObjectRequest
        {
            BucketName = bucket,
            Key = "output/invoice-001.txt",
            InputStream = new MemoryStream(Encoding.UTF8.GetBytes(text)),
            ContentType = "text/plain"
        });

        // Upload JSON
        string json = result.GetJson(true);
        await s3.PutObjectAsync(new PutObjectRequest
        {
            BucketName = bucket,
            Key = "output/invoice-001.json",
            InputStream = new MemoryStream(Encoding.UTF8.GetBytes(json)),
            ContentType = "application/json"
        });

        // Upload PDF
        using var outPdf = new MemoryStream();
        result.Save(outPdf, SaveFormat.Pdf, "Arial", PdfOptimizationMode.Basic);
        outPdf.Position = 0;
        await s3.PutObjectAsync(new PutObjectRequest
        {
            BucketName = bucket,
            Key = "output/invoice-001.pdf",
            InputStream = outPdf,
            ContentType = "application/pdf"
        });

        Console.WriteLine("OCR complete and results stored in S3.");
    }
}

Best Practices

Integrating Aspose.OCR with AWS S3 offers several benefits, including better data management and improved scalability. Here are some best practices to consider:

• Security
Never hardcode secrets: use aws configure plus AWS_PROFILE locally, and IAM roles in production.
Consider S3 server-side encryption (AES256 or KMS) on result objects, and least-privilege per-bucket policies (as shown above). (AWS Documentation)
• Performance
Use the GPU package (Aspose.OCR-GPU) on CUDA-capable hardware to accelerate OCR: same code, faster execution. (Aspose Documentation)
Preprocess images for quality (deskewing, denoising) through RecognitionSettings or presets if needed, and select the right DetectAreasMode for your documents. The available options are listed in the API reference. (reference.aspose.com)
• Scalability
Use S3 prefixes such as input/ and output/ per job, and store the OCR artifacts (TXT/JSON/PDF) together for traceability.
Enable S3 versioning if you want an audit history and the ability to roll back.
Consider running this flow in containers or serverless (for example, AWS Batch, ECS, or Lambda with EFS) for parallel OCR at scale.
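For parallel OCR within a single process (for example, one ECS task or Batch job handling many keys), Parallel.ForEachAsync (.NET 6+) caps how many images are in flight at once, which also bounds memory because each job buffers a whole image in a MemoryStream. A sketch under stated assumptions: ProcessAllAsync is a hypothetical helper, and the delegate passed to it stands in for the download, RecognizeImage, and upload steps from this tutorial. Whether one AsposeOcr instance may be shared across threads is not covered here, so creating one per job is the cautious choice.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Run processAsync for every key with at most maxParallel jobs running concurrently.
// processAsync is a placeholder for "download from input/, run OCR, upload to output/".
static async Task<ConcurrentDictionary<string, string>> ProcessAllAsync(
    IEnumerable<string> keys, Func<string, CancellationToken, Task<string>> processAsync, int maxParallel)
{
    var results = new ConcurrentDictionary<string, string>();
    var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };
    await Parallel.ForEachAsync(keys, options, async (key, ct) =>
    {
        results[key] = await processAsync(key, ct); // thread-safe insert
    });
    return results;
}

// Hypothetical stand-in for the real per-image pipeline.
var keys = new[] { "input/invoice-001.png", "input/invoice-002.png", "input/invoice-003.png" };
var done = await ProcessAllAsync(keys, async (key, ct) =>
{
    await Task.Delay(10, ct); // simulate I/O + OCR work
    return key.Replace("input/", "output/").Replace(".png", ".txt");
}, maxParallel: 2);

foreach (var (input, output) in done)
    Console.WriteLine($"{input} -> {output}");
```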

By following these guidelines, you can integrate Aspose.OCR with AWS S3 to streamline your OCR workflow and improve your application's overall performance.

References

Packages and installation options (Aspose.OCR, Aspose.OCR-GPU). (Aspose Documentation)
AsposeOcr.RecognizeImage(...) overloads; RecognitionResult.RecognitionText, GetJson, Save(...). (reference.aspose.com)
AWS SDK for .NET: S3 create / upload / download examples. (AWS Documentation)

If you like, you can also add a small Makefile or PowerShell script to run this end to end, plus a CI snippet (GitHub Actions) that pushes the results to S3 on every commit.

[4]: https://reference.aspose.com/ocr/net/aspose.ocr/recognitionresult/ "RecognitionResult"
