PdfTextExtractor

Category: Web

Source: pdf_text_extractor.dart

Classes

PdfExtractionResult

Result of PDF text extraction.

Constructor

```dart
PdfExtractionResult({this.text, this.error, this.pageCount = 0})
```

```dart
factory PdfExtractionResult.withError(String error)
```

```dart
PdfExtractionResult(error: error)
```

Properties

| Property | Type | Description |
| --- | --- | --- |
| text | String? | |
| error | String? | |
| pageCount | int | |
| isSuccess | bool get | |
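The shape of this result type can be sketched as a small stand-in class. `ExtractionResultSketch` is a hypothetical name, and the rule used for `isSuccess` (no error was recorded) is an assumption; the reference above does not state how the real getter is computed.

```dart
/// Hypothetical stand-in for PdfExtractionResult, for illustration only.
class ExtractionResultSketch {
  ExtractionResultSketch({this.text, this.error, this.pageCount = 0});

  /// Mirrors the documented `withError` factory: only `error` is set.
  factory ExtractionResultSketch.withError(String error) =>
      ExtractionResultSketch(error: error);

  final String? text;
  final String? error;
  final int pageCount;

  /// Assumption: extraction succeeded when no error was recorded.
  bool get isSuccess => error == null;
}
```

With this sketch, `ExtractionResultSketch.withError('timeout').isSuccess` is `false`, while a result built with `text` and `pageCount` reports success.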

PdfTextExtractor

Extracts text from PDF bytes using the pdftotext CLI tool.

pdftotext is part of poppler-utils and can be installed on macOS (brew install poppler), Linux (apt install poppler-utils on Debian-based distributions), and Windows (via Scoop or Chocolatey).

Constructor

```dart
PdfTextExtractor({this.timeoutSeconds = 60})
```

Properties

| Property | Type | Description |
| --- | --- | --- |
| timeoutSeconds | int | |

Methods

static bool isPdfContent(Uint8List bytes)

Check if the PDF magic bytes are present.
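A magic-byte check of this kind might look like the sketch below. PDF files begin with the ASCII signature `%PDF`; the helper name `startsWithPdfMagic` is hypothetical, and it is an assumption that the library checks only the very start of the buffer.

```dart
import 'dart:typed_data';

/// Hypothetical helper, not the library's implementation.
/// Returns true when [bytes] begins with the ASCII signature "%PDF".
bool startsWithPdfMagic(Uint8List bytes) {
  const magic = [0x25, 0x50, 0x44, 0x46]; // '%', 'P', 'D', 'F'
  if (bytes.length < magic.length) return false;
  for (var i = 0; i < magic.length; i++) {
    if (bytes[i] != magic[i]) return false;
  }
  return true;
}
```

Checking the signature rather than a file extension or header lets a caller reject mislabelled downloads before starting an external process.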

static bool isPdfContentType(String contentType)

Check if a content-type header indicates PDF.
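One way to perform such a header check is sketched below. The name `looksLikePdfContentType` is hypothetical, and it is an assumption that the match is case-insensitive and ignores parameters such as `charset`; the real method may be stricter or looser.

```dart
/// Hypothetical helper, not the library's implementation.
/// Compares the media type of a Content-Type header to application/pdf.
bool looksLikePdfContentType(String contentType) {
  // Drop any parameters after ';' and normalize case and whitespace.
  final mediaType = contentType.split(';').first.trim().toLowerCase();
  return mediaType == 'application/pdf';
}
```

Servers sometimes send an incorrect content type, so a header check like this is usually paired with a check of the bytes themselves.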

static Future<bool> checkPdftotextAvailable()

Check whether pdftotext is available on this system.
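An availability probe can be sketched by trying to start the tool with its version flag (`-v`). The function name `pdftotextOnPath` is hypothetical, and treating any successful launch as "available" is an assumption about what the real check does.

```dart
import 'dart:io';

/// Hypothetical helper, not the library's implementation.
/// Returns true if a `pdftotext` executable can be started from PATH.
Future<bool> pdftotextOnPath() async {
  try {
    // `-v` asks pdftotext to print its version; the exit code is ignored
    // because starting the process at all shows the tool is installed.
    await Process.run('pdftotext', ['-v']);
    return true;
  } on ProcessException {
    // Thrown when the executable cannot be found or started.
    return false;
  }
}
```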

Future<PdfExtractionResult> extract(Uint8List bytes)

Extract text from PDF [bytes] using pdftotext.

Writes bytes to a temp file, runs pdftotext, reads output, cleans up.
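The four steps above (write, run, read, clean up) can be sketched with `dart:io` as follows. This is an illustration under stated assumptions rather than the library's code: `pdfToTextSketch` is a hypothetical name, it returns `null` on any failure instead of a `PdfExtractionResult`, and the temp-file names are arbitrary.

```dart
import 'dart:async';
import 'dart:io';
import 'dart:typed_data';

/// Hypothetical helper, not the library's implementation.
/// Returns the extracted text, or null if pdftotext is missing, fails,
/// or runs longer than [timeoutSeconds].
Future<String?> pdfToTextSketch(Uint8List bytes,
    {int timeoutSeconds = 60}) async {
  final dir = await Directory.systemTemp.createTemp('pdf_extract_');
  try {
    final sep = Platform.pathSeparator;
    final input = File('${dir.path}${sep}input.pdf');
    final output = File('${dir.path}${sep}output.txt');
    await input.writeAsBytes(bytes, flush: true);

    final process =
        await Process.start('pdftotext', [input.path, output.path]);
    // Drain both pipes so the child cannot block on a full buffer.
    unawaited(process.stdout.drain<void>());
    unawaited(process.stderr.drain<void>());

    final exitCode = await process.exitCode.timeout(
      Duration(seconds: timeoutSeconds),
      onTimeout: () {
        process.kill(); // stop a run that exceeded the time limit
        return -1;
      },
    );
    if (exitCode != 0 || !await output.exists()) return null;
    return await output.readAsString();
  } on ProcessException {
    return null; // pdftotext could not be started
  } finally {
    // Remove the temp directory on success and on failure.
    await dir.delete(recursive: true);
  }
}
```

Using `Process.start` with a timeout on `exitCode` allows the child process to be killed when the limit is reached; a timeout applied to `Process.run` would stop waiting but leave the process running.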

Released under the MIT License.