Class AltoTextReader
java.lang.Object
io.goobi.viewer.controller.model.alto.AltoTextReader
Parses an ALTO XML document and extracts its plain text content, preserving the structural
hierarchy of pages, text blocks, lines, and words. Optionally applies a chain of
TextEnricher instances to augment individual word tokens (e.g. with named-entity markup).
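The extraction described above can be illustrated with a minimal, self-contained sketch. It is not the AltoTextReader implementation: it uses the JDK's DOM parser rather than JDOM2, the class name AltoTextSketch is hypothetical, and a plain UnaryOperator<String> stands in for TextEnricher, whose real interface is not shown on this page. It assumes the usual ALTO layout in which each TextLine contains String elements carrying the word in a CONTENT attribute, and it flattens blocks into a simple list of lines.

```java
import java.io.ByteArrayInputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical illustration class; not part of the Goobi viewer API.
public class AltoTextSketch {

    /**
     * Returns one output line per ALTO TextLine, with words joined by single spaces.
     * Each enricher (a stand-in for TextEnricher) is applied to every word in order.
     */
    @SafeVarargs
    public static String extractText(String altoString, String encoding,
            UnaryOperator<String>... enrichers) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // ALTO files normally declare a default namespace
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(altoString.getBytes(encoding)));

        List<String> lines = new ArrayList<>();
        // "*" matches any namespace, so different ALTO schema versions are accepted.
        NodeList textLines = doc.getElementsByTagNameNS("*", "TextLine");
        for (int i = 0; i < textLines.getLength(); i++) {
            Element line = (Element) textLines.item(i);
            NodeList strings = line.getElementsByTagNameNS("*", "String");
            List<String> words = new ArrayList<>();
            for (int j = 0; j < strings.getLength(); j++) {
                String word = ((Element) strings.item(j)).getAttribute("CONTENT");
                for (UnaryOperator<String> enricher : enrichers) {
                    word = enricher.apply(word); // augment the individual word token
                }
                words.add(word);
            }
            lines.add(String.join(" ", words));
        }
        return String.join("\n", lines);
    }
}
```

Calling AltoTextSketch.extractText(alto, "UTF-8") returns the plain text, and passing one or more operators mirrors the optional enricher chain. Iterating over TextLine elements in document order keeps the line structure without tracking pages and blocks explicitly, which is a simplification of the hierarchy the class description mentions.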
Constructor Summary
Constructors
Constructor    Description
AltoTextReader(String altoString, String encoding, TextEnricher... textEnricher)
Method Summary
Constructor Details
AltoTextReader
public AltoTextReader(String altoString, String encoding, TextEnricher... textEnricher) throws IOException, org.jdom2.JDOMException
Throws:
IOException
org.jdom2.JDOMException
Method Details
extractText