Class AltoTextReader

java.lang.Object
io.goobi.viewer.controller.model.alto.AltoTextReader

public class AltoTextReader extends Object
Parses an ALTO XML document and extracts its plain text content, preserving the structural hierarchy of pages, text blocks, lines, and words. Optionally applies a chain of TextEnricher instances to augment individual word tokens (e.g. with named-entity markup).
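The traversal described above can be illustrated with a minimal, self-contained sketch. It is not the implementation of AltoTextReader: the class name AltoTextSketch, the static extractText(String, List) signature, and the use of UnaryOperator<String> as a stand-in for TextEnricher are all assumptions made for illustration, since the actual constructor and enricher interface are not documented here. The sketch also assumes the standard ALTO element names TextBlock, TextLine, and String (with its CONTENT attribute), and chooses its own separators: a space between words, a newline between lines, and a blank line between blocks.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration class; not part of the Goobi viewer API.
final class AltoTextSketch {

    private AltoTextSketch() {
    }

    /**
     * Walks TextBlock -> TextLine -> String and joins the CONTENT values.
     * Each enricher is a stand-in for a TextEnricher (assumed shape: word in, word out).
     */
    static String extractText(String altoXml, List<UnaryOperator<String>> enrichers) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Reject DOCTYPE declarations to avoid external entity processing.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(altoXml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse ALTO document", e);
        }

        List<String> blocks = new ArrayList<>();
        // "*" matches any namespace, so the ALTO schema version does not matter here.
        NodeList blockNodes = doc.getElementsByTagNameNS("*", "TextBlock");
        for (int b = 0; b < blockNodes.getLength(); b++) {
            List<String> lines = new ArrayList<>();
            NodeList lineNodes = ((Element) blockNodes.item(b)).getElementsByTagNameNS("*", "TextLine");
            for (int l = 0; l < lineNodes.getLength(); l++) {
                List<String> words = new ArrayList<>();
                NodeList wordNodes = ((Element) lineNodes.item(l)).getElementsByTagNameNS("*", "String");
                for (int w = 0; w < wordNodes.getLength(); w++) {
                    String word = ((Element) wordNodes.item(w)).getAttribute("CONTENT");
                    // Apply the enricher chain in order to each word token.
                    for (UnaryOperator<String> enricher : enrichers) {
                        word = enricher.apply(word);
                    }
                    words.add(word);
                }
                lines.add(String.join(" ", words));
            }
            blocks.add(String.join("\n", lines));
        }
        return String.join("\n\n", blocks);
    }
}
```

Calling AltoTextSketch.extractText(xml, List.of()) returns the plain text without enrichment; passing one or more operators applies them to every word in list order, which mirrors the "chain" wording of the class description.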
  • Constructor Details

  • Method Details

    • extractText

      public String extractText()