Class AltoSearchParser


  • public class AltoSearchParser
    extends AbstractSearchParser

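    Search parser for ALTO documents. It finds regular-expression matches in the words and lines of an AltoDocument and returns the surrounding text.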

    Author:
    florian
    • Constructor Detail

      • AltoSearchParser

        public AltoSearchParser()
    • Method Detail

      • findWordMatches

        public List<List<de.intranda.digiverso.ocr.alto.model.structureclasses.lineelements.Word>> findWordMatches(List<de.intranda.digiverso.ocr.alto.model.structureclasses.lineelements.Word> words,
                                                                                                                   String regex)

        Searches the given words for matches of a regular expression.

        Parameters:
        words - the List of Word objects to search.
        regex - the regular expression to match, as a String.
        Returns:
        a List of matches, each given as a List of Word objects.
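
The following is a minimal sketch of how a regular expression can be matched across a sequence of words and mapped back to the words it covers. It is not the library's implementation. Plain strings stand in for Word objects, the class name WordMatchSketch is hypothetical, and joining the words with single spaces is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordMatchSketch {

    // Hypothetical stand-in: plain strings play the role of ALTO Word objects.
    public static List<List<String>> findWordMatches(List<String> words, String regex) {
        // Join the words with single spaces (an assumption) and record each word's start offset.
        StringBuilder text = new StringBuilder();
        int[] starts = new int[words.size()];
        for (int i = 0; i < words.size(); i++) {
            if (i > 0) {
                text.append(' ');
            }
            starts[i] = text.length();
            text.append(words.get(i));
        }

        List<List<String>> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            List<String> hit = new ArrayList<>();
            for (int i = 0; i < words.size(); i++) {
                int wordStart = starts[i];
                int wordEnd = wordStart + words.get(i).length();
                // A word belongs to the match if the two character ranges overlap.
                if (wordStart < matcher.end() && wordEnd > matcher.start()) {
                    hit.add(words.get(i));
                }
            }
            if (!hit.isEmpty()) {
                matches.add(hit);
            }
        }
        return matches;
    }
}
```

In this sketch a match that spans a word boundary, such as "quick brown", yields one inner list holding both words.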
      • findLineMatches

        public Map<org.apache.commons.lang3.Range<Integer>,List<de.intranda.digiverso.ocr.alto.model.structureclasses.Line>> findLineMatches(List<de.intranda.digiverso.ocr.alto.model.structureclasses.Line> lines,
                                                                                                                                                   String regex)

        Searches the given lines for matches of a regular expression.

        Parameters:
        lines - the List of Line objects to search.
        regex - the regular expression to match, as a String.
        Returns:
        a Map keyed by Range<Integer>, each key mapped to a List of Line objects.
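
As an illustration of the return shape, the sketch below maps each match range to the lines it touches. It rests on several assumptions: strings stand in for Line objects, a (start, end) entry stands in for org.apache.commons.lang3.Range<Integer>, the range is taken to be a character range with an exclusive end, and lines are joined with single spaces. The class name LineMatchSketch is hypothetical, and the library may define the range differently.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LineMatchSketch {

    // Hypothetical stand-ins: strings replace Line objects, and a
    // (start, end) entry replaces org.apache.commons.lang3.Range<Integer>.
    public static Map<Map.Entry<Integer, Integer>, List<String>> findLineMatches(
            List<String> lines, String regex) {
        // Assumption: the searchable text is the lines joined by single spaces.
        String text = String.join(" ", lines);
        Map<Map.Entry<Integer, Integer>, List<String>> result = new LinkedHashMap<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            List<String> hitLines = new ArrayList<>();
            int lineStart = 0;
            for (String line : lines) {
                int lineEnd = lineStart + line.length();
                // Keep every line whose character range overlaps the match.
                if (lineStart < matcher.end() && lineEnd > matcher.start()) {
                    hitLines.add(line);
                }
                lineStart = lineEnd + 1; // skip the joining space
            }
            // Key: match start (inclusive) and end (exclusive) in the joined text.
            result.put(new SimpleImmutableEntry<>(matcher.start(), matcher.end()), hitLines);
        }
        return result;
    }
}
```

A LinkedHashMap keeps the matches in the order they occur in the text, which is a design choice of this sketch only.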
      • getText

        public String getText(List<de.intranda.digiverso.ocr.alto.model.structureclasses.Line> lines)

        Returns the text of the given lines as a single String.

        Parameters:
        lines - the List of Line objects whose text is returned.
        Returns:
        the text of the lines, as a String.
      • getLines

        public List<de.intranda.digiverso.ocr.alto.model.structureclasses.Line> getLines(de.intranda.digiverso.ocr.alto.model.structureclasses.logical.AltoDocument doc)

        Returns the lines of the given ALTO document.

        Parameters:
        doc - an AltoDocument object.
        Returns:
        a List of Line objects.
      • getWords

        public List<de.intranda.digiverso.ocr.alto.model.structureclasses.lineelements.Word> getWords(de.intranda.digiverso.ocr.alto.model.structureclasses.logical.AltoDocument doc)

        Returns the words of the given ALTO document.

        Parameters:
        doc - an AltoDocument object.
        Returns:
        a List of Word objects.
      • getContainingLines

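        public List<de.intranda.digiverso.ocr.alto.model.structureclasses.Line> getContainingLines(List<de.intranda.digiverso.ocr.alto.model.structureclasses.Line> allLines,
                                                                                                   int indexStart,
                                                                                                   int indexEnd)

        Returns the lines from allLines that contain the index range from indexStart to indexEnd.

        Parameters:
        allLines - the List of Line objects to select from.
        indexStart - the start index of the range, as an int.
        indexEnd - the end index of the range, as an int.
        Returns:
        a List of Line objects.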
      • getLineStartIndex

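        public int getLineStartIndex(List<de.intranda.digiverso.ocr.alto.model.structureclasses.Line> allLines,
                                     de.intranda.digiverso.ocr.alto.model.structureclasses.Line line)

        Returns the start index of the given line relative to allLines.

        Parameters:
        allLines - the List of Line objects that contains the line.
        line - the Line object whose start index is requested.
        Returns:
        the start index, as an int.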
      • getLineEndIndex

        public int getLineEndIndex(List<de.intranda.digiverso.ocr.alto.model.structureclasses.Line> allLines,
                                   de.intranda.digiverso.ocr.alto.model.structureclasses.Line line)

        Returns the end index of the given line relative to allLines.

        Parameters:
        allLines - the List of Line objects that contains the line.
        line - the Line object whose end index is requested.
        Returns:
        the end index, as an int.
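
The three index-related methods (getContainingLines, getLineStartIndex, getLineEndIndex) can be pictured with the sketch below. It assumes that the indices are character offsets into the text formed by joining all lines with single spaces, that end indices are exclusive, and that -1 signals a line that is not in the list. None of this is confirmed by this page. Strings stand in for Line objects and the class name LineIndexSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class LineIndexSketch {

    // Offset of the first character of `line` in the space-joined text, or -1 if absent.
    public static int getLineStartIndex(List<String> allLines, String line) {
        int offset = 0;
        for (String current : allLines) {
            if (current.equals(line)) {
                return offset;
            }
            offset += current.length() + 1; // +1 for the joining space
        }
        return -1;
    }

    // Offset just past the last character of `line` (exclusive end), or -1 if absent.
    public static int getLineEndIndex(List<String> allLines, String line) {
        int start = getLineStartIndex(allLines, line);
        return start < 0 ? -1 : start + line.length();
    }

    // All lines whose character range overlaps [indexStart, indexEnd).
    public static List<String> getContainingLines(List<String> allLines, int indexStart, int indexEnd) {
        List<String> containing = new ArrayList<>();
        int offset = 0;
        for (String current : allLines) {
            int end = offset + current.length();
            if (offset < indexEnd && end > indexStart) {
                containing.add(current);
            }
            offset = end + 1; // skip the joining space
        }
        return containing;
    }
}
```

Under these assumptions, the start and end index of a line are exactly the range that selects that line again when passed to getContainingLines.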
      • getPrecedingText

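        public String getPrecedingText(de.intranda.digiverso.ocr.alto.model.structureclasses.lineelements.Word w,
                                       int maxLength)

        Returns the text that precedes the given word, limited by maxLength.

        Parameters:
        w - the Word object whose preceding text is requested.
        maxLength - the maximum length of the returned text, as an int.
        Returns:
        the preceding text, as a String.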
      • getSucceedingText

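        public String getSucceedingText(de.intranda.digiverso.ocr.alto.model.structureclasses.lineelements.Word w,
                                        int maxLength)

        Returns the text that follows the given word, limited by maxLength.

        Parameters:
        w - the Word object whose succeeding text is requested.
        maxLength - the maximum length of the returned text, as an int.
        Returns:
        the succeeding text, as a String.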
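
One way to build length-limited context around a match, in the spirit of getPrecedingText and getSucceedingText, is sketched below. The sketch assumes that maxLength counts characters and that only whole words are added, so the result may be shorter than maxLength. The library may truncate differently. Because the stand-in words are plain strings with no link to their neighbours, the hypothetical methods take the word list and an index instead of a Word object. The class name ContextTextSketch is also hypothetical.

```java
import java.util.List;

public class ContextTextSketch {

    // Collects whole words before words.get(index) while the result fits into maxLength characters.
    public static String getPrecedingText(List<String> words, int index, int maxLength) {
        StringBuilder sb = new StringBuilder();
        for (int i = index - 1; i >= 0; i--) {
            String candidate = words.get(i);
            // Account for the separating space once the buffer is non-empty.
            int extra = candidate.length() + (sb.length() > 0 ? 1 : 0);
            if (sb.length() + extra > maxLength) {
                break;
            }
            if (sb.length() > 0) {
                sb.insert(0, ' ');
            }
            sb.insert(0, candidate);
        }
        return sb.toString();
    }

    // Collects whole words after words.get(index) while the result fits into maxLength characters.
    public static String getSucceedingText(List<String> words, int index, int maxLength) {
        StringBuilder sb = new StringBuilder();
        for (int i = index + 1; i < words.size(); i++) {
            String candidate = words.get(i);
            int extra = candidate.length() + (sb.length() > 0 ? 1 : 0);
            if (sb.length() + extra > maxLength) {
                break;
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(candidate);
        }
        return sb.toString();
    }
}
```

Stopping at word boundaries keeps the context readable in a search-result snippet, at the cost of not always using the full maxLength.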