Package io.goobi.viewer.controller
Class ALTOTools
java.lang.Object
io.goobi.viewer.controller.ALTOTools
ALTOTools class.
-
Field Summary
-
Method Summary
Modifier and Type | Method | Description
protected static String | alto2Txt(String alto, String charset, boolean mergeLineBreakWords) | alto2Txt.
static XMLStreamReader | createXmlParser(InputStream is) |
static String | getALTOCoords(de.intranda.digiverso.ocr.alto.model.superclasses.GeometricData element) | getALTOCoords.
static String | getFulltext(String alto, String charset, boolean mergeLineBreakWords) | getFullText.
static String | getFulltext(Path path, String encoding) | Read the plain full-text from an ALTO file.
static int | getMatchALTOWord(de.intranda.digiverso.ocr.alto.model.structureclasses.lineelements.Word eleWord, String[] words) | getMatchALTOWord.
 | getNERTags(String alto, String inCharset, NERTag.Type type) | getNERTags.
static String | getRotatedCoordinates(String inCoords, int rotation, Dimension pageSize) | getRotatedCoordinates.
static List<String> | getWordCoords(String altoString, String charset, Set<String> searchTerms) | getWordCoords.
static List<String> | getWordCoords(String altoString, String charset, Set<String> searchTerms, int rotation) |
 | getWordCoords(String altoString, String charset, Set<String> searchTerms, int proximitySearchDistance, int rotation) |
protected static Rectangle | rotate | rotate.
-
Field Details
-
TAG_LABEL_IGNORE_REGEX
Constant TAG_LABEL_IGNORE_REGEX.
-
Method Details
-
getFulltext
Reads the plain full text from an ALTO file. Words split across line breaks are not merged.
Parameters:
path - path to the ALTO file
encoding - character encoding of the file
Returns:
String containing the plain text of the ALTO file at the given path
Throws:
IOException
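As a rough illustration of what reading plain text from ALTO involves, the sketch below streams over the XML with StAX and joins the CONTENT attributes of String elements, ending each TextLine with a newline. It is not the ALTOTools implementation: AltoTextSketch and plainText are hypothetical names, the input is taken as a string rather than a Path, and hyphenation markup (HYP, SUBS_CONTENT) is ignored, so nothing is merged across line breaks.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class AltoTextSketch {

    /** Joins the CONTENT of all String elements; each TextLine ends with a newline. */
    static String plainText(String altoXml) {
        StringBuilder text = new StringBuilder();
        boolean firstWordInLine = true;
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            // Assumption: DTD processing is not needed for ALTO input.
            factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(altoXml));
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    if ("TextLine".equals(name)) {
                        firstWordInLine = true;
                    } else if ("String".equals(name)) {
                        String content = reader.getAttributeValue(null, "CONTENT");
                        if (content != null) {
                            // Separate words within a line by a single space.
                            if (!firstWordInLine) {
                                text.append(' ');
                            }
                            text.append(content);
                            firstWordInLine = false;
                        }
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && "TextLine".equals(reader.getLocalName())) {
                    text.append('\n');
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
        return text.toString();
    }
}
```

Merging words that are hyphenated across lines would need extra handling of the ALTO hyphenation attributes, which this sketch leaves out.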
-
getFulltext
static String getFulltext(String alto, String charset, boolean mergeLineBreakWords)
getFullText.
-
getNERTags
getNERTags.
Parameters:
alto - a String object.
inCharset -
type - a NERTag.Type object.
Returns:
a List object.
-
alto2Txt
protected static String alto2Txt(String alto, String charset, boolean mergeLineBreakWords) throws IOException, XMLStreamException, org.jdom2.JDOMException
alto2Txt.
Parameters:
alto - a String object.
charset - ALTO charset
mergeLineBreakWords - a boolean.
Returns:
a String object.
Throws:
IOException - if any.
XMLStreamException - if any.
org.jdom2.JDOMException
-
createXmlParser
public static XMLStreamReader createXmlParser(InputStream is) throws FactoryConfigurationError, XMLStreamException
-
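For createXmlParser, one plausible way to obtain an XMLStreamReader from an InputStream with the JDK's StAX API is sketched below. Whether ALTOTools configures its factory this way is an assumption; XmlParserSketch, createParser and rootElementName are hypothetical names used only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class XmlParserSketch {

    /** Hypothetical stand-in for createXmlParser; not the library's code. */
    static XMLStreamReader createParser(InputStream is) throws XMLStreamException {
        // newInstance() may throw FactoryConfigurationError if no StAX provider is available.
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Assumption: DTDs and external entities are not needed, so both are disabled.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        return factory.createXMLStreamReader(is);
    }

    /** Returns the local name of the root element, or null if no element is found. */
    static String rootElementName(String xml) {
        try (InputStream is = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))) {
            XMLStreamReader reader = createParser(is);
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        return reader.getLocalName();
                    }
                }
                return null;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```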
getWordCoords
public static List<String> getWordCoords(String altoString, String charset, Set<String> searchTerms)
getWordCoords.
-
getRotatedCoordinates
static String getRotatedCoordinates(String inCoords, int rotation, Dimension pageSize)
getRotatedCoordinates.
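The page does not state the rotation direction, the accepted angles, or the format of the coordinate string for getRotatedCoordinates. The sketch below therefore only illustrates the underlying geometry under explicit assumptions: clockwise rotation by multiples of 90 degrees, a top-left origin, and a java.awt.Rectangle instead of a coordinate string. RotationSketch and rotateClockwise are hypothetical names.

```java
import java.awt.Dimension;
import java.awt.Rectangle;

final class RotationSketch {

    /**
     * Rotates a rectangle clockwise within a page.
     *
     * @param r       rectangle in the coordinates of the unrotated page
     * @param degrees rotation angle; assumed to be a multiple of 90
     * @param page    size of the unrotated page
     */
    static Rectangle rotateClockwise(Rectangle r, int degrees, Dimension page) {
        // Normalize negative and large angles into the range 0..359.
        int normalized = ((degrees % 360) + 360) % 360;
        switch (normalized) {
            case 0:
                return new Rectangle(r);
            case 90:
                // A point (x, y) maps to (pageHeight - y, x); width and height swap.
                return new Rectangle(page.height - (r.y + r.height), r.x, r.height, r.width);
            case 180:
                return new Rectangle(page.width - (r.x + r.width),
                        page.height - (r.y + r.height), r.width, r.height);
            case 270:
                // A point (x, y) maps to (y, pageWidth - x); width and height swap.
                return new Rectangle(r.y, page.width - (r.x + r.width), r.height, r.width);
            default:
                throw new IllegalArgumentException("Only multiples of 90 degrees are supported: " + degrees);
        }
    }
}
```

For a 100 x 200 page, a rectangle at (10, 20) with size 30 x 40 ends up at (140, 10) with size 40 x 30 after a 90 degree clockwise rotation.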
-
getWordCoords
public static List<String> getWordCoords(String altoString, String charset, Set<String> searchTerms, int rotation)
Parameters:
altoString - String containing the ALTO XML document
charset -
searchTerms - Set of search terms
rotation - Image rotation in degrees
Returns:
a List object.
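To make the idea behind getWordCoords more concrete, the sketch below collects the positions of ALTO String elements whose CONTENT equals one of the search terms. The matching rule (case-insensitive equality), the output format ("x1,y1,x2,y2"), and the omission of charset and rotation handling are all assumptions, and WordCoordsSketch and wordCoords are hypothetical names; the actual method may behave differently on each point.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class WordCoordsSketch {

    /** Returns "x1,y1,x2,y2" for each String element whose CONTENT matches a search term. */
    static List<String> wordCoords(String altoXml, Set<String> searchTerms) {
        List<String> coords = new ArrayList<>();
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(altoXml));
            while (reader.hasNext()) {
                if (reader.next() != XMLStreamConstants.START_ELEMENT
                        || !"String".equals(reader.getLocalName())) {
                    continue;
                }
                String content = reader.getAttributeValue(null, "CONTENT");
                if (content == null || !matches(content, searchTerms)) {
                    continue;
                }
                int x = toInt(reader.getAttributeValue(null, "HPOS"));
                int y = toInt(reader.getAttributeValue(null, "VPOS"));
                int w = toInt(reader.getAttributeValue(null, "WIDTH"));
                int h = toInt(reader.getAttributeValue(null, "HEIGHT"));
                coords.add(x + "," + y + "," + (x + w) + "," + (y + h));
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
        return coords;
    }

    /** Assumed matching rule: case-insensitive equality with any search term. */
    static boolean matches(String content, Set<String> searchTerms) {
        String word = content.toLowerCase(Locale.ROOT);
        for (String term : searchTerms) {
            if (word.equals(term.toLowerCase(Locale.ROOT))) {
                return true;
            }
        }
        return false;
    }

    /** ALTO positions may be fractional; a missing attribute is treated as 0. */
    static int toInt(String value) {
        return value == null ? 0 : (int) Math.round(Double.parseDouble(value));
    }
}
```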
-
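getWordCoords
getWordCoords(String altoString, String charset, Set<String> searchTerms, int proximitySearchDistance, int rotation)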
-
rotate
rotate.
-
getMatchALTOWord
public static int getMatchALTOWord(de.intranda.digiverso.ocr.alto.model.structureclasses.lineelements.Word eleWord, String[] words)
getMatchALTOWord.
Parameters:
eleWord - a Word object.
words - an array of String objects.
Returns:
1 if there is a match; 0 otherwise
-
getALTOCoords
public static String getALTOCoords(de.intranda.digiverso.ocr.alto.model.superclasses.GeometricData element)
getALTOCoords.
Parameters:
element - a GeometricData object.
Returns:
a String object.
-