segment
This module provides utilities to segment a document into components.
See:
- class aws_textract_pipeline.segment.SegmentPdfResult(page_pdf_list: List[pymupdf.Document] = <factory>, page_image_list: List[pymupdf.Pixmap] = <factory>)
Returned object of segment_pdf().

To save a fitz.Document object to a local file, use the following code:

>>> res = SegmentPdfResult(...)
>>> page = res.page_pdf_list[0]
>>> page.save("/path/to/save/page.pdf")

To save a fitz.Pixmap object to a local file, use the following code:

>>> res = SegmentPdfResult(...)
>>> pixmap = res.page_image_list[0]
>>> pixmap.save("/path/to/save/image.png", output="png")

To get the width and height of the image, use the following code:

>>> pixmap.width
>>> pixmap.height
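The per-object save calls shown for SegmentPdfResult can be wrapped in a loop that writes every page. The following is a minimal sketch and not part of aws_textract_pipeline: the helper name save_segments and the page-0001 file naming scheme are hypothetical, and the code relies only on the page_pdf_list and page_image_list attributes and the save calls documented here.

```python
from pathlib import Path


def save_segments(res, out_dir):
    """Hypothetical helper: write each page PDF and page image in `res` to `out_dir`.

    `res` is assumed to expose `page_pdf_list` and `page_image_list`,
    as SegmentPdfResult does. Returns the list of paths written.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []

    # One single-page PDF per entry in page_pdf_list.
    for i, page in enumerate(res.page_pdf_list, start=1):
        path = out / f"page-{i:04d}.pdf"
        page.save(str(path))
        written.append(path)

    # One PNG per entry in page_image_list.
    for i, pixmap in enumerate(res.page_image_list, start=1):
        path = out / f"page-{i:04d}.png"
        pixmap.save(str(path), output="png")
        written.append(path)

    return written
```

Passing the result of segment_pdf() together with a target directory would then produce one PDF and one PNG per page, assuming both lists are populated.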
- aws_textract_pipeline.segment.segment_pdf(pdf_content: bytes, dpi: int = 200) -> SegmentPdfResult
Segment PDF into pages.
- Parameters:
pdf_content – PDF content in bytes.
dpi – Resolution, in dots per inch, of the page images. Defaults to 200.
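To get a feel for what a given dpi value means for image size, the arithmetic below may help. It is a rough sketch under two assumptions: that page dimensions are in PDF points (1 point = 1/72 inch, the default PDF user-space unit), and that segment_pdf() renders each page at the requested resolution. The function expected_pixel_size is hypothetical and only illustrates the scaling; the actual pixmap.width and pixmap.height may differ by a pixel because of rounding.

```python
POINTS_PER_INCH = 72  # default PDF user-space unit: 1 point = 1/72 inch


def expected_pixel_size(width_pt, height_pt, dpi=200):
    """Approximate pixel dimensions of a page rendered at `dpi`.

    Hypothetical illustration only; not a function of aws_textract_pipeline.
    """
    width_px = round(width_pt * dpi / POINTS_PER_INCH)
    height_px = round(height_pt * dpi / POINTS_PER_INCH)
    return width_px, height_px


# US Letter is 612 x 792 points (8.5 x 11 inches).
print(expected_pixel_size(612, 792))           # (1700, 2200)
print(expected_pixel_size(612, 792, dpi=72))   # (612, 792)
```

Pixel count grows with the square of dpi, so doubling the value roughly quadruples the size of each page image.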