Migration Guide: 1.x to 2.x

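PyPDF2<2.0.0 is very different from PyPDF2>=2.0.0.

Luckily, most changes are simple naming adjustments. This guide helps you migrate your code from PyPDF2 1.x (or even the original PyPdf) to PyPDF2>=2.0.0.

You can run your code with the updated version and display the deprecation warnings with the following command: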

python -W all your_code.py  
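
The same warnings can also be collected from inside Python, which may be convenient in a test suite. The following is a minimal sketch that uses only the standard library warnings module; legacy_call is a hypothetical stand-in for any deprecated 1.x-style call and is not part of PyPDF2.

```python
import warnings


def legacy_call():
    """Hypothetical stand-in for a deprecated 1.x-style API call."""
    warnings.warn("legacy_call is deprecated", DeprecationWarning, stacklevel=2)
    return "ok"


def collect_deprecations(func):
    """Run func and return the DeprecationWarning messages it emitted."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, similar to `python -W all` on the command line.
        warnings.simplefilter("always")
        func()
    return [
        str(w.message)
        for w in caught
        if issubclass(w.category, DeprecationWarning)
    ]


messages = collect_deprecations(legacy_call)
print(messages)
```

Passing your own entry point instead of legacy_call gives you a list of the renames that still need attention.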

Imports and modules

  • PyPDF2.utils has been removed.

  • PyPDF2.pdf has been removed. You can import what you need from PyPDF2 directly or from PyPDF2.generic.

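Naming adjustments

Class names

The base classes were renamed because they can operate not only on files but also on BytesIO streams. In addition, the default value of the strict parameter changed from strict=True to strict=False.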

  • PdfFileReader ➔ PdfReader

  • PdfFileWriter ➔ PdfWriter

  • PdfFileMerger ➔ PdfMerger

PdfFileReader and PdfFileMerger no longer accept the overwriteWarnings parameter. The new behavior corresponds to overwriteWarnings=False.
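
As an illustration of the renamed classes working on an in-memory stream, here is a minimal sketch. It assumes PyPDF2>=2.0.0 is installed; blank_pdf_page_count is a hypothetical helper name chosen for this example, and the page size of 200 x 100 is arbitrary.

```python
from io import BytesIO


def blank_pdf_page_count(num_pages):
    """Write num_pages blank pages to memory and count them after re-reading."""
    # Imported here so the sketch can be loaded even without PyPDF2 installed.
    from PyPDF2 import PdfReader, PdfWriter

    writer = PdfWriter()  # was PdfFileWriter in 1.x
    for _ in range(num_pages):
        writer.add_blank_page(width=200, height=100)

    stream = BytesIO()
    writer.write(stream)
    stream.seek(0)

    # strict=False is already the 2.x default; it is spelled out for clarity.
    reader = PdfReader(stream, strict=False)  # was PdfFileReader in 1.x
    return len(reader.pages)
```

With a 2.x installation, blank_pdf_page_count(3) returns 3. Code that relied on the old strict=True default needs to pass strict=True explicitly.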

Function, method, and attribute names

PyPDF2.xmp.XmpInformation:

  • rdfRoot ➔ rdf_root

  • xmp_createDate ➔ xmp_create_date

  • xmp_creatorTool ➔ xmp_creator_tool

  • xmp_metadataDate ➔ xmp_metadata_date

  • xmp_modifyDate ➔ xmp_modify_date

  • xmpMetadata ➔ xmp_metadata

  • xmpmm_documentId ➔ xmpmm_document_id

  • xmpmm_instanceId ➔ xmpmm_instance_id

PyPDF2.generic:

  • readObject ➔ read_object

  • convertToInt ➔ convert_to_int

  • DocumentInformation.getText ➔ DocumentInformation._get_text: this method should typically not be used; please let me know if you need it.

  • readHexStringFromStream ➔ read_hex_string_from_stream

  • initializeFromDictionary ➔ initialize_from_dictionary

  • createStringObject ➔ create_string_object

  • TreeObject.hasChildren ➔ TreeObject.has_children

  • TreeObject.emptyTree ➔ TreeObject.empty_tree

In many places:

  • getObject ➔ get_object

  • writeToStream ➔ write_to_stream

  • readFromStream ➔ read_from_stream
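
Most of the renames in this guide follow the usual camelCase to snake_case pattern. The sketch below shows that pattern with a hypothetical helper, camel_to_snake, which is not part of PyPDF2. It is only a rule of thumb: some entries in this guide do not follow it (for example getDocumentInfo became metadata), so the lists remain the authoritative mapping.

```python
import re


def camel_to_snake(name):
    """Insert an underscore before each capital that follows a lowercase letter."""
    return re.sub(r"([a-z])([A-Z])", r"\1_\2", name).lower()


for old in ("getObject", "writeToStream", "readFromStream"):
    print(old, "->", camel_to_snake(old))
```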

PdfReader class:

  • reader.getPage(pageNumber) ➔ reader.pages[page_number]

  • reader.getNumPages() / reader.numPages ➔ len(reader.pages)

  • getDocumentInfo ➔ metadata

  • flattenedPages attribute ➔ flattened_pages

  • resolvedObjects attribute ➔ resolved_objects

  • xrefIndex attribute ➔ xref_index

  • getNamedDestinations / namedDestinations attribute ➔ named_destinations

  • getPageLayout / pageLayout ➔ page_layout attribute

  • getPageMode / pageMode ➔ page_mode attribute

  • getIsEncrypted / isEncrypted ➔ is_encrypted attribute

  • getOutlines ➔ get_outlines

  • readObjectHeader ➔ read_object_header

  • cacheGetIndirectObject ➔ cache_get_indirect_object

  • cacheIndirectObject ➔ cache_indirect_object

  • getDestinationPageNumber ➔ get_destination_page_number

  • readNextEndLine ➔ read_next_end_line

  • _zeroXref ➔ _zero_xref

  • _authenticateUserPassword ➔ _authenticate_user_password

  • _pageId2Num attribute ➔ _page_id2num

  • _buildDestination ➔ _build_destination

  • _buildOutline ➔ _build_outline

  • _getPageNumberByIndirect(indirectRef) ➔ _get_page_number_by_indirect(indirect_ref)

  • _getObjectFromStream ➔ _get_object_from_stream

  • _decryptObject ➔ _decrypt_object

  • _flatten(..., indirectRef) ➔ _flatten(..., indirect_ref)

  • _buildField ➔ _build_field

  • _checkKids ➔ _check_kids

  • _writeField ➔ _write_field

  • _write_field(..., fieldAttributes) ➔ _write_field(..., field_attributes)

  • _read_xref_subsections(..., getEntry, ...) ➔ _read_xref_subsections(..., get_entry, ...)

PdfWriter class:

  • writer.getPage(pageNumber) ➔ writer.pages[page_number]

  • writer.getNumPages() ➔ len(writer.pages)

  • addMetadata ➔ add_metadata

  • addPage ➔ add_page

  • addBlankPage ➔ add_blank_page

  • addAttachment(fname, fdata) ➔ add_attachment(filename, data)

  • insertPage ➔ insert_page

  • insertBlankPage ➔ insert_blank_page

  • appendPagesFromReader ➔ append_pages_from_reader

  • updatePageFormFieldValues ➔ update_page_form_field_values

  • cloneReaderDocumentRoot ➔ clone_reader_document_root

  • cloneDocumentFromReader ➔ clone_document_from_reader

  • getReference ➔ get_reference

  • getOutlineRoot ➔ get_outline_root

  • getNamedDestRoot ➔ get_named_dest_root

  • addBookmarkDestination ➔ add_bookmark_destination

  • addBookmarkDict ➔ add_bookmark_dict

  • addBookmark ➔ add_bookmark

  • addNamedDestinationObject ➔ add_named_destination_object

  • addNamedDestination ➔ add_named_destination

  • removeLinks ➔ remove_links

  • removeImages(ignoreByteStringObject) ➔ remove_images(ignore_byte_string_object)

  • removeText(ignoreByteStringObject) ➔ remove_text(ignore_byte_string_object)

  • addURI ➔ add_uri

  • addLink ➔ add_link

  • getPage(pageNumber) ➔ get_page(page_number)

  • getPageLayout / setPageLayout / pageLayout ➔ page_layout attribute

  • getPageMode / setPageMode / pageMode ➔ page_mode attribute

  • _addObject ➔ _add_object

  • _addPage ➔ _add_page

  • _sweepIndirectReferences ➔ _sweep_indirect_references
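
For larger code bases, a rename table like the one above can be applied mechanically. The following sketch is one possible approach and is not a tool shipped with PyPDF2: WRITER_RENAMES holds a small subset of the PdfWriter entries, and migrate_source is a hypothetical helper that performs a whole-word textual replacement.

```python
import re

# A small subset of the PdfWriter renames listed above; extend as needed.
WRITER_RENAMES = {
    "addPage": "add_page",
    "addBlankPage": "add_blank_page",
    "insertPage": "insert_page",
    "addMetadata": "add_metadata",
    "removeLinks": "remove_links",
}


def migrate_source(source, renames=WRITER_RENAMES):
    """Replace whole-word occurrences of old method names with new ones."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, renames)) + r")\b")
    return pattern.sub(lambda match: renames[match.group(1)], source)


old_code = "writer.addPage(page)\nwriter.addMetadata(info)\n"
print(migrate_source(old_code))
```

Because the replacement is purely textual, it would also rewrite unrelated identifiers that happen to share a name with an old method, so the result should be reviewed before committing it.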

PdfMerger class:

  • __init__ parameter: strict=True ➔ strict=False (the PdfFileMerger still has the old default)

  • addMetadata ➔ add_metadata

  • addNamedDestination ➔ add_named_destination

  • setPageLayout ➔ set_page_layout

  • setPageMode ➔ set_page_mode

Page class:

  • artBox / bleedBox / cropBox / mediaBox / trimBox ➔ artbox / bleedbox / cropbox / mediabox / trimbox

    • getWidth, getHeight ➔ width / height

    • getLowerLeft_x / getUpperLeft_x ➔ left

    • getUpperRight_x / getLowerRight_x ➔ right

    • getLowerLeft_y / getLowerRight_y ➔ bottom

    • getUpperRight_y / getUpperLeft_y ➔ top

    • getLowerLeft / setLowerLeft ➔ lower_left property

    • upperRight ➔ upper_right

  • mergePage ➔ merge_page

  • rotateClockwise / rotateCounterClockwise ➔ rotate_clockwise

  • _mergeResources ➔ _merge_resources

  • _contentStreamRename ➔ _content_stream_rename

  • _pushPopGS ➔ _push_pop_gs

  • _addTransformationMatrix ➔ _add_transformation_matrix

  • _mergePage ➔ _merge_page
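
To show the box renames in context, here is a small sketch that reads page dimensions through the new mediabox, width, and height names. It assumes PyPDF2>=2.0.0 is installed; make_blank_pdf and page_sizes are hypothetical helper names used only for this example.

```python
from io import BytesIO


def make_blank_pdf(width, height):
    """Build a one-page blank PDF in memory (helper for the demo)."""
    # Imported lazily so the sketch loads even where PyPDF2 is missing.
    from PyPDF2 import PdfWriter

    writer = PdfWriter()
    writer.add_blank_page(width=width, height=height)
    stream = BytesIO()
    writer.write(stream)
    stream.seek(0)
    return stream


def page_sizes(stream):
    """Return (width, height) of each page's media box, as floats."""
    from PyPDF2 import PdfReader

    reader = PdfReader(stream)
    # 1.x style: page.mediaBox.getWidth(); 2.x style: page.mediabox.width
    return [
        (float(page.mediabox.width), float(page.mediabox.height))
        for page in reader.pages
    ]
```

Calling page_sizes(make_blank_pdf(200, 100)) yields one (width, height) tuple per page; the values are converted with float() so the result does not depend on the numeric type the library uses internally.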

XmpInformation class:

  • getElement(..., aboutUri, ...) ➔ get_element(..., about_uri, ...)

  • getNodesInNamespace(..., aboutUri, ...) ➔ get_nodes_in_namespace(..., about_uri, ...)

  • _getText ➔ _get_text

utils.py:

  • matrixMultiply ➔ matrix_multiply

  • RC4_encrypt is moved to the security module

Parameter names

  • PdfWriter.get_page: pageNumber ➔ page_number

  • PyPDF2.filters (all classes): decodeParms ➔ decode_parms

  • PyPDF2.filters (all classes): decodeStreamData ➔ decode_stream_data

  • pagenum ➔ page_number

  • PdfMerger.merge: position ➔ page_number

  • PdfWriter.add_outline_item_destination: dest ➔ page_destination

  • PdfWriter.add_named_destination_object: dest ➔ page_destination

  • PdfWriter.encrypt: user_pwd ➔ user_password

  • PdfWriter.encrypt: owner_pwd ➔ owner_password
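
If your own wrappers expose the old keyword names, a thin compatibility layer can ease the transition. The sketch below is one way to do this and does not describe how PyPDF2 handles the renames internally: accept_old_keywords is a hypothetical decorator, and encrypt is a stand-in function that only echoes its arguments.

```python
import functools
import warnings

# Old keyword -> new keyword, taken from the PdfWriter.encrypt entries above.
ENCRYPT_RENAMES = {"user_pwd": "user_password", "owner_pwd": "owner_password"}


def accept_old_keywords(renames):
    """Decorator: translate deprecated keyword names to their new spelling."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for old, new in renames.items():
                if old in kwargs:
                    warnings.warn(
                        f"{old} is deprecated, use {new}",
                        DeprecationWarning,
                        stacklevel=2,
                    )
                    kwargs[new] = kwargs.pop(old)
            return func(*args, **kwargs)
        return wrapper
    return decorator


@accept_old_keywords(ENCRYPT_RENAMES)
def encrypt(user_password, owner_password=None):
    """Hypothetical stand-in; it only echoes the arguments it received."""
    return user_password, owner_password
```

Callers using user_pwd or owner_pwd keep working but see a DeprecationWarning, which gives them time to switch to the new names.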

Deprecations

A few classes and functions have been deprecated without a replacement in PyPDF2:

  • PyPDF2.utils.ConvertFunctionsToVirtualList

  • PyPDF2.utils.formatWarning

  • PyPDF2.isInt(obj): use isinstance(obj, int) instead

  • PyPDF2.u_(s): use s directly

  • PyPDF2.chr_(c): use chr(c) instead

  • PyPDF2.barray(b): use bytearray(b) instead

  • PyPDF2.isBytes(b): use isinstance(b, bytes) instead

  • PyPDF2.xrange_fn: use range instead

  • PyPDF2.string_type: use str instead

  • PyPDF2.isString(s): use isinstance(s, str) instead

  • PyPDF2._basestring: use str instead

  • b_(...) has been removed. You should typically be able to use a bytes object directly; otherwise you can copy its implementation into your own code.
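
The built-in replacements above can be checked quickly, and a local substitute for b_ takes only a few lines. In the sketch below, to_bytes is a hypothetical stand-in written for this guide; the choice of Latin-1 with a UTF-8 fallback is an assumption about what callers typically need, not a copy of the removed function.

```python
def to_bytes(value):
    """Hypothetical stand-in for b_(): return value as a bytes object."""
    if isinstance(value, bytes):
        return value
    try:
        # Assumption: Latin-1 first, since it maps code points 0-255 one to one.
        return value.encode("latin-1")
    except UnicodeEncodeError:
        return value.encode("utf-8")


# Built-in replacements for the removed helpers.
assert isinstance(3, int)            # was isInt(3)
assert isinstance(b"x", bytes)       # was isBytes(b"x")
assert isinstance("x", str)          # was isString("x")
assert chr(65) == "A"                # was chr_(65)
assert bytearray(b"ab") == b"ab"     # was barray(b"ab")
```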