file_utils
Utility functions for file handling.
is_file_encrypted
Checks whether a file is password-protected or DRM-encrypted.
Detection strategies by format:
- Modern Office (.docx/.xlsx/.pptx): encrypted files are wrapped in an OLE2 container instead of being a plain ZIP archive.
- Legacy Office (.doc/.xls/.ppt): always OLE2; scan for the UTF-16LE EncryptionInfo stream name in the directory.
- ODF (.odt/.ods/.odp): check META-INF/manifest.xml for encryption-data elements.
- EPUB (.epub): check for META-INF/rights.xml (Adobe ADEPT DRM) or AES algorithms in META-INF/encryption.xml.
| PARAMETER | DESCRIPTION |
|---|---|
| file_path | Path to the file to check. |

| RETURNS | DESCRIPTION |
|---|---|
| bool | True if the file appears to be encrypted/protected. |
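
The modern Office strategy comes down to telling an OLE2 container from a ZIP archive by its leading bytes. The following is a minimal sketch of that idea, not the code in src/utils/file_utils.py; the names `container_kind` and `modern_office_looks_encrypted` are hypothetical, and the extension check is an assumption about how dispatch might work.

```python
from pathlib import Path

# Signature of an OLE2 (Compound File Binary) container.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
# Local file header signature that starts a regular ZIP archive.
ZIP_MAGIC = b"PK\x03\x04"


def container_kind(file_path: str) -> str:
    """Return 'ole2', 'zip', or 'unknown' based on the first 8 bytes."""
    with open(file_path, "rb") as fh:
        head = fh.read(8)
    if head.startswith(OLE2_MAGIC):
        return "ole2"
    if head.startswith(ZIP_MAGIC):
        return "zip"
    return "unknown"


def modern_office_looks_encrypted(file_path: str) -> bool:
    """Hypothetical helper: a .docx/.xlsx/.pptx stored as OLE2 is treated as encrypted."""
    suffix = Path(file_path).suffix.lower()
    if suffix not in {".docx", ".xlsx", ".pptx"}:
        return False
    return container_kind(file_path) == "ole2"
```

An unencrypted .docx starts with the ZIP signature, so the sketch returns False for it and True only when the same extension carries an OLE2 header.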
Source code in src/utils/file_utils.py
_is_legacy_ole2_encrypted
Checks if a legacy OLE2 Office file (.doc/.xls/.ppt) is encrypted.
Scans the first 8 KB for the UTF-16LE encoded EncryptionInfo
stream name in the OLE2 directory entries. Covers Office 2002+
encryption (RC4 and AES).
| PARAMETER | DESCRIPTION |
|---|---|
| file_path | Path to the legacy Office file. |

| RETURNS | DESCRIPTION |
|---|---|
| bool | True if the file contains an EncryptionInfo stream. |
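
The scan described above can be approximated as a byte search over the start of the file. This is a sketch under stated assumptions, not the module's implementation: the function name `has_encryption_info_stream` is hypothetical, and the 8 KB limit is taken from the description.

```python
# OLE2 directory entries store stream names as UTF-16LE.
ENCRYPTION_INFO_UTF16 = "EncryptionInfo".encode("utf-16-le")
SCAN_LIMIT = 8 * 1024  # first 8 KB, per the description


def has_encryption_info_stream(file_path: str) -> bool:
    """Return True if the UTF-16LE name appears in the first 8 KB of the file."""
    with open(file_path, "rb") as fh:
        head = fh.read(SCAN_LIMIT)
    return ENCRYPTION_INFO_UTF16 in head
```

A plain substring search does not parse the directory structure, so it trades precision for speed; a name located past the first 8 KB would be missed.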
_check_odf_encryption
Checks if an ODF file (.odt/.ods/.odp) is encrypted.
ODF encryption is application-level: content files are encrypted
individually and key derivation parameters are stored in
META-INF/manifest.xml as encryption-data elements.
| PARAMETER | DESCRIPTION |
|---|---|
| file_path | Path to the ODF file. |

| RETURNS | DESCRIPTION |
|---|---|
| bool | True if the manifest contains encryption-data elements. |
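
One way to perform the manifest check is to open the ODF package as a ZIP archive and look for any element whose local name is encryption-data. The sketch below is illustrative only: `odf_manifest_has_encryption_data` is a hypothetical name, and returning False for unreadable archives or manifests is an assumption, since the source does not say how such cases are handled.

```python
import zipfile
import xml.etree.ElementTree as ET


def odf_manifest_has_encryption_data(file_path: str) -> bool:
    """Return True if META-INF/manifest.xml contains an encryption-data element."""
    try:
        with zipfile.ZipFile(file_path) as zf:
            manifest = zf.read("META-INF/manifest.xml")
        root = ET.fromstring(manifest)
    except (zipfile.BadZipFile, KeyError, ET.ParseError):
        # Assumption: not a ZIP, no manifest, or malformed XML counts as "not encrypted".
        return False
    for elem in root.iter():
        # Tags look like "{namespace}local-name"; compare the local name only.
        if elem.tag.rsplit("}", 1)[-1] == "encryption-data":
            return True
    return False
```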
_check_epub_drm
Checks if an EPUB file has DRM protection.
Detects Adobe ADEPT DRM (META-INF/rights.xml) and W3C XML
Encryption with AES algorithms. Font obfuscation (IDPF/Adobe
URIs) is not flagged as DRM since content remains readable.
| PARAMETER | DESCRIPTION |
|---|---|
| file_path | Path to the EPUB file. |

| RETURNS | DESCRIPTION |
|---|---|
| bool | True if the EPUB appears to be DRM-protected. |
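
The two EPUB signals described above could be checked roughly as follows. This is a sketch, not the module's code: `epub_looks_drm_protected` is a hypothetical name, matching "aes" inside the Algorithm attribute is an assumed heuristic, and treating unreadable archives as unprotected is also an assumption.

```python
import zipfile
import xml.etree.ElementTree as ET


def epub_looks_drm_protected(file_path: str) -> bool:
    """Return True for Adobe ADEPT (rights.xml) or AES entries in encryption.xml."""
    try:
        with zipfile.ZipFile(file_path) as zf:
            names = set(zf.namelist())
            if "META-INF/rights.xml" in names:
                return True  # Adobe ADEPT DRM marker
            if "META-INF/encryption.xml" not in names:
                return False
            root = ET.fromstring(zf.read("META-INF/encryption.xml"))
    except (zipfile.BadZipFile, ET.ParseError):
        # Assumption: unreadable archives are reported as unprotected.
        return False
    for elem in root.iter():
        # W3C XML Encryption names the cipher in an Algorithm URI,
        # e.g. http://www.w3.org/2001/04/xmlenc#aes128-cbc.
        if "aes" in elem.get("Algorithm", "").lower():
            return True
    # Font obfuscation URIs (IDPF/Adobe) do not contain "aes" and fall through.
    return False
```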
_check_pdf_encryption
Checks if a PDF file is password-protected.
Uses PyMuPDF to open the file and check the needs_pass flag.
Returns False if PyMuPDF is not installed.
| PARAMETER | DESCRIPTION |
|---|---|
| file_path | Path to the PDF file. |

| RETURNS | DESCRIPTION |
|---|---|
| bool | True if the PDF requires a user password to open. |
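
A sketch of the behavior described above, with the optional dependency imported lazily so that a missing PyMuPDF yields False. The name `pdf_needs_password` is hypothetical, and handling of unreadable or corrupt files is not shown because the source does not describe it.

```python
def pdf_needs_password(file_path: str) -> bool:
    """Return True if PyMuPDF reports that the PDF needs a password."""
    try:
        import fitz  # PyMuPDF; imported lazily because it is optional
    except ImportError:
        return False  # matches the documented fallback when PyMuPDF is absent
    doc = fitz.open(file_path)
    try:
        return bool(doc.needs_pass)
    finally:
        doc.close()
```

Importing inside the function keeps the rest of the module usable on installations without PyMuPDF.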
format_file_size
Formats a size in bytes as a human-readable string.
| PARAMETER | DESCRIPTION |
|---|---|
| size_bytes | Size in bytes. |

| RETURNS | DESCRIPTION |
|---|---|
| str | Human readable size (e.g. 1.2 KB). |
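
A formatter of this kind is commonly written as a loop that divides by a fixed step until the value fits the unit. The sketch below assumes 1024-byte steps, one decimal place, and units up to TB; none of these details are confirmed by the source, and `format_size` is a hypothetical name.

```python
def format_size(size_bytes: int) -> str:
    """Format a byte count such as 1229 as '1.2 KB' (assumes 1024-byte steps)."""
    size = float(size_bytes)
    unit = "B"
    for unit in ("B", "KB", "MB", "GB", "TB"):
        # Stop at the first unit where the value fits, or at the largest unit.
        if size < 1024 or unit == "TB":
            break
        size /= 1024
    if unit == "B":
        return f"{int(size)} B"  # whole bytes need no decimal place
    return f"{size:.1f} {unit}"


print(format_size(1229))  # 1.2 KB
```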
clone_file_to_storage
Clones a file to a specific storage directory.
Copies src_path into storage_dir, creating the directory tree if
needed. Metadata (timestamps, permissions) is preserved via
shutil.copy2.
| PARAMETER | DESCRIPTION |
|---|---|
| src_path | Absolute path of the source file. |
| storage_dir | Target directory (created if absent). |

| RETURNS | DESCRIPTION |
|---|---|
| str | Absolute path of the cloned file inside storage_dir. |
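
The copy step can be sketched with the standard library alone. `clone_to_storage` is a hypothetical name, and keeping the source's file name in the destination is an assumption; the source does not say how name collisions are resolved.

```python
import os
import shutil


def clone_to_storage(src_path: str, storage_dir: str) -> str:
    """Copy src_path into storage_dir and return the absolute destination path."""
    # Create the whole directory tree; no error if it already exists.
    os.makedirs(storage_dir, exist_ok=True)
    # Assumption: the clone keeps the source's base name.
    dest = os.path.join(storage_dir, os.path.basename(src_path))
    # copy2 copies the content and tries to preserve timestamps and permissions.
    shutil.copy2(src_path, dest)
    return os.path.abspath(dest)
```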
wipe_history_directory
Removes the parent directory of a storage file.
Used to clean up the per-task storage folder when a history entry is deleted. No-op if file_path is empty or the parent directory does not exist.
| PARAMETER | DESCRIPTION |
|---|---|
| file_path | Path to any file inside the task storage directory. |
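
The cleanup behavior described above, including both no-op cases, might look like the following sketch. `wipe_parent_directory` is a hypothetical name, and the use of shutil.rmtree is an assumption about how the directory is removed.

```python
import os
import shutil


def wipe_parent_directory(file_path: str) -> None:
    """Remove the directory that contains file_path, if there is one."""
    if not file_path:
        return  # no-op for an empty path
    parent = os.path.dirname(file_path)
    # A bare file name has no parent component; skip it rather than guess.
    if parent and os.path.isdir(parent):
        shutil.rmtree(parent)  # deletes the folder and everything inside it
```

Because the whole parent folder is deleted, a helper like this is only safe when each task has a dedicated storage directory, as the description implies.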