by 89jobrien
处理 PDF 文件通常需要掌握多个 Python 库及其 API 的专业知识。此技能为使用 pdfplumber 和 pypdf 库提取文本和表格、填写表单、合并文档以及处理 PDF 任务提供全面指导。
1. 打开 Claude 聊天界面
2. 点击下方 "📋 复制" 按钮
3. 粘贴到 Claude 聊天框中并发送
4. 输入 "使用 pdf-processing 技能" 开始使用
=== pdf-processing 技能 === 作者: 89jobrien 描述: 处理 PDF 文件通常需要掌握多个 Python 库及其 API 的专业知识。此技能为使用 pdfplumber 和 pypdf 库提取文本和表格、填写表单、合并文档以及处理 PDF 任务提供全面指导。 使用方法: 1. 调用技能: "使用 pdf-processing 技能" 2. 提供相关信息: 根据技能要求提供必要参数 3. 查看结果: 技能会返回处理结果 示例: "使用 pdf-processing 技能,帮我分析一下这段代码"
这种方法适用于所有 Claude 用户,不需要安装额外工具。
productivity
safe
Use pdfplumber to extract text from PDFs:
import pdfplumber
with pdfplumber.open("document.pdf") as pdf:
text = pdf.pages[0].extract_text()
print(text)
Extract tables from PDFs with automatic detection:
import pdfplumber
with pdfplumber.open("report.pdf") as pdf:
page = pdf.pages[0]
tables = page.extract_tables()
for table in tables:
for row in table:
print(row)
Process multi-page documents efficiently:
import pdfplumber
with pdfplumber.open("document.pdf") as pdf:
full_text = ""
for page in pdf.pages:
full_text += page.extract_text() + "\n\n"
print(full_text)
For PDF form filling, see FORMS.md for the complete guide including field analysis and validation.
Combine multiple PDF files:
from pypdf import PdfMerger
merger = PdfMerger()
for pdf in ["file1.pdf", "file2.pdf", "file3.pdf"]:
merger.append(pdf)
merger.write("merged.pdf")
merger.close()
Extract specific pages or ranges:
from pypdf import PdfReader, PdfWriter
reader = PdfReader("input.pdf")
writer = PdfWriter()
# Extract pages 2-5
for page_num in range(1, 5):
writer.add_page(reader.pages[page_num])
with open("output.pdf", "wb") as output:
writer.write(output)
Extract and save text:
import pdfplumber
with pdfplumber.open("input.pdf") as pdf:
text = "\n\n".join(page.extract_text() for page in pdf.pages)
with open("output.txt", "w") as f:
f.write(text)
Extract tables to CSV:
import pdfplumber
import csv
with pdfplumber.open("tables.pdf") as pdf:
tables = pdf.pages[0].extract_tables()
with open("output.csv", "w", newline="") as f:
writer = csv.writer(f)
for table in tables:
writer.writerows(table)
Handle common PDF issues:
import pdfplumber
try:
with pdfplumber.open("document.pdf") as pdf:
if len(pdf.pages) == 0:
print("PDF has no pages")
else:
text = pdf.pages[0].extract_text()
if text is None or text.strip() == "":
print("Page contains no extractable text (might be scanned)")
else:
print(text)
except Exception as e:
print(f"Error processing PDF: {e}")
View Count
0
Download Count
0
Favorite Count
0
Quality Score
71