by softaworks
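Many Skills waste tokens on content Claude already knows. This skill provides a systematic 8-dimension, 120-point scoring framework for evaluating Skill design quality, identifying token waste, and generating actionable improvement suggestions.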
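1. Open the Claude chat interface
2. Click the "📋 Copy" button below
3. Paste into the Claude chat box and send
4. Type "Use the skill-judge skill" to get started
=== skill-judge skill ===
Author: softaworks
Description: Many Skills waste tokens on content Claude already knows. This skill provides a systematic 8-dimension, 120-point scoring framework for evaluating Skill design quality, identifying token waste, and generating actionable improvement suggestions.
Usage:
1. Invoke the skill: "Use the skill-judge skill"
2. Provide the relevant information: supply whatever parameters the skill asks for
3. Review the result: the skill returns its output
Example: "Use the skill-judge skill to analyze this code for me"
This method works for all Claude users and requires no additional tools.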
productivity
safe
A comprehensive evaluation framework for assessing Agent Skill quality against official specifications and best practices. This skill provides multi-dimensional scoring and actionable improvement suggestions for SKILL.md files and skill packages.
Skill Judge exists to solve a critical problem: most Skills waste tokens on knowledge Claude already has.
The skill helps you evaluate whether a Skill actually adds value by measuring its "knowledge delta" - the gap between what the Skill provides and what Claude already knows. A good Skill should be a compressed expert brain, not a tutorial.
Good Skill = Expert-only Knowledge - What Claude Already Knows
This skill helps you identify:
Use Skill Judge when you need to:
Trigger phrases:
First Pass - Knowledge Delta Scan: Read the SKILL.md and categorize each section as Expert, Activation, or Redundant
Structure Analysis: Check frontmatter validity, line count, reference files, design pattern, and loading triggers
Score Each Dimension: Evaluate against 8 dimensions with specific evidence and justifications
Calculate Total and Grade: Sum scores (max 120 points) and assign grade
Generate Report: Produce structured report with scores, critical issues, and improvements
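The Structure Analysis step can be partly mechanized. The following is a minimal sketch, not part of Skill Judge itself: the function name `check_structure` and the `StructureCheck` record are hypothetical, and the frontmatter test only looks for a `---` delimited block at the top of the file without validating the YAML inside it. The 500-line and 300-line limits come from the quick check later in this document.

```python
from dataclasses import dataclass


@dataclass
class StructureCheck:
    has_frontmatter: bool
    line_count: int
    within_hard_limit: bool   # fewer than 500 lines
    within_ideal_limit: bool  # fewer than 300 lines


def check_structure(skill_md: str) -> StructureCheck:
    """Rough structural pre-check for the text of a SKILL.md file."""
    lines = skill_md.splitlines()
    # Assumption: frontmatter is a block opened by '---' on the first line
    # and closed by a later '---' line. The YAML itself is not parsed.
    has_frontmatter = (
        len(lines) >= 2
        and lines[0].strip() == "---"
        and any(line.strip() == "---" for line in lines[1:])
    )
    count = len(lines)
    return StructureCheck(has_frontmatter, count, count < 500, count < 300)
```

A check like this only covers the countable parts of the step. Reference files, design pattern, and loading triggers still need a human or agent reading the Skill.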
| Dimension | Max Points | What It Measures |
|---|---|---|
| D1: Knowledge Delta | 20 | Does the Skill add genuine expert knowledge? (THE CORE DIMENSION) |
| D2: Mindset + Procedures | 15 | Does it transfer expert thinking patterns and domain-specific workflows? |
| D3: Anti-Pattern Quality | 15 | Does it have effective NEVER lists with specific reasons? |
| D4: Specification Compliance | 15 | Is the frontmatter valid? Is the description comprehensive? |
| D5: Progressive Disclosure | 15 | Is content properly layered for on-demand loading? |
| D6: Freedom Calibration | 15 | Is specificity appropriate for task fragility? |
| D7: Pattern Recognition | 10 | Does it follow an established official pattern? |
| D8: Practical Usability | 15 | Can an Agent actually use this Skill effectively? |
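The arithmetic behind the total can be sketched as follows. The maximum points are copied from the table above; the short keys `D1` through `D8` and the function name `total_score` are illustrative choices, not identifiers defined by Skill Judge.

```python
# Maximum points per dimension, copied from the dimension table.
MAX_POINTS = {
    "D1": 20,  # Knowledge Delta
    "D2": 15,  # Mindset + Procedures
    "D3": 15,  # Anti-Pattern Quality
    "D4": 15,  # Specification Compliance
    "D5": 15,  # Progressive Disclosure
    "D6": 15,  # Freedom Calibration
    "D7": 10,  # Pattern Recognition
    "D8": 15,  # Practical Usability
}


def total_score(scores: dict[str, int]) -> tuple[int, float]:
    """Return (total points, percentage of the 120-point maximum)."""
    if set(scores) != set(MAX_POINTS):
        raise ValueError("scores must cover exactly D1 through D8")
    for dim, value in scores.items():
        if not 0 <= value <= MAX_POINTS[dim]:
            raise ValueError(f"{dim} must be between 0 and {MAX_POINTS[dim]}")
    total = sum(scores.values())
    return total, round(100 * total / sum(MAX_POINTS.values()), 1)
```

Rejecting out-of-range values up front keeps a mistyped score from silently inflating the total.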
| Grade | Percentage | Meaning |
|---|---|---|
| A | 90%+ (108+) | Excellent - production-ready expert Skill |
| B | 80-89% (96-107) | Good - minor improvements needed |
| C | 70-79% (84-95) | Adequate - clear improvement path |
| D | 60-69% (72-83) | Below Average - significant issues |
| F | <60% (<72) | Poor - needs fundamental redesign |
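As a small illustration, the grade table maps onto a threshold lookup. This sketch uses the point floors from the table directly (108, 96, 84, 72) rather than recomputing percentages, which avoids floating-point rounding at the boundaries. The function name `grade` is hypothetical.

```python
# (minimum total, letter) pairs taken from the grade table, highest first.
GRADE_FLOORS = [(108, "A"), (96, "B"), (84, "C"), (72, "D")]


def grade(total: int) -> str:
    """Map a total score out of 120 to a letter grade."""
    if not 0 <= total <= 120:
        raise ValueError("total must be between 0 and 120")
    for floor, letter in GRADE_FLOORS:
        if total >= floor:
            return letter
    return "F"
```

For example, a Skill scoring 94 points lands in the C band, since 94 sits between the floors of 84 and 96.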
The skill teaches you to recognize three types of content:
| Type | Definition | Treatment |
|---|---|---|
| Expert | Claude genuinely doesn't know this | Must keep - this is the Skill's value |
| Activation | Claude knows but may not think of | Keep if brief - serves as reminder |
| Redundant | Claude definitely knows this | Should delete - wastes tokens |
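Once each section has been labeled, the three counts can be tallied into the E:A:R knowledge ratio that appears in the report. The sketch below assumes the ratio is a simple count of sections per type. The source does not say whether sections, lines, or tokens are counted, so treat that unit, and the name `knowledge_ratio`, as assumptions.

```python
from collections import Counter

VALID_LABELS = {"expert", "activation", "redundant"}


def knowledge_ratio(labels: list[str]) -> str:
    """Summarize per-section labels as an 'E:A:R' string of raw counts."""
    counts = Counter(labels)
    unknown = set(counts) - VALID_LABELS
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    # Counter returns 0 for labels that never occurred.
    return f"{counts['expert']}:{counts['activation']}:{counts['redundant']}"
```

For instance, a Skill with three Expert sections, one Activation section, and two Redundant sections would report `3:1:2`. The labeling itself remains a judgment call; the code only does the bookkeeping.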
Skill Judge identifies and evaluates against five established patterns:
| Pattern | Lines | Best For | Example |
|---|---|---|---|
| Mindset | ~50 | Creative tasks requiring taste | frontend-design |
| Navigation | ~30 | Multiple distinct scenarios | internal-comms |
| Philosophy | ~150 | Art/creation requiring originality | canvas-design |
| Process | ~200 | Complex multi-step projects | mcp-builder |
| Tool | ~300 | Precise operations on specific formats | docx, pdf, xlsx |
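The typical line counts in the table suggest a quick sanity check on length. The sketch below is an illustrative heuristic only, not how Skill Judge identifies patterns: it returns the pattern whose typical length is nearest to a given line count. Length alone cannot determine a pattern, since that depends on what the Skill is for, so the result is at most a prompt to look closer when it disagrees with the intended pattern. The name `closest_pattern_by_length` is hypothetical.

```python
# Approximate typical lengths, copied from the pattern table.
TYPICAL_LINES = {
    "Mindset": 50,
    "Navigation": 30,
    "Philosophy": 150,
    "Process": 200,
    "Tool": 300,
}


def closest_pattern_by_length(line_count: int) -> str:
    """Return the pattern whose typical line count is nearest to line_count."""
    if line_count < 0:
        raise ValueError("line_count must be non-negative")
    # On a tie, the pattern listed first in TYPICAL_LINES wins.
    return min(TYPICAL_LINES, key=lambda name: abs(TYPICAL_LINES[name] - line_count))
```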
The skill identifies 9 common failure patterns:
Evaluate the skill at skills/my-new-skill/SKILL.md
Compare the quality of skills/skill-a and skills/skill-b
How can I improve the knowledge delta in my skill?
What pattern does this skill follow, and is it the right choice?
Skill Judge produces a structured evaluation report:
# Skill Evaluation Report: [Skill Name]
## Summary
- **Total Score**: X/120 (X%)
- **Grade**: [A/B/C/D/F]
- **Pattern**: [Mindset/Navigation/Philosophy/Process/Tool]
- **Knowledge Ratio**: E:A:R = X:Y:Z
- **Verdict**: [One sentence assessment]
## Dimension Scores
[Table with scores for all 8 dimensions]
## Critical Issues
[Must-fix problems]
## Top 3 Improvements
[Prioritized improvement suggestions]
## Detailed Analysis
[In-depth analysis for dimensions scoring below 80%]
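The Detailed Analysis section covers only dimensions scoring below 80% of their maximum. A minimal sketch of that filter follows, assuming the per-dimension maximums from the scoring table. The keys `D1` through `D8` and the function name `needs_detailed_analysis` are illustrative, not part of the skill.

```python
# Maximum points per dimension, copied from the scoring table.
MAX_POINTS = {
    "D1": 20, "D2": 15, "D3": 15, "D4": 15,
    "D5": 15, "D6": 15, "D7": 10, "D8": 15,
}


def needs_detailed_analysis(scores: dict[str, int]) -> list[str]:
    """List dimensions whose score is strictly below 80% of their maximum."""
    # Integer comparison (score * 100 < 80 * max) avoids float rounding
    # exactly at the 80% boundary.
    return [
        dim
        for dim, max_points in MAX_POINTS.items()
        if scores[dim] * 100 < 80 * max_points
    ]
```

With this rule, a D1 score of 16 out of 20 sits exactly at 80% and is not flagged, while 11 out of 15 on a 15-point dimension (about 73%) is.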
Do:
Never:
When evaluating any Skill, always ask:
"Would an expert in this domain, looking at this Skill, say: 'Yes, this captures knowledge that took me years to learn'?"
If yes, the Skill has genuine value. If no, it's compressing what Claude already knows.
SKILL EVALUATION QUICK CHECK
KNOWLEDGE DELTA (most important):
[ ] No "What is X" explanations for basic concepts
[ ] No step-by-step tutorials for standard operations
[ ] Has decision trees for non-obvious choices
[ ] Has trade-offs only experts would know
[ ] Has edge cases from real-world experience
MINDSET + PROCEDURES:
[ ] Transfers thinking patterns (how to think about problems)
[ ] Has "Before doing X, ask yourself..." frameworks
[ ] Includes domain-specific procedures Claude wouldn't know
ANTI-PATTERNS:
[ ] Has explicit NEVER list
[ ] Anti-patterns are specific, not vague
[ ] Includes WHY (non-obvious reasons)
SPECIFICATION:
[ ] Valid YAML frontmatter
[ ] Description answers: WHAT, WHEN, KEYWORDS
[ ] Description specific enough for Agent activation
STRUCTURE:
[ ] SKILL.md < 500 lines (ideal < 300)
[ ] Loading triggers embedded in workflow
[ ] Has "Do NOT Load" for preventing over-loading
FREEDOM:
[ ] Creative tasks -> High freedom (principles)
[ ] Fragile operations -> Low freedom (exact scripts)
USABILITY:
[ ] Decision trees for multi-path scenarios
[ ] Working code examples
[ ] Error handling and fallbacks
None. Skill Judge is self-contained and requires no external tools or dependencies.