Reproducing auto-rubric results #27

@RAgNarokBear

Thanks for maintaining this awesome project! It has helped a lot with my research on reward modeling.

In the auto-rubric evaluation, I could not reproduce the quantitative results reported in the paper.

What prompt was used in the official evaluation? Simply concatenating the Rubric Evaluation Prompt (Fig. 8) and the Rubrics (Sec. G.1) as follows does not seem to work:

## Task Description
I will provide you with a set of rubrics, along with the current query and two responses. These rubrics are the primary basis for selecting the best answer. You must follow the steps specified in the Evaluation Process when conducting your evaluation process.

## Rubrics
## Theme 1: Factual Accuracy and Canonical Consistency
Theme: Ensure factual accuracy, canonical consistency, and avoid fabrication or hallucination in responses.
• Tip 1: For queries about Undertale, ensure all character motivations and gameplay mechanics align with established lore, avoiding speculative or contradictory claims.
• Tip 2: When discussing historical milestones like early synchronized sound cartoons, correctly attribute "Steamboat Willie" instead of "My Old Kentucky Home" to maintain reliability.
• Tip 3: In responses involving Hogwarts students, include only canonically portrayed students with academically accurate achievements, excluding professors or non-student figures.
• Tip 4: Avoid inventing Sumerian texts or fabricated survey links; instead, acknowledge missing context and request clarification when necessary, especially for niche cultural references.

## Theme 2: Strict Adherence to Prompt Requirements
Theme: Maintain strict adherence to prompt structure, formatting, and explicit user requirements.
• Tip 1: When asked for a single word, provide exactly one word without redundancy or additional suggestions, as in responses requiring minimal output.
• Tip 2: For prompts specifying 100 items, deliver a complete list even if the topic is broad, proactively selecting a relevant subject to fulfill the quantitative requirement.
• Tip 3: In tagline creation, directly incorporate core technology benefits like "distance at impact" and avoid vague or redundant phrasing that dilutes product relevance.
• Tip 4: When the prompt requires the word "scenery" followed by a colon and a one-word term, follow this exact syntactic structure without deviation.

## Theme 3: Clarity and Structured Organization
Theme: Prioritize clarity, conciseness, and structured organization to enhance readability and directness.
• Tip 1: For a "Thank you" prompt, respond with a concise acknowledgment and an open invitation for further questions, avoiding assumptions about the user being a student or lawyer.
• Tip 2: When summarizing steps for building a dropshipping agent business, use bullet points or numbered lists to present key points logically and avoid hallucinated information.
• Tip 3: In audit findings related to deposit insurance boards, structure responses with precise, actionable items and conclude with a concise summary emphasizing implications.
• Tip 4: Avoid excessive formatting like bold text or unnecessary punctuation when explaining grammatical correctness, maintaining a straightforward and professional tone.

## Theme 4: Comprehensive and Detailed Analysis
Theme: Deliver comprehensive, detailed, and thematically coherent narratives or analyses that fully address all prompt elements.
• Tip 1: For a CFA Institute Investment Foundations® Certificate explanation, include curriculum, eligibility, exam format, preparation resources, benefits, and continuing education with specific examples.
• Tip 2: In a fantasy story response, incorporate rich narrative detail, distinct character development, and immersive world-building such as vivid settings and dynamic interactions.
• Tip 3: When addressing a tax-proportional legislature, outline mechanics, implications, data collection, representation quotas, equity concerns, and constitutional considerations comprehensively.
• Tip 4: For a horror anime scene, use INT./EXT. designations, emphasize atmospheric tension, and describe creature details like a rhombus tail and chameleon-like head to align with anime style.

## Theme 5: Narrative and Contextual Fidelity
Theme: Ensure narrative and contextual fidelity by preserving character dynamics, tone, and worldbuilding consistency.
• Tip 1: In responses involving Jade’s character, maintain her authoritative yet professional tone, avoiding hostile shifts that contradict established behavior.
• Tip 2: For stories featuring Emily from KikoRiki, preserve her role as a mischievous prankster and integrate the whimsical tone when describing her failed morph into Rosa and the orange rear end mishap.
• Tip 3: When continuing a narrative about diaper use over potty training, maintain a playful, child-friendly tone and avoid contradictions with the original theme.
• Tip 4: In therapeutic role-play scenarios, prioritize immersive engagement with the patient’s imaginative world through dialogue and validation, rather than clinical checklists.

## Process
1. Confirm the task scenario of the current query and select the corresponding evaluation rubrics.
2. Identify the best response that meets the most selected rubrics.

## Query
{query}

## Response A
{response_a}

## Response B
{response_b}

## Output Requirements
Please choose the better response. Response "A", "B", or "tie" within the tags. 
<preference>A/B/tie</preference>
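
For anyone trying to reproduce this, here is a minimal sketch of how a prompt like the one above could be filled in and its output parsed. It is not the official evaluation code: `build_prompt` and `parse_preference` are hypothetical names, and it assumes plain `str.format` substitution for the three placeholders and a regex over the `<preference>` tag.

```python
import re
from typing import Optional

# Hypothetical tail of the prompt: only the placeholders and output tag shown above.
TEMPLATE = (
    "## Query\n{query}\n\n"
    "## Response A\n{response_a}\n\n"
    "## Response B\n{response_b}\n\n"
    "## Output Requirements\n"
    'Please choose the better response. Response "A", "B", or "tie" within the tags.\n'
    "<preference>A/B/tie</preference>\n"
)


def build_prompt(header: str, query: str, response_a: str, response_b: str) -> str:
    """Join the task description + rubrics (header) with the filled-in template.

    The header is concatenated rather than formatted, so braces inside the
    rubric text cannot be mistaken for placeholders.
    """
    body = TEMPLATE.format(query=query, response_a=response_a, response_b=response_b)
    return header.rstrip() + "\n\n" + body


def parse_preference(output: str) -> Optional[str]:
    """Return "A", "B", or "tie" from the last well-formed tag, else None."""
    matches = re.findall(
        r"<preference>\s*(A|B|tie)\s*</preference>", output, flags=re.IGNORECASE
    )
    if not matches:
        return None
    last = matches[-1]
    return "tie" if last.lower() == "tie" else last.upper()
```

Taking the last match is a design choice for models that reason before answering. An echoed `<preference>A/B/tie</preference>` line is not counted as a vote, because the pattern only accepts a single label inside the tags.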
