CyberSkill

Claude Certified Architect

Mock Exam - by CyberSkill

Extraction pipelines sample questions

Five original sample questions with answers and explanations for the extraction pipelines domain of the Claude Certified Architect - Foundations exam. They are written by CyberSkill and kept separate from the mock question bank, so the answers are shown here. This is an unofficial study aid and is not affiliated with or endorsed by Anthropic.

1. An invoice extractor reads dates like 03/04/2025 that could be March 4 or April 3. The design that avoids silent errors is to:

  • A. Assume the United States month-first format everywhere.

    A blanket assumption silently mislabels every locale that writes the day first.

  • B. Require an ISO date in the output schema, and when the input is ambiguous, flag the field for review instead of guessing. (correct)

    A strict output format plus an explicit ambiguity flag keeps bad dates out of the data and surfaces the ones that need a human.

  • C. Store the date as the raw string and sort it out later.

    Deferring the parse pushes the ambiguity downstream where it is harder to resolve.

  • D. Drop any date that is ambiguous.

    Silently dropping data loses records that are often recoverable with a quick check.

Why: Pin the output to an unambiguous format and flag genuinely ambiguous inputs for review, rather than assuming one locale or discarding data.
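
A minimal sketch of the pattern in question 1, assuming slash-separated numeric dates with a four-digit year. The `DateField` type and `normalize_date` helper are hypothetical names used for illustration, not part of any exam material or library. The helper tries both the month-first and day-first readings and emits an ISO date only when exactly one distinct calendar date results.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DateField:
    """Hypothetical output shape: an ISO date or a review flag, plus the raw text."""
    iso: Optional[str]   # ISO 8601 string (YYYY-MM-DD), or None when unresolved
    needs_review: bool   # True when a human should look at the raw value
    raw: str             # original text, kept so the reviewer can see it


def normalize_date(raw: str) -> DateField:
    match = re.fullmatch(r"\s*(\d{1,2})/(\d{1,2})/(\d{4})\s*", raw)
    if not match:
        # Unrecognized layout: flag it rather than guess.
        return DateField(None, True, raw)

    first, second, year = (int(group) for group in match.groups())
    candidates = set()
    # Try month-first, then day-first; keep every reading that is a real date.
    for month, day in ((first, second), (second, first)):
        try:
            candidates.add(date(year, month, day))
        except ValueError:
            pass

    if len(candidates) == 1:
        # Only one valid reading (e.g. 25/12/2025 or 04/04/2025).
        return DateField(candidates.pop().isoformat(), False, raw)
    # Two valid readings (ambiguous) or none (invalid): send to review.
    return DateField(None, True, raw)
```

With this sketch, `normalize_date("03/04/2025")` comes back flagged for review with no ISO value, while `normalize_date("25/12/2025")` resolves because only the day-first reading is a valid date. A real pipeline might also use document-level hints such as the vendor's country before falling back to review.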

2. A field the schema expects is simply not present in the source document. The extractor should:

  • A. Fill the field with a plausible value inferred from the rest of the document.

    Inventing a value is a hallucination that corrupts the record quietly.

  • B. Return null for that field and mark it as not found, leaving the rest of the extraction intact. (correct)

    An explicit null with a not-found marker is honest and keeps the other fields usable.

  • C. Fail the entire extraction because one field is missing.

    One absent field should not cause the extractor to discard the data it did find.

  • D. Repeat the previous record value for that field.

    Carrying a value over from another record fabricates data and is hard to detect later.

Why: Represent a missing field as an explicit null marked not-found, rather than inventing a value or discarding the whole record.
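
One way the explicit-null approach in question 2 could look, as a sketch. The `build_record` function, the `status` values, and the field names in the usage note are illustrative assumptions rather than a prescribed format.

```python
from typing import Any, Dict, List


def build_record(extracted: Dict[str, Any], schema_fields: List[str]) -> Dict[str, Dict[str, Any]]:
    """Wrap each schema field with an explicit found / not_found status.

    `extracted` is whatever the extraction step returned; fields it omitted,
    returned as None, or returned as blank text all become explicit nulls.
    """
    record: Dict[str, Dict[str, Any]] = {}
    for name in schema_fields:
        value = extracted.get(name)
        is_blank = isinstance(value, str) and not value.strip()
        if value is None or is_blank:
            # Missing in the source: say so, and do not invent a value.
            record[name] = {"value": None, "status": "not_found"}
        else:
            record[name] = {"value": value, "status": "found"}
    return record
```

For example, `build_record({"vendor": "Acme"}, ["vendor", "po_number"])` keeps the vendor and returns `{"value": None, "status": "not_found"}` for `po_number`, so downstream code can tell "absent from the document" apart from a failed extraction.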

3. A contract is too long to fit in one context window, and you need fields from across the whole document. The dependable approach is to:

  • A. Truncate the document to what fits and extract from the first part.

    Truncation drops the sections where later fields live and silently loses them.

  • B. Chunk the document with slight overlap, extract per chunk, then merge and reconcile the fields. (correct)

    Overlapping chunks plus a merge step cover the whole document and catch fields that straddle a boundary.

  • C. Summarize the document first, then extract from the summary.

    A summary drops the exact values extraction needs and adds a lossy step.

  • D. Raise the temperature so the model fills in the missing parts.

    Temperature controls how much the output varies, not how much of the document is covered; it cannot recover content the model never saw.

Why: For oversized inputs, chunk with overlap and merge the per-chunk results, rather than truncating or extracting from a lossy summary.
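
The chunk-and-merge idea in question 3 can be sketched as follows. This assumes character-based chunking for simplicity (a real pipeline would more likely split on tokens or section boundaries), and it leaves the per-chunk model call out entirely. `chunk_text` and `merge_fields` are hypothetical helpers.

```python
from typing import Dict, List, Optional


def chunk_text(text: str, size: int, overlap: int) -> List[str]:
    """Split text into windows of `size` characters that share `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


def merge_fields(per_chunk: List[Dict[str, Optional[str]]]) -> Dict[str, dict]:
    """Reconcile per-chunk results: one distinct value wins, several are a conflict."""
    merged: Dict[str, dict] = {}
    for result in per_chunk:
        for name, value in result.items():
            entry = merged.setdefault(
                name, {"value": None, "conflict": False, "candidates": []}
            )
            if value is not None and value not in entry["candidates"]:
                entry["candidates"].append(value)
    for entry in merged.values():
        if len(entry["candidates"]) == 1:
            entry["value"] = entry["candidates"][0]
        elif len(entry["candidates"]) > 1:
            entry["conflict"] = True  # leave value as None and surface for review
    return merged
```

The overlap means a value that straddles a boundary appears whole in at least one chunk, provided the overlap is longer than the value. The merge step treats disagreement between chunks as a conflict to review rather than silently picking one.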

4. An extractor must label each support ticket with one of five priority levels. To stop the model from inventing new labels, you should:

  • A. Ask for the priority as free text and clean it up afterward.

    Free text invites variants like urgent, URGENT, and P1 that you then have to reconcile.

  • B. Constrain the field to the five allowed values in the schema or tool definition, and reject anything else. (correct)

    An enumerated, validated field forces the output into the known set and rejects stray labels at the source.

  • C. List the five levels in the prompt and hope the model complies.

    A prompt hint is not enforcement; the model can still return something off-list.

  • D. Accept any label and map unknowns to the closest match later.

    Post-hoc mapping is guesswork that can silently mis-bucket tickets.

Why: Constrain category fields to an enumerated, validated set so the output stays on-list, rather than relying on prompt wording or cleanup afterward.
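
A sketch of the enumerated field from question 4. The five label names are placeholders, since the question does not name its priority levels, and `PRIORITY_SCHEMA` is just a JSON Schema style fragment of the kind a schema or tool definition could carry. `parse_priority` is a hypothetical validation helper that runs on whatever the model returned.

```python
from enum import Enum


class Priority(str, Enum):
    # Placeholder labels; substitute the five levels your system actually uses.
    CRITICAL = "critical"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"
    TRIVIAL = "trivial"


# JSON Schema style constraint that lists the only permitted values.
PRIORITY_SCHEMA = {"type": "string", "enum": [p.value for p in Priority]}


def parse_priority(raw: str) -> Priority:
    """Return the matching Priority, or raise ValueError for anything off-list."""
    try:
        return Priority(raw)
    except ValueError:
        allowed = ", ".join(PRIORITY_SCHEMA["enum"])
        raise ValueError(f"priority {raw!r} is not one of: {allowed}") from None
```

Here `parse_priority("high")` returns `Priority.HIGH`, while `parse_priority("URGENT")` raises `ValueError` instead of being mapped to a nearby label. Keeping the schema constraint and the validator driven by the same enum avoids the two lists drifting apart.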

5. An extractor pulls line items and a total from a receipt. The strongest integrity check before accepting the output is to:

  • A. Trust the total field because it is printed prominently.

    A printed total can be misread or itself be wrong; trust is not a check.

  • B. Verify that the line items sum to the extracted total, and on a mismatch retry or flag the record. (correct)

    A sum check ties the parts to the whole and catches both misread line items and a misread total.

  • C. Check only that the total is a number.

    A type check confirms it is numeric, not that it is correct.

  • D. Accept the first extraction without checking.

    Skipping validation lets arithmetic errors pass straight into the data.

Why: Cross-check that line items reconcile with the stated total and retry or flag on a mismatch, rather than trusting any single printed figure.
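
A sketch of the sum check from question 5, under the assumption that amounts arrive as decimal strings and that the line items already include any tax or fees counted in the total. `totals_reconcile` and `extract_with_check` are hypothetical names, and `extract` stands in for whatever function performs one extraction attempt.

```python
from decimal import Decimal
from typing import Callable, Iterable, List, Tuple


def totals_reconcile(line_items: Iterable[str], total: str, tolerance: str = "0.00") -> bool:
    """True when the line items sum to the stated total, within `tolerance`."""
    # Decimal avoids the rounding surprises of binary floats with currency.
    items_sum = sum((Decimal(amount) for amount in line_items), Decimal("0"))
    return abs(items_sum - Decimal(total)) <= Decimal(tolerance)


def extract_with_check(
    extract: Callable[[], Tuple[List[str], str]], max_attempts: int = 2
) -> dict:
    """Retry extraction until the sum check passes, then flag if it never does."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        items, total = extract()
        if totals_reconcile(items, total):
            return {"status": "accepted", "items": items, "total": total, "attempts": attempt}
    # Every attempt failed the check: keep the last result but mark it for review.
    return {"status": "needs_review", "items": items, "total": total, "attempts": max_attempts}
```

A mismatch does not say which side is wrong, only that the parts and the whole disagree, which is why the sketch retries and then flags instead of correcting the total itself.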

Practice extraction pipelines for real

These five are a taste. The full free mock has 15 extraction pipelines questions among its 60, under a 120-minute timer and scored against the 720 pass line, with an explanation on every option.