
Week 2: Intermediate Representation (IR) Schema #3

Description

@Abhishek-Kumar-Rai5

Objective

Design and implement the project's Intermediate Representation (IR), the shared data model that bridges document understanding and information extraction.

The IR is designed as a structured curation workspace rather than a database schema. It preserves reported scientific observations, provenance, uncertainty, and transformation history while remaining suitable for downstream extraction, validation, and export.


Planned Work

IR Schema

Implement immutable Pydantic models for:

  • Citation
  • Site
  • Treatment
  • Species
  • Management event
  • Observation

Observation Model

The observation model will preserve:

  • reported value
  • reported units
  • converted value (when applicable)
  • statistical encoding
  • temporal information
  • provenance
  • confidence
  • observation level
  • unresolved status

Reported values will never be overwritten by derived values.
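
One way to keep that guarantee is to store reported and converted values in separate fields and only ever fill the converted ones on a copy. The sketch below assumes Pydantic v2; the field names and the `with_conversion` helper are hypothetical, and only a few of the attributes listed above are shown.

```python
from typing import Optional

from pydantic import BaseModel, ConfigDict


class Observation(BaseModel):
    """Hypothetical, reduced observation record."""

    model_config = ConfigDict(frozen=True)

    variable: str
    reported_value: float              # exactly as stated in the paper
    reported_units: str
    converted_value: Optional[float] = None  # derived, never replaces reported
    converted_units: Optional[str] = None
    unresolved: bool = False


def with_conversion(obs: Observation, factor: float, target_units: str) -> Observation:
    """Return a copy carrying a derived value; reported fields are unchanged."""
    return obs.model_copy(
        update={
            "converted_value": obs.reported_value * factor,
            "converted_units": target_units,
        }
    )


yield_obs = Observation(variable="yield", reported_value=2.5, reported_units="t/ha")
converted = with_conversion(yield_obs, 1000.0, "kg/ha")
```

Note that `model_copy(update=...)` does not re-run validation, so a real implementation might prefer to rebuild the model through its constructor.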

Provenance

Every extracted entity must retain:

  • originating document object
  • page
  • section
  • source span
  • table cell or figure reference (when applicable)
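
The provenance requirements above could be captured in a single record attached to each entity. This is a sketch under assumptions: Pydantic v2, character offsets for the source span, and hypothetical field names such as `document_id` and `table_cell`.

```python
from typing import Optional

from pydantic import BaseModel, ConfigDict, Field, model_validator


class Provenance(BaseModel):
    """Hypothetical provenance record for one extracted entity."""

    model_config = ConfigDict(frozen=True)

    document_id: str                  # originating document object
    page: int = Field(ge=1)           # 1-based page number (assumption)
    section: str
    span_start: int = Field(ge=0)     # character offsets (assumption)
    span_end: int = Field(ge=0)
    table_cell: Optional[str] = None  # e.g. "Table 2, row 3, col 4"
    figure_ref: Optional[str] = None

    @model_validator(mode="after")
    def check_span(self) -> "Provenance":
        # A span that ends before it starts cannot point at real text.
        if self.span_end < self.span_start:
            raise ValueError("span_end must not precede span_start")
        return self


prov = Provenance(
    document_id="paper-01",
    page=4,
    section="Results",
    span_start=120,
    span_end=164,
    table_cell="Table 2, row 3, col 4",
)
```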

Uncertainty

Support explicit uncertainty representation including:

  • bounded date intervals
  • unresolved fields
  • confidence metadata
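
The three kinds of uncertainty listed above might be modelled as follows. This is an illustrative sketch assuming Pydantic v2; `DateInterval`, `month_interval`, `ManagementEvent`, and the convention that `None` marks an unresolved field are all hypothetical choices.

```python
import calendar
from datetime import date
from typing import Optional

from pydantic import BaseModel, ConfigDict, Field, model_validator


class DateInterval(BaseModel):
    """Bounded date: the true date lies in [earliest, latest]."""

    model_config = ConfigDict(frozen=True)

    earliest: date
    latest: date

    @model_validator(mode="after")
    def check_order(self) -> "DateInterval":
        if self.latest < self.earliest:
            raise ValueError("latest must not precede earliest")
        return self

    @property
    def is_exact(self) -> bool:
        return self.earliest == self.latest


def month_interval(year: int, month: int) -> DateInterval:
    """Interval for a date reported only to the month, e.g. 'May 2019'."""
    last_day = calendar.monthrange(year, month)[1]
    return DateInterval(earliest=date(year, month, 1), latest=date(year, month, last_day))


class ManagementEvent(BaseModel):
    model_config = ConfigDict(frozen=True)

    kind: str
    when: Optional[DateInterval] = None  # None marks the field as unresolved
    confidence: Optional[float] = Field(default=None, ge=0.0, le=1.0)


sowing = ManagementEvent(kind="sowing", when=month_interval(2019, 5), confidence=0.8)
harvest = ManagementEvent(kind="harvest")  # date not reported in the source
```

Keeping an explicit interval avoids inventing a day of the month that the paper never stated.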

Calibration Protocol

Implement the protocol-specific design decisions discussed during project planning, including:

  • date min/max representation
  • observation-level distinction between treatment means and aggregated summaries
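
As a rough illustration of the observation-level distinction, an enum field lets downstream code separate treatment means from aggregated summaries without guessing. The sketch assumes Pydantic v2; the enum members, the `LeveledObservation` model, and the `treatment_means` helper are hypothetical names, not the agreed protocol.

```python
from enum import Enum
from typing import List

from pydantic import BaseModel, ConfigDict


class ObservationLevel(str, Enum):
    TREATMENT_MEAN = "treatment_mean"
    AGGREGATED_SUMMARY = "aggregated_summary"


class LeveledObservation(BaseModel):
    model_config = ConfigDict(frozen=True)

    value: float
    level: ObservationLevel  # required, so every observation declares its level


def treatment_means(observations: List[LeveledObservation]) -> List[LeveledObservation]:
    """Keep only observations reported as treatment means."""
    return [o for o in observations if o.level is ObservationLevel.TREATMENT_MEAN]


records = [
    LeveledObservation(value=3.1, level="treatment_mean"),      # string is coerced to the enum
    LeveledObservation(value=2.9, level="aggregated_summary"),
    LeveledObservation(value=3.4, level=ObservationLevel.TREATMENT_MEAN),
]
means_only = treatment_means(records)
```

Making `level` a required field forces the distinction to be recorded at extraction time instead of being inferred later.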

Deliverables

  • Complete IR schema
  • Pydantic validation
  • Comprehensive test suite
  • Fixture IR instances for all five ground-truth papers
