
Browsing by Author "Seaton, Alexa"

  • Building an AI-Powered Archive for US Science Policy Documents
    (Rice University, 2025) Xu, Yujie; Seaton, Alexa; Fondren Fellows
    This project aims to develop an open-source, Django-based web application to support the automated processing of complex, born-digital documents, specifically PDFs released in response to Freedom of Information Act (FOIA) requests. The application will serve as a digital repository within the White House Scientist and Science Policy Dynamic Digital Archive, hosted by the Woodson Research Center. Leveraging AI technologies, including optical character and layout recognition and integration with Large Language Models (LLMs), the tool will streamline data extraction, analysis, and search of irregularly formatted documents, improving accessibility and research capabilities.
  • Employing ML Methods on Digitized FOIA Requests for Improved Discoverability and Policy Research
    (Rice University, 2025) Seaton, Alexa; Xu, Yujie; Von Arx, Devin; Traylor, Jordan; Jin, Ying; Evans, Kenneth Mellinger; Baker Institute, Science and Technology Policy Program
    Born-digital records pose challenges for digital preservation because of their unstructured formats and noncompliance with accessibility standards. This project introduces a modular, open-source workflow that batch-processes large, mixed-media PDFs, many obtained through FOIA requests, using OCR, AI, and named-entity recognition. Built for the White House Scientists Archive, the system improves the discoverability and usability of digitized records across administrations and supports metadata extraction at scale. Key tools include Mistral AI for OCR, Apache Tika for entity recognition, and a fine-tuned Mistral model for metadata generation.
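The second abstract describes a modular workflow in which OCR, entity recognition, and metadata generation run as separate steps over a batch of PDFs. Below is a minimal sketch of that staging pattern, assuming each step is a function that enriches a shared record. `Record`, `FAKE_OCR_TEXT`, and the three stage functions are hypothetical stand-ins, not the project's code; they do not call Mistral AI or Apache Tika, and the entity finder is a toy regex.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Record:
    """One PDF moving through the pipeline (hypothetical structure)."""
    source: str
    text: str = ""
    entities: List[str] = field(default_factory=list)
    metadata: Dict[str, str] = field(default_factory=dict)
    errors: List[str] = field(default_factory=list)


Stage = Callable[[Record], Record]

# Hypothetical stand-in for OCR output; a real stage would send the PDF
# to an OCR service instead of looking it up in this dict.
FAKE_OCR_TEXT = {"memo1.pdf": "Briefing prepared by Jane Doe for the White House."}


def ocr_stage(rec: Record) -> Record:
    # Raises KeyError for unknown files, simulating an OCR failure.
    rec.text = FAKE_OCR_TEXT[rec.source]
    return rec


def ner_stage(rec: Record) -> Record:
    # Toy entity finder: runs of two or more capitalized words.
    # Not Apache Tika; only a placeholder that fills the same role.
    rec.entities = re.findall(r"\b[A-Z][a-zA-Z]+(?:\s+[A-Z][a-zA-Z]+)+", rec.text)
    return rec


def metadata_stage(rec: Record) -> Record:
    # Placeholder for LLM-generated metadata: copy source and entities.
    rec.metadata = {"source": rec.source, "subjects": "; ".join(rec.entities)}
    return rec


def run_pipeline(records: List[Record], stages: List[Stage]) -> List[Record]:
    """Run every stage, in order, on every record.

    If a stage raises, the error is stored on that record and its remaining
    stages are skipped, so one bad PDF does not stop the whole batch.
    """
    processed = []
    for rec in records:
        for stage in stages:
            try:
                rec = stage(rec)
            except Exception as exc:
                rec.errors.append(f"{stage.__name__}: {exc}")
                break
        processed.append(rec)
    return processed
```

For example, `run_pipeline([Record("memo1.pdf"), Record("missing.pdf")], [ocr_stage, ner_stage, metadata_stage])` returns the first record with entities `["Jane Doe", "White House"]` and the second with an `ocr_stage` error and empty metadata. Keeping each stage as an interchangeable function means a real OCR or NER backend could replace a placeholder without changing the driver loop.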

Physical Address:

6100 Main Street, Houston, Texas 77005

Mailing Address:

MS-44, P.O. Box 1892, Houston, Texas 77251-1892