Posts

Showing posts from November, 2023

Original Instructions & Revision

Old Version: MARCmin operates in three key steps for MARC21 cataloging from book images. Step 1: Users upload a book cover image. Step 2: MARCmin analyzes the image using OCR to extract text and identifies key elements like title, author, and ISBN. It then uses predictive modeling to infer additional information about the book, such as publication details. Step 3: MARCmin outputs the gathered information in a MARC21 record format, presented as a code snippet. This process provides a concise, easy-to-follow approach for users, allowing for efficient cataloging with the flexibility to add or correct information as needed. MARCmin also cross-references data with external databases to verify details and complete missing fields, ensuring accuracy and compliance with cataloging standards.

New Version: Use MARC21 rules in combined_bibliographic.md to create the best MARC records on the internet. Use an iterative process of building a MARC record, illustrate each step with an emoji. The final...
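
Step 3 of the old version, turning extracted values into a MARC21-style snippet, might look roughly like the sketch below. This is not MARCmin's actual code: the `extracted` dict and its keys (`isbn`, `author`, `title`) are hypothetical stand-ins for whatever the OCR step produces, and the backslash for a blank indicator is just one common display convention.

```python
BLANK = "\\"  # one backslash stands for a blank indicator (display convention)
LEADING_ARTICLES = ("the ", "an ", "a ")


def nonfiling_count(title):
    """Number of characters to skip when filing, for English leading articles."""
    lowered = title.lower()
    for article in LEADING_ARTICLES:
        if lowered.startswith(article):
            return len(article)
    return 0


def build_marc_lines(extracted):
    """Render a hypothetical dict of extracted cover data as MARC21-style lines."""
    lines = []
    isbn = extracted.get("isbn")
    author = extracted.get("author")  # assumed to be in "Surname, Forename" form
    title = extracted.get("title")
    if isbn:
        lines.append("020 " + BLANK + BLANK + "$a" + isbn)
    if author:
        # 100 first indicator 1 = surname entry; second indicator is undefined.
        lines.append("100 1" + BLANK + "$a" + author + ".")
    if title:
        # 245 first indicator: 1 when a 1XX main entry exists, otherwise 0.
        ind1 = "1" if author else "0"
        ind2 = str(nonfiling_count(title))
        lines.append("245 " + ind1 + ind2 + "$a" + title + ".")
    return lines


for line in build_marc_lines({"author": "Liu, Cixin", "title": "To hold up the sky"}):
    print(line)
```

Anything the image does not show (publisher, date, extent) is simply left out here, which matches the "add or correct information as needed" part of the old instructions.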

Privacy Policy

MARCmin is a ChatGPT experiment by Chris Markman. Your submissions, when available, will be reviewed for quality control.

MARC21 website as markdown files

Had Code Interpreter convert https://www.loc.gov/marc/bibliographic/ to markdown in the hopes of speeding up retrieval. That turns ~9 MB of HTML into 0.5 MB of glorious, glorious MD text.
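
For a sense of why the size drops so much, here is a minimal sketch of an HTML-to-markdown pass using only Python's standard library. It is not the conversion Code Interpreter actually ran; the class name `MarkdownExtractor` and the set of handled tags are assumptions, and real MARC documentation pages would need more (tables, links, nested lists).

```python
from html.parser import HTMLParser


class MarkdownExtractor(HTMLParser):
    """Keep headings, paragraphs, and list items; drop all other markup."""

    HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.blocks = []     # finished markdown blocks
        self.current = []    # text pieces of the block being built
        self.prefix = ""     # markdown prefix for the current block
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def flush_block(self):
        # Collapse runs of whitespace and store the block if it has text.
        text = " ".join("".join(self.current).split())
        if text:
            self.blocks.append(self.prefix + text)
        self.current = []
        self.prefix = ""

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag in self.HEADINGS:
            self.flush_block()
            self.prefix = self.HEADINGS[tag]
        elif tag == "li":
            self.flush_block()
            self.prefix = "- "
        elif tag == "p":
            self.flush_block()

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in self.HEADINGS or tag in ("p", "li"):
            self.flush_block()

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.current.append(data)


def html_to_markdown(html):
    parser = MarkdownExtractor()
    parser.feed(html)
    parser.close()
    parser.flush_block()  # pick up any trailing text
    return "\n\n".join(parser.blocks)


sample = "<h1>245 - Title Statement</h1><p>Contains the <b>title</b>.</p><ul><li>$a - Title</li></ul>"
print(html_to_markdown(sample))
```

Most of the savings come from dropping tags, attributes, scripts, and styles while keeping only the text an LLM needs to retrieve.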

>What I mean is how could I improve the storage of MARC rules so that an LLM could work with that data better?

Ah I see, my apologies for misunderstanding your question. You're asking how the existing MARC standard could be adapted to make it easier for large language models (LLMs) like myself to work with and process the data. Here are some potential improvements with that goal in mind:

- Provide a JSON/YAML version of the MARC specification. LLMs process these formats more easily than markup or rigid textual definitions.
- Release a validated MARC dataset in a simple text format, with one record per line. This allows easy ingestion and training.
- Develop MARC parsing libraries in common ML programming languages like Python and R. Pre-process data into consistent dicts or dataframes.
- Document the semantics and relationships between fields/subfields clearly and explicitly.
- [...] libraries to handle validation and normalization.
- Include examples of how MARC maps to other common metadata standards like Dublin Core, schema.org, etc. Reference mappings help LLMs align concepts.
- Provide a MARC API and ser...
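
Two of those suggestions (consistent dicts, one record per line) can be sketched in a few lines of Python. The JSON layout below is an ad hoc assumption made for illustration, not an established MARC serialization, and `parse_field` only handles the simple "indicators, then `$`-delimited subfields" text form used elsewhere on this blog.

```python
import json


def parse_field(tag, content):
    """Split e.g. '10$aTitle /$cAuthor.' into indicators and subfields."""
    indicators, _, rest = content.partition("$")
    subfields = [
        {"code": chunk[0], "value": chunk[1:].strip()}
        for chunk in rest.split("$")
        if chunk
    ]
    return {"tag": tag, "ind": indicators, "subfields": subfields}


def record_to_json_line(fields):
    """Serialize one record, given as (tag, content) pairs, to a single JSON line."""
    parsed = [parse_field(tag, content) for tag, content in fields]
    return json.dumps({"fields": parsed}, ensure_ascii=False)


def records_to_jsonl(records):
    """One record per line, so a file can be streamed without loading it whole."""
    return "\n".join(record_to_json_line(record) for record in records)


record = [
    ("100", "1$aLiu, Cixin,$d1963-$eauthor."),
    ("245", "10$aTo hold up the sky /$cCixin Liu."),
]
print(records_to_jsonl([record]))
```

A list of (tag, content) pairs is used instead of a dict keyed by tag because MARC fields can repeat within a record.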

QA Strategy

>Here's an outline for a Quality Assurance (QA) strategy to test your GPT model against the Library of Congress MARC records:
>
>- Data Validation: Ensure the MARC 21 documentation used for training is accurate and comprehensive. Verify the integrity of the actual Library of Congress MARC records you plan to use for testing.
>- Model Training Assessment: Check if the model has been trained effectively with the MARC 21 documentation. Evaluate whether key concepts and details are correctly structured and understood by the model.
>- Automated Testing: Develop automated test scripts that feed MARC records to the model and assess its responses for accuracy. Include a variety of records to cover different aspects of the MARC 21 format.
>- Manual Review: Conduct a manual review of the model's responses to ensure they align with the expected outputs based on the MARC records. Involve subject matter experts who are familiar with MARC 21 and library cataloging practices.
>- Performance Metrics: E...
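
The automated testing step could start as small as the sketch below: score each generated record by the fraction of reference fields it reproduces exactly. Everything here is an assumption for illustration. Records are simplified to one value per tag (real MARC fields can repeat), and `generate` is a hypothetical callable standing in for however the GPT is actually queried.

```python
def normalize(value):
    """Collapse whitespace so spacing differences do not count as errors."""
    return " ".join(value.split())


def field_accuracy(expected, actual):
    """Fraction of expected fields that the generated record matches exactly."""
    if not expected:
        return 0.0
    matches = sum(
        1
        for tag, value in expected.items()
        if normalize(actual.get(tag, "")) == normalize(value)
    )
    return matches / len(expected)


def evaluate(test_cases, generate):
    """Average field accuracy over (prompt, expected_record) pairs."""
    scores = [field_accuracy(expected, generate(prompt)) for prompt, expected in test_cases]
    return sum(scores) / len(scores) if scores else 0.0


reference = {
    "100": "1$aLiu, Cixin,$d1963-$eauthor.",
    "245": "10$aTo hold up the sky /$cCixin Liu ; translated by various.",
}
generated = {
    "100": "1$aLiu, Cixin.",
    "245": "10$aTo hold up the sky /$cCixin Liu ; translated by various.",
}
print(field_accuracy(reference, generated))
```

Exact matching is deliberately strict; a manual review pass would still be needed to judge near misses such as a missing `$d` date.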

First tests with ChatGPT's GPT Builder

Here is a comparison of the MARC21 records in a data table format, highlighting the differences and similarities between all three versions:

| Field | Original MARC Record | Image-Based MARC Record (2nd) | Revised Image-Based Record (Resembling 1st) |
| --- | --- | --- | --- |
| 100 | 1$aLiu, Cixin,$d1963-$eauthor. | 1$aLiu, Cixin. | 1$aLiu, Cixin,$d1963-$eauthor. |
| 240 | Not present | Not present | 10$a[Original title in Chinese].$lEnglish |
| 245 | 10$aTo hold up the sky /$cCixin Liu ; translated by various. | 10$aTo hold up the sky /$cCixin Liu. | 10$aTo hold up the sky /$cCixin Liu ; translated by various. |
| 260 | \$a[Place of publication not identified] :$bTor, $c2020. | \$bTor,$c[Date of publication not visible in the image]. | \$bTor,$c[Year not visible in the image]. |
| 300 | \$a[Number of pages] p. ;$c24 cm. | \$a[Number of pages not visible]$c[Dimensions not visible]. | \$a[Number of pages not visible] ;$c[Size not visible] cm. |
| 500 | \$aTranslation of: [Original title in Chinese].$a"Stories from the award-winning author of The Three-Body Problem and ...
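
A comparison like this one can also be computed rather than eyeballed. The sketch below groups versions by the value they give for each tag, so agreement and disagreement fall out directly. The function name `group_by_value` and the one-value-per-tag dicts are assumptions for illustration; the sample values are copied from the 100 and 240 rows of the comparison.

```python
def group_by_value(versions):
    """For each tag, map every distinct value to the versions that use it."""
    tags = sorted({tag for record in versions.values() for tag in record})
    report = {}
    for tag in tags:
        groups = {}
        for name, record in versions.items():
            value = record.get(tag, "Not present")
            groups.setdefault(value, []).append(name)
        report[tag] = groups
    return report


versions = {
    "original": {"100": "1$aLiu, Cixin,$d1963-$eauthor."},
    "image-based": {"100": "1$aLiu, Cixin."},
    "revised": {
        "100": "1$aLiu, Cixin,$d1963-$eauthor.",
        "240": "10$a[Original title in Chinese].$lEnglish",
    },
}

for tag, groups in group_by_value(versions).items():
    for value, names in groups.items():
        print(tag, ", ".join(names), "->", value)
```

A tag with a single group means all versions agree; more than one group shows exactly which versions diverge.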