
Switching to Local LLM Setup

This project isn't over, but I am looking at what it would take to set up the same "multi-modal insta-MARC" functionality with an offline setup. To me this seems like the more interesting use case. I think I've already proved that yes, adding the MARC21 rulebook in a RAG setup can produce better results, but what's the point if your chatbot is basically doing a Bing search with ISBNs? Copy-cataloging has been around for a long time (and perhaps catalogers would be better suited to creating custom GPTs based on their own local practices or integrating with a tool like MarcEdit). I'm also not seeing any librarians looking seriously at local LLMs as an option...c'mon, Chat with RTX is right there! Mixtral! Gemma! Don't put all of your eggs in one basket!
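To make this concrete, here's roughly what the offline version could look like: a local model served by something like Ollama, queried over its HTTP API. This is a minimal sketch, assuming Ollama is running on its default port with Mixtral already pulled; the prompt is just a placeholder.

```python
import requests

# Minimal sketch: query a locally served model (Mixtral via Ollama)
# instead of a hosted chatbot. Assumes `ollama serve` is running and
# `ollama pull mixtral` has been done; the prompt is a placeholder.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "mixtral",
        "prompt": "Draft a MARC21 245 field for the title 'Example Title' by Jane Doe.",
        "stream": False,
    },
    timeout=120,
)
print(resp.json()["response"])
```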

The 5 Laws of Library Science but for Generative AI

For reference, here's a wiki about the original 5 laws proposed by S. R. Ranganathan in 1931. And now the revision:

1. Adaptive Purpose: AI should dynamically serve specific user needs.
2. Inclusive Personalization: AI must cater to diverse users respectfully.
3. Ethical Insight: AI responses should be accurate, context-aware, and ethically informed.
4. Efficient Utility: AI must provide quick and clear answers to maximize user time.
5. Continual Growth: AI should evolve continuously, learning from interactions and data.

So I had a convo w/ GPT-4 about how to revise these and make them more concise. Weird how they almost sound like custom instructions, no? We basically have #1-4 already in place, and #5 is very close to being real w/ things like MemGPT already out there. (I haven't read much about how memory management works w/ Generative AI, but my guess is that we'll quickly need better garbage collection...and certain apps will probably need to avoid long-term memory entirely)
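For what it's worth, the "garbage collection" idea doesn't have to be exotic. Here's a toy sketch (not how MemGPT actually works): keep a rolling window of recent turns and let older ones fall off, where a real system would summarize or page them out instead of dropping them.

```python
from collections import deque

class ChatMemory:
    """Toy rolling memory: keep the last N turns, evict the rest.
    A real system (e.g. MemGPT) would summarize or page out old
    turns rather than simply discarding them."""

    def __init__(self, max_turns: int = 20):
        self.turns = deque(maxlen=max_turns)  # oldest turns fall off automatically

    def add(self, role: str, text: str) -> None:
        self.turns.append({"role": role, "content": text})

    def context(self) -> list:
        # What gets sent back to the model on the next request.
        return list(self.turns)

memory = ChatMemory(max_turns=3)
for i in range(5):
    memory.add("user", f"message {i}")
print(memory.context())  # only messages 2-4 survive
```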

Book Publishing as a "Synthetic Data" Source (with LoC as Source of Truth)

"Your idea of using Large Language Models (LLMs) to catalog new books and then cross-referencing this with actual cataloging done by the Library of Congress is quite innovative and could have several benefits in the realm of data management and library science. Let’s explore this idea in more detail: Automated Cataloging with LLMs : LLMs can process and understand large volumes of text efficiently. They could be trained to read new books and extract key information such as titles, authors, themes, genres, and other metadata. This could significantly speed up the cataloging process. Quality and Consistency Control : By cross-referencing with the Library of Congress catalog, which is a standard in bibliographic data, you can ensure that the data generated by the LLM is accurate and consistent with established cataloging practices. Handling Large Volumes of Books: With the sheer volume of books published every year, LLMs can provide a scalable solution to catalog new books quickly a...

Original Instructions & Revision

Old Version: MARCmin operates in three key steps for MARC21 cataloging from book images.

- Step 1: Users upload a book cover image.
- Step 2: MARCmin analyzes the image using OCR to extract text and identifies key elements like title, author, and ISBN. It then uses predictive modeling to infer additional information about the book, such as publication details.
- Step 3: MARCmin outputs the gathered information in a MARC21 record format, presented as a code snippet.

This process provides a concise, easy-to-follow approach for users, allowing for efficient cataloging with the flexibility to add or correct information as needed. MARCmin also cross-references data with external databases to verify details and complete missing fields, ensuring accuracy and compliance with cataloging standards.

New Version: Use MARC21 rules in combined_bibliographic.md to create the best MARC records on the internet. Use an iterative process of building a MARC record, illustrate each step with an emoji. The final...
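For context, the "code snippet" output in Step 3 is just a plain MARC21 record. Here's a hedged sketch of what producing one programmatically might look like, using pymarc (the field values are made up, and the Subfield/indicators style shown is the pymarc 5.x API, which differs from 4.x):

```python
from pymarc import Record, Field, Subfield  # pymarc 5.x API

# Build a minimal record from hypothetical OCR'd cover data.
record = Record()
record.add_field(
    Field(tag="020", indicators=[" ", " "],
          subfields=[Subfield(code="a", value="9780000000000")]),
    Field(tag="245", indicators=["1", "0"],
          subfields=[Subfield(code="a", value="Example title /"),
                     Subfield(code="c", value="Jane Doe.")]),
)
print(record)  # renders as a readable text snippet of the MARC record
```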

Privacy Policy

MARCmin is a ChatGPT experiment by Chris Markman. Your submissions, when available, will be reviewed for quality control.

MARC21 website as markdown files

Had Code Interpreter convert https://www.loc.gov/marc/bibliographic/ to markdown in the hopes of speeding up retrieval. That turns ~9MB of HTML into 0.5MB of glorious, glorious MD text.
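The conversion itself is nothing fancy. A minimal sketch of the same idea with requests + html2text (the field page URL below is one real example; Code Interpreter presumably did something equivalent over the whole site):

```python
import requests
import html2text

converter = html2text.HTML2Text()
converter.ignore_images = True
converter.body_width = 0  # don't hard-wrap the output

# One field page as an example; the full job would walk every page
# under https://www.loc.gov/marc/bibliographic/
html = requests.get("https://www.loc.gov/marc/bibliographic/bd245.html", timeout=30).text

with open("bd245.md", "w", encoding="utf-8") as f:
    f.write(converter.handle(html))
```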

> What I mean is how could I improve the storage of MARC rules so that an LLM could work with that data better?

Ah, I see. My apologies for misunderstanding your question. You're asking how the existing MARC standard could be adapted to make it easier for large language models (LLMs) like myself to work with and process the data. Here are some potential improvements with that goal in mind:

- Provide a JSON/YAML version of the MARC specification. LLMs process these formats more easily than markup or rigid textual definitions.
- Release a validated MARC dataset in a simple text format, with one record per line. This allows easy ingestion and training.
- Develop MARC parsing libraries in common ML programming languages like Python and R. Pre-process data into consistent dicts or dataframes.
- Document the semantics and relationships between fields/subfields clearly and explicitly, and provide libraries to handle validation and normalization.
- Include examples of how MARC maps to other common metadata standards like Dublin Core, schema.org, etc. Reference mappings help LLMs align concepts.
- Provide a MARC API and ser...
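The "one record per line" suggestion is already easy to try, since pymarc supports MARC-in-JSON out of the box. A minimal sketch, assuming you have a binary MARC export handy (records.mrc is a placeholder filename):

```python
from pymarc import MARCReader

# Dump a binary MARC file as MARC-in-JSON, one record per line:
# the kind of flat text format an LLM pipeline could ingest directly.
with open("records.mrc", "rb") as f:
    for record in MARCReader(f):
        print(record.as_json())
```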