I, Oliver Bonham-Carter 👋

Assistant Professor in Computer Science, Allegheny College

Infomaid: An AI and RAG-Enabled Learning Application

Date: 4 June 2024

Oliver Bonham-Carter

Email: obonhamcarter at allegheny.edu

MIT Licence

Overview

Infomaid is a simple, prompt-based AI solution with built-in Retrieval-Augmented Generation (RAG) support!

Welcome to this simple AI application! Infomaid is an experimental AI prompt-driven solution (i.e., each “chat” involves writing a separate prompt to use with a new execution of Infomaid) to help complete text-based work with information.

The software runs locally, so no information needs to be sent to another machine online for processing. This application requires Ollama to run. Parts of this project's code for working with PDFs were borrowed from Pixegami’s RAG tutorial (see Reference). Many thanks!

Prerequisites

Before you start, make sure that Ollama and Poetry are installed.

Set Up Local Models for Ollama

The commands below install the models that Ollama requires for this project. Note that a typical Ollama model is about 4 GB in size, so with two models to install, allow for about 8 GB of disk space.

ollama pull mistral
ollama pull nomic-embed-text

Setting Up the Project

We will use Poetry to manage the virtual environment for the project. Use install to download all the necessary packages for your project.

poetry install

Check that the software is working on your system.

poetry run infomaid --help

Use the online help as a reminder of how to use the parameters. Sample commands are provided that can be copied, pasted, and edited.

poetry run infomaid --bighelp

The following sample commands show some of the available parameters.

poetry run infomaid --count 1
poetry run infomaid --count 2 --prompt "name four shapes"
poetry run infomaid --resetdb
poetry run infomaid --usepdfdata --promptfile "promptFiles/myPrompt.txt"
poetry run infomaid --count 2 --usepdfdata --prompt "what is the article's main idea?"

Execution

Parameters

Generation

With Infomaid, users may ask the AI to prepare materials such as outlines, emails, and other types of text from prompts. A prompt may be entered at the command line, typed in after the program starts, or supplied as a text file. The text file may contain a large prompt with many details to consider. In addition, the text file may help to automate jobs where the prompt is created automatically by another task.
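As a rough illustration of the automation case, another task could write the prompt file and then assemble the command to run. The sketch below is only an assumption about how such a wrapper might look: the helper names `write_prompt_file` and `build_infomaid_command` are hypothetical and not part of Infomaid, and only the `--count`, `--usepdfdata`, and `--promptfile` parameters shown in this README are used.

```python
from pathlib import Path


def write_prompt_file(path: Path, task: str, details: list[str]) -> Path:
    """Write a multi-line prompt: the task first, then one detail per line."""
    lines = [task, ""] + [f"- {detail}" for detail in details]
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


def build_infomaid_command(
    prompt_file: Path, count: int = 1, use_pdf_data: bool = False
) -> list[str]:
    """Return the argument list for one Infomaid run driven by a prompt file."""
    command = ["poetry", "run", "infomaid", "--count", str(count)]
    if use_pdf_data:
        command.append("--usepdfdata")  # enable the RAG feature
    command += ["--promptfile", str(prompt_file)]
    return command
```

The returned list can be handed to `subprocess.run(command, check=True)` from the project directory. Building the command as a list avoids shell-quoting problems when the prompt file path contains spaces.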

Output

All results are placed in the 0_out/ directory. The output files are named according to the Ollama model that was used to create their content. A hidden file called .mistral_currentStoryIndex.txt is created automatically to keep track of the file indexing system. If this file is removed, then Infomaid may overwrite the existing files. Before starting a new job, it is recommended that 0_out/ be removed, or moved somewhere outside of the project directory, so that output stays organized by prompt.
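Moving 0_out/ aside between jobs can be scripted. The following is a minimal sketch, assuming the command is run against the project directory; the function name `archive_output`, the archive location, and the timestamped naming scheme are all illustrative choices rather than anything Infomaid provides. It only moves the 0_out/ directory and does not touch the hidden index file.

```python
import shutil
from datetime import datetime
from pathlib import Path
from typing import Optional


def archive_output(project_dir: Path, archive_root: Path, label: str) -> Optional[Path]:
    """Move project_dir/0_out to archive_root/<label>_<timestamp>.

    Returns the new location, or None when there is no 0_out/ to move.
    """
    out_dir = project_dir / "0_out"
    if not out_dir.is_dir():
        return None  # nothing has been generated yet
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = archive_root / f"{label}_{stamp}"
    if target.exists():
        raise FileExistsError(f"archive target already exists: {target}")
    archive_root.mkdir(parents=True, exist_ok=True)
    shutil.move(str(out_dir), str(target))
    return target
```

For example, `archive_output(Path("."), Path.home() / "infomaid_archive", "mit_letter")` would keep the results of each prompt in its own folder outside the project.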

Working with PDF Data

Infomaid also allows the user to interact with PDF documents to search for ideas that are contained (somewhere) in the documents. Retrieval-Augmented Generation (RAG) is a natural language processing (NLP) technique that retrieves relevant information from documents and supplies it to a generative artificial intelligence (AI) model, so that the generated text is informed by those documents.
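To make the idea concrete, here is a toy sketch of the retrieve-then-generate pattern. It is not Infomaid's implementation: it swaps the embedding model and database for a simple word-count cosine similarity so that it runs with the standard library alone, and all function names are hypothetical. It shows only how the most relevant text chunks are selected and placed in the prompt that would be sent to the model.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lower-case the text and count its alphanumeric words."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question (retrieval step)."""
    query = tokenize(question)
    ranked = sorted(chunks, key=lambda c: cosine(query, tokenize(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, chunks: list[str], k: int = 2) -> str:
    """Combine the retrieved chunks with the question (augmentation step)."""
    context = "\n---\n".join(retrieve(question, chunks, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

The string returned by `build_prompt` is what a generative model would receive. A real system replaces the word counts with learned embeddings, which match on meaning rather than on exact words.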

Sample Project

For instance, imagine that the user wishes to create a draft of a recommendation letter for someone (e.g., a student) who has supplied a current curriculum vitae (CV) as a PDF document. Using Infomaid, a draft of the letter may be written that has been informed by the CV. To use this RAG functionality, the program must be run with the command line parameter --usepdfdata. See the project’s online help for a sample bash command line script to engage the RAG feature.

A prompt for such a task would be the following:

Write a letter of recommendation for MIT graduate school for AstroBill. Use the details from the data to complete the draft.

To set up the project, the PDF of the CV must first be copied into the data/ directory of the project. It is important to note that other non-related PDF documents ought to be removed from this directory to prevent interference with the letter-writing task. The following command is necessary to update the working dataset involving PDFs.

poetry run infomaid --resetdb

The output below confirms that the database has been updated with the new PDF-derived information.

Output:

Resetting database: {resetDB}
Clearing Database
Number of existing documents in DB: 0
Adding new documents: 113

Next, the prompt may be introduced with the following command. Note, this command will return three potential letters that may differ in quality.

poetry run infomaid --count 3 --usepdfdata --prompt "Write a letter of recommendation for MIT graduate school for AstroBill. Use the details from the data to complete the draft."

Same command using the --promptfile FILE.TXT parameter

poetry run infomaid --count 3 --usepdfdata --promptfile promptFiles/mit.txt

Command Output:

Code prompt: Write a letter of recommendation for MIT graduate school for AstroBill. Use the details from the data to complete the draft.
Model: mistral
Number of stories to create: 3

The results are Markdown files that will appear in the 0_out/ directory.

Dear Admissions Committee,

I am writing this letter in the highest regard and with great enthusiasm to recommend a remarkable individual, AstroBill, for your esteemed graduate program at MIT. AstroBill, also known as Bill on planet Zirconia, is a multifaceted genius with an insatiable thirst for knowledge and innovation that sets him apart from his peers.

...

In summary, AstroBill's unique blend of intellect, creativity, adaptability, determination, and compassion make him an outstanding candidate for MIT's graduate program. His diverse skills and experiences will enrich any academic or professional environment, and I have no doubt that he will continue to excel in your esteemed institution.

(Or whatever!! Now the letter can be edited to add a human-touch and extra value.)

Testing the Code

The code may be tested to determine functionality. At present, there are two tests: (1) general execution and (2) a check of whether the querying system is working for PDF data.

To run the tests using pytest, which is already installed with the project, use the commands below.

poetry install # initialize project
poetry run infomaid --resetdb --usepdfdata # populate pdf db
poetry run pytest # run tests with the pdf database.

Ethical Note

While using AI to prepare drafts of letters and other communications is convenient, it is important for a human to preside over the generated textual (or graphical) work. AI systems excel at processing vast amounts of data and executing tasks with remarkable efficiency, but they lack the nuanced understanding and ethical judgment inherent to human cognition.

Involving ethics in decisions where machines have made the choices (as strange as that may seem) is essential in domains involving communication. Human oversight ensures that communications, whether they involve customer interactions, inter-office correspondence, or public statements, adhere to ethical standards, tone, and context sensitivity. In addition, decisions influenced by AI algorithms must be subjected to human judgment before implementation. Human evaluators can consider broader implications, ethical ramifications, and potential biases that AI systems might overlook. This “human-touch” can therefore help to safeguard against the potential and unintended consequences which may occur at the intersection of data and decision-making, to name one such area.

With this in mind, please use Infomaid responsibly. The project serves educational purposes: it is meant to instruct on the uses of AI, to allow for discovery, and to entertain (in a way!).


A Work In Progress

Check back often to see the evolution of the project!! Infomaid is a work-in-progress. Updates will come periodically.

If you would like to contribute to this project, then please do! For instance, if you see some low-hanging fruit or task that you could easily complete, that could add value to the project, then I would love to have your insight.

Otherwise, please create an Issue for bugs or errors. Since I am a teaching faculty member at Allegheny College, I may not have the time to fix bugs quickly. I welcome the open-source community to further the development of this project. Many thanks in advance.

If you appreciate this project, please consider clicking the project’s Star button. :-)