Building Your Own Formulation AI Assistant – A Privacy-First Approach for Formulators (Or: How to Make Friends with a Computer Without the Hassle)


By Philip Riachy

Look, I get it. You became a formulator because you love chemistry, not computers. The periodic table makes sense. Your computer’s error messages do not. You can explain micelle formation in your sleep, but the phrase “cloud computing” makes you irrationally angry because clouds are made of water droplets, not data.

But here’s the thing: artificial intelligence tools have gotten so good at helping with formulation work that we can’t ignore them anymore. And before you click away muttering about privacy concerns and subscription fees, let me tell you about a solution that keeps your proprietary formulations locked safely on your own computer while costing roughly the same as a fancy coffee maker. No cloud servers, no monthly fees, no sending your trade secrets through someone else’s computers.

This is about running your own AI assistant locally—meaning on your actual, physical computer that you can unplug and throw in a lake if you want to. (Don’t do that. But you could.)

Why You Should Care (Even If You Don’t Want To)

Remember when you used ChatGPT or Claude and thought, “This is actually helpful for brainstorming formulation ideas”? And then immediately thought, “But I’m not pasting my actual formulas into this thing because who knows where that data goes”? That’s the entire problem.

Commercial AI services are like chatty lab assistants who happen to tell everything to their 50 closest friends. Sure, they’re helpful, but you can’t trust them with your secrets. A local AI is like having a brilliant but antisocial lab assistant who lives in your basement and never talks to anyone else. Weird? Maybe. Private? Absolutely.

Plus, there’s the money thing. Cloud AI services keep charging you for as long as you keep asking questions, whether through monthly subscriptions or per-query fees, like a very expensive Magic 8-Ball with a billing department. Once you’ve paid for your local setup (which we’ll keep reasonable, I promise), you can ask it ten thousand questions and the only extra cost is the electricity. Your power bill goes up by roughly the same amount as leaving a light bulb on. That’s it.

The Technology Stack (Don’t Panic, I’ll Explain)

Here’s what you need, explained in chemist terms:

The LLM (Large Language Model): This is your AI brain. Think of it as your reaction mechanism—it’s what actually does the thinking work. Just like you wouldn’t use the same conditions for making an emulsion as you would for saponification, different models have different capabilities and requirements.

AnythingLLM (The Platform): This is your lab bench—the interface where you actually do the work. It’s the thing that makes the scary computer stuff look friendly and lets you have normal conversations with your AI. Someone already built this for you. You just download it and click buttons. This is important because I know you’d rather titrate acid all day than write code.

Your Knowledge Base: This is your collection of everything you know—your formulation notes, ingredient data sheets, that PDF about emulsion stability you downloaded three years ago and swore you’d read. The AI reads all of this and becomes your personal expert system. It’s like having a lab partner who memorized every formulation you’ve ever made and can instantly recall them when needed.

Picking Your Model (Or: Size Matters, But Not How You Think)

Here’s where we talk about computer memory (RAM), which is different from hard drive space, because of course it is, because computers love to be confusing.

RAM is like your lab bench workspace. A bigger bench lets you work on more complex experiments simultaneously. In computer terms, more RAM lets you run smarter AI models.

If you have 8-16GB of RAM (most normal laptops): You can run Llama 3.2 3B. The “3B” means 3 billion parameters, which sounds impressive until you realize the bigger models have 70 billion. Think of this as your summer intern—enthusiastic, sometimes helpful, occasionally says things that make you wonder if they were paying attention in chemistry class. It can handle basic questions about ingredients and simple formulation suggestions.

If you have 16-32GB of RAM (decent desktop or newer laptop): You can run Llama 3.1 8B or Mistral 7B. This is your competent lab technician—reliable, understands context, rarely says anything stupid. It can handle complex formulation discussions and actually understands when you’re talking about stability issues versus rheology problems.

If you have 32GB+ of RAM (ideally 64GB or more) or a serious GPU (you lucky person): You can run heavily quantized versions of Llama 3.1 70B or Mixtral 8x22B. This is your PhD-level colleague who remembers everything and can discuss seventeen different aspects of a formulation simultaneously. It’s almost scary how good these are.

Now, a critical note: these are “quantized” models, which is a fancy way of saying they’ve been compressed to fit on normal computers. It’s like when you reduce a solution to concentrate it—you lose a tiny bit in the process, but what remains is still highly functional. The loss in accuracy is minimal and totally acceptable for formulation work where you’re using the AI as a thinking partner, not as the final authority on whether your formula will work.
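
If you like to see the numbers, the arithmetic is simple: a model’s size is roughly (number of parameters) × (bits per parameter) ÷ 8, which gives bytes. At full 16-bit precision every parameter costs 2 bytes; a typical 4-bit quantized version costs about half a byte. Here’s a back-of-the-envelope sketch in Python; the figures are estimates and ignore the extra memory the conversation itself uses, so real usage runs a little higher:

```python
def model_size_gb(params_billions: float, bits_per_param: float) -> float:
    """Rough memory footprint: parameters x bits per parameter, in gigabytes.
    Ignores the conversation context, so real usage is somewhat higher."""
    return params_billions * 1e9 * bits_per_param / 8 / 1e9

for name, params in [("Llama 3.2 3B", 3), ("Llama 3.1 8B", 8), ("Llama 3.1 70B", 70)]:
    full = model_size_gb(params, 16)    # original 16-bit precision
    quant = model_size_gb(params, 4.5)  # ~4-bit quantization, plus a little overhead
    print(f"{name}: ~{full:.0f} GB at 16-bit, ~{quant:.1f} GB quantized")
```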

The Computer Stuff (I’ll Hold Your Hand)

The main limitation is RAM, not processing power. Your computer’s CPU (the chip that does calculations) can be relatively old and slow—it doesn’t matter much. But RAM? That’s critical. The entire AI model has to fit in memory, or everything grinds to a near-halt (if it runs at all).

Quick optimization tricks for when your computer is wheezing:

  1. Close everything else. And I mean everything. Your seventeen open browser tabs about surfactant chemistry? Close them. That PDF reader? Close it. That Excel spreadsheet you’ve been meaning to update since 2019? Definitely close it. The AI needs room to breathe.
  2. Reduce the “context window”—this is how much conversation history the AI remembers. A smaller window uses less memory. For formulation work, this is usually fine because each question tends to be independent anyway. You’re asking “What’s a good alternative to cetyl alcohol?” not “Remember that thing we discussed three hours ago about that thing?”
  3. If you have a gaming graphics card, you can use it to help run the AI. This is called “GPU acceleration” and it’s like discovering you can use a hotplate and a stirrer simultaneously—suddenly everything goes faster and you free up resources.
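
A peek under the hood for the curious (feel free to skip this): the model runner we’ll install in a minute, Ollama, quietly runs a small service on your own machine at localhost:11434, and both the context window and GPU offloading are just options on a request to it. This is a minimal sketch using only Python’s standard library; the model name and option values are examples, not recommendations, and nothing here leaves your computer:

```python
import json
import urllib.request

# Talk to the local Ollama service directly (it never touches the internet).
payload = {
    "model": "llama3.1:8b",   # whichever model you end up downloading
    "prompt": "Suggest a mild co-surfactant to pair with SLES in a facial cleanser.",
    "stream": False,
    "options": {
        "num_ctx": 2048,      # smaller context window = less RAM used
        "num_gpu": 20,        # model layers to offload to a GPU, if you have one
    },
}
req = urllib.request.Request("http://localhost:11434/api/generate",
                             data=json.dumps(payload).encode(),
                             headers={"Content-Type": "application/json"})
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```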

To check your RAM on Windows: Press Ctrl+Shift+Esc, click “Performance,” click “Memory.” That number at the top right? That’s what you have to work with.

On Mac: Click the Apple icon, choose “About This Mac,” look at Memory. See? You’re doing computer stuff already.
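
Or, if you’d rather make Python do the looking, the third-party psutil package reports the same numbers (install it first by running pip install psutil). The suggestions below just restate the rules of thumb from the model-picking section; they’re rough guidance, not official requirements:

```python
# pip install psutil   (third-party library that reads system information)
import psutil

mem = psutil.virtual_memory()
total_gb = mem.total / 1024**3
free_gb = mem.available / 1024**3
print(f"Installed RAM: {total_gb:.1f} GB ({free_gb:.1f} GB currently free)")

# Rough rules of thumb from the model-picking section above.
if total_gb >= 32:
    print("Plenty of room: a quantized Llama 3.1 70B class model may be within reach.")
elif total_gb >= 16:
    print("Comfortable with a 7-8B model such as Llama 3.1 8B or Mistral 7B.")
else:
    print("Stick with a small model such as Llama 3.2 3B.")
```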

Building Your Knowledge Base (The Actual Fun Part)

This is where the magic happens, and it’s basically just digital hoarding but productive.

Gather every formulation-related document you have:

  • Your formulation notebooks (digitize those hand-written ones if needed)
  • Ingredient data sheets from suppliers
  • That book about emulsion science that you definitely read cover to cover (okay, maybe just chapters 3 and 7)
  • FDA guidelines you’re supposed to follow
  • Your “failed formulation” notes (these are often more educational than successes)
  • Research papers you’ve collected
  • Regulatory documents

The AI will read ALL of this and index it. Then, when you ask a question, it searches through everything and pulls out relevant information. It’s like having a research assistant who never gets tired and doesn’t judge you for asking the same question about HLB values for the fifth time this month.

Pro tip: The AI can’t read your mind, only your documents. If you have a brilliant formulation but never wrote down WHY you chose those specific ingredients, the AI won’t know either. Include your reasoning in your notes. Your future self (and your AI) will thank you.

Another pro tip: PDFs that are basically just images with no actual text are useless to the AI. If you have scanned documents, run them through OCR (Optical Character Recognition) software first. Many free options exist. Google it. Or ask your new AI to explain how to do it once you get it running. (Meta, I know.)
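
If you’d like to do that OCR step yourself in Python, one free route is the pytesseract and pdf2image packages (which in turn need the Tesseract OCR engine and Poppler installed on your computer). A minimal sketch, with example file names; dedicated no-code tools exist too if you’d rather not touch a script:

```python
# pip install pytesseract pdf2image
# Also requires the Tesseract OCR engine and Poppler installed on your system.
import pytesseract
from pdf2image import convert_from_path

def pdf_to_text(pdf_path: str, txt_path: str) -> None:
    """Render each scanned page to an image, OCR it, and save plain text."""
    pages = convert_from_path(pdf_path)   # one PIL image per PDF page
    text = "\n\n".join(pytesseract.image_to_string(page) for page in pages)
    with open(txt_path, "w", encoding="utf-8") as f:
        f.write(text)

# Example file names -- point it at your own scanned notebook instead.
pdf_to_text("scanned_formulation_notes.pdf", "scanned_formulation_notes.txt")
```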

Installation (Easier Than Making an Emulsion, I Promise)

  1. Go to the AnythingLLM website
  2. Download the installer for your operating system (Windows, Mac, or Linux)
  3. Double-click the installer
  4. Click “Next” a bunch of times (reading the agreements is optional, we both know you won’t)
  5. That’s it. No code. No command line. No tears.

When you first open AnythingLLM, it’ll ask you to choose your LLM provider. Select “Ollama” for local models. Then pick which model to download based on your RAM situation from earlier.

The download will take a while. A 7B model is about 4-8GB, so go make coffee, check your pH calibration, or contemplate the meaning of “cosmetically elegant.”
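
Once the download finishes, you can double-check that the model really is sitting on your machine: the same local Ollama service mentioned earlier (localhost:11434) will list every model it has stored. A quick sketch using only Python’s standard library, and nothing goes online:

```python
import json
import urllib.request

# Ask the local Ollama service which models it has downloaded.
with urllib.request.urlopen("http://localhost:11434/api/tags") as resp:
    models = json.loads(resp.read())["models"]

for m in models:
    print(f"{m['name']}  (~{m['size'] / 1e9:.1f} GB on disk)")
```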

Once it’s downloaded, you’ll see some settings:

  • Similarity threshold (set to 0.7): How closely documents must match your question. Lower means it casts a wider net.
  • Top-k (set to 5): How many relevant chunks to retrieve per question.

These are like your temperature and mixing speed settings—you can adjust them, but the defaults work fine for most purposes.
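
For the curious, here’s what those two settings are doing behind the scenes. Every chunk of your documents gets converted into a list of numbers (an “embedding”), your question gets the same treatment, and the system keeps only the chunks whose similarity to the question clears the threshold, then hands the top-k best matches to the model. This toy sketch uses made-up three-number embeddings (real ones have hundreds of dimensions, and AnythingLLM’s exact machinery may differ), but the threshold and top-k logic is the idea:

```python
import math

def cosine_similarity(a, b):
    """How closely two vectors point the same way (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" -- real ones have hundreds of dimensions.
chunks = {
    "ING_cetyl_alcohol.txt": [0.9, 0.2, 0.1],
    "FORM_night_cream_v3.txt": [0.7, 0.5, 0.2],
    "REG_sunscreen_monograph.txt": [0.1, 0.1, 0.9],
}
question = [0.8, 0.4, 0.1]   # pretend embedding of "alternatives to cetyl alcohol?"

threshold, top_k = 0.7, 5    # the two settings described above
scored = [(cosine_similarity(question, vec), name) for name, vec in chunks.items()]
relevant = sorted([s for s in scored if s[0] >= threshold], reverse=True)[:top_k]
for score, name in relevant:
    print(f"{score:.2f}  {name}")
```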

Adding Your Documents (Digital Filing, But It Actually Matters)

Create a workspace called something like “Formulation Lab” or “My Precious Formulas” or whatever makes you happy.

Click the upload button. Select your documents. Wait while it processes them (it’s reading everything and creating mathematical representations of the content—don’t worry about how, just know it works).

Organize as you go:

  • Regulatory docs: prefix with REG_
  • Ingredient data: prefix with ING_
  • Your formulations: prefix with FORM_
  • Literature: prefix with LIT_

Why? Because six months from now when you have 500 documents uploaded, you’ll want some system. Trust me on this. I’m speaking from experience, which is just a polite term for “mistakes I made so you don’t have to.”
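
And if you already have a folder full of unprefixed files, a few lines of Python can retrofit the scheme. A sketch, assuming you’ve already sorted the files into folders by category; the folder names and the PDF-only pattern are just examples to adjust:

```python
from pathlib import Path

# Map example folders to the prefixes described above (adjust to your setup).
prefixes = {
    "regulatory": "REG_",
    "ingredients": "ING_",
    "formulations": "FORM_",
    "literature": "LIT_",
}

for folder, prefix in prefixes.items():
    for path in Path(folder).glob("*.pdf"):
        if not path.name.startswith(prefix):          # don't double-prefix
            path.rename(path.with_name(prefix + path.name))
            print(f"Renamed: {path.name} -> {prefix}{path.name}")
```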

What You Can Actually Do With This Thing

Formulation review: Paste in a formula and ask “What stability issues might this have?” or “Can I substitute ingredient X with ingredient Y here?” The AI will reference your uploaded ingredient data and formulation history to give informed suggestions.

Regulatory checking: Upload FDA monographs or EU regulations, then ask “Does this formulation comply with OTC sunscreen requirements?” It’s not a lawyer, but it’s great for first-pass screening before you pay actual lawyers to review things.

Historical search: Instead of digging through notebooks, ask “Show me all formulations with niacinamide above 3% that were stable at 45°C.” The AI searches everything instantly. It’s like having perfect recall of every formula you’ve ever made.

Learning partner: Ask it to explain concepts from your uploaded textbooks. “Explain polymeric emulsifiers in simple terms” or “What’s the difference between HLB and PIT in emulsion selection?” It’ll pull from your documents and explain.

Ingredient substitution: “What can I use instead of carbomer in a gel formulation?” It’ll search your ingredient database and previous formulations to suggest alternatives you actually have access to.

Important Reality Check (Managing Expectations)

Your local AI is smart, but it’s not omniscient. Think of it as a very well-read lab assistant, not a magical oracle.

It can’t really do math. Oh, it’ll try. It’ll give you numbers that seem reasonable. But small models especially are terrible at calculations. If it tells you a percentage, verify it yourself. Always. The AI is like that colleague who’s brilliant at theory but can’t use a calculator to save their life.

It doesn’t understand chemistry. Shocking, I know, given how helpful it is. But the AI is pattern-matching text, not actually comprehending molecular interactions. It might suggest adding an acid and a base that would just neutralize each other because it’s seen both ingredients in similar formulations separately. Your chemistry knowledge remains essential.

It hallucinates. No, not like your pH meter after you dropped it. AI hallucination means it makes stuff up when it doesn’t know something. It’ll confidently cite papers that don’t exist or give you “facts” it invented. Always ask for sources, and verify anything important.

It needs updates. Your knowledge base isn’t self-updating. As you learn new things, you need to add new documents. An AI trained on your 2020 knowledge won’t know about ingredients or techniques you discovered in 2024.

Keeping Your AI Useful

Once a month, update your knowledge base:

  • Add new formulation records
  • Upload new supplier data sheets
  • Include new research papers
  • Remove obsolete information

Think of it like maintaining your lab equipment—regular upkeep keeps everything working well.

If the AI starts giving weird answers, check your uploaded documents. Maybe there’s contradictory information confusing it. Or maybe you accidentally uploaded your grocery list instead of a formulation. These things happen. I’m not saying they happened to me, but I’m not NOT saying that either.

How to Talk to Your AI (Prompt Engineering, But Less Scary)

The way you ask questions matters. Compare:

Bad: “moisturizer”

Good: “I’m formulating an O/W moisturizer for dry skin with 5% glycerin. Based on my uploaded formulations, which emulsifier systems showed good stability with this glycerin level?”

See the difference? Context is everything. The second version tells the AI what you’re trying to do, what constraints you have, and where to look for information.

Use a system prompt (a standing instruction for all conversations): “You are a formulation chemist assistant. Always cite specific documents when making recommendations. If information isn’t in the uploaded documents, say so clearly. Remind me to verify all calculations independently because you’re terrible at math.”

Break complex questions into steps:

  • First: “What actives are effective for anti-aging?”
  • Second: “Which of these are stable in emulsions at pH 6?”
  • Third: “What’s a good emulsifier system for a formula with these actives?”
  • Fourth: “What preservation system would work here?”

Sequential questions get better answers than one giant question.
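
In AnythingLLM the system prompt is something you set once in the workspace settings and then forget about, and the step-by-step questions are just you typing into the chat. But if you ever want to see the same pattern in raw form, here’s a sketch that talks directly to the local Ollama model: one standing system instruction, then the questions asked in order so each answer gets the earlier ones as context. Note that this bypasses your uploaded documents entirely, so it’s the bare model talking:

```python
import json
import urllib.request

URL = "http://localhost:11434/api/chat"     # Ollama's local chat endpoint

messages = [{
    "role": "system",
    "content": ("You are a formulation chemist assistant. Cite sources, admit "
                "when you don't know, and remind me to verify calculations."),
}]

questions = [
    "What actives are commonly used for anti-aging?",
    "Which of these are stable in emulsions at pH 6?",
    "Suggest an emulsifier system compatible with those actives.",
]

for q in questions:
    messages.append({"role": "user", "content": q})
    payload = {"model": "llama3.1:8b", "messages": messages, "stream": False}
    req = urllib.request.Request(URL, data=json.dumps(payload).encode(),
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        answer = json.loads(resp.read())["message"]["content"]
    messages.append({"role": "assistant", "content": answer})   # keep the thread
    print(f"\nQ: {q}\nA: {answer[:300]}...")
```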

The Privacy Thing (Why This Actually Matters)

Running locally means your formulas never leave your computer. But take extra precautions:

  • Encrypt your drive: If someone steals your laptop, they shouldn’t get your formulations too.
  • Back up your data: The AnythingLLM data folder contains everything. Back it up like you back up your lab notebooks. (You DO back up your lab notebooks, right?)
  • Don’t upload what you can’t store: Some client contracts prohibit digital storage of formulations. Read those contracts. The AI is great, but litigation is not.

If you share your computer, create separate user accounts. You don’t want someone accidentally accessing your proprietary formulations because they borrowed your laptop.

The Money Talk

Cloud AI services: $20-50/month for casual use, $200+/month for heavy use. That’s $240-$2,400 per year.

Local AI: $200-2,000 upfront for hardware (maybe—you might already have adequate hardware), then about $3-5/month in electricity.

Break-even happens somewhere between 6-18 months, after which you’re basically using it free. Plus, unlimited queries. Want to ask 500 questions on a Sunday while you’re reformulating everything? Go for it. No usage caps, no throttling, no surprise bills.
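
As a worked example: a $500 RAM upgrade that replaces a $40/month subscription, while adding about $4/month to your power bill, pays for itself in roughly $500 / ($40 - $4), or about 14 months. A heavy user replacing a $200/month plan with a $2,000 machine breaks even in about 10 months.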

What’s Coming (The Future Is Weird)

The AI field moves fast. Keep an eye on chemistry-specific models as they develop. Eventually we’ll have AIs specifically trained on formulation science, and they’ll be even better at this.

AnythingLLM keeps adding features. Future versions might connect directly to inventory systems or analytical equipment. Imagine asking “Do I have enough Cetearyl Alcohol for this batch?” and getting an answer from your actual inventory database. We’re not there yet, but we’re heading that direction.

The Bottom Line

Look, I know this seems like a lot. You’re a chemist, not a computer person. But here’s the thing: this technology is user-friendly enough now that if you can operate a pH meter, you can operate this AI system.

Is it perfect? No. Will it occasionally say something dumb? Yes. Is it still incredibly useful for literature review, formulation brainstorming, regulatory research, and finding that one formula you made two years ago that definitely contained ceteareth-20 but you can’t remember which notebook it’s in? Absolutely yes.

The combination of privacy, cost-effectiveness, and capability makes local AI worth the initial learning curve. Plus, once you get it running, you’ll feel unreasonably proud of yourself for doing a computer thing. Embrace that feeling. You earned it.

And who knows? Maybe computers aren’t so bad after all. They’re basically just very stupid machines that follow instructions extremely quickly—kind of like how saponification is basically just a very simple reaction that happens to make something useful.

Now go forth and build your AI assistant. And when it works, resist the urge to tell it about your feelings. It’s a computer, not a therapist. Though honestly, given some of the stability issues I’ve dealt with, sometimes a therapist isn’t a bad idea.

Disclaimer: I am not affiliated with AnythingLLM, Ollama, Meta (creators of Llama), Mistral AI, or any other companies or products mentioned in this article. These are just tools I think work well for this purpose. Use your own judgment. If you break something, that’s on you. I’m just a chemist who happens to like computers slightly more than average.