Using AI - Costs and practicalities.

sjb007 · July 3

Hi all,
I see a number of people are using AI to contribute to Bonsai. I've just been playing a bit with GitHub's Copilot, and have been surprised at how effective it is. I have two questions after my brief experimentation that maybe the heavier users could answer (I'm looking at you @Gorgious and @theoryshaw):
1. Are you funding your usage? If so, what sort of costs/month are you hitting? (I'm just testing with the free monthly allowance of 1500 credits, but that isn't going to go very far.)
2. It lets you generate quite a lot of code very quickly. How much review are you doing, and do you have a good grasp of that code before submitting?

I've created one fairly trivial PR that I merged because it was easy to understand. I'm more hesitant with this one because while I tested it and got the AI to correct issues, there are quite a few lines. From a quick scan of the code and testing it all looks OK, but is that enough to go ahead and merge? Or should this be in BonsaiPR for a spell first?

I think what I've realised is that we could be dealing with a tsunami of mostly AI-written contributions (which is potentially positive), but it's vital that they are well tested for correctness.

theoryshaw · July 3

Yes, it truly is unbelievable.
1. Yes. Claude $20/month
2. To be honest, not thoroughly until I push to main, which I don't do much anymore. Most of my work in the last couple of months is still in PRs. I battle test them daily by using the BonsaiPR bleeding edge build. I've found myself having to resolve a lot of conflicts in order to keep these PRs mergeable with the main branch, which can be a little time consuming. Luckily, I've created a couple of prompts to help with that: https://github.com/falken10vdl/bonsaiPR/tree/main/prompts and the logs are here: https://github.com/falken10vdl/bonsaiPR/tree/main/logs

sjb007 · July 3

$20/month doesn't sound too bad. Are you getting a lot of usage, and do you hit the limits?
So many options!?!?!

Gorgious · July 3

Hello. First let me preface by saying this is very much biased by my personal experience.
I've been experimenting with different AI coding agents for the last 4 years or so. There was a real shift at the start of last year, when commercially available solutions went from "AI can write scripts to do specific things" to "AI can write entire software A-Z". It didn't write good software out of the box though, and it still doesn't. It's not designed to, AFAIK.

I am using Claude Max (180€/month), not specifically for Bonsai; I'm developing other applications in-house for my business, and have shipped a few in the past year. I have tested a few other solutions and found this one to be the most reliable. I think about how much I bill for one hour of my time, and about how long the work would have taken me if I were to do it manually. For "hobbyists" that doesn't apply of course. Most solutions offer a free tier that lets you test it by modifying a few functions here and there, so you can see which one suits you most. Do NOT judge it by how many lines it can spit out per hour, but rather by:
1. how well it understands what you want it to do
2. how many times you have to tell it to not mess things up
3. how consistent the results are
4. how easy or hard it is to change a feature or create a new one
5. how well it respects current architecture
6. how easily you can steer it into using your tools
7. ??
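
The billing comparison above can be written as simple arithmetic. This is only a rough sketch: the 180€/month figure is from the post, while the hourly rate and the hours saved or spent guiding the agent are assumed placeholders, and both function names are made up for illustration.

```python
def break_even_hours(subscription_per_month: float, hourly_rate: float) -> float:
    """Hours of billable time the tool must save per month to pay for itself."""
    if hourly_rate <= 0:
        raise ValueError("hourly_rate must be positive")
    return subscription_per_month / hourly_rate


def monthly_net_value(subscription_per_month: float, hourly_rate: float,
                      hours_saved: float, hours_guiding: float = 0.0) -> float:
    """Value of the time saved, minus time spent steering the agent, minus the fee."""
    return (hours_saved - hours_guiding) * hourly_rate - subscription_per_month


# 180 EUR/month is from the post; 60 EUR/hour and the hour counts are assumptions.
print(break_even_hours(180.0, 60.0))  # 3.0
print(monthly_net_value(180.0, 60.0, hours_saved=10.0, hours_guiding=2.0))  # 300.0
```

For hobby use there is no hourly rate to plug in, so the calculation does not carry over directly.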

Regarding your question about the second PR, I would advise not merging it if you're not sure about what you're doing or not confident you can fix related bugs. If the agent finds a bug while exploring the codebase and working on a PR, you need to consider carefully whether it is reasonable to ship the fix alongside. Also, PRs should always include test coverage: 1. ensure you're not breaking existing tests, 2. update existing tests if you modify the contract, 3. add tests for new features.

If you're experimenting with AI for PRs, I would also suggest first practising on a separate PR and merging only after either getting some feedback or battle testing it in a live Blender environment.

If you want to ensure your agent does what you want it to do, you need to restrict it as much as possible, otherwise it will start to either hallucinate or gloss over things and make false assumptions in the way code is modified. Prompts are nice but they guide the agent only in its initial phase. Skills guide the agent from start to finish, and add additional hard restrictions. I use a couple of them, like /plan-feature /refactor /bugfix /check etc. I think we also need to update AGENTS.md to add a few more restrictions like mandatory test coverage, etc. Will look into it. The quality and CI gates (that are NOT AI driven but regular tools that are partly integrated in the codebase) in particular are mandatory if you want to maintain code quality and clean architecture.

Cheers

sjb007 · July 3

Claude app: No free tier that includes coding as far as I can tell.
Cursor app: Gave it a task. It gave most of what was needed, but quite a few bugs and things not working right. It also blew >50% of free monthly allowance. So much in fact that it refused to fix its own mistakes because there wasn't enough left. LOL I'm currently manually working through those. Identify, understand, fix.
Copilot on GitHub: It worked well, and corrected things under direction. Used about 40% of the free tier on the two features in my first post that I gave it to work on. Haven't tried adding it to VSCodium yet.

My lack of experience with AI means I might be burning tokens unnecessarily, so my usage might be atypical.

The bit I really have no idea about is how much bang per buck you should get. I know Copilot recently changed how they bill, so heavy users got the shock of their lives when the new predictive estimates told them their previous monthly usage would be tens of thousands of dollars. Claude and Cursor have low tiers starting at $20. Is that a useful amount of work? Based on what I've done with Copilot, $20 worth of credits will not go very far. Maybe 4 or 5 modest Bonsai features/bugfixes. Is Copilot just outrageously overpriced?
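
For a rough feel of the numbers mentioned so far (a free allowance of 1500 credits, about 40% of it used on two features), the per-feature cost can be estimated as below. This is just back-of-envelope arithmetic with made-up function names; how credits map to dollars is not stated in the thread, so it is left out.

```python
def credits_per_feature(allowance: int, percent_used: int, features: int) -> float:
    """Average credits consumed per feature, given the share of the allowance used."""
    if features <= 0:
        raise ValueError("features must be positive")
    return allowance * percent_used / 100 / features


def features_per_allowance(allowance: int, per_feature: float) -> float:
    """How many similar features one full allowance would cover."""
    return allowance / per_feature


# Figures from the thread: 1500 free credits, about 40% used on two features.
per_feature = credits_per_feature(1500, 40, 2)
print(per_feature)                                # 300.0
print(features_per_allowance(1500, per_feature))  # 5.0
```

With only two data points this is a very loose average, and a broad "fix this bug" request may cost far more than a targeted one.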

Gorgious · July 3

Well, really I feel like I can't offer good advice here because I have a business-oriented perspective on the pricing. Depending on your trade and the criticality of the piece of software, $20 might be between 1 minute of your time and half an hour of your time or something like that. I think it also helps to reframe the thinking from "How many PRs and features can I pump out" to "What value will it bring to me and my team when the feature is done, integrated, bug-free, accepted by end users". You can target a hundredfold return on investment but you also have to take into account your own time in guiding the agent, which is after all "just" a tool. I have my own opinions about which ones are the best and which ones are scam-adjacent, but I'll keep those to myself as I don't want to steer the conversation in this direction.

If you want to reduce token consumption, try not to drag out conversations for too long; be focused and to the point. Use skill files; they can be shared between different AI agents. Some agents feature so-called 'exploration agents' that are just good at exploring the codebase and giving a summary. E.g. here's how I would approach your ifc autosave issue: "Investigate how the autosave feature works in vanilla Blender, and report on how we would create a similar feature for the sidecar ifc file we use when working with Bonsai." And this is where knowledge and experience come into play: "Take into account that saving ifc files may take a lot of time if the file is big, especially on a NAS or cloud folder. For information, Blender puts autosaves in a temporary folder of the computer. Explore throttling the file save so users don't experience hiccups every few minutes. Explore how video games solve this problem. Integrate a new entry into the File > Recover Blender menu that will parse the ifc autosaves folder. Deliverable is a detailed implementation plan, including the test suite and a green quality pipeline."

Again, not saying it's the way to go or the way to think about it, just sharing my personal workflow. Do your own research, and see how it integrates in your workflow.
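
To make the "throttling" part of that prompt a bit more concrete, here is a minimal sketch of one possible approach: after each autosave, wait at least a minimum interval, and wait longer if the save itself was slow (for example on a NAS). `AutosaveThrottle`, its parameters, and the default values are all hypothetical; this is not Bonsai or Blender API, just an illustration of the idea.

```python
import time
from typing import Callable, Optional


class AutosaveThrottle:
    """Decides whether an autosave may run now (hypothetical helper, not a Bonsai API)."""

    def __init__(self, min_interval: float = 120.0, slow_factor: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.min_interval = min_interval  # assumed minimum seconds between saves
        self.slow_factor = slow_factor    # assumed multiplier applied to save duration
        self.clock = clock                # injectable so the logic can be tested
        self._next_allowed: Optional[float] = None

    def should_save(self, dirty: bool = True) -> bool:
        """True if there are unsaved changes and the back-off period has elapsed."""
        if not dirty:
            return False
        return self._next_allowed is None or self.clock() >= self._next_allowed

    def run(self, save: Callable[[], None], dirty: bool = True) -> bool:
        """Call save() if allowed, then schedule the next slot based on how long it took."""
        if not self.should_save(dirty):
            return False
        started = self.clock()
        save()
        finished = self.clock()
        wait = max(self.min_interval, (finished - started) * self.slow_factor)
        self._next_allowed = finished + wait
        return True


# Demo with a fake clock: a save that "takes" 30 s pushes the next slot 300 s out.
fake_now = [0.0]

def slow_save() -> None:
    fake_now[0] += 30.0  # pretend writing the file took 30 seconds

throttle = AutosaveThrottle(clock=lambda: fake_now[0])
print(throttle.run(slow_save))  # True, first save always runs
fake_now[0] += 120.0
print(throttle.run(slow_save))  # False, still inside the 300 s back-off
```

Scaling the wait by the save duration means small files save often while big or remote files back off automatically. Whether that is the right trade-off is exactly the kind of thing the implementation plan would need to settle.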

theoryshaw · July 3

"Are you getting a lot of usage, and do you hit the limits?"

... it's not too bad. Put it this way, I'm actually glad when it runs out, then I can do billable work. ;)

sjb007 · July 3

😆
Step 2: Make an AI to do the billable work for you...
Step 3: Profit!

carlopav · July 3

Step 4: lose your job because you got replaced by AI

sjb007 · July 3

I think Ryan is his own boss... which just means he'll be the one to turn out the lights for the last time.

Gorgious · July 6

Yup, this is a really delicate topic; we need to be conscious of the impact it has on our lives and the lives of others. I just want to add that, for resilience, I suggest NOT integrating AI into your actual production workflows by default. There are publicly available APIs for almost everything now that offer highly performant, battle-tested features. For instance, don't use an AI agent as a glorified search engine to look for public tenders in your region; rather, use it to build an app that will call the relevant available APIs and aggregate the data. Cheers
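
As a rough illustration of the "build an app that calls the APIs and aggregates the data" idea, here is a minimal sketch. Everything in it is hypothetical: the portal functions are stand-in stubs returning invented records, where a real app would wrap calls to whatever public tender APIs exist in your region.

```python
from typing import Callable, Dict, Iterable, List

Tender = Dict[str, str]
Fetcher = Callable[[], Iterable[Tender]]


def aggregate_tenders(fetchers: Dict[str, Fetcher]) -> List[Tender]:
    """Call every source, tag each record with its source, and drop duplicate ids."""
    seen = set()
    merged: List[Tender] = []
    for source, fetch in fetchers.items():
        try:
            records = list(fetch())
        except Exception as exc:  # one failing source should not sink the whole run
            print(f"skipping {source}: {exc}")
            continue
        for record in records:
            key = record.get("id")
            if key is None or key in seen:
                continue
            seen.add(key)
            merged.append({**record, "source": source})
    return merged


# Stand-in stubs with invented data; real fetchers would call actual APIs.
def regional_portal() -> List[Tender]:
    return [{"id": "T-1", "title": "School refurbishment"}]

def national_portal() -> List[Tender]:
    return [{"id": "T-1", "title": "School refurbishment"},
            {"id": "T-2", "title": "Bridge survey"}]

def broken_portal() -> List[Tender]:
    raise RuntimeError("service unavailable")

tenders = aggregate_tenders({"regional": regional_portal,
                             "national": national_portal,
                             "broken": broken_portal})
for tender in tenders:
    print(tender["id"], tender["source"], tender["title"])
```

Once something like this is written and tested, it keeps running the same way every day without consuming tokens, which is the resilience point made above.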

theoryshaw · July 6

Ha, funny thing, coincidentally I just noticed a major usage acceleration on my account (and I'm not even using 'Fable'). Guess it was only a matter of time. It's listening. :)

sjb007 · July 6

I think there's a definite lack of understanding on my part of how to make efficient use of AI. In my first experiment I basically said "fix this bug". I think it maybe cloned and analysed the whole IfcOpenShell/Bonsai code base, consuming a massive amount of credits for a small change in a single file. When I was more targeted with my requests it was much more parsimonious/thrifty. When I just paste the error and ask my dumb questions it is very effective at clarifying, which speeds me up because I don't get bogged down figuring out tangential stuff.
