Large Language Models (LLMs) have improved to the point where many
manual workflows can be automated just by providing natural language
instructions.
One such manual task is creating custom visualizations. I have found
the process really tedious when you want to make something
non-standard with common tools like matplotlib or d3. These
frameworks provide low-level abstractions that you then use to build
your own visualizations.
Earlier, to make a new custom visualization, I would open two windows
in Emacs, one for the code and the other for the generated image. In
this talk, I will show how a powerful LLM can lead to a much more
natural interface where I only need to work with text instructions
and feedback on the currently generated plot. The system isn't
perfect, but it shows us what the future of such work could look
like.
I am a Programmer and Machine Learning Engineer who has been in love
with Emacs' extensibility from the moment I pressed M-x. Since then,
I have been doing as many things inside Emacs as I can. In this talk,
I will cover a recent attempt at automating one of my workflows
inside Emacs.
Q: Sometimes LLMs hallucinate. Can we trust the graph that it
produces?
A: Not always, but the chances of a hallucination in the
generated code being harmful yet hard to identify are a
little lower. Usually hallucinations in code show up as
very visible bugs, so you can always retry. But I haven't
done a thorough analysis here yet.
Q: What are your thoughts on the carbon footprint of LLM usage?
(not the speaker): To add a bit more on the power usage of LLMs: it is not inherent that the models must take many megawatts to train and run. Work is happening, and seems promising, to decrease power usage.
Hi, my name is Abhinav and I'm going to talk about this tool that I've been working on called MatplotLLM. MatplotLLM is a natural language interface over matplotlib, which is a library I use a lot for making visualizations. It's a pretty common Python library used a lot everywhere there's a need for plotting and graphing. I usually use it in reports. Whenever I'm writing a report in Org mode, I tend to write a code block which is in Python, and that code block uses matplotlib to produce some plots. That works really well.

But at times what happens is I have to make a very custom graph, let's say. And then, while I'm writing a report, it's kind of a huge leap of abstraction to go from working on text to actual low-level matplotlib code to do that graphing. So that's something I don't want to do.

Here's an example. This is a graph which was made, I think, five or six years back. There are some common things, like the scatter plot here, the dots that you can see scattered. But there are a few things which, to make them, you will actually have to go -- at least me, I have to go to the documentation and figure out how to do it. Which is fine, but I don't want to spend so much time here when I'm working on a tight deadline for a report. That's the motivation for this tool. This tool basically allows me to get rid of the complexity of the library by working via an LLM.
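To make that usual workflow concrete, such a report block might look roughly like this (a minimal sketch; the file and column names are made up):

    #+begin_src python :results file :exports results
    import matplotlib.pyplot as plt
    import pandas as pd

    # Hypothetical data file and columns, just to illustrate the shape of
    # the workflow described above.
    df = pd.read_csv("data.csv")
    plt.scatter(df["x"], df["y"])
    plt.savefig("plot.png")
    return "plot.png"
    #+end_src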
So an LLM is a large language model. These are models which are trained to produce text, to generate text. And just by doing that, they actually end up learning a lot of common patterns. For example, if you ask a question, you can actually get a reasonable response. If you ask it to write code for something, you'll actually get code which can also be very reasonable. So this tool is basically a wrapper that uses an LLM. For the current version, we use GPT-4, which is OpenAI's model. It's not open in the sense of open source, so that's a problem that it has. But for this version, we are going to use that.
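Concretely, being a wrapper that uses an LLM means sending the data description and instructions to the OpenAI API and getting matplotlib code back. A rough sketch of that kind of request follows; the actual prompt MatplotLLM builds lives in its source and is not reproduced here:

    import requests

    # Illustrative request to OpenAI's chat completions endpoint; the real
    # prompt and message layout used by MatplotLLM will differ.
    resp = requests.post(
        "https://api.openai.com/v1/chat/completions",
        headers={"Authorization": "Bearer sk-..."},
        json={
            "model": "gpt-4",
            "messages": [
                {"role": "system",
                 "content": "Write matplotlib code for the plot described by the user."},
                {"role": "user",
                 "content": "Data is in data.csv ... Can you make a scatter plot ..."},
            ],
        },
    )
    generated_code = resp.json()["choices"][0]["message"]["content"]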
Using this library is pretty simple. You basically require the library and then you set up your OpenAI API key. Then you get a code block where you can specify the language as matplotllm, and in it you can basically describe what you want in natural language.

I'll take the example of this data set. It's called the Health and Wealth of Nations; I think that was the name of a visualization where it was used. This is basically life expectancy and GDP of various countries starting from 1800. I think it goes up to somewhere around 2000. So earlier, I would try to write code which reads this CSV, does a lot of matplotlib stuff, and finally produces a graph. But with this tool, what I'll do is just provide instructions in two forms.

The first thing I'll do is describe what the data looks like. So I'll say the data is in a file called data.csv, which is this file on the right, by the way, and it looks like the following. I just pasted a few lines from the top, which is enough. Since it's a CSV, there's already a structure to it. But let's say you have a log file where there's more complexity to be parsed; that also works out really well. You just have to describe what the data looks like and the system will figure out how to work with it.

Now, let's do the plotting. Let's start from a very basic plot between life expectancy and GDP per capita. I'll just do this: "Can you make a scatter plot for life expectancy and GDP per capita?" Now, you can see there are some typos, and probably there will be some grammatical mistakes also coming through. But that's all OK, because the models are supposed to handle those kinds of situations really well. So I send the request to the model. Since it's a large model -- GPT-4 is really large -- it actually takes a lot of time to get the response back. This specific response took 17 seconds, which is huge. It's not something you would expect from a tool running locally on your computer. But I've got what I wanted. There's a scatter plot here, as you can see below, which is plotting what I asked it to, though it looks a little dense.
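As a rough sketch, the Org buffer for the demo above might look something like this (the key variable name and block layout are illustrative; the repository documents the exact setup):

    ;; One-time setup: load the package and provide the OpenAI API key.
    ;; The variable name here is illustrative.
    (require 'matplotllm)
    (setq matplotllm-openai-key "sk-...")

    #+begin_src matplotllm
    Data is in a file called data.csv. It looks like the following:

    <a few lines pasted from the top of the CSV>

    Can you make a scatter plot for life expectancy and GDP per capita?
    #+end_src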
What I can do now is provide further instructions as feedback. So I can say, "Can you only show points where the year is a multiple of 50?" Since the data points start from 1800, there are too many years, so I'll just try to thin them down a little. Now, what's happening in the background is that everything below this last instruction goes out as the context to the model, along with the code that it wrote till now. And then this instruction is added on top of it, so that it basically modifies the code to make it work according to this instruction. As you can see now, the data points are much fewer. This is what I wanted.

Let's also do a few more things. I want to see the progression through time. So maybe I'll do something like: color more recent years with a darker shade. Let's change the color map also. Now, this again goes back to the model. Again, everything before this line is the context, along with the current code, and this instruction goes to the model to make the changes. Once this happens... Yeah. OK. So we have this new color map, and there's also this change of color. And there's this color range from 1800 to 2000, which is a nice addition. Kind of smart. I didn't exactly ask for it, but it's nice.

There are a couple more things. Let's make it more minimal: "Can you remove the bounding box?" Also, let's annotate a few points. I want to annotate the point which has the highest GDP per capita: "Also annotate the point with the highest GDP per capita with the country and year." Again, forget about the grammar; the language model usually takes care of all those complexities for you. This is what we have got after that. As you can see, there's the annotation here. I think it's still overlapping, so probably it could be done better, but the box is removed.
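Behind the scenes, the code the model writes after all these instructions would look roughly like this (a sketch; the actual generated code and the CSV's column names will differ):

    import matplotlib.pyplot as plt
    import pandas as pd

    # Column names are assumed for illustration.
    df = pd.read_csv("data.csv")
    df = df[df["year"] % 50 == 0]  # keep only years that are multiples of 50

    fig, ax = plt.subplots()
    sc = ax.scatter(df["gdp_per_capita"], df["life_expectancy"],
                    c=df["year"], cmap="viridis")  # shade points by year
    fig.colorbar(sc, label="Year")

    # Remove the bounding box for a more minimal look.
    for spine in ax.spines.values():
        spine.set_visible(False)

    # Annotate the point with the highest GDP per capita.
    top = df.loc[df["gdp_per_capita"].idxmax()]
    ax.annotate(f"{top['country']} ({int(top['year'])})",
                (top["gdp_per_capita"], top["life_expectancy"]))

    ax.set_xlabel("GDP per capita")
    ax.set_ylabel("Life expectancy")
    plt.savefig("plot.png")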
Now, as you can see, the system is not really robust. The GitHub repository has some examples where it fails miserably, and you'll actually have to go into the code to figure out what's happening. But we do expect that to improve slowly, because the models are improving greatly in performance. This is a very general model; it's not even tuned for this use case.

The other thing is that while I was providing feedback, I was still using text all the time, but it can be made more natural. So, for example, if I have to annotate this particular point, I should be able to just point my cursor at it. Emacs has a way to figure out where your mouse pointer is, and with that you can actually go back into the code and see which primitive is being drawn there in matplotlib. So there is a way to do that. And if you can do that, then it's really nice to just be able to put your cursor somewhere and say something like, "Can you annotate this point?" There are limitations to text, and if you're producing an image, you should be able to point at it, too. So I do expect that to happen soonish. If not from the model side, the hack that I mentioned could be made to work. So that will probably come in a later version.

Anyway, that's the end of my talk. You can find more details in the repository link. Thank you for listening. Goodbye.