Ask HN: How's the current state of hiring in the LLM field?
76 points by boredemployee on March 5, 2024 | hide | past | favorite | 72 comments
How are the hiring trends in the LLM field currently?

Do you think they will grow in the coming years like demand did for data scientists?

Is it worth entering the field today starting from scratch?



I'm a DS with ~10 yoe mostly in classical ML and I've been wondering how much/how fast I need to pick up "high level" LLM skills. I've seen the way the winds are blowing wrt the DS job title and so have been moving towards more engineering-oriented roles. But it seems like every job posting right now for ML engineering positions is asking for "pytorch, DL, LLM" experience, even at companies where getting into the nuts and bolts of inference would not really be a core value prop. Like I really find it hard to believe that financial/insurance companies (my primary domain) are building and fine-tuning their own models for chat/RAG applications or whatever (or if they are, should they be?). I haven't really seen huge value add from even the most sophisticated RAG setups yet.

I use LLMs every day for code assistance so I'm not totally naive to the whole thing, but I seriously wonder what direction to take my career, exactly. Should I be going all-in on learning CUDA? Is knowing how to wire up a few API calls enough for "soft" applications? Should I in fact be keeping up with the latest RAG techniques (which seem to change every time a new foundational model gets released)? Should I just stay out of the whole thing and double down on classical ML applications in my domain?


>> Should I in fact be keeping up with the latest RAG techniques (which seem to change every time a new foundational model gets released)?

>> Should I just stay out of the whole thing and double down on classical ML applications in my domain?

Thank you. Both questions are what I'm wondering right now.


If you want my personal opinion, I think the hype around RAG techniques is mostly a fad. I don't see how it becomes a super broad high value technique that your average DS or ML Engineer is getting paid big bucks to know. It's also in danger of becoming totally obsolete with further LLM advances (Claude 3 is already really good at needle in the haystack tasks). Not something worth spending a ton of time on unless you're basically already knee deep in that kind of problem space. I think the only reason it's trendy right now is because LLMs are hammers looking for nails and semantic Q&A is one of the few identified nails.

I'm pursuing a strategy of:

* Doubling down on domain knowledge. Even if coding becomes completely automated, there will always be value in knowing some particular corner of an industry very well. For me, that's insurance. This includes knowing how classical ML techniques can be applied in my industry (LLMs are not replacing xgboost any time soon).

* Keeping tabs on bleeding edge LLM capabilities, just to have a sense of what's possible. However, staying out of highly technical LLM implementation details like CUDA, inference, batching, etc. Perhaps fine tuning will be valuable but probably I would focus on understanding "fine tuning as a service" offerings rather than rolling my own.

* Investing in knowledge of infrastructure/orchestration of LLM services in a way that doesn't tie me to any particular problem space. It's hard to predict right now what kinds of new products LLMs might enable. But I think there will be value in being able to translate business problems into a high level architecture for an LLM-enabled app, even if that just means wiring together existing service offerings.


> It's also in danger of becoming totally obsolete with further LLM advances (Claude 3 already really good at needle in the haystack tasks)

Maybe, but for every haystack that an LLM can search, RAG could search an even larger one. I don't think RAG is a killer app but I think tools that help LLMs find the right context will continue to be useful.
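A minimal sketch of that retrieve-then-prompt idea: pick only the most relevant chunks from a larger haystack and hand just those to the model. A toy keyword-overlap score stands in here for the embedding similarity a real RAG stack would use, and the documents are made up:

```python
# Minimal retrieve-then-prompt sketch. A toy keyword-overlap score stands in
# for the embedding similarity a real RAG pipeline would use.

def score(query: str, chunk: str) -> int:
    # Count how many query words appear in the chunk (case-insensitive).
    return len(set(query.lower().split()) & set(chunk.lower().split()))

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Rank all chunks against the query and keep the top k.
    return sorted(chunks, key=lambda ch: score(query, ch), reverse=True)[:k]

def build_prompt(query: str, chunks: list[str]) -> str:
    # Prepend only the retrieved context, not the whole haystack.
    context = "\n".join(retrieve(query, chunks))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Policy renewals are processed on the first of each month.",
    "Claims over $10,000 require adjuster sign-off.",
    "The cafeteria opens at 8am.",
]
prompt = build_prompt("When are policy renewals processed?", docs)
```

Whatever the scoring function, the shape stays the same: the retriever narrows the haystack before the LLM ever sees it, which is why a bigger context window raises the ceiling rather than removing the need.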

> Doubling down on domain knowledge

I'm a dev, but this is my main strategy right now. I am digging deep into my corner of fintech and hoping that will stave off some of the disruption I see coming for tech. Personally, I don't see why an AI couldn't also become a domain expert in my niche, but it should buy me some time and potentially align me with some powerful stakeholders (portfolio managers just need to convince the big money that a human touch is good and I will still be in business).


Excellent advice.

Question: in what ways does insurance domain knowledge set you apart?

I understand certain industries benefit from Engineers with domain experience (e.g. Healthtech companies needing HIPAA knowledge).

I'm curious as to what particular skills insurance companies need from Engineers?


For pure SWE I don't think I have a great answer for you, but on the ML/DE side there are a lot of industry-specific problems, especially on the P&C side, where it helps to have context and experience:

* Rating (a deep, classical, tabular ML problem): many insurance companies are still catching up to SOTA techniques, and an uneven regulatory environment makes it tricky to navigate.

* Marketing is another classical ML problem; the goal is to optimize ad/lead broker spend and the conversion funnel.

* CS of course has huge opportunity with LLMs.

* Automated claim fraud detection is a very difficult problem amenable to LLMs/MM DL models. Lots of "pivot to AI" going on here.

And there are others of course. The main thing is that insurance companies have limits on product innovation - policy terms are in many cases mandated by law, so the main differentiator is operational efficiency and excellence. Solutions that move the needle on big revenue/cost levers are very valuable and can be applied widely.


Yes. I agree with everything you said, and "infrastructure/orchestration" in the LLM space seems to be what some ppl are already calling "LLMops".

Thanks for your input


I feel like there's a fuckload of hype and froth in whatever the LLM market is. Lots of money getting dumped into it and not a lot to show for it.


lot of investment being pumped into the shovel makers (Nvidia) and the shovel sellers (CSPs).

those panning for gold (app devs, startups etc) may or may not find it. remains to be seen and i remain skeptical.


Yeah seems like more of a shovel-rush than a gold-rush


That's a good way of putting it. In my experience so far everyone is very excited to build LLM-enabled apps but outside of some customer service deployments (mostly at the megascale enterprise level), I haven't seen many actually successful efforts reported. Even the customer service bots always end up roundly mocked on social media once someone figures out they're talking to an LLM and tells it to pretend it's a fairy princess.


Yeah I was in the same boat coming from classical ML. I am just doing MLOps/DE these days, although I have been downlevelled to almost junior level after 10+ years of experience. Struggling a bit with being treated like a junior again! Much less autonomy to pick my own tasks.


I'm going through a similar transition right now. I wouldn't say I've been downleveled but I left FAANG to go to a startup (pay cut) where I've been able to transition pretty fluidly from "Data Science" to "MLOps + some backend SWE + some infra + some actual ML Engineering (non-DL)". Aside from the pay cut (which was not the end of the world, think high 200s to low 200s), it's worked out very well, and I'm getting the experience that I think will allow me to rebrand successfully.

As to whether that rebrand itself is successful, I don't know. It feels like the DS role is undergoing a maturation where different skillsets are being cleaved into different roles. I was always a more natural fit for the Ops side, I don't have the academic creds for a research-oriented position. But I'm not sure how much of ops means "LLM ops".


> I seriously wonder what direction to take my career, exactly.

same. future looks very uncertain. on one hand, the new figure of "ai-assisted x", where x can range from developer to copywriter to a wealth of other primarily creative jobs, will replace the non-ai-assisted variants due to the tangible increase in productivity

on the other hand the reduction in these costs possibly means 5x less capital required to build a startup, whose primary cost today is personnel, at least in the early stages.

whether that increase in company creation will be enough to raise the demand for personnel to a level where society is stable, I don't really know.

the primary risk I see is not career prospects but the massive capital investment required to be in the club of model providers. these will be the new elite, with unparalleled power over economies and entire states. if you think the robber baron period was harsh, wait until three companies are ingrained in every aspect of your life from bureaucracy to healthcare, and you have to argue with a self-moderated model instead of a bureaucrat. and if that's not enough, imagine you have an uncommon or suggestive name and the model won't believe you no matter what proof you give it. imagine these models being part, one way or another, of the production of entertainment, generating a monoculture that is extremely prude and milquetoast. or in general the models being wrong about anything, really; you cannot reason with them.

my concern is that there's a lot that should be happening right now at a legislative level. not in terms of stifling innovation (it isn't), but, let's say, putting certain roles outside the sole control of AI. if we wait to be reactive about it, it's gonna end up like it did for self-driving: blood on the road, and I cannot have my car acting as a taxi and earning for me during off hours because for "safety reasons" only the tech owner can profit from it. yeah, right.


> I've seen the way the winds are blowing wrt to the DS job title

Can you elaborate on this?


I mention this in another reply but tl;dr the DS role is maturing and getting cleaved into different specialist skillsets. Research, ML Engineering, MLOps, Data Engineering, Analytics are what I see as the main ones. In 2016 you could get hired as a Data Scientist knowing some SQL, XGBoost, and how to use git (I know I did). That's not really the case in 2024, you are expected to be an expert in at least one of those areas and job titles are starting to reflect the specialization. Companies that are still hiring for generic "Data Scientist" positions with scikit-learn in the job description are not places you want to work.


The question is: how deep should you go into each subset?


definitely shifting to more engineering-focused types of roles, so at the very least I'd work on swe/infra skills


The work seems to be exclusively about using OpenAI APIs and hacking response pipelines together, but the hiring processes are focused on getting ML specialists.

So yea, it’s weird


I think at least part of it is that many people creating the job postings literally don't know the difference between a machine learning specialist and an "AI Engineer" who is just calling an API. One thing that is confusing is that with the powerful models that have come out in the last few years, someone using the right API and maybe a good prompt can actually handle many tasks that three years ago would have taken an ML engineer months to train a custom model for.

I have been saying I'm a "software engineer with a recent focus on generative AI". Half of my clients are convinced that they need to train a custom model in order to solve their problems at all. Because they typed a request into ChatGPT and it didn't do exactly what they wanted. But usually if I just do an API call with an actual system message and temperature 0 then I can demonstrate that they don't need a custom model.
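That "system message + temperature 0" demo can be sketched like this. The payload shape follows the OpenAI chat API, but the model name and the extraction task are placeholders, and the actual network call (which needs a key) is left commented out:

```python
# Sketch of the "system message + temperature 0" point: most of the work is
# in the request, not a custom model. The model name and task are placeholders.

def build_request(task_instructions: str, user_input: str) -> dict:
    return {
        "model": "gpt-4",   # placeholder model name
        "temperature": 0,   # deterministic-ish output for repeatable demos
        "messages": [
            # The system message carries the behavior the client assumed
            # would require a custom model.
            {"role": "system", "content": task_instructions},
            {"role": "user", "content": user_input},
        ],
    }

req = build_request(
    "Extract the invoice number and total from the text. Reply as JSON.",
    "Invoice #4471, total due: $312.50",
)

# The actual call needs an API key and network access, e.g.:
# from openai import OpenAI
# resp = OpenAI().chat.completions.create(**req)
```

The point of the demo is exactly this: a clear system message plus temperature 0 often reproduces what the client thought needed fine-tuning.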

There are good reasons to want custom models, but those are probably phase two or three of most projects. And 100 X as expensive and complex. Phase one for many projects will work fine as a proof of concept using an API call or two.

I think there will be jobs for ML specialists that know how to do fine-tuning of common models. But you will be competing with people like me who don't have an ML PhD. But beyond that, it seems that you have to build specialized architectures that can compete with the general ML architectures built by leading edge researchers at huge companies. And it's becoming increasingly difficult to invent a new architecture that is truly better in a niche.


> The work seems to be exclusively about using openAI apis and hacking response pipelines together

And ironically, this is not going to be a marketable skill for long, in any scenario. If the hype train crashes and burns, this is no longer a useful skill. But if the hype is real, an ML model will be able to do this sort of thing for you.


LLMs produce useful work right now, it doesn't need a hype train to keep going.

OTOH ML models seldom "do" much of anything on their own, it needs integration work to make it matter.


I think a very large number of people have already experienced significant results with LLMs. I don’t see how it’d be hype tbh (unless for the fringe ppl expecting AGI)


I mean the baseline value is already there in removing writers block and helping you get started with writing, listing obvious things from whatever topic you ask about, and basic and boilerplate code.

What's missing is any novel insight and real high quality writing. I guess what I'm trying to say is if LLM development totally stopped tomorrow I would still use it for generating a bunch of ideas and volume of writing to then pick from.

It's like being an editor vs being an author, and tweaking text output is easier than starting from scratch.


If you find LLMs interesting, then by all means. But you shouldn't focus on LLMs as a path to career success.

In most cases business just want to use an LLM but don't need any expertise in how they work. 80% of use cases are "I want to write an English sentence to make a SQL query". With a little bit of fine tuning or RAG you get there.
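That text-to-SQL case is mostly about handing the model the schema. A hedged sketch, with made-up table names, of what such a prompt builder might look like:

```python
# Sketch of the "English sentence to SQL query" use case: the leverage is in
# giving the model the schema, not in deep ML expertise. Table and column
# names here are invented for illustration.

SCHEMA = """
CREATE TABLE orders (id INT, customer_id INT, total DECIMAL, created_at DATE);
CREATE TABLE customers (id INT, name TEXT, region TEXT);
"""

def text_to_sql_prompt(question: str) -> str:
    return (
        "You translate questions into SQL for the schema below.\n"
        f"Schema:\n{SCHEMA}\n"
        "Return only the SQL, no explanation.\n"
        f"Question: {question}"
    )

prompt = text_to_sql_prompt("Total order value by region last month?")
```

Send that prompt to any capable chat model and you have the 80% use case; RAG over table documentation or a few-shot list of known-good queries is the usual next step when the schema gets big.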

Being a data scientist who understands LLMs would be far more likely to be a good career move.


As an outsider (offering prototyping project help to clients), the main thing I am seeing at the moment is AI-knowledge experts as project managers. As in: a company wants to go about an AI project, and needs someone who can guide them through the jungle of tools (services, budgeting, operating costs) as well as figuring out potential partners (individual developers to AI-integration companies). People on the inside of these companies aren't trained 'enough' to fully grasp the status quo, what's possible or not possible, where the risks currently are and where it is all heading, despite being super eager to be involved on those things. That's at least what I am hearing, less actual AI-development work, more "how to get AI into our business" work. Just my 2cts anyway.


This is my experience, but a "teach not tell" approach is definitely needed: everyone from the junior dev to the board wants to learn, and they need convincing before they believe.


> Is it worth entering the field today starting from scratch?

Here's a personal anecdote: I'm only really good / competent in technical areas that truly interest me.

In technical areas that I've pursued only for career / money reasons, my performance has been sub-par. Especially when compared to that of developers who are truly interested.

I don't really know how much my experience generalizes to other developers. Maybe it's related to my ADD.


People with ADHD find forced motivation doesn't have the staying power that self directed interest does. Your entire comment is a textbook description of ADHD.


This is also what it's like to be a normal person


Thanks for posting this. I find a lot of self-described ADHD folks seem to describe fairly normal ennui that makes up life. My wife talked her way into a diagnosis (innocently, and she hasn't taken the medication) and she has never had any issues with aptitude: multiple 4.0 degrees, self taught software career, ultra runner etc.

I think folks are conveniently lying to themselves, and I'm not here to ruin anyone's party but I do get concerned in parenting circles when this kind of over-optimizing of normal laziness gets sold as childhood ADHD.


It's me to a tee, and I don't have ADHD. (At least I don't think so oh god vyvanse me up doc)


The difference is magnitude, this is a dismissive comment that is harmful for people with ADHD.


It's a dialog, you can't push ADHD diagnoses and then get upset when other folks don't agree with your view of the human condition.


You other comment isn't a dialog either. Both of you are furthering your personal narrative that ADHD isn't a real condition. The argument that someone is high achieving therefore they don't have ADHD is a common illogical trope.

ADHD is as real as diabetes or arthritis. If that disturbs you, it says more about you than it does about ADHD.


Looks like the lines between Data Scientist, ML Engineer, and AI Engineer are blurred with the onset of LLMs/GenAI (especially if you were previously working on classical NLP). Absolutely worth entering the domain, it's still day 0, but be prepared to keep yourself updated, unlearn things fast and pick up new tricks.


We have open positions for putting LLMs to use (won't link, not a work account). Seems difficult to hire people with sufficient knowledge of our niche and of LLMs.

Whether they will continue to grow is hard to say. Putting open models to productive use is difficult. Stable Diffusion can make a pretty image, but can it make the image you want? For example, can it make a 2d video game sprite? Can it make the same character consistently in different poses?

That's a personal project problem I have but it is the same vibe as LLMs with code. Like they can produce some code, but a more complex algorithm or larger program is tough. I'm not sure if there will be an upper limit on what they can do. Right now it is a lot of engineering to get them to do stuff with no human in the loop.

Not a lot of companies are training models. There are probably a number that are trying to integrate OpenAI's API in some way.

They can do powerful things and it is a good skill to learn, but it feels like the market right now is for PhDs from top schools or low paid data labelers. Maybe others will have different experience.

For resources, check out [1], [2], and [3]. The third being my least favorite, but others like it.

1. https://karpathy.ai/zero-to-hero.html

2. https://deeplearning.ai

3. https://fast.ai


Thank you for your valuable insights. I am currently working towards becoming an AI engineer in the LLM field and appreciate the insider's perspective.

Could you provide more information about the knowledge and experience your company seeks for that position? You mentioned it's challenging to find individuals with certain skills. Could you elaborate on what those skills are?


What would be the title of these positions? Something like "Prompt engineer", or more like fine tuning open models?


Machine learning engineer. Other titles have AI engineer in them. Roles encompass everything needed to try to get open LLMs to do a specific task, so hosting, evaluation, prompt tweaks, writing harnesses. We do some data generation and fine tuning but it is less common.


Prompt engineering was novel 15 months ago, lol. You can't still be on that; a high schooler or another LLM can do that.


For every 1000 prompt engineers who are doing little more than hooking up an API to a front end and a bit of prompt engineering, there's really only 1 decent engineer who can put together things like RAG and fine tuning properly.

I'm yet to see a truly strong LLM application where the results are significantly better than a smart person with chatGPT open and a bit of patience to craft some prompts.

What are the most impressive things available that are not just chatGPT?


Kind of a specialized purpose, but I'm working on a GPT that takes an image prompt, creates the image using DALL-E 3, and generates an Etsy title, tags, and description. I'm trying to figure out the last-mile bits, which will be: upload as a digital download on Etsy, and upload to Printify/Printful, then you can push that to Etsy...

I'd also like a way to take a Printify listing and create duplicates for variations, like design 1 on a unisex tee, unisex hoodie, toddler tee, kids tee, and baby onesie; it's very tedious to do this now...

Some of this I can do via Python or PHP and APIs, but the current workflow is DALL-E 3 on Bing -> Canva for fixes (bg removal and upscaling) -> {Printify -> Etsy}|{Etsy} ...with some GPT back and forth for title, tags, description.

If I can say "create a pumpkin pie vector graphic in the shape of pacman for thanksgiving party decor" and have it push title, tags, picture (upscaled picture) to etsy on step 2...that saves me like 15 minutes or more per listing.


How do you avoid generating a mountain of low quality listings?


Same problem as data science jobs have had for the last decade:

Job requirements are for applied researchers.

Job work is Excel/OpenAI wrangling.


So, the same as software engineers? Job (interview) requirements are writing infinitely scalable systems and memorizing the implementation of some sort you haven't used since undergrad, actual work is attaching 20 user analytics properties to every button click and gluing legos/APIs together.


Some people are promoting the title "AI Engineer", which I think is a sensible title... kind of like "full stack developer", not defined by a single technology, but requiring knowledge of a number of technologies (LLMs especially) and being able to put those together into a working product.

Having worked with LLMs a lot I would NOT agree with people who call them "just another tool". Some people are experts working with cryptography, or creating scalable distributed systems, or working on network protocols, etc... these are real and specialized skills. Working with LLMs is similar. Starting from a foundation of "can get stuff working" (full stack) is awfully important, and I don't think LLMs reward narrow specialization.

Also prompt engineering is real, and I don't think it's going anywhere. Working in collaboration with domain experts to do that prompting is essential, but a lot of output issues benefit from a combination of both prompting and changes to pipeline, text processing, and other code-based approaches. There's real benefit to being good with words, a close reader, able to get in the head of the LLM, learning its fixations and misconceptions, templating thought processes, etc. Ideas that prompt engineering will disappear with better models and fine tuning are, IMHO, naive and misunderstand the interplay of prompt and LLM. If you want to get something from an LLM you will still need to know how to ask!
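As a sketch of that prompting-plus-pipeline interplay: the prompt requests a strict format, and ordinary code validates and repairs the output rather than hoping a better prompt alone fixes every case. `call_llm` here is a stub standing in for a real model call, and the label set is invented:

```python
# Toy classifier showing prompt work plus code-side text processing.
# call_llm is a stub; a real implementation would hit a model API.

import json

def make_prompt(ticket: str) -> str:
    return (
        "Classify the ticket as 'billing', 'technical', or 'other'. "
        'Reply with JSON like {"label": "billing"}.\n\n'
        f"Ticket: {ticket}"
    )

def call_llm(prompt: str) -> str:
    # Stub response, including the kind of chatter models wrap JSON in.
    return '{"label": "billing"} Sure, here is the classification!'

def classify(ticket: str) -> str:
    raw = call_llm(make_prompt(ticket))
    # Code-side repair: slice out the first {...} span instead of re-prompting.
    start, end = raw.find("{"), raw.find("}") + 1
    label = json.loads(raw[start:end])["label"]
    if label not in {"billing", "technical", "other"}:
        label = "other"  # fall back rather than propagate a bad value
    return label
```

The prompt constrains the model, and the pipeline code catches what the prompt can't; neither half is sufficient on its own, which is the point being made above.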


I'm very skeptical about the possibility of building an application using LLMs that meaningfully adds to the utility of the base model and isn't piss easy to copy. If your product is just a system prompt you are adding very little value.


Sure, a system prompt alone isn't much of a product and is only the barest of what you can do with LLMs. But there's _so_ much more you can do with an LLM. Developing something more complex is what the position should be about. Integrating it into workflows is the most obvious example, and the only way to do that is just a bunch of integration work, like software engineers do.


I'd be pretty concerned about integrating a tool like LLMs that are known to be flakey into a more complex product. Seems like we're asking for trouble there.


People make mistakes and "workflows" typically involve people, so both the LLM and human output often needs review. Which is complicated and involves a lot of UI. More engineering work!


i see many startups deeply understanding end-customer-workflows and usecases, and then experimenting with how LLM may improve that.

customer-service, code-assist, call-center are a few areas which show early promise wherein customers are willing to pay for the added value. outside of these areas, i am yet to see breakthrough applications for which people are willing to pay. let me know if this is mistaken.


I don't think there is any real demand for high-end LLM researchers/mathematicians etc. who would really build LLMs.

On the other hand, only a small number of companies have the money and interest to actually at least fine-tune LLMs. They have their people already in place.

Then you have people who should integrate LLMs, and tbh that's something everyone can do who was able to integrate any other API. I don't think there will be middle-class companies looking for LLM experts.


> I don't think there is any real demand for high-end LLM researchers/mathematicians etc. who would really build LLMs

You couldn’t be more wrong. Such people make 7 digit salaries, either at FAANGs or at unicorn startups. It’s the hottest job in ML today. Though very few people can actually build a GPT-5 level model.


Yes, but that is definitely a winner-takes-all type position. Top 0.1% get the 7 figure salaries, top 1% get a good job -- not sure about the other 99% (numbers for illustrative purposes only).


It's more like top 10% get 7 digit salaries, and the remaining 90% get by on upper 6 digits. Keep in mind there are probably only a couple thousand of "high end LLM researchers/mathematitions etc. who would really build LLMs" in the world.


What would the job titles be for these types of jobs?


ML Researcher or ML Scientist.


Unless you're talking about research, LLMs are considered a tool, not a new field.


I'd argue the work field is developing:

- UX improvement / service delivery

- Reliability / QA

- Implementation


I have mostly worked on creating automation tools in Python (scrapers, bots, etc.) and since the inception of ChatGPT I have tried to use LLMs in existing solutions, obviously for automation purposes. Recently I integrated a few custom GPTs with web applications, as currently GPTs do not support monetization, nor can you distribute them to others (unless the other person has a GPT-4 account). If you opt for this route then there is definitely a need. Frameworks like CrewAI and LangChain agents are going to help businesses automate their internal processes.


In the broader field of machine learning (I would leave "Artificial Intelligence" for the marketers), I don't think there are as many jobs as there is hype.

A handful of companies are building LLMs. The rest are doing internal fine-tuning of available models.

LLMs have not proved reliable enough to be left alone in production.

Hence adoption is limited to areas where they don't have significant impact, such as the "change the tone, rewrite, summarize" context menus in many products.


Are you asking about creating LLMs or using LLMs to create apps?


The growth and opportunities will be at the application layer. Training and fine-tuning LLMs doesn't really have that much value. Closed-source (OpenAI) models and open-source models will be the foundation. As I said, running some training/fine-tuning on top of these models and knowing how to do evaluations won't have that much value. Learn how to build useful applications on top of these things instead.


The most successful use of generative AI is a chatbot… once you can use a model with natural language you need less rich UIs. I think UIs will become simpler and the value will be in the proprietary data you can manage to grab to adapt existing models to some valuable use cases.


chat is the new UI... imagine going to a bank website and you just enter in the chat "I need to view transactions and pause my debit card". It may then create an action list and take you to the two links in order, without needing to find a button or link. That saves a lot of time... In the future with AGI/ASI, though... well, AI can be the entire app... you want tax software with the look/feel of QuickBooks or the same exact functionality? no problem... you want a 3d AI avatar that has all those things but interacts via speech? just ask for it... you want a VR version of the original 8-bit Legend of Zelda? just ask.

Any software will just be created without any care for frameworks, platform, hosting, etc... it'll be hosted in the 'brain' of the AI.
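The action-list idea in the bank example could be sketched like this. A real version would use an LLM to match intent; here simple keyword rules (and made-up links) stand in for it:

```python
# Toy "chat as UI" router: map a free-text request to known app actions and
# return deep links. Keyword rules stand in for LLM-based intent matching,
# and the actions/links are invented for illustration.

ACTIONS = {
    "view_transactions": ("/account/transactions", ["transactions", "history"]),
    "pause_card": ("/cards/pause", ["pause", "freeze", "lock"]),
}

def plan(message: str) -> list[str]:
    msg = message.lower()
    # Collect a link for every action whose trigger words appear in the message.
    return [link for link, words in ACTIONS.values()
            if any(w in msg for w in words)]

links = plan("I need to view transactions and pause my debit card")
```

The chat front end then walks the user through `links` in order, which is exactly the "action list instead of hunting for buttons" flow described above.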


>> Training and fine-tuning LLMs doesn't really have that much value.

I'm really not sure about that... I'm seeing people talking about LLMops (as a role) and things like that. Well, let's see.


My first full time role was to build chat bots using RASA, Dialogflow and some other similar tools. Now the past year or so, it has largely become a GenAI role, making bots, and other exciting tools. So far things are kind of great in my current role, albeit people tend to have unrealistic expectations of these models.


Hiring in the LLM field varies, but there are opportunities if you're interested in law. It's hard to say if it'll boom like data science. But if you love law and are ready to learn, it could be worth it.


There is certain to be good money made in applying all this to business use cases.

…but I'm not sure that necessarily makes it a good foundation for a CS career, especially with much of this moving towards almost no-code prompting territory.


“Do you think they will grow in the coming years like it did for data scientists?”

Maybe. But then the demand might also shrink quickly after a couple of years. Is it worth that risk? Only you can decide.


AI Engineer is going to become about as descriptive as "full stack engineer"



