Hacker News

Typst has been pretty amazing, and at my organization, we’re very happy with it. We needed to generate over 1.5 million PDFs every night and experimented with various solutions, from Puppeteer for HTML-to-PDF conversion to pdflatex and lualatex. Typst has been several orders of magnitude faster and has a lighter resource footprint. Also, templating the PDFs in LaTeX wasn’t a pleasant developer experience, but with Typst templates, it has been quite intuitive.

We’ve written more about this large-scale PDF generation stack in our blog here: https://zerodha.tech/blog/1-5-million-pdfs-in-25-minutes
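For readers who want a feel for what batch rendering with Typst can look like, here is a minimal Python sketch. It is not the pipeline from the blog post. It assumes the `typst` CLI is on the PATH and is recent enough to support `--input key=value` (values are then readable in the template via `sys.inputs`). The template path, the record fields, and the `id` key used for file names are all hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_command(template: Path, out_pdf: Path, fields: dict[str, str]) -> list[str]:
    """Build a `typst compile` invocation that passes data as --input pairs."""
    cmd = ["typst", "compile"]
    # Sort the keys so the command line is deterministic and easy to log.
    for key, value in sorted(fields.items()):
        cmd += ["--input", f"{key}={value}"]
    cmd += [str(template), str(out_pdf)]
    return cmd


def render_all(template: Path, out_dir: Path,
               records: list[dict[str, str]], workers: int = 4) -> list[Path]:
    """Compile one PDF per record, running several typst processes at once.

    Each record is assumed (hypothetically) to carry an "id" field that
    becomes the output file name.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    jobs = [(rec, out_dir / f"{rec['id']}.pdf") for rec in records]

    def run(job: tuple[dict[str, str], Path]) -> Path:
        rec, out_pdf = job
        # check=True raises CalledProcessError if typst exits with an error.
        subprocess.run(build_command(template, out_pdf, rec), check=True)
        return out_pdf

    # Threads are enough here: the heavy work happens in the child processes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run, jobs))
```

Inside the (hypothetical) template, a field passed this way would be read with something like `#sys.inputs.client`. Passing data as inputs avoids having to escape user data into Typst markup, at the cost of everything arriving as strings.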



This is a really great write up. Kudos for the obvious effort, both on the technical side and sharing the process with the rest of us.


I've never heard of anyone generating PDF documents at that pace. I'm working on a product for mass PDF reporting based on Puppeteer. With nightly jobs, caching, and parallel processing, the performance is OK.

https://www.cx-reports.com


I don't think we reach that volume, but we do a hefty number. What for? Invoices!

We're having problems because until now the PDFs have been generated by the ERP system, and it can't keep pace. I know there's a dev team working on a microservice for PDF generation, but I never thought about doing it with Typst.

I think I'm going to send them @mr-karan's link.


pdf/ps can easily be created so that the data for text and qr code fields stays in plain text. seems like y'all are focusing too much on the higher-level tools instead of what's right in front of you.


Company branding is an important aspect of PDF creation that many tools struggle to handle correctly. PDF documents often need to include logos, company colors, fonts, and other branding elements. Puppeteer is popular because you can control these aspects through CSS. However, Puppeteer can be challenging to work with for larger documents, or when your software needs to serve multiple clients with different requirements, because each change requires programming effort.


Yep, it's mostly about branding and control. It needs certain concrete layout and logos, and has to be relatively easy to change them.

We also render shipping labels in PDF, and we have to be VERY strict with that. But we're still not touching that, as that process is not as slow and problematic as the invoicing one.


how does having the value and name in an invoice as plain text affect branding? or do you mean their client branding is added to the invoice??

the end result when you open the file is still a regular pdf. it's just encoded with some areas unpacked


It's not about the plaintext, but about the layout, design and logos. At least in our case.


so why did you bring that up in a thread that had nothing to do with layout? I'm honestly confused


Have you tried reportlab as well? It was a good solution when I had to deal with a similar problem many moons ago. Not quite the same volume you have but still.


Having used ReportLab a bunch, I'd agree it's a good solution, but maybe on the more mediocre side of good. Generating LaTeX was a better solution for me, and while I haven't used it, Typst looks a lot better.


Just wondering: did your organisation contribute anything back to the project, or support it financially in any way?


Hey. Yes we did support them financially from our FOSS fund. We’ll be happy to do it once more as well.


What is the use case for generating that many PDFs?


Regulatory requirements mandate that. Stock brokers in India are required to generate this document called “Contract Notes” which includes all the trades done by the user on the stock exchanges. It also contains a breakdown of all charges incurred by the user (brokerage, various taxes etc). And this has to be emailed to every user before the next trading session begins.


Does the law specify PDF? I would have thought plain text or even HTML would be sufficient.


I don’t know the situation in India but brokers in Austria and Germany do the same. The law does not stipulate the format but PDF is what everyone uses. I assume it’s because it can be signed and archived and will outlast pretty much anything. You need to keep these for 7 years.


Yes, in India, the law mandates that ECNs (electronic contract notes) need to be digitally signed with a valid certifying authority. While it's true that XML/docx/xls files could also support digital signatures, I think PDFs are prevalent and also allow clients to verify this on their end quite easily.


PDF is less likely to contain executable malicious code than other formats.


Is it? More so than, say, a .csv file?

I was under the impression that PDFs are not that safe. I thought they could do things like execute a subset of PostScript and JavaScript.
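As a rough illustration of the JavaScript concern, here is a small Python sketch that counts PDF name objects commonly associated with active content (`/JavaScript`, `/JS`, `/OpenAction`, `/AA`, `/Launch`). It is only a heuristic under stated assumptions: names can be hex-escaped or live inside compressed streams, so an empty result does not prove a file is safe. The function name `find_active_markers` is made up for this example.

```python
import re

# Name objects that commonly indicate actions or scripts in a PDF.
ACTIVE_NAMES = (b"/JavaScript", b"/JS", b"/OpenAction", b"/AA", b"/Launch")


def find_active_markers(data: bytes) -> dict[str, int]:
    """Count occurrences of action-related names in raw, uncompressed PDF bytes."""
    counts: dict[str, int] = {}
    for name in ACTIVE_NAMES:
        # Require whitespace, a delimiter, or end of data after the name,
        # so that /JS does not also match a longer name such as /JSON.
        pattern = re.escape(name) + rb"(?=[\s/<\[(\]>]|$)"
        hits = len(re.findall(pattern, data))
        if hits:
            counts[name.decode("ascii")] = hits
    return counts
```

Usage would be along the lines of `find_active_markers(Path("report.pdf").read_bytes())`, with any non-empty result treated as a reason to look closer rather than as a verdict.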


Look, when it comes to corporate reporting, PDFs are pretty much the gold standard. Sure, they've got some potential security issues, but any decent company's IT department has them well in hand.

Think about it - you want your reports to look sharp, right? PDFs deliver that professional look every time, no matter who opens them or on what device. Plus, they've got all those nifty features like password protection and digital signatures that the big guys love.

CSV files? They're great for crunching numbers, but let's face it - they look about as exciting as a blank wall. Try sending a CSV report to the board of directors and watch their eyes glaze over.

So, yes, for reporting in a company that's got its security act together, PDFs are your best bet. They're like the well-dressed, security-savvy cousin of other file formats - they look good and keep things safe.


More than plain text? I doubt it.


common people don't talk about plain text. what are you? a hacker?!?


I mean I guess you don't care as long as the file is signed if it is just some regulatory stuff that barely anyone would ever read anyway.


There are of course way more efficient methods for generating templated pdfs than using a typesetter.


I'm interested to hear what you would propose.


Not sure what GP had in mind, but one can programmatically generate PDFs directly, without using something like Typst as a "middleman".
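To make "generate PDFs directly" concrete, here is a minimal Python sketch that assembles a one-page PDF by hand, with the text left in an uncompressed content stream. It is an illustration, not a production writer: it assumes Latin-1 text, uses the built-in Helvetica font, an A4 page, and hard-coded positions, and `build_pdf` is a hypothetical name.

```python
def build_pdf(lines: list[str]) -> bytes:
    """Assemble a one-page PDF whose text sits in an uncompressed content stream."""

    def esc(text: str) -> str:
        # Backslash and parentheses must be escaped inside PDF literal strings.
        return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")

    # Content stream: begin text, select font, set leading, move to the start
    # position, then show each line and advance to the next one.
    ops = ["BT", "/F1 12 Tf", "14 TL", "72 770 Td"]
    ops += [f"({esc(line)}) Tj T*" for line in lines]
    ops.append("ET")
    content = "\n".join(ops).encode("latin-1")

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 595 842] "
        b"/Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of this object, needed by xref
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    # Cross-reference table: one 20-byte entry per object, plus the free entry 0.
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)
```

Writing `build_pdf(["Contract Note", "Total charges: 12.50"])` to a file should give something a viewer can open, with both strings readable in a text editor. Fonts, images, line wrapping, and non-Latin text are where the effort grows, which is the usual argument for a library or a typesetter.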


Have you tried doing that? It’s no fun at all and far from easy. I don’t quite see a benefit in doing it without some utility.


Besides, the generation of PDF reports is usually decoupled from the templates, so you will have to work on your own "middleman".


Apache iText, for example.


I guess some webkit solution like wkhtmltopdf


How is that more efficient than Typst exactly?


I was reading the second sentence, and I knew it was Zerodha. It’s good to see more open source in your tech stack.


Thanks for your writeup, that was exceptionally well-presented.



