
There is a graph database that performs disk IO for database startup, backup, and restore as single-threaded, sequential, 8 KB operations.

On EBS it does at most 200 MB/s of disk IO, simply because EBS operation latency is about 0.5 ms even on io2. The disk itself can go much faster: disk benchmarks easily do multi-GB/s on nodes that have enough EBS throughput.
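The bound here is latency, not bandwidth: with one request in flight, each operation must complete before the next is issued, so throughput can never exceed block size divided by per-operation latency. A back-of-the-envelope sketch (the latency figures are illustrative assumptions, not measurements):

```python
# Single-threaded, sequential I/O: each op waits for the previous one,
# so throughput is capped at block_size / per_op_latency regardless of
# how fast the underlying device is.
BLOCK = 8 * 1024  # 8 KB per operation, as in the database above


def max_throughput_mb_s(latency_s: float) -> float:
    """Upper bound on single-threaded throughput in MB/s."""
    return BLOCK / latency_s / 1e6


for name, lat in [("EBS io2 (~0.5 ms/op)", 0.5e-3),
                  ("low-latency NVMe (~40 us/op)", 40e-6),
                  ("local NVMe (~10 us/op)", 10e-6)]:
    print(f"{name}: {max_throughput_mb_s(lat):.0f} MB/s")
```

Note that at a strict 0.5 ms per 8 KB op this model gives only ~16 MB/s; reaching the ~200 MB/s observed would imply an effective per-op latency closer to 40 µs, i.e. some read-ahead or batching is presumably happening underneath.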

On an instance-local SSD on the same EC2 instance it will happily saturate whatever the instance can do (~2 GB/s in my case).



What graph db is that?


neo4j


What is the cost of running Neo4j on aws vs using aws Neptune? Related to disk I/o?


It is not easy to compare neo4j and AWS Neptune directly, as the former does not exist as a fully managed AWS service. neo4j is available through the AWS Marketplace, but that most assuredly runs on an EC2 instance operated by neo4j (the company).

We run a modest graph workload (relatively small dataset-wise, but intense edge-traversal-wise) on Neptune that costs us slightly under USD 600 per month – that is before the enterprise discount, so in reality we pay USD 450-500 a month. We use Neptune Serverless, which bursts out from time to time, so monthly charges are averaged out across the spikes/bursts. The charges are for a serverless configuration of 3-16 NPUs.

Disk I/O stats are not available for Neptune, even less so for serverless clusters, and they would not be insightful anyway. The transactions-per-second rate is what I look at.


Tbh, I don't know. For us the switching cost alone would be pretty high. That said, ongoing maintenance is pretty high as well.


Just want to chime in: Zhenni, cofounder of PuppyGraph here. We created the first graph query engine that can sit on top of your relational databases (think Postgres, Iceberg, Delta Lake, etc.) and query your relational data as a graph using Cypher and Gremlin, without any ETL or a separate graph DB needed. It's much more lightweight and easy to spin up. Because we sit on top of column-based storage and our compute engine is distributed, we can achieve sub-second query speed across 1 billion nodes. Please check it out!



