
https://oxide-and-friends.transistor.fm/episodes/mr-nagles-w...

Oxide and Friends episode on it! It's quite good.


Very on brand: Oxide's core proposition is to actually build a new (server) OS and hardware, so they question and polish many of the traditional protocols and standards from the golden era.


Not relevant? That’s the best part! Spill it!


The linked issues are quite interesting. Why does Go have to page in so much memory for the GoString? Is this for some sort of optimization? https://github.com/mullvad/mullvadvpn-app/pull/6727

If anyone else is more familiar with Go (I only really do Rust): is there no way to prevent stack smashing on goroutines? https://github.com/mullvad/mullvadvpn-app/pull/7728 I understand that goroutines have a smaller stack size (the whole green-thread problem), but is there really no way to fix this?


It was solved in another PR by using an alternate signal stack, set up with the SA_ONSTACK flag.
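For readers who haven't used it: a signal handler normally runs on the stack of the thread that faulted, which is exactly what you can't rely on when a small goroutine stack overflows. A minimal sketch of the mechanism in Rust with the libc crate (illustrative only, not the actual Mullvad patch; the handler and sizes are made up):

```
use std::{mem, ptr};

// Size for the alternate stack; libc::SIGSTKSZ would also work.
const ALT_STACK_SIZE: usize = 64 * 1024;

extern "C" fn on_segv(_sig: libc::c_int) {
    // We are on the alternate stack now, so this runs even if the
    // faulting thread's own (tiny) stack is exhausted.
    let msg = b"caught SIGSEGV on the alternate stack\n";
    unsafe {
        libc::write(2, msg.as_ptr().cast(), msg.len());
        libc::_exit(1);
    }
}

fn install_altstack_handler() {
    unsafe {
        // Register a dedicated stack for signal delivery on this thread.
        let ss = libc::stack_t {
            ss_sp: libc::malloc(ALT_STACK_SIZE),
            ss_flags: 0,
            ss_size: ALT_STACK_SIZE,
        };
        libc::sigaltstack(&ss, ptr::null_mut());

        // SA_ONSTACK tells the kernel to deliver SIGSEGV on that stack
        // instead of the (possibly tiny) goroutine stack.
        let mut sa: libc::sigaction = mem::zeroed();
        sa.sa_sigaction = on_segv as usize;
        sa.sa_flags = libc::SA_ONSTACK;
        libc::sigaction(libc::SIGSEGV, &sa, ptr::null_mut());
    }
}

fn main() {
    install_altstack_handler();
    // ... code that might fault goes here ...
}
```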


Last I heard, for the private compute features they were racking and stacking M2 Mac Pros.


I honestly forgot they still made the Mac Pro. Amazing that they have these ready to ship on their website. But at a 50% premium over similar but faster Mac Studio models, what is the point? You can't usefully put GPUs in them as far as I know. You'd need some other PCIe use case for it to make sense.


All the PCIe lanes in that machine combined can do over 1 terabit per second. It would be quite the networking beast.


The M2 Ultra has 32 external PCIe lanes, 8 of which are dedicated to the SSDs. That leaves only 24 lanes for the 7 slots. That's an eighth of what you'd get from an EPYC, which is the kind of thing a normal user would put in a rack if they did not need to run macOS.


Listening to the podcast episode on this was incredible. It’s shocking to me that hardware manuals can be inaccurate. I thought only software devs could write bad docs. /s


I listened to the wrap-up episode… I didn't realize it was a different bug…


Why does Cloudflare allow unwraps in their code? I would've assumed they'd have Clippy lints stopping that sort of thing. Why not just match with { Ok(value) => {}, Err(error) => {} }? The function already has a Result return type.

At the bare minimum they could've used an expect("this should never happen; if it does, the database schema is incorrect").

The whole point of errors as values is preventing this kind of thing… It wouldn't have stopped the outage, but it would've made it easier to diagnose.

If anyone at Cloudflare is here, please let me into that codebase :)
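For illustration, a minimal sketch of the match approach suggested above; every name here is hypothetical, not Cloudflare's actual code:

```
use std::io;

#[derive(Debug)]
enum ConfigError {
    Schema(String),
}

#[derive(Debug)]
struct Features(Vec<String>);

// Hypothetical stand-in for the fallible call that was unwrap()ed.
fn load_feature_config() -> Result<Features, io::Error> {
    Ok(Features(vec!["rule-a".into()]))
}

// The enclosing function already returns a Result, so the failure can
// be surfaced as a value instead of a panic.
fn load_features() -> Result<Features, ConfigError> {
    match load_feature_config() {
        Ok(features) => Ok(features),
        Err(error) => Err(ConfigError::Schema(format!(
            "feature file does not match schema: {error}"
        ))),
    }
}

fn main() {
    match load_features() {
        Ok(f) => println!("loaded {} features", f.0.len()),
        Err(e) => eprintln!("config rollout rejected: {e:?}"),
    }
}
```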


Not a Cloudflare employee, but I do write a lot of Rust. The number of things that can go wrong with any code that makes a network call is staggeringly high. unwrap() is normal during the development phase, but there are times I leave an expect() in for production, because sometimes there's no way to move forward.


Yeah it seems likely that even if there wasn't an unwrap, there would have been some error handling that wouldn't have panicked the process, but would have still left it inoperable if every request was instead going through an error path.


I'm in a similar boat; at the very least an expect can give hints as to what happened. However, this can also be problematic if you're a library developer. Sometimes Rust is expected to never panic, especially in situations like WASM. This is a major problem for companies like Amazon Prime Video, since they run in a WASM context for their TV app. Any panic crashes everything. Personally, I usually either create a custom error type (preferred) or erase it away with Box<dyn Error> (when there's no other option). Random unwraps and expects haunt my dreams.
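A rough sketch of the two options mentioned above, assuming the thiserror crate for the custom type (all names hypothetical):

```
use std::error::Error as StdError;

// Option 1 (preferred): a concrete error type the caller can match on.
#[derive(Debug, thiserror::Error)]
enum VideoError {
    #[error("decode failed: {0}")]
    Decode(String),
    #[error("io error")]
    Io(#[from] std::io::Error),
}

fn decode_frame(buf: &[u8]) -> Result<(), VideoError> {
    if buf.is_empty() {
        return Err(VideoError::Decode("empty buffer".into()));
    }
    Ok(())
}

// Option 2: erase the type when callers only need to log the error.
fn decode_any(buf: &[u8]) -> Result<(), Box<dyn StdError + Send + Sync>> {
    decode_frame(buf)?; // `?` boxes the concrete error via From
    Ok(())
}

fn main() {
    if let Err(e) = decode_any(&[]) {
        eprintln!("decode failed without panicking: {e}");
    }
}
```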


At the risk of sounding harsh, that's a huge failure in your modeling of invariants that should not be permitted in development.

Permitting it in development is why one ends up in the position of having to use an `expect()` in production code, because your API surfaces are wrong and can’t model your actual invariants.


unwrap() is only the most superficial part of the problem. Merely replacing `unwrap()` with `return Err(code)` wouldn't have changed the behavior. Instead of "error 500 due to panic" the proxy would fail with "error 500 due to $code".

Unwrap gives you a stack trace, while a returned Err doesn't, so simply using a Result for that line of code could have made it even harder to diagnose.

`unwrap_or_default()` or other ways of silently eating the error would be less catastrophic immediately, but could still end up breaking the system down the line, and likely make it harder to trace the problem to the root cause.

The problem is deeper than an unwrap(), related to handling rollouts of invalid configurations, but that's not a 1-line change.
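To make the contrast concrete, a tiny sketch of the three behaviors being discussed (hypothetical config-parsing names, not Cloudflare's code):

```
use std::collections::HashMap;

// Hypothetical stand-in for parsing a feature configuration.
fn parse_config(raw: &str) -> Result<HashMap<String, bool>, String> {
    if raw.is_empty() {
        return Err("empty config".to_string());
    }
    Ok(HashMap::from([(raw.to_string(), true)]))
}

fn main() {
    let raw = "";

    // 1. unwrap(): panics with a stack trace; the process dies loudly.
    // let config = parse_config(raw).unwrap();

    // 2. Returning/propagating the Err: no panic, but every request
    //    now takes the error path, so the proxy still serves 500s.
    match parse_config(raw) {
        Ok(config) => println!("loaded {} flags", config.len()),
        Err(code) => eprintln!("error 500 due to {code}"),
    }

    // 3. unwrap_or_default(): silently proceeds with an empty config,
    //    deferring the breakage and hiding the root cause.
    let config = parse_config(raw).unwrap_or_default();
    assert!(config.is_empty());
}
```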


We don't know what the surrounding code looks like, but I'd expect it handles the error case that's expressed in the type signature (unless they `.unwrap()` there too).

The problem is that they didn't surface a failure case, which means they couldn't handle rollouts of invalid configurations correctly.

The use of `.unwrap()` isn't superficial at all -- it hid an invariant that should have been handled above this code. The failure to correctly account for and handle those true invariants is exactly what caused this failure mode.


And the error magically disappears when the function returns it?


It doesn’t disappear, it forces you to handle it.


Propagating upwards is a valid way of handling it, and often the correct answer.

There needs to be something at the top level that can handle a crashing process.


You mean like kubernetes that restarts your program when it crashes?


Can Rust handle global panics?

Or can an unwrap be stopped?

This is just a normal Tuesday for languages with Exception and try/catch.
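For what it's worth, yes: Rust can intercept panics at a boundary, as long as the binary isn't compiled with panic = "abort". A minimal sketch:

```
use std::panic;

fn main() {
    // A global panic hook runs before unwinding starts; good for logging.
    panic::set_hook(Box::new(|info| {
        eprintln!("panic observed: {info}");
    }));

    // catch_unwind stops the unwind at this boundary, much like try/catch.
    let result = panic::catch_unwind(|| {
        let missing: Option<u32> = None;
        missing.unwrap() // panics here
    });
    assert!(result.is_err());
    println!("still alive after the unwrap panicked");
}
```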


> This is just a normal Tuesday for languages with Exception and try/catch.

Yes, unfortunately, random stack unwinds and the weird state bugs that result are a normal Tuesday for languages with (unchecked) exceptions and try/catch.


Would you get backtraces with this implementation? I've been increasingly frustrated with axum. It hits my custom error, where I print something like "sqlx database failed", but when there's lots of traffic it's hard to determine which of the hundreds of queries actually failed. What I really want is the top part of a backtrace, or at least a line number. I know anyhow gives backtraces, but will it actually be useful when used like this?

Anyone else have ideas on how to solve this?
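One idea, sketched without axum or anyhow specifics: capture a std::backtrace::Backtrace when the error is constructed, so the log shows where the failing query came from (requires RUST_BACKTRACE=1 at runtime for capture() to record frames):

```
use std::backtrace::Backtrace;

#[derive(Debug)]
struct DbError {
    msg: String,
    trace: Backtrace,
}

impl DbError {
    fn new(msg: impl Into<String>) -> Self {
        Self {
            msg: msg.into(),
            // Records the call site of the error's construction.
            trace: Backtrace::capture(),
        }
    }
}

// Hypothetical query wrapper standing in for the sqlx call.
fn run_query(sql: &str) -> Result<(), DbError> {
    Err(DbError::new(format!("sqlx database failed: {sql}")))
}

fn main() {
    if let Err(e) = run_query("SELECT 1") {
        // Log the message plus where the error was created.
        eprintln!("{}\n{}", e.msg, e.trace);
    }
}
```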


I do not see problems with axum.

We wrote a macro (a tt muncher) that gives strongly typed yet DRY per-method errors, splits errors into a public part vs an internal log-only part (with backtrace-crate trace capture), and has full OpenAPI support (via schemars). We do not use anyhow.

The whole article is rot and bad advice.


mind sharing a link?


The end result is here: https://zo-devnet.n1.xyz/docs - well-documented, strongly typed errors for the API.

The most evolved endpoint is https://zo-devnet.n1.xyz/docs#tag/default/post/action - it mixes binary encoding, headers, and numeric error embedding (HTTP goes to a binary tx format).

All strongly typed without duplicated boilerplate.

Inside: ad hoc error assembly - display_doc, derive_more, backtrace - no anyhow, no thiserror, no snafu.

We fail fast and panic a lot, covered by proptests (moving toward fuzz tests).

We used https://github.com/target-san/scoped-panic-hook to catch panics as exceptions.

I was thinking of using https://github.com/iex-rs/lithium

Panics = bugs or undefined behaviour.


It's used like this:

```
pub async fn account_pubkey(
    GetAccountPubkey { account_id }: GetAccountPubkey,
    State(st): State<AppState>,
) -> Response!(RegistrationKey; not_found: UserNotFound) {
```

```
pub async fn action(
    SubmitAction {}: SubmitAction,
    State(st): State<AppState>,
    TypedHeader(content_type): TypedHeader<headers::ContentType>,
    body: axum::body::Bytes,
) -> Result<
    axum::body::Bytes,
    ApiError!(unsupported_media_type: AcceptedMediaType, payload_too_large: PayloadTooLarge),
> {
```

As you can see, the limitation of a declarative macro is a single entity per invocation; to do more, you need a proc macro.

The default response is JSON:

```
#[macro_export]
macro_rules! Response {
    ($ty:ty) => {
        Result<::axum::Json<$ty>, $crate::http::error::Error<()>>
    };
    ($ty:ty; $($var:ident : $e:ty),*) => {
        Result<::axum::Json<$ty>, nord_core::ApiError!($($var : $e),*)>
    };
}
```

Use ApiError! for other MIME types:

```
#[macro_export]
macro_rules! ApiError {
    ( $($var:ident : $e:ty),* $(,)? ) => {
        nord_core::ApiError!({ $($var : $e,)* })
    };
    () => {
        $crate::http::error::Error::<()>
    };
    ({ $($var:ident : $e:ty),* $(,)? }) => {
        nord_core::ApiError!(
            @internal
            bad_request: $crate::http::error::Infallible,
            not_found: $crate::http::error::Infallible,
            forbidden: $crate::http::error::Infallible,
            unsupported_media_type: $crate::http::error::Infallible,
            payload_too_large: $crate::http::error::Infallible,
            not_implemented: $crate::http::error::Infallible,
            | $($var : $e,)*
        )
    };
    (
        @internal
        bad_request: $bad_request:ty,
        not_found: $not_found:ty,
        forbidden: $forbidden:ty,
        unsupported_media_type: $unsupported_media_type:ty,
        payload_too_large: $payload_too_large:ty,
        not_implemented: $not_implemented:ty,
        | // empty
    ) => {
        $crate::http::error::Error<(
            $bad_request,
            $not_found,
            $forbidden,
            $unsupported_media_type,
            $payload_too_large,
            $not_implemented,
        )>
    };
    (
        @internal
        bad_request: $_:ty,
        not_found: $not_found:ty,
        forbidden: $forbidden:ty,
        unsupported_media_type: $unsupported_media_type:ty,
        payload_too_large: $payload_too_large:ty,
        not_implemented: $not_implemented:ty,
        | bad_request: $bad_request:ty, $($rest:tt)*
    ) => {
        nord_core::ApiError!(
            @internal
            bad_request: $bad_request,
            not_found: $not_found,
            forbidden: $forbidden,
            unsupported_media_type: $unsupported_media_type,
            payload_too_large: $payload_too_large,
            not_implemented: $not_implemented,
            | $($rest)*
        )
    };

    // ... a crazy amount of repeated code in the recursive tt muncher
}
```

Errors can be custom:

```
#[derive(Debug, serde::Serialize, serde::Deserialize, schemars::JsonSchema)]
pub struct AcceptedMediaType {
    pub expected: String,
}

impl ExpectedMimeType for AcceptedMediaType {
    fn expected(&self) -> &str {
        &self.expected
    }
}

impl AcceptedMediaType {
    pub fn new(value: headers::ContentType) -> Self {
        Self {
            expected: value.to_string(),
        }
    }
}
```

Each method has its own error as needed.

OpenAPI integration:

```
impl<ST: StatusTypes> aide::OperationOutput for Error<ST> {
    type Inner = Self;

    fn inferred_responses(
        cx: &mut aide::generate::GenContext,
        op: &mut aide::openapi::Operation,
    ) -> Vec<(Option<u16>, aide::openapi::Response)> {
        [
            <ST::BadRequest as OperationOutputInternal>::operation_response(cx, op)
                .map(|x| (Some(400), x)),
              ....
            <ST::UnsupportedMediaType as OperationOutputInternal>::operation_response(cx, op).map(
                |mut x| {
                    use aide::openapi::{
                        Header, ParameterSchemaOrContent, ReferenceOr, SchemaObject,
                    };
                    let header = Header {
                        description: Some("Expected request media type".into()),
                        style: Default::default(),
                        required: true,
                        deprecated: None,
                        format: ParameterSchemaOrContent::Schema(SchemaObject {
                            json_schema: schemars::schema::Schema::Object(
                                schemars::schema_for!(String).schema,
                            ),
                            external_docs: None,
                            example: None,
                        }),
                        example: Some(serde_json::json!(
                            mime::APPLICATION_OCTET_STREAM.to_string()
                        )),
                        examples: Default::default(),
                        extensions: Default::default(),
                    };
                    x.headers
                        .insert(header::ACCEPT.to_string(), ReferenceOr::Item(header));
                    (Some(415), x)
                },
            ),
            ...
            <ST::NotImplemented as OperationOutputInternal>::operation_response(cx, op)
                .map(|x| (Some(501), x)),
        ]
        .into_iter()
        .flatten()
        .collect()
    }
}
```

User vs internal errors - tracing:

```
impl<ST: StatusTypes> IntoResponse for Error<ST>
where
    ST: StatusTypes,
{
    fn into_response(self) -> axum::response::Response {
        let status = self.status_code();
        match self {
            Self::Internal(error) => {
                let error = &error as &dyn std::error::Error;
                tracing::error!(error, "internal error during http request");
                (status, Json("INTERNAL SERVER ERROR")).into_response()
            }
            Self::Forbidden(e) => (status, Json(e)).into_response(),
            Self::UnsupportedMediaType(e) => {
                let value = HeaderValue::from_str(e.expected());
                let mut resp = (status, Json(e)).into_response();
                if let Ok(value) = value {
                    resp.headers_mut().insert(header::ACCEPT, value);
                }
                resp
            }
            Self::PayloadTooLarge(e) => (status, Json(e)).into_response(),
            ...
        }
    }
}

```

And type-level sugar:

```
mod typelevel {

    /// No client error defined; this type can't be constructed.
    #[derive(Debug, Serialize, Deserialize)]
    pub enum Infallible {}

    impl ExpectedMimeType for Infallible {
        fn expected(&self) -> &str {
            unreachable!("Infallible")
        }
    }

    pub trait ExpectedMimeType {
        fn expected(&self) -> &str;
    }

    pub trait StatusTypes {
        type BadRequest: serde::Serialize + OperationOutputInternal;
        type UnsupportedMediaType: serde::Serialize + OperationOutputInternal + ExpectedMimeType;
        ...
    }

    impl StatusTypes for () {
        type BadRequest = Infallible;
        type NotFound = Infallible;
... }

    impl StatusTypes for Infallible {
        type BadRequest = Infallible;
        type NotFound = Infallible;
.. }

    impl<
        BadRequest: serde::Serialize + OperationOutputInternal,
        NotFound: serde::Serialize + OperationOutputInternal,
        Forbidden: serde::Serialize + OperationOutputInternal,
        UnsupportedMediaType: serde::Serialize + OperationOutputInternal + ExpectedMimeType,
        PayloadTooLarge: serde::Serialize + OperationOutputInternal,
        NotImplemented: serde::Serialize + OperationOutputInternal,
    > StatusTypes
        for (
            BadRequest,
            NotFound,
            Forbidden,
            UnsupportedMediaType,
            PayloadTooLarge,
            NotImplemented,
        )
    {
        type BadRequest = BadRequest;
        type NotFound = NotFound;
        type Forbidden = Forbidden;
        type UnsupportedMediaType = UnsupportedMediaType;
        type PayloadTooLarge = PayloadTooLarge;
        type NotImplemented = NotImplemented;
    }
    ...
}
```

```
use typelevel::*;

#[derive(Debug)]
pub enum Error<ST: StatusTypes> {
    Internal(Box<dyn std::error::Error + Send + Sync>),
    BadRequest(ST::BadRequest),
    ...
}

impl std::fmt::Display for Infallible {
    fn fmt(&self, _: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        match *self {}
    }
}

impl<ST: StatusTypes> Error<ST> {
    pub fn internal(value: impl std::error::Error + Send + Sync + 'static) -> Self {
        Self::Internal(Box::new(value))
    }

    pub fn bad_request(value: ST::BadRequest) -> Self {
        Self::BadRequest(value)
    }

    pub fn not_found(value: ST::NotFound) -> Self {
        Self::NotFound(value)
    }

    pub fn forbidden(value: ST::Forbidden) -> Self {
        Self::Forbidden(value)
    }
```


I understand ffmpeg being angry at the workload, but this is how it is with large open source projects. Ffmpeg has no obligation to fix any of this. Open source is a gift and is provided as-is. If Google demanded a fix I could see this being an issue; as it is right now it just seems like a bad look. If they want compensation then they should change the model; there's nothing wrong with that. Google found a bug and they reported it. If it's a valid bug then it's a valid bug, end of story. Software owes it to its users to be secure, but again, it's up to the maintainers whether they share that belief. Maybe this pushes Google to make an alternative, which I'd be excited about.


> Software owes it to its users to be secure.

There is no such obligation.

There is no warranty and software is provided AS-IS explicitly by the license.


I disagree; as software engineers we owe it to the craft to create correct software, especially when we intend to distribute it. Anything less is poor taste.

You bring up licensing. I'm not talking legally; I'm talking about a social contract.


The choice of license is also a partial descriptor of the social contract. If I wanted to work on it for "customers" I would sell it. I don't owe you anything otherwise.

The social contract is “here is something I’ve worked on for free, and it is a gift. Take it or leave it.”

You want me to work on something for it? FYPM


For GP's sake, even before you make it to FYPM levels of angry, you will be in over your head. It's too much work. I remember being very early in my career and feeling like GP does. This is very easily more than a full-time job. The demands people will make of you and the attitudes they will use to do it will make you crazy.


> Google found a bug

That does not impact their business or their operations in any way whatsoever.

> If it's a valid bug then it's a valid bug end of story.

This isn't a binary. It's why CVEs have a whole sordid scoring system to go along with them.

> Software owes it to its users to be secure

ffmpeg owes me nothing. I haven't paid them a dime.


> ffmpeg owes me nothing. I haven't paid them a dime.

That is true. At the same time Google also does not owe the ffmpeg devs anything either. It applies both ways. The whole "pay us or we won't fix this" makes no sense.


> Google also does not owe the ffmpeg devs anything either.

Then they can stop reporting bugs with their asinine one-size-fits-all "policy." It's unwelcome and unnecessary.

> It applies both ways.

The difference is I do not presume things upon the ffmpeg developers. I just use their software.

> The whole "pay us or we won't fix this" makes no sense.

Pay us or stop reporting obscure bugs in unused codecs found using "AI" scanning, or at least, if you do, change your disclosure policy for those "bugs." That's the actual argument, and it is far more reasonable.


> Then they can stop reporting bugs with their asinine one size fits all "policy." It's unwelcome and unnecessary.

Right, they should just post the 0days on their blog.


I for one welcome it. I want to know if there are some vulnerabilities in the software I use.


It doesn't matter whether it affects their business or not. They found an issue and they reported it. Ffmpeg could perhaps request that they report it privately. Google has a moral duty to report the bug.

Software should be correct and secure. Of course this can’t always be the case but it’s what we should strive for. I think that’s baseline


> That does not impact their business or their operations in any way whatsoever.

I don't know what tools and backends they use exactly, but working purely by statistics, I'm sure some place in Google's massive cloud compute empire is relying on ffmpeg to process data from the internet.


And they're processing old LucasArts codec videos with it? Which is the specific bug report in question.


It's unlikely the specific codec is the issue, but the bug report suggests the code path could be hit by a maliciously crafted payload, since ffmpeg does file fuzzing. They almost certainly have ffmpeg stuff that touches user-submitted videos.


They're probably not manually selecting which codecs and codec parameters to accept and sticking to the default ones instead.

Plus, this bug was reported by AI, so it was as much a proof of concept/experiment/demonstration of their AI security scanner as it was an attempt to help secure ffmpeg


>Ffmpeg has no obligation to fix any of this

I read this as: nobody wants CVEs open on their product, so you might feel forced to fix them. I find it more understandable if we talk about web frameworks: WordPress doesn't want security CVEs open for months or years, or users would be upset that they introduce new features while neglecting security.

I am a nobody, and whenever I find a bug I put in extra work to attach a fix to the same issue. Google should do the same.


Why is there an onus on Google to fix this? Bug bounty hunters aren’t required to submit a patch even when the target is open source.

Now, should Google? Probably; it would be nice, but no one has to. The gift from Google is the discovery of the bug.


Hm, does this mean Triangulation is an NSA creation, or are they just using it?


Oh wow, interesting, so the rewrite only went half as fast? I know Cloudflare uses eBPF quite heavily, whereas the Great Firewall uses DPDK. I wonder if Cloudflare's motivation is just to run it more easily on GCP. Any Cloudflare employees here?


Pretty much, which was incredible for a half-day rewrite, learning eBPF in Rust included. The effort-to-result ratio is simply incredible. A few cleanups and optimizations later, I was pretty much convinced I would not need to touch DPDK again (so was the company). Following these experiments, I wrote some actual production-grade eBPF routers at this company that are in production, much more complex, but still able to reach 200 Mpps on a $500 CPU (EPYC 9015).

As for why Cloudflare uses eBPF where the GFW uses DPDK I can see a few reasons:

- DPDK was the only game in town when the GFW started, while eBPF was the hot new thing for Cloudflare's recent endeavors. The GFW did not have any choice.

- Cloudflare has a performance focus, but still has a bit of "hardware is cheap, engineers are expensive", making eBPF more than fine.

- The GFW runs on dedicated machines on the traffic path, while I would expect most of Cloudflare's eBPF endeavors to run directly on mixed-workload machines. One of their first blog posts about it (dropping X Mpps) specifically says the reason was to protect an end machine directly on that machine, by preventing bad packets from reaching the kernel stack.

- Most of the operational advantages I already mentioned. The GFW is fine with "drop traffic if DPDK is down," but Cloudflare is absolutely not, making the operational simplicity a big win.

I bet Cloudflare does have quite a hefty DPDK application used for the traffic scrubbing part of their anti-ddos; but they don't publicize it because it's not as shiny as eBPF.

There are also other advantages to eBPF that make it better suited to a multi-product company like Cloudflare but don't weigh as much in a mono-product org like the GFW. Take for example the much easier testing, a dev env on any laptop, ... Or that eBPF probes can be written in Rust, getting the same featureful language to run in the kernel and in userspace (the classic combo is Go in userspace, C in kernelspace).
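For anyone who hasn't seen eBPF in Rust, here's a minimal sketch of an XDP program of the kind described above, written against the aya-ebpf crate (an assumption on my part; the parent doesn't say which framework they used). It drops every packet before the kernel network stack ever sees it:

```
#![no_std]
#![no_main]

use aya_ebpf::{bindings::xdp_action, macros::xdp, programs::XdpContext};

#[xdp]
pub fn drop_all(_ctx: XdpContext) -> u32 {
    // Returning XDP_DROP discards the packet at the driver level,
    // before the kernel network stack processes it.
    xdp_action::XDP_DROP
}

// eBPF targets are no_std, so a panic handler must be provided.
#[panic_handler]
fn panic(_info: &core::panic::PanicInfo) -> ! {
    loop {}
}
```

A real router would parse headers and make forwarding decisions in that same function; the point is that this compiles to kernel-verified bytecode instead of a userspace poll-mode driver like DPDK.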


Gotcha, that makes sense, thanks for sharing! Impressed that you were eventually able to build a full eBPF router to stand in for the DPDK one; I might have to look into it as a serious alternative. Differences between tunings and hardware types have been a nightmare with DPDK, especially with cloud deployments.

