Introducing facet: Reflection for Rust

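I have long been at war with Rust compile times.

Part of my solution was to buy my way into Apple Silicon dreamland, where builds are, well, faster. I feel the difference every time I SSH into an x86_64 server, even a nice 64-core one.

Another part was, of course, to get my hands dirty with Rust itself.

I wrote Why is my Rust build so slow?, a deep dive into Rust build performance that goes all the way down to rustc's self-profiling.

I wrote a whole series on nixpkgs, switched to earthly, left again when it died, and now, like everyone else, I humbly write Dockerfiles.

But no. I am not like everyone else. They said Rust wasn't friendly to dynamic linking, but I made it play nice with tools like dylo and rubicon, solving problems like "whoops, tokio thinks there's a different runtime in each dynamic object" along the way.

And I was able to ship the software that powers my website (it's called home, and it's open source now) as a collection of dynamic libraries. Each library made a natural container image layer, which was great for fast deploys. No code changes = re-used layer. Simple.

Then I stopped using dynamic linking for my blog, because I thought rustc's built-in support for dynamic linking might work for me. That meant ripping out all my customizations (going back to upstream tokio at last was a relief). But I soon realized that, haha, no, rustc's dynamic linking support does not work for me at all. I didn't want to go back either, so I decided to attack the problem from another angle.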

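Let those who are without syn…
-----------------------------

The main reason I care about build times is that I want to iterate quickly.

Despite Rust's "if it compiles, it probably runs, and if it runs, it probably does the right thing" ideal, I still want to make changes to my website and see the result quickly enough.

And when I'm done with changes locally and want to deploy them, CI needs to run fast! That way everything gets packaged into a container image and shipped around the world, to however many PoPs I've decided I can afford this month, and from there Kubernetes takes care of the rollout. Let's not sully this post with that story.

All of which means I end up building my website's software a lot! I've had plenty of opportunities to look at build timings! And, well, there are a few pretty big C dependencies (like zstandard and libjxl), some big Rust dependencies (like tantivy), and… a few more dependencies that keep showing up, like syn and serde.

I've done this homework before, in The virtue of unsynn: the syn crate is often on the critical path of builds. Causal profiling confirmed that if syn magically got faster, our builds really would get faster too.

And I say "our builds" because, comrade, chances are very high that your project depends on syn too.

My CMS, home, depends on syn 1 through 6 different paths…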



home on  HEAD (2fe6279) via 🦀 v1.89.0-nightly
❯ cargo tree -i syn@1 --depth 1
syn v1.0.109
├── const-str-proc-macro v0.3.2 (proc-macro)
├── lightningcss-derive v1.0.0-alpha.43 (proc-macro)
├── phf_macros v0.10.0 (proc-macro)
├── ptr_meta_derive v0.1.4 (proc-macro)
└── rkyv_derive v0.7.45 (proc-macro)
[build-dependencies]
└── cssparser v0.29.6

…and on syn 2 through no fewer than 25 different paths!! That's not a mistake!



❯ cargo tree -i syn@2 --depth 1        
syn v2.0.101     
├── arg_enum_proc_macro v0.3.4 (proc-macro)
├── async-trait v0.1.88 (proc-macro)
├── axum-macros v0.5.0 (proc-macro)
├── clap_derive v4.5.32 (proc-macro)
├── cssparser-macros v0.6.1 (proc-macro)
├── darling_core v0.20.11
├── darling_macro v0.20.11 (proc-macro)
├── derive_builder_core v0.20.2
├── derive_builder_macro v0.20.2 (proc-macro)
├── derive_more v0.99.20 (proc-macro)
├── displaydoc v0.2.5 (proc-macro)
├── futures-macro v0.3.31 (proc-macro)
├── num-derive v0.4.2 (proc-macro)
├── phf_macros v0.11.3 (proc-macro)
├── profiling-procmacros v1.0.16 (proc-macro)
├── serde_derive v1.0.219 (proc-macro)
├── synstructure v0.13.2
├── thiserror-impl v1.0.69 (proc-macro)
├── thiserror-impl v2.0.12 (proc-macro)
├── tokio-macros v2.5.0 (proc-macro)
├── tracing-attributes v0.1.28 (proc-macro)
├── yoke-derive v0.8.0 (proc-macro)
├── zerofrom-derive v0.1.6 (proc-macro)
├── zeroize_derive v1.4.2 (proc-macro)
└── zerovec-derive v0.11.1 (proc-macro)
[build-dependencies]
└── html5ever v0.27.0

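There are two versions of thiserror, and of course clap, async-trait, displaydoc, various futures macros, perfect hash maps, tokio macros, tracing, zerovec, zeroize, zerofrom, yoke, and so on, and of course serde.

And… I could swap out some of the things on that list, but serde… that's not easy. As of May 2025, syn is the most downloaded crate ever at 900 million downloads, and serde is a close eleventh with 540 million.

These crates are popular for good reason: they're extremely useful. But the more I looked into them, the less satisfied I was.

The natural fix that comes to mind for anyone with a slow-building crate is to split it into multiple crates. But with serde's approach, that doesn't make much of a difference.

And to understand why, we need to talk about monomorphization.

Monomorphization
----------------

Let's say you have a lot of types. You have an API, you have JSON payloads, and on top of that you have a catalog: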

use chrono::{NaiveDate, NaiveDateTime};
use serde::{Deserialize, Serialize};
use uuid::Uuid;

/// The root struct representing the catalog of everything.
#[derive(Serialize, Deserialize, Debug, Clone)]
pub struct Catalog {
    pub id: Uuid,
    pub businesses: Vec<Business>,
    pub created_at: NaiveDateTime,
    pub metadata: CatalogMetadata,
}

…and it goes on:

#[derive(Serialize, Deserialize, Debug, Clone)]
pub struct CatalogMetadata {
    pub version: String,
    pub region: String,
}

And on:

/// A business represented in the catalog.
#[derive(Serialize, Deserialize, Debug, Clone)]
pub struct Business {
    pub id: Uuid,
    pub name: String,
    pub address: Address,
    pub owner: BusinessOwner,
    pub users: Vec<BusinessUser>,
    pub branches: Vec<Branch>,
    pub products: Vec<Product>,
    pub created_at: NaiveDateTime,
}

And so on. Sensibly, let's say you've put all of this in a bigapi-types crate.

Then, for the sake of illustration, the bigapi-indirection crate contains the following:

use bigapi_types::generate_mock_catalog;

pub fn do_ser_stuff() {
    // Generate a mock catalog
    let catalog = generate_mock_catalog();

    // Serialize the catalog to JSON
    let serialized = serde_json::to_string_pretty(&catalog).expect("Failed to serialize catalog!");

    println!("Serialized catalog JSON:\n{}", serialized);

    // Deserialize it back into a Catalog struct
    let deserialized: bigapi_types::Catalog =
        serde_json::from_str(&serialized).expect("Failed to deserialize catalog");

    println!("Deserialized catalog struct!\n{:#?}", deserialized);
}

Finally, there's an application, bigapi-cli, that does nothing but call do_ser_stuff:

fn main() {
    println!("About to do ser stuff...");
    bigapi_indirection::do_ser_stuff();
    println!("About to do ser stuff... done!");
}

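Going by the amount of code alone, the CLI should build very quickly, and so should indirection, since it only makes a few calls. bigapi-types, on the other hand, has a pile of struct definitions plus a function that generates a mock catalog, so it should be very slow!

Well, on a cold debug build, our intuition holds: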

Minimum duration: 0.10s Shown: 25/34 units

Unit                       Duration
bigapi-cli-serde           0.07s
bigapi-indirection-serde   0.19s
bigapi-types-serde         0.29s
chrono                     0.46s
serde_json                 0.34s
uuid                       0.15s
serde                      0.80s
serde_derive               0.68s
syn                        0.54s
num-traits                 0.23s
proc-macro2                0.16s
libc                       0.14s
num-traits                 0.16s
ryu                        0.16s
itoa                       0.13s
serde_json                 0.35s
core-foundation-sys        0.14s
memchr                     0.25s
getrandom                  0.36s
libc                       0.36s
cfg-if                     0.10s
autocfg                    0.20s
serde                      0.35s
proc-macro2                0.36s
unicode-ident              0.13s

On a cold release build, it doesn't hold at all:

Minimum duration: 0.06s Shown: 24/34 units

Unit                       Duration
bigapi-cli-serde           0.07s
bigapi-indirection-serde   1.31s
bigapi-types-serde         0.38s
uuid                       0.21s
chrono                     0.66s
serde_json                 0.46s
serde                      0.88s
serde_derive               0.69s
syn                        0.55s
quote                      0.08s
num-traits                 0.30s
proc-macro2                0.16s
libc                       0.14s
num-traits                 0.07s
iana-time-zone             0.08s
memchr                     0.30s
libc                       0.15s
ryu                        0.19s
getrandom                  0.13s
serde_json                 0.12s
autocfg                    0.13s
core-foundation-sys        0.10s
serde                      0.14s
proc-macro2                0.14s

Why does indirection take up the bulk of the build time? Because serde_json::to_string_pretty and serde_json::from_str are generic functions, and they get instantiated in the bigapi-indirection crate.

Every time we touch bigapi-indirection, even just to change a string constant, we pay that cost all over again:

Minimum duration: 0.04s Shown: 2/2 units

Unit                       Duration
bigapi-cli-serde           0.11s
bigapi-indirection-serde   1.38s

Touching bigapi-types is even worse! Changing a single string value in generate_mock_catalog is enough to rebuild everything:

Minimum duration: 0.09s Shown: 3/3 units

Unit                       Duration
bigapi-cli-serde           0.23s
bigapi-indirection-serde   1.30s
bigapi-types-serde         0.40s

That's monomorphization: every generic function in Rust gets instantiated, with generic type parameters like T, K, and V replaced by concrete types, once for each combination of types it is used with.
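To make that concrete, here is a minimal sketch using only the standard library (the function names are made up for illustration and have nothing to do with serde). The generic function is compiled once per concrete type it is called with, while the `dyn` version is compiled once and dispatches through a vtable at runtime:

```rust
use std::fmt::Display;

// Generic: the compiler emits one copy of this function for each concrete
// `T` it is called with. That duplication is monomorphization.
fn describe_generic<T: Display>(value: T) -> String {
    format!("value = {value}")
}

// Type-erased: this body is compiled once. The concrete type is reached
// through a vtable at runtime instead.
fn describe_dyn(value: &dyn Display) -> String {
    format!("value = {value}")
}

fn main() {
    // Three distinct instantiations end up in the calling crate:
    // describe_generic::<u32>, ::<f64>, and ::<&str>.
    println!("{}", describe_generic(42u32));
    println!("{}", describe_generic(1.5f64));
    println!("{}", describe_generic("hello"));

    // A single function body serves all three types.
    println!("{}", describe_dyn(&42u32));
    println!("{}", describe_dyn(&1.5f64));
    println!("{}", describe_dyn(&"hello"));
}
```

The instantiations land in whichever crate makes the calls, which is why the cost shows up in bigapi-indirection rather than in the crate that defines the generic functions.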

We can see how often that happens with cargo-llvm-lines:



bigapi on  main [+] via 🦀 v1.87.0
❯ cargo llvm-lines --release -p bigapi-indirection | head -15
   Compiling bigapi-indirection v0.1.0 (/Users/amos/bearcove/bigapi/bigapi-indirection)
    Finished `release` profile [optimized] target(s) in 0.71s
  Lines                Copies              Function name
  -----                ------              -------------
  80335                1542                (TOTAL)
   8760 (10.9%, 10.9%)   20 (1.3%,  1.3%)  <&mut serde_json::de::Deserializer<R> as serde::de::Deserializer>::deserialize_struct
   3674 (4.6%, 15.5%)    45 (2.9%,  4.2%)  <serde_json::de::SeqAccess<R> as serde::de::SeqAccess>::next_element_seed
   3009 (3.7%, 19.2%)    11 (0.7%,  4.9%)  <&mut serde_json::de::Deserializer<R> as serde::de::Deserializer>::deserialize_seq
   2553 (3.2%, 22.4%)    37 (2.4%,  7.3%)  <serde_json::ser::Compound<W,F> as serde::ser::SerializeMap>::serialize_value
   1771 (2.2%, 24.6%)    38 (2.5%,  9.8%)  <serde_json::de::MapAccess<R> as serde::de::MapAccess>::next_value_seed
   1680 (2.1%, 26.7%)    20 (1.3%, 11.1%)  <serde_json::de::MapAccess<R> as serde::de::MapAccess>::next_key_seed
   1679 (2.1%, 28.8%)     1 (0.1%, 11.2%)  <bigapi_types::_::<impl serde::de::Deserialize for bigapi_types::Product>::deserialize::__Visitor as serde::de::Visitor>::visit_map
   1569 (2.0%, 30.7%)     1 (0.1%, 11.2%)  <bigapi_types::_::<impl serde::de::Deserialize for bigapi_types::Business>::deserialize::__Visitor as serde::de::Visitor>::visit_map
   1490 (1.9%, 32.6%)    10 (0.6%, 11.9%)  serde::ser::Serializer::collect_seq
   1316 (1.6%, 34.2%)     1 (0.1%, 11.9%)  <bigapi_types::_::<impl serde::de::Deserialize for bigapi_types::User>::deserialize::__Visitor as serde::de::Visitor>::visit_map
   1302 (1.6%, 35.9%)     1 (0.1%, 12.0%)  <bigapi_types::_::<impl serde::de::Deserialize for bigapi_types::UserProfile>::deserialize::__Visitor as serde::de::Visitor>::visit_map
   1300 (1.6%, 37.5%)    20 (1.3%, 13.3%)  <serde_json::de::MapKey<R> as serde::de::Deserializer>::deserialize_any

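Drop --release and the results are a little different: LLVM isn't the only one doing optimizations!

We have about 40 copies of a bunch of different generic serde methods, specialized for our types. That makes serde fast, but it makes our builds slow.

And it makes the binary a bit bigger too: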



bigapi on  main [+] via 🦀 v1.87.0
❯ cargo build --release
    Finished `release` profile [optimized] target(s) in 0.01s

bigapi on  main [+] via 🦀 v1.87.0
❯ ls -lhA target/release/bigapi-cli
Permissions Size User Date Modified Name
.rwxr-xr-x  884k amos 30 May 21:16  target/release/bigapi-cli

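This is inherent to how serde works. miniserde, by the same author, works differently, but I can't test it: neither uuid nor chrono has a miniserde feature, and I can't be bothered to fork them.

A different strategy
--------------------

I went with a different strategy. I figured that pitching yet another serde would be a very hard sell. It would have to be so much better. The first serde is so good, and so adequate, that getting people to move to something else would be extremely difficult!

So I decided that whatever I replaced serde with would not be faster, but would have other characteristics that I care about.

For example, let's say we fork our program to use facet instead of serde. This:

/// The root struct representing the catalog of everything.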
#[derive(Serialize, Deserialize, Debug, Clone)]
pub struct Catalog {
    pub id: Uuid,
    pub businesses: Vec<Business>,
    pub created_at: NaiveDateTime,
    pub metadata: CatalogMetadata,
}

becomes this:

/// The root struct representing the catalog of everything.
#[derive(Facet, Clone)]
pub struct Catalog {
    pub id: Uuid,
    pub businesses: Vec<Business>,
    pub created_at: NaiveDateTime,
    pub metadata: CatalogMetadata,
}

The indirection crate now uses facet-json for JSON, and facet-pretty instead of Debug:

use bigapi_types_facet::generate_mock_catalog;
use facet_pretty::FacetPretty;

pub fn do_ser_stuff() {
    // Generate a mock catalog
    let catalog = generate_mock_catalog();

    // Serialize the catalog to JSON
    let serialized = facet_json::to_string(&catalog);

    println!("Serialized catalog JSON.\n{}", serialized);

    // Deserialize it back into a Catalog struct
    let deserialized: bigapi_types_facet::Catalog =
        facet_json::from_str(&serialized).expect("Failed to deserialize catalog!");

    println!("Deserialized catalog struct:\n{}", deserialized.pretty());
}

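And let's say we make a new CLI that depends on that indirection crate. How does it compare to the old serde-based version?

I'm not entirely happy with the numbers you're about to see. Still, I thought it was important to show the state of facet honestly as it stands right now, and I intend to use the frustration it gives me as fuel to keep working on it.

But as of yesterday, we were faster than serde-json in one benchmark! Serializing a 100KB string takes only 451 microseconds on whatever machine CodSpeed uses:

Image 2: long-string serialization benchmark, facet-json 351.9µs, serde 460.9µs

Back to our sample program, things don't look so great: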



bigapi on  main via 🦀 v1.87.0
❯ ls -lhA target/release/bigapi-cli{,-facet}
Permissions Size User Date Modified Name
.rwxr-xr-x  884k amos 31 May 08:33  target/release/bigapi-cli
.rwxr-xr-x  2.1M amos 31 May 09:15  target/release/bigapi-cli-facet

Our program is now bigger than before.

And this time it's harder to figure out why. Running cargo-bloat on the serde version makes it clear where the code went:



bigapi on  main via 🦀 v1.87.0
❯ cargo bloat --crates -p bigapi-cli
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.01s
    Analyzing target/debug/bigapi-cli

 File  .text     Size Crate
17.0%  41.6% 351.9KiB bigapi_indirection
13.3%  32.4% 273.9KiB std
 3.5%   8.5%  72.2KiB chrono
 2.2%   5.3%  44.8KiB serde_json
 2.1%   5.2%  44.3KiB bigapi_types
✂️

Note: numbers above are a result of guesswork. They are not 100% correct and never will be.

But in the facet version… std is the main culprit?



bigapi on  main via 🦀 v1.87.0
❯ cargo bloat --crates -p bigapi-cli-facet
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.01s
    Analyzing target/debug/bigapi-cli-facet

 File  .text     Size Crate
 6.3%  20.7% 326.3KiB std
 5.9%  19.4% 305.5KiB bigapi_types_facet
 3.8%  12.7% 200.0KiB facet_deserialize
 3.8%  12.6% 198.1KiB bigapi_indirection_facet
 2.8%   9.4% 147.9KiB facet_json
 2.6%   8.7% 136.5KiB facet_core
 2.2%   7.1% 112.3KiB chrono
 1.4%   4.8%  75.0KiB facet_reflect
 0.4%   1.3%  21.1KiB facet_pretty
✂️

Note: numbers above are a result of guesswork. They are not 100% correct and never will be.

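It's followed by our types crate, facet_deserialize, our indirection crate, then facet_json, facet_core, and so on.

Interestingly, the code is spread fairly evenly across several crates. What about build times? Does it pipeline?

On a cold debug build, bigapi-types takes longer, but it doesn't block the other builds all the way to the end: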

Minimum duration: 0.08s Shown: 33/54 units

Unit                       Duration
bigapi-cli-facet           0.09s
bigapi-indirection-facet   0.19s
facet-json                 0.21s
facet-pretty               0.12s
facet-deserialize          0.28s
bigapi-types-facet         0.64s
facet-reflect              0.20s
facet-core                 0.88s
facet-derive-emit          0.27s
facet-derive-parse         0.23s
chrono                     0.46s
uuid                       0.12s
unsynn                     0.22s
quote                      0.10s
time                       0.65s
num-traits                 0.24s
libc                       0.15s
ariadne                    0.18s
proc-macro2                0.20s
owo-colors                 0.32s
yansi                      0.13s
unicode-width              0.10s
deranged                   0.24s
mutants                    0.13s
byteorder                  0.10s
owo-colors                 0.15s
bitflags                   0.10s
proc-macro2                0.15s
powerfmt                   0.09s
getrandom                  0.15s
core-foundation-sys        0.09s
libc                       0.18s
autocfg                    0.15s

On a cold release build, facet-deserialize, pretty, serialize, and json can all build concurrently! And any crate that uses indirection can build alongside it too, which you can tell from the parts of the timing chart shown in purple.

Minimum duration: 0.12s Shown: 30/54 units

Unit                       Duration
bigapi-cli-facet           0.09s
bigapi-indirection-facet   1.29s
facet-json                 0.58s
facet-deserialize          0.66s
facet-pretty               0.47s
bigapi-types-facet         1.09s
facet-reflect              0.35s
facet-derive               0.16s
facet-core                 1.18s
facet-derive-emit          0.29s
facet-derive-parse         0.24s
chrono                     0.81s
uuid                       0.21s
unsynn                     0.22s
time                       0.98s
num-traits                 0.32s
ariadne                    0.25s
proc-macro2                0.21s
libc                       0.15s
owo-colors                 0.37s
yansi                      0.18s
unicode-width              0.13s
mutants                    0.13s
deranged                   0.31s
owo-colors                 0.16s
bitflags                   0.16s
proc-macro2                0.18s
getrandom                  0.18s
libc                       0.20s
autocfg                    0.17s

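So, for now, the binary is bigger and builds take longer. What do we get in exchange?

For starters, you may have noticed that we lost the Debug implementation: instead of using Debug to print the deserialized data, we use facet-pretty: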



bigapi on  main via 🦀 v1.87.0
❯ cargo run -p bigapi-cli-facet
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.01s
     Running `target/debug/bigapi-cli-facet`
About to do ser stuff...
Serialized catalog JSON.
✂️
Deserialized catalog struct:
/// The root struct representing the catalog of everything.
Catalog {
  id: aa1238fa-8f72-45fa-b5a7-34d99baf4863,
  businesses: Vec<Business> [
    /// A business represented in the catalog.
Business {
      id: 65d08ea7-53c6-42e8-848e-0749d00b7bdd,
      name: Awesome Business,
      address: Address {
        street: 123 Main St.,
        city: Metropolis,
        state: Stateville,
        postal_code: 12345,
        country: Countryland,
        geo: Option<GeoLocation>::Some(GeoLocation {
          latitude: 51,
          longitude: -0.1,
        }),
      },
      owner: BusinessOwner {
        user: User {
          id: 056b3eda-97ca-4c12-883d-ecc043a6f5b4,

For a one-time cost, we get pretty formatting, colors included. It even supports redacting sensitive information!

Don't want street numbers in your logs? Mark them as sensitive!

#[derive(Facet, Clone)]
pub struct Address {
    // 👇
    #[facet(sensitive)]
    pub street: String,
    pub city: String,
    pub state: String,
    pub postal_code: String,
    pub country: String,
    pub geo: Option<GeoLocation>,
}



bigapi on  main [!] via 🦀 v1.87.0
❯ cargo run -p bigapi-cli-facet
✂️
Deserialized catalog struct:
/// The root struct representing the catalog of everything.
Catalog {
  id: 61f70016-eca4-45af-8937-42c03f9a5cd8,
  businesses: Vec<Business> [
    /// A business represented in the catalog.
Business {
      id: 9b52c85b-9240-4e73-9553-5d827e36b5f5,
      name: Awesome Business,
      address: Address {
        street: [REDACTED],
        city: Metropolis,
        state: Stateville,
        postal_code: 12345,
        country: Countryland,

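You can turn the colors off, of course. And because facet-pretty relies on data rather than code, it's also possible to limit the output depth, something the Debug trait isn't flexible enough for.

And that's the core idea behind facet: the derive macro generates data instead of code.

Well, it also generates a lot of virtual tables so that you can interact with arbitrary values at runtime. And that shows up in cargo-llvm-lines: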



bigapi on  main [!] via 🦀 v1.87.0
❯ cargo llvm-lines --release -p bigapi-types-facet | head -15
   Compiling bigapi-types-facet v0.1.0 (/Users/amos/bearcove/bigapi/bigapi-types-facet)
    Finished `release` profile [optimized] target(s) in 0.92s
  Lines                Copies              Function name
  -----                ------              -------------
  80657                3455                (TOTAL)
  29424 (36.5%, 36.5%) 1349 (39.0%, 39.0%) core::ops::function::FnOnce::call_once
   5010 (6.2%, 42.7%)    50 (1.4%, 40.5%)  facet_core::impls_alloc::vec::<impl facet_core::Facet for alloc::vec::Vec<T>>::VTABLE::{{constant}}::{{closure}}::{{closure}}
   1990 (2.5%, 45.2%)    70 (2.0%, 42.5%)  facet_core::impls_alloc::vec::<impl facet_core::Facet for alloc::vec::Vec<T>>::VTABLE::{{constant}}::{{closure}}
   1900 (2.4%, 47.5%)   110 (3.2%, 45.7%)  facet_core::impls_alloc::vec::<impl facet_core::Facet for alloc::vec::Vec<T>>::SHAPE::{{constant}}::{{constant}}::{{closure}}
   1544 (1.9%, 49.4%)    11 (0.3%, 46.0%)  <T as alloc::slice::<impl [T]>::to_vec_in::ConvertVec>::to_vec
   1494 (1.9%, 51.3%)     1 (0.0%, 46.0%)  chrono::format::formatting::DelayedFormat<I>::format_fixed
   1467 (1.8%, 53.1%)    14 (0.4%, 46.5%)  facet_core::impls_core::option::<impl facet_core::Facet for core::option::Option<T>>::VTABLE::{{constant}}::{{closure}}::{{closure}}
   1071 (1.3%, 54.4%)    63 (1.8%, 48.3%)  facet_core::impls_core::option::<impl facet_core::Facet for core::option::Option<T>>::VTABLE::{{constant}}::{{constant}}::{{closure}}
    992 (1.2%, 55.7%)   277 (8.0%, 56.3%)  facet_core::types::value::ValueVTableBuilder<T>::new::{{closure}}
    986 (1.2%, 56.9%)     1 (0.0%, 56.3%)  chrono::format::formatting::write_rfc3339
    681 (0.8%, 57.7%)     1 (0.0%, 56.4%)  bigapi_types_facet::generate_mock_catalog::mock_product
    651 (0.8%, 58.5%)    35 (1.0%, 57.4%)  facet_core::impls_core::option::<impl facet_core::Facet for core::option::Option<T>>::SHAPE::{{constant}}::{{constant}}::{{closure}}

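I think there's still some low-hanging fruit left there in terms of binary size optimization. I've only spent a couple of hours on it so far, which probably explains it.

Anyway, our executable uses the data exposed by facet to show nicely colored structs. Using that same data, facet-json also serializes to and deserializes from JSON.

On speed, serde is the clear winner. See the latest benchmarks for exact numbers. At the time of writing, facet-json is about 3 to 6 times slower than serde-json:

On a log scale, it doesn't even look that bad!

I'd like to do something more rigorous and automated, but I have a deadline, so this will have to do for now. And you have to promise to take those lightweight microbenchmarks with a grain of salt.

Because, from the end user's perspective, both finish instantly: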



❯ hyperfine -N target/release/bigapi-cli-serde target/release/bigapi-cli-facet --warmup 500
Benchmark 1: target/release/bigapi-cli-serde
  Time (mean ± σ):       3.4 ms ±   1.7 ms    [User: 2.3 ms, System: 0.9 ms]
  Range (min … max):     1.8 ms …  10.0 ms    1623 runs

Benchmark 2: target/release/bigapi-cli-facet
  Time (mean ± σ):       4.0 ms ±   1.9 ms    [User: 2.5 ms, System: 1.4 ms]
  Range (min … max):     1.8 ms …  13.7 ms    567 runs

Summary
  target/release/bigapi-cli-serde ran
    1.18 ± 0.82 times faster than target/release/bigapi-cli-facet

What about warm builds? The difference is barely visible in debug, so let's look at warm release builds. Our big API isn't actually that big.

After a small change to bigapi-types-serde:

Minimum duration: 0.09s Shown: 3/3 units

Unit                       Duration
bigapi-cli-serde           0.23s
bigapi-indirection-serde   1.30s
bigapi-types-serde         0.40s

After a small change to bigapi-types-facet:

Minimum duration: 0.05s Shown: 3/3 units

Unit                       Duration
bigapi-cli-facet           0.13s
bigapi-indirection-facet   1.26s
bigapi-types-facet         1.10s

We're in pretty much the same situation: it takes roughly the same time.

Using -j1, like I did in the unsynn article, makes things look even worse for facet.

Minimum duration: 0.03s Shown: 3/3 units

Unit                       Duration
bigapi-cli-serde           0.08s
bigapi-indirection-serde   1.93s
bigapi-types-serde         0.50s

Minimum duration: 0.05s Shown: 3/3 units

Unit                       Duration
bigapi-cli-facet           0.13s
bigapi-indirection-facet   4.34s
bigapi-types-facet         1.74s

Hey, I can't only use the tricks that make my crate look good.

Honestly, I'm fairly optimistic. I think we went a bit overboard adding all sorts of marker traits, re-implementing standard traits for tuples when every element of the tuple implements them, and so on. None of that is free!

Looking at cargo-llvm-lines again:



bigapi on  main [!+?⇡] via 🦀 v1.87.0
❯ cargo llvm-lines --release -p bigapi-indirection-facet | head -10
   Compiling bigapi-indirection-facet v0.1.0 (/Users/amos/bearcove/bigapi/bigapi-indirection-facet)
    Finished `release` profile [optimized] target(s) in 1.29s
  Lines                 Copies              Function name
  -----                 ------              -------------
  129037                4066                (TOTAL)
   33063 (25.6%, 25.6%) 1509 (37.1%, 37.1%) core::ops::function::FnOnce::call_once
    8247 (6.4%, 32.0%)     3 (0.1%, 37.2%)  facet_deserialize::StackRunner<C,I>::set_numeric_value
    6218 (4.8%, 36.8%)     1 (0.0%, 37.2%)  facet_pretty::printer::PrettyPrinter::format_peek_internal
    5279 (4.1%, 40.9%)     1 (0.0%, 37.2%)  facet_deserialize::StackRunner<C,I>::pop
    5010 (3.9%, 44.8%)    50 (1.2%, 38.5%)  facet_core::impls_alloc::vec::<impl facet_core::Facet for alloc::vec::Vec<T>>::VTABLE::{{constant}}::{{closure}}::{{closure}}
    3395 (2.6%, 47.4%)     1 (0.0%, 38.5%)  facet_deserialize::StackRunner<C,I>::object_key_or_object_close
    2803 (2.2%, 49.6%)     1 (0.0%, 38.5%)  facet_deserialize::StackRunner<C,I>::value

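Why does call_once account for 33 thousand lines of LLVM IR? Why does set_numeric_value, a function that basically converts u64 to u16 and back, take up more than 6% of the total code? I'd like to dig deeper, but I don't have time right now. Still, it gives us a baseline, right?

Today, tomorrow
---------------

The basic idea remains the same: this is a fixed cost. facet-json only has a handful of small generic functions, and very quickly everything moves into the realm of reflection.

It's not monomorphization: we use data generated by the derive macro, in structs like StructType: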

#[non_exhaustive]
#[repr(C)]
pub struct StructType<'shape> {
    pub repr: Repr,
    pub kind: StructKind,
    pub fields: &'shape [Field<'shape>],
}

Each field has an offset and a shape of its own:

#[non_exhaustive]
#[repr(C)]
pub struct Field<'shape> {
    pub name: &'shape str,
    pub shape: &'shape Shape<'shape>,
    pub offset: usize,
    pub flags: FieldFlags,
    pub attributes: &'shape [FieldAttribute<'shape>],
    pub doc: &'shape [&'shape str],
    pub vtable: &'shape FieldVTable,
    pub flattened: bool,
}

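There are function pointers all over the place, to make it possible to call Display implementations, FromStr, comparisons, and so on. All of it is designed so that code that is compiled once can work with arbitrary types.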
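To give a feel for the "data plus function pointers" idea, here is a heavily simplified sketch using only the standard library. `FieldDesc`, `StructDesc`, and `pretty` are hypothetical names invented for this illustration; they are not facet's actual types or API, which carry far more information. The point is only that one non-generic function can print any struct once it has the field offsets, a type-erased formatter, and a sensitivity flag:

```rust
use std::mem::offset_of;

/// Hypothetical, simplified field descriptor (not facet's real `Field`).
struct FieldDesc {
    name: &'static str,
    offset: usize,
    sensitive: bool,
    /// Type-erased formatter. The pointer must point at a valid value of
    /// the field's actual type.
    display: unsafe fn(*const u8) -> String,
}

/// Hypothetical, simplified struct descriptor.
struct StructDesc {
    type_name: &'static str,
    fields: &'static [FieldDesc],
}

/// Formatter for `String` fields. Safety: `ptr` must point at a `String`.
unsafe fn display_string(ptr: *const u8) -> String {
    unsafe { (*(ptr as *const String)).clone() }
}

#[allow(dead_code)] // fields are only read through raw pointers below
struct Address {
    street: String,
    city: String,
}

// In a real system a derive macro would generate this; here it is by hand.
static ADDRESS_DESC: StructDesc = StructDesc {
    type_name: "Address",
    fields: &[
        FieldDesc {
            name: "street",
            offset: offset_of!(Address, street),
            sensitive: true,
            display: display_string,
        },
        FieldDesc {
            name: "city",
            offset: offset_of!(Address, city),
            sensitive: false,
            display: display_string,
        },
    ],
};

/// Compiled once, works for any value described by a `StructDesc`.
/// Safety: `base` must point at a valid value of the type `desc` describes.
unsafe fn pretty(base: *const u8, desc: &StructDesc) -> String {
    let mut out = format!("{} {{\n", desc.type_name);
    for field in desc.fields {
        let rendered = if field.sensitive {
            "[REDACTED]".to_string()
        } else {
            // Jump to the field using its recorded offset, then format it
            // through the type-erased function pointer.
            unsafe { (field.display)(base.add(field.offset)) }
        };
        out.push_str(&format!("  {}: {},\n", field.name, rendered));
    }
    out.push('}');
    out
}

fn main() {
    let addr = Address {
        street: "123 Main St.".into(),
        city: "Metropolis".into(),
    };
    let text = unsafe { pretty(&addr as *const Address as *const u8, &ADDRESS_DESC) };
    println!("{text}");
}
```

Nothing in `pretty` is generic, so adding a hundred more described types adds a hundred small tables, not a hundred copies of the printing logic.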

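Of course, reading and writing arbitrary memory locations requires unsafe code.

So there's a safe layer on top of it, called facet-reflect. With it, you can walk through structs like these and peek at their values: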

#[derive(Facet)]
#[facet(rename_all = "camelCase")]
struct Secrets {
    github: OauthCredentials,
    gitlab: OauthCredentials,
}

#[derive(Facet)]
#[facet(rename_all = "camelCase")]
struct OauthCredentials {
    client_id: String,
    #[facet(sensitive)]
    client_secret: String,
}

…for example, by extracting a field:

fn extract_client_secret<'shape>(peek: Peek<'_, '_, 'shape>) -> Result<(), Error> {
    let secret = peek
        .into_struct()?
        .field_by_name("github")?
        .into_struct()?
        .field_by_name("clientSecret")?
        .to_string();
    eprintln!("got your secret! {secret}");
    Ok(())
}

fn main() {
    let secrets: Secrets = facet_json::from_str(SAMPLE_PAYLOAD).unwrap();
    extract_client_secret(Peek::new(&secrets)).unwrap()
}



facet-demo on  main [!+] via 🦀 v1.87.0
❯ cargo run
   Compiling facet-demo v0.1.0 (/Users/amos/facet-rs/facet-demo)
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.17s
     Running `target/debug/facet-demo`
got your secret! cs_5678

Image 5: Cool bear

facet supports rename and rename_all, and it supports them at the reflection level, not the serialization level.

Image 6: Smug bear

It supports flatten, too!
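As a rough illustration of what a rename_all = "camelCase" rule implies for field names such as client_secret, here is a minimal sketch. `snake_to_camel` is a hypothetical helper written for this example; it is not facet's implementation, and how facet treats edge cases (leading underscores, digits, acronyms) is not covered here:

```rust
/// Converts a snake_case identifier to camelCase.
/// Hypothetical helper for illustration only.
fn snake_to_camel(name: &str) -> String {
    let mut out = String::with_capacity(name.len());
    let mut upper_next = false;
    for ch in name.chars() {
        if ch == '_' {
            // Drop the underscore and capitalize whatever follows it.
            upper_next = true;
        } else if upper_next {
            out.extend(ch.to_uppercase());
            upper_next = false;
        } else {
            out.push(ch);
        }
    }
    out
}

fn main() {
    for field in ["github", "client_id", "client_secret"] {
        println!("{field} -> {}", snake_to_camel(field));
    }
}
```

Doing the renaming once, where the field metadata is produced, means every consumer of that metadata sees the same names, whether it is a JSON parser or a lookup like field_by_name("clientSecret").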

On the write side, facet-reflect lets you build objects entirely from scratch:

fn fill_secrets(shape: &'static Shape<'static>) -> Result<(), Error> {
    let mut partial = Partial::alloc_shape(shape)?;
    let facet::Type::User(UserType::Struct(sd)) = shape.ty else {
        todo!()
    };
    for (i, field) in sd.fields.iter().enumerate() {
        eprintln!(
            "Generating {} for {}",
            field.shape.bright_yellow(),
            field.name.blue()
        );
        partial
            .begin_nth_field(i)?
            .set_field("clientId", format!("{}-client-id", field.shape))?
            .set_field("clientSecret", format!("{}-client-secret", field.shape))?
            .end()?;
    }
    let heapval = partial.build()?;
    print_secrets(heapval);

    Ok(())
}

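Here I've shown both the API you'd use when you don't know the shape (iterating over fields, and so on) and the API you'd use when you do know the shape of some part. Honestly, in that case you could just use set directly. For example: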

partial
    .begin_nth_field(i)?
    .set(OauthCredentials {
        client_id: format!("{}-client-id", field.shape),
        client_secret: format!("{}-client-secret", field.shape),
    })?
    .end()?;

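In the end, it comes down to which parts of your program are statically known and which aren't.

Looking at this, a lot of uses come to mind: debug printing, of course, but also structured logging (tracing, say), generating mock data for tests, and so on.

Even looking only at the serialization use case, there's a lot that's interesting.

We generate data instead of code, so entirely different JSON parsers can compete on equal footing: they all have access to exactly the same data.

For example, serde-json is recursive.

If, like me, you have some darkness in your heart, it's fairly easy to write a program that blows up serde-json's stack.

First, you need a fairly large struct…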

#[derive(Debug, Deserialize)]
struct Layer {
    _padding1: Option<[[f32; 32]; 32]>,
    next: Option<Box<Layer>>,
}

…then you generate a lot of nested JSON…

fn generate_nested_json(depth: usize) -> String {
    fn build_layer(remaining_depth: usize) -> String {
        if remaining_depth == 0 {
            return "null".to_string();
        }

        format!("{{\"next\":{}}}", build_layer(remaining_depth - 1))
    }

    build_layer(depth)
}

And you parse it with serde-json!

fn main() {
    let depth = 110;
    let json = generate_nested_json(depth);
    let layer: Layer = serde_json::from_str(&json).unwrap();
    let mut count = 0;
    let mut current_layer = &layer;
    while let Some(next_layer) = &current_layer.next {
        count += 1;
        current_layer = next_layer;
    }
    println!("Layer count: {}", count);
}

And boom, stack overflow:



deepdeep on  main [?] via 🦀 v1.87.0
❯ cargo run
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.01s
     Running `target/debug/deepdeep`

thread 'main' has overflowed its stack
fatal runtime error: stack overflow
fish: Job 1, 'cargo run' terminated by signal SIGABRT (Abort)

In release, codegen is more efficient, so we need to add a bit more padding:

#[derive(Debug, Deserialize)]
struct Layer {
    _padding1: Option<[[f32; 32]; 32]>,
    _padding2: Option<[[f32; 32]; 32]>,
    _padding3: Option<[[f32; 32]; 32]>,
    next: Option<Box<Layer>>,
}

But the result is much the same!



deepdeep on  main [?] via 🦀 v1.87.0
❯ cargo build --release && lldb ./target/release/deepdeep
    Finished `release` profile [optimized] target(s) in 0.00s
(lldb) target create "./target/release/deepdeep"
Current executable set to '/Users/amos/facet-rs/deepdeep/target/release/deepdeep' (arm64).
(lldb) r
Process 44914 launched: '/Users/amos/facet-rs/deepdeep/target/release/deepdeep' (arm64)
Process 44914 stopped
* thread #1, name = 'main', queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x16f607420)
    frame #0: 0x0000000100005640 deepdeep`serde::de::impls::_$LT$impl$u20$serde..de..Deserialize$u20$for$u20$core..option..Option$LT$T$GT$$GT$::deserialize::h5dddf6c37daa3587 + 36
deepdeep`serde::de::impls::_$LT$impl$u20$serde..de..Deserialize$u20$for$u20$core..option..Option$LT$T$GT$$GT$::deserialize::h5dddf6c37daa3587:
->  0x100005640 <+36>: str    xzr, [sp], #-0x20
    0x100005644 <+40>: ldp    x10, x8, [x0, #0x20]
    0x100005648 <+44>: cmp    x8, x10
    0x10000564c <+48>: b.hs   0x100005720    ; <+260>
Target 0: (deepdeep) stopped.

LLDB helpfully shows us the stack trace:



(lldb) bt
* thread #1, name = 'main', queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x16f607420)
  * frame #0: 0x0000000100005640 deepdeep`serde::de::impls::_$LT$impl$u20$serde..de..Deserialize$u20$for$u20$core..option..Option$LT$T$GT$$GT$::deserialize::h5dddf6c37daa3587 + 36
    frame #1: 0x00000001000047c4 deepdeep`_$LT$$RF$mut$u20$serde_json..de..Deserializer$LT$R$GT$$u20$as$u20$serde..de..Deserializer$GT$::deserialize_struct::hcf279774786de2c5 + 2324
    frame #2: 0x0000000100005740 deepdeep`serde::de::impls::_$LT$impl$u20$serde..de..Deserialize$u20$for$u20$core..option..Option$LT$T$GT$$GT$::deserialize::h5dddf6c37daa3587 + 292
    frame #3: 0x00000001000047c4 deepdeep`_$LT$$RF$mut$u20$serde_json..de..Deserializer$LT$R$GT$$u20$as$u20$serde..de..Deserializer$GT$::deserialize_struct::hcf279774786de2c5 + 2324
    frame #4: 0x0000000100005740 deepdeep`serde::de::impls::_$LT$impl$u20$serde..de..Deserialize$u20$for$u20$core..option..Option$LT$T$GT$$GT$::deserialize::h5dddf6c37daa3587 + 292
    frame #5: 0x00000001000047c4 deepdeep`_$LT$$RF$mut$u20$serde_json..de..Deserializer$LT$R$GT$$u20$as$u20$serde..de..Deserializer$GT$::deserialize_struct::hcf279774786de2c5 + 2324
    frame #6: 0x0000000100005740 deepdeep`serde::de::impls::_$LT$impl$u20$serde..de..Deserialize$u20$for$u20$core..option..Option$LT$T$GT$$GT$::deserialize::h5dddf6c37daa3587 + 292
    frame #7: 0x00000001000047c4 deepdeep`_$LT$$RF$mut$u20$serde_json..de..Deserializer$LT$R$GT$$u20$as$u20$serde..de..Deserializer$GT$::deserialize_struct::hcf279774786de2c5 + 2324
    frame #8: 0x0000000100005740 deepdeep`serde::de::impls::_$LT$impl$u20$serde..de..Deserialize$u20$for$u20$core..option..Option$LT$T$GT$$GT$::deserialize::h5dddf6c37daa3587 + 292
    frame #9: 0x00000001000047c4 deepdeep`_$LT$$RF$mut$u20$serde_json..de..Deserializer$LT$R$GT$$u20$as$u20$serde..de..Deserializer$GT$::deserialize_struct::hcf279774786de2c5 + 2324
    frame #10: 0x0000000100005740 deepdeep`serde::de::impls::_$LT$impl$u20$serde..de..Deserialize$u20$for$u20$core..option..Option$LT$T$GT$$GT$::deserialize::h5dddf6c37daa3587 + 292
    frame #11: 0x00000001000047c4 deepdeep`_$LT$$RF$mut$u20$serde_json..de..Deserializer$LT$R$GT$$u20$as$u20$serde..de..Deserializer$GT$::deserialize_struct::hcf279774786de2c5 + 2324
    frame #12: 0x0000000100005740 deepdeep`serde::de::impls::_$LT$impl$u20$serde..de..Deserialize$u20$for$u20$core..option..Option$LT$T$GT$$GT$::deserialize::h5dddf6c37daa3587 + 292
    frame #13: 0x00000001000047c4 deepdeep`_$LT$$RF$mut$u20$serde_json..de..Deserializer$LT$R$GT$$u20$as$u20$serde..de..Deserializer$GT$::deserialize_struct::hcf279774786de2c5 + 2324
    frame #14: 0x0000000100005740 deepdeep`serde::de::impls::_$LT$impl$u20$serde..de..Deserialize$u20$for$u20$core..option..Option$LT$T$GT$$GT$::deserialize::h5dddf6c37daa3587 + 292
    frame #15: 0x00000001000047c4 deepdeep`_$LT$$RF$mut$u20$serde_json..de..Deserializer$LT$R$GT$$u20$as$u20$serde..de..Deserializer$GT$::deserialize_struct::hcf279774786de2c5 + 2324
    ✂️
    frame #197: 0x00000001000047c4 deepdeep`_$LT$$RF$mut$u20$serde_json..de..Deserializer$LT$R$GT$$u20$as$u20$serde..de..Deserializer$GT$::deserialize_struct::hcf279774786de2c5 + 2324
    frame #198: 0x0000000100005740 deepdeep`serde::de::impls::_$LT$impl$u20$serde..de..Deserialize$u20$for$u20$core..option..Option$LT$T$GT$$GT$::deserialize::h5dddf6c37daa3587 + 292
    frame #199: 0x00000001000047c4 deepdeep`_$LT$$RF$mut$u20$serde_json..de..Deserializer$LT$R$GT$$u20$as$u20$serde..de..Deserializer$GT$::deserialize_struct::hcf279774786de2c5 + 2324
    frame #200: 0x0000000100005740 deepdeep`serde::de::impls::_$LT$impl$u20$serde..de..Deserialize$u20$for$u20$core..option..Option$LT$T$GT$$GT$::deserialize::h5dddf6c37daa3587 + 292
    frame #201: 0x00000001000047c4 deepdeep`_$LT$$RF$mut$u20$serde_json..de..Deserializer$LT$R$GT$$u20$as$u20$serde..de..Deserializer$GT$::deserialize_struct::hcf279774786de2c5 + 2324
    frame #202: 0x00000001000008e4 deepdeep`serde_json::de::from_trait::h96b8ac2f4e672a8e + 92
    frame #203: 0x0000000100005abc deepdeep`deepdeep::main::hb66396babb66c58d + 80
    frame #204: 0x0000000100005400 deepdeep`std::sys::backtrace::__rust_begin_short_backtrace::h52797e85990f16c6 + 12
    frame #205: 0x00000001000053e8 deepdeep`std::rt::lang_start::_$u7b$$u7b$closure$u7d$$u7d$::h66924f9d4742b572 + 16
    frame #206: 0x0000000100021d48 deepdeep`std::rt::lang_start_internal::hdff9e551ec0db2ea + 888
    frame #207: 0x0000000100005c28 deepdeep`main + 52
    frame #208: 0x000000019ecaeb98 dyld`start + 6076
(lldb) q
Quitting LLDB will kill one or more processes. Do you really want to proceed: [Y/n] y

To avoid that problem, and since we're going to be slower anyway, facet-json takes an iterative approach instead.
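
To make the difference concrete, here is a minimal sketch of the iterative idea. It is not facet-json's actual code: it is a hand-rolled parser for the exact {"next":...} shape used by the stack-overflow demo, with nesting tracked in a heap-allocated Vec instead of in call frames. The names parse_layers and count_layers are made up for this illustration, and the padding fields are left out.

```rust
/// Simplified stand-in for the article's `Layer` type (padding fields omitted).
#[derive(Debug)]
pub struct Layer {
    pub next: Option<Box<Layer>>,
}

/// Parses input shaped exactly like `{"next":{"next":null}}` without recursion.
/// Nesting is tracked in a `Vec` on the heap, so the supported depth is bounded
/// by available memory rather than by the size of the call stack.
pub fn parse_layers(input: &str) -> Result<Option<Box<Layer>>, String> {
    const OPEN: &str = "{\"next\":";
    let mut rest = input;

    // Byte offset of every object that has been opened but not closed yet.
    let mut open_objects: Vec<usize> = Vec::new();
    while let Some(tail) = rest.strip_prefix(OPEN) {
        open_objects.push(input.len() - rest.len());
        rest = tail;
    }

    let null_at = input.len() - rest.len();
    rest = rest
        .strip_prefix("null")
        .ok_or_else(|| format!("expected `null` at byte {null_at}"))?;

    // Unwind the explicit stack: each pop closes one object and wraps the
    // value built so far.
    let mut value: Option<Box<Layer>> = None;
    while let Some(opened_at) = open_objects.pop() {
        rest = rest
            .strip_prefix('}')
            .ok_or_else(|| format!("object opened at byte {opened_at} is never closed"))?;
        value = Some(Box::new(Layer { next: value }));
    }

    if rest.is_empty() {
        Ok(value)
    } else {
        Err("trailing characters after the document".to_string())
    }
}

/// Counts layers by walking the `next` chain in a loop.
pub fn count_layers(mut current: &Option<Box<Layer>>) -> usize {
    let mut count = 0;
    while let Some(layer) = current {
        count += 1;
        current = &layer.next;
    }
    count
}
```

One caveat of this sketch: dropping a very deep chain of boxes is still recursive by default, so a real implementation would also need to think about teardown.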

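We need to derive Facet instead of Deserialize, and mark the fields that should fall back to a default value with #[facet(default)] (there is currently no implicit behavior for Option):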

use facet::Facet;

#[derive(Facet)]
struct Layer {
    #[facet(default)]
    _padding1: Option<[[f32; 32]; 32]>,
    #[facet(default)]
    _padding2: Option<[[f32; 32]; 32]>,
    #[facet(default)]
    _padding3: Option<[[f32; 32]; 32]>,
    next: Option<Box<Layer>>,
}

Then we use facet-json's from_str, and that's it:

let layer: Layer = facet_json::from_str(&json).unwrap();

It works:



deepdeep-facet on  main [!] via 🦀 v1.87.0
❯ cargo run --release
    Finished `release` profile [optimized] target(s) in 0.00s
     Running `target/release/deepdeep`
Layer count: 109

And a fun fact? The facet version, which actually works, is faster than the serde-json version, which crashes:



~/facet-rs
❯ hyperfine --warmup 2500 -i -N deepdeep/target/release/deepdeep deepdeep-facet/target/release/deepdeep
Benchmark 1: deepdeep/target/release/deepdeep
  Time (mean ± σ):       2.3 ms ±   0.7 ms    [User: 0.7 ms, System: 0.9 ms]
  Range (min … max):     1.5 ms …   5.0 ms    1685 runs

  Warning: Ignoring non-zero exit code.
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 2: deepdeep-facet/target/release/deepdeep
  Time (mean ± σ):       1.4 ms ±   0.2 ms    [User: 0.6 ms, System: 0.5 ms]
  Range (min … max):     1.3 ms …   2.9 ms    1237 runs

  Warning: The first benchmarking run for this command was significantly slower than the rest (2.4 ms). This could be caused by (filesystem) caches that were not filled until after the first run. You are already using the '--warmup' option which helps to fill these caches before the actual benchmark. You can either try to increase the warmup count further or re-run this benchmark on a quiet system in case it was a random outlier. Alternatively, consider using the '--prepare' option to clear the caches before each timing run.

Summary
  deepdeep-facet/target/release/deepdeep ran
    1.66 ± 0.58 times faster than deepdeep/target/release/deepdeep

This probably means nothing: most likely it's just page faults being slow on macOS. Still, it was funny.

As we saw earlier, facet-json as of today is iterative rather than recursive, and that comes at a cost. But nothing prevents someone from shipping a faster, recursive implementation.

We don't use SIMD yet. But someone should! We don't take a tape-oriented approach to JSON decoding either, but I hear it's pretty cool!

I'd like the default facet-json implementation to stay flexible. Did I mention it has nice error messages too?



bigapi on  main [!⇡] via 🦀 v1.87.0
❯ cargo run -p bigapi-cli-facet
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.07s
     Running `target/debug/bigapi-cli-facet`
About to do ser stuff...

thread 'main' panicked at bigapi-indirection-facet/src/lib.rs:17:43:
Failed to deserialize catalog!: WARNING: Input was truncated for display. Byte indexes in the error below do not match original input.
Error:
   ╭─[ json:1:82 ]
   │
 1 │ …t":"2025-05-31T10:06:35"}],"created_at":"2025-05-31T10:06:3_","metadata":{"version":"1.0.b1!","region":"US"}}
   │                                          ──────────┬──────────
   │                                                    ╰──────────── Operation failed on shape NaiveDateTime: Failed to parse string value
───╯

note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

I don't want to linger here too long, but it's good to know where our tax dollars… I mean, our build minutes go, you know?

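Like I said: I'd like facet-json to stay flexible. I think it should support trailing commas if the deserializer is configured to allow them. Same for inline comments: you should be able to turn them on in the configuration.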
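
As a rough illustration of what opt-in leniency could look like, here is a sketch that rewrites relaxed JSON into strict JSON, or rejects it, depending on two flags. Everything here is hypothetical: RelaxedOptions and to_strict_json are names invented for this example, facet-json's real configuration may look nothing like this, and a real deserializer would handle these cases in its tokenizer rather than in a separate pass.

```rust
/// Hypothetical options; both kinds of leniency are off by default.
#[derive(Clone, Copy, Debug, Default)]
pub struct RelaxedOptions {
    pub allow_trailing_commas: bool,
    pub allow_line_comments: bool,
}

/// Rewrites relaxed JSON into strict JSON, or returns an error when the input
/// uses a relaxation that `opts` does not allow.
pub fn to_strict_json(input: &str, opts: RelaxedOptions) -> Result<String, String> {
    let chars: Vec<char> = input.chars().collect();

    // Pass 1: drop `// ...` comments that sit outside string literals.
    let mut kept: Vec<char> = Vec::with_capacity(chars.len());
    let (mut in_string, mut escaped) = (false, false);
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        if in_string {
            // Inside a string everything is copied; only track where it ends.
            if escaped {
                escaped = false;
            } else if c == '\\' {
                escaped = true;
            } else if c == '"' {
                in_string = false;
            }
        } else if c == '"' {
            in_string = true;
        } else if c == '/' && chars.get(i + 1) == Some(&'/') {
            if !opts.allow_line_comments {
                return Err(format!("line comment at character {i}, but comments are disabled"));
            }
            // Skip to the end of the line, keeping the newline itself.
            while i < chars.len() && chars[i] != '\n' {
                i += 1;
            }
            continue;
        }
        kept.push(c);
        i += 1;
    }

    // Pass 2: drop commas whose next non-whitespace character closes a container.
    let mut out = String::with_capacity(kept.len());
    let (mut in_string, mut escaped) = (false, false);
    for (i, &c) in kept.iter().enumerate() {
        if in_string {
            if escaped {
                escaped = false;
            } else if c == '\\' {
                escaped = true;
            } else if c == '"' {
                in_string = false;
            }
        } else if c == '"' {
            in_string = true;
        } else if c == ',' {
            let next = kept[i + 1..].iter().copied().find(|n| !n.is_whitespace());
            if matches!(next, Some('}' | ']')) {
                if !opts.allow_trailing_commas {
                    return Err(format!(
                        "trailing comma (character {i} after comment removal), but trailing commas are disabled"
                    ));
                }
                continue; // Allowed: simply leave the comma out.
            }
        }
        out.push(c);
    }
    Ok(out)
}
```

With both flags off, any comment or trailing comma is an error, which keeps the strict behavior as the default.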

I also think there's a way to support async I/O. Why not? All the state is already on the heap, pinned: no async runtime has any reason to object to that.

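Conversely, if using the heap turns out to be slow, we should try other allocators: arenas, bump allocators, that kind of thing? Or a general-purpose allocator that's a bit more modern than the system default. I haven't benchmarked with jemalloc, mimalloc, etc. yet.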
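
Before swapping allocators, it helps to know how much a code path allocates at all. The sketch below uses only the standard library: it wraps the system allocator in a counter and installs it through #[global_allocator], which is the same hook a crate like mimalloc or jemalloc would plug into. CountingAllocator and allocations_during are names made up for this example; this is a measuring aid, not a benchmark.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Wraps the system allocator and counts every allocation request.
pub struct CountingAllocator;

pub static ALLOCATION_COUNT: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATION_COUNT.fetch_add(1, Ordering::Relaxed);
        // SAFETY: the caller upholds the `GlobalAlloc::alloc` contract; we only forward.
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        // SAFETY: `ptr` was returned by `System.alloc` with this same layout.
        unsafe { System.dealloc(ptr, layout) }
    }
}

// Swapping allocators is a one-line change: point this static at a
// different `GlobalAlloc` implementation.
#[global_allocator]
static GLOBAL: CountingAllocator = CountingAllocator;

/// Runs `work` and reports how many allocations happened meanwhile.
/// Other threads allocating at the same time will add noise to the count.
pub fn allocations_during<R>(work: impl FnOnce() -> R) -> (R, usize) {
    let before = ALLOCATION_COUNT.load(Ordering::Relaxed);
    let result = work();
    let after = ALLOCATION_COUNT.load(Ordering::Relaxed);
    (result, after - before)
}
```

Wrapping a deserialization call in allocations_during gives a first hint of whether an arena or bump allocator is likely to pay off.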

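Back to flexibility: one cool thing would be to support XPath-style selectors during deserialization. Think of filtering nested data: say you only want the first 10 children of an array in a struct in a field of some other struct. The deserializer would do only the minimum work needed to get them, while still being able to validate that the data has the right shape, which is the reason a lot of people like serde.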

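And finally, my pipe dream: just-in-time (JIT) compilation to speed up deserialization. Not only to take advantage of the best instructions available at runtime, but also to take advantage of facts observed from the data. For example, if object keys always arrive in the order “one, four, two” and “three” is missing, we can generate code at runtime that is optimized for that pattern, but only after having observed it.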
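
There is no JIT in the sketch below. It only illustrates the "observe first, then specialize" half of the idea with plain data: a cache that remembers which field each key position resolved to last time, tries that guess first, and falls back to a search when the guess is wrong. KeyOrderCache is a name invented for this example.

```rust
/// Remembers which field each key position resolved to the last time around.
pub struct KeyOrderCache {
    fields: Vec<&'static str>,
    /// `predicted[i]` is the index into `fields` expected at key position `i`.
    predicted: Vec<usize>,
    pub hits: usize,
    pub misses: usize,
}

impl KeyOrderCache {
    pub fn new(fields: Vec<&'static str>) -> Self {
        Self { fields, predicted: Vec::new(), hits: 0, misses: 0 }
    }

    /// Resolves the key seen at `position` within an object to a field index.
    pub fn resolve(&mut self, position: usize, key: &str) -> Option<usize> {
        // Fast path: one string comparison against the predicted field.
        if let Some(&guess) = self.predicted.get(position) {
            if self.fields[guess] == key {
                self.hits += 1;
                return Some(guess);
            }
        }

        // Slow path: linear search, then record what was observed.
        self.misses += 1;
        let found = self.fields.iter().position(|field| *field == key)?;
        if position < self.predicted.len() {
            self.predicted[position] = found;
        } else if position == self.predicted.len() {
            self.predicted.push(found);
        }
        Some(found)
    }
}
```

The first object pays for the searches; later objects with the same key order take the fast path. A JIT would go one step further and emit code that assumes the observed order, keeping a fallback like the slow path here for when the assumption breaks.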

This is only the tip of the iceberg of what you can do with reflection.

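Right now, we're also running into limits of the language: typeid isn't const, and comparing two typeids in a const context is impossible. Cycles in constants aren't supported, and there are no plans to support them any time soon, which means we have to add indirection through function pointers. Specialization is unstable, so we have to use autoderef tricks inspired by the spez crate.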
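
The function-pointer indirection is easier to see on a toy model. In the sketch below, Shape, Field and Reflect are minimal stand-ins invented for this example (facet's real types carry far more, and Option and Box are glossed over). If the field stored a &'static Shape directly, the shape constant of a recursive type would depend on its own value and be rejected as a cycle. Storing a function that returns the shape only needs the function's address when the constant is built.

```rust
/// Minimal stand-in for reflection metadata.
pub struct Shape {
    pub type_name: &'static str,
    pub fields: &'static [Field],
}

pub struct Field {
    pub name: &'static str,
    /// A function pointer instead of `&'static Shape`: building the constant
    /// only needs the function's address, so a type may mention its own shape.
    pub shape: fn() -> &'static Shape,
}

pub trait Reflect {
    const SHAPE: &'static Shape;
}

/// A recursive type, like the `Layer` from the stack-overflow demo.
pub struct Layer {
    pub depth: u32,
    pub next: Option<Box<Layer>>,
}

fn u32_shape() -> &'static Shape {
    <u32 as Reflect>::SHAPE
}

fn layer_shape() -> &'static Shape {
    <Layer as Reflect>::SHAPE
}

impl Reflect for u32 {
    const SHAPE: &'static Shape = &Shape { type_name: "u32", fields: &[] };
}

impl Reflect for Layer {
    // The `next` field points back at `Layer` itself, through `layer_shape`.
    const SHAPE: &'static Shape = &Shape {
        type_name: "Layer",
        fields: &[
            Field { name: "depth", shape: u32_shape },
            Field { name: "next", shape: layer_shape },
        ],
    };
}

/// Follows the field called `name` for `hops` steps, starting at `start`.
pub fn follow(start: &'static Shape, name: &str, hops: usize) -> Option<&'static Shape> {
    let mut current = start;
    for _ in 0..hops {
        let field = current.fields.iter().find(|field| field.name == name)?;
        current = (field.shape)();
    }
    Some(current)
}
```

The price is one indirect call every time a field's shape is looked up, which is the kind of cost a const-friendly language feature would remove.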

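It's not perfect, but rebuilding a whole ecosystem is a lot of fun: there are JSON, YAML, TOML, and XDR crates, KDL is a work in progress, and I'd love for someone to pick up XML, this time with no need for @ or $ prefixes. Let's make them first-class citizens in facet.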

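I'd love a nice assertion library too! Something like facet-pretty, but for tests? I'd also love a facet-based property testing library. It would know what can be mutated and where it lives, and it could check invariants too: with just a few more custom attributes, we could give it enough information to know which values to generate, rather than looping on a struct-level invariant method.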

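And something we haven't explored at all yet? Facet for functions: manipulating values is fun, and I've fantasized about an HTTP interface that lets me inspect the entire state of my program, but the obvious next step is an interactive REPL! Let's expose functions that can be called dynamically, build RPC on top of that, and take advantage of all the same great tooling that's becoming standard across the facet ecosystem.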
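
To give a feel for what "functions you can call dynamically" might mean, here is a deliberately tiny sketch of a registry that a REPL could sit on. It is purely hypothetical: facet has no function reflection today, and FnShape, registry and eval are names invented for this example, restricted to i64 arguments to stay short.

```rust
use std::collections::HashMap;

/// Hypothetical description of a callable: parameter names plus an entry point.
pub struct FnShape {
    pub params: &'static [&'static str],
    pub call: fn(&[i64]) -> i64,
}

fn add(args: &[i64]) -> i64 {
    args[0] + args[1]
}

fn negate(args: &[i64]) -> i64 {
    -args[0]
}

/// Functions exposed to the REPL, keyed by name.
pub fn registry() -> HashMap<&'static str, FnShape> {
    HashMap::from([
        ("add", FnShape { params: &["lhs", "rhs"], call: add }),
        ("negate", FnShape { params: &["value"], call: negate }),
    ])
}

/// Evaluates one REPL line such as `add 2 3`.
pub fn eval(registry: &HashMap<&'static str, FnShape>, line: &str) -> Result<i64, String> {
    let mut words = line.split_whitespace();
    let name = words.next().ok_or("empty input")?;
    let shape = registry
        .get(name)
        .ok_or_else(|| format!("unknown function `{name}`"))?;

    let args = words
        .map(|word| word.parse::<i64>().map_err(|e| format!("bad argument `{word}`: {e}")))
        .collect::<Result<Vec<i64>, String>>()?;

    // The arity check uses the metadata, so the entry point can index safely.
    if args.len() != shape.params.len() {
        return Err(format!(
            "`{name}` expects ({}), got {} argument(s)",
            shape.params.join(", "),
            args.len()
        ));
    }
    Ok((shape.call)(&args))
}
```

The same metadata that validates the call could also drive help text, completion, or an RPC schema.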

And what looks a little like serialization and deserialization? FFI! Exchanging values with other languages, or even with databases: why not facet-sqlite, facet-postgres? It feels so natural.

There's still a lot to do, and the churn is real (there's been a massive rewrite every other week), but I think the potential is enormous. Come hack with us! It's fun!

    frame #202: 0x00000001000008e4 deepdeep`serde_json::de::from_trait::h96b8ac2f4e672a8e + 92
    frame #203: 0x0000000100005abc deepdeep`deepdeep::main::hb66396babb66c58d + 80
    frame #204: 0x0000000100005400 deepdeep`std::sys::backtrace::__rust_begin_short_backtrace::h52797e85990f16c6 + 12
    frame #205: 0x00000001000053e8 deepdeep`std::rt::lang_start::_$u7b$$u7b$closure$u7d$$u7d$::h66924f9d4742b572 + 16
    frame #206: 0x0000000100021d48 deepdeep`std::rt::lang_start_internal::hdff9e551ec0db2ea + 888
    frame #207: 0x0000000100005c28 deepdeep`main + 52
    frame #208: 0x000000019ecaeb98 dyld`start + 6076
(lldb) q
Quitting LLDB will kill one or more processes. Do you really want to proceed: [Y/n] y

To avoid that problem, and since it was going to be slower anyway, facet-json takes an iterative approach instead.
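
To make "iterative" concrete, here is a minimal sketch of the general idea. It is not facet-json's actual code: the toy grammar (nested arrays like `[[],[[]]]`, with commas tolerated but not validated) and the names `Frame`, `Stats` and `scan_nested` are all made up for illustration. The point is only that per-level state goes into a heap-allocated `Vec` instead of a native stack frame, so nesting depth is bounded by memory rather than by the thread's stack size.

```rust
/// Per-level parser state. In a recursive parser this would live in a
/// native stack frame; here it lives in a heap-allocated Vec.
struct Frame {
    children: usize,
}

#[derive(Debug, PartialEq)]
struct Stats {
    arrays: usize,
    max_depth: usize,
    max_children: usize,
}

/// Walks a toy grammar of nested arrays without recursion.
/// (Hypothetical helper for illustration; commas are skipped, not validated.)
fn scan_nested(input: &str) -> Result<Stats, String> {
    let mut stack: Vec<Frame> = Vec::new();
    let mut stats = Stats { arrays: 0, max_depth: 0, max_children: 0 };

    for (pos, byte) in input.bytes().enumerate() {
        match byte {
            b'[' => {
                // Entering a nested value: push a frame instead of calling ourselves.
                if let Some(parent) = stack.last_mut() {
                    parent.children += 1;
                }
                stack.push(Frame { children: 0 });
                stats.arrays += 1;
                stats.max_depth = stats.max_depth.max(stack.len());
            }
            b']' => {
                // Leaving a value: pop the frame instead of returning.
                let frame = stack
                    .pop()
                    .ok_or_else(|| format!("unexpected ']' at byte {}", pos))?;
                stats.max_children = stats.max_children.max(frame.children);
            }
            b',' | b' ' | b'\n' => {}
            other => {
                return Err(format!("unexpected byte {:?} at byte {}", other as char, pos));
            }
        }
    }

    if stack.is_empty() {
        Ok(stats)
    } else {
        Err(format!("{} unclosed '['", stack.len()))
    }
}
```

Feeding this a million levels of nesting only grows the `Vec`. A recursive version of the same walk would need a million native frames, which is the kind of thing that produced the backtrace above.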

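We need to derive Facet instead of Deserialize, and mark the fields that should fall back to their default value with `#[facet(default)]` (there is currently no implicit behavior for Option):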

use facet::Facet;

#[derive(Facet)]
struct Layer {
    #[facet(default)]
    _padding1: Option<[[f32; 32]; 32]>,
    #[facet(default)]
    _padding2: Option<[[f32; 32]; 32]>,
    #[facet(default)]
    _padding3: Option<[[f32; 32]; 32]>,
    next: Option<Box<Layer>>,
}

Then we use facet-json's from_str and we're done:

let layer: Layer = facet_json::from_str(&json).unwrap();

It works:



deepdeep-facet on  main [!] via 🦀 v1.87.0
❯ cargo run --release
    Finished `release` profile [optimized] target(s) in 0.00s
     Running `target/release/deepdeep`
Layer count: 109

And the fun fact? The facet version, which actually works, is faster than the serde-json version, which crashes:



~/facet-rs
❯ hyperfine --warmup 2500 -i -N deepdeep/target/release/deepdeep deepdeep-facet/target/release/deepdeep
Benchmark 1: deepdeep/target/release/deepdeep
  Time (mean ± σ):       2.3 ms ±   0.7 ms    [User: 0.7 ms, System: 0.9 ms]
  Range (min … max):     1.5 ms …   5.0 ms    1685 runs

  Warning: Ignoring non-zero exit code.
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 2: deepdeep-facet/target/release/deepdeep
  Time (mean ± σ):       1.4 ms ±   0.2 ms    [User: 0.6 ms, System: 0.5 ms]
  Range (min … max):     1.3 ms …   2.9 ms    1237 runs

  Warning: The first benchmarking run for this command was significantly slower than the rest (2.4 ms). This could be caused by (filesystem) caches that were not filled until after the first run. You are already using the '--warmup' option which helps to fill these caches before the actual benchmark. You can either try to increase the warmup count further or re-run this benchmark on a quiet system in case it was a random outlier. Alternatively, consider using the '--prepare' option to clear the caches before each timing run.

Summary
  deepdeep-facet/target/release/deepdeep ran
    1.66 ± 0.58 times faster than deepdeep/target/release/deepdeep

This probably means nothing, and it's likely just macOS page faults being slow. It still made me laugh.

As we saw earlier, facet-json is, as of today, iterative rather than recursive, and that comes at a cost. But nothing stops anyone from shipping a faster recursive implementation.

We don't use SIMD yet. But someone should! We don't take a tape-oriented approach to JSON decoding either, but I hear it's pretty cool!
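
For readers who haven't met the term: "tape-oriented" roughly means a first pass flattens the input into one linear array of entries (the tape), and later passes walk that array instead of the raw text. This is the idea popularized by simdjson. The sketch below is a toy illustration only, not anything facet-json does: the grammar is nested arrays of unsigned integers, commas are skipped rather than validated, and `Entry`, `build_tape` and `child_count` are hypothetical names.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Entry {
    /// Start of an array; `end` is the tape index of the matching `Close`.
    Open { end: usize },
    Close,
    Num(u64),
}

/// First pass: flatten input such as `[1,[2,3],[]]` into a linear tape.
fn build_tape(input: &str) -> Result<Vec<Entry>, String> {
    let bytes = input.as_bytes();
    let mut tape: Vec<Entry> = Vec::new();
    let mut open_stack: Vec<usize> = Vec::new(); // tape indices of unmatched `Open`s
    let mut i = 0;
    while i < bytes.len() {
        match bytes[i] {
            b'[' => {
                open_stack.push(tape.len());
                tape.push(Entry::Open { end: 0 }); // patched at the matching ']'
                i += 1;
            }
            b']' => {
                let start = open_stack
                    .pop()
                    .ok_or_else(|| format!("unexpected ']' at byte {}", i))?;
                tape[start] = Entry::Open { end: tape.len() };
                tape.push(Entry::Close);
                i += 1;
            }
            b'0'..=b'9' => {
                let mut n: u64 = 0;
                while i < bytes.len() && bytes[i].is_ascii_digit() {
                    let digit = u64::from(bytes[i] - b'0');
                    n = n
                        .checked_mul(10)
                        .and_then(|v| v.checked_add(digit))
                        .ok_or("number overflows u64")?;
                    i += 1;
                }
                tape.push(Entry::Num(n));
            }
            b',' | b' ' => i += 1,
            other => {
                return Err(format!("unexpected byte {:?} at byte {}", other as char, i));
            }
        }
    }
    if open_stack.is_empty() {
        Ok(tape)
    } else {
        Err(format!("{} unclosed '['", open_stack.len()))
    }
}

/// Second pass: count the direct children of the array at tape index `start`,
/// jumping over nested arrays via their `end` index instead of visiting them.
fn child_count(tape: &[Entry], start: usize) -> Option<usize> {
    let end = match tape.get(start) {
        Some(Entry::Open { end }) => *end,
        _ => return None,
    };
    let mut count = 0;
    let mut i = start + 1;
    while i < end {
        if let Entry::Open { end: nested_end } = tape[i] {
            i = nested_end; // land on the nested array's `Close`
        }
        i += 1;
        count += 1;
    }
    Some(count)
}
```

The appeal, as I understand it, is that skipping a value you don't care about becomes a single index jump, and the first pass is a tight loop over bytes, which is the part that lends itself to SIMD.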

I'd like the default facet-json implementation to stay flexible. Did I mention it has nice error messages, too?



bigapi on  main [!⇡] via 🦀 v1.87.0
❯ cargo run -p bigapi-cli-facet
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.07s
     Running `target/debug/bigapi-cli-facet`
About to do ser stuff...

thread 'main' panicked at bigapi-indirection-facet/src/lib.rs:17:43:
Failed to deserialize catalog!: WARNING: Input was truncated for display. Byte indexes in the error below do not match original input.
Error:
   ╭─[ json:1:82 ]
   │
 1 │ …t":"2025-05-31T10:06:35"}],"created_at":"2025-05-31T10:06:3_","metadata":{"version":"1.0.b1!","region":"US"}}
   │                                          ──────────┬──────────
   │                                                    ╰──────────── Operation failed on shape NaiveDateTime: Failed to parse string value
───╯

note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

I don't want to linger here, but it's good to know where our tax dollars… er, build minutes, are going, right?

I also think there's a way to support async I/O. Why not? All the state is already on the heap, pinned, so no async runtime has any reason to object.

This is just the tip of the iceberg of what you can do with reflection.

There's still a lot to do, and the churn is real (there has been a major rewrite every other week), but I think the potential is enormous. Come hack with us! It's fun!
