JS 성능을 최대 60배까지 개선하기 위한 단형성 이해하기

I tweeted about polymorphism and a lot of you were asking for a more in-depth dive into it. So let’s get to it.

Virtual Machine

가장 중요하게 이해해야 할 것은 가상 머신(VM)이라는 개념이다. virtual이라는 단어는 프로그램이 실행되는 “머신”이 물리적인 것이 아니라 에뮬레이션된 것임을 의미한다. 이는 물리적 CPU의 실행 모델이 너무 제한적이어서 모듈, 클로저, 객체 같은 복잡한 구성 요소를 가진 JavaScript와 같은 고수준 언어를 실행할 수 없다는 뜻이다.

이를 처리하는 방법은 두 가지다. 고수준 개념(객체/클로저 등)을 CPU가 실행할 수 있는 저수준 개념(메모리 접근과 서브루틴)으로 매핑하는 컴파일러를 사용하는 것이다. 이 변환이 미리(사전에) 수행되면 컴파일러라고 부른다. 애플리케이션이 실행되는 동안 변환이 일어나면 JITing이라고 부른다(JIT는 _Just In Time_의 약자로 Just-In-Time compilation을 뜻한다). 그리고 우리가 VM을 실행하는 환경을 이렇게 부른다.

Classes

VM이 물리 CPU로 매핑해야 하는 것들은 많지만, 이 글에서는 객체/클래스에 대해 이야기한다. 앞으로는 배열(배열도 객체이지만)과 구분하기 위해 객체를 그냥 클래스라고 부르겠다. 배열은 다르게 취급되기 때문이다. 그러므로 이 논의에서 객체 리터럴은 클래스로 본다.

const obj = {
  url: 'http://buildre.io',
  desc: 'Visual CMS',
};

우리는 이것을 클래스로 생각하지 않지만, 위의 값은 JavaScript VM 내부에서는 클래스로 저장된다. 클래스의 핵심 개념은 프로퍼티 접근을 통해 내용에 접근할 수 있다는 것이다.

obj.url // returns http://builder.io``

CPUs don’t understand classes

이해해야 할 점은 물리 CPU에는 클래스라는 개념이 없다는 것이다. 클래스는 VM이 CPU가 이해할 수 있도록 저수준 개념으로 변환해야 하는 더 고수준의 구성 요소다.

CPU는 오직 어떤 주소의 메모리를 읽거나, 주소로부터 어떤 오프셋만큼 떨어진 메모리를 읽는 것만 이해한다. 이는 배열과 완전히 같지는 않지만, 배열은 매우 좋은 비유다. 그래서 CPU 레벨에서 무슨 일이 벌어지는지 설명하기 위해 메모리 접근을 배열 접근으로 취급하겠다.

VMs downlevel

VM의 목적은 CPU가 클래스 개념을 이해하는 것 “처럼” 동작하도록, 클래스 개념을 배열 접근으로 내리는(downlevel) 것이다.

// Let's say we want to store these objects in a VM.
const builder = {
  url: "https://builder.io",
  desc: "Visual CMS",
};
const qwik = {
  url: "https://qwik.builder.io",
  desc: "Instant on web apps",
};

// Let's define a ClassShape which contains
// information about the shape of the object.
// This is greatly simplified from the actual
// implementation.
type ClassShape = string[];

// Define a type for an object. The first element
// is always a reference to the ClassShape the reminder
// are the values of the object.
type Object = [ClassShape, ...any];

// The ClassShape can store the offset and the
// property name information. The VM creates these
// on the fly for each possible object shape.
const vmShape1: ClassShape = ["url", "desc"];

// Now that the ClassShape is defined, we can use it
// create an object like so.
const vmBuilder: Object = [
  vmShape1, // <-- ClassShape always at location [0]
  "https://builder.io", // <-- The `url` property is at index 1
  "Visual CMS", // <-- The `desc` property is at index 2
];

const vmQwik: Object = [
  vmShape1, // Notice that this object is of the same shape
  "https://qwik.builder.io",
  "Instant on web apps",
];

Property access

위에서는 객체가 배열로 저장되는 방식을 설명했다. 이제 프로퍼티 접근에 대해 이야기해보자. VM의 일은 프로퍼티 접근을 배열 접근으로 변환하는 것이며, 방식은 다음과 같다.

// Let's say you want to read these properties
const url1 = builder.url;
const url2 = qwik.url;

// The VM will translate it into something like this.
const url1 = vmBuilder[vmBuilder[0].indexOf("url") + 1];
const url2 = vmQwik[vmQwik[0].indexOf("url") + 1];

VM이 먼저 위치 0에서 ClassShape를 읽고, indexOf()를 통해 특정 프로퍼티를 찾는 것에 주목하라. 얻어진 인덱스는 ClassShape를 보정하기 위해 +1이 더해지고, 그 결과로 실제 값을 읽는다. (쓰기에도 비슷한 과정을 사용할 수 있다.)

NOTE: 실제 indexOf() 구현은 위 코드가 암시하는 것처럼 O(n)이 아니라 O(1)에 가깝고, 위 코드는 단순하게 만들기 위해 크게 단순화되어 있다.

Getting the property index is slow

indexOf() 호출은 핫 패스에 있으며 느리다. (_핫 패스_는 프로그램에서 자주 실행되거나 중요한 부분을 뜻한다.) 강타입 언어는 컴파일 타임에 프로퍼티 이름을 인덱스로 변환할 수 있지만, 구조적 타입(덕 타이핑) 언어는 그럴 수 없으므로 런타임에 수행해야 한다.

기억하겠지만 V8이 2008년에 처음 등장했을 때 다른 JavaScript 엔진보다 훨씬 빨랐다. 그 이유 중 하나는 V8이 JavaScript 세계에 인라인 캐시를 도입했기 때문이다(인라인 캐싱은 JVM과 Smalltalk에서 오랫동안 사용되어 왔으므로 혁신이라기보다는 적용에 가깝다).

Inline-cache

V8은 JIT을 사용한다. 즉 V8은 일반 모드로 JavaScript를 실행하고, 그 과정에서 애플리케이션이 어떻게 동작하는지에 대한 정보를 수집한다. 어떤 함수가 핫 패스에 있다고 판단되면, 일반 실행 모드에서 수집한 가정들의 집합을 바탕으로 그 함수를 다시 컴파일한다. 이러한 가정은 JITter가 지름길을 택하게 해주며, 성능을 크게 개선한다.

[{
  url: "https://builder.io",
  desc: "Visual CMS",
}, {
  url: "https://qwik.builder.io",
  desc: "Instant on web apps",
}].map((obj) => obj.url)

일반 모드로 실행하는 동안:

const vmShape1: ClassShape = ["url", "desc"];
[
  [vmShape1, "https://builder.io", "Visual CMS"],
  [vmShape1, "https://qwik.builder.io","Instant on web apps",],
].map((obj) => obj[obj[0].indexOf('url')])

일반 실행이 학습하는 것은 obj[0]의 shape이 항상 vmShape1이라는 점이며, 그래서 출력을 훨씬 효율적인 형태로 JIT할 수 있다.

const vmShape1: ClassShape = ["url", "desc"];
[
  [vmShape1, "https://builder.io", "Visual CMS"],
  [vmShape1, "https://qwik.builder.io","Instant on web apps"],
].map((obj) => obj[
    obj[0] === vmShape1 ? 1 : // if the class shape is vmShape1. then we know 1
      obj[0].indexOf('url')  // otherwise fallback to slow path
  ]
)

이 한 가지 트릭만으로도 프로퍼티 읽기를 60배까지 빠르게 만들 수 있다! 엄청난 성능 향상이다.

Multiple shapes

하지만 현실 세계는 좀 더 복잡하다. 위 코드는 map() 함수에 들어오는 객체들의 shape이 항상 같을 때만 빠르다. 서로 다른 객체 shape이 있다면 어떻게 될까?

[{
  url: "https://builder.io",
  desc: "Visual CMS",
}, {
  isOSS: true,
  url: "https://qwik.builder.io",
}].map((obj) => obj.url)

이 경우 변환은 조금 더 복잡해진다.

// Notice two different object shapes
const vmShape1: ClassShape = ["url", "desc"];
const vmShape2: ClassShape = ["isOSS", "url`"];
[
  [vmShape1, "https://builder.io", "Visual CMS"],
  [vmShape2, true, "https://qwik.builder.io"],
].map((obj) => obj[
    obj[0] === vmShape1 ? 1 :   // inline cache for vmShape1
      obj[0] === vmShape2 ? 2 : // inline cache for mvShape2
        obj[0].indexOf('url')  // otherwise fallback to slow path
  ]
)

대부분의 VM은 느린 경로로 떨어지기 전에 최대 4개의 서로 다른 shape까지 인라인해준다.

Monomorphism, Polymorphism, and Megamorphism

VM은 이러한 동작에 이름을 붙인다. 단형성(monomorphic), 다형성(polymorphic), 거대형성(megamorphic) 프로퍼티 읽기다. 정의를 좀 더 자세히 해보자.

Monomorphic: 프로퍼티 읽기가 항상 같은 class shape을 얻는다면, 그 프로퍼티 읽기 지점(site)은 단형성이라고 한다.
Polymorphic: 프로퍼티 읽기가 소수(4개 이하)의 서로 다른 class shape을 얻고 인라인 캐시로 처리될 수 있다면, 그 프로퍼티 읽기 지점은 다형성이라고 한다.
Megamorphic: 프로퍼티 읽기 지점이 4개를 초과하는 서로 다른 class shape을 보게 되면 거대형성이라고 하며, 항상 느린 경로로 처리된다.

Megamorphic cache

VM에는 2차 캐시도 있다. indexOf() 메서드는 VM 내부에서 일어나는 일을 거칠게 근사한 것이다. 실제 메서드 구현은 indexOf()가 암시하는 것처럼 O(n)이 아니라 O(1)이다.

indexOf() 메서드 또한 마지막 N개의 프로퍼티-ClassShape 읽기(또는 쓰기)를 추적하는 캐시를 사용한다. V8에서 N은 1024이므로, 최근에 본 적이 있는 프로퍼티-ClassShape에 대해 indexOf()를 호출하면, 최근에 본 적이 없는 조합에 호출할 때보다 훨씬 빠르게 응답할 수 있다.

Real-world numbers

그렇다면 이 모든 것이 얼마나 큰 차이를 만들까? 실제로 상당히 크다! 다음 벤치마크와 결과를 보자.

Image 1: A chart showing Polymorphism Slowdown. The X-axis shows the number of class shapes, the Y-axis shows the slowdown multiple.

단형성(monomorphic) 경우의 비용을 1이라고 가정하자. 다형성의 정도가 1-4로 증가해도 인라인 캐시가 따라갈 수 있기 때문에 응답이 꽤 좋고, 4-way 인라인 캐시는 1-way 인라인 캐시(이상적인 경우)보다 40% 정도만 느리다. 하지만 4-way를 넘어가면 VM은 포기하고 indexOf()로 되돌아간다.

다만 indexOf()에는 거대형성 캐시가 있으므로, 1-way 캐시보다 약 3.5배 느리긴 하지만 여전히 성능이 꽤 괜찮다. 그러나 shape이 1000개로 늘어나면 성능이 떨어지기 시작하는데, 이는 거대형성 캐시가 넘치면서 최악의 경우로 되돌아가기 때문이다. 최악의 경우는 거대형성 경우보다 60배 느리다. 이는 매우 큰 차이다.

Microbenchmarks

사람들은 자신의 주장을 증명하기 위해 마이크로 벤치마크를 쓰는 것을 좋아한다. 하지만 조심해야 한다. 많은 마이크로 벤치마크는 실제 사용 사례를 반영하지 못한다. 대부분의 실패 원인은 함수에 흘려보내는 class shape의 개수가 현실을 반영하지 않기 때문이다.

이는 인라인 캐시가 동작하면서 코드가 실제보다 훨씬 빠르게 보이게 만든다. 그리고 더 많은 shape을 넣는다 하더라도, 마이크로 벤치마크에서 거대형성 캐시를 오버플로우시키는 경우는 드물다.

Conclusion

CPU는 우리가 고수준 프로그래밍 언어에서 당연하게 여기는 많은 개념을 이해하지 못한다. VM의 역할은 CPU가 이해하는 저수준 프리미티브로부터 고수준 개념을 에뮬레이션하는 것이다.

이러한 에뮬레이션 중 상당수는 비용이 크기 때문에, VM은 코드 실행을 빠르게 만드는 트릭들을 사용한다. 이런 트릭을 이해하면 언제 코드가 빠르고 언제 느려질 수 있는지 더 잘 이해할 수 있다.

가장 큰 주의점은 마이크로 벤치마크를 액면 그대로 받아들이는 것이다. 현실적인 실제 시나리오에서는 더 많은 class shape이 나타나면서, 마이크로 벤치마크에서 보이던 성능이 사라지는 경우가 많다.

I tweeted about polymorphism and a lot of you were asking for a more in-depth dive into it. So let’s get to it.

Virtual Machine

Classes

const obj = {
  url: 'http://buildre.io',
  desc: 'Visual CMS',
};

obj.url // returns http://builder.io``

CPUs don’t understand classes

VMs downlevel

VM의 목적은 CPU가 클래스 개념을 이해하는 것 “처럼” 동작하도록, 클래스 개념을 배열 접근으로 내리는(downlevel) 것이다.

// Let's say we want to store these objects in a VM.
const builder = {
  url: "https://builder.io",
  desc: "Visual CMS",
};
const qwik = {
  url: "https://qwik.builder.io",
  desc: "Instant on web apps",
};

// Let's define a ClassShape which contains
// information about the shape of the object.
// This is greatly simplified from the actual
// implementation.
type ClassShape = string[];

// Define a type for an object. The first element
// is always a reference to the ClassShape the reminder
// are the values of the object.
type Object = [ClassShape, ...any];

// The ClassShape can store the offset and the
// property name information. The VM creates these
// on the fly for each possible object shape.
const vmShape1: ClassShape = ["url", "desc"];

// Now that the ClassShape is defined, we can use it
// create an object like so.
const vmBuilder: Object = [
  vmShape1, // <-- ClassShape always at location [0]
  "https://builder.io", // <-- The `url` property is at index 1
  "Visual CMS", // <-- The `desc` property is at index 2
];

const vmQwik: Object = [
  vmShape1, // Notice that this object is of the same shape
  "https://qwik.builder.io",
  "Instant on web apps",
];

Property access

// Let's say you want to read these properties
const url1 = builder.url;
const url2 = qwik.url;

// The VM will translate it into something like this.
const url1 = vmBuilder[vmBuilder[0].indexOf("url") + 1];
const url2 = vmQwik[vmQwik[0].indexOf("url") + 1];

NOTE: 실제 indexOf() 구현은 위 코드가 암시하는 것처럼 O(n)이 아니라 O(1)에 가깝고, 위 코드는 단순하게 만들기 위해 크게 단순화되어 있다.

Getting the property index is slow

Inline-cache

[{
  url: "https://builder.io",
  desc: "Visual CMS",
}, {
  url: "https://qwik.builder.io",
  desc: "Instant on web apps",
}].map((obj) => obj.url)

일반 모드로 실행하는 동안:

const vmShape1: ClassShape = ["url", "desc"];
[
  [vmShape1, "https://builder.io", "Visual CMS"],
  [vmShape1, "https://qwik.builder.io","Instant on web apps",],
].map((obj) => obj[obj[0].indexOf('url')])

일반 실행이 학습하는 것은 obj[0]의 shape이 항상 vmShape1이라는 점이며, 그래서 출력을 훨씬 효율적인 형태로 JIT할 수 있다.

const vmShape1: ClassShape = ["url", "desc"];
[
  [vmShape1, "https://builder.io", "Visual CMS"],
  [vmShape1, "https://qwik.builder.io","Instant on web apps"],
].map((obj) => obj[
    obj[0] === vmShape1 ? 1 : // if the class shape is vmShape1. then we know 1
      obj[0].indexOf('url')  // otherwise fallback to slow path
  ]
)

이 한 가지 트릭만으로도 프로퍼티 읽기를 60배까지 빠르게 만들 수 있다! 엄청난 성능 향상이다.

Multiple shapes

[{
  url: "https://builder.io",
  desc: "Visual CMS",
}, {
  isOSS: true,
  url: "https://qwik.builder.io",
}].map((obj) => obj.url)

이 경우 변환은 조금 더 복잡해진다.

// Notice two different object shapes
const vmShape1: ClassShape = ["url", "desc"];
const vmShape2: ClassShape = ["isOSS", "url`"];
[
  [vmShape1, "https://builder.io", "Visual CMS"],
  [vmShape2, true, "https://qwik.builder.io"],
].map((obj) => obj[
    obj[0] === vmShape1 ? 1 :   // inline cache for vmShape1
      obj[0] === vmShape2 ? 2 : // inline cache for mvShape2
        obj[0].indexOf('url')  // otherwise fallback to slow path
  ]
)

대부분의 VM은 느린 경로로 떨어지기 전에 최대 4개의 서로 다른 shape까지 인라인해준다.

Monomorphism, Polymorphism, and Megamorphism

VM은 이러한 동작에 이름을 붙인다. 단형성(monomorphic), 다형성(polymorphic), 거대형성(megamorphic) 프로퍼티 읽기다. 정의를 좀 더 자세히 해보자.

Monomorphic: 프로퍼티 읽기가 항상 같은 class shape을 얻는다면, 그 프로퍼티 읽기 지점(site)은 단형성이라고 한다.
Polymorphic: 프로퍼티 읽기가 소수(4개 이하)의 서로 다른 class shape을 얻고 인라인 캐시로 처리될 수 있다면, 그 프로퍼티 읽기 지점은 다형성이라고 한다.
Megamorphic: 프로퍼티 읽기 지점이 4개를 초과하는 서로 다른 class shape을 보게 되면 거대형성이라고 하며, 항상 느린 경로로 처리된다.

Megamorphic cache

Real-world numbers

그렇다면 이 모든 것이 얼마나 큰 차이를 만들까? 실제로 상당히 크다! 다음 벤치마크와 결과를 보자.

Image 1: A chart showing Polymorphism Slowdown. The X-axis shows the number of class shapes, the Y-axis shows the slowdown multiple.

JS 성능을 최대 60배까지 개선하기 위한 단형성 이해하기

Virtual Machine

Classes

CPUs don’t understand classes

VMs downlevel

Property access

Getting the property index is slow

Inline-cache

Multiple shapes

Monomorphism, Polymorphism, and Megamorphism

Megamorphic cache

Real-world numbers

Microbenchmarks

Conclusion

관련 추천 글

모노모피즘을 이해해 JS 성능을 최대 60배까지 개선하기

언어 인터프리터와 가상 머신은 왜 느린가

빠른 동적 언어 인터프리터를 만드는 방법

Python 3.14 테일 콜 인터프리터의 성능

Virtual Machine

Classes

CPUs don’t understand classes

VMs downlevel

Property access

Getting the property index is slow

Inline-cache

Multiple shapes

Monomorphism, Polymorphism, and Megamorphism

Megamorphic cache

Real-world numbers

Microbenchmarks

Conclusion

관련 추천 글

모노모피즘을 이해해 JS 성능을 최대 60배까지 개선하기

언어 인터프리터와 가상 머신은 왜 느린가

빠른 동적 언어 인터프리터를 만드는 방법

Python 3.14 테일 콜 인터프리터의 성능