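Starting Systems Programming, Part 1: Programmers Write Programs

A software article by Efron Licht

March 2025

This is the first of a four-part series on the fundamentals of systems programming. It covers many of the core building blocks: bit manipulation, parsing, filesystems, input/output, system calls, memory management, signals, and more. Like most of my series, it is more of a grab bag than a comprehensive guide. I hope it helps anyway.

When did you last write a program from scratch? For a surprising number of programmers, the answer is "in school". This is an industry-wide problem, and it is getting worse. I interview a lot of candidates, and I have met people with titles like "Tech Lead @ Tesla" (or worse, Principal Engineer) who couldn't program their way out of a paper bag. My go-to interview question is "write grep", a problem that should be appropriate for a first- or second-year computer science student. The overwhelming majority of candidates fail it.

I don't think they're stupid. Usually they aren't. They just lack the programming fundamentals, the systems programming fundamentals, needed to "really" program. That's bad for the candidates, bad for the industry, and bad for our increasingly computerized world. Why? Because building reliable, intuitive, efficient software depends on minimizing complexity. If all you can do is add on to existing programs, you're glued to whatever complexity is already there. If you can't write a program from scratch, you're trapped in a world full of other people's code and other people's mistakes.

The way to get good at something is to do it. Pitchers pitch, painters paint, and programmers program. So this article is about writing programs: dozens of them. You'll learn something just by reading the text, but to get the most out of it you need to understand the programs. I've provided exercises to help you practice.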

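A note on style & environment

As far as possible, the code in this series uses a minimum of libraries. That doesn't mean you shouldn't use libraries; it means you shouldn't need them. I want to show that simple primitives are enough to build practical tools.

Code blocks appear throughout this article. They are either Go programs or bash shell scripts. If you have experience with a mainstream language like Python, JavaScript, or C you should be able to follow along, though you may want to brush up on pointers.

Go uses // for comments; bash and Python use #. Each code block starts with a comment that identifies the language, such as // filename.go or #!/usr/bin/env bash.

A note for Python programmers

I've provided Python implementations of many of the programs in this series. See the articles/startingsystems/cmd/pythonports directory on gitlab.

Where possible, I'll link the corresponding file at the top of each Go program. The Go program should always be treated as canonical. I'm not sure I'll keep doing this in later articles: it's a lot of work.

Example go block

Go programs start with // filename.go <description>.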

// minimal.go is an example go program.
// see https://gitlab.com/efronlicht/blog/-/blob/58fb4c13f870a73514284617c71027bbe0a76e2a/articles/startingsystems/cmd/pythonports/minimal.py for the python version.
package main

import "fmt"

func main() { fmt.Println("this is a go program") }

Example bash block

Bash scripts start with #!/usr/bin/env bash. Usually a # IN comment is followed by the commands you might run in a terminal, and a # OUT comment shows the expected output.

#!/usr/bin/env bash
# example.bash demonstrates a simple bash script with an # IN and # OUT section.

# IN:
echo "this is a bash script"

# OUT:
this is a bash script

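Lemma: sidenotes

Sidenotes appear in indented boxes like this one. A "lemma" is a small tangent that matters for clarifying a point.

Lemma: the shebang (#!)

A shebang (#!) at the very start of a file tells the operating system which program to use to run that file. For example, #!/usr/bin/bash runs the file with the bash shell located at /usr/bin/bash. #!/usr/bin/env bash tells the OS to run the script with whichever bash it finds by searching the PATH environment variable. We'll talk more about these things in a later article.

1.2. A few final caveats:

*   I'm far more familiar with Linux than with Windows, Darwin, or BSD, so this article is Linux-centric. I'll occasionally mention differences between Linux and other operating systems, but sometimes when I say "the OS" I just mean Linux.
*   This article can't be a comprehensive guide. Ideally you have some basic knowledge of computer architecture. Don't panic if you run into an unfamiliar term like "nibble", "register", or "file descriptor"; you should mostly be able to follow along. I'll provide a glossary of the terms I think of, but I'm sure to miss a few.
*   This series provides the source code of a great many programs. Reading that code is the heart of the series, and the code usually matters more than the text. I strongly encourage you to read and modify the code as you go.

OK, enough ceremony. Let's begin.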

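1.3. Series overview

1. Programmers Write Programs <- you are here

This article covers what systems programming is, what a program is, and how to interact with the data inside a program. We'll build a program, dig into the data inside it, hack it to change its behavior, and build a number of tools for understanding that we'll use throughout the series.

2. Your Program and the Outside World: command-line arguments, environment variables, and system calls

How does a program interact with the outside world? We'll cover the basics of the UNIX programming environment, including command-line arguments, environment variables, and system calls, and go as far as building a simple command-line interpreter (a.k.a. a shell).

3. Execution Counts: Hardware, Memory, & Software Performance (coming soon)

How does a program interact with hardware? We'll cover the basics of storage and access (registers, memory management, caches), talk about what actually happens when you call a function or make a system call, and give a crash course in high-performance programming in general.

4. Wait, It's All Goto: programming fundamentals, virtual machines, assembly, debugging, and ABIs (coming soon)

In the end, programming is memory, an instruction pointer, and a sequence of conditional jumps. Using our newly acquired systems programming skills, we'll invent a virtual machine and an assembly language that is a valid subset of Go, and use them to explore the fundamentals of programming and debugging and to explain how debuggers and ABIs work.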

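2. What is systems programming?

There is no bright line between "systems programming" and every other kind. A problem can be called "systems programming" if any of the following apply:

*   It interacts with the operating system or the hardware
*   It has tight performance constraints
*   It operates at a "low level", for example by working with individual bytes or registers

A systems programmer sees the computer not as a mathematical or formal abstraction but as a _physical machine_ that can be completely understood. They understand both the _hardware_ and the _software_ of a computer system and can write programs that interact with both. A systems programmer isn't afraid to take something apart, and is confident they can put it back together.

3. Peeking inside the black box: what is a program, anyway?

A _program_ is an executable file that the operating system can interpret as a sequence of machine instructions. In other words, it is a combination of code and data that the operating system can load into memory and run.

Programs come in roughly two kinds:

*   The kind that takes input and produces output (the focus of this article)
*   The kind that runs indefinitely, waiting for interaction from the outside world (daemons, servers, and so on)

When we say "input" and "output", we mean bytes. As a warm-up, let's write a program with no input but some output: the nearly 50-year-old classic, "hello, world!".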

3.1. hello.go

Overview

1. Print a string to standard output

// hello.go
package main

import "fmt"

func main() {
	fmt.Println("hello, world!")
}

3.2. buildhello.bash

Overview

1. Call go build to compile the program
2. Run the program

#!/usr/bin/env bash
# buildhello.bash builds and runs the hello program.
# IN
go build -o hello hello.go # 1. call 'go build' to compile the program
./hello # 2. run the program
# OUT
hello, world!

Good. But what is actually inside the hello program? We expect:

*   Data, including the "hello, world!" string
*   Code: the instructions that print that string
*   Other stuff?

Let's poke at it and see what we can find. We'll start with the data: that is, the bytes in the file that are not executable instructions.

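4. Investigating the data segment

Whatever the operating system or architecture, we can be confident that the string "hello, world!" is somewhere in the file. Let's find it. Better yet, let's write a program that finds it. We'll call it findoffset.

4.1. Finding a string with `findoffset.go`

Overview

We want to find a particular string in a file and print the offset of its first occurrence.

In the end, a string is just a sequence of bytes in some character encoding, so we can compare the bytes of the file with the bytes of the string, one byte at a time.

That is:

1. Parse the command-line arguments
2. Read the file into memory
3. Compare the bytes in the file to the bytes in the string, one by one
    1.   No match: continue at the next offset
    2.   Match: print the offset and exit 0 (ok)
4. Exit 1 (error)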

Lemma: the standard I/O streams

Every program starts out connected to three files. These are commonly called "standard i/o", or stdio for short.

| FILE | Name | R/W? | Description | Python | JS | Go | Note |
| --- | --- | --- | --- | --- | --- | --- | --- |
| STDIN | standard input | R | What you type into the terminal goes here. | sys.stdin | process.stdin | os.Stdin | Input |
| STDOUT | standard output | W | What the program writes goes here. Output meant for other programs. | sys.stdout | process.stdout | os.Stdout | |
| STDERR | standard error | W | Where the program writes errors. Output meant for humans. | sys.stderr | process.stderr | os.Stderr | Error |

Used in this example:

| Function | Returns | Description |
| --- | --- | --- |
| fmt.Fprintf(io.Writer, string, ...interface{}) | (int, error) | Write formatted output to a stream (a file, a memory buffer, etc.) |


// findoffset.go is a command line tool that finds the offset of the first occurrence of a string in a file and prints it to stdout.
package main

import (
	"fmt"
	"os"
)

func main() {
	// 1. parse the command line arguments

	// the operating system provides command line arguments to your program.
	// os.Args[0] is the name of the program, and the rest are the 'real' arguments.
	if len(os.Args) != 3 {
		fmt.Fprintf(os.Stderr, "Usage: findoffset <filename> <string>\n")
		os.Exit(1)
	}

	filepath, pattern := os.Args[1], os.Args[2]

	// 2. read the file into memory

	// it's inefficient to read the entire file into memory, but it's simple and works well for small files
	b, err := os.ReadFile(filepath) // we'll talk about how reading files works more later, too!
	if err != nil {
		fmt.Fprintf(os.Stderr, "read %s: %v\n", filepath, err) // HUMAN-READABLE DEBUG INFO should go to STDERR
		os.Exit(1)
	}

	// 3. compare the bytes in the file to the bytes in the string, one-by-one.
	// note the <=: the last offset where a match can start is len(b)-len(pattern).
	for i := 0; i <= len(b)-len(pattern); i++ {
		for j := range pattern { // byte-by-byte comparison
			// 3.1. no match: continue at next offset
			if b[i+j] != pattern[j] {
				break
			}

			// 3.2. match: print and exit 0 (ok)
			if j == len(pattern)-1 { // found it! print the offset & newline & exit
				fmt.Fprintf(os.Stdout, "%d\n", i) // MACHINE-READABLE OUTPUT should go to STDOUT
				os.Exit(0)
			}
		}
	}
	// 4. exit 1 (error)
	os.Exit(1)
}

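Looks good. But how do we test it? Testing findoffset is easier if we can create a file with specific contents. Let's write a program that does that. Following Unix tradition, we'll call it echo.

4.2. Writing simple files with `echo.go`

Overview

1. Iterate over the command-line arguments
2. Print each argument to standard output, separated by spaces
3. Terminate with a newline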

// echo prints its arguments to standard output, separated by spaces and terminated by a newline.
// usage: echo <args...>
// see the python port at https://gitlab.com/efronlicht/blog/-/blob/0d2327696c01d6a46551fac21521937ee9f6fbe3/articles/startingsystems/cmd/pythonports/echo.py
package main

import (
	"fmt"
	"os"
)

func main() {
	// 1. iterate over the command line arguments
	for i, arg := range os.Args[1:] {
		if i > 0 {
			fmt.Print(" ")
		}
		// 2. print each argument to standard output, separated by spaces
		fmt.Print(arg)
	}
	// 3. terminate with a newline
	fmt.Println()
}

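Now we can write a simple file, but how do we know what's in it? We write a program that reads it. Following Unix tradition, we'll call it cat.

4.3. Printing files with `cat.go`

cat is short for "concatenate": it joins files together and prints them to standard output. In practice it is more often used to read a single file and send it to the terminal or to another program.

Overview

We want to:

1. Open each file specified on the command line
2. Read it into memory
3. Copy that memory to standard output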

// cat reads each file specified on the command line and writes its contents to standard output.
// usage: cat <file1> [<file2> ...]
package main

import (
	"fmt"
	"io"
	"os"
)

func main() {
	for _, file := range os.Args[1:] { // 1. open each file specified on the command line
		f, err := os.Open(file)
		if err != nil {
			fmt.Fprintf(os.Stderr, "open %s: %v\n", file, err)
			os.Exit(1)
		}
		// performance note: it's better to use `io.Copy`, but I want to illustrate the process.
		b, err := io.ReadAll(f) // 2. read it into memory
		f.Close()               // close right away: a `defer` inside a loop would hold every file open until main returns
		if err != nil {
			fmt.Fprintf(os.Stderr, "read %s: %v\n", file, err)
			os.Exit(1)
		}
		os.Stdout.Write(b) // 3. write its contents to standard output
	}
}

Let's write two files with echo and read them back with cat.

#!/usr/bin/env bash
# IN
echo "the quick brown fox" > fox.txt
echo "jumps over the lazy dog" > dog.txt
cat fox.txt dog.txt
# OUT
the quick brown fox
jumps over the lazy dog

Exercises

*   Write a program, numberlines, that numbers the lines of a file.
*   Write a program, escapetext, that replaces the non-printable characters in a file.

4.4. Investigating the `hello` program with `findoffset`, `echo`, and `cat`

Let's write a simple file, fox.txt, and read it with cat.

bash script: `catfox.bash`

#!/usr/bin/env bash
# IN
echo "the quick brown fox jumps over the lazy dog" > fox.txt
cat fox.txt
# OUT
the quick brown fox jumps over the lazy dog

Good. Now let's use findoffset to find the offset of "brown" in fox.txt.

#!/usr/bin/env bash
# findbrown.bash looks for the string "brown" in the file "fox.txt" and prints the offset.
# IN:
echo "the quick brown fox jumps over the lazy dog" > fox.txt
findoffset fox.txt "brown"
# OUT
10

It seems to work. So let's find the offset of "hello, world!" in the hello program.

#!/usr/bin/env bash
# IN
findoffset hello "hello, world!"
# OUT
721335

Exercises

*   Modify findoffset to take an optional third argument that says which occurrence of the string to find. For example, findoffset hello "hello, world!" 2 should find the second occurrence of "hello, world!" in the hello program.
*   Allow that argument to be negative, so that findoffset searches from the end of the file. For example, findoffset hello "hello, world!" -1 should find the last occurrence of "hello, world!" in the hello program.

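4.5. Basic hacking with `binpatch.go`

Suppose we want to change the behavior of a compiled program but don't have access to its source code.

We know the following:

*   A program is just a bunch of bytes.
*   We can read and write those bytes.

That is enough to change its behavior.

Let's make the program print "hello, efron!" instead of "hello, world!" without recompiling. All we have to do is patch the binary. Let's write a program, binpatch, to do that.

Overview

We want to copy everything except one particular run of bytes. To do that:

1. Parse the arguments
2. Copy everything before the replacement from the file to standard output (that is, the first offset bytes of the file)
3. Write the replacement to standard output
4. Skip over the bytes being replaced
5. Copy the rest of the file to standard output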

// binpatch replaces a sequence of bytes in file starting at offset with a replacement string,
// and writes the result to standard output.
// Usage: binpatch <file> <offset> <replacement>
package main

import (
	"fmt"
	"io"
	"os"
	"strconv"
)

func main() {
	// 1. Parse the arguments

	// the first argument is the name of the program, so we need to check for 4 arguments.
	// we'll talk more about arguments later.
	if len(os.Args) != 4 {
		// having the name of the program is useful for error messages, like this one.
		// error messages are written to stderr, so they don't interfere with the output.
		fmt.Fprintf(os.Stderr, "Usage: %s <file> <offset> <replacement>\n", os.Args[0])
		os.Exit(1)
	}
	var (
		file        = os.Args[1]
		offset, err = strconv.ParseInt(os.Args[2], 0, 64)
		replacement = os.Args[3]
	)
	if err != nil {
		fatalf("invalid offset: %v\nUsage: %s <file> <offset> <replacement>\n", err, os.Args[0])
	}
	if offset < 0 {
		fatalf("invalid offset %d: must not be negative\nUsage: %s <file> <offset> <replacement>\n", offset, os.Args[0])
	}
	// open the file for reading: the patched copy goes to standard output, so we never write to the original.
	f, err := os.Open(file)
	if err != nil {
		fatalf("open %s: %v\n", file, err)
	}
	defer f.Close()

	// 2. Copy everything before the replacement from `file` to standard output (that is, `offset` bytes of the file)
	_, err = io.CopyN(os.Stdout, f, offset)
	if err != nil {
		fatalf("copy: %v\n", err)
	}

	// we're now at the offset where we want to write the replacement chunk.
	// 3. Write `replacement` to standard output
	_, err = os.Stdout.Write([]byte(replacement))
	if err != nil {
		fatalf("write: %v\n", err)
	}

	// 4. Skip over the bytes we're replacing by throwing them away.
	if _, err := io.CopyN(io.Discard, f, int64(len(replacement))); err != nil {
		fatalf("copy: %v\n", err)
	}

	// 5. copy the rest of the file to standard output
	_, err = io.Copy(os.Stdout, f)
	if err != nil {
		fatalf("copy: %v\n", err)
	}
}

// fatalf prints an error message to stderr with fmt.Fprintf, then exits with status 1.
func fatalf(format string, args ...interface{}) {
	fmt.Fprintf(os.Stderr, format, args...)
	os.Exit(1)
}

Let's change "brown" to "green" in fox.txt.

# IN
findoffset fox.txt "brown"
# OUT
10

Using that offset to patch the file...

# IN
binpatch fox.txt 10 "green"
# OUT
the quick green fox jumps over the lazy dog

Hacking the binary

It works! Now let's apply it to the hello program.

A note on bash: the $(...) syntax in the bash shell is "command substitution". It runs the command inside the parentheses and replaces the expression with that command's output. We can use it to hand the output of findoffset to binpatch.

# IN
binpatch hello $(findoffset hello "hello, world!") "hello, efron!" > hackedhello
./hackedhello
# OUT
bash: ./hackedhello: Permission denied

Oops, we didn't make hackedhello executable. Let's fix that.

We'll cover file permissions later. For now, all you need to know is that a file can be readable (r), writable (w), and executable (x), and that chmod changes those permissions. +x makes a file executable, -x makes it non-executable, and +r makes it readable.
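Permissions are themselves just bits. As a rough sketch of what `chmod +x` amounts to (assuming a common umask of 022, under which `+x` sets the execute bit for user, group, and others), the same change can be made from Go with `os.Chmod`. `addExec` is a name invented for this example.

```go
// chmodx_sketch.go: roughly what `chmod +x` does, expressed as bit manipulation.
// `addExec` is a hypothetical helper invented for this example.
package main

import (
	"fmt"
	"os"
)

// addExec returns mode with the execute bit set for user, group, and others.
func addExec(mode os.FileMode) os.FileMode {
	return mode | 0o111 // octal 111 = --x--x--x
}

func main() {
	// 0644 is what a file created by shell redirection usually gets under umask 022
	mode := os.FileMode(0o644)
	fmt.Println(mode, "->", addExec(mode)) // prints "-rw-r--r-- -> -rwxr-xr-x"

	// with a path argument, apply the change to a real file
	if len(os.Args) == 2 {
		info, err := os.Stat(os.Args[1])
		if err == nil {
			err = os.Chmod(os.Args[1], addExec(info.Mode().Perm()))
		}
		if err != nil {
			fmt.Fprintf(os.Stderr, "chmod: %v\n", err)
			os.Exit(1)
		}
	}
}
```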

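# IN
chmod +x hackedhello
./hackedhello
# OUT
hello, efron!

It works! We successfully changed the behavior of a program without recompiling it. That's a hack many "programmers" never pull off in their entire careers, and we're only warming up.

Note that anyone can do this. That is why it's a good idea to verify the integrity of an executable against a checksum after you download it... though most people don't.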
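To make the checksum idea concrete, here is a minimal sketch using SHA-256 from Go's standard library. The choice of SHA-256 and the name `checksum` are assumptions made for this example; the article doesn't prescribe a particular hash. Changing even one byte of a file, as binpatch does, produces a completely different digest.

```go
// checksum_sketch.go: print the SHA-256 digest of each file named on the command line.
// `checksum` is a hypothetical helper invented for this example.
package main

import (
	"crypto/sha256"
	"fmt"
	"os"
)

// checksum returns the SHA-256 digest of b as a lowercase hex string.
func checksum(b []byte) string {
	return fmt.Sprintf("%x", sha256.Sum256(b))
}

func main() {
	for _, path := range os.Args[1:] {
		b, err := os.ReadFile(path)
		if err != nil {
			fmt.Fprintf(os.Stderr, "read %s: %v\n", path, err)
			os.Exit(1)
		}
		fmt.Printf("%s  %s\n", checksum(b), path)
	}
}
```

Running it on both hello and hackedhello should print two different digests, even though the files differ by only five bytes.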

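You can look inside a program and see what's there, or even change it! If you remember nothing else, remember this: it's not a black box, it's not magic. It's just bytes.

In the end, programming is data transformation. It isn't theory, paradigms, or patterns. Those can help sometimes, but they can also make you lose sight of the essence. Programming is taking data, changing it, and producing new data. You can do it with a text editor, a hex editor, or a program. You can get help from an IDE, libraries, github copilot, and so on. But nothing replaces that fundamental understanding.

Now let's look further at the data inside the program. What's in the binary _around_ what used to be "hello, world!"?

As usual, we'll write a program to find out. We'll call it torso.

The head and tail programs from Unix coreutils read the beginning and end of a file, respectively. Normally you'd just use those; torso, as a pun on them, reads the "middle" of a file.

We'll cover arguments and flags in more detail later. For now, all you need to know is that -flag is a common way to specify flags for Unix programs.

4.6. Reaching into a file with `torso.go`

Overview

We need to:

1. Define and parse command-line flags:
    *   `-from`: the file to read
    *   `-offset`: the offset to read around
    *   `-before` and `-after`: the number of bytes to read before and after the offset
2. Skip to the first byte we want to read (offset - before)
3. Copy (before + after) bytes to standard output
4. Add a newline if requested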

// torso reads the 'middle' of a file - the bytes around a given offset.
// it's not the head of the file, and it's not the tail - it's the torso.
// usage:
//
//	torso -offset n -before [b=128] -after [a=128] -from file [-newline]
//
// reading from standard input when no file is given is left as an exercise.
package main

import (
	"flag"
	"fmt"
	"io"
	"os"
)

func main() {
	var offset, before, after int
	var from string
	var newline bool
	{ // 1. define and parse command line flags
		flag.IntVar(&offset, "offset", -1, "offset to read from: must be specified")             // -offset is required
		flag.IntVar(&before, "before", 128, "bytes to read before offset: will be clamped to 0") // defaults to -before 128
		flag.IntVar(&after, "after", 128, "bytes to read after offset: will be clamped to 0")    // defaults to -after 128
		flag.StringVar(&from, "from", "", "file to read from: must be specified")                // -from is required
		flag.BoolVar(&newline, "newline", false, "append a newline to the output")               // -newline is optional
		flag.Parse()
	}

	// bounds checking and normalization
	{
		if offset < 0 { // check this first: clamping `before` against a negative offset makes no sense
			fmt.Fprintf(os.Stderr, "missing or invalid -offset\n")
			os.Exit(1)
		}
		before = max(before, 0)      // can't be negative
		before = min(before, offset) // can't go past the beginning
		after = max(after, 0)
	}

	start := offset - before // where to start?
	n := before + after      // total number of bytes to read
	if n == 0 {
		return // nothing to do
	}
	buf := make([]byte, n)

	// read from a file
	f, err := os.Open(from)
	if err != nil {
		fmt.Fprintf(os.Stderr, "open: %s: %v\n", from, err)
		os.Exit(1)
	}
	// 2. skip to the first byte we want to read (offset - before)
	_, err = f.Seek(int64(start), io.SeekStart)
	if err != nil {
		fmt.Fprintf(os.Stderr, "seek: %s: %v\n", from, err)
		// make sure to close the file before exiting!
		// experienced go programmers will use 'defer', but I want this to be accessible to non-go programmers.
		f.Close()
		os.Exit(1)
	}

	// 3. copy (before + after) bytes to standard output

	// first read them into memory...
	// hitting the end of the file early is fine: we just get fewer bytes.
	n, err = io.ReadFull(f, buf)
	if err != nil && err != io.EOF && err != io.ErrUnexpectedEOF {
		fmt.Fprintf(os.Stderr, "read: %s: %v\n", from, err)
		f.Close()
		os.Exit(1)
	}
	buf = buf[:n]

	// then write them to standard output
	_, err = os.Stdout.Write(buf)
	if err != nil {
		fmt.Fprintf(os.Stderr, "write: %v\n", err)
		f.Close()
		os.Exit(1)
	}

	// 4. add a newline if requested
	if newline {
		fmt.Println()
	}
	f.Close()
}

#!/usr/bin/env bash
# IN
torso -offset $(findoffset hello "hello, world!") -before 128 -after 128 -from hello
# OUT
mismatchwrong timersillegal seekinvalid slothost is downnot pollablegotypesaliashttpmuxgo121multipathtcprandautoseedtlsunsafeekmhello, world!3814697265625wakeableSleepprofMemActiveprofMemFuturetraceStackTabexecRInternaltestRInternalGC sweep waitSIGQUIT: qu

It looks like a mix of error messages ("illegal seek", "host is down", "not pollable"), Go runtime internals ("profMemActive", "profMemFuture"), and a suspicious number: 3814697265625. In the next section we'll investigate where that number came from.

Exercises

*   Modify torso to operate on lines instead of bytes with a -lines flag.
*   Modify torso to operate on words instead of bytes with a -words flag.
*   Modify torso to operate on UTF-8 code points (runes) instead of bytes with a -runes flag.
*   Modify torso to read from standard input when no -from file is given.

4.7. Investigation: what is `3814697265625`?

If a string was in the source code, it should be in the binary. If it's in the binary... it's probably in the source code too, perhaps in one of the packages we imported. Let's look for it in the source code of the Go language itself. For that we need a program that finds the files containing a string.

The classic Unix tool grep is exactly right for this. We'll write our own version... but that's for next time, after we've talked a bit more about files.

For now, let's use grep to find where "3814697265625" appears in the source code of the Go version we're using.

# IN
git clone https://github.com/golang/go
cd go
git checkout go1.23.0 # or whatever version you're using
cd src
grep -r "3814697265625" .
# OUT
math/big/floatconv.go:  3814697265625,
strconv/decimal.go:     {6, "3814697265625"},                               // * 262144

But which of the two ended up in our program? Let's use findoffset to find out.

# IN
# is it math/big?
findoffset hello "math/big" || echo "math/big not found"
findoffset hello "strconv" || echo "strconv not found"
# OUT
math/big not found
745064

It's strconv. Let's use cat again to look at the surrounding code.

IN

#!/usr/bin/env bash
cat strconv/decimal.go

OUT

// Cheat sheet for left shift: table indexed by shift count giving
// number of new digits that will be introduced by that shift.
//
// For example, leftcheats[4] = {2, "625"}.  That means that
// if we are shifting by 4 (multiplying by 16), it will add 2 digits
// when the string prefix is "625" through "999", and one fewer digit
// if the string prefix is "000" through "624".
//
// Credit for this trick goes to Ken.
type leftCheat struct {
	delta  int    // number of new digits
	cutoff string // minus one digit if original < a.
}

var leftcheats = []leftCheat{
	// Leading digits of 1/2^i = 5^i.
	// 5^23 is not an exact 64-bit floating point number,
	// so have to use bc for the math.
	// Go up to 60 to be large enough for 32bit and 64bit platforms.
	/*
		seq 60 | sed 's/^/5^/' | bc |
		awk 'BEGIN{ print "\t{ 0, \"\" }," }
		{
			log2 = log(2)/log(10)
			printf("\t{ %d, \"%s\" },\t// * %d\n",
				int(log2*NR+1), $0, 2**NR)
			}'
	*/
	{0, ""},
/* many entries omitted for space ...*/
	{6, "3814697265625"},                               // * 262144
/* many entries omitted for space ...*/
	{19, "867361737988403547205962240695953369140625"}, // * 1152921504606846976
}

This turned out to be an elaborate systems programming trick invented by the master himself, Ken Thompson! We'll come back to it later... consider this a small taste ;).

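Ken Thompson

Kenneth Lane Thompson (born 1943) created Unix, grep, Go, and countless other tools that underpin nearly every computer system in the world. He didn't do it all alone, but he is legendarily productive: there is a famous story that he implemented Unix pipes over a lunch break. If you have ever used a regular expression, a Mac or Linux computer, or a smartphone, you have used his work.

That's enough fiddling with the data segment (for now). Let's see whether we can gain some insight into the rest of the program.

5. Investigating the code segment

Let's use torso to take a quick look at the first 256 bytes of the program.

# IN
torso -offset 0 -after 256 -from hello
# OUT
ELF

That looks like far fewer bytes than we expected. What happened? The shell (strictly speaking, the terminal it runs in) isn't sure how to interpret binary data. ASCII, the character encoding that nearly every shell has used since the 1960s, covers 128 characters. Some of them are printable, like 'a' or '|', but many are not. The first 32 are **control characters**, which _control_ the terminal rather than printing anything. The first 256 bytes of our program contain a lot of these, and they confuse the shell.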
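The printable/control distinction is easy to check in code. In the sketch below, `describe` is a name invented for this example, and the sample bytes assume the file begins with the standard ELF magic number (0x7f followed by "ELF"), as every ELF file does; the remaining four bytes are illustrative. Note that DEL (0x7f) is a control character too, even though it sits outside the first 32.

```go
// ascii_sketch.go: classify bytes the way an ASCII terminal sees them.
// `describe` is a hypothetical helper invented for this example.
package main

import "fmt"

// describe reports whether b is an ASCII control character,
// a printable ASCII character, or outside ASCII entirely.
func describe(b byte) string {
	switch {
	case b < 0x20 || b == 0x7f: // the first 32 codes, plus DEL
		return "control"
	case b < 0x7f:
		return "printable"
	default: // 0x80 and above
		return "not ASCII"
	}
}

func main() {
	// the ELF magic number followed by a few small binary values
	sample := []byte{0x7f, 'E', 'L', 'F', 0x02, 0x01, 0x01, 0x00}
	for _, b := range sample {
		fmt.Printf("%02x %s\n", b, describe(b))
	}
}
```

Only three of these eight bytes are printable, which is consistent with the terminal showing nothing but "ELF".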

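To look any deeper, we need to serialize the bytes into a human-readable format. The most common way to do that is to print the bytes in **hexadecimal**. Let's write a variation on the classic Unix tool hexdump to do it.

5.1. Reading binary files with `shexdump.go`

This tool is our first step toward manipulating binary data, which is a core systems programming skill. We'll read the file in chunks, convert each byte to two hexadecimal digits, and print them to standard output. To distinguish it from the classic hexdump, we'll call it shexdump (simple hexadecimal dump).

To make the output easier to read, we'll put a space between bytes, an extra space after the eighth byte, and a newline after every 16 bytes.

What do we need to do?

1. Choose the input source: stdin or a file
2. Read a chunk of bytes (up to 16) from the input
3. Convert each byte in the chunk to a pair of hexadecimal digits
4. Print the hexadecimal digits to standard output, space-separated, terminating with a newline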

// shexdump.go dumps the input as pairs of space-separated hexadecimal bytes, with a newline after every 16 bytes.
//
// # example
//
//	#!/usr/bin/env bash
//	echo "now is the time of all good men to come to the aid of their country " | shexdump
//	6e 6f 77 20 69 73 20 74  68 65 20 74 69 6d 65 20
//	6f 66 20 61 6c 6c 20 67  6f 6f 64 20 6d 65 6e 20
//	74 6f 20 63 6f 6d 65 20  74 6f 20 74 68 65 20 61
//	69 64 20 6f 66 20 74 68  65 69 72 20 63 6f 75 6e
//	74 72 79 20 0a
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
)

func main() {
	// 1. choose the input source: stdin or a file
	var src io.Reader
	switch len(os.Args) {
	case 1:
		src = os.Stdin
	case 2:
		f, err := os.Open(os.Args[1])
		if err != nil {
			fmt.Fprintf(os.Stderr, "open %s: %v\n", os.Args[1], err)
			os.Exit(1)
		}
		defer f.Close()
		src = f
	default:
		fmt.Fprintf(os.Stderr, "Usage: %s [filename]\n", os.Args[0])
		os.Exit(1)
	}
	if err := hexdump(os.Stdout, src); err != nil {
		fmt.Fprintf(os.Stderr, "hexdump: %v\n", err)
		os.Exit(1)
	}
}

// dump the contents of src to dst in a hexdump format.
func hexdump(dst io.Writer, src io.Reader) error {
	// performance: small reads and writes are very inefficient. while we could write a byte at a time, it's much faster to read and write in chunks.
	r := bufio.NewReader(src)
	w := bufio.NewWriter(dst)
	defer w.Flush()
	for { // 2. read a chunk of bytes (up to 16) from the input

		var raw [16]byte // read 16 bytes at a time

		encoded := make([]byte, 0, 16*3+1) // 16 bytes at 3 characters each (2 hex digits + separator), plus the extra gap after byte 8.
		n, err := io.ReadFull(r, raw[:])

		// 3. convert each byte in the chunk to a pair of hexadecimal digits
		const hex = "0123456789abcdef"
		if n != 0 {
			for i := range min(n, 8) { // ranging over an int needs go 1.22+
				encoded = append(encoded, hex[raw[i]>>4], hex[raw[i]&0x0f], ' ')
			}
			encoded = append(encoded, ' ')
			for i := 8; i < n; i++ {
				encoded = append(encoded, hex[raw[i]>>4], hex[raw[i]&0x0f], ' ')
			}
			encoded[len(encoded)-1] = '\n' // the trailing separator becomes the newline

			// 4. print the hexadecimal digits to standard output, space-separated, terminating with a newline
			if _, err := w.Write(encoded); err != nil {
				return err
			}
		}
		// io.ReadFull reports io.EOF when it read nothing and io.ErrUnexpectedEOF
		// when it read a partial chunk: either way, we've reached the end of the input.
		if err == io.EOF || err == io.ErrUnexpectedEOF {
			return nil
		} else if err != nil {
			return err
		}
	}
}
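The heart of the conversion is the pair of expressions `raw[i]>>4` and `raw[i]&0x0f`, which split a byte into its two four-bit halves (nibbles). A minimal sketch of that one step in isolation, with `hexPair` as a name invented for this example:

```go
// nibble_sketch.go: split a byte into two nibbles and look each one up as a hex digit.
// `hexPair` is a hypothetical helper invented for this example.
package main

import "fmt"

const digits = "0123456789abcdef"

// hexPair converts one byte into its two hexadecimal digits.
func hexPair(b byte) string {
	high := b >> 4  // shift the top four bits down: 0x7f -> 0x7
	low := b & 0x0f // mask off everything but the bottom four bits: 0x7f -> 0xf
	return string([]byte{digits[high], digits[low]})
}

func main() {
	for _, b := range []byte{0x7f, 'E', 'L', 'F', '\n'} {
		fmt.Printf("%3d -> %s\n", b, hexPair(b))
	}
}
```

Each nibble is a number from 0 to 15, so it can index directly into the 16-character digit string. That is why one byte always becomes exactly two hex digits.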

Exercises

*   Add a -offset flag (default false) that prefixes each line with the file offset in hexadecimal and finishes the output with the end offset.
*   Add a -columns flag (default 2) that sets the number of columns to print per line.
*   Add a -column-width flag (default 8) that sets the number of bytes to print in each column.
*   Add a -squeeze flag (default false) that collapses runs of identical lines into a single line with a * and a count.
*   Add a -ascii flag (default false) that appends the ASCII representation of the bytes to the end of each line. Replace non-printable or non-ASCII bytes with a period (.).

#!/usr/bin/env bash
# IN
echo -n "now is the time " | shexdump -ascii
# OUT
6e 6f 77 20 69 73 20 74  68 65 20 74 69 6d 65 20  |now is the time |

*   Add a -canon flag (default false) that prints in the canonical hex+ASCII format, like hexdump -C. It combines -ascii and -offset and should be mutually exclusive with both.

#!/usr/bin/env bash
# IN
echo "now is the time of all good men to come to the aid of their country " | shexdump -canon

# OUT
00000000  6e 6f 77 20 69 73 20 74  68 65 20 74 69 6d 65 20  |now is the time |
00000010  6f 66 20 61 6c 6c 20 67  6f 6f 64 20 6d 65 6e 20  |of all good men |
00000020  74 6f 20 63 6f 6d 65 20  74 6f 20 74 68 65 20 61  |to come to the a|
00000030  69 64 20 6f 66 20 74 68  65 69 72 20 63 6f 75 6e  |id of their coun|
00000040  74 72 79 20 0a                                    |try .|
00000045

How do we test that we really wrote it correctly? It should be possible to turn the output of shexdump back into the original file without losing anything. Let's write a function that does that, and a program, unhexdump, that uses it.

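5.2. Deserializing a hexdump with `unhexdump.go`

Overview

The previous program, shexdump, read binary data and wrote whitespace-separated pairs of hexadecimal digits.

This program, unhexdump, should read whitespace-separated pairs of hexadecimal digits and write the original binary data. That is:

1. Choose the input source: stdin or a file
2. Read whitespace-separated pairs of hexadecimal digits from the input
3. Convert each pair back to a byte (unhex it)
4. Write that byte to standard output

Program: `unhexdump.go`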

package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
)

// unhexdump.go reverses the process of hexdump, converting a hexdump back into a file.
// it expects pairs of whitespace-separated hexadecimal bytes.
func main() {
	// 1. choose the input source: stdin or a file
	var src io.Reader
	switch len(os.Args) {
	case 1:
		src = os.Stdin
	case 2:
		f, err := os.Open(os.Args[1])
		if err != nil {
			fmt.Fprintf(os.Stderr, "open %s: %v\n", os.Args[1], err)
			os.Exit(1)
		}
		defer f.Close()
		src = f
	default:
		fmt.Fprintf(os.Stderr, "Usage: %s [filename]\n", os.Args[0])
		os.Exit(1)
	}

	if err := unhexdump(os.Stdout, src); err != nil {
		fmt.Fprintf(os.Stderr, "unhexdump: %v\n", err)
		os.Exit(1)
	}
}

// unhexdump reads pairs of whitespace-separated hexadecimal bytes from r and writes the corresponding bytes to w.
func unhexdump(w io.Writer, r io.Reader) error {
	// 2. read pairs of whitespace-separated hexadecimal bytes from the input.

	// we're going to start caring a little more about performance here. we'll use a buffered reader and writer to reduce the number of system calls & allocations (more about these topics in later articles).
	scanner := bufio.NewScanner(r)
	scanner.Split(bufio.ScanWords)
	bw := bufio.NewWriter(w)
	defer bw.Flush()
	for i := 0; scanner.Scan(); i++ { // i counts whitespace-separated words
		b := scanner.Bytes()
		if len(b)&1 == 1 { // odd number of hex digits
			return fmt.Errorf("odd number of hex digits in word %d (%q)", i, b)
		}
		// 3. convert each pair back to a byte, unhex-ing it
		for j := 0; j < len(b); j += 2 {
			high, ok := unhex(b[j])
			if !ok {
				return fmt.Errorf("bad hex %x '%c' in word %d", b[j], b[j], i)
			}
			low, ok := unhex(b[j+1])
			if !ok {
				return fmt.Errorf("bad hex %x '%c' in word %d", b[j+1], b[j+1], i)
			}

			// 4. write that unhex-ed byte to standard output
			if err := bw.WriteByte(high<<4 | low); err != nil {
				return err
			}
		}

	}
	return scanner.Err()
}

// unhex converts a hexadecimal character to its value (0x0-0xf),
// or 0, false if the character is not a valid hexadecimal digit.
func unhex(b byte) (byte, bool) {
	switch {
	case '0' <= b && b <= '9':
		return b - '0', true
	case 'a' <= b && b <= 'f':
		return b - 'a' + 10, true
	case 'A' <= b && b <= 'F':
		return b - 'A' + 10, true
	default:
		return 0, false
	}
}

Let's test that a shexdump followed by an unhexdump gets us back to the original.

5.3. hexdump_test.bash

Overview

1. Write a file, moby.txt
2. Hexdump it to moby.hex
3. Unhexdump that to moby2.txt
4. Compare the files (using the classic Unix tool diff)

#!/usr/bin/env bash
# IN
echo "to the last I grapple with thee; from hell's heart I stab at thee; for hate's sake I spit my last breath at thee" > moby.txt # 1. write a file

shexdump moby.txt > moby.hex # 2. hexdump it
unhexdump moby.hex > moby2.txt # 3. unhexdump it
diff -s moby.txt moby2.txt # 4. compare the files

# OUT:
Files moby.txt and moby2.txt are identical
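The same round-trip property can be checked against Go's standard library, which ships a hex codec in `encoding/hex`. This is offered as a cross-check, not a replacement for writing your own: the standard codec emits no spaces or newlines, so its text format differs from shexdump's. `roundTrip` is a name invented for this example.

```go
// hexroundtrip_sketch.go: encode, decode, and compare, using encoding/hex.
// `roundTrip` is a hypothetical helper invented for this example.
package main

import (
	"bytes"
	"encoding/hex"
	"fmt"
)

// roundTrip hex-encodes b, decodes the result,
// and reports whether the decoded bytes equal the original.
func roundTrip(b []byte) (bool, error) {
	encoded := hex.EncodeToString(b) // e.g. []byte("hi") -> "6869"
	decoded, err := hex.DecodeString(encoded)
	if err != nil {
		return false, err
	}
	return bytes.Equal(b, decoded), nil
}

func main() {
	ok, err := roundTrip([]byte("from hell's heart I stab at thee\n"))
	fmt.Println(ok, err) // prints "true <nil>"
}
```

A lossless encoding must round-trip every possible byte value, not just text, so a thorough test feeds it all 256 values from 0x00 to 0xff.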

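Good. As a final exercise before we wrap up, let's use torso and shexdump to investigate the first bytes of the hello program and see what we can learn.

5.4. Investigating the ELF header of `hello`

Programs on Linux are stored in ELF (Executable and Linkable Format), and an ELF file starts with a header that describes the rest of the file. This is a common way to lay out a file format: a short header that describes what follows. Let's use ELF as our example header.

#!/usr/bin/env bash
# IN
torso -offset 0 -after 32 -from hello | shexdump
# OUT
7f 45 4c 46 02 01 01 00  00 00 00 00 00 00 00 00
02 00 3e 00 01 00 00 00  c0 cf 46 00 00 00 00 00

7f doesn't mean much to us: it's the ASCII control character DEL, which doesn't print. But 45 4c 46 is E L F in ASCII. Most of what follows is opaque binary, but we can already make a reasonable guess as to why torso's output stopped after "ELF" earlier. The 8th byte of hello is null, which is what terminates a C-style string, so the terminal stopped printing. Let's learn more by following along with Wikipedia's description of the ELF header.

The **object file type** is identified by the field at offset 0x10.

# IN
torso -offset 0x10 -before 0 -after 1 -from hello | shexdump
# OUT
02

| Value | Type | Meaning |
| --- | --- | --- |
| 0x00 | ET_NONE | Unknown |
| 0x01 | ET_REL | Relocatable file |
| 0x02 | ET_EXEC | Executable file |
| 0x03 | ET_DYN | Shared object |
| 0x04 | ET_CORE | Core file |

Unsurprisingly, our executable is an executable. The machine type at offset 0x12 gives us a glimpse of the architecture.

# IN
torso -offset 0x12 -before 0 -after 1 -from hello | shexdump
# OUT
3e

That's the x86-64 architecture. The file's **endianness** is stored at offset 0x05; the 01 there means little-endian.

Good. But where does the program actually start? The **entrypoint** is the memory location where the program begins executing: the initial value of the instruction pointer. It is stored at offset 0x18 of the ELF header, in little-endian order.

# IN
torso -offset 0x18 -before 0 -after 8 -from hello | shexdump
# OUT
c0 cf 46 00 00 00 00 00

It looks like the program starts executing instructions at memory address 0x46cfc0. We'll talk more about this later.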
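The three fields we just read by hand can also be decoded in one pass. The following is a minimal sketch that assumes a 64-bit little-endian header like the one shown; `parseHeader`, `elfSummary`, and `sample` are names invented for this example, and the sample bytes are copied from the dump above. For real work, Go's standard library already includes a complete parser in the `debug/elf` package.

```go
// elfheader_sketch.go: decode the type, machine, and entrypoint fields of an ELF header.
// All names here are hypothetical and exist only for this example.
package main

import (
	"encoding/binary"
	"fmt"
)

// sample is the first 32 bytes of the hello binary, as dumped above.
var sample = []byte{
	0x7f, 0x45, 0x4c, 0x46, 0x02, 0x01, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x02, 0x00, 0x3e, 0x00, 0x01, 0x00, 0x00, 0x00, 0xc0, 0xcf, 0x46, 0x00, 0x00, 0x00, 0x00, 0x00,
}

type elfSummary struct {
	Type    uint16 // offset 0x10: 0x02 is ET_EXEC
	Machine uint16 // offset 0x12: 0x3e is x86-64
	Entry   uint64 // offset 0x18: the entrypoint address
}

// parseHeader decodes the fields discussed above.
// it only handles 64-bit, little-endian headers.
func parseHeader(h []byte) (elfSummary, error) {
	if len(h) < 0x20 || string(h[:4]) != "\x7fELF" {
		return elfSummary{}, fmt.Errorf("not an ELF header")
	}
	if h[4] != 2 || h[5] != 1 { // offset 0x04: 2 = 64-bit; offset 0x05: 1 = little-endian
		return elfSummary{}, fmt.Errorf("only 64-bit little-endian headers are supported")
	}
	return elfSummary{
		Type:    binary.LittleEndian.Uint16(h[0x10:]),
		Machine: binary.LittleEndian.Uint16(h[0x12:]),
		Entry:   binary.LittleEndian.Uint64(h[0x18:]),
	}, nil
}

func main() {
	s, err := parseHeader(sample)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("type=%#x machine=%#x entry=%#x\n", s.Type, s.Machine, s.Entry)
	// prints "type=0x2 machine=0x3e entry=0x46cfc0"
}
```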

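That's all the digging for today: this article is already over 20 pages long! If you made it this far, and especially if you did the exercises, congratulations, and thanks for coming along.

A few last notes before we wrap up:

Lemma: the instruction pointer

At the most fundamental level, a program works like this:

1. Read the instruction at the memory address the instruction pointer points to
2. Do what the instruction says
3. Advance the instruction pointer by the size of the instruction

The entrypoint is where the instruction pointer starts.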
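That three-step loop fits in a few lines of code. The sketch below is a toy: the three-instruction "machine" and every name in it (`opHalt`, `opPush`, `opAdd`, `run`) are made up for this example and bear no relation to real x86-64 instructions. It does no bounds checking, so a malformed program will panic.

```go
// fetchexec_sketch.go: a toy fetch-execute loop driven by an instruction pointer.
// The instruction set is invented for this example.
package main

import "fmt"

const (
	opHalt byte = iota // stop and report the top of the stack
	opPush             // push the next byte onto the stack (2 bytes long)
	opAdd              // pop two values and push their sum (1 byte long)
)

// run executes program starting at entry and returns the value
// on top of the stack when the machine halts (0 if the stack is empty).
func run(program []byte, entry int) int {
	var stack []int
	ip := entry // the instruction pointer starts at the entrypoint
	for {
		switch op := program[ip]; op { // 1. read the instruction at ip
		case opPush:
			stack = append(stack, int(program[ip+1])) // 2. do what it says
			ip += 2                                   // 3. advance by the instruction's size
		case opAdd:
			n := len(stack)
			stack = append(stack[:n-2], stack[n-2]+stack[n-1])
			ip++
		default: // opHalt, or anything we don't recognize
			if len(stack) == 0 {
				return 0
			}
			return stack[len(stack)-1]
		}
	}
}

func main() {
	program := []byte{opPush, 2, opPush, 3, opAdd, opHalt}
	fmt.Println(run(program, 0)) // prints 5
}
```

Note that instructions have different sizes here (push is two bytes, add is one), just as they do on real hardware, which is why step 3 says "by the size of the instruction" rather than "by one".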

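Lemma: endianness

There are two ways to turn a list of bytes into an integer: big-endian, where the most significant byte comes first, and little-endian, where the least significant byte comes first. The x86-64 architecture is little-endian, so the bytes c0 cf 46 00 00 00 00 00 are read "backwards" when converted to an integer. In little-endian they are 0x0046cfc0 (4640704); in big-endian they would be 0xc0cf460000000000 (13893400341275213824). Since we know this value represents a memory address, we could have deduced that it is little-endian from the bytes alone: no machine has 12635974 terabytes of memory, so the big-endian value can't be a memory address. Knowing even a little about how computers actually work goes a long way toward making sense of what you see.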

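Exercise: write a program that reads a 64-bit little-endian integer starting at a given offset in a file and prints it in "big-endian" order (most significant byte first).

―――――

6. Conclusion: the spirit of systems programming

I hope this article has given you a taste of systems programming. To sum up the "mindset", or "spirit", it is trying to convey:

*   Build your own tools.
*   Look at the data with your own eyes.
*   Understand the system rather than relying on abstractions.
*   It's all just bytes.
*   Programmers write programs.

That's it for today! In the next article, "Starting Systems 2: Your Program and the Outside World", we'll look at how programs actually get things done: interacting with the outside world through files and system calls, memory management, environment variables and command-line arguments, standard input and output, and more.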


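Sidenote: the etymology of endianness

The term "endianness" is a reference to Gulliver's Travels. It comes from Danny Cohen's "ON HOLY WARS AND A PLEA FOR PEACE", dated April 1, 1980; a copy of the original is available on the Internet Archive. It quotes the relevant passage from Gulliver's Travels: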

It began upon the following occasion.

It is allowed on all hands, that the primitive way of breaking eggs before we eat them, was upon the larger end: but his present Majesty’s grandfather, while he was a boy, going to eat an egg, and breaking it according to the ancient practice, happened to cut one of his fingers. Whereupon the Emperor his father published an edict, commanding all his subjects, upon great penalties, to break the smaller end of their eggs.

The people so highly resented this law, that our Histories tell us there have been six rebellions raised on that account, wherein one Emperor lost his life, and another his crown. These civil commotions were constantly fomented by the monarchs of Blefuscu, and when they were quelled, the exiles always fled for refuge to that Empire.

It is computed, that eleven thousand persons have, at several times, suffered death, rather than submit to break their eggs at the smaller end. Many hundred large volumes have been published upon this controversy: but the books of the Big-Endians have been long forbidden, and the whole party rendered incapable by law of holding employments.

During the course of these troubles, the emperors of Blefuscu did frequently expostulate by their ambassadors, accusing us of making a schism in religion, by offending against a fundamental doctrine of our great prophet Lustrog, in the fifty-fourth chapter of the Brundecral (which is their Alcoran). This, however, is thought to be a mere strain upon the text: for their words are these; That all true believers shall break their eggs at the convenient end: and which is the convenient end, seems, in my humble opinion, to be left to every man’s conscience, or at least in the power of the chief magistrate to determine.