Starting Systems Programming, Pt 2: Programs and the Outside World: Syscalls & Files


Software articles by Efron Licht

March 2025

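This is the second in a four-part series on the fundamentals of systems programming. In part 1, we wrote a toolkit of programs for examining binary files and finished with an overview of ELF (Executable and Linkable Format), the file format that defines executable binaries on Linux systems. This time, we'll start digging into how programs interact with the outside world, answering questions like:

How do you actually read and write a file? What's a file descriptor?
How does my program get its command-line arguments? How are flags different from positional arguments?
What _is_ the process environment? What actually happens when you set or read an environment variable?
How do Unix file permissions actually work?
What actually happens when a program starts?
What is a system call, and why and how would you use one?

As always, we'll answer these questions by writing lots of programs, and eventually we'll build a shell from scratch.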

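Sidenote: library functions

There's a lot more to do this time than last time, so for brevity, programs will reuse functions from one another without explicitly importing or copying them. We'll only reuse functions defined earlier in this article, so ctrl+f should find the definition you're looking for pretty quickly.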

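1. Introduction: programs and the outside world

A program can manipulate its own memory, but to actually do anything it needs to reach the outside world: read a file, write to a network socket, start another program, or even just exit.

Generally speaking, there are three main ways a program interacts with the outside world, listed roughly from most to least common:

When a program starts, it receives its command-line arguments and the **process environment**. These are just uninterpreted strings; it's up to the program to interpret them.
A program can **read**, **write**, and **execute** other data, usually represented as files. “File” usually means persistent storage on a physical disk, but network sockets, pipes, and even physical devices are common “files” too. All of this happens through **system calls** (if the program has permission).
A program can be interrupted by **signals** sent by the operating system or other processes. Signals are how the operating system tells a program that something has happened. One of the most common is SIGINT (“SIGnal INTerrupt”), which asks a program to stop: pressing ctrl+c in a shell sends SIGINT to the program running in the foreground. Processes send signals to one another with system calls, too.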

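Tangent: shared memory & memory-mapped I/O

Other ways to interact with the outside world include memory-mapped I/O and shared memory.

Shared memory lets multiple processes communicate by writing to the same memory.

Memory-mapped I/O lets ‘ordinary’ reads and writes to memory trigger I/O operations.

Once set up, these avoid a system call (and the context switch that comes with it) for every operation, which gives them a performance advantage. They're fast but dangerous. They're beyond the scope of this series, but know that they exist.

Let's start with command-line arguments. You're probably familiar with them, so we'll go quickly.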

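2. `args`

When a program starts, it receives a list of strings called its **command-line arguments**. Usually these are the words you typed into the shell to start the program.

Let's write a program, printargs, that prints its command-line arguments to standard output.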

```go
// printargs.go prints its command-line arguments to standard output.
package main

import (
	"fmt"
	"os"
)

func main() {
	for i, arg := range os.Args {
		fmt.Fprintf(os.Stdout, "%d: %s\n", i, arg)
	}
}
```

Let's run it…

```bash
#!/usr/bin/env bash
# IN
go run ./printargs.go foo bar
# OUT
0: /tmp/go-build389224751/b001/exe/printargs
1: foo
2: bar
```

Huh, that looks odd. go run _compiles_ the program and then _runs_ it, so the first argument is the path to the compiled binary (here, in a temporary directory). Let's split the steps and try again.

```bash
#!/usr/bin/env bash
# IN
go build -o printargs ./printargs.go
./printargs foo bar
# OUT
0: ./printargs
1: foo
2: bar
```

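That's more like it. As you can see, command-line arguments are just _strings, provided without any interpretation_: it's up to the program to decide what they mean.

Although command-line arguments are strings, they're usually interpreted as one of two things: **flags** or **positional arguments**.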

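2.1. Flags

A **flag** is a command-line argument that starts with a single dash - (a “short” flag) or two dashes -- (a “long” flag). Flags specify options or settings for a program. Short or long, a flag is either a toggle (e.g. --verbose) or takes a value. Traditionally, the value is given either as the next argument (--output file.txt) or after an equals sign (--output=file.txt).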
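For comparison, here's how Go's standard library `flag` package handles the same conventions. This is a minimal sketch, not one of the article's programs: `parse` and the `output`/`verbose` flags are made up for illustration. Like the parseflags program below, `flag` treats `-name` and `--name` the same and stops at the first non-flag argument. Unlike parseflags, a repeated flag simply overwrites the earlier value instead of being an error, and boolean flags only take an explicit value via `=` (`--verbose=false`).

```go
package main

import (
	"flag"
	"fmt"
)

// parse is a hypothetical helper: it uses a fresh FlagSet so we can feed it
// arbitrary slices instead of os.Args.
func parse(args []string) (output string, verbose bool, rest []string, err error) {
	fs := flag.NewFlagSet("demo", flag.ContinueOnError)
	fs.StringVar(&output, "output", "", "where to write results")
	fs.BoolVar(&verbose, "verbose", false, "print extra information")
	if err := fs.Parse(args); err != nil {
		return "", false, nil, err
	}
	// fs.Args() holds everything from the first non-flag argument onward.
	return output, verbose, fs.Args(), nil
}

func main() {
	for _, args := range [][]string{
		{"--output", "file.txt", "pos"},          // value as the next argument
		{"-output=file.txt", "--verbose", "pos"}, // value after '='; a toggle
	} {
		output, verbose, rest, err := parse(args)
		fmt.Println(output, verbose, rest, err)
	}
}
```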

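Most programs take a mix of flags and positional arguments. Take the standard Unix tool grep:

```bash
#!/usr/bin/env bash
# IN
grep -r --color=always "foo" /tmp # search /tmp recursively for "foo", with colored output
```

Traditionally, flags come before positional arguments, but this is only a convention. Some programs also accept flags after positional arguments, which can be confusing.

Let's write a program, parseflags, that separates flags from positional arguments. You'll rarely implement this by hand in practice, but working through the logic yourself at least once helps internalize the concepts.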

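2.2. `parseflags.go`

2.2.1. Overview

1. Consume CLI arguments as flags until we hit a non-flag argument, a `--`, a duplicate flag, or the end of the arguments.
   1. `--` forces all remaining arguments to be positional.
   2. A duplicate flag immediately ends flag parsing with an error.
   3. The first non-flag argument ends flag parsing; the rest are positional arguments.
   4. `-short` and `--long` flags are treated the same.
2. Print the flags and positional arguments to standard output.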

```go
// parseflags.go separates its command-line arguments into flags and positional arguments,
// then prints them to standard output.
package main

import (
	"fmt"
	"os"
)

func main() {
	flags, positional, err := parseflags(os.Args[1:]) // skip the command name
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Go randomizes map iteration order, so the flag lines may print in any order.
	for name, value := range flags {
		fmt.Printf("flag %s=%s\n", name, value)
	}
	for i, arg := range positional {
		fmt.Printf("positional %d=%s\n", i, arg)
	}
}

// parseflags parses command-line flags and returns a map of flag names to values and a slice of positional arguments.
// the first non-flag argument terminates flag parsing; pass '--' to force all remaining arguments to be positional.
// args should NOT include the command name.
// Flags are of the form -name=value, --name=value, -name value, or --name value: we don't
// treat short or long flags differently.
// It is an error to set a flag more than once.
func parseflags(args []string) (flags map[string]string, positional []string, err error) {
	flags = make(map[string]string)

FLAGS:
	for len(args) > 0 {
		s := args[0]
		if len(s) <= 1 { // can't possibly be a flag
			break FLAGS
		}
		if s == "--" { // end of flags
			args = args[1:] // consume "--"
			break FLAGS
		}
		if s[0] != '-' { // not a flag
			break FLAGS
		}
		// strip off up to two leading '-'s
		if s[1] == '-' {
			s = s[2:]
		} else {
			s = s[1:]
		}

		// it's now a potential flag. is it of the form -name=value?
		// look for '='
		for i := range s {
			if s[i] == '=' {
				key, value := s[:i], s[i+1:]
				if _, ok := flags[key]; ok {
					return nil, nil, fmt.Errorf("flag -%s already set", key)
				}
				flags[key] = value
				args = args[1:] // we've consumed one arg
				continue FLAGS
			}
		}

		// it's not of the form -name=value. the next arg is the value.
		// is there a next arg?
		if len(args) == 1 { // no. error.
			return nil, nil, fmt.Errorf("flag -%s missing value", s)
		}

		key, value := s, args[1] // the next arg is the value
		if _, ok := flags[key]; ok {
			return nil, nil, fmt.Errorf("flag -%s already set", key)
		}
		flags[key] = value
		args = args[2:] // we've consumed two args
	}
	// what's left is positional arguments.
	return flags, args, nil
}
```

Let's run it.

```bash
#!/usr/bin/env bash
# IN
go run . -name efron --animal tapir -foo bar positional --notflag efron
# OUT
flag name=efron
flag animal=tapir
flag foo=bar
positional 0=positional
positional 1=--notflag
positional 2=efron
```

Let's also test the -- pseudo-flag, which forces all remaining arguments to be positional.

```bash
#!/usr/bin/env bash
# IN
go run . -name efron -- --animal tapir
# OUT
flag name=efron
positional 0=--animal
positional 1=tapir
```

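Anything that isn't a flag is a positional argument.

2.3. Positional arguments

A _positional argument_ is interpreted according to its position in the argument list. cp [src] [dst] copies source to destination; src and dst are positional arguments.

Positional arguments are usually required and flags usually optional, but again, this is only a convention; some authors prefer to make everything a flag. Avoid having more than a few positional arguments, especially several of similar type: it's easy for users to mix up their order.

Command-line arguments, together with the process environment (usually called env), are passed to a program when it starts, via the execve system call. The environment is the subject of the next section.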

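3. The process environment (`env`)

The operating system also provides programs with an **environment**: a list of key-value pairs, each written as KEY=VALUE. The environment is used to pass configuration to programs. Each process is expected to pass its environment on to the child processes it starts; we say the children “inherit” the environment. By convention, environment variable names use SCREAMING_SNAKE_CASE.

Basic examples of environment variables include the current user (USER), home directory (HOME), and shell (SHELL).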
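A minimal sketch of what “a list of KEY=VALUE strings” means in practice: split each entry at the *first* `=`, since a value can itself contain `=`. `envMap` is a hypothetical helper, not something the later programs use; it keeps the last of any duplicates, the same rule printenv's lookupenv follows.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// envMap is a hypothetical helper that turns os.Environ()-style entries into a map.
// It splits each entry at the FIRST '=': values may contain '=', names conventionally don't.
// Later duplicates overwrite earlier ones ("last one wins").
func envMap(environ []string) map[string]string {
	m := make(map[string]string, len(environ))
	for _, kv := range environ {
		key, value, ok := strings.Cut(kv, "=")
		if !ok {
			continue // malformed entry with no '='; skip it
		}
		m[key] = value
	}
	return m
}

func main() {
	m := envMap(os.Environ())
	fmt.Println("HOME is", m["HOME"])
}
```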

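Let's write a program, printenv, to look at the process environment.

3.1. `printenv.go`

3.1.1. Overview:

Get the list of environment variables, in the form KEY=VALUE, from the Go runtime.
Look up the value of each key the user provided and print it to standard output.
If every variable was found, exit with status code 0.
Otherwise, print all missing variables to standard error and exit with status code 1.

printenv.py (Python port)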

```go
// printenv.go prints the value of each environment variable given as an argument.
// it exits with status 1 if any of the variables are not found.
package main

import (
	"fmt"
	"os"
)

func main() {
	// 1. get the list of environment variables in form KEY=VALUE from the go runtime.
	env := os.Environ()
	var printed int
	keys := os.Args[1:]
	// 2. look up the value of each key provided by the user and print it to standard output.
	for _, key := range keys {
		val, ok := lookupenv(env, key)
		if ok {
			fmt.Fprintf(os.Stdout, "%s\t%s\n", key, val)
			printed++
		}
	}
	// 3. exit with status 0 if all variables were found.
	if printed == len(keys) {
		os.Exit(0)
	}
	// 4. print all missing variables to standard error and exit with status 1.
	fmt.Fprintf(os.Stderr, "missing %d/%d environment variables\n", len(keys)-printed, len(keys))
	for _, key := range keys {
		if _, ok := lookupenv(env, key); !ok {
			fmt.Fprintf(os.Stderr, "%s\n", key)
		}
	}
	os.Exit(1)
}

// lookupenv looks up an environment variable by name, returning its value and whether it was found.
// in case of duplicates, it returns the last value.
// environment variables are stored as "key=value" strings.
func lookupenv(env []string, key string) (string, bool) { // LIBRARY FUNCTION: first defined in printenv.go
	// You may have duplicated environment variables: the operating system doesn't care.
	// It's up to the receiving program to decide what to do with them. Usually the last one wins; we'll do that here.
	for i := len(env) - 1; i >= 0; i-- {
		e := env[i]
		if len(e) < len(key)+1 { // +1 for the '='
			continue
		}
		if e[:len(key)] != key {
			continue
		}
		if e[len(key)] != '=' {
			continue
		}
		return e[len(key)+1:], true
	}
	return "", false
}
```

Let's test printenv with a few common environment variables.

```bash
#!/usr/bin/env bash
# IN
go run ./printenv.go USER HOME SHELL
# OUT
USER	efron
HOME	/home/efron
SHELL	/bin/bash
```

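Recap: setting environment variables in the bash shell

To set an environment variable for a single run of a program, put ENVVAR=VALUE before the command. For example, USER=efron go run ./printenv.go USER reports USER as efron.

To set an environment variable for the rest of your shell session, use export ENVVAR=VALUE.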

3.1.2. Example: setting an environment variable for the rest of the shell session

```bash
#!/usr/bin/env bash
# IN
export ANIMAL=WOOLY_TAPIR
go run ./printenv.go ANIMAL
# OUT
ANIMAL	WOOLY_TAPIR
```

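3.1.3. Example: setting an environment variable for a single run

```bash
#!/usr/bin/env bash
# IN
export ANIMAL=BAIRDS_TAPIR # set for the rest of the shell session
ANIMAL=MALAYAN_TAPIR go run ./printenv.go ANIMAL # this run only; overrides the session variable
# OUT
ANIMAL	MALAYAN_TAPIR
```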

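Sidenote: HOME and ~

Some programs use the tilde (~) to mean your home directory, but this is a per-program convenience rather than a fundamental feature of the operating system.

Most shells, like bash, expand ~ to the value of the HOME environment variable.

When writing other programs, don't assume ~ will work; look up the value of HOME instead.

```bash
#!/usr/bin/env bash
# IN:
echo ~
printenv HOME

# OUT:
/home/efron
/home/efron
```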

printenv: exercises

Modify printenv to print every environment variable when no arguments are given. How should you handle duplicates?

Modify printenv to print environment variables in sorted order. How should you handle duplicates? What about case?

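Sidenote: namespacing environment variables

Nothing stops two programs from using the same environment variable name; avoiding collisions is up to us. Avoid short or common names like PATH, HOME, or NAME, and add a short prefix that's unlikely to collide. For example, if your company is Tapir Technology and you're building a program called “monitor”, you might use TT_MONITOR_LOG_LEVEL instead of LOG_LEVEL.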

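Great: environment variables are just a key-value mapping… but how does a program get them in the first place?

Q: How does a program get these values? A: The operating system provides them when the program starts, via the execve system call.

execve(path, args, env) replaces the program running in the current process with a new one loaded from the file at path, passing it the arguments args and the environment env. To talk about execve, we first need to know a little about system calls and files.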

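4. System calls

What _is_ a system call? A system call is a “function call” into the operating system, made with a special machine instruction (on amd64, creatively named SYSCALL). The SYSCALL instruction hands control to the operating system along with a _request_ to do something. The operating system either does it or reports that it can't, then hands control back to the program.

Generally, you don't make system calls directly, because they're specific to both the architecture (amd64, riscv, arm64, 6502) and the operating system (linux, windows, macos, freebsd).

Instead, you use library functions that make the system call for you: usually libc's. We'll use Go's syscall package; for the rest of this article, specifically the syscall.Syscall and syscall.Syscall6 functions.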

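Recap: system call library functions

A system call library function:

1. saves registers for later
2. puts the arguments in the right registers for the system call
3. switches to kernel mode with the SYSCALL opcode
4. … the operating system does its work … <- the actual system call happens here
5. returns to user mode
6. restores registers
7. returns the system call's result
8. handles errors

TLDR: system call library functions turn system calls into ordinary function calls.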

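4.1. System call basics with Go's `syscall.Syscall` and `syscall.Syscall6`

r1, r2, errno := syscall.Syscall(syscallno, arg1, arg2, arg3) is for ordinary system calls with 3 or fewer arguments.
r1, r2, errno := syscall.Syscall6(syscallno, arg1, arg2, arg3, arg4, arg5, arg6) is for system calls with 4 to 6 arguments.

syscallno (the ‘syscall number’) is the number of the system call you want to make, and the args are that system call's arguments; unused arguments are passed as 0. If you think of each syscall number as a function in a library, the rest are that function's arguments. For historical reasons, syscallno is sometimes called the “trap” or “interrupt”.

The filesystem operations write and read are among the most common system calls.

write(fd, buf, count) writes up to count bytes from the buffer starting at buf to the file referred to by the file descriptor fd. It returns the number of bytes actually written, which may be less than count.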

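Recap: file descriptors

System calls don't operate on file “objects”. Objects don't really exist: the machine only knows about registers and memory. Instead, system calls refer to an open file by an integer called a “file descriptor” (usually fd). The operating system keeps a table of open files for each process, and a file descriptor is an index into that table. A file descriptor doesn't necessarily correspond to a file on the filesystem: it might be a pipe, a socket, or a terminal. By convention, every program starts with three file descriptors already open, inherited from whatever started it:

STDIN (“standard input”) is fd 0.

STDOUT (“standard output”) is fd 1.

STDERR (“standard error”) is fd 2.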

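As a warm-up, let's wrap the write system call in a Go function.

4.1.1. Wrapping the `write` system call

```go
// gowrite writes the contents of buf to the file descriptor fd. LIBRARY FUNCTION
func gowrite(fd int, buf []byte) (int, error) {
	if len(buf) == 0 {
		return 0, nil // nothing to write; &buf[0] would panic on an empty slice
	}
	n, _, errno := syscall.Syscall(
		syscall.SYS_WRITE,                // which syscall?
		uintptr(fd),                      // which file descriptor to write to
		uintptr(unsafe.Pointer(&buf[0])), // where to write from
		uintptr(len(buf)),                // how many bytes to write
	)
	if errno != 0 {
		return 0, errno // syscall.Errno implements the error interface
	}
	// return nil explicitly: a zero Errno stored in an error interface would NOT be nil.
	return int(n), nil
}
```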

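Recap: pointers, unsafe.Pointer, and uintptr

Inside the machine, there's no such thing as a “pointer”: there are only registers and memory. A pointer is just a number that “points to” a particular memory address. To tell the operating system where to read or write data, all we can do is hand it a number representing a memory address and trust it to interpret that number correctly.

In Go, we do this by converting a pointer to an unsafe.Pointer and then to a uintptr (an unsigned integer big enough to hold a pointer).

As last time, let's write a program that prints “hello, world!” to standard output… but this time using system calls instead of Go's fmt.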

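4.2. syscallhelloworld.go

4.2.1. Overview

Get a pointer to the sequence of bytes that makes up the string "hello, world!\n".
Write those bytes to standard output with the write system call.
If anything went wrong, report it on standard error and exit with status 1 via the exit system call.

hello-world-syscall.py (Python port)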

```go
// syscallhelloworld.go writes "hello, world!" to standard output using the write syscall.
package main

import (
	"fmt"
	"syscall"
	"unsafe"
)

func main() {
	buf := []byte("hello, world!\n")
	n, _, errno := syscall.Syscall(
		syscall.SYS_WRITE,                // which syscall?
		1,                                // write to standard output (fd 1)
		uintptr(unsafe.Pointer(&buf[0])), // where to write from
		uintptr(len(buf)),                // how many bytes to write
	)
	if errno != 0 {
		fatalf("write: %v\n", errno)
	}
	if n != uintptr(len(buf)) {
		fatalf("write: wrote %d bytes, expected %d\n", n, len(buf))
	}
}

// fatalf writes a formatted string to standard error and exits with status 1. LIBRARY FUNCTION
func fatalf(format string, args ...interface{}) {
	buf := []byte(fmt.Sprintf(format, args...))
	// fd 2 is standard error. no point in checking the error here; we're about to exit.
	syscall.Syscall(syscall.SYS_WRITE, 2, uintptr(unsafe.Pointer(&buf[0])), uintptr(len(buf)))
	// on linux, SYS_EXIT ends only the calling thread; SYS_EXIT_GROUP ends the whole (multithreaded) Go process.
	syscall.Syscall(syscall.SYS_EXIT_GROUP, 1, 0, 0) // exit with status 1
}
```

Let's run it…

```bash
#!/usr/bin/env bash
# IN
go run ./syscallhelloworld.go

# OUT
hello, world!
```

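It works! This is what happens under the hood every time you read or write a file.

Let's look at files some more. Last time, we built cat, which concatenates files. Let's do the same with system calls: a program, syscallcat, that reads from a file and writes to standard output using system calls.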

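4.3. `syscallcat.go`

4.3.1. `syscallcat`: overview

Open the file named by the first argument with SYS_OPEN.
Read it into memory in chunks with SYS_READ.
Write those chunks to standard output with SYS_WRITE.
Flush the output buffer to disk with SYS_FSYNC. (More on this when we get there.)
Close the file with SYS_CLOSE.
Exit with SYS_EXIT_GROUP.

The table below summarizes the system calls syscallcat.go uses.

4.3.2. `syscallcat`: system calls used

Name	Number	Arguments	Description
close	3	fd	close file descriptor `fd`
exit_group	231	status	exit the whole process (every thread) with status code `status`
fsync	74	fd	flush file descriptor `fd` to disk
open	2	path, flags, mode	open the file at `path`, with behavior given by `flags` and permissions `mode` (used when creating a file)
read	0	fd, buf, count	read up to `count` bytes from file descriptor `fd` into `buf`
write	1	fd, buf, count	write up to `count` bytes starting at `buf` to file descriptor `fd`

These numbers are for linux/amd64; other architectures number their system calls differently. Plain exit (number 60) ends only the calling thread, and a Go program always runs several threads, so we use exit_group to end the whole process.

syscallcat.py (Python port)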

```go
// syscallcat.go opens the file specified by the first argument and writes its contents to standard output using raw syscalls.
// the syscall numbers assume linux/amd64 (arm64, for example, has no SYS_OPEN, only SYS_OPENAT).
package main

import (
	"fmt"
	"os"
	"syscall"
	"unsafe"
)

// standard output is just another file descriptor: it's automatically opened for us when the program starts. it's always file descriptor 1.
const FD_STDOUT = 1

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintf(os.Stderr, "usage: %s <file>\n", os.Args[0])
		os.Exit(1)
	}

	path := []byte(os.Args[1])      // convert the string to a byte slice so we can point to it
	path = append(path, 0)          // null-terminate the string
	ptr := unsafe.Pointer(&path[0]) // point to the first byte of the slice
	const FLAGS = syscall.O_RDONLY  // open the file for reading only
	const MODE = 0                  // permissions for a newly-created file; we aren't creating one

	// 1. Open the file specified by the first argument w/ `SYS_OPEN`: open(path, flags, mode).
	fileDescriptor, _, err := syscall.Syscall(
		syscall.SYS_OPEN,
		uintptr(ptr),
		FLAGS,
		MODE,
	)
	if err != 0 {
		fatalf("open: %v\n", err)
	}

	// 2. Read chunks into memory with `SYS_READ`.
	var buf [1024]byte // 1KB to read into
	for {
		n, _, readErr := syscall.Syscall(
			syscall.SYS_READ,                 // which syscall?
			fileDescriptor,                   // tell it to read from the file we opened
			uintptr(unsafe.Pointer(&buf[0])), // where to put the data?
			uintptr(len(buf)),                // how many bytes to read, at most?
		)
		// read(2) either returns a byte count OR fails; on failure, n isn't a valid count, so check the error first.
		if readErr != 0 {
			fatalf("read: %v\n", readErr)
		}
		if n == 0 { // reading 0 bytes means end of file; exit the loop
			break
		}

		// 3. Write those chunks to standard output with `SYS_WRITE`.
		// writes are not guaranteed to write all the data you ask for in one go. among other things,
		// signals like SIGPIPE or SIGINT can interrupt them (more on that later).
		// we need to keep writing until we've written all the data we read.
		// functions like io.Copy usually do this for you.
		for offset := uintptr(0); offset < n; {
			m, _, writeErr := syscall.Syscall(
				syscall.SYS_WRITE,
				FD_STDOUT,
				uintptr(unsafe.Pointer(&buf[offset])), // the first byte we still need to write
				n-offset,                              // only the bytes we haven't written yet
			)
			if writeErr != 0 {
				fatalf("write: %v\n", writeErr)
			}
			offset += m
		}
	}

	// 4. We've now written all the data we read from the file to standard output... or have we?
	// it's usually pretty inefficient to do lots of small writes to permanent storage, so operating systems maintain a buffer of data to write to disk when it's convenient.
	// we can force the operating system to write that buffer to disk with the fsync syscall.
	// fsync(fd) writes the buffer for file descriptor fd to disk, blocking until it's done.
	// the similarly-named sync() does this for _all_ open files; it's usually better to be specific.
	// no error checking here: if stdout is a terminal or pipe, fsync just fails with EINVAL, and we're about to exit anyway.
	_, _, _ = syscall.Syscall(syscall.SYS_FSYNC, FD_STDOUT, 0, 0)

	// 5. Close the file with `SYS_CLOSE`.
	syscall.Syscall(syscall.SYS_CLOSE, fileDescriptor, 0, 0)

	// 6. Exit with `SYS_EXIT_GROUP` (plain SYS_EXIT would end only this thread, not the whole Go process).
	syscall.Syscall(syscall.SYS_EXIT_GROUP, 0, 0, 0)
}
```

Let's run it…

```bash
#!/usr/bin/env bash
# IN
echo "hello, world!" > hello.txt
go run ./syscallcat.go hello.txt
# OUT
hello, world!
```

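4.3.3. `syscallcat` exercises:

Modify syscallcat to read from standard input if no file is given. Hint: standard input is file descriptor 0.

Modify syscallcat to read multiple files and concatenate them to standard output, implementing the real cat tool.

syscallhelloworld.go and syscallcat.go gave us a feel for how programs interact with the outside world. But how does a program start in the first place? What actually happens when you click an icon or type grep foobar into a shell?

A system call, of course. execve starts running a new program (if you have the appropriate permissions).

But wait: when checking “permissions”, who is “you”? And what are permissions? We'll briefly cover ownership and access control, then come back to system calls and execve.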

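5. Ownership and access control

Modern operating systems are all multi-user, multi-program systems: many users' files and programs share resources like the CPU, memory, and disk. On such a system, it's important to control who can touch what, so ordinary users can't break the system or each other. This is **access control**.

This is a complicated topic, and we'll skip a lot to get a rough feel for it quickly. Don't take this as gospel.

The original, and still the most common, form of access control is **file permissions**.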

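Sidenote: the family tree of modern operating systems

Modern operating systems descend from one of two lines:

Unix (Bell Labs, from 1969): the BSDs descend from it, macOS descends from BSD, and Linux is a Unix-like reimplementation rather than a direct descendant. (System V, from 1983, is one of its major branches.)

Windows NT (1993): the ancestor of modern Windows.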

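File permissions control who can read, write, and execute a file. Conveniently, these map directly onto the system calls we just covered: read, write, and execve. This is one reason they're _system calls_ rather than ‘ordinary functions’: the operating system has to check permissions before allowing the operation.

File permissions divide the world into three categories: the **user** who owns the file, a single **group** that shares access to it, and **everyone else (other)**. Read, write, and execute permissions can be set separately for each category.

They're usually written as a symbolic string like rwxr-xr--, where r means read, w means write, and x means execute, in the order user, group, other. They can also be written in **octal**, like 0755. Wikipedia's article on file-system permissions explains both notations well, so we won't go into detail here.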

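Here are a few examples.

Symbolic	Octal	Description
`rwxrwxrwx`	`0777`	anyone can read, write, and execute
`---------`	`0000`	no one can do anything
`rwx------`	`0700`	only the owner can read, write, and execute; no one else can do anything
`rwxr-xr--`	`0754`	the owner can read, write, and execute; the group can read and execute; others can only read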
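Each octal digit holds one category's three bits (r = 4, w = 2, x = 1), so 0754 is 7 = rwx, 5 = r-x, 4 = r--. A minimal sketch of that conversion, assuming only the nine permission bits matter (`symbolic` is a hypothetical helper; it ignores the setuid, setgid, and sticky bits):

```go
package main

import "fmt"

// symbolic converts the low nine permission bits of mode (e.g. 0754)
// to the rwxrwxrwx notation used by ls -l.
func symbolic(mode uint32) string {
	const letters = "rwxrwxrwx"
	out := []byte("---------")
	for i := 0; i < 9; i++ {
		// position 0 is bit 8 (0400, user read); position 8 is bit 0 (0001, other execute).
		if mode&(1<<(8-i)) != 0 {
			out[i] = letters[i]
		}
	}
	return string(out)
}

func main() {
	for _, m := range []uint32{0777, 0000, 0700, 0754} {
		fmt.Printf("%04o %s\n", m, symbolic(m))
	}
}
```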

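5.0.1. Exercises: file permissions

Write a program, printperms, that prints a file's permissions in symbolic form.
Write a program, chmod, that changes a file's permissions to those given on the command line. It should accept both octal and symbolic permissions.
Modify chmod to _modify_ permissions for the user, group, and other categories with the + and - operators: chmod o+r file should add read permission for other, and chmod g-w file should remove write permission for group.
Modify chmod so that a (all) sets permissions for every category at once: chmod a-r file should remove read permission from all categories.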

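5.1. Users

Unix maintains a list of users, each with a unique numeric ID called a UID (User ID) and a human-readable name. Every file and process has a user that ‘owns’ it. The getuid system call returns the UID of the user running the calling process. User names are a convenience for humans; the operating system uses UIDs. The id command prints the current user's UID and GIDs.

5.1.1. Exercise: users

Write a program, getusername, that prints the user name for a given UID. If no UID is given, it should print the current user's name; get the current user's UID with the getuid system call.
- Hint: the /etc/passwd file contains the list of users and their UIDs. Use the id command to find your own user name and UID, and use those to work out how to parse the file.
- Hint: you don't need to parse the _whole_ file.

Every user must belong to at least one group (their “primary” group) and may belong to more. That's the next section.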

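5.2. Groups

Like users, Unix maintains a list of groups. Each group has a unique numeric ID, called a GID (Group ID) rather than a UID, and a human-readable name. A group contains zero or more users. File permissions can be granted to a _group_ of users instead of a single user: for example, letting students read an assignment but not write to it. The getgid system call returns the calling process's primary (real) GID. getegid returns its effective GID: the GID actually used for file-access checks, which can differ from the real one (for example, while running a setgid program).

5.2.1. Exercise: groups

Write a program, getgroupname, that prints the group name for a given GID. If no GID is given, it should print the current user's group name; get the current user's GID with the getgid system call.
- Hint: the /etc/group file contains the list of groups and their GIDs.
Add a -e flag to getgroupname that uses the current user's effective GID instead of the primary GID.
Experiment with chmod and chown to see how getgroupname behaves. You may need to create new users and groups to see an effect.

When a program starts, it inherits the user and group of the process that started it. How those processes start is the subject of the next section: execve.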

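6. Starting programs with `execve`

execve (“execute with argument vector and environment”) is the system call that starts a new program. It doesn't create a new process: it replaces the program running in the current process with a new one. The process keeps its PID and its open file descriptors (except those marked close-on-exec), but its memory, code, and stack are brand new.

Let's write a program that will become the core of our shell: it runs the program named by its first argument, passing along the arguments. For now, we'll only run programs specified by an absolute path (one starting with /).

6.0.1. Recap: C arrays and `**byte`

Like C strings, the C arrays that execve expects are just pointers to memory, terminated by a null (0) element. If, from Go's point of view, a C string is a *byte, then an _array_ of C strings is a **byte.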
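A sketch of building that shape from Go and reading it back the way C code would: follow each pointer until a 0 byte, and stop at the nil pointer. `cString`, `cArray`, and `goStrings` are hypothetical helpers (the syscallexec program below uses its own `cstr`), and this assumes Go 1.17+ for `unsafe.Add` and `unsafe.Slice`. One caveat: a Go string may contain a 0 byte, which C would treat as the end of the string; the standard library's `syscall.BytePtrFromString` returns an error in that case, which this sketch doesn't check for.

```go
package main

import (
	"fmt"
	"unsafe"
)

// cString copies s into a fresh buffer with a trailing 0 byte.
func cString(s string) *byte {
	b := make([]byte, len(s)+1) // the extra byte is already 0
	copy(b, s)
	return &b[0]
}

// cArray builds a nil-terminated array of C strings: the shape execve expects for argv and envp.
func cArray(ss []string) []*byte {
	arr := make([]*byte, len(ss)+1) // the last element stays nil: the terminator
	for i, s := range ss {
		arr[i] = cString(s)
	}
	return arr
}

// goStrings walks such an array back: follow each pointer to a 0 byte, stop at nil.
func goStrings(arr []*byte) []string {
	var out []string
	for _, p := range arr {
		if p == nil {
			break
		}
		n := 0
		for *(*byte)(unsafe.Add(unsafe.Pointer(p), n)) != 0 {
			n++
		}
		out = append(out, string(unsafe.Slice(p, n)))
	}
	return out
}

func main() {
	arr := cArray([]string{"/bin/echo", "hello,", "world!"})
	fmt.Println(len(arr), goStrings(arr))
}
```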

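6.1. syscallexec.go

6.1.1. Overview

Check that the user provided an argument, and that it's an absolute path.
Convert the Go-style command-line arguments and environment ([]string) into C-style null-terminated arrays of null-terminated strings (**byte).
Start the new program by calling execve via the SYS_EXECVE system call.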
```go
// syscallexec.go runs another program, specified by absolute path, using the execve system call.
// its first command-line argument is the path to the program to run; that argument and the ones after it
// become the new program's arguments (so the new program sees its own path as argument 0).
// it should have exactly the same effect as just running the program directly.
package main

import (
	"fmt"
	"os"
	"syscall"
	"unsafe"
)

func main() {
	// 1. Check that the user has provided an argument and that it's an absolute path.
	if len(os.Args) < 2 {
		fatalf("usage: %s <command> [args...]\n", os.Args[0])
	}
	goargs := os.Args[1:] // the first item is the command, the rest are arguments

	// we'll cover the process PATH later. for now, let's protect our users from themselves.
	if len(os.Args[1]) == 0 || os.Args[1][0] != '/' {
		fatalf("error: %s is not an absolute path\n", os.Args[1])
	}
	// on success, exec never returns: this process is now running the other program.
	err := exec(goargs, os.Environ()) // we'll talk about the environment in just a bit.
	fatalf("execve %s: %v\n", goargs[0], err)
}

// exec executes a program, replacing the current process. on success, this never returns,
// so the returned error is always a non-zero syscall.Errno. LIBRARY FUNCTION
func exec(args, env []string) error {
	// execve wants a null-terminated array of pointers to null-terminated strings
	// for both the arguments and the environment.
	cargs := make([]unsafe.Pointer, len(args)+1) // +1 for the null terminator (left as nil)
	for i := range args {
		cargs[i] = cstr(args[i])
	}

	cenv := make([]unsafe.Pointer, len(env)+1) // +1 for the null terminator (left as nil)
	for i := range env {
		cenv[i] = cstr(env[i])
	}

	path := cstr(args[0])

	// 3. Call `execve` via the `SYS_EXECVE` syscall to start the new program.
	_, _, err := syscall.Syscall(
		syscall.SYS_EXECVE,
		uintptr(path),                      // path to the program to run, as a null-terminated string
		uintptr(unsafe.Pointer(&cargs[0])), // argv: pointer to a null-terminated array of pointers to bytes
		uintptr(unsafe.Pointer(&cenv[0])),  // envp: same shape as argv
	)
	return err
}

// 2. Convert the go-style command-line arguments and environment (`[]string`) to the C-style null-terminated arrays of null-terminated strings (`**byte`).
// cstr converts a Go string to a null-terminated C string.
// this allocates memory. LIBRARY FUNCTION
func cstr(s string) unsafe.Pointer {
	b := make([]byte, len(s)+1)
	copy(b, s) // copy the string into the buffer. the leftover byte will be the null terminator.
	return unsafe.Pointer(&b[0])
}
```

Let's try it by running echo (specifically, /bin/echo) with a few arguments.

```bash
#!/usr/bin/env bash
# write the program to a file at the absolute path /bin/syscallexec (writing to /bin usually requires root)
go build -o /bin/syscallexec ./syscallexec.go

# use it to run /bin/echo
/bin/syscallexec /bin/echo hello, world!

# call our program recursively
/bin/syscallexec /bin/syscallexec /bin/echo hello, hello, world!
```

```bash
# OUT
hello, world!
hello, hello, world!
```

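It works, but it isn't very useful yet. Some of its limitations:

Because execve _replaces_ the current process, we can run exactly one program, exactly once. We can fix this by fork-ing the process first; more on that soon.
We can't control the child program. Signals and pipes let us do that; also coming soon.
The child inherits the parent's environment unmodified.
We have to know the absolute path of the program to run. We can fix this by looking the program up in the directories listed in the PATH environment variable; this is called **command resolution**.

Exercise: add a -e flag to syscallexec that sets the child process's environment. It should be able to replace multiple environment variables.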

6.1.2. Recap: the `PATH` environment variable and command resolution

Commands are resolved by searching the PATH environment variable: a colon-separated (:) list of directories to search for programs. The first match wins.

If PATH=/bin:/usr/bin:/usr/local/bin, it contains the directories:

/bin

/usr/bin

/usr/local/bin

When you type syscallexec into the shell, the shell searches these directories in order and stops at the first match. Here, /bin/syscallexec exists, so it runs that. We say syscallexec **resolves** to /bin/syscallexec.
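The search itself is a short loop. Here's a sketch (`resolve` is a hypothetical helper) that takes the existence check as a parameter, so the first-match-wins logic can be tested without touching the filesystem. It also follows the shell rule that a name containing a slash, like ./printargs or /bin/echo, is used as-is rather than searched for, and it skips empty PATH entries for simplicity (POSIX treats an empty entry as the current directory).

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// resolve searches the colon-separated directories in pathEnv, in order, and
// returns the first dir/name for which exists reports true.
func resolve(name, pathEnv string, exists func(string) bool) (string, bool) {
	if strings.Contains(name, "/") {
		return name, exists(name) // names with a slash aren't looked up in PATH
	}
	for _, dir := range strings.Split(pathEnv, ":") {
		if dir == "" {
			continue // simplification: skip empty entries
		}
		candidate := dir + "/" + name
		if exists(candidate) {
			return candidate, true // first match wins
		}
	}
	return "", false
}

func main() {
	fileExists := func(p string) bool { _, err := os.Stat(p); return err == nil }
	fmt.Println(resolve("echo", os.Getenv("PATH"), fileExists))
}
```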

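6.1.3. Example: command resolution in the `bash` shell

```bash
# IN
# note: no /bin/ prefix on syscallexec itself; the shell resolves it via PATH.
# (syscallexec still needs an absolute path for the program it runs.)
syscallexec /bin/echo hello, world!
# OUT
hello, world!
```

In the next section, we'll **write** a program that resolves commands itself, the way a shell does: like `which`, it figures out which program we're actually running.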
 
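7. Resolving commands with `whiche.go`

`whiche` (“witch-e”) finds a command's absolute path using the `PATH` environment variable, the way a shell does.

7.1. Overview

1) Get the environment from the Go runtime.

2) Parse the `PATH` environment variable into a list of directories with string manipulation.

3) Check whether the file exists in each directory with the `SYS_STAT` system call; print the first match to standard output and exit 0.

4) If there's no match, print a message to standard error and exit 1.

[whiche.py (Python port)](https://gitlab.com/efronlicht/blog/-/blob/64b5bd1c71896796fd486c24d9e36aec688522ff/articles/startingsystems/cmd/pythonports/whiche.py)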
 
```go
// whiche.go ("witch-e") finds the absolute path to a command using the PATH environment variable like a shell would.
// it uses SYS_STAT, which linux/amd64 has; some newer architectures, like arm64, don't.
package main

import (
	"fmt"
	"os"
	"syscall"
	"unsafe"
)

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintf(os.Stderr, "usage: %s <command>\n", os.Args[0])
		os.Exit(1)
	}
	// 1. get the environment from the operating system.
	env := os.Environ()
	// 2. resolve the PATH environment variable to a list of directories via string manipulation.
	var path []string
	{
		// this whole block is just doing `strings.Split(rawpath, ":")`, minus the empty entries.
		// it's good to practice ordinary string manipulation every now and then.
		rawpath := getenv(env, "PATH")
		start := 0
		// treat the end of the string (i == len(rawpath)) like one last ':' so the final entry is included.
		for i := 0; i <= len(rawpath); i++ {
			if i < len(rawpath) && rawpath[i] != ':' {
				continue
			}
			if i > start { // skip empty entries, like the one between the colons in "a::b".
				path = append(path, rawpath[start:i])
			}
			start = i + 1
		}
	}

	// 3. find the first match in the PATH directories; print it to standard output and exit 0.
	for _, dir := range path {
		candidate := dir + "/" + os.Args[1]
		if found, err := exists(candidate); err == nil && found {
			fmt.Println(candidate) // found it. print & exit.
			os.Exit(0)
		}
	}
	// 4. if no match is found, print an error message to standard error and exit 1.
	fmt.Fprintf(os.Stderr, "%s: command not found\n", os.Args[1])
	os.Exit(1)
}

// getenv retrieves the value of the environment variable named by the key, or an empty string if it's not set.
func getenv(environ []string, key string) string {
	key += "=" // environment variables are stored as "key=value"
	n := len(key)
	for i := range environ {
		if len(environ[i]) < n {
			continue
		}
		// KEY=VALUE
		if environ[i][:n] == key { // KEY=
			return environ[i][n:] // VALUE
		}
	}
	return ""
}

// exists checks if a file exists using stat (https://linux.die.net/man/2/stat):
// (true, nil) if it does.
// (false, nil) if it doesn't.
// (false, error) if there was an error.
func exists(path string) (bool, error) {
	p := cstr(path)
	var statbuf syscall.Stat_t // the kernel's struct stat for this platform. we'll worry about its contents later.
	// the STAT system call fills in a stat structure with information about the file,
	// returning 0 on success and -1 on error.
	// errno will be set to the error code if it fails.
	_, _, err := syscall.Syscall(syscall.SYS_STAT, uintptr(p), uintptr(unsafe.Pointer(&statbuf)), 0)
	switch err {
	case 0: // success!
		return true, nil
	case syscall.ENOENT: // the opaquely named ENOENT means Error NO ENTry; aka, "file not found".
		return false, nil
	default: // some other error.
		return false, err
	}
}

// cstr converts a Go string to a null-terminated C string.
// this allocates memory.
func cstr(s string) unsafe.Pointer {
	b := make([]byte, len(s)+1)
	copy(b, s) // copy the string into the buffer. the leftover byte will be the null terminator.
	return unsafe.Pointer(&b[0])
}
```

Let's run it…

```bash
#!/usr/bin/env bash
# IN
go run ./whiche.go echo

# OUT
/usr/bin/echo
```

whiche: exercises

Add a -a flag to whiche that prints every match, one per line, in PATH resolution order.

Combine whiche and syscallexec into a program, run, that runs a program specified by name.

Finally, let's cover signals and put it all together.

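8. Signals

Sometimes a program needs to be told about something that happened outside its control: a broken network pipe (SIGPIPE), a user interrupt (SIGINT), or a request to terminate (SIGTERM). These notifications are called signals.

One of the most common signals is SIGSEGV (“segmentation fault”), which the operating system sends when a program tries to access memory it shouldn't, usually by dereferencing a null pointer.

Let's write a program that crashes with a segfault to see it in action.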

8.1. segfault.go

```go
// https://go.dev/play/p/0l8t2y_aQ92
package main

func main() {
	var nullptr *int
	_ = *nullptr
}
```

```bash
#!/usr/bin/env bash
# IN
go run ./segfault.go
# OUT
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x468f22]

goroutine 1 [running]:
main.main()
/home/efron/scratch/segfault.go:5 +0x2
```

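Signals sent by one process to another come from the kill system call: kill(pid, signal) sends the signal signal to the process with process ID pid. (Some signals, like the SIGSEGV above, are generated by the kernel itself.) The name is misleading: many signals do terminate the process by default, but signals are a general-purpose notification mechanism, and most of them have other uses.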

8.1.1. Recap: common signals

The table below summarizes the signals you'll run into most often.

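Signal	Description	Example	Notes
SIGINT	keyboard interrupt; usually “start shutting down when convenient”	`ctrl+c` in a stuck command-line program	can be caught or ignored
SIGPOLL	I/O is ready to read	waiting for network data
SIGTERM	start shutting down now	`kill`	can be caught or ignored
SIGKILL	terminate immediately	`kill -9`	cannot be caught or ignored
SIGSEGV	memory access violation	see the example above	crashes the program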

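Let's write a program that sends a signal to another program.

8.2. sendsignal.go

8.2.1. Overview

Parse a PID and a signal, as integers, from the command line.
Send the signal to the process with that PID using the kill system call.

sendsignal.py (Python port)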

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"syscall"
)

// sendsignal.go sends a signal to a process by PID.
// usage: sendsignal <pid> <signal>
func main() {
	if len(os.Args) != 3 {
		fatal(fmt.Errorf("usage: %s <pid> <signal>", os.Args[0]))
	}
	pid, err := strconv.Atoi(os.Args[1])
	if err != nil {
		fatal(err)
	}
	signal, err := strconv.Atoi(os.Args[2])
	if err != nil {
		fatal(err)
	}
	_, _, errno := syscall.Syscall(syscall.SYS_KILL, uintptr(pid), uintptr(signal), 0)
	if errno != 0 {
		fatal(errno)
	}
}

func fatal(err error) {
	fmt.Fprintln(os.Stderr, err)
	os.Exit(1)
}
```

To test it, we need:

a process that runs long enough for us to send it a signal
that process's PID

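8.3. TLDR: PIDs

A PID (Process Identifier) uniquely identifies a process on the system. A process can learn its own PID with the getpid system call. It can learn the PID of its parent (the process that started it) with the getppid system call. Together, all processes form a tree whose root is the init process (PID 1).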

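Let's write a program that prints its own PID and waits for a signal. We'll call it killme.

8.4. killme.go

8.4.1. Overview

Use the getpid system call to get the PID of the current process.
Print the PID to standard output once.
Print ‘still alive’ to standard error every 15 seconds.

8.4.2. Program

Python version: killme.py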

```go
// killme prints its PID, then prints "still alive" every 15 seconds until it is killed.
package main

import (
	"fmt"
	"os"
	"syscall"
	"time"
)

func main() {
	// 1. use the `getpid` system call to get the PID of the current process.
	pid, _, _ := syscall.Syscall(syscall.SYS_GETPID, 0, 0, 0) // getpid always succeeds
	// 2. print the PID to standard output once.
	fmt.Println(pid)
	// 3. print 'still alive' to standard error every 15s.
	for ; ; time.Sleep(15 * time.Second) {
		fmt.Fprintln(os.Stderr, "still alive")
	}
}
```

Let's test it. We'll start killme as a background process and wait for it to print ‘still alive’ once. Then we'll use sendsignal to send it SIGKILL (9 on Linux).

```bash
#!/usr/bin/env bash
# IN
go run ./killme.go &
```

```
# OUT
8656 # this number will differ on your system
still alive
```

```bash
#!/usr/bin/env bash
# IN
PID=8656 # the value printed above
SIGNAL=9 # SIGKILL
go run ./sendsignal.go $PID $SIGNAL
```

```
# OUT
signal: killed
```

The killme process should stop printing ‘still alive’. If we run sendsignal again, we get an error, because the process no longer exists:

```bash
#!/usr/bin/env bash
# IN
go run ./sendsignal.go 8656 9
```

```
# OUT
no such process
```

Unless a program specifically catches them, most signals terminate the target process. Your language will provide some wrapper around the sigaction system call for setting up a handler. In Go, that's the os/signal package.

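Let's write a program, killmeslowly, that won't exit until it has received SIGINT five times. (ctrl+c sends SIGINT to the program running in the shell.) We'll catch the signals with the os/signal package.

8.5. killmeslowly.go

There's no Python version of this program.

8.5.1. Overview

Print our PID to standard output.
Register a signal handler for SIGINT.
Count down from 5 to 1 on stderr each time a SIGINT arrives.
On the fifth (last) SIGINT, exit with status 0.

8.5.2. Program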

```go
// killmeslowly runs until it receives 5 SIGINT signals.
package main

import (
	"fmt"
	"os"
	"os/signal"
	"syscall"
)

func main() {
	// 1. Print our `PID` to standard output.

	fmt.Println(os.Getpid())

	//	2. Register a signal handler for `SIGINT`.

	ch := make(chan os.Signal, 5)     // always make a buffer of at least 1 so you don't drop signals
	signal.Notify(ch, syscall.SIGINT) // forward SIGINT to ch

	// 3. Count down from 5 to 1 on each `SIGINT` on stderr.
	fmt.Fprintf(os.Stderr, "waiting for SIGINT\n")
	remaining := 5
	for range ch {
		remaining--
		if remaining == 0 {
			fmt.Fprintf(os.Stderr, "exit\n")
			os.Exit(0)
		}
		fmt.Fprintf(os.Stderr, "got SIGINT: %d more to exit\n", remaining)
	}
}
```

Let's test it. ctrl+c sends SIGINT to whatever is currently running in the foreground, so this time we don't need a background process.

```bash
#!/usr/bin/env bash
# IN
go run ./killmeslowly.go
```

```
# OUT
43555
waiting for SIGINT
# ctrl+c
got SIGINT: 4 more to exit
# ctrl+c
got SIGINT: 3 more to exit
# ctrl+c
got SIGINT: 2 more to exit
# ctrl+c
got SIGINT: 1 more to exit
# ctrl+c
exit
```

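That covers the basics of signals. We've only scratched the surface, but it should be enough to get your bearings. Be sure to read your language's documentation (e.g. Go's os/signal) and your operating system's (e.g. the signal man page).

We've covered a lot of ground. Let's combine what we've learned into something useful and properly systems-programming-flavored: an interactive command-line interpreter, better known as a shell.

At its core, a shell (or command-line interpreter):

reads a line from standard input,
interprets it as a command, and
runs the command.

We now know how to do all of this with Linux system calls. In this last section, let's write syscallshell, a program that does exactly that.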

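9. Putting it all together with `syscallshell`

syscallshell is a simple shell that runs subcommands entered by the user.

9.1. Overview

1. Read a line at a time from standard input.
2. Split the line into a command and arguments (don't worry about quotes or escaping for now).
3. Use the PATH environment variable to resolve the command to an absolute path.
4. Fork a new process.
   4.1. In the child, use `execve` to start the new program.
   4.2. In the parent, wait for the child to finish.
5. Repeat until standard input runs out (EOF, e.g. ctrl+d). The shell installs no SIGINT handler, so ctrl+c simply terminates it. Catching ctrl+c, printing a message, and exiting with status 0 is left to the exercises below.

There's no Python version of this program. I ran out of time.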

9.2. `syscallshell.go`

```go
// syscallshell.go implements a simple shell that runs subcommands entered by the user. it uses the PATH environment variable to find the commands to run.
// the shell reads a line at a time from standard input, splits it into arguments, and runs the command.
// example usage:
//
//	echo -e "echo hello\necho goodbye" | go run syscallshell.go
//	hello
//	goodbye
package main

import (
	"bufio"
	"errors"
	"fmt"
	"os"
	"strings"
	"syscall"
	"unsafe"
)

func main() {
	// 1. read a line at a time from standard input.
	scanner := bufio.NewScanner(os.Stdin)
	for {
		fmt.Fprint(os.Stderr, "> ") // prompt on stderr so it doesn't mix with the commands' output.
		if !scanner.Scan() {        // end of input (e.g. ctrl+d) or a read error ends the session.
			break
		}
		line := strings.TrimSpace(scanner.Text())
		if line == "" {
			continue
		}
		// 2. Split the line into a command and arguments (don't worry about quotes or escaping for now).
		args := strings.Fields(line) // split on whitespace.

		// the first argument is the command to run, as usual. now you get why it's like that ;).

		// 3. Use the PATH environment variable to resolve the command to an absolute path.
		path, err := whiche(os.Environ(), args[0])
		if err != nil {
			fmt.Fprintf(os.Stderr, "%q: %v\n", args[0], err)
			continue
		}

		name := path

		// replace the command with the resolved path so call() has to do less work.
		args[0] = path

		// 4. Fork a new process; in the child, use `execve` to start the new program.
		status, err := call(args, os.Environ())
		if err != nil {
			fmt.Fprintf(os.Stderr, "%s: %v\n", name, err)
		}
		if status != 0 {
			fmt.Fprintf(os.Stderr, "%s: exit status %d\n", name, status)
		}
	}
}

// call runs a command like a shell would.
// it
//   - expects args[0] to already be an absolute path (use whiche to resolve it)
//   - forks a new process using FORK
//   - CHILD executes the command in the child process (syscall.EXECVE)
//   - PARENT waits for the child process to finish (syscall.WAIT4)
//   - PARENT returns the exit status of the child process.
func call(args []string, env []string) (status int, err error) { // LIBRARY: first defined in syscallshell.go
	{ // bounds checks
		if len(args) == 0 {
			return 0, errors.New("no command")
		}
		if args[0] == "" {
			return 0, errors.New("empty command")
		}
		if args[0][0] != '/' {
			return 0, errors.New("command must be an absolute path; try using whiche")
		}
	}
	// 4. Fork a new process.
	// we'll know if we're the parent or the child based on the return value of FORK:
	// the CHILD gets 0; the PARENT gets the child's PID. on failure, errno is non-zero.
	// CAUTION: forking Go's multithreaded runtime by hand is fragile, since only the calling
	// thread survives in the child. the standard library's syscall.ForkExec exists for this reason.
	pid, _, errno := syscall.Syscall(syscall.SYS_FORK, 0, 0, 0) // PID: *P*rocess *ID*entifier
	if errno != 0 {
		return 0, fmt.Errorf("syscall: fork: %v", errno)
	}

	// the child may fail to EXECVE (bad path? weird permissions? etc.), and it has no way to
	// hand an error back to us... except its exit status. we reserve one value to mean
	// "exec failed". exit statuses only have 8 bits, so it must fit in 0-255;
	// shells conventionally use 127 for "command not found".
	const STATUS_FAILED_EXEC = 127

	// 4.1: in the child, use `execve` to start the new program.
	if isChild := pid == 0; isChild { // we're the child.
		err := exec(args, env)
		// WARNING: might be tempting to return the error here - but the parent process will never see it,
		// since we're the child. instead, we'll print the error and exit with our magic number.
		fmt.Fprintf(os.Stderr, "syscall: execve: %v\n", err)
		os.Exit(STATUS_FAILED_EXEC)
	}

	// 4.2: in the parent, wait for the child to finish.
	// WAIT4(pid, &status, options, rusage) blocks until the child exits (or returns immediately
	// if it already has) and fills in its status word. it takes four arguments, so we use Syscall6.
	var waitstatus uint32
	_, _, errno = syscall.Syscall6(syscall.SYS_WAIT4, pid, uintptr(unsafe.Pointer(&waitstatus)), 0, 0, 0, 0)
	if errno != 0 {
		return 0, fmt.Errorf("syscall: wait4: %v", errno)
	}

	// bits 8-15 of the status word hold the exit status;
	// the low 7 bits hold the signal that killed the process, if any.
	status = int((waitstatus >> 8) & 0xff)
	if status == STATUS_FAILED_EXEC {
		return status, errors.New("execve failed")
	}
	// signal handling could also happen here; we'll leave it out for now.
	return status, nil
}

// execute a program, replacing the current process. on success, this never returns,
// so err is always a non-zero syscall.Errno.
func exec(args, env []string) error { // LIBRARY: first defined in syscallexec.go
	// execve needs a null-terminated array of pointers to null-terminated strings,
	// both for the arguments and for the environment.

	cargs := make([]unsafe.Pointer, len(args)+1) // +1 for null terminator
	for i := range args {
		cargs[i] = cstr(args[i])
	}

	cenv := make([]unsafe.Pointer, len(env)+1) // +1 for null terminator
	for i := range env {
		cenv[i] = cstr(env[i])
	}

	path := cstr(args[0])

	_, _, err := syscall.Syscall(
		syscall.SYS_EXECVE,
		uintptr(path),                      // path to the program to run as null-terminated string
		uintptr(unsafe.Pointer(&cargs[0])), // arguments as null-terminated array pointer to null-terminated strings
		uintptr(unsafe.Pointer(&cenv[0])),  // environment as null-terminated array pointer to null-terminated strings
	)
	return err
}

// whiche resolves the path to a command using the PATH environment variable like a shell would.
// it returns the absolute path to the command, or an error if the command couldn't be found.
// suppose our path is "/bin:/usr/bin:/usr/local/bin" and the command is "ls".
// we'll look for "/bin/ls", "/usr/bin/ls", and "/usr/local/bin/ls", in that order.
// the first one that exists is returned.
// absolute paths are returned as-is.
func whiche(env []string, command string) (string, error) {
	// 5 cases
	switch {
	case strings.Contains(command, ".."):
		return "", errors.New("no weird relative paths, please") // we'll handle this later.
	case command == "":
		return "", errors.New("empty path")
	case command[0] == '/': // already absolute. nothing to do.
		return command, nil
	case command[0] == '.': // relative path to the current working directory.
		// working directories are somewhat complicated; different shells have different rules.
		// we'll keep it simple and look the command up relative to the current working directory.
		wd, _ := os.Getwd()
		return lookupPath(command, wd)
	default: // look up the command in the PATH environment variable.
		// we know from our last program that environment variables are handed to a program as an array of null-terminated strings
		// by the EXECVE syscall. the go runtime converted those to go-style strings for us before go's main() function was called;
		// main passes them to us via os.Environ().
		// find the value of the PATH environment variable; it's a ':'-separated list of directories, like /bin:/usr/bin:/usr/local/bin
		pathEnv := getenv(env, "PATH")

		// resolve the PATH environment variable into a list of directories... PATH=/bin:/usr/bin:/usr/local/bin -> ["/bin", "/usr/bin", "/usr/local/bin"]
		dirs := strings.Split(pathEnv, ":") // this doesn't handle certain kinds of quoting & escaping, but it's good enough for now.
		// find the command in the PATH directories.
		// e.g. if the command is "ls", we'll look for "/bin/ls", "/usr/bin/ls", and "/usr/local/bin/ls".
		return lookupPath(command, dirs...)
	}
}

// getenv retrieves the value of the environment variable named by the key, or an empty string if it's not set.
func getenv(environ []string, key string) string {
	key += "=" // environment variables are stored as "key=value"
	n := len(key)
	for i := range environ {
		if len(environ[i]) < len(key) {
			continue
		}
		// KEY=VALUE
		if environ[i][:n] == key { // KEY=
			return environ[i][n:] // VALUE
		}
	}
	return ""
}

// lookupPath searches for a file in a list of directories.
// usually these are the directories in the $PATH environment variable;
// whiche splits PATH into that list.
func lookupPath(name string, dirs ...string) (string, error) {
	// different shells have different rules for relative paths.
	// to keep things simple, we'll just say "no".
	if strings.Contains(name, "..") {
		return "", fmt.Errorf("no weird relative paths, please: %q", name)
	}
	for i, dir := range dirs {
		path := dir + "/" + name
		if ok, err := exists(path); ok {
			return path, nil
		} else if err != nil {
			return "", fmt.Errorf("lookupPath: stat in dir #%d: %q: %w", i, path, err)
		}
	}
	return "", errors.New("not found in PATH")
}

// check if a file exists using stat (https://linux.die.net/man/2/stat)
// (true, nil) if it does.
// (false, nil) if it doesn't.
// (false, error) if there was an error.
func exists(path string) (bool, error) {
	p := cstr(path)
	var statbuf [144]byte // sizeof(struct stat) on linux/amd64; syscall.Stat_t describes its layout.
	// the STAT system call fills in a stat structure with information about the file,
	// returning 0 on success and -1 on error.
	// errno will be set to the error code if it fails.
	_, _, err := syscall.Syscall(syscall.SYS_STAT, uintptr(p), uintptr(unsafe.Pointer(&statbuf)), 0)
	switch err {
	case 0: // success!
		return true, nil
	// the opaquely named ENOENT means Error NO ENTry; aka, "file not found".
	case syscall.ENOENT: // file doesn't exist.
		return false, nil
	default: // some other error.
		return false, err
	}
}

// cstr converts a Go string to a null-terminated C string.
// this allocates memory.
func cstr(s string) unsafe.Pointer {
	b := make([]byte, len(s)+1)
	copy(b, s) // copy the string into the buffer. the leftover byte will be the null terminator.
	return unsafe.Pointer(&b[0])
}
```

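9.3. `syscallshell`: exercises

Modify syscallshell to handle SIGINT (ctrl+c). If a child process is running, SIGINT should go to the child rather than the shell. If no child is running, the shell should exit.
Modify syscallshell to expand environment variables of the form $VAR. For example, “$HOME” should be replaced with the value of the HOME environment variable.
Modify syscallshell to support bash-style double quotes and backslash escapes. For example, cat "some file.txt" should print the contents of the file some file.txt, instead of trying to handle "some and file.txt" as two separate files.
Modify syscallshell to support the > redirection and >> append operators, so that standard output can be redirected to a file. For example, echo hello > file.txt should write hello to file.txt, and echo world >> file.txt should append world to file.txt.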

Before we wrap up, let's run an interactive session with our simple shell.

```bash
#!/usr/bin/env bash
# START PROGRAM
go run syscallshell.go
```

```
# INTERACTIVE SESSION
> echo hello, world!
hello, world!
> echo goodbye, world!
goodbye, world!
> thisisnotaprogram
"thisisnotaprogram": not found in PATH
```

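We've now built a (mostly) complete user environment from (mostly) scratch. Pretty neat. If you need a tool to solve a problem, you can write a program that solves it.

10. Conclusion

Today we raced through the fundamentals of operating systems and I/O. Program startup, signals, permissions, system calls, and I/O could each fill an article of their own, but what's here should be enough to get your bearings.

That's it for today. The next two articles are still taking shape. The plan is to go deeper into low-level programming and explore how programs actually run and what resources they use. We'll look at memory performance, and even a tiny bit of assembly.

On a personal note, this is the longest and most complex article I've written so far. It's nearly 40 pages on its own, and it brings my articles to over 300 pages / 100,000 words in total. That's a book!

Remember: programmers write programs.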

2. `args`

Let's write a program, printargs, that prints its command-line arguments to standard output.

```go
// printargs.go prints its command-line arguments to standard output.
package main

import (
	"fmt"
	"os"
)

func main() {
	for i, arg := range os.Args {
		fmt.Fprintf(os.Stdout, "%d: %s\n", i, arg)
	}
}
```

Let's run it…

```bash
#!/usr/bin/env bash
# IN
go run ./printargs.go foo bar
# OUT
0: /tmp/go-build389224751/b001/exe/printargs
1: foo
2: bar
```

```bash
#!/usr/bin/env bash
# IN
go build -o printargs ./printargs.go
./printargs foo bar
# OUT
0: ./printargs
1: foo
2: bar
```

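That's more like it. `go run` compiles the program into a temporary directory and runs it from there, which is why argument 0 was a temporary path the first time. As you can see, command-line arguments are _just strings, handed over without any interpretation_. It's up to the program to decide what they mean.

Command-line arguments are strings, but they're usually interpreted as one of two things: **flags** or **positional arguments**.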

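2.1. Flags

Most programs take a mix of flags and positional arguments. Take the standard Unix tool grep:

```bash
#!/usr/bin/env bash
# IN
grep -r --color=always "foo" /tmp # search /tmp recursively for "foo", with colored output
```

Here -r and --color=always are flags; "foo" and /tmp are positional arguments.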
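For comparison, Go's standard library flag package follows the same conventions we're about to implement by hand. -name and --name are equivalent, parsing stops at the first non-flag argument, and -- forces everything after it to be positional. Here is a sketch with a grep-like flag set. The flag names r and color mirror the grep example above and are purely illustrative, as is the helper name parseGrepLike.

```go
package main

import (
	"flag"
	"fmt"
)

// parseGrepLike parses a grep-style command line with the standard flag package
// and returns the flag values plus the leftover positional arguments.
func parseGrepLike(args []string) (recursive bool, color string, positional []string, err error) {
	fs := flag.NewFlagSet("grep", flag.ContinueOnError)
	r := fs.Bool("r", false, "search directories recursively")
	c := fs.String("color", "never", "when to color the output")
	if err = fs.Parse(args); err != nil {
		return false, "", nil, err
	}
	return *r, *c, fs.Args(), nil
}

func main() {
	r, color, pos, err := parseGrepLike([]string{"-r", "--color=always", "foo", "/tmp"})
	fmt.Println(r, color, pos, err)
}
```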

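2.2. `parseflags.go`

2.2.1. Overview

1. Consume command-line arguments as flags until we hit a non-flag argument, `--`, a duplicate flag, or the end of the arguments.
   - `--` forces all remaining arguments to be positional.
   - A duplicate flag immediately ends flag parsing with an error.
   - The first non-flag argument ends flag parsing; everything after it is positional.
   - `-short` and `--long` flags are treated the same.
2. Print the flags and positional arguments to standard output.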

```go
// parseflags parses command-line flags and returns a map of flag names to values and a slice of positional arguments.
// the first non-flag argument terminates flag parsing; pass '--' to force all remaining arguments to be positional.
// args should NOT include the command name.
// Flags are of the form -name=value, --name=value, -name value, or --name value: we don't
// treat short or long flags differently.
// It is an error to set a flag more than once.
func parseflags(args []string) (flags map[string]string, positional []string, err error) {
	flags = make(map[string]string)

FLAGS:
	for len(args) > 0 {
		s := args[0]
		if len(s) <= 1 { // can't possibly be a flag
			break FLAGS
		}
		if s == "--" { // end of flags
			args = args[1:] // consume "--"
			break FLAGS
		}
		if s[0] != '-' { // not a flag
			break FLAGS
		}
		// strip off up to two leading '-'s
		if s[1] == '-' {
			s = s[2:]
		} else {
			s = s[1:]
		}

		// it's now a potential flag. is it of the form -name=value?
		// look for '='
		for i := range s {
			if s[i] == '=' {
				key, value := s[:i], s[i+1:]
				if _, ok := flags[key]; ok {
					return nil, nil, fmt.Errorf("flag -%s already set", key)
				}
				flags[key] = value
				args = args[1:] // we've consumed one arg
				continue FLAGS
			}
		}

		// it's not of the form -name=value. the next arg is the value.
		// is there a next arg?

		if len(args) == 1 { // no. error.
			return nil, nil, fmt.Errorf("flag -%s missing value", s)
		}

		key, value := s, args[1] // the next arg is the value
		if _, ok := flags[key]; ok {
			return nil, nil, fmt.Errorf("flag -%s already set", key)
		}
		flags[key] = value
		args = args[2:] // we've consumed two args

	}
	// what's left is positional arguments.
	return flags, args, nil
}
```

Let's run it.

```bash
#!/usr/bin/env bash
# IN
go run . -name efron --animal tapir -foo bar positional --notflag efron
# OUT
flag name=efron
flag animal=tapir
flag foo=bar
positional 0=positional
positional 1=--notflag
positional 2=efron
```

Let's also test the -- pseudo-flag, which forces all remaining arguments to be positional.

```bash
#!/usr/bin/env bash
# IN
go run . -name efron -- --animal tapir
# OUT
flag name=efron
positional 0=--animal
positional 1=tapir
```

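If it's not a flag, it's a positional argument.

Next up: environment variables. Let's write printenv, which prints the value of each environment variable named on its command line. Overview:

Get the list of environment variables, in the form KEY=VALUE, from the Go runtime.
Look up the value of each key provided by the user and print it to standard output.
If every variable was found, exit with status 0.
Otherwise, print every missing variable to standard error and exit with status 1.

Python version: printenv.py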

```go
// printenv.go prints the value of each environment variable given as an argument.
// it exits with status 1 if any of the variables are not found.
package main

import (
	"fmt"
	"os"
)

func main() {
	// 1. get the list of environment variables in form KEY=VALUE from the go runtime.
	env := os.Environ()
	var printed int
	keys := os.Args[1:]
	// 2. look up the value of each key provided by the user and print it to standard output.
	for _, key := range keys {
		val, ok := lookupenv(env, key)
		if ok {
			fmt.Fprintf(os.Stdout, "%s\t%s\n", key, val)
			printed++
		}
	}
	// 3. exit with status 0 if all variables were found.
	if printed == len(keys) {
		os.Exit(0)
	}
	// 4. print all missing variables to standard error and exit with status 1.

	fmt.Fprintf(os.Stderr, "missing %d/%d environment variables\n", len(keys)-printed, len(keys))
	for _, key := range keys {
		if _, ok := lookupenv(env, key); !ok {
			fmt.Fprintf(os.Stderr, "%s\n", key)
		}
	}
	os.Exit(1) // 4. exit with status 1.
}

// look up an environment variable by name, returning its value and whether it was found.
// in case of duplicates, return the last value.
// environment variables are stored as "key=value" strings.
func lookupenv(env []string, key string) (string, bool) { // LIBRARY FUNCTION: first defined in printenv.go
	// you may have duplicated environment variables - the operating system doesn't care.
	// it's up to the receiving program to decide what to do with them. usually the last one wins; we'll do that here.
	for i := len(env) - 1; i >= 0; i-- {
		e := env[i]
		if len(e) < len(key)+1 { // +1 for the '='
			continue
		}
		if e[:len(key)] != key {
			continue
		}
		if e[len(key)] != '=' {
			continue
		}
		return e[len(key)+1:], true
	}
	return "", false
}
```

Let's test printenv with a few common environment variables.

```bash
#!/usr/bin/env bash
# IN
go run ./printenv.go USER HOME SHELL
```

```
# OUT
USER	efron
HOME	/home/efron
SHELL	/bin/bash
```

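TLDR: setting environment variables in bash

To set an environment variable for just one run of a program, put ENVVAR=VALUE in front of the command. For example, USER=efron go run ./printenv USER prints efron.

To set an environment variable for the rest of the shell session, use export ENVVAR=VALUE.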

3.1.2. Example: setting an environment variable for the rest of the shell session

```bash
#!/usr/bin/env bash
# IN
export ANIMAL=WOOLY_TAPIR
go run ./printenv ANIMAL
# OUT
ANIMAL	WOOLY_TAPIR
```

3.1.3. Example: setting an environment variable for a single run

```bash
#!/usr/bin/env bash
export ANIMAL=BAIRDS_TAPIR # set for the rest of the shell session
ANIMAL=MALAYAN_TAPIR go run ./printenv ANIMAL # this run only; overrides the session variable
```
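The `VAR=value command` form works because the shell builds a separate environment list for that one child before starting it. Go's os/exec package lets you do the same thing explicitly through Cmd.Env. According to the os/exec documentation, when Env contains duplicate keys the last value wins. Here is a sketch that assumes a POSIX /bin/sh is available. runWithEnv is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// runWithEnv runs a small sh script with extra KEY=VALUE entries appended to
// our own environment, like `KEY=VALUE command` in bash, and returns its stdout.
func runWithEnv(script string, extra ...string) (string, error) {
	cmd := exec.Command("/bin/sh", "-c", script)
	// for duplicate keys exec.Cmd uses the last value, so our extra entries win.
	cmd.Env = append(os.Environ(), extra...)
	out, err := cmd.Output()
	return string(out), err
}

func main() {
	out, err := runWithEnv(`printf '%s' "$ANIMAL"`, "ANIMAL=MALAYAN_TAPIR")
	fmt.Println(out, err)
}
```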

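Sidenote: HOME and ~

Some programs use a tilde (~) to mean your home directory. That is a convenience individual programs provide, not a fundamental feature of the operating system.

Most shells, bash included, expand ~ to the value of the HOME environment variable.

When you write other programs, don't assume ~ will work. Look up the value of HOME yourself.

```bash
#!/usr/bin/env bash
# IN:
echo ~
printenv HOME

# OUT:
/home/efron
/home/efron
```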
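Here is roughly what that expansion looks like if your own program wants to honor a leading ~. This is a minimal sketch of a hypothetical helper, expandHome. It handles only a bare ~ or a ~/ prefix, not bash's ~user form, and it takes HOME as a parameter so it's easy to test.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// expandHome replaces a leading "~" or "~/" in path with home.
// Other paths, including "~user/..." forms, are returned unchanged.
func expandHome(path, home string) string {
	if path == "~" {
		return home
	}
	if strings.HasPrefix(path, "~/") {
		return home + path[1:] // keep the "/" that followed the tilde
	}
	return path
}

func main() {
	// look HOME up explicitly instead of assuming a shell already expanded ~.
	fmt.Println(expandHome("~/notes.txt", os.Getenv("HOME")))
}
```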

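printenv: exercises

Modify printenv to print every environment variable when no arguments are given. How should duplicates be handled?

Modify printenv to print the environment variables in sorted order. How should duplicates be handled? What about upper and lower case?

Sidenote: namespacing environment variables

Nothing stops two programs from using the same environment variable name, so avoiding collisions is up to us. Stay away from short or common names like PATH, HOME, or NAME. Instead, add a short prefix that's unlikely to collide. For example, if your company is Tapir Technology and you're writing a program called “monitor”, you might use TT_MONITOR_LOG_LEVEL instead of LOG_LEVEL.

OK: environment variables are just a simple key-value mapping… but how does a program get them in the first place?

Q: How do programs get these values? A: The operating system provides them when the program starts, via the execve system call.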

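4. System calls

TLDR: system call library functions

A system call library function does the following:

1. Save registers for later.
2. Put the arguments into the registers the system call expects.
3. Switch to kernel mode with the SYSCALL instruction (on x86-64).
4. … the operating system does its work … <– this is the actual system call
5. Return to user mode.
6. Restore registers.
7. Return the system call's result.
8. Handle errors.

TLDR: a system call library function turns a system call into an ordinary function call.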

4.1. System call basics with Go's `syscall.Syscall` and `syscall.Syscall6`

r1, r2, errno := syscall.Syscall(syscallno, arg1, arg2, arg3) is for ordinary system calls that take three or fewer arguments.
r1, r2, errno := syscall.Syscall6(syscallno, arg1, arg2, arg3, arg4, arg5, arg6) is for system calls that take four to six arguments.

The filesystem operations write and read are among the most common system calls.

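TLDR: file descriptors

System calls don't operate on file “objects”. Objects don't really exist; the machine only knows about registers and memory. Instead, system calls refer to an open file by an integer called a “file descriptor” (usually fd). The operating system keeps a table of open files for each process, and a file descriptor is an index into that table. A file descriptor doesn't necessarily correspond to where the file lives in the filesystem. When a program starts, the operating system automatically opens three file descriptors for it:

STDIN (“standard input”) is fd 0.

STDOUT (“standard output”) is fd 1.

STDERR (“standard error”) is fd 2.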

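As a warm-up, let's wrap the write system call in a Go function.

4.1.1. Wrapping the `write` system call

```go
// gowrite writes the contents of buf to the file descriptor fd. LIBRARY
// (uses the syscall and unsafe packages)
func gowrite(fd int, buf []byte) (int, error) {
	if len(buf) == 0 {
		return 0, nil // nothing to do; &buf[0] would panic on an empty slice
	}
	n, _, errno := syscall.Syscall(
		syscall.SYS_WRITE,                // which syscall?
		uintptr(fd),                      // which file descriptor to write to
		uintptr(unsafe.Pointer(&buf[0])), // where to write from
		uintptr(len(buf)),                // how many bytes to write
	)
	if errno != 0 {
		return 0, errno // errno implements the error interface
	}
	// a zero errno means success. return nil explicitly: syscall.Errno(0)
	// stored in an error interface is NOT equal to nil.
	return int(n), nil
}
```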

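TLDR: pointers, unsafe.Pointer, and uintptr

Inside the machine there's no such thing as a “pointer”, only registers and memory. A pointer is just a number that “points to” a particular memory address. To tell the operating system where to read or write data, all we can do is hand it a number that represents a memory address and trust it to interpret that number correctly.

In Go, we do this by converting the pointer to an unsafe.Pointer and then to a uintptr: an unsigned integer big enough to hold a pointer (**u**nsigned **int**eger the size of a **p**oin**t**e**r**).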
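To see that a pointer really is just a number, convert the addresses of two bytes in the same slice to uintptr and subtract them. In this sketch, addrGap is our own name. The conversion goes only one way, pointer to uintptr. Storing a uintptr and later turning it back into a pointer is unsafe, because the garbage collector does not treat a uintptr as a reference.

```go
package main

import (
	"fmt"
	"unsafe"
)

// addrGap returns the distance in bytes between the addresses of b[i] and b[j].
// Converting a pointer to uintptr just reads the address out as a number.
func addrGap(b []byte, i, j int) uintptr {
	pi := uintptr(unsafe.Pointer(&b[i]))
	pj := uintptr(unsafe.Pointer(&b[j]))
	return pj - pi
}

func main() {
	b := []byte("hello")
	fmt.Printf("&b[0] = %#x\n", uintptr(unsafe.Pointer(&b[0])))
	fmt.Println("bytes between b[0] and b[4]:", addrGap(b, 0, 4))
}
```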

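As last time, let's write a program that prints “hello, world!” to standard output… but this time, using system calls instead of Go's fmt.

4.2. syscallhelloworld.go

4.2.1. Overview

Get a pointer to the sequence of bytes that makes up the string "hello, world!\n".
Use the write system call to write those bytes to standard output.
Exit the program with an exit system call, with a status that reflects whether the write succeeded.

Python version: hello-world-syscall.py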

```go
// syscallhelloworld.go writes "hello, world!" to standard output using the write syscall.
package main

import (
	"fmt"
	"syscall"
	"unsafe"
)

const STDOUT = 1 // standard output is always file descriptor 1

func main() {
	buf := []byte("hello, world!\n")
	n, _, errno := syscall.Syscall(
		syscall.SYS_WRITE,                // which syscall?
		STDOUT,                           // write to standard output
		uintptr(unsafe.Pointer(&buf[0])), // where to write from
		uintptr(len(buf)),                // how many bytes to write
	)
	if errno != 0 {
		fatalf("write: %v\n", errno)
	}
	if int(n) != len(buf) {
		fatalf("write: wrote %d bytes, expected %d\n", n, len(buf))
	}
	// success: exit with status 0.
	// SYS_EXIT_GROUP ends every thread in the process; plain SYS_EXIT would end
	// only the calling thread, and the Go runtime typically runs several.
	syscall.Syscall(syscall.SYS_EXIT_GROUP, 0, 0, 0)
}

// fatalf writes a formatted string to standard error and exits with status 1. LIBRARY
func fatalf(format string, args ...interface{}) {
	buf := []byte(fmt.Sprintf(format, args...))
	const stderr = 2 // standard error is always file descriptor 2
	syscall.Syscall(syscall.SYS_WRITE, stderr, uintptr(unsafe.Pointer(&buf[0])), uintptr(len(buf))) // no point in checking the error here; we're about to exit.
	syscall.Syscall(syscall.SYS_EXIT_GROUP, 1, 0, 0)                                                // exit with status 1
}
```

Let's run it…

```bash
#!/usr/bin/env bash
# IN
go run ./syscallhelloworld.go

# OUT
hello, world!
```

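It works! This is what happens under the hood every time you read or write a file.

Let's dig into files a bit more. Last time, we built cat, which concatenates files. Let's do the same thing with system calls.

Let's write syscallcat, a program that reads from a file and writes to standard output using system calls.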

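4.3. `syscallcat.go`

4.3.1. `syscallcat`: overview

Open the file specified by the first argument with SYS_OPEN.
Read it into memory in chunks with SYS_READ.
Write those chunks to standard output with SYS_WRITE.
Flush the output buffer to disk with SYS_FSYNC (more on this when we get there).
Close the file with SYS_CLOSE.
Exit with SYS_EXIT_GROUP.

The table below summarizes the system calls syscallcat.go uses.

4.3.2. `syscallcat`: system calls used

Name	Number (x86-64 Linux)	Arguments	Description
close	3	fd	close file descriptor `fd`
exit_group	231	status	end every thread of the process with status code `status` (plain `exit`, number 60, ends only the calling thread)
fsync	74	fd	flush file descriptor `fd` to disk
open	2	path, flags, mode	open the file at `path`, with the behavior given by `flags` and the permissions given by `mode`
read	0	fd, buf, count	read `count` bytes from file descriptor `fd` into `buf`
write	1	fd, buf, count	write `count` bytes starting at `buf` to file descriptor `fd`

Python version: syscallcat.py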

```go
// syscallcat.go opens the file specified by the first argument and writes its contents to standard output using raw syscalls.
package main

import (
	"fmt"
	"os"
	"syscall"
	"unsafe"
)

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintf(os.Stderr, "usage: %s <file>\n", os.Args[0])
		os.Exit(1)
	}

	path := []byte(os.Args[1])      // convert the string to a byte slice so we can point to it
	path = append(path, 0)          // null-terminate the string
	ptr := unsafe.Pointer(&path[0]) // point to the first byte of the array
	const FLAGS = syscall.O_RDONLY  // open the file for reading only
	const MODE = 0                  // permission bits only matter when creating a file; we aren't

	// 1. Open the file specified by the first argument w/ `SYS_OPEN`.
	// open(path, flags, mode): the flags come second, the mode third.
	fileDescriptor, _, err := syscall.Syscall(
		syscall.SYS_OPEN,
		uintptr(ptr),
		FLAGS,
		MODE,
	)
	if err != 0 {
		fatalf("open: %v\n", err)
	}

	// 2. Read chunks into memory with `SYS_READ`.
	// we've now opened the file. let's read from it and copy the data to standard output.
	var buf [1024]byte // 1KB to read into

	// standard output is just another file descriptor: it's automatically opened for us when the program starts. it's always file descriptor 1.
	const FD_STDOUT = 1

READ:
	for {
		n, _, readErr := syscall.Syscall(
			syscall.SYS_READ,                 // which syscall?
			fileDescriptor,                   // tell it to read from the file we opened
			uintptr(unsafe.Pointer(&buf[0])), // where to write the data?
			uintptr(len(buf)),                // how many bytes to read?
		)
		// a raw read either returns a byte count or fails with errno set;
		// on failure n is not a valid count, so check the error before using n.
		if readErr != 0 {
			fatalf("read: %v\n", readErr)
		}
		if n == 0 { // end of file: we've read all the data; exit the loop
			break READ
		}

		// 3. Write those chunks to standard output with `SYS_WRITE`.
		// writes are not guaranteed to write all the data you ask for in one go. among other things,
		// signals like SIGPIPE or SIGINT can interrupt them (more on that later).
		// we need to keep writing until we've written all the data we read.
		// functions like io.Copy usually do this for you.
		for offset := uintptr(0); offset < n; {
			m, _, writeErr := syscall.Syscall(
				syscall.SYS_WRITE,
				FD_STDOUT,
				uintptr(unsafe.Pointer(&buf[offset])), // first byte we haven't written yet
				n-offset,                              // only the bytes that are left
			)
			if writeErr != 0 {
				fatalf("write: %v\n", writeErr)
			}
			offset += m
		}
	}

	// 4. Flush with `SYS_FSYNC`.
	// we've now written all the data we read from the file to standard output... or have we?
	// it's usually pretty inefficient to do lots of small writes to permanent storage, so operating systems maintain a buffer of data to write to disk when it's convenient.
	// we can force the operating system to write that buffer to disk with the fsync syscall.
	// fsync(fd) writes the buffer for file descriptor fd to disk, blocking until it's done.
	// the similarly-named sync() does this for _all_ open files; it's usually better to be specific.
	_, _, _ = syscall.Syscall(syscall.SYS_FSYNC, FD_STDOUT, 0, 0) // no error checking here. we're about to exit anyway.

	// 5. Close the file with `SYS_CLOSE`.
	syscall.Syscall(syscall.SYS_CLOSE, fileDescriptor, 0, 0)

	// 6. Exit with `SYS_EXIT_GROUP` (plain SYS_EXIT would end only this thread, not the whole process).
	syscall.Syscall(syscall.SYS_EXIT_GROUP, 0, 0, 0)
}
```

Let's run it…

#!/usr/bin/env bash
# IN
echo "hello, world!" > hello.txt
syscallcat hello.txt
# OUT
hello, world!

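4.3.3. `syscallcat` exercises

Modify syscallcat to read from standard input when no file is given. Hint: standard input is file descriptor 0.

Modify syscallcat to read several files and concatenate them onto standard output, implementing the real cat tool.

How does one program start another? With a system call, of course: execve starts running a new program (given the appropriate permissions).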

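5. Ownership and access control

This is a complicated topic, and I'm going to skip a lot in order to give you a quick, rough feel for it. Don't take it as gospel.

The original, and still the most common, form of access control is **file permissions**.

Sidenote: the lineage of modern operating systems

Modern operating systems descend from one of two lineages.

Unix (Bell Labs, 1969): BSD descends from it directly, and macOS inherits much of its design through BSD. Linux is a from-scratch reimplementation of Unix rather than a direct descendant, but it follows the same design, including file permissions.

Windows NT (1993): the origin of modern Windows.

Let's look at a few example permissions, written both symbolically and in octal.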

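Symbolic	Octal	Description
`rwxrwxrwx`	`0777`	anyone can read, write, and execute
`---------`	`0000`	nobody can do anything
`rwx------`	`0700`	only the owner can read, write, and execute; nobody else can do anything
`rwxr-xr--`	`0754`	owner can read/write/execute; group can read/execute; everyone else can only read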
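Each octal digit packs three bits: read (4), write (2), and execute (1). The three digits cover the owner, the group, and everyone else, in that order. So 0754 is 7 = 4+2+1 (rwx), 5 = 4+1 (r-x), and 4 (r--).

The sketch below converts the numeric form into the symbolic one. `symbolic` is our own illustrative helper. It only formats a number; reading a real file's mode is what the printperms exercise is for.

```go
package main

import "fmt"

// symbolic renders the low 9 permission bits of a Unix mode (e.g. 0754)
// in the rwxrwxrwx form used by `ls -l`: owner, group, then everyone else.
// (illustrative helper, not a library function.)
func symbolic(mode uint32) string {
	const letters = "rwx"
	out := make([]byte, 9)
	for i := 0; i < 9; i++ {
		// i = 0 checks bit 8 (owner read); i = 8 checks bit 0 (other execute).
		bit := uint32(1) << (8 - i)
		if mode&bit != 0 {
			out[i] = letters[i%3]
		} else {
			out[i] = '-'
		}
	}
	return string(out)
}

func main() {
	for _, mode := range []uint32{0777, 0000, 0700, 0754} {
		fmt.Printf("%04o %s\n", mode, symbolic(mode))
	}
}
```

Run as-is, it prints the four rows of the table above, octal first.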

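5.0.1. Exercise: file permissions

Write a program called printperms that prints a file's permissions in symbolic form.
Write a program called chmod that sets a file's permissions to the ones given on the command line. It should accept both octal and symbolic permissions.
Modify chmod so it can _modify_ permissions for the user, group, and other categories with the + and - operators. That is, chmod o+r file should add read permission for other, and chmod g-w file should remove write permission from group.
Modify chmod so that a (all) sets permissions for every category at once. That is, chmod a-r file should remove read permission from all categories.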

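5.1. Users

Unix keeps a list of users, each with a unique numeric ID called a UID (User ID) and a human-readable name. Every file and every process has a user who "owns" it. The getuid system call returns the (real) UID of the calling process. User names are a convenience for humans; the operating system itself works with UIDs. The id command prints the current user's UID and GID.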

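5.1.1. Exercise: users

Write a program called getusername that prints the user name for a given UID. If no UID is given, it should print the current user's user name. Get the current user's UID with the getuid system call.
- Hint: the /etc/passwd file holds the list of users and their UIDs. Use the id command to find your own user name and UID, then use those to work out how the file is laid out.
- Hint: you don't need to parse the _whole_ file.

Every user belongs to at least one group (their "primary" group), and may belong to more. We'll cover groups in the next section.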

5.2. Groups

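5.2.1. Exercise: groups

Write a program called getgroupname that prints the group name for a given GID. If no GID is given, it should print the current user's group name. Get the current user's GID with the getgid system call.
- Hint: the /etc/group file holds the list of groups and their GIDs.
Add a -e flag to getgroupname that prints the name of the current process's effective GID (from getegid) instead of its real GID.
Experiment with chmod and chown to see how getgroupname behaves. For example, chown the getgroupname binary to another group, set its setgid bit with chmod g+s, then compare the output with and without -e. You may need to create a new user or group to see an effect.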

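6. Starting programs with `execve`

6.0.1. Recap: C arrays and `**byte`

Like C strings, the arrays execve expects (the argument list and the environment) are just pointers to memory that end in a null (0) entry. In Go terms, if a string is a *byte, an _array_ of strings is a **byte: a pointer to a run of *byte pointers, the last of which is nil.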

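6.1. syscallexec.go

6.1.1. Overview

Check that the user has provided an argument and that it's an absolute path.
Convert the Go-style command-line arguments and environment ([]string) into C-style null-terminated arrays of null-terminated strings (**byte).
Call execve via the SYS_EXECVE system call to start the new program.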

// syscallexec.go runs another program, specified by absolute path, using the execve system call.
// its first command-line argument is the path to the program to run; that argument and the rest
// become the new program's command-line arguments (so the new program sees its own path as its name).
// it should have exactly the same effect as just running the program directly.
package main

import (
	"fmt"
	"os"
	"syscall"
	"unsafe"
)

func main() {
	//1. Check that the user has provided an argument and that it's an absolute path.
	if len(os.Args) < 2 {
		fatalf("usage: %s <command> [args...]\n", os.Args[0])
	}
	goargs := os.Args[1:] // the first item is the command, the rest are arguments

	// we'll cover the process PATH later. for now, let's protect our users from themselves.
	if len(os.Args[1]) == 0 || os.Args[1][0] != '/' {
		fatalf("error: %s is not an absolute path\n", os.Args[1])
	}
	err := exec(goargs, os.Environ()) // we'll talk about the environment in just a bit.
	// exec only returns if execve failed, so reaching this line always means an error.
	fatalf("execve: %v\n", err)
}


// execute a program, replacing the current process. on success, this never returns,
// so err is always a non-zero syscall.Errno.
func exec(args, env []string) error { // LIBRARY
	// we need to convert the command and arguments to a slice of pointers to null-terminated strings to pass to execve.
	// we need a null-terminated array of pointers to null-terminated strings.

	cargs := make([]unsafe.Pointer, len(args)+1) // +1 for null terminator
	for i := range args {
		cargs[i] = cstr(args[i])
	}

	cenv := make([]unsafe.Pointer, len(env)+1) // +1 for null terminator
	for i := range env {
		cenv[i] = cstr(env[i])
	}

	path := cstr(args[0])

	// 3. Call `execve` via the `SYS_EXECVE` syscall to start the new program.
	_, _, err := syscall.Syscall(
		syscall.SYS_EXECVE,
		uintptr(path),                      // path to the program to run as null-terminated string
		uintptr(unsafe.Pointer(&cargs[0])), // pointer to pointer to byte: the argument array.
		uintptr(unsafe.Pointer(&cenv[0])),  // pointer to pointer to byte: the environment array.
	)
	return err
}

// 2. Convert the go-style command-line arguments and environment (`[]string`) to the C-style null-terminated arrays of null-terminated strings (`**byte`).
// cstr converts a Go string to a null-terminated C string.
// this allocates memory.
func cstr(s string) unsafe.Pointer { // LIBRARY
	b := make([]byte, len(s)+1)
	copy(b, s) // copy the string into the buffer. the leftover byte will be the null terminator.
	return unsafe.Pointer(&b[0])
}

// fatalf prints a formatted message to standard error and exits with status 1.
// LIBRARY: assumed to behave like the fatalf used earlier in the article; repeated here so this file compiles on its own.
func fatalf(format string, args ...any) {
	fmt.Fprintf(os.Stderr, format, args...)
	os.Exit(1)
}

Let's run echo (/bin/echo, to be precise) with a few arguments.

#!/usr/bin/env bash
# write the program to a file at the absolute path /bin/syscallexec
go build -o /bin/syscallexec ./syscallexec.go

# use it to run /bin/echo
/bin/syscallexec /bin/echo hello, world!

# call our program recursively
/bin/syscallexec /bin/syscallexec /bin/echo hello, hello, world!

# OUT
hello, world!
hello, hello, world!

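It works, but it isn't very useful yet. A few of its limitations:

Because execve _replaces_ the current process, we can run exactly one program, exactly once. We can get around that by forking the process first; more on that soon.
We have no control over the program we start. Signals and pipes would give us that; more on those soon, too.
The new program inherits the parent's environment unmodified.
We have to know the absolute path of the program to run. We can fix that by looking the program up in the PATH environment variable; this is called **command resolution**.

Exercise: add a -e flag to syscallexec that sets the child process's environment. It should be able to replace several environment variables.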

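6.1.2. Recap: the `PATH` environment variable and command resolution

Commands are resolved by searching the PATH environment variable: a colon-separated (:) list of directories in which to look for programs. The first match wins.

If PATH=/bin:/usr/bin:/usr/local/bin, it contains these directories:

/bin

/usr/bin

/usr/local/bin

When you type syscallexec into the shell, the shell searches these directories in order and stops at the first match. Here, /bin/syscallexec exists, so that's what it runs. We say that syscallexec **resolves** to /bin/syscallexec.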

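6.1.3. Example: command resolution in the `bash` shell

#!/usr/bin/env bash
# IN
# note: no /bin/ prefix on syscallexec itself; bash finds it via PATH.
# (syscallexec still needs an absolute path for the program it runs.)
syscallexec /bin/echo hello, world!

# OUT
hello, world!

In the next section we'll **write** a program that resolves commands ourselves, the way a shell does: like `which`, it works out which program a command name actually refers to.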
 
 
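7. Resolving commands with `whiche.go`

`whiche` ("witch-e") finds the absolute path of a command using the `PATH` environment variable, the way a shell does.

7.1. Overview

1) Get the environment from the Go runtime.

2) Resolve the `PATH` environment variable into a list of directories with string manipulation.

3) Check whether the file exists in each directory with the `SYS_STAT` system call; print the first match to standard output and exit 0.

4) If there's no match, print a message to standard error and exit 1.

Python version: [whiche.py](https://gitlab.com/efronlicht/blog/-/blob/64b5bd1c71896796fd486c24d9e36aec688522ff/articles/startingsystems/cmd/pythonports/whiche.py)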
 
```go
// whiche.go ("witch-e") finds the absolute path to a command using the PATH environment variable like a shell would.
package main

import (
	"fmt"
	"os"
	"syscall"
	"unsafe"
)

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintf(os.Stderr, "usage: %s <command>\n", os.Args[0])
		os.Exit(1)
	}
	// 1. get the environment from the operating system.
	env := os.Environ()
	// 2. resolve the PATH environment variable to a list of directories via string manipulation.
	var path []string
	{
		// this whole block is just doing `strings.Split(rawpath, ":")`, minus the empty entries.
		// it's good to practice ordinary string manipulation every now and then.
		var start int
		rawpath := getenv(env, "PATH")
		for i := range rawpath {
			if rawpath[i] != ':' {
				continue
			}
			if start < i { // skip empty strings (e.g. from "::" or a leading ':').
				path = append(path, rawpath[start:i])
			}
			start = i + 1 // the next directory starts after this ':'
		}
		if start < len(rawpath) { // the last directory has no ':' after it.
			path = append(path, rawpath[start:])
		}
	}

	// 3. find the first match in the PATH directories; print it to standard output and exit 0.
	for _, dir := range path {
		if ok, err := exists(dir + "/" + os.Args[1]); err == nil && ok {
			// found it. print & exit.
			fmt.Println(dir + "/" + os.Args[1])
			os.Exit(0)
		}
	}
	// 4. if no match is found, print an error message to standard error and exit 1.

	fmt.Fprintf(os.Stderr, "%s: command not found\n", os.Args[1])
	os.Exit(1)
}

// getenv retrieves the value of the environment variable named by the key, or an empty string if it's not set.
func getenv(environ []string, key string) string {
	key += "=" // environment variables are stored as "key=value"
	n := len(key)
	for i := range environ {
		if len(environ[i]) < n {
			continue
		}
		// KEY=VALUE
		if environ[i][:n] == key { // KEY=
			return environ[i][n:] // VALUE
		}

	}
	return ""
}

// check if a file exists using stat (https://linux.die.net/man/2/stat)
// (true, nil) if it does.
// (false, nil) if it doesn't.
// (false, error) if there was an error.
func exists(path string) (bool, error) {
	p := cstr(path)
	var statbuf [144]byte // sizeof(struct stat) on linux/amd64; we'll worry about what's inside later.
	// the STAT system call fills in a stat structure with information about the file,
	// returning 0 on success and -1 on error.
	// errno will be set to the error code if it fails.
	_, _, err := syscall.Syscall(syscall.SYS_STAT, uintptr(p), uintptr(unsafe.Pointer(&statbuf)), 0)
	switch err {
	case 0: // success!
		return true, nil
	// the opaquely named ENOENT means Error NO ENTry; aka, "file not found".
	case syscall.ENOENT: // file doesn't exist.
		return false, nil
	default: // some other error.
		return false, err
	}
}

// cstr converts a Go string to a null-terminated C string.
// this allocates memory.
func cstr(s string) unsafe.Pointer {
	b := make([]byte, len(s)+1)
	copy(b, s) // copy the string into the buffer. the leftover byte will be the null terminator.
	return unsafe.Pointer(&b[0])
}
```

Let's run it…

#!/usr/bin/env bash
# IN
go run ./whiche.go echo

# OUT
/usr/bin/echo

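whiche: exercises

Add a -a flag to whiche that prints every match, one per line, in the order the PATH environment variable resolves them.

Combine whiche and syscallexec into a program called run that starts a program given only its name.

Finally, let's cover signals and then put everything together.

8. Signals

One signal you'll run into again and again is SIGSEGV ("segmentation fault"), which the operating system sends when a program tries to access memory it isn't allowed to, usually by dereferencing a null pointer.

To see it in action, let's write a program that crashes with a segfault.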

8.1. segfault.go

// https://go.dev/play/p/0l8t2y_aQ92
package main

func main() {
	var nullptr *int
	_ = *nullptr
}

#!/usr/bin/env bash
# IN
go run ./segfault.go

# OUT
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x468f22]

goroutine 1 [running]:
main.main()
/home/efron/scratch/segfault.go:5 +0x2

8.1.1. Recap: common signals

The table below summarizes the signals you'll run into most often.

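SIGNAL	Description	Example	Notes
SIGINT	Keyboard interrupt; usually means "start shutting down when convenient"	`ctrl+c` on a stuck command-line program
SIGPOLL	I/O is ready to read	waiting for network data
SIGTERM	Start shutting down now	`kill`	can be caught or ignored
SIGKILL	Terminate immediately	`kill -9`	cannot be caught or ignored
SIGSEGV	Memory access violation	see the example above	crashes the program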

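Let's write a program that sends a signal to another program.

8.2. sendsignal.go

8.2.1. Overview

Parse a PID and a signal number (both integers) from the command line.
Send that signal to the process with that PID using the kill system call.

Python version: sendsignal.py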

package main

import (
	"fmt"
	"os"
	"strconv"
	"syscall"
)

// sendsignal.go sends a signal to a process by PID.
// usage: sendsignal <pid> <signal>
func main() {
	if len(os.Args) != 3 {
		fatal(fmt.Errorf("usage: %s <pid> <signal>", os.Args[0]))
	}
	pid, err := strconv.Atoi(os.Args[1])
	if err != nil {
		fatal(err)
	}
	signal, err := strconv.Atoi(os.Args[2])
	if err != nil {
		fatal(err)
	}
	_, _, errno := syscall.Syscall(syscall.SYS_KILL, uintptr(pid), uintptr(signal), 0)
	if errno != 0 {
		fatal(errno)
	}
}

func fatal(err error) {
	fmt.Fprintln(os.Stderr, err)
	os.Exit(1)
}

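To test it, we need:

a process that runs long enough for us to send it a signal
that process's PID

8.3. Recap: PIDs

A PID (Process IDentifier) uniquely identifies a process on the system. Each process can get its own PID with the getpid system call, and its parent's PID (the process that started it) with the getppid system call. All processes form a tree whose root is the init process (PID 1).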

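Let's write a program that prints its own PID and waits for a signal. We'll call it killme.

8.4. killme.go

8.4.1. Overview

Get the PID of the current process with the getpid system call.
Print the PID to standard output once.
Print 'still alive' to standard error every 15 seconds.

Program

Python version: killme.py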

// killme prints its PID, then prints 'still alive' every 15 seconds until something kills it.
package main

import (
	"fmt"
	"os"
	"syscall"
	"time"
)

func main() {
	// 1. use the `getpid` system call to get the PID of the current process.
	pid, _, _ := syscall.Syscall(syscall.SYS_GETPID, 0, 0, 0)
	// 2. print the PID to standard output once.
	fmt.Println(pid)
	// 3. print 'still alive' to standard error every 15s.
	for ; ; time.Sleep(15 * time.Second) {
		fmt.Fprintln(os.Stderr, "still alive")
	}
}

#!/usr/bin/env bash
# IN
go run ./killme.go &

# OUT
8656 # this number will differ on your system
still alive

#!/usr/bin/env bash
# IN
PID=8656 # the value printed above
SIGNAL=9 # SIGKILL
go run ./sendsignal.go $PID $SIGNAL

# OUT
signal: killed # printed by the background `go run ./killme.go` once its program is killed

The killme process should no longer print 'still alive'. If we run sendsignal again, we get an error:

#!/usr/bin/env bash
# IN
go run ./sendsignal.go 8656 9

# OUT
no such process

8.5. killmeslowly.go

There's no Python counterpart for this program.

8.5.1. Overview

Print the PID to standard output.
Register a signal handler for SIGINT.
Count down from 5 to 1 on standard error, one step per SIGINT.
On the fifth (last) SIGINT, exit with status code 0.

8.5.2. Program

// killmeslowly runs until it receives 5 SIGINT signals.
package main

import (
	"fmt"
	"os"
	"os/signal"
	"syscall"
)

func main() {
	// 1. Print our `PID` to standard output.

	fmt.Println(os.Getpid())

	// 2. Register a signal handler for `SIGINT`.

	ch := make(chan os.Signal, 5)     // always make a buffer of at least 1 so you don't drop signals
	signal.Notify(ch, syscall.SIGINT) // forward SIGINT to ch instead of letting it kill us

	// 3. Count down from 5 to 1 on each `SIGINT` on stderr.
	fmt.Fprintf(os.Stderr, "waiting for SIGINT\n")
	remaining := 5
	for range ch {
		remaining--
		if remaining == 0 {
			fmt.Fprintf(os.Stderr, "exit\n")
			os.Exit(0)
		}
		fmt.Fprintf(os.Stderr, "got SIGINT: %d more to exit\n", remaining)
	}
}

Let's test it. ctrl+c sends SIGINT to the program running in the foreground, so this time we don't need a background process.

#!/usr/bin/env bash
# IN
go run ./killmeslowly.go

# OUT
43555
waiting for SIGINT
# ctrl+c
got SIGINT: 4 more to exit
# ctrl+c
got SIGINT: 3 more to exit
# ctrl+c
got SIGINT: 2 more to exit
# ctrl+c
got SIGINT: 1 more to exit
# ctrl+c
exit

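At its core, a shell (or command-line interpreter):

reads a line from standard input,
interprets it as a command, and
runs that command.

We now know how to do all of that with Linux system calls. In this last section, let's write a program called syscallshell that does exactly that.

9. Putting it all together with `syscallshell`

syscallshell is a simple shell that runs the subcommands the user types.

9.1. Overview

Read a line at a time from standard input.
Split the line into a command and arguments (don't worry about quotes or escaping for now).
Resolve the command to an absolute path using the PATH environment variable.
Fork a new process:

1.   In the child, start the new program with `execve`.

2.   In the parent, wait for the child to finish.

Repeat until standard input runs out (end of file, e.g. ctrl+d in a terminal), then exit with status code 0. Handling SIGINT (ctrl+c) gracefully is left as an exercise.

There's no Python version of this program; I ran out of time.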

9.2. `syscallshell.go`

// syscallshell.go implements a simple shell that runs subcommands entered by the user. it uses the PATH environment variable to find the commands to run.
// the shell reads a line at a time from standard input, splits it into arguments, and runs the command.
// example usage:
//
//	echo -e "echo hello\necho goodbye" | go run syscallshell.go
//	hello
//	goodbye
package main

import (
	"bufio"
	"errors"
	"fmt"
	"os"
	"strings"
	"syscall"
	"unsafe"
)

func main() {
	// 1. read a line at a time from standard input.
	for scanner := bufio.NewScanner(os.Stdin); scanner.Scan(); { // scan a line at a time.
		line := scanner.Text()
		line = strings.TrimSpace(line)
		if line == "" {
			continue
		}
		// 2. Split the line into a command and arguments (don't worry about quotes or escaping for now).
		args := strings.Fields(line) // split on whitespace.

		// the first argument is the command to run, as usual. now you get why it's like that ;).

		// 3. Use the PATH environment variable to resolve the command to an absolute path.
		path, err := whiche(os.Environ(), args[0])
		if err != nil {
			fmt.Fprintf(os.Stderr, "%q: %v\n", args[0], err)
			continue
		}

		name := path

		// replace the command with the resolved path so call() has to do less work.
		args[0] = path
		// 4. Fork a new process; in the child, use `execve` to start the new program.

		status, err := call(args, os.Environ())
		if err != nil {
			fmt.Fprintf(os.Stderr, "%s: %v\n", name, err)
		}
		if status != 0 {
			fmt.Fprintf(os.Stderr, "%s: exit status %d\n", name, status)
		}
	}
}

// call runs a command like a shell would.
// it
//   - expects args[0] to already be an absolute path (resolve it with whiche first)
//   - forks a new process using FORK
//   - CHILD executes the command in the child process (syscall.EXECVE)
//   - PARENT waits for the child process to finish in the parent process.
//   - PARENT returns the exit status of the child process.
func call(args []string, env []string) (status int, err error) { // LIBRARY: first defined in syscallshell.go
	{ // bounds checks
		if len(args) == 0 {
			return 0, errors.New("no command")
		}
		if args[0] == "" {
			return 0, errors.New("empty command")
		}
		if args[0][0] != '/' {
			return 0, errors.New("command must be an absolute path; try using whiche")
		}
	}
	// 4. Fork a new process
	// spawn a new process.
	// we'll know if we're the parent or the child based on the return value of FORK.
	// the CHILD gets a 0.
	// the PARENT gets the child's PID (> 0). if the fork failed, there is no child and errno is set.
	//
	// CAUTION: the child of a raw fork has only one thread (the one that called fork). running ordinary
	// Go code there (allocating memory, printing) can deadlock if another thread held a runtime lock at
	// the moment of the fork. syscall.ForkExec and os/exec avoid this by doing almost nothing in the child;
	// we accept the risk here to keep every step visible.
	pid, _, errno := syscall.Syscall(syscall.SYS_FORK, 0, 0, 0) // PID: *P*rocess *ID*entifier
	if errno != 0 {
		return 0, fmt.Errorf("syscall: fork: %v", errno)
	}

	// there are 3 cases:
	// - we spawn a new process, it calls EXECVE, and the new program succeeds (exit status 0)
	// - we spawn a new process, it calls EXECVE, and the new program fails (non-zero exit status)
	// - we spawn a new process, and EXECVE itself fails (bad path? weird permissions? etc.)

	// we want to find out about the third case. we can re-use the exit status to signal that the exec failed.
	// exit statuses are only 8 bits wide (0-255), so the magic number has to fit in a byte.
	// 127 is what shells like bash use for "command not found". (a program that exits 127 on its own
	// will be misreported; real implementations report exec failures through a close-on-exec pipe.)
	const STATUS_FAILED_EXEC = 127

	// 4.1: in the child, use `execve` to start the new program.
	if isChild := pid == 0; isChild { // we're the child.
		err := exec(args, env)
		// WARNING: might be tempting to return the error here - but the parent process will never see it,
		// since we're the child. instead, we'll print the error and exit.
		// let's use our magic number to signal that we couldn't exec the command.
		fmt.Fprintf(os.Stderr, "syscall: execve: %v\n", err)
		os.Exit(STATUS_FAILED_EXEC)
	}

	// 4.2: in the parent, wait for the child to finish.
	{ // we're the parent. wait for the child to finish.
		// the WAIT4 system call waits for a child process to change state (here: to exit),
		// returning the PID of the child and filling in its status word.
		// if the child hasn't finished yet, it will block until it does.
		// if the child has already finished, it will return immediately.
		var waitstatus uint32
		// wait4(pid, &status, options, rusage): we pass 0 for options, and syscall.Syscall
		// passes 0 (NULL) for the unused fourth argument, rusage.
		if _, _, errno := syscall.Syscall(syscall.SYS_WAIT4, pid, uintptr(unsafe.Pointer(&waitstatus)), 0); errno != 0 {
			return 0, fmt.Errorf("syscall: wait4: %v", errno)
		}

		// if the child exited normally, its exit status is in bits 8-15 of the status word.
		// the lowest 7 bits hold the number of the signal that killed the process, if any (0 if none).
		status = int(waitstatus>>8) & 0xff
		if status == STATUS_FAILED_EXEC {
			return status, errors.New("execve failed")
		}
		// signal handling could also happen here; we'll leave it out for now.
		return status, nil
	}
}

// execute a program, replacing the current process. on success, this never returns,
// so err is always a non-zero syscall.Errno.
func exec(args, env []string) error { // LIBRARY: first defined in syscallexec.go
	// we need to convert the command and arguments to a slice of pointers to null-terminated strings to pass to execve.
	// we need a null-terminated array of pointers to null-terminated strings.

	cargs := make([]unsafe.Pointer, len(args)+1) // +1 for null terminator
	for i := range args {
		cargs[i] = cstr(args[i])
	}

	cenv := make([]unsafe.Pointer, len(env)+1) // +1 for null terminator
	for i := range env {
		cenv[i] = cstr(env[i])
	}

	path := cstr(args[0])

	_, _, err := syscall.Syscall(
		syscall.SYS_EXECVE,
		uintptr(path),                      // path to the program to run as null-terminated string
		uintptr(unsafe.Pointer(&cargs[0])), // arguments as null-terminated array pointer to null-terminated strings
		uintptr(unsafe.Pointer(&cenv[0])),  // environment as null-terminated array pointer to null-terminated strings
	)
	return err
}

// whiche resolves the path to a command using the PATH environment variable like a shell would.
// it returns the absolute path to the command, or an error if the command couldn't be found.
// suppose our path is "/bin:/usr/bin:/usr/local/bin" and the command is "ls".
// we'll look for "/bin/ls", "/usr/bin/ls", and "/usr/local/bin/ls", in that order.
// the first one that exists is returned.
// absolute paths are returned as-is.
func whiche(env []string, command string) (string, error) {
	// 5 cases
	switch {
	case strings.Contains(command, ".."):
		return "", errors.New("no weird relative paths, please") // we'll handle this later.
	case command == "":
		return "", errors.New("empty path")
	case command[0] == '/': // already absolute. nothing to do.
		return command, nil
	case command[0] == '.': // relative path to the current working directory.
		// working directories are somewhat complicated; different shells have different rules.
		// we'll keep this simple and just look in the current working directory.
		wd, _ := os.Getwd()
		return lookupPath(command, wd)
	default: // look up the command in the PATH environment variable.

		// we'll resolve the PATH environment variable to find the command.
		// we know from our last program that environment variables are handed to a program as an array of null-terminated strings
		// by the EXECVE syscall. the go runtime converted those to go-style strings for us before go's main() function was called;
		// we grab them with os.Environ(), and exec() converts them back to C-style strings when it needs to.
		// find the value of the PATH environment variable; it's a ':'-separated list of directories, like /bin:/usr/bin:/usr/local/bin
		pathEnv := getenv(env, "PATH")

		// resolve the PATH environment variable into a list of directories... PATH=/bin:/usr/bin:/usr/local/bin -> ["/bin", "/usr/bin", "/usr/local/bin"]
		dirs := strings.Split(pathEnv, ":") // this doesn't handle certain kinds of quoting & escaping, but it's good enough for now.
		// find the command in the PATH directories.
		// e.g. if the command is "ls", we'll look for "/bin/ls", "/usr/bin/ls", and "/usr/local/bin/ls".
		return lookupPath(command, dirs...)
	}
}

// getenv retrieves the value of the environment variable named by the key, or an empty string if it's not set.
func getenv(environ []string, key string) string {
	key += "=" // environment variables are stored as "key=value"
	n := len(key)
	for i := range environ {
		if len(environ[i]) < n {
			continue
		}
		// KEY=VALUE
		if environ[i][:n] == key { // KEY=
			return environ[i][n:] // VALUE
		}

	}
	return ""
}

// lookupPath searches for a file in a list of directories.
// usually these are the directories in the $PATH environment variable;
// whiche builds that list for you.
func lookupPath(name string, dirs ...string) (string, error) {
	// different shells have different rules for relative paths.
	// to keep things simple, we'll just say "no".
	if strings.Contains(name, "..") {
		return "", fmt.Errorf("no weird relative paths, please: %q", name)
	}
	for i, dir := range dirs {
		path := dir + "/" + name
		if ok, err := exists(path); ok {
			return path, nil
		} else if err != nil {
			return "", fmt.Errorf("lookupPath: stat in dir #%d: %q: %w", i, path, err)
		}
	}
	return "", errors.New("not found in PATH")
}

// check if a file exists using stat (https://linux.die.net/man/2/stat)
// (true, nil) if it does.
// (false, nil) if it doesn't.
// (false, error) if there was an error.
func exists(path string) (bool, error) {
	p := cstr(path)
	var statbuf [144]byte // sizeof(struct stat) on linux/amd64; we'll worry about what's inside later.
	// the STAT system call fills in a stat structure with information about the file,
	// returning 0 on success and -1 on error.
	// errno will be set to the error code if it fails.
	_, _, err := syscall.Syscall(syscall.SYS_STAT, uintptr(p), uintptr(unsafe.Pointer(&statbuf)), 0)
	switch err {
	case 0: // success!
		return true, nil
	// the opaquely named ENOENT means Error NO ENTry; aka, "file not found".
	case syscall.ENOENT: // file doesn't exist.
		return false, nil
	default: // some other error.
		return false, err
	}
}

// cstr converts a Go string to a null-terminated C string.
// this allocates memory.
func cstr(s string) unsafe.Pointer {
	b := make([]byte, len(s)+1)
	copy(b, s) // copy the string into the buffer. the leftover byte will be the null terminator.
	return unsafe.Pointer(&b[0])
}

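9.3. `syscallshell`: exercises

Modify syscallshell to handle SIGINT (ctrl+c). If a child process is running, SIGINT should go to the child rather than the shell. If no child is running, the shell should exit.
Modify syscallshell to expand environment variables of the form $VAR. For example, "$HOME" should be replaced with the value of the HOME environment variable.
Modify syscallshell to support bash-style double quotes and backslash escapes. For example, cat "some file.txt" should print the contents of the file some file.txt instead of trying to open "some and file.txt" as two separate files.
Add the > redirection and >> append operators so that standard output can be redirected to a file. For example, echo hello > file.txt should write hello to file.txt, and echo world >> file.txt should append world to file.txt.

Before we wrap up, let's run an interactive session with our simple shell.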

#!/usr/bin/env bash
# START PROGRAM
go run syscallshell.go

# INTERACTIVE SESSION
# (lines starting with > are what we type; syscallshell itself prints no prompt)
> echo hello, world!
hello, world!
> echo goodbye, world!
goodbye, world!
> thisisnotaprogram
"thisisnotaprogram": not found in PATH

10. Conclusion

Remember: programmers write programs.

Related articles

Starting Systems Programming, Part 1: Programmers Write Programs

The journey before main() | Amit's blog

Everything you wanted to know about STDIO

Everything is a file
