Redis를 운영하고 있는데, 갑자기 아무런 이유없이 Reis가 Read-Only 상태로 변경 됨.
에러 로그에는 다음과 같은 에러 메세지만 출력되고 있음
ubuntu@$ redis-cli 127.0.0.1:6379> select 1 OK 127.0.0.1:6379[1]> keys * (empty list or set) 127.0.0.1:6379[1]> 127.0.0.1:6379[1]> set aaa 1 (error) MISCONF Redis is configured to save RDB snapshots, but is currently not able to persist on disk. Commands that may modify the data set are disabled. Please check Redis logs for details about the error. 127.0.0.1:6379[1]> exit |
14474:C 10 Jan 03:27:08.040 # Failed opening .rdb for saving: Read-only file system 11872:M 10 Jan 03:27:08.139 # Background saving error 14475:C 10 Jan 03:27:14.056 # Failed opening .rdb for saving: Read-only file system 11872:M 10 Jan 03:27:14.156 # Background saving error 14476:C 10 Jan 03:27:20.070 # Failed opening .rdb for saving: Read-only file system 11872:M 10 Jan 03:27:20.169 # Background saving error 14477:C 10 Jan 03:27:26.088 # Failed opening .rdb for saving: Read-only file system 11872:M 10 Jan 03:27:26.187 # Background saving error 14478:C 10 Jan 03:27:32.104 # Failed opening .rdb for saving: Read-only file system 11872:M 10 Jan 03:27:32.203 # Background saving error 14479:C 10 Jan 03:27:38.019 # Failed opening .rdb for saving: Read-only file system 11872:M 10 Jan 03:27:38.119 # Background saving error |
Read-Only로 바뀌는 이유는 I/O 실패 일 경우인데, 사용량도 100%가 아니고, 파일 생성도 된다.
AWS EC2에 올린 장비라서 특별히 파일 퍼미션이 변경 될 일도 없고...
ubuntu@$ df -h Filesystem Size Used Avail Use% Mounted on udev 3.9G 0 3.9G 0% /dev tmpfs 799M 17M 782M 3% /run /dev/xvda1 7.7G 2.6G 5.2G 34% / tmpfs 3.9G 0 3.9G 0% /dev/shm tmpfs 5.0M 0 5.0M 0% /run/lock tmpfs 3.9G 0 3.9G 0% /sys/fs/cgroup tmpfs 799M 0 799M 0% /run/user/1000 |
root@:/home/ubuntu# cd /var/lib/redis root@:/var/lib/redis# ls appendonly.aof dump.rdb root@:/var/lib/redis# touch a.txt root@:/var/lib/redis# root@:/var/lib/redis# ls -al total 903476 drwxr-xr-x 2 redis redis 4096 Jan 10 06:59 . drwxr-xr-x 42 root root 4096 Jan 3 05:29 .. -rw-r--r-- 1 redis redis 515643064 Jan 10 03:27 appendonly.aof -rw-r--r-- 1 root root 0 Jan 10 06:59 a.txt -rw-r--r-- 1 redis redis 409498139 Jan 10 03:22 dump.rdb |
로그에는 특별한 메세지가 남아 있지 않아서 내렸다 올릴려고 해도 동작되지 않음.
root@:/var/lib/redis# service redis stop ^C root@:/var/lib/redis# |
그냥 Kill -9로 내림
root@:/var/lib/redis# ps -ef | grep redis redis 11872 1 0 Jan03 ? 01:18:01 /usr/bin/redis-server 0.0.0.0:6379 root 17592 17496 0 06:44 pts/0 00:00:00 tail -f /var/log/redis/redis-server.log root 17968 17920 0 07:03 pts/4 00:00:00 grep --color=auto redis root 23063 1 0 Jan04 ? 00:04:14 redis-stat-daemon root@:/var/lib/redis# kill -9 11872 root@:/var/lib/redis# root@:/var/lib/redis# ps -ef | grep redis root 17592 17496 0 06:44 pts/0 00:00:00 tail -f /var/log/redis/redis-server.log root 17980 17920 0 07:03 pts/4 00:00:00 grep --color=auto redis root 23063 1 0 Jan04 ? 00:04:14 redis-stat-daemon root@:/var/lib/redis# root@:/var/lib/redis# service redis start root@:/var/lib/redis# ps -ef | grep redis root 17592 17496 0 06:44 pts/0 00:00:00 tail -f /var/log/redis/redis-server.log redis 18019 1 99 07:03 ? 00:00:04 /usr/bin/redis-server 0.0.0.0:6379 root 18027 17920 0 07:03 pts/4 00:00:00 grep --color=auto redis root 23063 1 0 Jan04 ? 00:04:14 redis-stat-daemon |
Redis를 기동 시키니 다음과 같은 메세지가 로그 파일에 남음.
커넥션된 클라이언트 개수가 1.101K로 나오고 있었는데, 이 문제 였던 것 같다.
개발자가 개발을 하면서 제대로 Close를 안해 줘서 .... 그래서 개발자에게 Close 다 하고 있는지 체크 부탁 함.
18019:M 10 Jan 07:03:42.937 # You requested maxclients of 10000 requiring at least 10032 max file descriptors. 18019:M 10 Jan 07:03:42.938 # Redis can't set maximum open files to 10032 because of OS error: Operation not permitted. 18019:M 10 Jan 07:03:42.938 # Current maximum open files is 4096. maxclients has been reduced to 4064 to compensate for low ulimit. If you need higher maxclients increase 'ulimit -n'. 18019:M 10 Jan 07:03:42.938 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128. 18019:M 10 Jan 07:03:42.939 # Server started, Redis version 3.0.6 18019:M 10 Jan 07:03:42.939 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect. 18019:M 10 Jan 07:03:42.939 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled. |
root@:/var/lib/redis# ulimit -a core file size (blocks, -c) 0 data seg size (kbytes, -d) unlimited scheduling priority (-e) 0 file size (blocks, -f) unlimited pending signals (-i) 31867 max locked memory (kbytes, -l) 64 max memory size (kbytes, -m) unlimited open files (-n) 1024 pipe size (512 bytes, -p) 8 POSIX message queues (bytes, -q) 819200 real-time priority (-r) 0 stack size (kbytes, -s) 8192 cpu time (seconds, -t) unlimited max user processes (-u) 31867 virtual memory (kbytes, -v) unlimited file locks (-x) unlimited
root@:/var/lib/redis# ulimit -n 1024 |
예전에 보면 다음과 같에 커넥션을 안 끊고 있었음
127.0.0.1:6379> client list id=73270 addr=192.168.214.80:59044 fd=1583 name= age=137 idle=137 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=info id=74608 addr=192.168.214.80:40408 fd=2193 name= age=67 idle=67 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=info id=73772 addr=192.168.214.80:34424 fd=1820 name= age=110 idle=110 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=info id=73773 addr=192.168.214.80:34430 fd=1821 name= age=110 idle=110 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=info id=75195 addr=192.168.214.80:44670 fd=2463 name= age=33 idle=33 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=info id=75527 addr=192.168.214.80:47080 fd=2616 name= age=16 idle=16 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=info |
현재는 하나만 붙어서 잘 돌아가고 있음. (몇일 모니터링 필요.)
127.0.0.1:6379> client list id=7655 addr=127.0.0.1:39220 fd=6 name= age=2476 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=info id=31815 addr=192.168.214.80:30026 fd=7 name= age=693 idle=511 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=sscan id=36868 addr=127.0.0.1:39226 fd=8 name= age=493 idle=486 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=smembers id=37934 addr=127.0.0.1:39228 fd=9 name= age=449 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=client |