Discussion:
deadlock-like issue with order=strict mounts
Michael L. Semon
2014-06-30 16:46:37 UTC
Hi! With debugging being discussed here, I wanted to pass on an issue
that has no error message associated with it. This will be one of those
error reports that Vyacheslav will find not too informative. He's
been trying to help with those moments where NILFS2 will stop responding
for no visible reason, but whatever issue has 100% reproducibility on
my PC has no reproducibility on his PC. This is a new test; maybe this
test will work.

With NILFS2 '-o order=strict' mounts, at least, there is a repeatable
deadlock-like behavior between segctord and a process that syncs.
Usually, that process is lilo, but other programs can cause this behavior
at random. After this issue is reached, clean shutdowns are almost
impossible. At least here on an old Pentium III--512 MB RAM, Slackware
14.1, kernel 3.16.0-rc2, debug kernel config, old IDE drives--this script
reproduces the issue:

# ==== script ====
#!/bin/bash
hdparm -W 0 /dev/hdc  # turn the write cache off
mkfs.nilfs2 -f /dev/hdc4
mount -t nilfs2 -o order=strict /dev/hdc4 /mnt/tmp
cd /mnt/tmp || exit 1  # bail out rather than hammer the wrong fs if the mount failed
while true; do
    fs_mark -D 4 -t 4 -n 50 -s 512 -L 5 -d todelete
    rm -r todelete
    sync
    sleep 1
done
# ==== end of script ====

Should your PC be too fast to make a deadlock happen, increase any or
all of the numbers in the fs_mark command line. On this PC, it goes
through the loop exactly once.
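
For a sense of how much to scale, the per-pass load of the loop works out as below (a sketch, assuming fs_mark's usual semantics: each of the -t threads writes -n files per loop, for -L loops):

```shell
# Rough load generated by one pass of the script's fs_mark line,
# assuming total work = threads * files-per-thread-per-loop * loops.
threads=4
files_per_thread=50
loops=5
size=512  # bytes per file (-s)

total_files=$((threads * files_per_thread * loops))
total_bytes=$((total_files * size))
echo "$total_files files, $total_bytes bytes per pass"
```

Doubling any one of those numbers doubles the work per pass, so on a faster machine it is probably easiest to raise -t and -L first.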

After forcing a crash and collecting the core dump, I see this using
the crash 7.0.4 program:

crash> bt 274
PID: 274 TASK: dd9caac0 CPU: 0 COMMAND: "segctord"
#0 [c0063d48] __schedule at c1641357
#1 [c0063dc8] schedule at c1641a7e
#2 [c0063dd0] inode_wait at c11467c8
#3 [c0063dd8] __wait_on_bit at c1642133
#4 [c0063df0] __inode_wait_for_writeback at c1156d98
#5 [c0063e24] inode_wait_for_writeback at c1159fff
#6 [c0063e34] evict at c11475de
#7 [c0063e48] iput at c11482ef
#8 [c0063e60] nilfs_dispose_list at c12f104a
#9 [c0063ecc] nilfs_transaction_unlock at c12f14e9
#10 [c0063edc] nilfs_segctor_thread at c12f3fa1
#11 [c0063f28] kthread at c105fb56
#12 [c0063fb0] ret_from_kernel_thread at c164729e

crash> bt 301
PID: 301 TASK: dd9cc020 CPU: 0 COMMAND: "sync"
#0 [de9e1dac] __schedule at c1641357
#1 [de9e1e2c] schedule at c1641a7e
#2 [de9e1e34] schedule_timeout at c1640a80
#3 [de9e1ea8] wait_for_completion at c1642436
#4 [de9e1ed4] sync_inodes_sb at c115ae12
#5 [de9e1f7c] sync_inodes_one_sb at c115e620
#6 [de9e1f84] iterate_supers at c112d1e8
#7 [de9e1fa0] sys_sync at c115e85c
#8 [de9e1fb0] ia32_sysenter_target at c164736b
EAX: 00000024 EBX: bf8b1954 ECX: 00000000 EDX: b775517c
DS: 007b ESI: 00000001 ES: 007b EDI: 00000000
SS: 007b ESP: bf8b187c EBP: bf8b18b8 GS: 0000
CS: 0073 EIP: b776da8c ERR: 00000024 EFLAGS: 00000246

The behavior seems to happen more easily on new filesystems, or on a
filesystem whose old checkpoints have all been removed by rmcp and
nilfs-clean. It gets better once the filesystem is full again and
nilfs_cleanerd has had a chance to run automatically.

I have not tested sufficiently whether the issue can be reproduced with
an order=relaxed mount; spot-tests seem OK.

The full core dump is available, should you need it.

Thanks!

Michael

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-***@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Michael L. Semon
2014-07-01 00:48:13 UTC
Post by Michael L. Semon
Hi! With debugging being discussed here, I wanted to pass on an issue
that has no error message associated with it. This will be one of those
error reports that Vyacheslav will find not too informative. He's
been trying to help with those moments where NILFS2 will stop responding
for no visible reason, but whatever issue has 100% reproducibility on
my PC has no reproducibility on his PC. This is a new test; maybe this
test will work.
<snip>
Post by Michael L. Semon
After forcing a crash and collecting the core dump, I see this using
crash> bt 274
PID: 274 TASK: dd9caac0 CPU: 0 COMMAND: "segctord"
#0 [c0063d48] __schedule at c1641357
#1 [c0063dc8] schedule at c1641a7e
#2 [c0063dd0] inode_wait at c11467c8
#3 [c0063dd8] __wait_on_bit at c1642133
#4 [c0063df0] __inode_wait_for_writeback at c1156d98
#5 [c0063e24] inode_wait_for_writeback at c1159fff
#6 [c0063e34] evict at c11475de
#7 [c0063e48] iput at c11482ef
#8 [c0063e60] nilfs_dispose_list at c12f104a
#9 [c0063ecc] nilfs_transaction_unlock at c12f14e9
#10 [c0063edc] nilfs_segctor_thread at c12f3fa1
#11 [c0063f28] kthread at c105fb56
#12 [c0063fb0] ret_from_kernel_thread at c164729e
crash> bt 301
PID: 301 TASK: dd9cc020 CPU: 0 COMMAND: "sync"
#0 [de9e1dac] __schedule at c1641357
#1 [de9e1e2c] schedule at c1641a7e
#2 [de9e1e34] schedule_timeout at c1640a80
#3 [de9e1ea8] wait_for_completion at c1642436
#4 [de9e1ed4] sync_inodes_sb at c115ae12
#5 [de9e1f7c] sync_inodes_one_sb at c115e620
#6 [de9e1f84] iterate_supers at c112d1e8
#7 [de9e1fa0] sys_sync at c115e85c
#8 [de9e1fb0] ia32_sysenter_target at c164736b
EAX: 00000024 EBX: bf8b1954 ECX: 00000000 EDX: b775517c
DS: 007b ESI: 00000001 ES: 007b EDI: 00000000
SS: 007b ESP: bf8b187c EBP: bf8b18b8 GS: 0000
CS: 0073 EIP: b776da8c ERR: 00000024 EFLAGS: 00000246
Hmm, this seems to be a deadlock over the I_SYNC flag in inode->i_state.
If so, the issue should also be reproducible on a default mount by
repeatedly calling sync and removing files.
Can you try it?
Regards,
Ryusuke Konishi
It was fine on the default mount. The script above was used until a
4GB partition was 65% full. Then, I added more threads and populated
it again until the filesystem ran out of space. No problems.
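
The retest followed the same create/remove/sync pattern as the script; a minimal sketch is below (TESTDIR and the pass/file counts here are placeholders, not the exact values used — point TESTDIR at a directory on the nilfs2 mount; a throwaway temp directory is used so the sketch runs anywhere):

```shell
# Sketch of the default-mount retest Ryusuke suggested: repeatedly
# create files, remove them, and sync.
TESTDIR=${TESTDIR:-$(mktemp -d)}   # placeholder: use the nilfs2 mount point
for pass in 1 2 3; do
    mkdir -p "$TESTDIR/todelete"
    for i in $(seq 1 50); do
        head -c 512 /dev/zero > "$TESTDIR/todelete/file$i"
    done
    rm -r "$TESTDIR/todelete"
    sync
done
echo "completed $pass passes"
```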

In case you or Vyacheslav need the kernel config, it is here:

https://drive.google.com/file/d/0B41268QKoNjtSUE0SkZsb0E5ckE

I had a similar issue with xfstests; one of the stack traces
may have ended with "no locks were held by segctord/8879." However,
those files did not make it to my USB key to bring here. They will
be included next time, so that I can verify that message. [My memory
is not very good at times.]

Thanks again!

Michael

Michael L. Semon
2014-07-01 21:59:31 UTC
Post by Michael L. Semon
Post by Michael L. Semon
<snip>
Hmm, this seems to be a deadlock over the I_SYNC flag in inode->i_state.
If so, the issue should also be reproducible on a default mount by
repeatedly calling sync and removing files.
Can you try it?
Regards,
Ryusuke Konishi
It was fine on the default mount. The script above was used until a
4GB partition was 65% full. Then, I added more threads and populated
it again until the filesystem ran out of space. No problems.
https://drive.google.com/file/d/0B41268QKoNjtSUE0SkZsb0E5ckE
I had a similar issue with xfstests; one of the stack traces
may have ended with "no locks were held by segctord/8879." However,
those files did not make it to my USB key to bring here. They will
be included next time, so that I can verify that message. [My memory
is not very good at times.]
OK, here are the stack traces that were missing last time. This involves
segctord and umount.nilfs2:

# Command line: env MOUNT_OPTIONS='-o pp=0,order=strict' ./check -g auto
# Test generic/001 passes, but segctord and umount.nilfs2 become
# unresponsive and trigger the kernel hung task detector.

[ 3824.349820] segctord starting. Construction interval = 5 seconds, CP frequency < 30 seconds
[ 3828.544963] segctord starting. Construction interval = 5 seconds, CP frequency < 30 seconds
[ 3829.805282] segctord starting. Construction interval = 5 seconds, CP frequency < 30 seconds
[ 5040.128148] INFO: task segctord:8987 blocked for more than 720 seconds.
[ 5040.128227] Not tainted 3.16.0-rc2+ #12
[ 5040.128249] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 5040.128272] segctord D c10870e4 6124 8987 2 0x00000000
[ 5040.128354] de9ffdc4 00000086 00000000 c10870e4 00000000 de9ffd94 c18d9000 0ad32d1d
[ 5040.128487] 00000388 c1802780 ddb85580 00000000 de9ffda8 00000046 00000082 00000002
[ 5040.128618] dfebb000 dfebaff0 00000001 00000046 00000296 dfebaff0 de9ffda8 c108909b
[ 5040.128749] Call Trace:
[ 5040.128812] [<c10870e4>] ? lock_release_holdtime.part.25+0xb4/0x100
[ 5040.128859] [<c108909b>] ? trace_hardirqs_on+0xb/0x10
[ 5040.128913] [<c1641a83>] schedule+0x23/0x60
[ 5040.128968] [<c11467cd>] inode_wait+0xd/0x20
[ 5040.129081] [<c1642135>] __wait_on_bit+0x55/0x80
[ 5040.129125] [<c11467c0>] ? inode_lru_list_del+0x30/0x30
[ 5040.129170] [<c1156d9d>] __inode_wait_for_writeback+0x8d/0xb0
[ 5040.129227] [<c107f3f0>] ? autoremove_wake_function+0x50/0x50
[ 5040.129274] [<c115a004>] inode_wait_for_writeback+0x24/0x40
[ 5040.129316] [<c11475e3>] evict+0x83/0x160
[ 5040.129358] [<c11482f4>] iput+0xe4/0x190
[ 5040.129421] [<c12f105f>] nilfs_dispose_list+0x11f/0x180
[ 5040.129463] [<c12f14de>] ? nilfs_transaction_unlock.isra.27+0x3e/0x80
[ 5040.129503] [<c12f14fe>] nilfs_transaction_unlock.isra.27+0x5e/0x80
[ 5040.129544] [<c12f3fb6>] nilfs_segctor_thread+0xf6/0x300
[ 5040.129584] [<c12f3ec0>] ? nilfs_segctor_construct+0x300/0x300
[ 5040.129629] [<c105fb58>] kthread+0xa8/0xc0
[ 5040.129671] [<c108909b>] ? trace_hardirqs_on+0xb/0x10
[ 5040.129712] [<c16472a1>] ret_from_kernel_thread+0x21/0x30
[ 5040.129752] [<c105fab0>] ? insert_kthread_work+0x80/0x80
[ 5040.129779] no locks held by segctord/8987.

crash> bt 8987
PID: 8987 TASK: ddb85580 CPU: 0 COMMAND: "segctord"
#0 [de9ffd48] __schedule at c1641357
#1 [de9ffdc8] schedule at c1641a7e
#2 [de9ffdd0] inode_wait at c11467c8
#3 [de9ffdd8] __wait_on_bit at c1642133
#4 [de9ffdf0] __inode_wait_for_writeback at c1156d98
#5 [de9ffe24] inode_wait_for_writeback at c1159fff
#6 [de9ffe34] evict at c11475de
#7 [de9ffe48] iput at c11482ef
#8 [de9ffe60] nilfs_dispose_list at c12f105a
#9 [de9ffecc] nilfs_transaction_unlock at c12f14f9
#10 [de9ffedc] nilfs_segctor_thread at c12f3fb1
#11 [de9fff28] kthread at c105fb56
#12 [de9fffb0] ret_from_kernel_thread at c164729e

crash> bt 11520
PID: 11520 TASK: dea15580 CPU: 0 COMMAND: "umount.nilfs2"
#0 [dd84fd44] __schedule at c1641357
#1 [dd84fdc4] schedule at c1641a7e
#2 [dd84fdcc] schedule_timeout at c1640a80
#3 [dd84fe40] wait_for_completion at c1642436
#4 [dd84fe6c] sync_inodes_sb at c115ae12
#5 [dd84ff14] sync_filesystem at c115e7df
#6 [dd84ff20] generic_shutdown_super at c112cc87
#7 [dd84ff3c] kill_block_super at c112dd14
#8 [dd84ff4c] deactivate_locked_super at c112c9d5
#9 [dd84ff5c] deactivate_super at c112ca62
#10 [dd84ff70] mntput_no_expire at c114d4b8
#11 [dd84ff8c] sys_oldumount at c114f25e
#12 [dd84ffb0] ia32_sysenter_target at c164736b
EAX: 00000016 EBX: 0990e100 ECX: 00000001 EDX: b7750404
DS: 007b ESI: 0990e008 ES: 007b EDI: 00000000
SS: 007b ESP: bfd3ddbc EBP: b775084c GS: 0000
CS: 0073 EIP: b7765a8c ERR: 00000016 EFLAGS: 00000246

Again, the same test is OK when using a default mount.

Thanks!

Michael
