Discussion:
[PATCH] nilfs2: fix data loss with mmap()
Andreas Rohner
2014-09-15 19:47:30 UTC
Permalink
This bug leads to reproducible silent data loss, despite the use of
msync(), sync() and a clean unmount of the file system. It is easily
reproducible with the following script:

----------------[BEGIN SCRIPT]--------------------
mkfs.nilfs2 -f /dev/sdb
mount /dev/sdb /mnt

# create 30MB testfile
dd if=/dev/zero bs=1M count=30 of=/mnt/testfile

umount /mnt
mount /dev/sdb /mnt
CHECKSUM_BEFORE="$(md5sum /mnt/testfile)"

# simple tool that opens /mnt/testfile and
# writes a few blocks using mmap at a 5MB offset
/root/mmaptest/mmaptest /mnt/testfile 30 10 5

sync
CHECKSUM_AFTER="$(md5sum /mnt/testfile)"
umount /mnt
mount /dev/sdb /mnt
CHECKSUM_AFTER_REMOUNT="$(md5sum /mnt/testfile)"
umount /mnt

echo "BEFORE MMAP:\t$CHECKSUM_BEFORE"
echo "AFTER MMAP:\t$CHECKSUM_AFTER"
echo "AFTER REMOUNT:\t$CHECKSUM_AFTER_REMOUNT"
----------------[END SCRIPT]--------------------

The mmaptest tool looks something like this (very simplified, with
error checking removed):

----------------[BEGIN mmaptest]--------------------
data = mmap(NULL, file_size - file_offset, PROT_READ | PROT_WRITE,
MAP_SHARED, fd, file_offset);

for (i = 0; i < write_count; ++i) {
memcpy(data + i * 4096, buf, sizeof(buf));
msync(data, file_size - file_offset, MS_SYNC))
}
----------------[END mmaptest]--------------------

The output of the script looks something like this:

BEFORE MMAP: 281ed1d5ae50e8419f9b978aab16de83 /mnt/testfile
AFTER MMAP: 6604a1c31f10780331a6850371b3a313 /mnt/testfile
AFTER REMOUNT: 281ed1d5ae50e8419f9b978aab16de83 /mnt/testfile

So it is clear, that the changes done using mmap() do not survive a
remount. This can be reproduced a 100% of the time. The problem was
introduced with the following commit:

136e877 nilfs2: fix issue of nilfs_set_page_dirty() for page at EOF
boundary

If the page was read with mpage_readpage() or mpage_readpages() for
example, then it has no buffers attached to it. In that case
page_has_buffers(page) in nilfs_set_page_dirty() will be false.
Therefore nilfs_set_file_dirty() is never called and the pages are never
collected and never written to disk.

This patch fixes the problem by also calling nilfs_set_file_dirty() if
the page has no buffers attached to it.

Signed-off-by: Andreas Rohner <andreas.rohner-***@public.gmane.org>
---
fs/nilfs2/inode.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/nilfs2/inode.c b/fs/nilfs2/inode.c
index 6252b17..9e3c525 100644
--- a/fs/nilfs2/inode.c
+++ b/fs/nilfs2/inode.c
@@ -219,10 +219,10 @@ static int nilfs_writepage(struct page *page, struct writeback_control *wbc)

static int nilfs_set_page_dirty(struct page *page)
{
+ struct inode *inode = page->mapping->host;
int ret = __set_page_dirty_nobuffers(page);

if (page_has_buffers(page)) {
- struct inode *inode = page->mapping->host;
unsigned nr_dirty = 0;
struct buffer_head *bh, *head;

@@ -245,6 +245,10 @@ static int nilfs_set_page_dirty(struct page *page)

if (nr_dirty)
nilfs_set_file_dirty(inode, nr_dirty);
+ } else if (ret) {
+ unsigned nr_dirty = 1 << (PAGE_SHIFT - inode->i_blkbits);
+
+ nilfs_set_file_dirty(inode, nr_dirty);
}
return ret;
}
--
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-***@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Ryusuke Konishi
2014-09-15 22:01:30 UTC
Permalink
Hi Andreas,
Post by Andreas Rohner
This bug leads to reproducible silent data loss, despite the use of
msync(), sync() and a clean unmount of the file system. It is easily
----------------[BEGIN SCRIPT]--------------------
mkfs.nilfs2 -f /dev/sdb
mount /dev/sdb /mnt
# create 30MB testfile
dd if=/dev/zero bs=1M count=30 of=/mnt/testfile
umount /mnt
mount /dev/sdb /mnt
CHECKSUM_BEFORE="$(md5sum /mnt/testfile)"
# simple tool that opens /mnt/testfile and
# writes a few blocks using mmap at a 5MB offset
/root/mmaptest/mmaptest /mnt/testfile 30 10 5
sync
CHECKSUM_AFTER="$(md5sum /mnt/testfile)"
umount /mnt
mount /dev/sdb /mnt
CHECKSUM_AFTER_REMOUNT="$(md5sum /mnt/testfile)"
umount /mnt
echo "BEFORE MMAP:\t$CHECKSUM_BEFORE"
echo "AFTER MMAP:\t$CHECKSUM_AFTER"
echo "AFTER REMOUNT:\t$CHECKSUM_AFTER_REMOUNT"
----------------[END SCRIPT]--------------------
The mmaptest tool looks something like this (very simplified, with
----------------[BEGIN mmaptest]--------------------
data = mmap(NULL, file_size - file_offset, PROT_READ | PROT_WRITE,
MAP_SHARED, fd, file_offset);
for (i = 0; i < write_count; ++i) {
memcpy(data + i * 4096, buf, sizeof(buf));
msync(data, file_size - file_offset, MS_SYNC))
}
----------------[END mmaptest]--------------------
BEFORE MMAP: 281ed1d5ae50e8419f9b978aab16de83 /mnt/testfile
AFTER MMAP: 6604a1c31f10780331a6850371b3a313 /mnt/testfile
AFTER REMOUNT: 281ed1d5ae50e8419f9b978aab16de83 /mnt/testfile
So it is clear, that the changes done using mmap() do not survive a
remount. This can be reproduced a 100% of the time. The problem was
136e877 nilfs2: fix issue of nilfs_set_page_dirty() for page at EOF
boundary
If the page was read with mpage_readpage() or mpage_readpages() for
example, then it has no buffers attached to it. In that case
page_has_buffers(page) in nilfs_set_page_dirty() will be false.
Therefore nilfs_set_file_dirty() is never called and the pages are never
collected and never written to disk.
This patch fixes the problem by also calling nilfs_set_file_dirty() if
the page has no buffers attached to it.
---
fs/nilfs2/inode.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/fs/nilfs2/inode.c b/fs/nilfs2/inode.c
index 6252b17..9e3c525 100644
--- a/fs/nilfs2/inode.c
+++ b/fs/nilfs2/inode.c
@@ -219,10 +219,10 @@ static int nilfs_writepage(struct page *page, struct writeback_control *wbc)
static int nilfs_set_page_dirty(struct page *page)
{
+ struct inode *inode = page->mapping->host;
int ret = __set_page_dirty_nobuffers(page);
if (page_has_buffers(page)) {
- struct inode *inode = page->mapping->host;
unsigned nr_dirty = 0;
struct buffer_head *bh, *head;
@@ -245,6 +245,10 @@ static int nilfs_set_page_dirty(struct page *page)
if (nr_dirty)
nilfs_set_file_dirty(inode, nr_dirty);
+ } else if (ret) {
+ unsigned nr_dirty = 1 << (PAGE_SHIFT - inode->i_blkbits);
+
+ nilfs_set_file_dirty(inode, nr_dirty);
}
return ret;
}
--
2.1.0
Thank you for reporting this issue.

I'd like to look into this patch, it looks to point out an important
regression, but it may take some time since I am quite busy this week..

Ryusuke Konishi
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-***@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Andreas Rohner
2014-09-15 22:24:05 UTC
Permalink
Post by Ryusuke Konishi
Hi Andreas,
Post by Andreas Rohner
This bug leads to reproducible silent data loss, despite the use of
msync(), sync() and a clean unmount of the file system. It is easily
<snip>
Post by Ryusuke Konishi
Thank you for reporting this issue.
I just stumbled upon the weird behaviour of mmap() while testing the
nilfs_sync_fs() patch.
Post by Ryusuke Konishi
I'd like to look into this patch, it looks to point out an important
regression, but it may take some time since I am quite busy this week..
Of course. I understand.

br,
Andreas Rohner

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-***@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Ryusuke Konishi
2014-09-16 04:42:18 UTC
Permalink
Post by Andreas Rohner
Post by Ryusuke Konishi
Hi Andreas,
Post by Andreas Rohner
This bug leads to reproducible silent data loss, despite the use of
msync(), sync() and a clean unmount of the file system. It is easily
<snip>
Post by Ryusuke Konishi
Thank you for reporting this issue.
I just stumbled upon the weird behaviour of mmap() while testing the
nilfs_sync_fs() patch.
Post by Ryusuke Konishi
I'd like to look into this patch, it looks to point out an important
regression, but it may take some time since I am quite busy this week..
Of course. I understand.
The patch looks correct. It is my mistake that the commit 136e877
leaked consideration for the case where the page doesn't have buffer
heads. This fix should be backported to stable kernels. (I'll add a
"Cc: stable" tag when sending this to Andrew.)

Did you confirm that the patch works as expected ?

I'd appreciate your help on testing the patch for some old kernels.
(And, please declare a "Tested-by" tag in the reply mail, if the test
is ok).

Thanks,
Ryusuke Konishi
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-***@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Andreas Rohner
2014-09-16 08:38:29 UTC
Permalink
Post by Ryusuke Konishi
Post by Andreas Rohner
Post by Ryusuke Konishi
Hi Andreas,
Post by Andreas Rohner
This bug leads to reproducible silent data loss, despite the use of
msync(), sync() and a clean unmount of the file system. It is easily
<snip>
Post by Ryusuke Konishi
Thank you for reporting this issue.
I just stumbled upon the weird behaviour of mmap() while testing the
nilfs_sync_fs() patch.
Post by Ryusuke Konishi
I'd like to look into this patch, it looks to point out an important
regression, but it may take some time since I am quite busy this week..
Of course. I understand.
The patch looks correct. It is my mistake that the commit 136e877
leaked consideration for the case where the page doesn't have buffer
heads. This fix should be backported to stable kernels. (I'll add a
"Cc: stable" tag when sending this to Andrew.)
Did you confirm that the patch works as expected ?
Yes at least with the current master kernel:

BEFORE MMAP: 281ed1d5ae50e8419f9b978aab16de83 /mnt/testfile
AFTER MMAP: 3d9183f1c471b9baff15c9cc8d12c303 /mnt/testfile
AFTER REMOUNT: 3d9183f1c471b9baff15c9cc8d12c303 /mnt/testfile

For the record here is the little tool I used for testing mmap:

https://github.com/zeitgeist87/mmaptest

It writes pages with random bytes at a certain offset using mmap. The
frequent umounts and mounts in the test script are necessary to purge
the page cache. There is probably a better way of doing that.
Post by Ryusuke Konishi
I'd appreciate your help on testing the patch for some old kernels.
(And, please declare a "Tested-by" tag in the reply mail, if the test
is ok).
Sure I have everything set up. Which kernels do I have to test? Was
commit 136e877 backported? I presume at least stable and some of the
longterm kernels on https://www.kernel.org/...

By the way thanks for your continued effort and time investment in
reviewing my patches.

br,
Andreas Rohner

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-***@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Ryusuke Konishi
2014-09-16 13:57:26 UTC
Permalink
Post by Andreas Rohner
Post by Ryusuke Konishi
I'd appreciate your help on testing the patch for some old kernels.
(And, please declare a "Tested-by" tag in the reply mail, if the test
is ok).
Sure I have everything set up. Which kernels do I have to test? Was
commit 136e877 backported? I presume at least stable and some of the
longterm kernels on https://www.kernel.org/...
The commit 136e877 was merged to v3.10 and backported to stable trees
of earlier kernels. But, most of earlier stable trees are no longer
maintained. Well maintained trees are the following longterm kernels:

- 3.4.y (backported commit 136e877)
- 3.10.y
- 3.14.y

I think these three kernels are worty to be tested.
Post by Andreas Rohner
By the way thanks for your continued effort and time investment in
reviewing my patches.
Thank you too, for your hard work and patience.

Ryusuke Konishi
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-***@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Andreas Rohner
2014-09-17 08:16:46 UTC
Permalink
Post by Ryusuke Konishi
Post by Andreas Rohner
Post by Ryusuke Konishi
I'd appreciate your help on testing the patch for some old kernels.
(And, please declare a "Tested-by" tag in the reply mail, if the test
is ok).
Sure I have everything set up. Which kernels do I have to test? Was
commit 136e877 backported? I presume at least stable and some of the
longterm kernels on https://www.kernel.org/...
The commit 136e877 was merged to v3.10 and backported to stable trees
of earlier kernels. But, most of earlier stable trees are no longer
- 3.4.y (backported commit 136e877)
- 3.10.y
- 3.14.y
I think these three kernels are worty to be tested.
I tested it on all stable kernels including 3.4.x, 3.10.x, 3.14.x. The
bug is present in all of them and the patch fixes it. The patch also
applies cleanly on all kernels. I sent it again yesterday, and added the
Tested-by: tag.

br,
Andreas Rohner
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-***@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Ryusuke Konishi
2014-09-17 12:34:39 UTC
Permalink
Post by Andreas Rohner
Post by Ryusuke Konishi
Post by Andreas Rohner
Post by Ryusuke Konishi
I'd appreciate your help on testing the patch for some old kernels.
(And, please declare a "Tested-by" tag in the reply mail, if the test
is ok).
Sure I have everything set up. Which kernels do I have to test? Was
commit 136e877 backported? I presume at least stable and some of the
longterm kernels on https://www.kernel.org/...
The commit 136e877 was merged to v3.10 and backported to stable trees
of earlier kernels. But, most of earlier stable trees are no longer
- 3.4.y (backported commit 136e877)
- 3.10.y
- 3.14.y
I think these three kernels are worty to be tested.
I tested it on all stable kernels including 3.4.x, 3.10.x, 3.14.x. The
bug is present in all of them and the patch fixes it. The patch also
applies cleanly on all kernels. I sent it again yesterday, and added the
Tested-by: tag.
Thank you.

I'll pick up the new patch.

Regards,
Ryusuke Konishi
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-***@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Ryusuke Konishi
2014-09-18 07:24:16 UTC
Permalink
Post by Andreas Rohner
Post by Ryusuke Konishi
Post by Andreas Rohner
Post by Ryusuke Konishi
I'd appreciate your help on testing the patch for some old kernels.
(And, please declare a "Tested-by" tag in the reply mail, if the test
is ok).
Sure I have everything set up. Which kernels do I have to test? Was
commit 136e877 backported? I presume at least stable and some of the
longterm kernels on https://www.kernel.org/...
The commit 136e877 was merged to v3.10 and backported to stable trees
of earlier kernels. But, most of earlier stable trees are no longer
- 3.4.y (backported commit 136e877)
- 3.10.y
- 3.14.y
I think these three kernels are worty to be tested.
I tested it on all stable kernels including 3.4.x, 3.10.x, 3.14.x. The
bug is present in all of them and the patch fixes it. The patch also
applies cleanly on all kernels. I sent it again yesterday, and added the
Tested-by: tag.
One thing I have a question.

Is the original issue that commit 136e877 fixed still OK ? If you
haven't tested it, I apprecicate if you examine the test for the prior
issue.

Thanks,
Ryusuke Konishi
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-***@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Andreas Rohner
2014-09-18 10:44:16 UTC
Permalink
Post by Ryusuke Konishi
Post by Andreas Rohner
Post by Ryusuke Konishi
Post by Andreas Rohner
Post by Ryusuke Konishi
I'd appreciate your help on testing the patch for some old kernels.
(And, please declare a "Tested-by" tag in the reply mail, if the test
is ok).
Sure I have everything set up. Which kernels do I have to test? Was
commit 136e877 backported? I presume at least stable and some of the
longterm kernels on https://www.kernel.org/...
The commit 136e877 was merged to v3.10 and backported to stable trees
of earlier kernels. But, most of earlier stable trees are no longer
- 3.4.y (backported commit 136e877)
- 3.10.y
- 3.14.y
I think these three kernels are worty to be tested.
I tested it on all stable kernels including 3.4.x, 3.10.x, 3.14.x. The
bug is present in all of them and the patch fixes it. The patch also
applies cleanly on all kernels. I sent it again yesterday, and added the
Tested-by: tag.
One thing I have a question.
Is the original issue that commit 136e877 fixed still OK ? If you
haven't tested it, I apprecicate if you examine the test for the prior
issue.
Yes the original issue is still fixed. I was able to reproduce it by
reverting nilfs_set_page_dirty() to the state prior to commit 136e877.
Then I used the new version of nilfs_set_page_dirty() including my patch
and I couldn't reproduce the issue anymore. So it seems to still be fixed.

Furthermore the original issue was caused by the use of
__set_page_dirty_buffers(), which marked unmapped buffers as dirty. My
patch does not change the fix for that.

br,
Andreas Rohner
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-***@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Andreas Rohner
2014-09-16 15:11:36 UTC
Permalink
This bug leads to reproducible silent data loss, despite the use of
msync(), sync() and a clean unmount of the file system. It is easily
reproducible with the following script:

----------------[BEGIN SCRIPT]--------------------
mkfs.nilfs2 -f /dev/sdb
mount /dev/sdb /mnt

# create 30MB testfile
dd if=/dev/zero bs=1M count=30 of=/mnt/testfile

umount /mnt
mount /dev/sdb /mnt
CHECKSUM_BEFORE="$(md5sum /mnt/testfile)"

# simple tool that opens /mnt/testfile and
# writes a few blocks using mmap at a 5MB offset
/root/mmaptest/mmaptest /mnt/testfile 30 10 5

sync
CHECKSUM_AFTER="$(md5sum /mnt/testfile)"
umount /mnt
mount /dev/sdb /mnt
CHECKSUM_AFTER_REMOUNT="$(md5sum /mnt/testfile)"
umount /mnt

echo "BEFORE MMAP:\t$CHECKSUM_BEFORE"
echo "AFTER MMAP:\t$CHECKSUM_AFTER"
echo "AFTER REMOUNT:\t$CHECKSUM_AFTER_REMOUNT"
----------------[END SCRIPT]--------------------

The mmaptest tool looks something like this (very simplified, with
error checking removed):

----------------[BEGIN mmaptest]--------------------
data = mmap(NULL, file_size - file_offset, PROT_READ | PROT_WRITE,
MAP_SHARED, fd, file_offset);

for (i = 0; i < write_count; ++i) {
memcpy(data + i * 4096, buf, sizeof(buf));
msync(data, file_size - file_offset, MS_SYNC))
}
----------------[END mmaptest]--------------------

The output of the script looks something like this:

BEFORE MMAP: 281ed1d5ae50e8419f9b978aab16de83 /mnt/testfile
AFTER MMAP: 6604a1c31f10780331a6850371b3a313 /mnt/testfile
AFTER REMOUNT: 281ed1d5ae50e8419f9b978aab16de83 /mnt/testfile

So it is clear, that the changes done using mmap() do not survive a
remount. This can be reproduced a 100% of the time. The problem was
introduced with the following commit:

136e877 nilfs2: fix issue of nilfs_set_page_dirty() for page at EOF
boundary

If the page was read with mpage_readpage() or mpage_readpages() for
example, then it has no buffers attached to it. In that case
page_has_buffers(page) in nilfs_set_page_dirty() will be false.
Therefore nilfs_set_file_dirty() is never called and the pages are never
collected and never written to disk.

This patch fixes the problem by also calling nilfs_set_file_dirty() if
the page has no buffers attached to it.

Signed-off-by: Andreas Rohner <andreas.rohner-***@public.gmane.org>
Tested-by: Andreas Rohner <andreas.rohner-***@public.gmane.org>
---
fs/nilfs2/inode.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/nilfs2/inode.c b/fs/nilfs2/inode.c
index 6252b17..9e3c525 100644
--- a/fs/nilfs2/inode.c
+++ b/fs/nilfs2/inode.c
@@ -219,10 +219,10 @@ static int nilfs_writepage(struct page *page, struct writeback_control *wbc)

static int nilfs_set_page_dirty(struct page *page)
{
+ struct inode *inode = page->mapping->host;
int ret = __set_page_dirty_nobuffers(page);

if (page_has_buffers(page)) {
- struct inode *inode = page->mapping->host;
unsigned nr_dirty = 0;
struct buffer_head *bh, *head;

@@ -245,6 +245,10 @@ static int nilfs_set_page_dirty(struct page *page)

if (nr_dirty)
nilfs_set_file_dirty(inode, nr_dirty);
+ } else if (ret) {
+ unsigned nr_dirty = 1 << (PAGE_SHIFT - inode->i_blkbits);
+
+ nilfs_set_file_dirty(inode, nr_dirty);
}
return ret;
}
--
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-***@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Ryusuke Konishi
2014-09-18 14:56:25 UTC
Permalink
From: Andreas Rohner <andreas.rohner-***@public.gmane.org>

This bug leads to reproducible silent data loss, despite the use of
msync(), sync() and a clean unmount of the file system. It is easily
reproducible with the following script:

----------------[BEGIN SCRIPT]--------------------
mkfs.nilfs2 -f /dev/sdb
mount /dev/sdb /mnt

dd if=/dev/zero bs=1M count=30 of=/mnt/testfile

umount /mnt
mount /dev/sdb /mnt
CHECKSUM_BEFORE="$(md5sum /mnt/testfile)"

/root/mmaptest/mmaptest /mnt/testfile 30 10 5

sync
CHECKSUM_AFTER="$(md5sum /mnt/testfile)"
umount /mnt
mount /dev/sdb /mnt
CHECKSUM_AFTER_REMOUNT="$(md5sum /mnt/testfile)"
umount /mnt

echo "BEFORE MMAP:\t$CHECKSUM_BEFORE"
echo "AFTER MMAP:\t$CHECKSUM_AFTER"
echo "AFTER REMOUNT:\t$CHECKSUM_AFTER_REMOUNT"
----------------[END SCRIPT]--------------------

The mmaptest tool looks something like this (very simplified, with
error checking removed):

----------------[BEGIN mmaptest]--------------------
data = mmap(NULL, file_size - file_offset, PROT_READ | PROT_WRITE,
MAP_SHARED, fd, file_offset);

for (i = 0; i < write_count; ++i) {
memcpy(data + i * 4096, buf, sizeof(buf));
msync(data, file_size - file_offset, MS_SYNC))
}
----------------[END mmaptest]--------------------

The output of the script looks something like this:

BEFORE MMAP: 281ed1d5ae50e8419f9b978aab16de83 /mnt/testfile
AFTER MMAP: 6604a1c31f10780331a6850371b3a313 /mnt/testfile
AFTER REMOUNT: 281ed1d5ae50e8419f9b978aab16de83 /mnt/testfile

So it is clear, that the changes done using mmap() do not survive a
remount. This can be reproduced a 100% of the time. The problem was
introduced with the following commit:

136e877 nilfs2: fix issue of nilfs_set_page_dirty() for page at EOF
boundary

If the page was read with mpage_readpage() or mpage_readpages() for
example, then it has no buffers attached to it. In that case
page_has_buffers(page) in nilfs_set_page_dirty() will be false.
Therefore nilfs_set_file_dirty() is never called and the pages are never
collected and never written to disk.

This patch fixes the problem by also calling nilfs_set_file_dirty() if
the page has no buffers attached to it.

Signed-off-by: Andreas Rohner <andreas.rohner-***@public.gmane.org>
Tested-by: Andreas Rohner <andreas.rohner-***@public.gmane.org>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke-***@public.gmane.org>
Cc: <stable-***@public.gmane.org>
---
fs/nilfs2/inode.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/nilfs2/inode.c b/fs/nilfs2/inode.c
index 6252b17..9e3c525 100644
--- a/fs/nilfs2/inode.c
+++ b/fs/nilfs2/inode.c
@@ -219,10 +219,10 @@ static int nilfs_writepage(struct page *page, struct writeback_control *wbc)

static int nilfs_set_page_dirty(struct page *page)
{
+ struct inode *inode = page->mapping->host;
int ret = __set_page_dirty_nobuffers(page);

if (page_has_buffers(page)) {
- struct inode *inode = page->mapping->host;
unsigned nr_dirty = 0;
struct buffer_head *bh, *head;

@@ -245,6 +245,10 @@ static int nilfs_set_page_dirty(struct page *page)

if (nr_dirty)
nilfs_set_file_dirty(inode, nr_dirty);
+ } else if (ret) {
+ unsigned nr_dirty = 1 << (PAGE_SHIFT - inode->i_blkbits);
+
+ nilfs_set_file_dirty(inode, nr_dirty);
}
return ret;
}
--
1.7.9.4

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-***@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Andrew Morton
2014-09-18 19:17:08 UTC
Permalink
Post by Andreas Rohner
This bug leads to reproducible silent data loss, despite the use of
msync(), sync() and a clean unmount of the file system. It is easily
...
--- a/fs/nilfs2/inode.c
+++ b/fs/nilfs2/inode.c
@@ -219,10 +219,10 @@ static int nilfs_writepage(struct page *page, struct writeback_control *wbc)
static int nilfs_set_page_dirty(struct page *page)
{
+ struct inode *inode = page->mapping->host;
int ret = __set_page_dirty_nobuffers(page);
if (page_has_buffers(page)) {
- struct inode *inode = page->mapping->host;
unsigned nr_dirty = 0;
struct buffer_head *bh, *head;
@@ -245,6 +245,10 @@ static int nilfs_set_page_dirty(struct page *page)
if (nr_dirty)
nilfs_set_file_dirty(inode, nr_dirty);
+ } else if (ret) {
+ unsigned nr_dirty = 1 << (PAGE_SHIFT - inode->i_blkbits);
It's quite cosmetic, but it is conventional to use PAGE_CACHE_SHIFT here.
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-***@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Ryusuke Konishi
2014-09-18 23:41:07 UTC
Permalink
Post by Andrew Morton
Post by Andreas Rohner
This bug leads to reproducible silent data loss, despite the use of
msync(), sync() and a clean unmount of the file system. It is easily
...
--- a/fs/nilfs2/inode.c
+++ b/fs/nilfs2/inode.c
@@ -219,10 +219,10 @@ static int nilfs_writepage(struct page *page, struct writeback_control *wbc)
static int nilfs_set_page_dirty(struct page *page)
{
+ struct inode *inode = page->mapping->host;
int ret = __set_page_dirty_nobuffers(page);
if (page_has_buffers(page)) {
- struct inode *inode = page->mapping->host;
unsigned nr_dirty = 0;
struct buffer_head *bh, *head;
@@ -245,6 +245,10 @@ static int nilfs_set_page_dirty(struct page *page)
if (nr_dirty)
nilfs_set_file_dirty(inode, nr_dirty);
+ } else if (ret) {
+ unsigned nr_dirty = 1 << (PAGE_SHIFT - inode->i_blkbits);
It's quite cosmetic, but it is conventional to use PAGE_CACHE_SHIFT here.
Agreed.

Thanks,
Ryusuke Konishi
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-***@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Andreas Rohner
2014-09-19 06:12:46 UTC
Permalink
This bug leads to reproducible silent data loss, despite the use of
msync(), sync() and a clean unmount of the file system. It is easily
reproducible with the following script:

----------------[BEGIN SCRIPT]--------------------
mkfs.nilfs2 -f /dev/sdb
mount /dev/sdb /mnt

dd if=/dev/zero bs=1M count=30 of=/mnt/testfile

umount /mnt
mount /dev/sdb /mnt
CHECKSUM_BEFORE="$(md5sum /mnt/testfile)"

/root/mmaptest/mmaptest /mnt/testfile 30 10 5

sync
CHECKSUM_AFTER="$(md5sum /mnt/testfile)"
umount /mnt
mount /dev/sdb /mnt
CHECKSUM_AFTER_REMOUNT="$(md5sum /mnt/testfile)"
umount /mnt

echo "BEFORE MMAP:\t$CHECKSUM_BEFORE"
echo "AFTER MMAP:\t$CHECKSUM_AFTER"
echo "AFTER REMOUNT:\t$CHECKSUM_AFTER_REMOUNT"
----------------[END SCRIPT]--------------------

The mmaptest tool looks something like this (very simplified, with
error checking removed):

----------------[BEGIN mmaptest]--------------------
data = mmap(NULL, file_size - file_offset, PROT_READ | PROT_WRITE,
MAP_SHARED, fd, file_offset);

for (i = 0; i < write_count; ++i) {
memcpy(data + i * 4096, buf, sizeof(buf));
msync(data, file_size - file_offset, MS_SYNC))
}
----------------[END mmaptest]--------------------

The output of the script looks something like this:

BEFORE MMAP: 281ed1d5ae50e8419f9b978aab16de83 /mnt/testfile
AFTER MMAP: 6604a1c31f10780331a6850371b3a313 /mnt/testfile
AFTER REMOUNT: 281ed1d5ae50e8419f9b978aab16de83 /mnt/testfile

So it is clear, that the changes done using mmap() do not survive a
remount. This can be reproduced a 100% of the time. The problem was
introduced with the following commit:

136e877 nilfs2: fix issue of nilfs_set_page_dirty() for page at EOF
boundary

If the page was read with mpage_readpage() or mpage_readpages() for
example, then it has no buffers attached to it. In that case
page_has_buffers(page) in nilfs_set_page_dirty() will be false.
Therefore nilfs_set_file_dirty() is never called and the pages are never
collected and never written to disk.

This patch fixes the problem by also calling nilfs_set_file_dirty() if
the page has no buffers attached to it.

Signed-off-by: Andreas Rohner <andreas.rohner-***@public.gmane.org>
Tested-by: Andreas Rohner <andreas.rohner-***@public.gmane.org>
---
fs/nilfs2/inode.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/nilfs2/inode.c b/fs/nilfs2/inode.c
index 6252b17..6b0a241 100644
--- a/fs/nilfs2/inode.c
+++ b/fs/nilfs2/inode.c
@@ -219,10 +219,10 @@ static int nilfs_writepage(struct page *page, struct writeback_control *wbc)

static int nilfs_set_page_dirty(struct page *page)
{
+ struct inode *inode = page->mapping->host;
int ret = __set_page_dirty_nobuffers(page);

if (page_has_buffers(page)) {
- struct inode *inode = page->mapping->host;
unsigned nr_dirty = 0;
struct buffer_head *bh, *head;

@@ -245,6 +245,10 @@ static int nilfs_set_page_dirty(struct page *page)

if (nr_dirty)
nilfs_set_file_dirty(inode, nr_dirty);
+ } else if (ret) {
+ unsigned nr_dirty = 1 << (PAGE_CACHE_SHIFT - inode->i_blkbits);
+
+ nilfs_set_file_dirty(inode, nr_dirty);
}
return ret;
}
--
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-***@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Loading...