Discussion:
[PATCH 0/1] add missing blkdev_issue_flush() to nilfs_sync_fs()
Andreas Rohner
2014-08-25 09:30:17 UTC
Permalink
Hi,

I do not know, if this patch is really necessary or not. I am not an
expert in how the BARRIER flag should be interpreted. So this patch is
more like a question, than a fix for any real bug.

Looking over the code I noticed, that nilfs_sync_file() gets called by
the fsync() syscall and it basically constructs a new partial segment
and calls blkdev_issue_flush().

nilfs_ioctl_sync(), which is used by the cleaner, also essentially works
the same way. It writes all the dirty files to disk and calls
blkdev_issue_flush().

nilfs_sync_fs() also writes out all the dirty files, but there is no
blkdev_issue_flush() at the end. At first I thought, that
nilfs_commit_super() may flush the block device anyway and therefore no
additional flush is necessary, but nilfs_sb_dirty(nilfs) is only set to
true if a new segment is started. So in the following scenario data
could be lost despite a call to sync():

1. Write out less data than a full segment
2. Call sync()
3. nilfs_sb_dirty() is false and nilfs_commit_super() is NOT called
4. Cut power to the device
5. Data loss

As I stated above, I am not sure if this is really necessary. Maybe I
have overlooked something obvious.

br,
Andreas Rohner


Andreas Rohner (1):
nilfs2: add missing blkdev_issue_flush() to nilfs_sync_fs()

fs/nilfs2/super.c | 6 ++++++
1 file changed, 6 insertions(+)
--
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-***@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Andreas Rohner
2014-08-25 09:30:18 UTC
Permalink
This patch adds a call to blkdev_issue_flush() to nilfs_sync_fs(), which
is the nilfs implementation of the sync() syscall. If the BARRIER
mount option is set, both the nilfs implementation of fsync() and nilfs'
custom ioctl version of sync() used by the cleaner, use
blkdev_issue_flush() to guarantee that the data is written to the
underlying device. To get the same behaviour and guarantees for the
sync() syscall, blkdev_issue_flush() should also be called in
nilfs_sync_fs().

Signed-off-by: Andreas Rohner <andreas.rohner-***@public.gmane.org>
---
fs/nilfs2/super.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/fs/nilfs2/super.c b/fs/nilfs2/super.c
index 228f5bd..1f21e81 100644
--- a/fs/nilfs2/super.c
+++ b/fs/nilfs2/super.c
@@ -514,6 +514,12 @@ static int nilfs_sync_fs(struct super_block *sb, int wait)
}
up_write(&nilfs->ns_sem);

+ if (wait && !err && nilfs_test_opt(nilfs, BARRIER)) {
+ err = blkdev_issue_flush(sb->s_bdev, GFP_KERNEL, NULL);
+ if (err != -EIO)
+ err = 0;
+ }
+
return err;
}
--
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-***@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Ryusuke Konishi
2014-08-27 00:29:59 UTC
Permalink
Hi Andreas,
Post by Andreas Rohner
Hi,
I do not know, if this patch is really necessary or not. I am not an
expert in how the BARRIER flag should be interpreted. So this patch is
more like a question, than a fix for any real bug.
Looking over the code I noticed, that nilfs_sync_file() gets called by
the fsync() syscall and it basically constructs a new partial segment
and calls blkdev_issue_flush().
nilfs_ioctl_sync(), which is used by the cleaner, also essentially works
the same way. It writes all the dirty files to disk and calls
blkdev_issue_flush().
nilfs_sync_fs() also writes out all the dirty files, but there is no
blkdev_issue_flush() at the end. At first I thought, that
nilfs_commit_super() may flush the block device anyway and therefore no
additional flush is necessary, but nilfs_sb_dirty(nilfs) is only set to
true if a new segment is started. So in the following scenario data
1. Write out less data than a full segment
2. Call sync()
3. nilfs_sb_dirty() is false and nilfs_commit_super() is NOT called
4. Cut power to the device
5. Data loss
As I stated above, I am not sure if this is really necessary. Maybe I
have overlooked something obvious.
Your indication is right, we have a data integration issue for the
"nilfs_sb_dirty() is false" case in nilfs_sync_fs().

But, I rather would mitigate the cache flush overhead keeping data
integerity instead of simply adding the third blkdev_issue_flush()
call.

We don't have to call blkdev_issue_flush() if the last log was
written synchronously with a cache flush operation.
(this logic looks to be feasible by adding a flag.)

Also, the cache flush is not needed after writing super block; a disk
cache flush is needed BEFORE writing a super block to ensure that the
super block is pointing to a valid log, but a succeeding flush
operation is not needed because the pointer information is recoverable
with mount time recovery.

nilfs_sync_super() uses both FLUSH/FUA options for writing the primary
super block and the FUA option may be superfluous in that sense.
(we need to understand the precise semantics)

Can you improve the patch considering these view points ?

Thanks,
Ryusuke Konishi
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-***@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Andreas Rohner
2014-08-27 10:14:33 UTC
Permalink
Hi Ryusuke,
Post by Ryusuke Konishi
Post by Andreas Rohner
Looking over the code I noticed, that nilfs_sync_file() gets called by
the fsync() syscall and it basically constructs a new partial segment
and calls blkdev_issue_flush().
nilfs_ioctl_sync(), which is used by the cleaner, also essentially works
the same way. It writes all the dirty files to disk and calls
blkdev_issue_flush().
nilfs_sync_fs() also writes out all the dirty files, but there is no
blkdev_issue_flush() at the end. At first I thought, that
nilfs_commit_super() may flush the block device anyway and therefore no
additional flush is necessary, but nilfs_sb_dirty(nilfs) is only set to
true if a new segment is started. So in the following scenario data
1. Write out less data than a full segment
2. Call sync()
3. nilfs_sb_dirty() is false and nilfs_commit_super() is NOT called
4. Cut power to the device
5. Data loss
As I stated above, I am not sure if this is really necessary. Maybe I
have overlooked something obvious.
Your indication is right, we have a data integration issue for the
"nilfs_sb_dirty() is false" case in nilfs_sync_fs().
But, I rather would mitigate the cache flush overhead keeping data
integerity instead of simply adding the third blkdev_issue_flush()
call.
We don't have to call blkdev_issue_flush() if the last log was
written synchronously with a cache flush operation.
(this logic looks to be feasible by adding a flag.)
Also, the cache flush is not needed after writing super block; a disk
cache flush is needed BEFORE writing a super block to ensure that the
super block is pointing to a valid log, but a succeeding flush
operation is not needed because the pointer information is recoverable
with mount time recovery.
Yes that's true.
Post by Ryusuke Konishi
nilfs_sync_super() uses both FLUSH/FUA options for writing the primary
super block and the FUA option may be superfluous in that sense.
(we need to understand the precise semantics)
I have looked into that. According to
"Documentation/block/writeback_cache_control.txt" the FLUSH option makes
sure, that all previous write requests are in non-volatile storage and
the FUA option makes sure, that the current request only returns
successful if it was written to non-volatile storage. Since it doesn't
matter if the write to the super block is lost, because it can be
recovered, the FUA option is not necessary. But the name of the function
nilfs_sync_super() kind of suggests, that it guarantees, that the super
block is written to non-volatile storage so I don't know if we should
remove the FUA flag.

This also means, that if nilfs_sync_super() was called no additional
blkdev_issue_flush() is necessary. So if the super block was written
during segment construction there is also no need for an additional
blkdev_issue_flush().
Post by Ryusuke Konishi
Can you improve the patch considering these view points ?
Yes I will work on it.

br,
Andreas Rohner
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-***@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Loading...