HBASE-29776: Log filtering in IncrementalBackupManager can lead to data loss #7582
base: master
🎊 +1 overall
This message was automatically generated.
krconv
left a comment
Looks great, and I wanted to note that this has resolved a number of inconsistencies for us. Nearly all of our backups involving a host that was taken offline recently (i.e. in the last ~30 days) were hitting some variation of this issue, where old WALs would be replayed and overwrite newer edits in the incremental backups.
Hi @hgromer, I agree HBASE-29776 is an issue (sorry for not responding sooner there, I was on vacation the past weeks), but I'm not yet convinced this is the right approach to fix it. It feels very complex to reason about, so I wonder if there isn't a simpler approach. Already wanted to give some intermediate feedback while I think a bit more about it.
- Since newTimestamps never is pruned, the entry in the backup table will keep growing over time.
- newTimestamps will end up being written in BackupSystemTable#createPutForWriteRegionServerLogTimestamp, but these changes no longer match the javadoc of that method.
My suggestion is to simplify all changes in this file to (the fix for the excluded log files is also needed):
LOG.info("Execute roll log procedure for incremental backup ...");
long rollStartTs = EnvironmentEdgeManager.currentTime();
BackupUtils.logRoll(conn, backupInfo.getBackupRootDir(), conf);
Map<String, Long> rollTimestamps = readRegionServerLastLogRollResult();
Map<String, Long> newTimestamps =
  rollTimestamps.entrySet().stream()
    // Region servers that are offline since the last backup will have old roll timestamps,
    // prune their information here, as it is not relevant to be stored or used for finding
    // the relevant logs.
    .filter(entry -> entry.getValue() > rollStartTs)
    .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));

// This method needs to be adjusted to use "rollStartTs" if an entry is not found in newTimestamps.
// Or alternatively: getLogFilesForNewBackup(previousTimestampMins,
//   DefaultedMap.defaultedMap(newTimestamps, rollStartTs), conf, savedStartCode);
logList = getLogFilesForNewBackup(previousTimestampMins, newTimestamps, conf, savedStartCode);
Then when finding which logs to include, these are the options:
- server found in both previous and newTimestamps: a region server that is unchanged, include logs newer than previous and older than newTimestamps
- server found in only previous: a region server that has gone offline, all logs will be older than rollStartTs and should be included
- server found in only newTimestamps: a new region server, include all logs that are older than the corresponding newTimestamp
- server found in neither: a region server that was started and went back offline in between the previous and current backup, all logs will be older than rollStartTs and should be included
This approach will keep newTimestamps limited to the relevant entries. We could consider cleaning up the entries for readRegionServerLastLogRollResult as well, but left that out of scope for now.
Similar code suffices in the FullTableBackupClient.
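To make the four cases concrete, here is a minimal sketch of the inclusion check. It is not the actual getLogFilesForNewBackup logic: WalSelectionSketch and shouldInclude are hypothetical names, WALs are reduced to a server name plus a creation timestamp, and it assumes that a server present in previous keeps its old boundary as a lower bound even when it is missing from newTimestamps.

```java
import java.util.Map;

/** Hypothetical helper illustrating the four cases; not part of HBase. */
final class WalSelectionSketch {

  /**
   * Decides whether a WAL created at walTs on the given server belongs in the backup.
   *
   * @param previous    per-server boundaries stored by the previous backup
   * @param current     per-server roll timestamps, already pruned to rolls after rollStartTs
   * @param rollStartTs client-side timestamp taken just before the log roll (assumption)
   */
  static boolean shouldInclude(String server, long walTs, Map<String, Long> previous,
      Map<String, Long> current, long rollStartTs) {
    // A server without an earlier boundary has nothing backed up yet.
    long lowerBound = previous.getOrDefault(server, Long.MIN_VALUE);
    // A server that was not rolled falls back to the roll start timestamp.
    long upperBound = current.getOrDefault(server, rollStartTs);
    return walTs > lowerBound && walTs <= upperBound;
  }
}
```

A single comparison with two defaulted bounds covers all four cases, which is roughly what the DefaultedMap alternative in the snippet above would achieve for the upper bound.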
No worries, appreciate you taking the time to look here. We've found a slew of bugs in the WAL retention system and I think it's important to get this right, so happy to iterate on feedback.
Since newTimestamps never is pruned, the entry in the backup table will keep growing over time.
Agree with this. It's something we should take a look at.
To your point about WAL retention and boundaries, conceptually, I've been trying to think about it from the perspective of "which WAL files have been backed up". Otherwise you run into issues when a host goes offline.
For example, in the case where
We have a Server A with row X
- An incremental backup is taken, A is rolled
- A writes more WAL files
- Row X is deleted
- A major compaction happens
- A full backup is taken, WALs are rolled, but we don't update the timestamp for A. Row X is not included in the full backup
- An incremental backup is taken, we still have a very old roll time for this host. Row X is backed up again and shows up in the backup even though we had previously deleted it (but the tombstone no longer exists).
So we've resurfaced dead data that shouldn't be included. It's problematic to back up WALs that are very old. So this is the main culprit for the added complexity here
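A stripped-down illustration of the boundary arithmetic behind this example, assuming WALs are identified only by their creation time. The class and method names are made up for illustration and are not HBase APIs.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical sketch: which WALs get replayed for a given stored boundary. */
final class StaleBoundarySketch {

  /** Returns the creation timestamps of all WALs newer than the stored boundary. */
  static List<Long> walsToReplay(List<Long> walCreationTimes, long boundary) {
    return walCreationTimes.stream()
        .filter(ts -> ts > boundary)
        .collect(Collectors.toList());
  }
}
```

Suppose Server A has WALs created at 150, 250 and 550, and the full backup rolled at 500. With the stale boundary of 100 left over from the first incremental backup, all three WALs are selected, including the two that predate the full backup. If the full backup had advanced the boundary to 500, only the WAL at 550 would be selected.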
Additionally, I'm wary of comparing timestamps across hosts, which is why I avoided generating a boundary timestamp in the backup process, which runs client side, and opted to compare WAL timestamps, which are generated by the same host.
server found in both previous and newTimestamps: a region server that is unchanged, include logs newer than previous and older than newTimestamps
server found in only previous: a region server that has gone offline, all logs will be older than rollStartTs and should be included
If I understand correctly, we'd run into this issue
server found in only newTimestamps: a new region server, include all logs that are older than the corresponding newTimestamp
server found in neither: a region server that was started and went back offline in between the previous and current backup, all logs will be older than rollStartTs and should be included
Agree here for the first backup where this happens, but we never update the host TS, so we'll continue to back up the WAL files and run into the issue mentioned above.
I'd be more than happy to find a simpler solution though, I really don't love how complex this WAL retention boundary logic is; but struggled to do so and also avoid corrupting the data
Map<String, Long> newTimestamps =
  rollTimestamps.entrySet().stream()
    // Region servers that are offline since the last backup will have old roll timestamps,
    // prune their information here, as it is not relevant to be stored or used for finding
    // the relevant logs.
    .filter(entry -> entry.getValue() > rollStartTs)
    .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));

I'm not sure I agree with this logic. The RS not being rolled this time around doesn't mean we've backed up all the files from that RS that we need to back up. It just means the host doesn't exist on the cluster at the moment.
- Server A is backed up
- Server A generates more WAL files
- Server A is removed from the cluster
- New backup occurs, but we don't get a roll time for Server A so we ignore its files
We need to back up the files that were generated between the last backup and this backup.
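The pruning step under discussion can be looked at in isolation. The following is only a sketch with hypothetical names (RollPruningSketch, pruneStaleRolls), using plain maps in place of the backup system table:

```java
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical sketch of pruning roll results that predate the current roll. */
final class RollPruningSketch {

  /** Keeps only servers whose last roll happened after rollStartTs. */
  static Map<String, Long> pruneStaleRolls(Map<String, Long> rollTimestamps, long rollStartTs) {
    return rollTimestamps.entrySet().stream()
        .filter(entry -> entry.getValue() > rollStartTs)
        .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
  }
}
```

With rollTimestamps of {serverA=100, serverB=1000} and a rollStartTs of 900, only serverB survives. Server A's WALs written after its last backup are still on disk, so whatever consumes the pruned map has to treat a missing entry as "not rolled this time" rather than "nothing left to back up".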
For example, in the case where
We have a Server A with row X
1. An incremental backup is taken, A is rolled
2. A writes more WAL files
3. Row X is deleted
4. A major compaction happens
5. A full backup is taken, WALs are rolled, but we don't update the timestamp for A. Row X is _not_ included in the full backup
6. An incremental backup is taken, we still have a very old roll time for this host. Row X is backed up again, and shows up in the backup even though we had previously deleted (but the tombstone no longer exists).

So we've resurfaced dead data that shouldn't be included. It's problematic to back up WALs that are very old. So this is the main culprit for the added complexity here
I agree with your example, and agree that the change to FullTableBackupClient would fix this. It also shrinks the newTimestamps, which is nice.
Additionally, I'm wary of comparing timestamps across hosts, which is why I avoided generating a boundary timestamp in the backup process, which runs client side, and opted to compare WAL timestamps, which are generated by the same host.
I see, I originally thought it might be less code to generate a client pre-roll timestamp, but it doesn't really simplify things. For the FullTableBackupClient at least, the current code is simple enough. I would suggest to split off a dedicated calculateNewTimestamps method with some proper javadoc. (Still thinking about the incremental case.)
BackupUtils.logRoll(conn, backupInfo.getBackupRootDir(), conf);

newTimestamps = readRegionServerLastLogRollResult();
Map<String, Long> newTimestamps = readRegionServerLastLogRollResult();
This method does an unnecessary scan, since you override all entries in the code you add below.
long logTs = BackupUtils.getCreationTime(logPath);
Long latestTimestampToIncludeInBackup = newTimestamps.get(logHost);
if (latestTimestampToIncludeInBackup == null || logTs > latestTimestampToIncludeInBackup) {
  LOG.info("Updating backup boundary for inactive host {}: timestamp={}", logHost, logTs);
Is this log line correct? Are we dealing with an inactive host if latestTimestampToIncludeInBackup != null?
logList = excludeProcV2WALs(logList);
backupInfo.setIncrBackupFileList(logList);

// Update boundaries based on WALs that will be backed up
For my understanding, is this code block an optimization, or a necessary fix for a specific case of appearing/disappearing region servers?
Figured it out, it is to update the newTimestamps entries for regionservers that have since gone offline, but for which the logs are now backed up.
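As I read it, the boundary update amounts to something like the following. This is a rough sketch with hypothetical types (BoundaryUpdateSketch, Wal), not the code in the PR; a WAL is reduced to its host and creation time.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: advance per-host boundaries to the newest WAL being backed up. */
final class BoundaryUpdateSketch {

  /** Stand-in for a WAL file: the host that wrote it and its creation time. */
  static final class Wal {
    final String host;
    final long creationTime;

    Wal(String host, long creationTime) {
      this.host = host;
      this.creationTime = creationTime;
    }
  }

  /** Returns a copy of newTimestamps where each host's boundary covers all backed-up WALs. */
  static Map<String, Long> advanceBoundaries(Map<String, Long> newTimestamps, List<Wal> backedUp) {
    Map<String, Long> result = new HashMap<>(newTimestamps);
    for (Wal wal : backedUp) {
      // Insert the host if it is missing, otherwise keep the larger timestamp.
      result.merge(wal.host, wal.creationTime, Long::max);
    }
    return result;
  }
}
```

Hosts that were rolled during this backup already have a boundary newer than any of their backed-up WALs, so in practice only entries for hosts that have gone offline (or were never recorded) change.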
Haven't gotten around to processing the changes in this file, but can you sketch why they are needed? Since your original ticket only discusses an issue with incremental backups.
Figured it out, it's to ensure the newTimestamps for no longer active region servers are updated.
Path logPath = new Path(logFile);
String logHost = BackupUtils.parseHostFromOldLog(logPath);
if (logHost == null) {
  logHost = BackupUtils.parseHostNameFromLogFile(logPath.getParent());
This method seems to support parsing old log names as well. Is it possible to merge it with the parsing two lines above? Though I am confused as to why the former uses logPath and the latter logPath.getParent().
  resultLogFiles.add(currentLogFile);
}

// It is possible that a host in .oldlogs is an obsolete region server
I think removing this block entirely is wrong. I believe the semantics of newestTimestamps is "ensure we have everything backed up to this timestamp". So if currentLogTS > newTimestamp is true, we should indeed skip this file.
So I think this block should be kept, but adjusted to:
if (newTimestamp != null && currentLogTS > newTimestamp) {
  newestLogs.add(currentLogFile);
}
I also think a similar issue is present for the .logs in this method.
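For reference, this is how the adjusted check would combine with the lower bound. It is a sketch only: OldLogFilterSketch and selectForBackup are hypothetical, log files are reduced to their timestamps, and treating a missing previous timestamp as "no lower bound" is an assumption rather than what the existing method does.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the two-sided filter on old log files. */
final class OldLogFilterSketch {

  /**
   * Selects logs newer than previousTimestamp and not newer than newTimestamp.
   * A null bound means that side is unbounded.
   */
  static List<Long> selectForBackup(List<Long> logTimestamps, Long previousTimestamp,
      Long newTimestamp) {
    List<Long> selected = new ArrayList<>();
    for (long currentLogTS : logTimestamps) {
      boolean afterPrevious = previousTimestamp == null || currentLogTS > previousTimestamp;
      // Mirrors the suggested condition: logs newer than the new boundary are left for later.
      boolean tooNew = newTimestamp != null && currentLogTS > newTimestamp;
      if (afterPrevious && !tooNew) {
        selected.add(currentLogTS);
      }
    }
    return selected;
  }
}
```

With logs at 50, 150 and 250, a previous boundary of 100 and a new boundary of 200, only the log at 150 is selected. With no new boundary for the host, both 150 and 250 are selected.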
From your comment in HBASE-29776:
newTimestamp represents the last time a backup rolled the WAL on the RS. If the RegionServer isn't running and therefore isn't able to roll the WAL again, then this timestamp will be in the past, and we end up filtering out all WAL files that were updated since then. Why would we filter out oldWALs that have been created since then? That seems wrong as well
Your comment is correct, but I think the better fix is to ensure the newTimestamps are correctly updated (as you do in your other changes). Removing this block would result in too many logs being included in the backup.
if (newTimestamp != null && currentLogTS > newTimestamp) {
  newestLogs.add(currentLogFile);
}

I don't think so, this would exclude all WAL files between last backup (previousTimestamps) and the current log roll (newTimestamp). Unless I'm misunderstanding
I think the relevant filtering out for very old log files happens here
So this logic is safe to remove imo
  }
  allLogs.addAll(Arrays.asList(fs.listStatus(hostLogDir.getPath())));
}
allLogs.addAll(Arrays.asList(fs.listStatus(oldLogDir)));
There's apparently a config key hbase.separate.oldlogdir.by.regionserver that adds the server address as a subfolder, so this line will not work as expected when that's set to true. (Solution is to also check files that are one folder deeper.)
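A rough sketch of the "one folder deeper" listing, written against java.nio.file purely so that it is self-contained. The real code would go through the Hadoop FileSystem that the surrounding method already uses, and OldLogListingSketch and listOldLogs are made-up names.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: list old WALs whether or not they sit in per-server subfolders. */
final class OldLogListingSketch {

  /** Returns files directly under oldLogDir plus files one level deeper. */
  static List<Path> listOldLogs(Path oldLogDir) throws IOException {
    List<Path> logs = new ArrayList<>();
    try (DirectoryStream<Path> entries = Files.newDirectoryStream(oldLogDir)) {
      for (Path entry : entries) {
        if (Files.isDirectory(entry)) {
          // Per-server subfolder: collect the files inside it.
          try (DirectoryStream<Path> nested = Files.newDirectoryStream(entry)) {
            for (Path log : nested) {
              if (Files.isRegularFile(log)) {
                logs.add(log);
              }
            }
          }
        } else {
          logs.add(entry);
        }
      }
    }
    return logs;
  }
}
```

Handling both layouts in one pass avoids having to read the config key at all, at the cost of one extra listing per subfolder.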
Question: when exactly (and by what) are WALs moved to the oldWals folder (oldLogDir)?
cc @DieterDP-ng @krconv