NameNode bootstrapStandby

- Process
- Sync metadata command
- Source code interpretation
  - Configuration parsing
  - Sync metadata
  - Download fsImage file
    - Check whether the shared edits log exists
    - Download fsImage
  - Finally, metadata sync completed
Process
- Obtain the nameserviceId and namenodeId from the configuration.
- Obtain the other NameNode's information and establish RPC communication with it.
- Check whether the configuration item dfs.namenode.support.allow.format allows formatting. In production environments it is recommended to set this to false, to prevent formatting existing data by mistake.
- Get the directories to format (the fsImage and edits storage directories, plus the sharedEditsDirs configuration).
- Format the directories: create the current directory and write the VERSION and seen_txid files.
- Check whether the edit log files between the last checkpoint and the latest curTxId exist in QJM.
- Download the fsImage file generated by the latest checkpoint from the remote NameNode.

That is the whole process.
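As a hedged illustration of the format guard mentioned above: dfs.namenode.support.allow.format can be set in hdfs-site.xml (the property name comes from the Hadoop configuration reference; the value here is only a suggested production setting):

```xml
<!-- hdfs-site.xml: illustrative production setting -->
<property>
  <!-- When false, the NameNode refuses any format request,
       protecting existing metadata from an accidental "hdfs namenode -format". -->
  <name>dfs.namenode.support.allow.format</name>
  <value>false</value>
</property>
```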
Sync metadata command
```shell
hdfs namenode [-bootstrapStandby [-force] [-nonInteractive] [-skipSharedEditsCheck]]
# Commonly used command
hdfs namenode -bootstrapStandby
```
Source Code Interpretation
Configuration parsing
The entry point is the org.apache.hadoop.hdfs.server.namenode.ha.BootstrapStandby.run method. This step does the following:
- Get cluster configuration information
- Find the remote Namenode and get the first one
- Check whether it can be formatted
- Call the specific synchronization process
```java
public int run(String[] args) throws Exception {
  // parse command line arguments
  parseArgs(args);
  // Disable using the RPC tailing mechanism for bootstrapping the standby
  // since it is less efficient in this case; see HDFS-14806
  conf.setBoolean(DFSConfigKeys.DFS_HA_TAILEDITS_INPROGRESS_KEY, false);
  // Parse configuration, get cluster information, find the remote NN
  parseConfAndFindOtherNN();
  NameNode.checkAllowFormat(conf);

  InetSocketAddress myAddr = DFSUtilClient.getNNAddress(conf);
  SecurityUtil.login(conf, DFS_NAMENODE_KEYTAB_FILE_KEY,
      DFS_NAMENODE_KERBEROS_PRINCIPAL_KEY, myAddr.getHostName());

  return SecurityUtil.doAsLoginUserOrFatal(new PrivilegedAction<Integer>() {
    @Override
    public Integer run() {
      try {
        // Execute the metadata sync
        return doRun();
      } catch (IOException e) {
        throw new RuntimeException(e);
      }
    }
  });
}
```
Sync metadata
doRun drives the whole process, mainly doing the following:
- Create a proxy object for remoteNN
- format directory file, create VERSION/seen_txid file
- Ready to download fsImage
```java
private int doRun() throws IOException {
  // find an active NN
  NamenodeProtocol proxy = null;
  NamespaceInfo nsInfo = null;
  boolean isUpgradeFinalized = false;
  RemoteNameNodeInfo proxyInfo = null;
  // This whole section creates the NN proxy object: loop over the remote
  // NNs and take the first one that answers.
  for (int i = 0; i < remoteNNs.size(); i++) {
    proxyInfo = remoteNNs.get(i);
    InetSocketAddress otherIpcAddress = proxyInfo.getIpcAddress();
    proxy = createNNProtocolProxy(otherIpcAddress);
    try {
      // Get the namespace from any active NN. If you just formatted the primary NN and are
      // bootstrapping the other NNs from that layout, it will only contact the single NN.
      // However, if the cluster is already running and you are adding a NN later (e.g.
      // replacing a failed NN), then this will bootstrap from any node in the cluster.
      nsInfo = proxy.versionRequest();
      isUpgradeFinalized = proxy.isUpgradeFinalized();
      break;
    } catch (IOException ioe) {
      LOG.warn("Unable to fetch namespace information from remote NN at "
          + otherIpcAddress + ": " + ioe.getMessage());
      if (LOG.isDebugEnabled()) {
        LOG.debug("Full exception trace", ioe);
      }
    }
  }

  if (nsInfo == null) {
    LOG.error("Unable to fetch namespace information from any remote NN."
        + " Possible NameNodes: " + remoteNNs);
    return ERR_CODE_FAILED_CONNECT;
  }

  // Check the layout version (currently -66)
  if (!checkLayoutVersion(nsInfo)) {
    LOG.error("Layout version on remote node (" + nsInfo.getLayoutVersion()
        + ") does not match this node's layout version ("
        + HdfsServerConstants.NAMENODE_LAYOUT_VERSION + ")");
    return ERR_CODE_INVALID_VERSION;
  }

  // print cluster information
  System.out.println(
      "=====================================================\n"
      + "About to bootstrap Standby ID " + nnId + " from:\n"
      + "           Nameservice ID: " + nsId + "\n"
      + "        Other Namenode ID: " + proxyInfo.getNameNodeID() + "\n"
      + "  Other NN's HTTP address: " + proxyInfo.getHttpAddress() + "\n"
      + "  Other NN's IPC  address: " + proxyInfo.getIpcAddress() + "\n"
      + "             Namespace ID: " + nsInfo.getNamespaceID() + "\n"
      + "            Block pool ID: " + nsInfo.getBlockPoolID() + "\n"
      + "               Cluster ID: " + nsInfo.getClusterID() + "\n"
      + "           Layout version: " + nsInfo.getLayoutVersion() + "\n"
      + "       isUpgradeFinalized: " + isUpgradeFinalized + "\n"
      + "=====================================================");

  // Create the storage object to be formatted
  NNStorage storage = new NNStorage(conf, dirsToFormat, editUrisToFormat);

  if (!isUpgradeFinalized) {
    // ...upgrade-related code omitted
  } else if (!format(storage, nsInfo)) {
    // prompt the user to format storage; this step creates the
    // VERSION and seen_txid files
    return ERR_CODE_ALREADY_FORMATTED;
  }

  // download the fsimage from active namenode
  // Download the fsImage file from the remote NN over HTTP.
  int download = downloadImage(storage, proxy, proxyInfo);
  if (download != 0) {
    return download;
  }
  // ...part of the code omitted
}
```
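The proxy-selection loop at the top of doRun is just a "first responder wins" pattern: probe each candidate in order, keep the first that answers, and fail only when none do. It can be sketched outside Hadoop like this (RemoteNN and the probe function are hypothetical stand-ins for RemoteNameNodeInfo and proxy.versionRequest()):

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Minimal sketch (not Hadoop code) of the selection loop in doRun():
// try each remote NameNode in order and keep the first one that answers.
public class FirstResponderSketch {
    public record RemoteNN(String host, boolean reachable) {}

    public static Optional<RemoteNN> pickFirstReachable(List<RemoteNN> remotes,
                                                        Predicate<RemoteNN> probe) {
        for (RemoteNN nn : remotes) {
            try {
                if (probe.test(nn)) {
                    return Optional.of(nn);   // like the `break` after versionRequest()
                }
            } catch (RuntimeException e) {
                // like LOG.warn(...): skip this NN and try the next one
            }
        }
        return Optional.empty();              // maps to ERR_CODE_FAILED_CONNECT
    }

    public static void main(String[] args) {
        List<RemoteNN> remotes = List.of(
                new RemoteNN("nn1", false),
                new RemoteNN("nn2", true));
        RemoteNN chosen = pickFirstReachable(remotes, RemoteNN::reachable).orElseThrow();
        System.out.println(chosen.host()); // prints "nn2"
    }
}
```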
Download fsImage file
```java
private int downloadImage(NNStorage storage, NamenodeProtocol proxy,
    RemoteNameNodeInfo proxyInfo) throws IOException {
  // Load the newly formatted image, using all of the directories
  // (including shared edits)
  // Get the txid of the latest checkpoint
  final long imageTxId = proxy.getMostRecentCheckpointTxId();
  // Get the current transaction id
  final long curTxId = proxy.getTransactionID();
  FSImage image = new FSImage(conf);
  try {
    // Assign the cluster information to the image
    image.getStorage().setStorageInfo(storage);
    // Create a journalSet object and set its state to OPEN_FOR_READING
    image.initEditLog(StartupOption.REGULAR);
    assert image.getEditLog().isOpenForRead() :
        "Expected edit log to be open for read";

    // Ensure that we have enough edits already in the shared directory to
    // start up from the last checkpoint on the active.
    // Fetch the edit logs from imageTxId to curTxId from the shared QJM.
    if (!skipSharedEditsCheck
        && !checkLogsAvailableForRead(image, imageTxId, curTxId)) {
      return ERR_CODE_LOGS_UNAVAILABLE;
    }

    // Download the fsImage over HTTP as fsimage.ckpt and write it
    // into the storage directory.
    // Download that checkpoint into our storage directories.
    MD5Hash hash = TransferFsImage.downloadImageToStorage(
        proxyInfo.getHttpAddress(), imageTxId, storage, true, true);
    // Save the md5 of the fsImage, and rename the fsimage.ckpt file
    // to its final fsimage name.
    image.saveDigestAndRenameCheckpointImage(NameNodeFile.IMAGE, imageTxId, hash);
    // Write seen_txid to the formatted image directories.
    storage.writeTransactionIdFileToStorage(imageTxId, NameNodeDirType.IMAGE);
  } catch (IOException ioe) {
    throw ioe;
  } finally {
    image.close();
  }
  return 0;
}
```
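The "download as fsimage.ckpt, verify MD5, then rename to the final name" sequence is what makes the transfer safe: a half-downloaded image never appears under its final name. A self-contained sketch of that pattern (not Hadoop code; the file-name convention mimics NNStorage's, and the byte array stands in for the HTTP download):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

// Minimal sketch of the download-to-.ckpt / save-digest / rename pattern
// used by TransferFsImage.downloadImageToStorage and
// saveDigestAndRenameCheckpointImage.
public class CheckpointRenameSketch {
    public static Path saveCheckpoint(Path storageDir, long txId, byte[] imageBytes)
            throws IOException, NoSuchAlgorithmException {
        String suffix = String.format("%019d", txId);        // zero-padded txid
        Path ckpt = storageDir.resolve("fsimage.ckpt_" + suffix);
        Path finalName = storageDir.resolve("fsimage_" + suffix);
        Files.write(ckpt, imageBytes);                       // the "download" step
        byte[] md5 = MessageDigest.getInstance("MD5").digest(imageBytes);
        Files.writeString(storageDir.resolve("fsimage_" + suffix + ".md5"),
                HexFormat.of().formatHex(md5));              // persist the digest
        // Promotion: the image only becomes visible under its final name
        // once it is fully written and hashed.
        Files.move(ckpt, finalName, StandardCopyOption.ATOMIC_MOVE);
        return finalName;
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("nnstorage");
        Path img = saveCheckpoint(dir, 0L, "image-data".getBytes());
        System.out.println(img.getFileName()); // fsimage_0000000000000000000
    }
}
```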
Check whether the shared edits log exists
First look at checkLogsAvailableForRead. This step obtains the edit-log streams between imageTxId and curTxId from QJM. Focus directly on the org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams method:
```java
public Collection<EditLogInputStream> selectInputStreams(long fromTxId,
    long toAtLeastTxId, MetaRecoveryContext recovery, boolean inProgressOk,
    boolean onlyDurableTxns) throws IOException {
  List<EditLogInputStream> streams = new ArrayList<EditLogInputStream>();
  synchronized (journalSetLock) {
    Preconditions.checkState(journalSet.isOpen(),
        "Cannot call selectInputStreams() on closed FSEditLog");
    // Fetch the edit logs from the shared QJM and collect them
    selectInputStreams(streams, fromTxId, inProgressOk, onlyDurableTxns);
  }
  try {
    // Check whether there is a gap in the transaction ids
    checkForGaps(streams, fromTxId, toAtLeastTxId, inProgressOk);
  } catch (IOException e) {
    if (recovery != null) {
      // If recovery mode is enabled, continue loading even if we know we
      // can't load up to toAtLeastTxId.
      LOG.error("Exception while selecting input streams", e);
    } else {
      closeAllStreams(streams);
      throw e;
    }
  }
  return streams;
}
```
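The invariant checkForGaps enforces can be stated simply: the selected segments, each covering a [first, last] txid range, must cover fromTxId through toAtLeastTxId with no missing transactions. A standalone sketch of that check (not Hadoop code; Segment is a hypothetical stand-in for an EditLogInputStream's txid range, and segments are assumed sorted by first txid):

```java
import java.util.List;

// Minimal sketch of the coverage check behind FSEditLog.checkForGaps.
public class GapCheckSketch {
    public record Segment(long first, long last) {}

    public static boolean coversWithoutGaps(List<Segment> segs,
                                            long fromTxId, long toAtLeastTxId) {
        long next = fromTxId;                  // next txid we still need
        for (Segment s : segs) {
            if (s.first() > next) {
                return false;                  // hole before this segment: a gap
            }
            next = Math.max(next, s.last() + 1);
        }
        return next > toAtLeastTxId;           // everything up to toAtLeastTxId covered
    }

    public static void main(String[] args) {
        List<Segment> contiguous = List.of(new Segment(1, 100), new Segment(101, 150));
        List<Segment> withGap = List.of(new Segment(1, 100), new Segment(120, 150));
        System.out.println(coversWithoutGaps(contiguous, 1, 150)); // true
        System.out.println(coversWithoutGaps(withGap, 1, 150));    // false
    }
}
```

If the check fails (and recovery mode is off), bootstrapStandby returns ERR_CODE_LOGS_UNAVAILABLE, which is exactly what -skipSharedEditsCheck bypasses.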
Download fsImage
```java
public static MD5Hash downloadImageToStorage(URL fsName, long imageTxId,
    Storage dstStorage, boolean needDigest, boolean isBootstrapStandby)
    throws IOException {
  String fileid = ImageServlet.getParamStringForImage(null, imageTxId,
      dstStorage, isBootstrapStandby);
  String fileName = NNStorage.getCheckpointImageFileName(imageTxId);

  List<File> dstFiles = dstStorage.getFiles(NameNodeDirType.IMAGE, fileName);
  if (dstFiles.isEmpty()) {
    throw new IOException("No targets in destination storage!");
  }

  // download the image and return its md5
  MD5Hash hash = getFileClient(fsName, fileid, dstFiles, dstStorage, needDigest);
  LOG.info("Downloaded file " + dstFiles.get(0).getName()
      + " size " + dstFiles.get(0).length() + " bytes.");
  return hash;
}
```
Finally, metadata sync completed
The data directory of the newly bootstrapped node now contains:
```
└── current
    ├── fsimage_0000000000000000000
    ├── fsimage_0000000000000000000.md5
    ├── seen_txid
    └── VERSION

1 directory, 4 files
```
I hope this article is helpful to you. Remember to follow, comment, and favorite. Thank you!