[lldb] Edits and clarifications to DataFileCache comments, NFC by jasonmolenda · Pull Request #199787 · llvm/llvm-project

jasonmolenda · 2026-05-26T22:31:13Z

I was reading through Greg Clayton's DataFileCache PR and fixed a few small typeos as I went along.

I also had a little trouble understanding the two types of hashes that are calculated for a file, at first, and I tried to write comments for the relevant methods (in Module, ObjectFile, and DataFileCache) to be more explicit about their role and the role of the other hashes that are calculated. It may be more detail than necessary, but it would have been helpful for me while reading this through.

I was reading through Greg Clayton's DataFileCache PR and fixed a few small typeos as I went along. I also had a little trouble understanding the two types of hashes that are calculated for a file, at first, and I tried to write comments for the relevant methods (in Module, ObjectFile, and DataFileCache) to be more explicit about their role and the role of the other hashes that are calculated. It may be more detail than necessary, but it would have been helpful for me while reading this through.

llvmorg-github-actions · 2026-05-26T22:31:53Z

@llvm/pr-subscribers-lldb

Author: Jason Molenda (jasonmolenda)

Changes

I was reading through Greg Clayton's DataFileCache PR and fixed a few small typeos as I went along.

I also had a little trouble understanding the two types of hashes that are calculated for a file, at first, and I tried to write comments for the relevant methods (in Module, ObjectFile, and DataFileCache) to be more explicit about their role and the role of the other hashes that are calculated. It may be more detail than necessary, but it would have been helpful for me while reading this through.

Full diff: https://github.com/llvm/llvm-project/pull/199787.diff

9 Files Affected:

(modified) lldb/include/lldb/Core/DataFileCache.h (+15-7)
(modified) lldb/include/lldb/Core/Module.h (+9-5)
(modified) lldb/include/lldb/Symbol/ObjectFile.h (+8-4)
(modified) lldb/include/lldb/Symbol/Symbol.h (+2-2)
(modified) lldb/include/lldb/Symbol/Symtab.h (+1-1)
(modified) lldb/include/lldb/Utility/DataEncoder.h (+2-2)
(modified) lldb/source/Core/DataFileCache.cpp (+4-4)
(modified) lldb/source/Core/Mangled.cpp (+2-2)
(modified) lldb/source/Symbol/Symtab.cpp (+2-2)

diff --git a/lldb/include/lldb/Core/DataFileCache.h b/lldb/include/lldb/Core/DataFileCache.h
index 8a233afaff386..e4cbfe50b70c7 100644
--- a/lldb/include/lldb/Core/DataFileCache.h
+++ b/lldb/include/lldb/Core/DataFileCache.h
@@ -99,13 +99,21 @@ class DataFileCache {
 
 /// A signature for a given file on disk.
 ///
-/// Any files that are cached in the LLDB index cached need some data that
-/// uniquely identifies a file on disk and this information should be written
-/// into each cache file so we can validate if the cache file still matches
-/// the file we are trying to load cached data for. Objects can fill out this
-/// signature and then encode and decode them to validate the signatures
-/// match. If they do not match, the cache file on disk should be removed as
-/// it is out of date.
+/// There are two hashes for each file in the LLDB Index Cache.  One
+/// hash uniquely identifies the binary, incorporating the name/filepath,
+/// triple, architecture -- things that do not change as the same binary
+/// is recompiled with update code.  ObjectFile::GetCacheHash() and
+/// Module::GetHash() provide these hashes.
+///
+/// The second hash, calculated in this class, is based on the mod date
+/// and UUID of the binary and does change on every recompile of the same
+/// binary.
+///
+/// The two hashes allow us to identify a file uniquely in the LLDB Index
+/// Cache and re-use the cache if the existing entry is a match for both.
+/// It also allows us to remove the entry for a previous build of a specific
+/// file, and save the new recompiled binary's symbol table in the LLDB Index
+/// Cache in its place.
 struct CacheSignature {
   /// UUID of object file or module.
   std::optional<UUID> m_uuid;
diff --git a/lldb/include/lldb/Core/Module.h b/lldb/include/lldb/Core/Module.h
index b605a9e3b4edc..d42bdf1f22e9c 100644
--- a/lldb/include/lldb/Core/Module.h
+++ b/lldb/include/lldb/Core/Module.h
@@ -1008,23 +1008,27 @@ class Module : public std::enable_shared_from_this<Module>,
   /// file, then the hash should include the object name and object offset to
   /// ensure a unique hash. Some examples:
   /// - just a regular object file (mach-o, elf, coff, etc) should create a hash
-  /// - a universal mach-o file that contains to multiple architectures,
+  /// - a universal mach-o file that contains multiple architectures,
   ///   each architecture slice should have a unique hash even though they come
   ///   from the same file
   /// - a .o file inside of a BSD archive. Each .o file will have an object name
   ///   and object offset that should produce a unique hash. The object offset
   ///   is needed as BSD archive files can contain multiple .o files that have
   ///   the same name.
+  ///
+  /// This hash value does not change as the binary is recompiled during
+  /// development.  DataFileCache's CacheSignature() creates a hash based
+  /// on the mod time/UUID to reflect when a binary has been recompiled.
   uint32_t Hash();
 
   /// Get a unique cache key for the current module.
   ///
   /// The cache key must be unique for a file on disk and not change if the file
   /// is updated. This allows cache data to use this key as a prefix and as
-  /// files are modified in disk, we will overwrite the cache files. If one file
-  /// can contain multiple files, like a universal mach-o file or like a BSD
-  /// archive, the cache key must contain enough information to differentiate
-  /// these different files.
+  /// files are modified on disk, we will overwrite the previous cache file. If
+  /// one file can contain multiple files, like a universal mach-o file or like
+  /// BSD archive, the cache key must contain enough information to
+  /// differentiate these different files.
   std::string GetCacheKey();
 
   /// Get the global index file cache.
diff --git a/lldb/include/lldb/Symbol/ObjectFile.h b/lldb/include/lldb/Symbol/ObjectFile.h
index 794bf3c9dac47..84adf6e835b07 100644
--- a/lldb/include/lldb/Symbol/ObjectFile.h
+++ b/lldb/include/lldb/Symbol/ObjectFile.h
@@ -740,12 +740,16 @@ class ObjectFile : public std::enable_shared_from_this<ObjectFile>,
     return false;
   }
 
-  /// Get a hash that can be used for caching object file releated information.
+  /// Get a hash that can be used for caching object file related information.
   ///
   /// Data for object files can be cached between runs of debug sessions and
-  /// a module can end up using a main file and a symbol file, both of which
-  /// can be object files. So we need a unique hash that identifies an object
-  /// file when storing cached data.
+  /// a Module can end up using a ObjectFile and a SymbolFile, both of which
+  /// can be ObjectFiles in reality. We need a unique hash that identifies
+  /// an ObjectFile when storing cached data.
+  ///
+  /// The hash value returned does not change when the same binary is
+  /// rebuilt with changes; it does not incorporate a mod date or UUID
+  /// like DataFileCache's CacheSignature does.
   uint32_t GetCacheHash();
 
   static lldb::DataBufferSP MapFileData(const FileSpec &file, uint64_t Size,
diff --git a/lldb/include/lldb/Symbol/Symbol.h b/lldb/include/lldb/Symbol/Symbol.h
index 2ee70ced95a6d..77da9b779bcdf 100644
--- a/lldb/include/lldb/Symbol/Symbol.h
+++ b/lldb/include/lldb/Symbol/Symbol.h
@@ -276,11 +276,11 @@ class Symbol : public SymbolContextScope {
   ///
   /// \param offset_ptr
   ///   A pointer that contains the offset from which the data will be decoded
-  ///   from that gets updated as data gets decoded.
+  ///   from.  The offset_ptr offset value will be updated as data is read.
   ///
   /// \param section_list
   ///   A section list that allows lldb_private::Address objects to be filled
-  ///   in. The address information for symbols are serilized as file addresses
+  ///   in. The address information for symbols are serialized as file addresses
   ///   and must be converted into Address objects with the right section and
   ///   offset.
   ///
diff --git a/lldb/include/lldb/Symbol/Symtab.h b/lldb/include/lldb/Symbol/Symtab.h
index bb3e7ab65519c..224de9e775859 100644
--- a/lldb/include/lldb/Symbol/Symtab.h
+++ b/lldb/include/lldb/Symbol/Symtab.h
@@ -208,7 +208,7 @@ class Symtab {
   /// time when the debugger starts up. The index cache file for the symbol
   /// table has the modification time set to the same time as the main module.
   /// If the cache file exists and the modification times match, we will load
-  /// the symbol table from the serlized cache file.
+  /// the symbol table from the serialized cache file.
   ///
   /// \return
   ///   True if the symbol table was successfully loaded from the index cache,
diff --git a/lldb/include/lldb/Utility/DataEncoder.h b/lldb/include/lldb/Utility/DataEncoder.h
index 0b78161d87ef1..fe5502c781bdc 100644
--- a/lldb/include/lldb/Utility/DataEncoder.h
+++ b/lldb/include/lldb/Utility/DataEncoder.h
@@ -146,7 +146,7 @@ class DataEncoder {
   ///    will be determined by the address size specified in the constructor.
   void AppendAddress(lldb::addr_t addr);
 
-  /// Append a bytes to the end of the owned data.
+  /// Append bytes to the end of the owned data.
   ///
   /// Append the bytes contained in the string reference. This function will
   /// not append a NULL termination character for a C string. Use the
@@ -156,7 +156,7 @@ class DataEncoder {
   ///     A string reference that contains bytes to append.
   void AppendData(llvm::StringRef data);
 
-  /// Append a bytes to the end of the owned data.
+  /// Append bytes to the end of the owned data.
   ///
   /// Append the bytes contained in the array reference.
   ///
diff --git a/lldb/source/Core/DataFileCache.cpp b/lldb/source/Core/DataFileCache.cpp
index c35ee372a2e0a..83e7e88aa8bf3 100644
--- a/lldb/source/Core/DataFileCache.cpp
+++ b/lldb/source/Core/DataFileCache.cpp
@@ -285,10 +285,10 @@ uint32_t ConstStringTable::Add(ConstString s) {
 static const llvm::StringRef kStringTableIdentifier("STAB");
 
 bool ConstStringTable::Encode(DataEncoder &encoder) {
-  // Write an 4 character code into the stream. This will help us when decoding
-  // to make sure we find this identifier when decoding the string table to make
-  // sure we have the rigth data. It also helps to identify the string table
-  // when dumping the hex bytes in a cache file.
+  // Write a 4 character code into the stream. This will help us when decoding
+  // to make sure we find this identifier when decoding the string table.
+  // It also helps to identify the string table when dumping the hex bytes
+  // in a cache file.
   encoder.AppendData(kStringTableIdentifier);
   size_t length_offset = encoder.GetByteSize();
   encoder.AppendU32(0); // Total length of all strings which will be fixed up.
diff --git a/lldb/source/Core/Mangled.cpp b/lldb/source/Core/Mangled.cpp
index 794e94d4626fd..66c785d5871cc 100644
--- a/lldb/source/Core/Mangled.cpp
+++ b/lldb/source/Core/Mangled.cpp
@@ -513,7 +513,7 @@ bool Mangled::Decode(const DataExtractor &data, lldb::offset_t *offset_ptr,
 /// saves us a lot of compute time. For these kinds of names we only need to
 /// save the mangled name and have the encoding set to "MangledOnly".
 ///
-/// If a mangled obejct has only a demangled name, then we save only that string
+/// If a mangled object has only a demangled name, then we save only that string
 /// and have the encoding set to "DemangledOnly".
 ///
 /// Some mangled objects have both mangled and demangled names, but the
@@ -530,7 +530,7 @@ void Mangled::Encode(DataEncoder &file, ConstStringTable &strtab) const {
     if (m_demangled) {
       // We have both mangled and demangled names. If the demangled name is the
       // counterpart of the mangled name, then we only need to save the mangled
-      // named. If they are different, we need to save both.
+      // name. If they are different, we need to save both.
       ConstString s;
       if (!(m_mangled.GetMangledCounterpart(s) && s == m_demangled))
         encoding = MangledAndDemangled;
diff --git a/lldb/source/Symbol/Symtab.cpp b/lldb/source/Symbol/Symtab.cpp
index 9dc78bbba1154..2f226afeaf37c 100644
--- a/lldb/source/Symbol/Symtab.cpp
+++ b/lldb/source/Symbol/Symtab.cpp
@@ -1249,8 +1249,8 @@ bool Symtab::Encode(DataEncoder &encoder) const {
     return false;
   ConstStringTable strtab;
 
-  // Encoder the symbol table into a separate encoder first. This allows us
-  // gather all of the strings we willl need in "strtab" as we will need to
+  // Encode the symbol table into a separate encoder first. This allows us
+  // gather all of the strings we will need in "strtab" as we will need to
   // write the string table out before the symbol table.
   DataEncoder symtab_encoder(encoder.GetByteOrder(),
                               encoder.GetAddressByteSize());

jasonmolenda requested a review from clayborg May 26, 2026 22:31

jasonmolenda requested a review from JDevlieghere as a code owner May 26, 2026 22:31

llvmorg-github-actions Bot added the lldb label May 26, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[lldb] Edits and clarifications to DataFileCache comments, NFC#199787

[lldb] Edits and clarifications to DataFileCache comments, NFC#199787
jasonmolenda wants to merge 1 commit into
llvm:mainfrom
jasonmolenda:datafilecache-comment-touchups

jasonmolenda commented May 26, 2026

Uh oh!

llvmorg-github-actions Bot commented May 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

jasonmolenda commented May 26, 2026

Uh oh!

llvmorg-github-actions Bot commented May 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant