15 changes: 12 additions & 3 deletions README.md
@@ -1,11 +1,18 @@
<!--
// SPDX-License-Identifier: GPL-3.0-only

@file README.md
@copyright Copyright (C) 2026 srcML, LLC. (www.srcML.org)
This file is part of the nameCollector application.
-->
# nameCollector
A tool for collecting all user-defined identifier names from a source code file.

Works for C, C++, C#, Java, and Python

Input: A srcML file of source code with --position option. srcML file can be a single unit (one source code file) or an archive (multiple source code files).

Output: A list of identifier names, their type (for declartions and functions), their syntactic category, the file name, and position (line:column) the identifier occurs (declared), the programming langauge, and for methods and classes their stereotype, from [stereocode](https://github.com/srcML/stereocode), if it is in the srcML. Output is plain text (default) or csv with column headings (as below).
Output: A list of identifier names, their type (for declarations and functions), their syntactic category, the file name, and position (line:column) the identifier occurs (declared), the programming language, and for methods and classes their stereotype, from [stereocode](https://github.com/srcML/stereocode), if in the srcML. Output is plain text (default) or csv with column headings (as below).

Example:

@@ -44,7 +51,7 @@ Example:


## Python notes:
In Python, globals, locals, and fields are collected at their first appearance. If a name is assigned to twice within a scope, only the first use of that name will be collected.
In Python, globals, locals, and fields are collected at their first appearance. If a name is assigned more than once within a scope, only the first use of the name is collected.
Additionally, type information is NOT collected for any Python variables or functions.
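
The first-appearance rule in the Python notes can be illustrated with a small sketch. It is not the handler's code: `firstAppearances` is a hypothetical helper, and it assumes the names assigned in one scope are already available in source order.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper, not taken from nameCollectorHandler.hpp.
// Given the names assigned within one scope, in source order, return the
// names that would be collected: each name once, at its first appearance.
std::vector<std::string> firstAppearances(const std::vector<std::string>& assigned) {
    std::unordered_set<std::string> seen;
    std::vector<std::string> collected;
    for (const std::string& name : assigned) {
        // insert() reports whether the name was new to this scope
        if (seen.insert(name).second)
            collected.push_back(name);
    }
    return collected;
}
```

For a scope that assigns `x`, `y`, `x`, `z`, `y` in that order, the sketch keeps `x`, `y`, `z`; the later assignments to `x` and `y` are skipped.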


@@ -83,15 +90,17 @@ Output is plain text by default. Use -f csv or --csv for comma separated output

## Developer Notes:

The initial version of the application was developed by Decker from the srcSAX examples in June 2023. This was extended to collect the different types of names by Maletic. Maletic added the CLI11 interface and made the first public release (July 2023). Testa added testing framework and testsuite in summer 2025. Behler added support for Python alongside the 1.1.0 release of srcML (August 2025).
The initial version of the application was developed by Decker from the srcSAX examples in June 2023. This was extended to collect the different types of names by Maletic. Maletic added the CLI11 interface and made the first public release (July 2023). Testa added testing framework and testsuite in summer 2025. Behler added support for Python alongside the 1.1.0 release of srcML (August 2025). Sipanhioglu fixed a memory bug June 2026. Ramadan set up Docker image in June 2026.

nameCollector is a good simple example of how to use srcSAX to build fast and scalable tools for collecting analysis information.

Developers of nameCollector:
- Ali Al-Ramadan
- Joshua Behler
- Michael Collard
- Michael Decker
- Jonathan Maletic
- John Sipanhioglu
- Sophia Testa


59 changes: 36 additions & 23 deletions nameCollectorHandler.hpp
@@ -2,12 +2,12 @@
/**
* @file nameCollectorHandler.cpp
*
* @copyright Copyright (C) 2013-2023 srcML, LLC. (www.srcML.org)
* @copyright Copyright (C) 2013-2026 srcML, LLC. (www.srcML.org)
*
* This file is part of the nameCollector application.
*/

/** Modified by MaleticJuly 2023.
/**
*
* Collects all user defined names in a given C, C++, C#, Java file
*
@@ -182,7 +182,8 @@ class nameCollectorHandler : public srcSAXHandler {
// this is adding all elements, so you might only want to push certain elements


std::string back = elementStack.back();
std::string back = "";
if (!elementStack.empty()) back = elementStack.back();

if (back == "name" && std::string(localname) == "name") // Top-level Names
elementStack.push_back("name_2");
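
Calling `back()` on an empty `std::vector` is undefined behavior, which is why the hunk above checks `empty()` before reading the top of `elementStack`. The same guard could be factored into a helper. A minimal sketch, where `backOrEmpty` is a hypothetical name that does not appear in the PR:

```cpp
#include <string>
#include <vector>

// Hypothetical helper: return the top of the stack, or "" when it is empty.
// The emptiness check must come first, because back() on an empty
// std::vector is undefined behavior.
std::string backOrEmpty(const std::vector<std::string>& stack) {
    return stack.empty() ? std::string() : stack.back();
}
```

With such a helper, comparisons like `backOrEmpty(elementStack) == "name"` stay safe at the start of a unit, when nothing has been pushed yet.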
@@ -266,8 +267,7 @@ class nameCollectorHandler : public srcSAXHandler {
scopeStack.push_back(classScope);
}

//Need to collect some type info for struct and anonymous struct
// struct foo { } x; // x has type foo
//Need to collect some type info for struct and anonymous struct
// struct { } x; // x has type struct
if (isStruct(std::string(localname))) {
typeInfo insertType;
@@ -277,12 +277,12 @@ class nameCollectorHandler : public srcSAXHandler {
}

//Stop gathering contents of structs when a block is encountered
if ((std::string(localname) == "block") && (typeStack.size() != 0)) {
if ((std::string(localname) == "block") && (!typeStack.empty())) {
if (isStruct(typeStack[typeStack.size()-1].associatedTag)) {
typeStack[typeStack.size()-1].gatherContent = false;
}
}

if (isStereotypableCategory(localname)) {
// Check for stereotype information from stereocode
for (int i = 0; i < numAttributes; ++i) {
@@ -319,10 +319,10 @@ class nameCollectorHandler : public srcSAXHandler {
* Overide for desired behaviour.
*/
virtual void endUnit(const char* localname, const char* prefix, const char* URI) {
if (elementStack.size() != 0) elementStack.clear();
if (scopeStack.size() != 0) scopeStack.clear();
if (typeStack.size() != 0) typeStack.clear();
if (stereotypeStack.size() != 0) stereotypeStack.clear();
if (!elementStack.empty()) elementStack.clear();
if (!scopeStack.empty()) scopeStack.clear();
if (!typeStack.empty()) typeStack.clear();
if (!stereotypeStack.empty()) stereotypeStack.clear();
}

/**
@@ -341,12 +341,12 @@ class nameCollectorHandler : public srcSAXHandler {
bool isComplexName = false;
if ((std::string(localname) == "name") && (content != "") && inIndexCount == 0) {
size_t nameDepth = 0;
if (elementStack.size() != 0 && elementStack.back() == "name") {
if (!elementStack.empty() && elementStack.back() == "name") {
category = elementStack.size() >= 2 ? elementStack[elementStack.size()-2] : ""; //Normal name
nameDepth = 1;
complexNameCount = 0;
}
else if (elementStack.size() != 0) {
else if (!elementStack.empty()) {
nameDepth = std::stoi(elementStack.back().substr(5));
category = elementStack.size() >= (nameDepth + 1) ? elementStack[elementStack.size()-(nameDepth+1)] : "";
isComplexName = true;
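
The hunk above reads the nesting depth from the tag on top of `elementStack`: `"name"` is depth 1, and a tag such as `"name_2"` carries its depth after the five-character prefix `"name_"`. Both `substr(5)` and `std::stoi` throw on unexpected input (`std::out_of_range` for a tag shorter than five characters, `std::invalid_argument` for a non-numeric suffix). A defensive variant could look like the sketch below; `nameDepthOf` is a hypothetical helper, and returning 0 for unrecognized tags is an assumption rather than the handler's behavior.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper mirroring the depth encoding in the hunk above:
// "name" means depth 1, "name_<n>" means depth n.
// Returns 0 for any other tag instead of throwing.
std::size_t nameDepthOf(const std::string& tag) {
    static const std::string prefix = "name_";
    if (tag == "name") return 1;
    if (tag.compare(0, prefix.size(), prefix) != 0) return 0;
    const std::string digits = tag.substr(prefix.size());
    // Accept only a short, all-digit suffix so std::stoul cannot throw
    if (digits.empty() || digits.size() > 9 ||
        digits.find_first_not_of("0123456789") != std::string::npos)
        return 0;
    return static_cast<std::size_t>(std::stoul(digits));
}
```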
@@ -399,8 +399,8 @@ class nameCollectorHandler : public srcSAXHandler {
//Deal with complex function names
//If it is a function name, collect the complex name ex. String::length, String::operator+=
//If it is a decl collect simple name only
if (((category == "destructor") || (category == "constructor") || (category == "function") || (category == "decl")) && (elementStack.back() != "name")) {
if (elementStack.size() != 0) elementStack.pop_back();
if (((category == "destructor") || (category == "constructor") || (category == "function") || (category == "decl")) && ((!elementStack.empty()) && (elementStack.back() != "name"))) {
if (!elementStack.empty()) elementStack.pop_back();
return;
}

@@ -418,14 +418,26 @@ class nameCollectorHandler : public srcSAXHandler {
}

//Get type from type stack of <type> and <struct>
//Deals with anonymous struct etc.
std::string type = "";
if (isTypedCategory(category) && (typeStack.size() >= 1) && !isUntypedLanguage()) {
if ((category == "field") && (typeStack[typeStack.size()-1].type.find("enum") != std::string::npos)) {
std::string type = ""; //Deal with enum fields without a type
} else {
type = typeStack.size() >= 1 ? typeStack[typeStack.size()-1].type : "";
replaceSubStringInPlace(type, ",", "&#44;");
replaceSubStringInPlace(type, "\n", "");
//Deal with typedefs with structs etc.
if (typeStack.size() >= 1 && typeStack[typeStack.size()-1].associatedTag == "typedef") {
type = typeStack[typeStack.size()-1].type;
size_t blank = type.find(' ');
if (blank != std::string::npos) {
if (type.substr(0, blank).find("struct")!= std::string::npos) type = "struct";
if (type.substr(0, blank).find("enum")!= std::string::npos) type = "enum";
if (type.substr(0, blank).find("class")!= std::string::npos) type = "class";
if (type.substr(0, blank).find("union")!= std::string::npos) type = "union";
}
}
else
type = typeStack.size() >= 1 ? typeStack[typeStack.size()-1].type : "";

if (typeStack.size() >= 1 && type == typeStack[typeStack.size()-1].associatedTag + " ")
replaceSubStringInPlace(type, " ", "");
if (typeStack.size() >= 1 && isStruct(typeStack[typeStack.size()-1].associatedTag)) {
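
The typedef branch in the hunk above collapses a multi-word type to a single keyword when the text before the first blank contains `struct`, `enum`, `class`, or `union`. Read in isolation, the rule can be sketched as follows. `typedefKind` is a hypothetical name, and the sketch covers only this keyword rule, not the comma and newline replacements or the `typeStack` lookups around it.

```cpp
#include <cstddef>
#include <initializer_list>
#include <string>

// Hypothetical helper that mirrors the keyword checks of the typedef branch.
// If the text before the first blank contains struct, enum, class, or union,
// the type collapses to that keyword; otherwise it is returned unchanged.
std::string typedefKind(const std::string& type) {
    const std::size_t blank = type.find(' ');
    if (blank == std::string::npos) return type;   // single word, e.g. "int"
    const std::string head = type.substr(0, blank);
    std::string result = type;
    // Same order as the hunk; a later match overrides an earlier one
    for (const char* keyword : {"struct", "enum", "class", "union"}) {
        if (head.find(keyword) != std::string::npos) result = keyword;
    }
    return result;
}
```

Under this rule `"struct Point"` becomes `"struct"`, while `"int"` and `"unsigned int"` pass through unchanged, which matches expected test lines such as `Point is a struct typedef` and `Integer is a int typedef`.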
@@ -436,8 +448,9 @@ class nameCollectorHandler : public srcSAXHandler {
}
}

std::string stereotype = (isStereotypableCategory(category) && stereotypeStack.size() != 0 ? stereotypeStack[stereotypeStack.size() - 1] : "");
if (stereotypeStack.size() != 0) stereotypeStack.pop_back();
std::string stereotype = (isStereotypableCategory(category) && !stereotypeStack.empty() ?
stereotypeStack[stereotypeStack.size() - 1] : "");
if (!stereotypeStack.empty()) stereotypeStack.pop_back();

//Remove any prefix String:: from context - for functions
if (content.find("::") != std::string::npos)
@@ -630,7 +643,7 @@ class nameCollectorHandler : public srcSAXHandler {
if (typeStack[typeStack.size()-1].associatedTag == localname)
typeStack.pop_back();

if (elementStack.size() != 0) elementStack.pop_back();
if (!elementStack.empty()) elementStack.pop_back();

if (std::string(localname) == "operator" && isNoDeclLanguage()) {
// If at an = operator in expr_stmt, output and then clear the expressions name list
@@ -672,14 +685,14 @@ class nameCollectorHandler : public srcSAXHandler {
elementStack.push_back("init"); // Deal with namespace foo = x::y;
}
if (std::string(localname) == "namespace" && category != "" && !isNoDeclLanguage()) {
if (elementStack.size() != 0) elementStack.pop_back(); // Deal with namespace foo = x::y;
if (!elementStack.empty()) elementStack.pop_back(); // Deal with namespace foo = x::y;
}

// If in a no decl language, need to keep track of scope
if (isNoDeclLanguage() && (std::string(localname) == "function" ||
std::string(localname) == "lambda" ||
std::string(localname) == "class")) {
if (scopeStack.size() != 0) scopeStack.pop_back();
if (!scopeStack.empty()) scopeStack.pop_back();
}

if (isNoDeclLanguage() && (std::string(localname) == "expr_stmt" ||
12 changes: 6 additions & 6 deletions testsuite/test_c_typedef.sh
@@ -45,21 +45,21 @@ EOF

input=$(srcml test_typedef.c --position)
output=$(echo "$input" | ./nameCollector )
expected="Integer is typedef in C file: test_typedef.c:3:13
expected="Integer is a int typedef in C file: test_typedef.c:3:13
x is a int field in C file: test_typedef.c:6:9
y is a int field in C file: test_typedef.c:7:9
Point is a typedef of struct in C file: test_typedef.c:8:3
Point is a struct typedef in C file: test_typedef.c:8:3
PointAgain is a struct in C file: test_typedef.c:11:16
a is a int field in C file: test_typedef.c:12:9
b is a int field in C file: test_typedef.c:13:9
namedStructPoint is a typedef of struct in C file: test_typedef.c:14:3
namedStructPoint is a struct typedef in C file: test_typedef.c:14:3
Color is a enum in C file: test_typedef.c:16:14
RED is a field in C file: test_typedef.c:17:5
GREEN is a field in C file: test_typedef.c:18:5
BLUE is a field in C file: test_typedef.c:19:5
ColorEnum is a typedef of enum in C file: test_typedef.c:20:3
ColorEnum is a enum typedef in C file: test_typedef.c:20:3
functionPointer is a int function in C file: test_typedef.c:22:15
characterArrayPtr is a typedef in C file: test_typedef.c:24:15
characterArrayPtr is a char* typedef in C file: test_typedef.c:24:15
add is a int function in C file: test_typedef.c:26:5
x is a int parameter in C file: test_typedef.c:26:13
y is a int parameter in C file: test_typedef.c:26:20
@@ -80,4 +80,4 @@ fi
echo "Test test_c_typedef passed!"
# Repeat tests

exit 0
exit 0
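
The updated expected strings in this test follow one plain-text pattern: name, optional type, category, language, then file:line:column. The sketch below reproduces that pattern as inferred from the expected output only. `formatPlain` is a hypothetical function; the tool's actual output code is not part of this diff.

```cpp
#include <string>

// Hypothetical formatter inferred from the expected test strings.
// An empty type (e.g. enum fields such as RED) is omitted from the line.
std::string formatPlain(const std::string& name, const std::string& type,
                        const std::string& category, const std::string& language,
                        const std::string& file, int line, int column) {
    std::string out = name + " is a ";
    if (!type.empty()) out += type + " ";
    out += category + " in " + language + " file: " + file + ":" +
           std::to_string(line) + ":" + std::to_string(column);
    return out;
}
```

Calling `formatPlain("Integer", "int", "typedef", "C", "test_typedef.c", 3, 13)` yields `Integer is a int typedef in C file: test_typedef.c:3:13`, the first line of the new expected output.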
15 changes: 8 additions & 7 deletions testsuite/test_cpp_typedef.sh
@@ -5,7 +5,8 @@
cat <<EOF > test_typedef.cpp
#include <vector>
#include <iostream>
typedef int Integer;

typedef int Integer;
typedef std::vector<int> int_vector;
typedef char* char_array[5]; //array of char ptr
typedef int (*functionPtr)(int, int); // with fxn pointer
@@ -27,17 +28,17 @@ EOF

input=$(srcml test_typedef.cpp --position)
output=$(echo "$input" | ./nameCollector )
expected="Integer is a typedef in C++ file: test_typedef.cpp:4:13
int_vector is a typedef in C++ file: test_typedef.cpp:5:21
char_array is a typedef in C++ file: test_typedef.cpp:6:15
expected="Integer is a int typedef in C++ file: test_typedef.cpp:4:13
int_vector is a std::vector<int> typedef in C++ file: test_typedef.cpp:5:26
char_array is a char* typedef in C++ file: test_typedef.cpp:6:15
functionPtr is a int function in C++ file: test_typedef.cpp:7:15
Point is a struct in C++ file: test_typedef.cpp:10:16
x is a int field in C++ file: test_typedef.cpp:11:9
y is a int field in C++ file: test_typedef.cpp:11:12
pt is a typedef of struct in C++ file: test_typedef.cpp:12:3
pt is a struct typedef in C++ file: test_typedef.cpp:12:3
v is a int field in C++ file: test_typedef.cpp:15:9
w is a int field in C++ file: test_typedef.cpp:15:12
anon_struct_typedef is a typedef of struct in C++ file: test_typedef.cpp:16:3
anon_struct_typedef is a struct typedef in C++ file: test_typedef.cpp:16:3
main is a int function in C++ file: test_typedef.cpp:18:5"

if [[ "$output" != "$expected" ]]; then
@@ -49,4 +50,4 @@ fi
echo "Test test_cpp_typedef passed!"
# Repeat tests

exit 0
exit 0