Skip to content

Transaction only topology_hiding enhancements#3908

Open
davidtrihy-genesys wants to merge 12 commits into
OpenSIPS:masterfrom
purecloudlabs:topo_hiding_genesys
Open

Transaction only topology_hiding enhancements#3908
davidtrihy-genesys wants to merge 12 commits into
OpenSIPS:masterfrom
purecloudlabs:topo_hiding_genesys

Conversation

@davidtrihy-genesys

@davidtrihy-genesys davidtrihy-genesys commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

Summary

Additional features to the topology_hiding module in transaction only mode.

  1. Compact encoding of the thinfo param
  2. One way hiding based on socket tags, this allows for asymmetric transaction topology hiding where the request may not strip any information but the response does or vice versa
  3. Selection of the encoding algorithm, legacy or new compact encoding along with password rotation support
  4. New autoroute functionality toggleable on or off

Details

A strong effort was made to not alter dialog topology_hiding functionality and only target transaction functionality but there was a refactor done to separate the logic as the transaction aware logic grew in size due to the changes so a lot of the common code was refactored into separate source files to make development easier and additional functionality that could be common was added to those same files

  • This is the refactor of the logic in a commit 75ff8a6bdd87466f44a5c8dbeac417cf82493ae4 the idea is to show that there is no functionality change that would affect the dialog aware module
  • This is the new codec for the compact encoding 44b86215d8bc06137e4e7a44ad549f645576d21b
  • pseudovars support commit ac6d588b96741d6aaf6597528b56d2abc6462889
  • This is the modparam changes bbba1f626f1478a040e947ef246da9bd66656ba3

The major logic rework is part of this commit here 11783c1c4faa561541e7b0e26ee4bffa9adb88a0 this is where the bulk of the work was done to support the new model of topology_hiding there are other smaller commits too but these ones give the bulk of changes.

Once the modparam name and scheme has been agreed upon the documentation can be added in another commit before merged the PR

Socket tag matching

Purpose of these tags is to define the sockets that will be used for topology hiding, internal is when the request comes in and the socket is switched, it will allow the topology_hiding to be engaged but also not strip topology information for requests and only for responses, the inverse applies for requests initiated from internal.

External is purely for failover scenarios, in this case if you don't have a DNS advertised for your internal socket and that instance fails, you can do some script initiated failover to another instance that may have the same IP address but has the same tag and it should be able to decode the socket information using the tags to match the information as opposed to IP addresses
If this tag is not set then failover will not work in this scenario, this only affects the auto_route

modparam("topology_hiding", "th_internal_trusted_tag", "my_internal_socket_tag")
modparam("topology_hiding", "th_external_socket_tag", "my_external_socket_tag")

socket = tcp:203.0.100.10:5060 tag my_external_socket_tag
socket = udp:10.100.20.40:5080 tag my_internal_socket_tag

auto route modparam

This modparam is purely to generate an auto route on the trusted internal socket when one way hiding is engaged due to no Contact header update, the script writer can turn it off and choose to do sequential requests in the script

modparam("topology_hiding", "th_auto_route_on_trusted_socket", 1)

param encoding and password

Allows for setting the param name, password and encoding type and choosing which one to use when doing topology_hiding will allow two definitions and supports password rotation

modparam("topology_hiding", "th_contact_encode_param_password", "thinfo:password1:C")
modparam("topology_hiding", "th_use_param", "thinfo")

Solution

The solution has some limitations that could be addressed in future with modparams but I don't think it's entirely necessary for now

  1. New pseudovars do not support legacy encoding scheme, they were written to directly read the decoded buffer of the compact encoding rather than copy
  2. Parts of the buffer management were moved from using pkg_malloc to using static buffers of fixed size, the sizing is 4096 bytes and if the decoding goes above that size it will fail to decode, this could also be made as a modparam and the allocation of the buffer could be done at module init time, the idea is to remove unneeded dynamic allocations during transaction processing
  3. The amount of URIs that can be decoded is fixed at 12, it was a random number I picked, once again it could be made as a modparam or fully dynamic

These listed limitations only really apply to the new encoding type and the target usage of this algorithm is in the one way hiding mode of operation which the script writer should be aware of the internal SIP message Routes and Contact headers so they should be aware of the limitations, in any case I plan to address them in a follow up PR

Compatibility

Retains full backwards compatibility with the current mode of operation.

sipssert tests PR - OpenSIPS/sipssert-opensips-tests#32

Comment thread modules/topology_hiding/th_no_dlg_logic.c
Comment thread modules/topology_hiding/th_no_dlg_logic.c
Comment thread modules/topology_hiding/th_no_dlg_logic.c
Comment thread modules/topology_hiding/th_no_dlg_logic.c
Comment thread modules/topology_hiding/th_no_dlg_logic.c
Comment thread modules/topology_hiding/th_no_dlg_logic.c
Comment thread modules/topology_hiding/th_no_dlg_logic.c
Comment thread modules/topology_hiding/th_no_dlg_logic.c
Comment thread modules/topology_hiding/th_no_dlg_logic.c
Comment thread modules/topology_hiding/th_no_dlg_logic.c
@liviuchircu liviuchircu self-requested a review June 11, 2026 15:39

@liviuchircu liviuchircu left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @davidtrihy-genesys ,

I managed to complete the review, and I'll split the action points into: "Critical" and "Bugs / Quirks", the latter of which can be left to solve/assess during the remainder of the beta phase. All feedback below refers to the "non-dialog" TH, of course, since the dialog TH is left unchanged.

So if I understand correctly, before we had ;thinfo= working statelessly, while now it works in a hybrid way. Because it both stores some transactional-local data via callback param, but also stores data in the ;thinfo= just as before, only in a compact way. There is also a 3rd storage (global buffers) for the new $TH_decoded_routes, $TH_decoded_contact variables, but that seems OK.

Here, just to make sure I understood how it works: you need to store data in the transaction in order to restore Via/RR for the replies. But you also encode, using the new codec utilities, some minimal amount of data in the ;thinfo=, which is used for future sequential requests and cross-transaction routing, such as the SIP route set information. Is this accurate? If so, perhaps these goals or invariants of the transactional TH support should be clearly stated (code or docs, both can work), in order to help maintainers know what to expect.

Critical Issues

There is only one "stopper" kind of issue before we can merge:

Licensing / Ownership

The new th_common_logic.[ch] and th_no_dlg_logic.[ch] files contain a significant amount of code moved from existing project sources, yet the current header appears to suggest sole Genesys ownership of code that predates this contribution. Example functions which are near-exact copies of older code:

  • topo_parse_passed_params()
  • topo_parse_passed_ct_params()
  • topo_parse_passed_hdr_ct_params()
  • topo_delete_record_routes()
  • topo_delete_vias()
  • delete_existing_contact()
  • restore_vias_from_req()
  • topo_ct_param_len()
  • topo_ct_param_copy()

... and functions which still clearly retain old implementation chunks or logic:

  • build_encoded_contact_suffix_legacy()
  • th_no_dlg_encode_contact()
  • decode_info_buffer_legacy()
  • th_no_dlg_seq_handling()

To fix this, I would suggest simply appending the original OpenSIPS Foundation copyright to the existing Genesys copyright, in all of these four files:

Copyright (C) 2026 Genesys Cloud Services, Inc.
Copyright (C) 2015 OpenSIPS Foundation

Bugs / Quirks

All of these are results of static analysis, that's why I'm lowering the prio on them. All of them can be left for post-merge phase, while we test the new code more and more.

1. Codec Decoding Vulnerabilities

a) the thinfo_decode_uris() seems to read from p past the input buffer, as it decodes stuff from the input ;thinfo= data, for example coming from a SIP reply. If the reply is malicious or hand-crafted, we could overflow.

b) the BUILD_URI_STRING() seems to write past decoded_uri_str capacity. Needs a double-check.

2. Unsafe strchr()

In th_add_encode_param_password(), notice how we pkg_malloc(strlen(param_val)), then memcpy the same amount, without a terminating \0. Then do a strchr() over that buffer, potentially reading out-of-bounds memory. I believe the reason it hasn't crashed is because heap memory is somewhat zeroed out on startup, which is also when the modparams get parsed. Still, good to look again into and fully solve.

Overall feedback: great work, really looking forward to reviewing/running the tests now! Time to have fun 😄

@davidtrihy-genesys

Copy link
Copy Markdown
Contributor Author

@liviuchircu

Thanks so much for the detailed feedback, as for the licensing yeah I think I just want new file == new license so thanks for that, I'll probably just revert the common logic one to OpenSIPs and do that dual mode you said for the no dlg logic thanks.

Regarding your statement around this being a hybrid stateful/stateless operation mode, that actually has not changed, the original implementation was also transactional stateful and relied on the tm state machine during the offer answer to generate the correct Contact header. The main difference in the new one way hiding mode instead of blindly swapping Record-Routes from uac to uas in both request and reply what this does is use that same logic to create an intersection of the Record-Routes between the uac and uas and place our new auto Record-Route in between the intersection point

Eg. rr1,rr2 from internal and rr3 and rr4 from external when the request traverses through the external untrusted socket rr1 and rr2 are stripped from the request and encoded into the Contact and when the uas replies with rr3 and rr4 the transaction is processed and rr1, rr2, auto rr are inserted into the response alongside the existing rr3 and rr4 for a complete Record-Route set as if there was no topology hiding done.

The main changes are around whether the request is traversing through a trusted socket or not and at that point it still uses the topology_hiding to keep the transaction state for response processing so you get an asymmetric aspect to it where the user agent in the trusted environment gets the full network topology without modification except for the auto Record-Route header which is placed into the request or response depending on the originator to allow for sequential routing, because we are allowing the external network topology to be percolated through to the trusted environment we want to be able to route back through to a server that can do the topology match and the auto Route is essentially a replacement for the Contact header which contains all of the information need to do the topology reversing whereas with the auto Route it just contains the flags and the socket info so it can do the topology hiding on the sequential request and pick the correct socket, it behaves exactly the same as if it was using two way hiding, this functionality can be turned off and you can route sequential requests however you want.

There was two reasons I went with this approach first one is the auto Route is implicit in the Contact header sent to the untrusted side so it doesn't have to be encoded into the thinfo thus saving more space in the encoded param, once you get a topology match from an external side and decode everything then you know you're at the right destination and secondly it could potentially be done by generating a new Contact header but then I felt like it was straying from what the intention of this change was which is to leave the external topology unchanged, though adding support for that type of Contact processing could be pretty easy too.

Hope that answer your questions, I will fix the licensing issues and I'll look at the bugs and quirks with my colleague Claude and see if we can address them too and add the docs too, so I'll three extra commits, licensing, bugs quirks fixes and docs.

@davidtrihy-genesys

Copy link
Copy Markdown
Contributor Author

@liviuchircu

I've addressed all the bugs and quirks and show stoppers, new commits are partitioned appropriately for review. Previous comment also expands on questions you asked as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants