Transaction only topology_hiding enhancements#3908
Conversation
No functionality changes just refactor for ease of diff viewing
b1ae973 to
0d8947c
Compare
There was a problem hiding this comment.
Hi @davidtrihy-genesys ,
I managed to complete the review, and I'll split the action points into: "Critical" and "Bugs / Quirks", the latter of which can be left to solve/assess during the remainder of the beta phase. All feedback below refers to the "non-dialog" TH, of course, since the dialog TH is left unchanged.
So if I understand correctly, before we had ;thinfo= working statelessly, while now it works in a hybrid way. Because it both stores some transactional-local data via callback param, but also stores data in the ;thinfo= just as before, only in a compact way. There is also a 3rd storage (global buffers) for the new $TH_decoded_routes, $TH_decoded_contact variables, but that seems OK.
Here, just to make sure I understood how it works: you need to store data in the transaction in order to restore Via/RR for the replies. But you also encode, using the new codec utilities, some minimal amount of data in the ;thinfo=, which is used for future sequential requests and cross-transaction routing, such as the SIP route set information. Is this accurate? If so, perhaps these goals or invariants of the transactional TH support should be clearly stated (code or docs, both can work), in order to help maintainers know what to expect.
Critical Issues
There is only one "stopper" kind of issue before we can merge:
Licensing / Ownership
The new th_common_logic.[ch] and th_no_dlg_logic.[ch] files contain a significant amount of code moved from existing project sources, yet the current header appears to suggest sole Genesys ownership of code that predates this contribution. Example functions which are near-exact copies of older code:
topo_parse_passed_params()topo_parse_passed_ct_params()topo_parse_passed_hdr_ct_params()topo_delete_record_routes()topo_delete_vias()delete_existing_contact()restore_vias_from_req()topo_ct_param_len()topo_ct_param_copy()
... and functions which still clearly retain old implementation chunks or logic:
build_encoded_contact_suffix_legacy()th_no_dlg_encode_contact()decode_info_buffer_legacy()th_no_dlg_seq_handling()
To fix this, I would suggest simply appending the original OpenSIPS Foundation copyright to the existing Genesys copyright, in all of these four files:
Copyright (C) 2026 Genesys Cloud Services, Inc.
Copyright (C) 2015 OpenSIPS Foundation
Bugs / Quirks
All of these are results of static analysis, that's why I'm lowering the prio on them. All of them can be left for post-merge phase, while we test the new code more and more.
1. Codec Decoding Vulnerabilities
a) the thinfo_decode_uris() seems to read from p past the input buffer, as it decodes stuff from the input ;thinfo= data, for example coming from a SIP reply. If the reply is malicious or hand-crafted, we could overflow.
b) the BUILD_URI_STRING() seems to write past decoded_uri_str capacity. Needs a double-check.
2. Unsafe strchr()
In th_add_encode_param_password(), notice how we pkg_malloc(strlen(param_val)), then memcpy the same amount, without a terminating \0. Then do a strchr() over that buffer, potentially reading out-of-bounds memory. I believe the reason it hasn't crashed is because heap memory is somewhat zeroed out on startup, which is also when the modparams get parsed. Still, good to look again into and fully solve.
Overall feedback: great work, really looking forward to reviewing/running the tests now! Time to have fun 😄
|
Thanks so much for the detailed feedback, as for the licensing yeah I think I just want new file == new license so thanks for that, I'll probably just revert the common logic one to OpenSIPs and do that dual mode you said for the no dlg logic thanks. Regarding your statement around this being a hybrid stateful/stateless operation mode, that actually has not changed, the original implementation was also transactional stateful and relied on the tm state machine during the offer answer to generate the correct Contact header. The main difference in the new one way hiding mode instead of blindly swapping Record-Routes from uac to uas in both request and reply what this does is use that same logic to create an intersection of the Record-Routes between the uac and uas and place our new auto Record-Route in between the intersection point Eg. rr1,rr2 from internal and rr3 and rr4 from external when the request traverses through the external untrusted socket rr1 and rr2 are stripped from the request and encoded into the Contact and when the uas replies with rr3 and rr4 the transaction is processed and rr1, rr2, auto rr are inserted into the response alongside the existing rr3 and rr4 for a complete Record-Route set as if there was no topology hiding done. The main changes are around whether the request is traversing through a trusted socket or not and at that point it still uses the topology_hiding to keep the transaction state for response processing so you get an asymmetric aspect to it where the user agent in the trusted environment gets the full network topology without modification except for the auto Record-Route header which is placed into the request or response depending on the originator to allow for sequential routing, because we are allowing the external network topology to be percolated through to the trusted environment we want to be able to route back through to a server that can do the topology match and the auto Route is essentially a replacement for the Contact header which contains all of the information need to do the topology reversing whereas with the auto Route it just contains the flags and the socket info so it can do the topology hiding on the sequential request and pick the correct socket, it behaves exactly the same as if it was using two way hiding, this functionality can be turned off and you can route sequential requests however you want. There was two reasons I went with this approach first one is the auto Route is implicit in the Contact header sent to the untrusted side so it doesn't have to be encoded into the thinfo thus saving more space in the encoded param, once you get a topology match from an external side and decode everything then you know you're at the right destination and secondly it could potentially be done by generating a new Contact header but then I felt like it was straying from what the intention of this change was which is to leave the external topology unchanged, though adding support for that type of Contact processing could be pretty easy too. Hope that answer your questions, I will fix the licensing issues and I'll look at the bugs and quirks with my colleague Claude and see if we can address them too and add the docs too, so I'll three extra commits, licensing, bugs quirks fixes and docs. |
|
I've addressed all the bugs and quirks and show stoppers, new commits are partitioned appropriately for review. Previous comment also expands on questions you asked as well. |
Summary
Additional features to the topology_hiding module in transaction only mode.
Details
A strong effort was made to not alter dialog topology_hiding functionality and only target transaction functionality but there was a refactor done to separate the logic as the transaction aware logic grew in size due to the changes so a lot of the common code was refactored into separate source files to make development easier and additional functionality that could be common was added to those same files
75ff8a6bdd87466f44a5c8dbeac417cf82493ae4the idea is to show that there is no functionality change that would affect the dialog aware module44b86215d8bc06137e4e7a44ad549f645576d21bac6d588b96741d6aaf6597528b56d2abc6462889bbba1f626f1478a040e947ef246da9bd66656ba3The major logic rework is part of this commit here
11783c1c4faa561541e7b0e26ee4bffa9adb88a0this is where the bulk of the work was done to support the new model of topology_hiding there are other smaller commits too but these ones give the bulk of changes.Once the modparam name and scheme has been agreed upon the documentation can be added in another commit before merged the PR
Socket tag matching
Purpose of these tags is to define the sockets that will be used for topology hiding, internal is when the request comes in and the socket is switched, it will allow the topology_hiding to be engaged but also not strip topology information for requests and only for responses, the inverse applies for requests initiated from internal.
External is purely for failover scenarios, in this case if you don't have a DNS advertised for your internal socket and that instance fails, you can do some script initiated failover to another instance that may have the same IP address but has the same tag and it should be able to decode the socket information using the tags to match the information as opposed to IP addresses
If this tag is not set then failover will not work in this scenario, this only affects the auto_route
auto route modparam
This modparam is purely to generate an auto route on the trusted internal socket when one way hiding is engaged due to no Contact header update, the script writer can turn it off and choose to do sequential requests in the script
param encoding and password
Allows for setting the param name, password and encoding type and choosing which one to use when doing topology_hiding will allow two definitions and supports password rotation
Solution
The solution has some limitations that could be addressed in future with modparams but I don't think it's entirely necessary for now
These listed limitations only really apply to the new encoding type and the target usage of this algorithm is in the one way hiding mode of operation which the script writer should be aware of the internal SIP message Routes and Contact headers so they should be aware of the limitations, in any case I plan to address them in a follow up PR
Compatibility
Retains full backwards compatibility with the current mode of operation.
sipssert tests PR - OpenSIPS/sipssert-opensips-tests#32