Multiinfo#87
…lds, additional overwrite/clearing flags, etc
adding gzip support for annotation files
I think the |
susannasiebert left a comment
I think some of the new methods are missing. Not sure what happened there.
@@ -12,25 +13,24 @@ def to_array(dictionary):
def parse_tsv_file(args):
Suggested change:
- def parse_tsv_file(args):
+ def parse_tsv_file(args, mappings):
vcf_writer = create_vcf_writer(args, vcf_reader)
values = parse_tsv_file(args)
mappings = parse_column_mappings(args.column_mappings)
I'm not seeing a parse_column_mappings method.
However, you could consider making this parsing part of the argparse argument parsing. Have a look at this example from pVACtools. You can handle the parsing (and any error checking) in a method like that and then set that method as the argument's type. That way, args.column_mappings is already parsed into whatever format you'd like, and argument parsing fails early if an error is triggered, e.g. when the argument doesn't match the expected format.
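A minimal sketch of that suggestion, assuming the source_col:info_field:type:description format from the PR description and the VCF types listed later in this thread. The function name column_mappings and the dictionary keys are hypothetical, not taken from the actual code in this PR or from pVACtools.

```python
import argparse

# VCF INFO types named in this PR thread.
VALID_VCF_TYPES = ("Integer", "Float", "Flag", "Character", "String")


def column_mappings(value):
    """argparse `type` callable (hypothetical name).

    Parses 'source_col:info_field:type:description[:source[:version]]'
    entries separated by commas. Note that this simple split means a
    description cannot itself contain ',' or ':'.
    """
    mappings = []
    for entry in value.split(","):
        parts = entry.split(":")
        if not 4 <= len(parts) <= 6:
            raise argparse.ArgumentTypeError(
                f"expected 4 to 6 colon-separated fields, got {len(parts)}: {entry!r}"
            )
        source_col, info_field, vcf_type, description = parts[:4]
        if vcf_type not in VALID_VCF_TYPES:
            raise argparse.ArgumentTypeError(
                f"invalid VCF type {vcf_type!r}; expected one of {VALID_VCF_TYPES}"
            )
        mappings.append({
            "source_col": source_col,
            "info_field": info_field,
            "vcf_type": vcf_type,
            "description": description,
            # Optional 5th and 6th fields: source and version metadata.
            "source": parts[4] if len(parts) > 4 else None,
            "version": parts[5] if len(parts) > 5 else None,
        })
    return mappings


parser = argparse.ArgumentParser()
parser.add_argument("-m", "--column-mappings", type=column_mappings, required=True)

# An explicit argument list is passed here so the sketch runs without a command line.
args = parser.parse_args(
    ["-m", "score:SCORE:Float:Prediction score,gene:GENE:String:Gene symbol:MyDB:v1"]
)
```

With this arrangement, args.column_mappings is already a list of dictionaries by the time the main code runs, and a malformed entry makes argparse exit with a usage error before any file is opened.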
I somehow got my branches crossed up and this is a mess. Give me a few to untangle this - sorry!
Okay - I've fixed the screwy merge (had to recreate some code that I inadvertently nuked) - sorry! Please take a look at this one when you get a minute to see if it's more comprehensible. Same basic list of changes from the parent comment applies. I've also moved the validation into the argument parser, which is a nice improvement, thanks for the suggestion.
I also added explicit checking for valid VCF types: ['Integer', 'Float', 'Flag', 'Character', 'String']
This PR overhauls vcf-info-annotator:
Multi-column annotation (--column-mappings / -m)
Old: the tool accepted a single info_field argument, with separate -f/-d flags for format and description.
New: the -m/--column-mappings flag accepts a comma-separated list of source_col:info_field:type:description mappings, which allows multiple TSV columns to be written to multiple INFO fields. Source and version metadata can optionally be appended as 5th and 6th colon-separated fields.
The TSV format now requires a header row.
coerce_value does some sanity checking on fields.
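As a rough illustration of the points above, here is a sketch of reading a headered TSV and coercing each mapped column to its declared type. The coerce_value signature, the empty-cell and Flag handling, and the column names are assumptions made for this example; the actual function in this PR may differ.

```python
import csv
import io


def coerce_value(raw, vcf_type):
    """Convert a raw TSV cell to a Python value for a VCF INFO type.

    Hypothetical signature. Returns None for an empty cell and raises
    ValueError when the cell does not fit the declared type.
    """
    raw = raw.strip()
    if vcf_type == "Flag":
        # Assumption: a Flag is set when the cell holds a truthy token.
        return raw.lower() in ("1", "true", "yes")
    if raw == "":
        return None
    if vcf_type == "Integer":
        return int(raw)  # raises ValueError for input such as "1.5"
    if vcf_type == "Float":
        return float(raw)
    if vcf_type == "Character":
        if len(raw) != 1:
            raise ValueError(f"expected a single character, got {raw!r}")
        return raw
    if vcf_type == "String":
        # Semicolons and equals signs would break the INFO column syntax.
        if ";" in raw or "=" in raw:
            raise ValueError(f"illegal character in INFO value: {raw!r}")
        return raw
    raise ValueError(f"unknown VCF type: {vcf_type!r}")


# The header row lets columns be looked up by name (column names are made up).
tsv_text = "CHROM\tPOS\tscore\tgene\n1\t12345\t0.93\tTP53\n"
reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")

# Source column -> declared VCF type, as a column mapping might provide.
column_types = {"score": "Float", "gene": "String"}

rows = [
    {col: coerce_value(row[col], vcf_type) for col, vcf_type in column_types.items()}
    for row in reader
]
```

Raising ValueError here keeps the type check close to the data, so a bad cell can be reported with its row and column rather than surfacing later as a malformed VCF record.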
Points for discussion:
change in how overwrite works:
Backward compatibility is broken: if we keep these changes, we'll have to update some downstream pipelines. We could shoehorn in a way to maintain the old behavior, but this felt cleaner. We should talk, though!