extract_genome_region package

Module contents

Given a CSV file describing the regions of interest, along with input and output fasta file paths, write a file containing a fasta-formatted representation of these regions.

extract_genome_region.__main__.gen_coords(records)

Given the records, each a namedtuple, yield the coordinate information for each one as a namedtuple.

Parameters:	records (namedtuple) – each row info from the “regions” CSV file.
Yields:	namedtuple – the actual coordinates for slicing the fasta sequence (accounting for any buffers) for a single row in the “regions” CSV file.

Note

The coordinates in each yielded namedtuple follow the slice indexing of standard Python strings (zero-based, with an exclusive end).
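The conversion to slice coordinates can be illustrated with a minimal sketch. It is not the package's implementation: the field names (`chrom`, `start`, `stop`, `left_buffer`, `right_buffer`) are hypothetical, and it assumes the CSV gives 1-based, inclusive positions, which the documentation above does not state.

```python
from collections import namedtuple

# Hypothetical field names; the real "regions" CSV columns may differ.
Record = namedtuple("Record", "record_name chrom start stop left_buffer right_buffer")
Coord = namedtuple("Coord", "record_name chrom start stop")


def gen_coords_sketch(records):
    """Yield zero-based, end-exclusive slice coordinates for each record.

    Assumes (hypothetically) that the input positions are 1-based and inclusive.
    """
    for rec in records:
        # 1-based inclusive start -> zero-based start, widened by the left buffer.
        start = max(0, int(rec.start) - 1 - int(rec.left_buffer))
        # A 1-based inclusive stop equals the zero-based exclusive stop.
        stop = int(rec.stop) + int(rec.right_buffer)
        yield Coord(rec.record_name, rec.chrom, start, stop)


seq = "ACGTACGTAC"
rec = Record("r1", "chr1", "4", "6", "1", "1")  # CSV values arrive as strings
coord = next(gen_coords_sketch([rec]))
region = seq[coord.start:coord.stop]  # bases 3..7 in 1-based terms
```

Because the yielded coordinates already follow Python slicing, the sequence can be sliced directly with `seq[coord.start:coord.stop]` and no further offset arithmetic.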

extract_genome_region.__main__.gen_faidx_objs(fasta, coords, naming_strategy=None)

Given the pyfaidx Fasta object and the coords generator, yield each sequence slice as a pyfaidx.Sequence object.

Parameters:	fasta (pyfaidx.Fasta) – pyfaidx fasta object.
	coords (generator) – of row information from the “regions” CSV file.
	naming_strategy (str) – [csv|seq_range|csv_seq_range] how to name each record. If `None`, use coord.record_name as a string.
Yields:	generator – of pyfaidx sequence objects (`pyfaidx.Sequence`), one for each row in the “regions” CSV file.
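The documentation names the three `naming_strategy` values but does not define them. The sketch below shows one plausible reading, purely as an assumption: `csv` uses the name from the CSV, `seq_range` uses a `chrom:start-stop` string, and `csv_seq_range` joins the two. The helper `make_name`, the `Coord` fields other than `record_name`, and the exact name formats are all hypothetical.

```python
from collections import namedtuple

# Hypothetical coordinate tuple; only record_name is mentioned in the docs above.
Coord = namedtuple("Coord", "record_name chrom start stop")


def make_name(coord, naming_strategy=None):
    """Build a record name under a guessed meaning of each strategy."""
    seq_range = "{}:{}-{}".format(coord.chrom, coord.start, coord.stop)
    if naming_strategy is None or naming_strategy == "csv":
        # Documented behaviour for None: use coord.record_name as a string.
        return str(coord.record_name)
    if naming_strategy == "seq_range":
        return seq_range
    if naming_strategy == "csv_seq_range":
        return "{}_{}".format(coord.record_name, seq_range)
    raise ValueError("unknown naming_strategy: {!r}".format(naming_strategy))


example = Coord("r1", "chr1", 2, 7)
names = [make_name(example, s) for s in (None, "seq_range", "csv_seq_range")]
```

Raising on an unrecognised strategy, rather than silently falling back, is a design choice of this sketch and not something the source confirms.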

extract_genome_region.__main__.gen_out_rec_strings(faidx_objs)

Yield each fasta-formatted record as a string, ready for writing out.

Parameters:	faidx_objs (generator) – of `pyfaidx.Sequence` objects representing the described region of each row in the “regions” CSV file.
Yields:	generator – of formatted `str` objects representing the fasta record of the described region of each row in the “regions” CSV file.
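For readers unfamiliar with the output format, a fasta record is a `>` header line followed by the sequence, usually wrapped to a fixed width. The following is a minimal sketch of that formatting using plain strings; the function name `format_fasta_record` and the default width of 60 are assumptions, and the package itself works on `pyfaidx.Sequence` objects rather than raw strings.

```python
def format_fasta_record(name, sequence, width=60):
    """Return one fasta record: a '>' header line plus wrapped sequence lines."""
    # Split the sequence into fixed-width chunks, one per output line.
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">{}\n{}\n".format(name, "\n".join(lines))


record = format_fasta_record("r1", "ACGTAC", width=4)
```

Each returned string ends with a newline, so records can be written one after another with `handle.write(record)`.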

extract_genome_region.__main__.gen_records(path)

Given the path to the CSV file, yield each record as a namedtuple.

Parameters:	path (str) – location of the “regions” CSV file.
Yields:	namedtuple – each row info from the “regions” CSV file.
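Reading CSV rows into namedtuples can be sketched with the standard library alone. This is an illustration and not the package's code: it assumes the first row of the CSV is a header whose entries are valid Python identifiers, and the function name `gen_records_sketch` is hypothetical.

```python
import csv
from collections import namedtuple


def gen_records_sketch(path):
    """Yield each CSV row as a namedtuple whose fields come from the header row."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)  # assumed: the first row holds the column names
        Record = namedtuple("Record", header)
        for row in reader:
            yield Record(*row)
```

All values are yielded as strings, so numeric columns such as start and stop positions need an explicit `int()` conversion downstream.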