extract_genome_region package
Module contents
Given a CSV file of variable information defining the regions of interest along with input and output fasta file paths, write a file that contains a fasta-formatted representation of these regions.
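As a rough illustration of the input side, the sketch below reads a small “regions” CSV into namedtuple records using only the standard library. The column names (record_name, scaffold, start, stop, left_buffer, right_buffer) and the helper read_region_records are hypothetical; the real column layout expected by the package is not stated here.

```python
import csv
import io
from collections import namedtuple

# Hypothetical column layout; the real "regions" CSV may use different names.
REGIONS_CSV = """record_name,scaffold,start,stop,left_buffer,right_buffer
geneA,chr1,101,200,10,10
geneB,chr2,5,50,0,20
"""


def read_region_records(handle):
    """Yield one namedtuple per data row, with fields named after the header."""
    reader = csv.reader(handle)
    header = next(reader)
    Record = namedtuple("Record", header)
    for row in reader:
        # Values stay as strings here; convert to int where arithmetic is needed.
        yield Record(*row)


records = list(read_region_records(io.StringIO(REGIONS_CSV)))
```

Each element of records then exposes its columns as attributes, for example records[0].record_name.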
extract_genome_region.__main__.gen_coords(records)
Given records as a namedtuple, yield coordinate information as a namedtuple.
Parameters: records (namedtuple) – the information for each row of the “regions” CSV file.
Yields: namedtuple – the actual coordinates for slicing the fasta sequence (accounting for any buffers) for a single row in the “regions” CSV file.
Note: The coordinates in each yielded namedtuple assume the slice indexing of standard Python strings (zero-based).
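The note about zero-based slicing can be made concrete with a minimal sketch. It assumes, purely for illustration, that the CSV gives 1-based inclusive positions and separate left and right buffer sizes; the field names and the helper to_slice_coords are hypothetical and are not taken from the package.

```python
from collections import namedtuple

# Hypothetical field names, used for illustration only.
Record = namedtuple(
    "Record", "record_name scaffold start stop left_buffer right_buffer"
)
Coords = namedtuple("Coords", "record_name scaffold start stop")


def to_slice_coords(record):
    """Convert a 1-based inclusive region plus buffers to Python slice bounds.

    Assumption: positions in the CSV are 1-based and inclusive.
    """
    # Shift the start to zero-based, widen by the left buffer, clamp at 0.
    start = max(0, int(record.start) - 1 - int(record.left_buffer))
    # A 1-based inclusive stop equals a zero-based exclusive stop.
    stop = int(record.stop) + int(record.right_buffer)
    return Coords(record.record_name, record.scaffold, start, stop)


rec = Record("geneA", "chr1", 101, 200, 10, 10)
coords = to_slice_coords(rec)

# The bounds can be applied directly to any Python string.
sequence = "ACGT" * 100
region = sequence[coords.start:coords.stop]
```

Clamping the start at zero keeps a large left buffer from producing a negative index, which Python would otherwise interpret as counting from the end of the string.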
extract_genome_region.__main__.gen_faidx_objs(fasta, coords, naming_strategy=None)
Given the pyfaidx fasta object and the coords generator, yield each sequence slice as a pyfaidx.Sequence object.
Parameters:
- fasta (pyfaidx.Fasta) – pyfaidx fasta object.
- coords (generator) – of row information from the “regions” CSV file.
- naming_strategy (str) – [csv|seq_range|csv_seq_range] how to name each record. If None, use coord.record_name as a string.
Yields: generator – of pyfaidx sequence objects (pyfaidx.Sequence) for each row in the “regions” CSV file.
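One way the three naming strategies might behave is sketched below. Only the option names and the None fallback to coord.record_name come from the description above; the exact name formats, the treatment of “csv” as equivalent to None, and the helper name_record are assumptions made for illustration.

```python
from collections import namedtuple

# Hypothetical coordinate fields, used for illustration only.
Coords = namedtuple("Coords", "record_name scaffold start stop")


def name_record(coord, naming_strategy=None):
    """Return a record name for one region under an assumed naming scheme."""
    # Assumed range format: scaffold:start-stop.
    seq_range = f"{coord.scaffold}:{coord.start}-{coord.stop}"
    if naming_strategy is None or naming_strategy == "csv":
        # Documented fallback: use coord.record_name as a string.
        return str(coord.record_name)
    if naming_strategy == "seq_range":
        return seq_range
    if naming_strategy == "csv_seq_range":
        return f"{coord.record_name} {seq_range}"
    raise ValueError(f"unknown naming_strategy: {naming_strategy!r}")


coord = Coords("geneA", "chr1", 90, 210)
default_name = name_record(coord)
range_name = name_record(coord, "seq_range")
combined_name = name_record(coord, "csv_seq_range")
```

Raising on an unrecognized strategy, rather than silently falling back, makes a mistyped option visible early.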
extract_genome_region.__main__.gen_out_rec_strings(faidx_objs)
Yield the fasta-formatted record, ready for writing out.
Parameters: faidx_objs (generator) – of pyfaidx.Sequence objects representing the described region of each row in the “regions” CSV file.
Yields: generator – of formatted str objects representing the fasta record of the described region of each row in the “regions” CSV file.
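For readers unfamiliar with the output format, the following sketch builds a fasta record string from a name and a sequence. It is a standalone illustration, not the package's implementation: the helper format_fasta_record is hypothetical, and the 60-character line width is a common convention assumed here rather than a documented setting.

```python
def format_fasta_record(name, sequence, width=60):
    """Return a fasta record: a '>' header line, then wrapped sequence lines."""
    # Split the sequence into fixed-width chunks (width is an assumed default).
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([f">{name}"] + lines) + "\n"


record_string = format_fasta_record("geneA", "ACGT" * 20)
```

Because each string ends with a newline, the records can be written to the output file one after another without extra separators.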