extract_genome_region package

Module contents

Given a CSV file whose rows define the regions of interest, together with input and output FASTA file paths, write a file containing a FASTA-formatted representation of these regions.

extract_genome_region.__main__.gen_coords(records)

Given records (each a namedtuple), yield the coordinate information for each record as a namedtuple.

Parameters: records (namedtuple) – the row information from the “regions” CSV file, one namedtuple per row.
Yields: namedtuple – the coordinates for slicing the FASTA sequence (accounting for any buffers) for a single row of the “regions” CSV file.

Note

The coordinates in each yielded namedtuple follow the slicing convention of standard Python strings: zero-based, with the start inclusive and the end exclusive.
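
A minimal sketch of the conversion described in the note, assuming (hypothetically) that each record has string fields record_name, seq_name, start, end and buffer, with start and end given as 1-based inclusive positions in the CSV file. The field names, the Coords namedtuple and the clamping of the start at position 0 are illustrative assumptions, not the package's confirmed behaviour.

```python
from collections import namedtuple
from typing import Iterable, Iterator

# Hypothetical output type; the real field names in extract_genome_region may differ.
Coords = namedtuple("Coords", ["record_name", "seq_name", "start", "end"])


def gen_coords(records: Iterable) -> Iterator[Coords]:
    """Yield zero-based, half-open slice coordinates for each record.

    ASSUMPTION: each record has string fields record_name, seq_name,
    start, end (1-based, inclusive) and buffer (bases added on each side).
    """
    for rec in records:
        # An empty buffer cell is treated as no buffer.
        buffer = int(rec.buffer or 0)
        # 1-based inclusive start -> 0-based start, widened by the buffer
        # and clamped so it never falls before the start of the sequence.
        start = max(int(rec.start) - 1 - buffer, 0)
        # A 1-based inclusive end is numerically the 0-based exclusive end.
        end = int(rec.end) + buffer
        yield Coords(rec.record_name, rec.seq_name, start, end)
```

Under these assumptions, sequence[c.start:c.end] selects the buffered region directly; an end that runs past the sequence is simply truncated by Python slicing.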

extract_genome_region.__main__.gen_faidx_objs(fasta, coords, naming_strategy=None)

Given the pyfaidx Fasta object and the coords generator, yield each sequence slice as a pyfaidx.Sequence object.

Parameters:
  • fasta (faidx.Fasta) – faidx Fasta object for the input FASTA file.
  • coords (generator) – of coordinate namedtuples, one for each row in the “regions” CSV file.
  • naming_strategy (str) – [csv|seq_range|csv_seq_range] how to name each output record. If None, coord.record_name is used, converted to a string.
Yields: faidx.Sequence – the sequence object for each row in the “regions” CSV file.
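
The sketch below shows the slicing and naming step using only the standard library: a plain dict mapping sequence names to strings stands in for faidx.Fasta, and the hypothetical SeqSlice namedtuple stands in for faidx.Sequence. The three name formats (for example chr1:3-6 for seq_range) and the helper _record_name are assumptions for illustration; the package's actual naming rules are not documented here.

```python
from collections import namedtuple
from typing import Dict, Iterable, Iterator, Optional

# Stand-ins (ASSUMPTION): a {seq_name: sequence} dict plays the role of
# faidx.Fasta, and SeqSlice plays the role of faidx.Sequence.
SeqSlice = namedtuple("SeqSlice", ["name", "seq", "start", "end"])
Coords = namedtuple("Coords", ["record_name", "seq_name", "start", "end"])


def _record_name(coord: Coords, naming_strategy: Optional[str]) -> str:
    # Hypothetical helper and name formats; the real rules may differ.
    # The range is reported 1-based and inclusive, e.g. "chr1:3-6".
    seq_range = f"{coord.seq_name}:{coord.start + 1}-{coord.end}"
    if naming_strategy in (None, "csv"):
        return str(coord.record_name)
    if naming_strategy == "seq_range":
        return seq_range
    if naming_strategy == "csv_seq_range":
        return f"{coord.record_name}|{seq_range}"
    raise ValueError(f"unknown naming_strategy: {naming_strategy!r}")


def gen_faidx_objs(fasta: Dict[str, str], coords: Iterable[Coords],
                   naming_strategy: Optional[str] = None) -> Iterator[SeqSlice]:
    for coord in coords:
        # Zero-based, half-open slice, matching the gen_coords convention.
        seq = fasta[coord.seq_name][coord.start:coord.end]
        yield SeqSlice(_record_name(coord, naming_strategy), seq,
                       coord.start, coord.end)
```

With pyfaidx itself, slicing a record of a Fasta object (fasta[seq_name][start:end]) uses the same zero-based slice semantics and returns a Sequence object, so the dict stand-in only replaces the lookup.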

extract_genome_region.__main__.gen_out_rec_strings(faidx_objs)

Yield each FASTA-formatted record, ready for writing out.

Parameters: faidx_objs (generator) – of faidx.Sequence objects representing the described region of each row in the “regions” CSV file.
Yields: str – the FASTA record for the described region of each row in the “regions” CSV file.
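
A minimal sketch of the formatting step, assuming each object exposes name and seq attributes (as faidx.Sequence does) and that sequences are wrapped at a fixed line width. The width parameter and its default of 60 are hypothetical; the package may not wrap lines at all. SeqSlice is a hypothetical stand-in for faidx.Sequence.

```python
from collections import namedtuple
from typing import Iterable, Iterator

# Hypothetical stand-in for faidx.Sequence (only name and seq are used).
SeqSlice = namedtuple("SeqSlice", ["name", "seq", "start", "end"])


def gen_out_rec_strings(faidx_objs: Iterable, width: int = 60) -> Iterator[str]:
    for obj in faidx_objs:
        seq = str(obj.seq)
        # ASSUMPTION: wrap the sequence at `width` characters per line.
        lines = [seq[i:i + width] for i in range(0, len(seq), width)]
        # Header line, wrapped sequence, and a trailing newline so that
        # consecutive records can be concatenated safely.
        yield ">" + obj.name + "\n" + "\n".join(lines) + "\n"
```

Because each yielded string already ends with a newline, the records can be written in one call, for example out_handle.writelines(gen_out_rec_strings(objs)).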

extract_genome_region.__main__.gen_records(path)

Given the CSV path, yield each row as a namedtuple.

Parameters: path (str) – location of the “regions” CSV file.
Yields: namedtuple – the information for one row of the “regions” CSV file.
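
A minimal sketch using the standard csv module, assuming the first row of the file is a header whose column names are valid Python identifiers (they become the namedtuple fields). Values are left as strings, and the type name Record is hypothetical.

```python
import csv
from collections import namedtuple
from typing import Iterator


def gen_records(path: str) -> Iterator[tuple]:
    """Yield each data row of the "regions" CSV file as a namedtuple.

    ASSUMPTION: the first row is a header of valid Python identifiers.
    """
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = [name.strip() for name in next(reader)]
        Record = namedtuple("Record", header)
        for row in reader:
            # Skip blank lines instead of failing on them.
            if row:
                yield Record._make(row)
```

Passing rename=True to namedtuple would replace header names that are not valid identifiers with positional names (_0, _1, ...) instead of raising ValueError.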