Extract reads within given region(s), and their mates


This program is now part of the main jvarkit tool. See jvarkit for compiling.

Usage: samviewwithmate [options] Files
      Compression Level. 0: no compression. 9: max compression;
      Default: 5
    -h, --help
      print help and exit
      What kind of help. One of [usage,markdown,xml].
    -o, --output
      Output file. Optional. Default: stdout
  * -b, --bed, -r, --region
      A source of intervals. The following suffixes are recognized: vcf, 
      vcf.gz, bed, bed.gz, gtf, gff, gff.gz, gtf.gz. Otherwise it could be an 
      empty string (no interval) or a list of plain intervals separated by '[ 
      Default: (unspecified)
      Sam output format.
      Default: SAM
      Possible Values: [BAM, SAM, CRAM]
    -st, --streaming
      Force streaming mode even if the BAM is indexed. Warning: streaming mode 
      doesn't guarantee that all mates will be fetched, because a read only 
      contains the start position of its mate, which may lie outside the 
      user's intervals even when the mate overlaps them, unless the MC (mate 
      cigar) attribute is defined.
      Default: false
    -u, --unmapped
      Also search for the unmapped mates. Not available in streaming mode.
      Default: false
      print version and exit
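In streaming mode, whether to keep a mate that lies outside the regions has to be decided from the fields of the read being scanned. The plain-Java sketch below illustrates the warning on `--streaming`. It is not jvarkit's code: the class and method names are hypothetical, and it avoids htsjdk so it runs on its own. With only the mate's start position, a mate that begins before an interval but extends into it is missed. With the MC (mate CIGAR) attribute, the mate's end can be computed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch, not jvarkit's implementation. */
class MateOverlap {
    // One CIGAR element: a length followed by an operator.
    private static final Pattern CIGAR_ELEMENT = Pattern.compile("(\\d+)([MIDNSHP=X])");

    /** Number of reference bases consumed by a CIGAR (operators M, D, N, =, X). */
    static int referenceLength(String cigar) {
        int length = 0;
        Matcher m = CIGAR_ELEMENT.matcher(cigar);
        while (m.find()) {
            char op = m.group(2).charAt(0);
            if (op == 'M' || op == 'D' || op == 'N' || op == '=' || op == 'X') {
                length += Integer.parseInt(m.group(1));
            }
        }
        return length;
    }

    /**
     * True if the mate is known to overlap contig:start-end (1-based, inclusive).
     * mateContig is assumed already resolved (SAM writes '=' for "same contig").
     * mateCigar is the MC attribute, or null when the read does not carry it.
     */
    static boolean mateOverlaps(String mateContig, int mateStart, String mateCigar,
                                String contig, int start, int end) {
        if (!mateContig.equals(contig)) return false;
        int mateEnd = (mateCigar == null)
                ? mateStart // without MC only the first base of the mate is known
                : mateStart + referenceLength(mateCigar) - 1;
        return mateStart <= end && mateEnd >= start;
    }

    public static void main(String[] args) {
        // Mate starts 10 bp before the interval 9:137230721-137230796.
        System.out.println(mateOverlaps("9", 137230711, null, "9", 137230721, 137230796));   // false
        System.out.println(mateOverlaps("9", 137230711, "100M", "9", 137230721, 137230796)); // true
    }
}
```

The same 100 bp mate is missed without MC and found with it, which is the case the warning describes.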


The project is licensed under the MIT license.


Should you cite samviewwithmate?

The current reference is:

Lindenbaum, Pierre (2015): JVarkit: java-based utilities for Bioinformatics. figshare.

How it works

Two modes:

  • Streaming mode is used if the input is stdin, if the BAM file is NOT indexed, or if --streaming is set. The input is scanned once and any read that overlaps a region, or whose mate does, is written.

  • The other mode uses the BAM index. First the regions are scanned, and the positions of the mates (the other regions) and the names of the reads are collected, which requires memory. Then the BAM is opened a second time and the mates are fetched from those positions.
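The indexed mode can be sketched in plain Java. This is an illustration under stated assumptions, not jvarkit's implementation. The hypothetical `Read` class and `query` method stand in for an indexed BAM and its region queries, and the reads are toy data. Pass 1 keeps the reads in the region and remembers their names, which is where the memory cost comes from. Pass 2 queries each mate's start position and keeps the reads whose name was seen.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical sketch of the two-pass indexed mode; not jvarkit's code. */
class TwoPassMates {
    /** Minimal alignment record: 1-based inclusive [start, end]. */
    static final class Read {
        final String name, contig, mateContig;
        final int start, end, mateStart;

        Read(String name, String contig, int start, int end, String mateContig, int mateStart) {
            this.name = name; this.contig = contig; this.start = start; this.end = end;
            this.mateContig = mateContig; this.mateStart = mateStart;
        }
    }

    /** Stand-in for an index query: reads overlapping contig:start-end. */
    static List<Read> query(List<Read> bam, String contig, int start, int end) {
        List<Read> hits = new ArrayList<>();
        for (Read r : bam) {
            if (r.contig.equals(contig) && r.start <= end && r.end >= start) hits.add(r);
        }
        return hits;
    }

    static List<Read> extract(List<Read> bam, String contig, int start, int end) {
        // Pass 1: reads in the user's region, plus the set of their names (kept in memory).
        List<Read> firstPass = query(bam, contig, start, end);
        Set<String> names = new HashSet<>();
        for (Read r : firstPass) names.add(r.name);

        // Identity-based set: each record is emitted once, in discovery order.
        Set<Read> result = new LinkedHashSet<>(firstPass);

        // Pass 2: query each mate's start position; keep reads whose name was seen.
        for (Read r : firstPass) {
            for (Read m : query(bam, r.mateContig, r.mateStart, r.mateStart)) {
                if (names.contains(m.name)) result.add(m);
            }
        }
        return new ArrayList<>(result);
    }

    /** Toy data: pairA spans chr9/chr14, pairB lies in the region, "other" is unrelated. */
    static List<Read> demoBam() {
        List<Read> bam = new ArrayList<>();
        bam.add(new Read("pairA", "chr9", 100, 199, "chr14", 500));
        bam.add(new Read("pairB", "chr9", 150, 249, "chr9", 160));
        bam.add(new Read("pairB", "chr9", 160, 259, "chr9", 150));
        bam.add(new Read("other", "chr14", 450, 549, "chr14", 900));
        bam.add(new Read("pairA", "chr14", 500, 599, "chr9", 100));
        bam.add(new Read("other", "chr14", 900, 999, "chr14", 450));
        return bam;
    }

    public static void main(String[] args) {
        List<Read> out = extract(demoBam(), "chr9", 120, 180);
        System.out.println(out.stream().map(r -> r.name + "@" + r.contig).collect(Collectors.toList()));
    }
}
```

Running `main` prints `[pairA@chr9, pairB@chr9, pairB@chr9, pairA@chr14]`. The chr14 mate of pairA is recovered. The unrelated read overlapping the same chr14 position is dropped because its name was not collected in pass 1.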


$ java -jar dist/samviewwithmate.jar -r "9:137230721-137230796"  ./src/test/resources/HG02260.transloc.chr9.14.bam | cut -f 1-9 | tail
ERR251239.10989793  83  9   137230747   60  30S70M  =   137230326   -490
ERR251239.3385449   147 9   137230754   60  1S99M   =   137230352   -500
ERR251240.17111373  99  9   137230764   60  100M    =   137231150   475
ERR251240.46859433  147 9   137230777   60  65S35M  =   137230342   -469
ERR251240.74563730  147 9   137230787   60  1S99M   =   137230407   -478
ERR251240.1291708   83  9   137230789   60  100M    =   137230411   -477
ERR251240.11887757  97  9   137230795   37  100M    14  79839451    0
ERR251239.34016218  81  14  79839349    37  100M    9   137230679   0
ERR251240.10196873  81  14  79839368    37  100M    9   137230721   0
ERR251240.11887757  145 14  79839451    37  100M    9   137230795   0
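The second column of the output is the SAM FLAG. Decoding it makes the example easier to read. ERR251240.11887757 appears twice, with flag 97 on chromosome 9 and flag 145 on chromosome 14. These are the two mates of one pair, and neither is marked as a proper pair. The small Java sketch below decodes a flag into the bits defined by the SAM specification. The label strings are our own and do not come from an existing API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: SAM FLAG bits 0x1..0x800, labelled in spec order. */
class SamFlag {
    private static final String[] NAMES = {
        "PAIRED", "PROPER_PAIR", "UNMAPPED", "MATE_UNMAPPED",
        "REVERSE", "MATE_REVERSE", "FIRST_OF_PAIR", "SECOND_OF_PAIR",
        "SECONDARY", "QC_FAIL", "DUPLICATE", "SUPPLEMENTARY"
    };

    /** Returns the labels of every bit set in the flag, lowest bit first. */
    static List<String> decode(int flag) {
        List<String> set = new ArrayList<>();
        for (int bit = 0; bit < NAMES.length; bit++) {
            if ((flag & (1 << bit)) != 0) set.add(NAMES[bit]);
        }
        return set;
    }

    public static void main(String[] args) {
        System.out.println(decode(97));  // [PAIRED, MATE_REVERSE, FIRST_OF_PAIR]
        System.out.println(decode(145)); // [PAIRED, REVERSE, SECOND_OF_PAIR]
    }
}
```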

Cited in

  • Garsed, D.W., Pandey, A., Fereday, S. et al. The genomic and immune landscape of long-term survivors of high-grade serous ovarian cancer. Nat Genet (2022).