Running LIRICAL with a Phenopacket file (HPO and VCF data)¶
Preparing Phenopacket-formated data¶
The following example shows a phenopacket representing an individual with Pfeiffer syndrome. The file is adapted from the phenopacket on Running LIRICAL with a Phenopacket file. We have removed several of the phenotypic features, and added an HtsFiles element that contains the path of the VCF file
(in our exmaple, the path is /example/path/Pfeiffer.vcf, but obviously you need to adjust
the path to a file located on your system).
{
"subject": {
"id": "example-1"
},
"phenotypicFeatures": [{
"type": {
"id": "HP:0000244",
"label": "Turribrachycephaly"
},
"classOfOnset": {
"id": "HP:0003577",
"label": "Congenital onset"
}
}, {
"type": {
"id": "HP:0000238",
"label": "Hydrocephalus"
},
"classOfOnset": {
"id": "HP:0003577",
"label": "Congenital onset"
}
}],
"htsFiles":
[{
"uri": "file://example/path/example.vcf",
"description": "test",
"htsFormat": "VCF",
"genomeAssembly": "GRCh19",
"individualToSampleIdentifiers": {
"patient1": "NA12345"
}
}],
"metaData": {
"createdBy": "Peter R.",
"resources": [{
"id": "hp",
"name": "human phenotype ontology",
"namespacePrefix": "HP",
"url": "http://purl.obolibrary.org/obo/hp.owl",
"version": "2018-03-08",
"iriPrefix": "http://purl.obolibrary.org/obo/HP_"
}]
}
}
Running LIRICAL with clinical and genomic data¶
LIRICAL will perform combined phenotye and variant analysis if the Phenopacket contains an htsFiles element. In this
case, you need to indicate the path to the VCF file on your system as shown above (/example/path/Pfeiffer.vcf).
The -p option is used to indicate the Phenopacket, and the -e option is used to indicate the location of
the Exomiser database files. The minimal command (using all default settings) is as follows.
$ java -jar LIRICAL.java phenopacket -p /path/to/example.json -e /path/to/exomiser-data/
LIRICAL Options for clinical/genomic analysis¶
All of the options for the phenotype-only phenopacket analysis (Running LIRICAL with a Phenopacket file) can be used for the clinical/genomic analysis. Additionally, the following options are available.
-b, --background
LIRICAL uses a background frequency file that records the freqeuncy of predicted pathogenic variants
in protein-coding genes (as estimated from gnomAD data). By default, LIRICAL will use pre-fabricated
files for this (that are included in the src/main/resources/background directory). This is recommended
for most users. If you create your own background file, then you can use it with the -b option, that should
then indicate the path to a non-default background frequency file.
-e, --exomiser
Path to the Exomiser data directory (required for VCF-based analysis).
--transcriptdb
LIRICAL can use transcript data from UCSC, Ensembl, or RefSeq. The default is
RefSeq, but transcript definitions from
UCSC and Ensembl can also be used
(e.g., --transcriptdb USCS or --transcriptdb ensembl).
--global
By default, LIRICAL’s default mode, which only ranks candidate genes for which at least one pathogenic allele is
present in the VCF file. LIRICAL can also be run in a --global mode in which diseases are ranked irrespective of
whether a disease gene is known for a disease or whether the gene is found to have a pathogenic allele or not.