{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Percolator q-values are computed as described in the manuscript" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here we inspect the output of Percolator on an arbitrary file.\n", "\n", "The output was obtained as follows.\n", "\n", "1. An X!Tandem output file was converted to TSV using `tandem2pin`:\n", "\n", "   `tandem2pin -o test_in.tsv -P DECOY_ output.t.xml`\n", "\n", "2. Percolator was run with the following command:\n", "\n", "   `percolator -B dec.pin test_in.tsv > targ.pin`\n", "\n", "So decoy peptides are listed in `dec.pin` and target peptides are listed in `targ.pin`.\n", "\n", "These `pin` files are not valid TSV files, because the last column (`proteinIds`) contains a variable number of protein IDs separated by tabs. To make the files parseable, everything to the right of the first protein ID was removed:\n", "\n", "    cut -f 1-6 dec.pin > dec_cut.pin\n", "    cut -f 1-6 targ.pin > targ_cut.pin\n", " \n", "The scores and q-values calculated by Percolator are left intact. According to its help message, this recent version of Percolator uses \"target-decoy competition\" by default to calculate q-values.\n", "\n", "Below, we analyze the resulting files." 
] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Populating the interactive namespace from numpy and matplotlib\n" ] } ], "source": [ "# set up the necessary libraries\n", "import pandas as pd\n", "%pylab --no-import-all inline\n", "import seaborn\n", "from pyteomics import pylab_aux as pa, auxiliary as aux" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# read the peptide tables\n", "targ = pd.read_table('/tmp/targ_cut.pin')\n", "dec = pd.read_table('/tmp/dec_cut.pin')" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "There are 4916 peptides in the table.\n" ] }, { "data": { "text/html": [ "
 | score | q-value | posterior_error_prob | peptide | proteinIds | decoy |
---|---|---|---|---|---|---|
0 | 6.33811 | 0 | 1.598270e-13 | K.SCVEEPEPEPEAAEGDGDK.K | sp\|P51858\|HDGF_HUMAN-Hepatoma-derived-growth-f... | False |
1 | 5.40133 | 0 | 8.943880e-12 | R.QAHLCVLASNCDEPMYVK.L | sp\|P25398\|RS12_HUMAN-40S-ribosomal-protein-S12... | False |
2 | 5.38035 | 0 | 9.787290e-12 | R.DYLDFLDDEEDQGIYQSK.V | sp\|P25205\|MCM3_HUMAN-DNA-replication-licensing... | False |
3 | 5.34931 | 0 | 1.118340e-11 | K.QLQQAQAAGAEQEVEK.F | sp\|P39748\|FEN1_HUMAN-Flap-endonuclease-1-OS=Ho... | False |
4 | 5.23310 | 0 | 1.842490e-11 | K.AAEAAAAPAESAAPAAGEEPSK.E | sp\|P80723\|BASP1_HUMAN-Brain-acid-soluble-prote... | False |