Update: 9th May 2015. The most important news is that Liberia has been declared Ebola-free. A more minor positive event is that, subsequent to two open invited referee reports (much appreciated), our paper "Finding small molecules for the 'next Ebola'" has now been promoted to PubMed and PubMed Central (PMID 25949804 and PMC4406187).
This unique publishing model (see the F1000Research modus operandi) has a lot going for it, including being author-friendly and eventual Gold OA (sans APCs). We will also be able to make a final version (including a couple of convoluted sentences to simplify!).
Just to be clear (particularly for active EBOV researchers), we are unfortunately not in a position to update or identify new sources (since we all have divergent day-jobs) but a communication was sent to the Wellcome Trust for consideration in the context of their EBOV initiatives. It also has to be said that, despite a wide re-tweet reach and over 2000 views for this post, the envisaged crowd-sourcing experiment failed (i.e. no one actually sent manuscript, patent or data set links). It was thus particularly pleasing to get the concepts "out there" via the F1000 paper.
*************************************************
Update: 8th Feb 2015
- Just published by Ekins et al. "A common feature pharmacophore for FDA-approved drugs inhibiting the Ebola virus" (PMID 25653841). This study builds on previous publications identifying four approved compounds active against different strains of EBOV.
- Nice surprise to see "Small molecule inhibitors of ebola virus infection" (PMID 25532798) published by an ex-AZ colleague of mine!
- A set of structures reported as active against the virus is now hosted at CDD Vault.
7th Dec 2014. Some new small-molecule trials were picked up by Compound Interest for the repurposing of established antivirals.
BCX4430 = immucillin A = CID 10445549 = AMFDITJFBUXZQN-KUBHLMPHSA-N (note that BCX4430 in Wikipedia and PubChem is linked to the chloride CID 69211190). There are 140 similar structures for SAR exploration (but filtering out mixtures drops this to 113). The BioCryst patent WO2012051570 has some antiviral data but it looks like a lead-only filing. There are some close analogue filings as parasite nucleoside and deoxynucleoside hydrolase inhibitors.
favipiravir = CID 492405 = ZCGNOVWYSGBHAU-UHFFFAOYSA-N. There are 14 similar structures but some are just radiolabelled derivatives. It looks like there is also some SAR in US20040235761 from Toyama (but it's not an easy document to unravel).
brincidofovir = cidofovir hexadecyloxypropyl ester = CID 483477 = WXJFKKQWPMNTIM-VWLOTQADSA-N. There are 107 similar structures in PubChem (filtered for mixtures) and some data against poxvirus in US20110263536.
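For anyone wanting to pick these up programmatically, the "name = CID = InChIKey" lines above are regular enough to parse. The following is only a minimal sketch: the function names are my own, and the lookup URL follows the PubChem PUG REST pattern for resolving an InChIKey to CIDs (the code only builds the URL, it does not call the service).

```python
import re
from urllib.parse import quote

# An InChIKey is 14 letters, 10 letters and 1 letter, joined by hyphens.
INCHIKEY_RE = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")
CID_RE = re.compile(r"CID\s+(\d+)")


def parse_identity(line):
    """Split a 'name = synonym = CID n = InChIKey' line into a small record."""
    record = {"names": [], "cid": None, "inchikey": None}
    for part in (p.strip() for p in line.split("=")):
        cid_match = CID_RE.fullmatch(part)
        if cid_match:
            record["cid"] = int(cid_match.group(1))
        elif INCHIKEY_RE.fullmatch(part):
            record["inchikey"] = part
        elif part:
            # Anything that is neither a CID nor an InChIKey is kept as a name.
            record["names"].append(part)
    return record


def pubchem_cid_lookup_url(inchikey):
    """Build (not fetch) a PUG REST URL that resolves an InChIKey to CIDs."""
    return ("https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/inchikey/"
            f"{quote(inchikey)}/cids/JSON")


if __name__ == "__main__":
    rec = parse_identity(
        "BCX4430 = immucillin A = CID 10445549 = AMFDITJFBUXZQN-KUBHLMPHSA-N")
    print(rec)
    print(pubchem_cid_lookup_url(rec["inchikey"]))
```

Keeping the InChIKey alongside the CID is deliberate: the key is the join that still works when a name points at a salt form or mixture rather than the parent structure.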
Note also this VS/docking initiative, just announced from Scripps/IBM, so they could pick up some of these structure sets.
28 Nov. I was pleased that AH from Linguamatics responded a couple of weeks ago to the tweet-out for Ebola patent mining. Normally I would split follow-ups into new blog titles but since this link has hits, RTs and cross-pointers I will add the new stuff on top, but this will become a monster post.
As I have made clear in previous posts on antimalarial data trawling, the analogous efforts for Ebola are not only hard work but always confounded by imperfect retrieval specificity, especially since the corpus of true-positive documents is much smaller. Note also that this mining is neither part of the day job for Linguamatics nor myself (whereas this stuff is). Ipso facto, neither of us can do it alone and, given the extraordinary circumstances, the exercise becomes massively more efficient if authors and inventors would (with the blessings of their parent institutions) come out of the woodwork and declare "Yes, this stuff we did is what you are looking for" rather than leaving me, or anyone else, to not find it.
I have outlined hypothetical responses that would really make the difference. For papers it would be along these lines.
In response to your initiative I am sending/putting on figshare/posting on our public server a full set of results from our work that we believe are useful to others working on anti-Ebola chemistry. As you point out, this form of surfacing is easier to globally share, join up, and data mine than our publication(s) (PMIDs xxx) and what might have flowed into the databases without our active engagement. In accordance with your recommendations we include the following in our data sheet: a) code numbers used in our paper(s), b) belt and braces chemical specifications (IUPAC, SMILES, InChI strings and keys), c) PubChem CIDs or a flag indicating these structures are novel, d) IC50s and/or other assay results that can be immediately used for SAR modelling, e) any other notes we think are useful, including an outline of the assay with references as well as PubChem CIDs related to similar bioactive chemotypes we have found (e.g. Tanimoto > 0.8 but not necessarily antiviral). Please note that the synthesis of the compounds and assay details are fully described in our M&M but (if these papers were not OA) we have pasted them into another sheet. Note for compounds marked * we have at least multi-mg amounts that we are prepared to distribute to other researchers/donate to the Ebola Box. Also according to your suggestion, we will send the paper to ChEMBL and/or directly submit a data set to ChEMBL and/or PubChem Bioassay in due course. We (have/have not) filed a patent/our patent no is WOxxx but all data from this is in our sheet. Note we will also add a PubMed Commons comment pointing to our new data deposition (and, if patented, the patent number).
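To make the a) to e) wish list above a little more concrete, here is one way such a data sheet row might be laid out and sanity-checked before posting. This is a sketch under my own assumptions: the column names, the `CompoundRow` class and the three checks are hypothetical, not a format that ChEMBL, PubChem or figshare prescribes.

```python
import csv
import io
import re
from dataclasses import dataclass, asdict, fields
from typing import Optional

# An InChIKey is 14 letters, 10 letters and 1 letter, joined by hyphens.
INCHIKEY_RE = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


@dataclass
class CompoundRow:
    code: str                    # a) code number used in the paper
    iupac_name: str              # b) belt and braces chemical specification
    smiles: str
    inchi: str
    inchikey: str
    pubchem_cid: Optional[int]   # c) None flags the structure as novel
    ic50_nM: Optional[float]     # d) assay result, None if not measured
    notes: str = ""              # e) assay outline, related CIDs, etc.
    available_mg: bool = False   # the "*" flag for distributable material

    def problems(self):
        """Return a list of human-readable issues; empty means the row looks sane."""
        issues = []
        if not INCHIKEY_RE.match(self.inchikey):
            issues.append("InChIKey is not in the 14-10-1 letter layout")
        if not self.inchi.startswith("InChI="):
            issues.append("InChI string should start with 'InChI='")
        if self.ic50_nM is not None and self.ic50_nM <= 0:
            issues.append("IC50 must be positive")
        return issues


def to_csv(rows):
    """Serialise rows to CSV text with one column per field."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(CompoundRow)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buf.getvalue()
```

A flat CSV is chosen here simply because it is the lowest common denominator for joining up with database exports; the checks only catch formatting slips and say nothing about whether the structure itself is right.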
The analogous version for patent inventors and assignees would go something like this:
In response to your initiative, we have consulted with our assigning institution for the patent WOxxxx that I am an inventor on and that we believe includes useful anti-Ebola chemistry. We are pleased that they have agreed to surfacing the data in the forms you have recommended for papers (described above), indexed with example numbers from our patent document. Note also that, rather than use the binned values in the patent, we have added the discrete IC50s, normalised to nM, and/or we have extended the result tables to cover those analogues we had exemplified but had not linked to data in the patent. Having inspected SureChEMBL output for our patent document we confirm most of the structures but have corrected a few that had source text errors or difficult image extractions. Note our full synthetic details are in the descriptions. Given the circumstances we accept the principle of experimental use exemption and consequently researchers and/or the Ebola Box administrators can contact us for supplies of compounds we have available. However, our IP rights regarding claims in the patents remain in force (or) since we are no longer pursuing these chemotypes commercially, we will consider donating this patent to WIPO Re:Search. Also, under the circumstances, we have decided to write up a succinct paper (i.e. removing the turgid patentese) including just the most potent members of the series, with a clear structure <> IC50 table, submitted to one of the journals extracted by ChEMBL, citing our patent with the extended SAR table and including the URL of our figshare data sheet.
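The "discrete IC50s, normalised to nM" request above is simple arithmetic, but it is exactly the sort of thing that goes wrong when tables are merged by hand. A minimal sketch follows; the function names and the tab-separated row layout are my own illustrative choices.

```python
# Multipliers that convert each concentration unit into nanomolar.
# Both the micro sign and the Greek mu are accepted for micromolar.
NM_PER_UNIT = {
    "pM": 1e-3,
    "nM": 1.0,
    "uM": 1e3,
    "\u00b5M": 1e3,
    "\u03bcM": 1e3,
    "mM": 1e6,
    "M": 1e9,
}


def to_nM(value, unit):
    """Convert a concentration to nanomolar, rejecting unknown units."""
    if value <= 0:
        raise ValueError("a concentration must be positive")
    try:
        return value * NM_PER_UNIT[unit.strip()]
    except KeyError:
        raise ValueError(f"unrecognised concentration unit: {unit!r}") from None


def format_result(example, value, unit, qualifier="="):
    """One row of a structure <> IC50 table.

    The qualifier keeps '<' or '>' when only a binned boundary is known,
    so a bin edge is never silently passed off as a measured value.
    """
    return f"{example}\t{qualifier}\t{to_nM(value, unit):g} nM"


if __name__ == "__main__":
    print(format_result("Example 12", 0.1, "uM"))
    print(format_result("Example 13", 1, "uM", qualifier=">"))
```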
I won't overview the Linguamatics results here, since it's more important to jump on the possibly useful result below. However, it has to be said that their search ran headlong into the constitutive problems of bio-patent mining. In no particular order these are: a) patent family and kind code redundancy, b) keyword overloading (e.g. Ebola or filovirus mentions but actual result sets on unrelated viruses), c) no-data filings, d) binned values and e) unconvincing potencies (i.e. above 1 uM). Note also that, while they are scientific true-positives as documents, I have not collated the inhibitors of viral processing, since there are 2030 structures in ChEMBL mapped to CATB alone, and probably at least as many again in many disease patents (if anyone is interested in patents directed towards host-protease viral processing enzymes, by all means ping me).
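The five problem classes a) to e) above lend themselves to a simple triage pass over a hit list. The sketch below is purely illustrative: the record fields (`publication`, `family_id`, `ebola_data`, `best_ic50_nM`, `binned`) are hypothetical and would have to be filled in by whoever curates the hits; only the 1 uM cutoff comes from the text.

```python
import re

# Two-letter office code, serial digits, then an optional kind code such as A1 or B2.
PUBLICATION_RE = re.compile(r"^([A-Z]{2}\d+)([A-Z]\d?)?$")


def strip_kind_code(publication):
    """Return the publication number without a trailing kind code."""
    match = PUBLICATION_RE.match(publication.replace(" ", "").upper())
    return match.group(1) if match else publication


def triage(hits, cutoff_nM=1000.0):
    """Collapse redundant publications and flag the remaining problem classes."""
    seen = set()
    kept = []
    for hit in hits:
        # a) keep one entry per family, falling back to the kind-code-free number
        family = hit.get("family_id") or strip_kind_code(hit["publication"])
        if family in seen:
            continue
        seen.add(family)
        flags = []
        best = hit.get("best_ic50_nM")
        if not hit.get("ebola_data", False):
            flags.append("keyword-only")   # b) virus mentioned, data on something else
        if best is None and not hit.get("binned", False):
            flags.append("no-data")        # c) no results at all
        if hit.get("binned", False):
            flags.append("binned")         # d) only activity ranges reported
        if best is not None and best > cutoff_nM:
            flags.append("weak")           # e) best potency above the cutoff
        kept.append({"family": family, "flags": flags})
    return kept
```

Stripping the kind code is only a fallback for a) and will not catch family members published under different numbers or in different offices, which is why a curated `family_id` takes precedence when one is available.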
Notwithstanding, the trawl did pick up some nominal positives, such as US20140275037 (the most recent publication of a large patent family with a 2004 priority date) from SIGA Technologies who, as mentioned in the patent, have received NIH biodefence grants.
However, the problem here is the phylogenetic distance between what they are testing (i.e. Tacaribe from the Arenaviridae) and the Filoviridae (Ebola and Marburg), and thus whether this activity would actually carry over. One of their leads shown in the abstract above is ST-294 = CID 4557048 = ZGNSRLRWMPNAMJ-UHFFFAOYSA-N. This had a 100 nM EC50 for plaque reduction in a Tacaribe virus assay. You can get a useful 3D cluster in PubChem of similar structures from the same patent.
It was also possible to connect across to a series of papers (that look like they were derived from the patent), namely "pH-induced activation of arenavirus membrane fusion is antagonized by small-molecule inhibitors" in 2008 (PMID 18768973 with ST-161, ST-193, ST-294, ST-366). This proposes a mechanism of action for ST-294 of interfering with arenavirus envelope glycoprotein mediated membrane fusion by targeting the interaction of a fusion subunit with the stable signal peptide. This was followed by the 2013 paper "Lead optimization of an acylhydrazone scaffold possessing antiviral activity against Lassa virus" (PMID 24064500 for ST-161).
***********************************************
20 Sep. The In the Pipeline Ebola post makes the obvious point that immunology-based approaches have much more likelihood of early success. We all hope this of course but the parallel tack below still has some utility in the longer term. Another recent blog post lists the Ebola pipeline but without molecular resolution of the agents. Note also that, on Friday, the NCBI launched their dedicated Ebolavirus and MERS sequence resource pages.
*****************************************************
14 Sep. The Ebola outbreak needs no introduction. The scientific community is now engaging in many new ways but I will exemplify one that does not seem to have been explored so far. This is the open compilation and dissemination of medicinal chemistry data specifically linked to chemical structures with some level of antiviral activity. To be clear: a) this may well be happening already and b) we all know such research activity is years away from translation to epidemic control, whatever development-cycle telescoping might be explored in this crucial circumstance. Nonetheless, there are analogies to the situation described for tuberculosis and malaria, where the patchiness of explicit chemistry connectivity between papers, patents and database entries impedes progress. Since I have tried to be succinct already I have pasted in two tweets below as a summary.
I'm certain not to be the only one to find that searching for Ebola anti-viral medicinal chemistry has specificity challenges, even just associated with synonyms for the virus and its phylogenetic stablemates. Trying the obvious gives thin returns, for example the Europe PMC queries below incorporating the useful ChEMBL select.
Here only two entries fit the criteria. Note the same query for malaria brings back 572 (papers with chemistry SAR extracted into ChEMBL) and 870 for tuberculosis. A similar search on PubMed is also thin on returns (but I'm sure this could be optimized further).
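For anyone wanting to repeat or tune this kind of search, a query along these lines can be assembled programmatically. This is a sketch only: the synonym list is illustrative, and I am assuming that the "ChEMBL select" corresponds to the `HAS_CHEMBL:y` search field, which should be checked against the current Europe PMC search syntax. The code just builds the REST URL; it does not fetch anything, so no hit counts are implied.

```python
from urllib.parse import urlencode

EUROPE_PMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_query(synonyms, require_chembl=True):
    """OR together quoted virus synonyms, optionally restricted to ChEMBL-linked papers."""
    terms = " OR ".join(f'"{s}"' for s in synonyms)
    query = f"({terms})"
    if require_chembl:
        # Assumed field name for the "ChEMBL select"; verify before relying on it.
        query += " AND HAS_CHEMBL:y"
    return query


def search_url(query, page_size=25):
    """Build (not fetch) a Europe PMC REST search URL returning JSON."""
    params = {"query": query, "format": "json", "pageSize": page_size}
    return EUROPE_PMC_SEARCH + "?" + urlencode(params)


if __name__ == "__main__":
    q = build_query(["Ebola", "ebolavirus", "EBOV", "filovirus"])
    print(q)
    print(search_url(q))
```

Swapping the synonym list for malaria or tuberculosis terms gives the comparison queries, which keeps the three searches structurally identical apart from the disease keywords.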
The retrieval harvest for a simple patent search was no better (see below).
PubChem Bioassay looked initially more fruitful with 57 returns (below).
However, the assay descriptions are multiplexed out of the small set of ChEMBL-extracted papers. Note the entire collection of MLSCN assays retrieves nothing with that keyword.
At the end of the day, there may simply not be much out there but I am certain there are more high-quality documents that are opaque to the simplistic retrieval searches tested above. So, in this case, I am calling for authors and assignees (and/or inventors of course) to declare in some openly accessible way the document identifiers for their own relevant papers and/or patents and/or open ELNs (i.e. they know where they are, even if they are difficult for anyone else to find). Note that, while I have access to selected journals via the University of Edinburgh, it would be useful if, at least for these Ebola med chem papers, full text could be surfaced outside the publishers' paywalls.
As the first exploratory test I followed up on the fifth PubMed hit from the list above "Pyridinyl imidazole inhibitors of p38 MAP kinase impair viral entry and reduce cytokine induction by Zaire ebolavirus in human dendritic cells" (PMID 24815087). It certainly helped that SB202190 is PubChem positive as CID 5353940 NJNKPVPFGLGHPA-UHFFFAOYSA-N. This is well-linked as a kinase inhibitor but not directly to these recent Ebola results. MeSH might eventually make the connection but it would help to arrange the pointers asap for all similar papers.
As a second one I selected "Inhibition of Ebola Virus Infection: Identification of Niemann-Pick C1 as the Target by Optimization of a Chemical Probe" (PMID 23526644). This was actually a well-connected OA publication (except for the confusing numbering system for examples in the paper) and it had also been picked up by ChEMBL, including CID 71254944 KSISWIVDFITFKQ-UHFFFAOYSA-N. This had a 20 nM IC50 in a viral assay, AID 725664, along with 58 other results. I was pleased to find that the new SureChEMBL patent links connected these structures through to WO2013022550 (but I couldn't find any activity data in the filing).
For the direct patent example I took a look at the most recent hit above, WO2014060588 (but the chemistry text was better in EP2722047). The result table is small compared to the number of chemical exemplifications but it does have quantitative data (see below).
It turns out that RN-1-177 with the best (but modest) IC50 was CID 71625653 AVBPQEDCJWOBPP-UHFFFAOYSA-N. This happens to have a CHEMBL2323004 link but the paper does not specify anti-Ebola per se.
I could do more of this but it becomes a lot more effective as a communal undertaking. We can see if anyone declares relevant documents for chemistry extraction and/or would be interested in collaborating on this end of the workflow. Even better if an "Ebola Box" analogous to the MMV Malaria Box and upcoming MMV Pathogen Box, could be instigated somewhere out there.