pKa: Difference between revisions

From ACD Percepta
 
(Updated for 2023 release)
 
<br />
<br />


The acid dissociation constant, K<sub>a</sub>, is a measure of the tendency of a molecule or ion to release a proton (H<sup>+</sup>) from its ionization center(s); the larger the K<sub>a</sub>, the more readily the proton is lost. It is related to the ionization ability of chemical species and is a core property that defines chemical and biological behaviour.<br />


===Features===
* Includes two different predictive algorithms – ACD/pKa Classic and ACD/pKa GALAS.  
* Calculates accurate acid and base pK<sub>a</sub> constants (pK<sub>a</sub> = -log K<sub>a</sub>) under standard conditions (25°C and zero ionic strength) in aqueous solutions for every ionizable group within organic structures.
* Provides confidence intervals for all estimations indicating their accuracy.
* Gives an explicit insight into processes running during each ionization stage. Contains a number of other useful features depending on the selected prediction algorithm.
<br />
<br />


[[Image:Acdpka_classic.png|center]]
<br />
# Ionizable groups are highlighted using color shading (red for acid, blue for base, purple for amphoteric ionization centers). More intense shading denotes the strongest acid and base groups.
# Select which type of pK<sub>a</sub> values to predict:
#* Apparent pK<sub>a</sub>: simulates the actual ionization of the compound in aqueous solution, accounting for the protonation states of other ionizable groups at the relevant pH
#* Single pK<sub>a</sub>: estimates the theoretical pK<sub>a</sub> that would be observed if the ionization center under consideration were the only ionizable group in the molecule, so that the remainder of the molecule always stays electrically neutral
# Strongest acid and base pK<sub>a</sub> values including reliability range in ±log units
# List of pK<sub>a</sub> constants for all stages of ionization
# List of dissociation stages (DS) corresponding to different pK<sub>a</sub> values.
# Hover over to see the screentip showing the respective dissociation reaction: <br>[[Image:Acdpka_classic_screentip.png]]
# Click the appropriate tab to display the protocol, according to which the pK<sub>a</sub> value for that dissociation stage was calculated.
# Click the structure fragment to see it highlighted in the Structure pane.
<br />
<br />
<br>


=== ACD/pKa GALAS ===
[[Image:acdpka_galas.png|center]]
<br />
<br />
<ol>
<li>Ionizable groups are highlighted using color shading (red for acid, blue for base, purple for amphoteric). More intense shading denotes the strongest acid and base groups</li>
<li>Strongest acid and base pK<sub>a</sub> values including reliability range in ±log units</li>
<li>List of pK<sub>a</sub> constants for all stages of ionization</li>
<li>List of partial ionization reactions (microstages) responsible for each ionization stage. Contribution of each microstage to the final pK<sub>a</sub> value is given in percent</li>
<li>Hover over to see the screentip: [[Image:Acdpka_ms_screentip.png|right]]
 
:a. Color shading marks the ionization center
 
:b. Dissociation reaction and its pK<sub>a</sub> microconstant
 
</li><br style="clear:both">
<li>Click the appropriate tab to select the type of plot to be displayed</li>
<li>Net charge vs. pH plot</li>
<li>Protonation states of the molecule. The selected protonation state (PS2 in this example) is displayed in the screentip with ionized atoms marked by color-shading: [[Image:Acdpka_ps_screentip.png|right]]
</li><br style="clear:both">
<li>Click to view the Net Charge vs. pH table. Fractions of the ionic species having a particular net charge are displayed at selected points on the pH scale including physiologically relevant pH values (1.7, 4.6, 6.5, 7.4)
 
[[Image:acdpka_galas_charge_table.png|center]]
 
</li>
<li>Click and drag the slider to see the calculated fractions of different ionic forms at a precise pH value, displayed on the right.</li>
<li>Calculated fractions of different ionic forms at selected pH.</li>
</ol>
<br />
 
[[Image:acdpka_galas_ps_plot.png|center]]
 
<ol>
<li>Protonation State vs. pH plot</li>
<li>Click the label of a protonation state to show / hide its curve on the plot</li>
<li>Fractions of different protonation states at selected pH</li>
<li>Click to view the Protonation State vs. pH table
[[Image:acdpka_galas_ps_table.png|center]]</li>
</ol>
 
<br style="clear: both" />
<br />
<br />


[[Image:acdpka_galas_ig_plot.png|center]]
<ol>
<li>
Ionogenic Group State vs. pH plot</li>
<li>Click the label of an ionogenic group to toggle its curve. Hover over the label to view a screentip with the selected ionogenic group shaded (G1 in this example): [[Image:acdpka_galas_ig_screentip.png|right]]</li>
<li>TC – total charge of all ionogenic groups in the molecule</li><br style="clear: both" />
<li>Click to view the Ionogenic Group State vs. pH table
[[Image:acdpka_galas_ig_table.png|center]]</li>
</ol>


<div class="mw-collapsible mw-collapsed">
<div class="mw-collapsible">


==Technical information==
<br />


<div class="mw-collapsible-content">
<div class="mw-collapsible-content">
=== The Different Available Models ===


The pKa prediction module offers two different predictive algorithms within ACD/Percepta software – ACD/pKa Classic and ACD/pKa GALAS.<br />
===Introduction to pK<sub>a</sub>===
 
 
The pK<sub>a</sub> is a measure of the tendency of a molecule or ion to keep a proton, H<sup>+</sup>, at its ionization center(s). It is related to the ionization capabilities of chemical species. The more readily a species ionizes, the more likely it is to be taken up into aqueous solution, because water is a very polar solvent (dielectric constant ε<sup>20</sup> = 80). If a molecule does not readily ionize, it will tend to stay in a non-polar solvent such as cyclohexane (ε<sup>20</sup> = 2) or octanol (ε<sup>20</sup> = 10). In biological terms, pK<sub>a</sub> is thus an important concept in determining whether a molecule will be taken up by aqueous tissue components or by lipid membranes. It is also closely related to the concepts of pH (the acidity of a solution) and log''P'' (the partition coefficient between immiscible liquids).
 
The equilibrium acid ionization constant, K<sub>a</sub>, expresses the ratio of concentrations for the reaction:
 
<p style="text-align: center;">
HA + H<sub>2</sub>O → H<sub>3</sub>O<sup>+</sup> + A<sup>-</sup><br>
K<sub>a</sub> = [H<sub>3</sub>O<sup>+</sup>] [A<sup>-</sup>] / [HA]
</p>
 
where, by convention, it is assumed that the concentration of water is constant, and it is absorbed into the K<sub>a</sub> definition.
 
The acid ionization constant varies by orders of magnitude. For example, at 25°C:
* acetic acid: K<sub>a</sub> = 1.8 x 10<sup>-5</sup>
* phenol: K<sub>a</sub> = 1.0 x 10<sup>-10</sup>
 
It is easier to refer to such extreme numbers on a logarithmic scale and, again by convention, "p" is used to denote the negative logarithm (base 10):
 
<p style="text-align: center;">
pK<sub>a</sub> = -log(K<sub>a</sub>)
</p>
 
The K<sub>a</sub> values of the compounds above are then easily converted to pK<sub>a</sub> values:
* acetic acid: pK<sub>a</sub> = -log(1.8 x 10<sup>-5</sup>) = 4.74
* phenol: pK<sub>a</sub> = -log(1.0 x 10<sup>-10</sup>) = 10.0
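
The conversion between K<sub>a</sub> and pK<sub>a</sub> can be checked with a few lines of Python. This is only an illustrative sketch; the function names <code>pka_from_ka</code> and <code>ka_from_pka</code> are made up for this example and are not part of ACD/Percepta.

```python
import math


def pka_from_ka(ka: float) -> float:
    """Return pKa = -log10(Ka) for a positive dissociation constant."""
    if ka <= 0:
        raise ValueError("Ka must be positive")
    return -math.log10(ka)


def ka_from_pka(pka: float) -> float:
    """Inverse conversion: Ka = 10^(-pKa)."""
    return 10.0 ** (-pka)


# Ka values quoted in the text (25 degrees C).
for name, ka in [("acetic acid", 1.8e-5), ("phenol", 1.0e-10)]:
    print(f"{name}: pKa = {pka_from_ka(ka):.2f}")
```

With the two K<sub>a</sub> values quoted above, the script prints 4.74 for acetic acid and 10.00 for phenol.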
 
There is an essential difference between interpreting the pK<sub>a</sub> values for molecules vs. ions. A molecule which loses a proton ionizes:
 
<p style="text-align: center;">
HA + H<sub>2</sub>O → H<sub>3</sub>O<sup>+</sup> + A<sup>-</sup>
</p>
and so a low pK<sub>a</sub> value denotes good aqueous solubility.
 
An ion which loses a proton, however, de-ionizes:
 
<p style="text-align: center;">
HB<sup>+</sup> + H<sub>2</sub>O → H<sub>3</sub>O<sup>+</sup> + B
</p>
and so a high pK<sub>a</sub> value denotes good aqueous solubility.
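
The opposite trends for acids and bases follow from the Henderson-Hasselbalch relationship. The sketch below assumes ideal dilute behaviour and a single ionization center; the function names and the base pK<sub>a</sub> of 9.0 are illustrative assumptions, not values taken from the software.

```python
def fraction_ionized_acid(pka: float, ph: float) -> float:
    """Fraction of an acid HA present as the ionized form A- at a given pH."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


def fraction_ionized_base(pka: float, ph: float) -> float:
    """Fraction of a base B present as the ionized form HB+ at a given pH.

    Here pka is the pKa of the conjugate acid HB+.
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


ph = 7.4
# Acid examples use the pKa values from the text; the base pKa is hypothetical.
print(f"acetic acid (pKa 4.74): {fraction_ionized_acid(4.74, ph):.4f} ionized")
print(f"phenol (pKa 10.0):      {fraction_ionized_acid(10.0, ph):.4f} ionized")
print(f"base (pKa 9.0, assumed): {fraction_ionized_base(9.0, ph):.4f} ionized")
```

At pH 7.4 the acid with the lower pK<sub>a</sub> is almost fully ionized while phenol is almost entirely neutral, and a base with a high pK<sub>a</sub> is predominantly in its charged form.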
 
Note that there is no intrinsic reason to rule out pK<sub>a</sub> values less than 0 or greater than 14. For example, sulfuric acid, H<sub>2</sub>SO<sub>4</sub>, has a negative pK<sub>a</sub> for the loss of its first proton:
 
<p style="text-align: center;">
H<sub>2</sub>SO<sub>4</sub> → HSO<sub>4</sub><sup>-</sup> + H<sup>+</sup> (pK<sub>a</sub> < 0)
</p>
 
although experiments can normally measure pK<sub>a</sub> values only between 1 and 13.
 
====Ionization Centers====
 
The pK<sub>a</sub> determination depends on the presence of heteroatoms such as oxygen or nitrogen. Although in principle a pK<sub>a</sub> value could be calculated for any atomic center, including carbon, in practice the extrapolation is poor for systems which have a very low amount of ionization. For example, the C–H bonds in methane have such highly covalent character that
 
<p style="text-align: center;">
CH<sub>4</sub> + H<sub>2</sub>O → CH<sub>3</sub><sup>-</sup> + H<sub>3</sub>O<sup>+</sup>
</p>
 
has a vanishingly small probability of occurring. Some C–H bonds do have measurable ionic character, and these are calculated by ACD/pKa. For example, the C–H bond of the methylene group at the 2-position in 1,3-cyclopentanedione is highly polarized; its pK<sub>a</sub> is predicted to be about 8.9:
 
[[File:Cyclopentanedione.gif|center]]
 
Normally, however, a heteroatom is part of the ionization center, and ACD/pKa is designed to test for the presence of heteroatoms which are capable of forming bonds with sufficient ionic character to have measurable pK<sub>a</sub> values, thus enabling reasonable prediction of pK<sub>a</sub> for related compounds.
 
<!--
<span style="color: red">Context specific to ACD/pKa</span>
====Automatic Protonation vs. Fixed Form====
 
The pK<sub>a</sub> software has been designed to do three types of calculation, which are discussed in more detail below:
* "Apparent constants" calculation when the algorithm automatically protonates the sketched-in molecule. You can specify whether you want the approximated or exact values;
* "Microconstants of the current form" calculation when the algorithm accepts the sketched-in molecule “as is” and tries to add a proton;
* "Single pK<sub>a</sub> values" calculation where ionization at each dissociation center is calculated in turn while the rest of the molecule is considered neutral.
 
We recommend the use of exact (or, for large molecules, approximated) apparent constants when first examining a molecule.
-->
 
====Statistical Factor====
 
The approximated calculation of constants will yield the '''statistical factor''' which takes into account identical protonation sites. Here is how the statistical factor is defined by leading authorities:
 
<blockquote>
"When a polybasic acid has ''n'' groups, each of which has an equal probability of losing a proton, the observed pK<sub>a</sub> will be less by (log ''n'') than the pK<sub>a</sub> of a closely related monobasic acid. This "statistical effect" arises because there are ''n'' equivalent ways of losing a proton but only one site to which the proton can be restored. Similarly, for second proton loss, the correction becomes (log((''n'' – 1) / 2), then (log((''n'' – 2) / 3), and so on. Thus, for a molecule such as butanedioic acid (HOOC–CH<sub>2</sub>–CH<sub>2</sub>–COOH), which has two identical acidic groups, loss of a proton from either group leads to the same monoanion. The consequence is that the first ionization constant, pK<sub>a1</sub>, for the dibasic acid is twice as large as that for the closely related monobasic acid, that is, the observed pK<sub>a1</sub> is 0.3 (= log2) units less than would be expected from a consideration of factors other than probability. Conversely, the monoanion has only one ionizable proton whereas the dianion has two identical sites for proton addition, so that the second ionization step, pK<sub>a2</sub>, appears to be weaker by a factor of two, and the observed pK<sub>a2</sub> to be greater by 0.3 than anticipated. Similarly, for a base with n basic centers, the measured pK<sub>a</sub> ["apparent pK<sub>a</sub>" in ACD/pKa] of greatest magnitude, pK<sub>aN</sub>, will be greater than anticipated by log ''n'', and so on."
</blockquote>
<blockquote>
D. D. Perrin, Boyd Dempsey and E. P. Serjeant, ''pKa Prediction for Organic Acids and Bases'', '''1981''', pp.16–17.
</blockquote>
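
The corrections described in the quotation can be written as a single expression: for loss of the ''k''-th proton from ''n'' equivalent groups, the correction is log((''n'' – ''k'' + 1) / ''k''). The following sketch applies it; the function names and the intrinsic pK<sub>a</sub> of 4.50 are hypothetical and chosen only for illustration.

```python
import math


def statistical_correction(n: int, k: int) -> float:
    """log10((n - k + 1) / k): correction for the k-th proton loss
    from n equivalent acidic groups (k = 1 gives log10(n))."""
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    return math.log10((n - k + 1) / k)


def observed_pka(intrinsic_pka: float, n: int, k: int) -> float:
    """Observed pKa is lower than the intrinsic value by the correction."""
    return intrinsic_pka - statistical_correction(n, k)


# A diacid with two identical groups and an assumed intrinsic pKa of 4.50.
for step in (1, 2):
    print(f"pKa{step} = {observed_pka(4.50, n=2, k=step):.2f}")
```

For ''n'' = 2 the first observed constant comes out 0.3 units below the intrinsic value and the second 0.3 units above it, in line with the butanedioic acid discussion in the quotation.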
 
====Experimental Measurement of pK<sub>a</sub>====
 
When comparing calculated pK<sub>a</sub> values with experimentally determined data, it is wise to bear in mind how these measurements are carried out.
 
The determination of pK<sub>a</sub> is based on pH measurements for a series of mixtures of the acid and its salt. For pK<sub>a</sub> values in the range 2–12, this is frequently done by titrimetric methods. The pH is converted to proton molality, and then K<sub>a</sub> is determined by measuring (or estimating) the activity coefficients of species in solution. Note that the temperature, ionic strength, and reference solutions used in these determinations can influence the measured pK<sub>a</sub> substantially. For example, benzoic acid was determined to have a pK<sub>a</sub> of 4.2 by one experimental group and 4.0 by another.
 
Another standard method is the spectrophotometric determination of pK<sub>a</sub>. This is particularly recommended for very small quantities of sample, or for poorly soluble sample. A refinement of this method requires an estimate of the spectra for each form from the data. The pK<sub>a</sub> values are determined by nonlinear curve fitting, assuming good initial estimates can be chosen. In theory, any kind of spectral data can be used—UV-Vis, IR, NMR, etc., provided that the pH of the solution in which the spectrum was obtained can be measured. A plot of absorbance versus pH will show asymptotes at the absorbance of the conjugate acid and base forms of the molecule. Each wavelength gives different asymptotes, but the same inflection point. Data at enough wavelengths will generate the spectra of the conjugate acid and base forms, even if they can't be measured experimentally, say, for molecules with pK<sub>a</sub> outside of the range 2–12. The (common) inflection point is the pK<sub>a</sub>. For molecules with multiple ionization sites, a sum of S-shaped curves that need to be deconvolved is obtained. Without good initial estimates, the calculations can be tedious. The better the initial estimate, the faster the convergence. ACD/pKa can provide good initial estimates for these calculations.
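
For a single ionization site, the S-shaped absorbance curve described above can be modelled directly. The sketch below is a simplified stand-in for the nonlinear curve fitting mentioned in the text: it assumes the two asymptotic absorbances are already known and uses a plain grid search over pK<sub>a</sub> instead of a proper optimizer. All names and the synthetic data are assumptions made for illustration.

```python
def absorbance(ph: float, pka: float, a_acid: float, a_base: float) -> float:
    """Absorbance at one wavelength for a mixture of conjugate acid and base."""
    ratio = 10.0 ** (ph - pka)  # [base form] / [acid form]
    return (a_acid + a_base * ratio) / (1.0 + ratio)


def fit_pka(ph_values, absorbances, a_acid, a_base, lo=0.0, hi=14.0, step=0.01):
    """Grid search for the pKa that minimizes the sum of squared residuals."""
    best_pka, best_sse = lo, float("inf")
    n_steps = int(round((hi - lo) / step))
    for i in range(n_steps + 1):
        pka = lo + i * step
        sse = sum(
            (absorbance(ph, pka, a_acid, a_base) - a) ** 2
            for ph, a in zip(ph_values, absorbances)
        )
        if sse < best_sse:
            best_pka, best_sse = pka, sse
    return best_pka


# Synthetic, noise-free data generated from an assumed pKa of 4.2.
phs = [2.0 + 0.5 * i for i in range(11)]
data = [absorbance(ph, 4.2, a_acid=0.10, a_base=0.80) for ph in phs]
print(f"fitted pKa = {fit_pka(phs, data, 0.10, 0.80):.2f}")
```

At pH equal to the pK<sub>a</sub> the modelled absorbance lies exactly halfway between the two asymptotes, which is the common inflection point referred to above. A real multi-site fit would need a sum of such terms and a good starting estimate.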
 
Just as aspects of experimental design affect the accuracy of a pK<sub>a</sub> determination, aspects of the physical solution can lead to apparent disagreement between the calculated and measured pK<sub>a</sub>. For example, one factor that may cause such a discrepancy is the presence of a non-negligible tautomeric ratio. '''ACD/Percepta''' automatically checks for tautomers when a structure is entered in the '''Prediction module''' Workspace. To check for tautomers in the '''Spreadsheet''' Workspace, choose the '''Check Tautomers''' command from the '''Utilities''' menu.
 
<!--
===Apparent Constants===
 
The "apparent constant" of the pK<sub>a</sub> is a method of calculation which mimics the experimental situation by "adding" protons to the molecule in the order the molecule would normally be protonated in solution.
 
For example, sketching the neutral glycine molecule H<sub>2</sub>N–CH<sub>2</sub>–COOH and specifying "apparent constant" will give two values: 9.64 and 2.43. A later section in this reference manual will describe how to attribute these values using ACD/pKa, but for now we note that these values are calculated for the actual ionization equilibria:
 
<p style="text-align: center;">
H<sub>3</sub>N<sup>+</sup>–CH<sub>2</sub>–COOH → H<sub>2</sub>N–CH<sub>2</sub>–COO<sup>-</sup> + H<sup>+</sup> (pK<sub>a</sub> = 9.64) <br>
H<sub>3</sub>N<sup>+</sup>–CH<sub>2</sub>–COOH → H<sub>3</sub>N<sup>+</sup>–CH<sub>2</sub>–COO<sup>-</sup> + H<sup>+</sup> (pK<sub>a</sub> = 2.43) <br>
</p>
 
<span style="color: red">Specific for ACD/pKa</span>
You can choose between “approximated” and “exact” forms of this calculation; the difference lies in the type of algorithm used. At the root of the difference is the fact that if two or more dissociated groups in the structure have close experimental pKa values, they interfere with each other so that the calculated values lie farther apart than what the experimentally-observed values actually do. With an “exact” calculation, this interference phenomenon is taken into consideration every time. With an “approximated” calculation, this is taken into account only for strictly identical groups. For example, for 2-methylbutanedioic acid, where there are two similar but not identical carboxyl groups, the approximated apparent pKa values have an uncertainty of 0.23 and 0.19 and differs from experimental values by 0.37 and 0.44 whereas the exact apparent pKa values disagree with the experimentally-measured pKa values by 0.08 and 0.10.
 
===Microconstants of Current Form===
 
The "microconstants of current form" is a method of calculation which accepts the input structure ''exactly as it is'', and tries to remove a proton from it.
 
For example, sketching the neutral glycine molecule H<sub>2</sub>N–CH<sub>2</sub>–COOH and specifying "microconstants of current form" will give two results: "not calculated" and 4.14. These are for:
 
<p style="text-align: center;">
H<sub>2</sub>N–CH<sub>2</sub>–COOH → HN<sup>-</sup>–CH<sub>2</sub>–COOH + H<sup>+</sup> (unlikely, therefore "not calculated") <br>
H<sub>2</sub>N–CH<sub>2</sub>–COOH → H<sub>2</sub>N<sup>+</sup>–CH<sub>2</sub>–COO<sup>-</sup> + H<sup>+</sup> (pK<sub>a</sub> = 4.14) <br>
</p>
 
On the other hand, if the input structure had specifically shown an ammonium group, H<sub>3</sub>N<sup>+</sup>, the current form calculation would have reported these two pK<sub>a</sub> values:
 
<p style="text-align: center;">
H<sub>3</sub>N<sup>+</sup>–CH<sub>2</sub>–COOH → H<sub>2</sub>N–CH<sub>2</sub>–COOH + H<sup>+</sup> (pK<sub>a</sub> = 7.56) <br>
H<sub>3</sub>N<sup>+</sup>–CH<sub>2</sub>–COOH → H<sub>3</sub>N<sup>+</sup>–CH<sub>2</sub>–COO<sup>-</sup> + H<sup>+</sup> (pK<sub>a</sub> = 2.43) <br>
</p>
 
===Single pKa Values===
 
The "single pKa" is a method intuitive to the way medicinal chemists views pKa values. If there are two acidic sites in the molecule, a chemist desires to know the relative acid pKa values. It could be done by calculating the pKa for each ionization site while the rest of the molecule is considered neutral. This is not what actually happens experimentally, if a di-acid is dissolved in strong base and titrated with acid, but it does indicate the relative ease of ionization at each center.
-->
 
===Database of Experimental pK<sub>a</sub> Values===
 
The internal database contains '''15,924''' structures with more than '''31,000''' experimental values measured at different temperatures and ionic strengths in purely aqueous solutions. In '''ACD/Percepta''', the database is directly accessible and searchable as the '''Databases\pKa''' data source, and each experimental value is provided with a reference to the original literature. No pK<sub>a</sub> values in organic solvents or aqueous-organic mixtures are included.
 
===Description of ACD/pKa GALAS Algorithm===
 
Estimation of ionization constants using this algorithm is a multi-step procedure. It involves estimating pK<sub>a</sub> microconstants for all possible ionization centers in a hypothetical state of an uncharged molecule ("fundamental microconstants"), applying numerous corrections to these initial pK<sub>a</sub> values according to the surroundings of the reaction center, and calculating the charge influences of ionized groups on the neighbouring ionization centers. The calculation routine uses a database of 4,600 ionization centers, a set of ''ca.'' 500 interaction constants, and four interaction calculation methods for different types of interactions, producing a full range of microconstants from which pK<sub>a</sub> macroconstants are obtained. This allows simulation of the complete distribution plot of all protonation states of the molecule at different pH conditions. For example, the complete simulated ionization profile of the cysteine molecule is illustrated in the following figure:
 
[[File:Cysteine_Ionization_Profile.png|center]]
 
<span style="font-size:8pt">
<sup>1</sup>Experimental pK<sub>a</sub> values obtained from ''The Merck Index'' (see full citation below).
</span>
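
The step from microconstants to macroconstants can be illustrated for the simplest case of two ionization sites. The sketch below uses the standard textbook relations for a two-site acid and is not a description of the GALAS implementation; the function name and the three input microconstants are hypothetical.

```python
import math


def macro_from_micro(pk_a: float, pk_b: float, pk_ab: float):
    """Combine microscopic pKa values of a two-site acid into pKa1 and pKa2.

    pk_a  : site A ionizes first (from the fully protonated form)
    pk_b  : site B ionizes first (from the fully protonated form)
    pk_ab : site B ionizes after site A is already deprotonated
    Returns (pKa1, pKa2, share of the first stage that proceeds via site A).
    """
    k_a, k_b, k_ab = (10.0 ** -p for p in (pk_a, pk_b, pk_ab))
    # Thermodynamic cycle: both routes reach the same doubly ionized form.
    k_ba = k_a * k_ab / k_b
    ka1 = k_a + k_b                    # parallel microstages add
    ka2 = k_ab * k_ba / (k_ab + k_ba)  # 1/Ka2 = 1/k_ab + 1/k_ba
    return -math.log10(ka1), -math.log10(ka2), k_a / ka1


# Hypothetical microconstants, chosen only for illustration.
pka1, pka2, share_a = macro_from_micro(8.5, 8.9, 10.0)
print(f"pKa1 = {pka1:.2f}, pKa2 = {pka2:.2f}, site A share = {share_a:.0%}")
```

The share returned as the third value is one simple way to express how much each microstage contributes to an ionization stage; how the software computes its own percentages is not specified here.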
 
ACD/pKa GALAS algorithm is based on a training set containing 17,593 compounds (>20,000 ionization centers) obtained from various articles in peer-reviewed scientific journals and well-known reference books:
* ''The Merck Index. An Encyclopedia of Chemicals, Drugs, and Biologicals'', O'Neil, M.J., Smith, A., Heckelman, P.E., Budavari, S., Eds. 13th Edition, Merck & Co., Inc., Whitehouse Station, NJ, 2001
* ''Therapeutic Drugs'', Dollery, C., Ed. 2nd Edition, Churchill Livingstone, New York, NY, 1999
* ''Clarke's Isolation and Identification of Drugs'', Moffat, A.C., Jackson, J.V., Moss, M.S., Widdop, B., Eds. 2nd Edition, The Pharmaceutical Press, London, 1986
 
A specific feature of this algorithm is the graphical/tabular representation of the obtained predictions as the pH dependency of:
* Net molecular charge
* Distribution of protonation states
* Average charge of each ionization centre


===Description of ACD/pKa Classic Algorithm===
   
   
This algorithm uses Hammett-type equations and electronic substituent constants (σ) to predict pK<sub>a</sub> values for ionizable groups. Effects considered by the software include tautomeric equilibria, covalent hydration, and resonance effects in α, β-unsaturated systems.
 
''Hammett-Type Equations'' — every ionizable group is characterized by several Hammett-type equations that have been parameterized to cover the most popular ionizable functional groups.
 
''Sigma constants'': the internal training set contains >3,000 derived experimental electronic constants. When the required substituent constant is not available from the experimental database, one of four algorithms is used to describe the transmission of electronic effects through the molecular system.
 
The apparent pK<sub>a</sub> calculation mimics the experimental situation by "adding" protons to the molecule in the order in which it would normally be protonated in solution. For example, performing the calculation for a neutral glycine molecule H<sub>2</sub>N–CH<sub>2</sub>–COOH gives two values: 9.64 and 2.43. These values are calculated for the actual ionization equilibria:
 
<p style="text-align: center;">
H<sub>3</sub>N<sup>+</sup>–CH<sub>2</sub>–COO<sup>-</sup> → H<sub>2</sub>N–CH<sub>2</sub>–COO<sup>-</sup> + H<sup>+</sup> (pK<sub>a</sub> = 9.64) <br>
H<sub>3</sub>N<sup>+</sup>–CH<sub>2</sub>–COOH → H<sub>3</sub>N<sup>+</sup>–CH<sub>2</sub>–COO<sup>-</sup> + H<sup>+</sup> (pK<sub>a</sub> = 2.43) <br>
</p>
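
Given a set of macroscopic pK<sub>a</sub> values, the fractions of the differently charged species and the net charge at any pH follow from the mass-action expressions. The sketch below uses the two glycine values quoted above; the function names are illustrative, and the calculation assumes ideal behaviour (activities equal to concentrations).

```python
def species_fractions(pkas, ph):
    """Fractions of species that have lost 0, 1, 2, ... protons at a given pH."""
    weights = [1.0]  # fully protonated form as reference
    for pka in sorted(pkas):
        # Each successive deprotonation multiplies the weight by Ka / [H+].
        weights.append(weights[-1] * 10.0 ** (ph - pka))
    total = sum(weights)
    return [w / total for w in weights]


def net_charge(pkas, ph, charge_fully_protonated):
    """Average net charge; each lost proton lowers the charge by one."""
    fractions = species_fractions(pkas, ph)
    return sum(f * (charge_fully_protonated - j) for j, f in enumerate(fractions))


glycine_pkas = [2.43, 9.64]  # values quoted in the text
for ph in (1.7, 4.6, 6.5, 7.4):
    q = net_charge(glycine_pkas, ph, charge_fully_protonated=1)
    print(f"pH {ph}: net charge = {q:+.3f}")
```

The net charge passes through zero midway between the two constants, at pH (2.43 + 9.64) / 2 ≈ 6.04, where the zwitterion dominates.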
 
The internal training set of ACD/pKa Classic algorithm contains 15,932 molecules representing >30,000 pK<sub>a</sub> values.  
 
Specific features of this particular algorithm are as follows:
* A detailed calculation protocol on how the prediction has been carried out is provided for each molecule (including Hammett-type equations, substituent constants, and literature references where available).
* To improve prediction accuracy and make the model relevant to in-house chemical space or a particular project, the ACD/pKa Classic prediction model can be trained with user-provided experimental data. Training can be switched on or off, and specific training sets can be selected for different predictions, giving you full control.
 
The following sections provide more detailed information on the various aspects of the ACD/pKa Classic algorithm.
 
===Database of Hammett-type Equations===
 
The Hammett-type equations used in ACD/pKa calculations have been parameterized to cover over 1,500 combinations of over 650 of the most popular ionizable functional groups. Each functional group has been characterized by several equations involving different types of substituent constants in order to achieve the most accurate calculation. All equations for a given functional group have been ranked according to their reliability (number of correlated structures, correlation coefficient and standard deviation) and reliability of available substituent constants. For example, the following ranking has been used for calculating pK<sub>a</sub> values of para-substituted quinolines:
 
# pK<sub>a</sub> = 5.009 – 5.058*σ<sub>I</sub> – 4.363*σ<sub>R</sub><sup>+</sup> : ''n'' = 10, ''r'' = 0.9989, ''sd'' = 0.13
# pK<sub>a</sub> = 4.874 – 4.561*σ<sub>I</sub> – 5.63*σ<sub>R</sub> : ''n'' = 10, ''r'' = 0.9878, ''sd'' = 0.46
# pK<sub>a</sub> = 5.179 – 5.318*σ<sub>Para</sub> : ''n'' = 9, ''r'' = 0.9878, ''sd'' = 0.42
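
The three ranked equations above can be evaluated directly once the substituent constants are known. The sketch below transcribes their coefficients and picks the highest-ranked equation for which all required constants are supplied. This fallback logic, the dictionary keys, and the σ values in the example are assumptions made for illustration; they do not describe how ACD/pKa selects equations internally.

```python
# Ranked equations for para-substituted quinolines, in the order listed above.
# Each entry: (names of the required sigma constants, equation).
RANKED_EQUATIONS = [
    (("sigma_I", "sigma_R_plus"),
     lambda s: 5.009 - 5.058 * s["sigma_I"] - 4.363 * s["sigma_R_plus"]),
    (("sigma_I", "sigma_R"),
     lambda s: 4.874 - 4.561 * s["sigma_I"] - 5.63 * s["sigma_R"]),
    (("sigma_para",),
     lambda s: 5.179 - 5.318 * s["sigma_para"]),
]


def predict_pka(constants: dict):
    """Return (rank, pKa) from the first equation whose constants are available."""
    for rank, (required, equation) in enumerate(RANKED_EQUATIONS, start=1):
        if all(name in constants for name in required):
            return rank, equation(constants)
    raise ValueError("no equation can be evaluated with the given constants")


# Hypothetical substituent constants, for illustration only.
print(predict_pka({"sigma_I": 0.10, "sigma_R_plus": -0.20}))
print(predict_pka({"sigma_para": -0.17}))
```

With these made-up constants the first call uses equation 1 and the second falls back to equation 3, because only σ<sub>Para</sub> is supplied.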
 
===Database of Electronic Substituent Constants (σ)===
 
There are many variants of the original electronic substituent constant, σ. The ACD/pKa database contains constants for over 1,200 substituents with over 3,000 carefully derived experimental electronic constants. The following table summarizes the number of constant values present in the database.
 
{| border="1" class="wikitable" style="margin: 1em auto 1em auto; text-align: center;"
|-
! Sigma !! Number in Database
|-
| σ<sub>I</sub> || 592
|-
| σ<sup>*</sup> (Taft) || 265
|-
| σ<sub>R</sub> || 453
|-
| σ<sub>R</sub><sup>–</sup> || 157
|-
| σ<sub>R</sub><sup>+</sup> || 143
|-
| σ<sub>Para</sub> || 585
|-
| σ<sub>Meta</sub> || 431
|-
| σ<sub>Para</sub><sup>–</sup> || 142
|-
| σ<sub>Para</sub><sup>+</sup> || 135
|-
| σ<sub>Phosph</sub> (P-Acids) || 68
|-
| σ<sub>Ortho</sub> (Benzoic acid) || 41
|-
| σ<sub>Ortho</sub> (Phenol) || 37
|-
| σ<sub>Ortho</sub> (Aniline) || 30
|-
| σ<sub>Ortho</sub> (Pyridine) || 48
|-
|}
 
===Estimation of Electronic Substituent Constants===
 
Although the parameter database contains a wide array of σ values, in some cases no reliable constant is available. When the required substituent constant is not available from the experimental database it can be calculated by one of the algorithms described in this section.
 
====Electronic Effect Transmission through Skeleton====
 
This estimation is based on the following formula:
 
<p style="text-align: center;">
''σ<sup>R–G–</sup>'' = ''σ<sup>–G–</sup>'' + Σ''z<sub>I,R,…</sub><sup>–G–</sup>''∙''σ<sub>I,R,…</sub><sup>R–</sup>'' + Σ''z<sub>I,R,…</sub><sup>–G–</sup>''∙(''σ<sub>I</sub><sup>R–</sup>''∙''σ<sub>R</sub><sup>R–</sup>'')…,
</p>
 
where all ''σ<sub>I,R,…</sub><sup>R–</sup>'' are substituent R electronic constants (inductive, resonance, etc.) and all ''z<sub>I,R,…</sub><sup>–G–</sup>'' are skeleton G transmission constants. The accuracy of the ''σ<sup>R–G–</sup>'' calculation is usually '''better than ±0.05–0.1'''. The algorithm contains 42 of the most frequently used skeletons G described by 126 such equations:
 
<p style="text-align: center;">
''σ<sub>I</sub>''–36, ''σ<sub>R</sub>''–25, ''σ<sub>R</sub><sup>-</sup>''–6, ''σ<sub>R</sub><sup>+</sup>''–4, ''σ<sub>Para</sub>''–24, ''σ<sub>Meta</sub>''–24, ''σ<sub>Phosph</sub>''–7
</p>
 
For example, the constants calculated for the carbamate functional group are ''σ<sub>I</sub>'' = 0.45, ''σ<sub>R</sub>'' = -0.34, ''σ<sub>R</sub><sup>-</sup>'' = -0.36, ''σ<sub>R</sub><sup>+</sup>'' = -0.38, ''σ<sub>Para</sub>'' = 0.10, ''σ<sub>Meta</sub>'' = 0.32, ''σ<sub>Phosph</sub>'' = 0.0238.
 
[[File:pKa_Carbamate.gif|center]]
 
Using these parameters, the pK<sub>a</sub> of 2-ammonio-4-thioxohexanedioate calculated by this method is 7.72 (experimental is 7.90).
 
[[File:2-ammonio-4-thioxohexanedioate.gif|center]]
 
====Secondary Algorithm====
 
If the preceding estimate cannot be made, a back-up method is available, based on the following formula:
 
<p style="text-align: center;">
''σ<sup>R–G–</sup>'' = ''σ<sup>–G–</sup>'' + ''z<sub>I</sub><sup>–G–</sup>''∙''σ<sub>I</sub><sup>R–</sup>''
</p>
 
The accuracy of the ''σ<sup>R–G–</sup>'' calculation is usually '''±0.15–0.20'''. It is not as good as the first algorithm, but it can be used to calculate the ''σ<sub>I</sub>'', ''σ<sup>*</sup>'', ''σ<sub>R</sub>'' and ''σ<sub>R</sub><sup>-</sup>'' electronic constants for '''any possible substituents'''.
 
For example, the constants ''σ<sub>I</sub>'' = 0.37, and ''σ<sub>R</sub>'' = 0.08 are calculated for N-trifluoromethyl-carbamothioic halides:
 
[[File:N-trifluoromethyl-carbamothioic_halide.gif|center]]
 
====Transmission through Aliphatic Cycles====
 
This algorithm is based on the modified '''Exner-Fiedler method'''. The original Exner-Fiedler method can be used to calculate electronic transmission effects for only a very limited number of aliphatic cycles. The improved ACD/pKa method allows calculation of these effects for '''any possible aliphatic (poly)cycles'''.
 
For example, the calculated transmission factor for variants of bicyclo[1.1.0]butane-1-carboxylic acid is 1.72 (experimental is 1.92).
 
[[File:bicyclobutane-1-carboxylic_acid.gif|center]]
 
====Transmission through Condensed Polyaromatic Systems====
 
This algorithm is based on the '''modified Dewar-Grisdale''' method. The original Dewar-Grisdale method can be used to calculate electronic transmission effects for only a very limited number of condensed polyaromatic systems (Dewar M.J.S., Grisdale P.J., ''J. Am. Chem. Soc.'', '''1962''', 84, 3539). [http://pubs.acs.org/doi/abs/10.1021/ja00877a023] The improved ACD/pKa method allows you to calculate these effects for '''virtually any polyaromatic system'''.
 
For example, the pK<sub>a</sub> of the 3-amino-5-hydroxynaphthalene-2,7-disulfonate calculated by this method is 8.64 (the experimentally determined value is 8.54):
 
[[File:3-amino-5-hydroxynaphthalene-2,7-disulfonate.gif|center]]
 
===Calculation of Steric Effects===
 
In most cases, steric effects have been taken into account by defining the ionization center as an ionizable functional group with a sufficiently large invariable skeleton. In cases where the variable substituents are in close proximity to ionizable groups, steric effects are calculated by the modified branching equations. For example, the pK<sub>a</sub> values of N-monoalkylanilinium ions are calculated by the following equation:
 
<p style="text-align: center;">
pK<sub>a</sub> = 4.85 + 0.27 × (''n<sub>β</sub>'')<sup>1.84</sup> − 0.08 × (''n<sub>γ</sub>'')<sup>2.36</sup> + 0.01 × (''n<sub>δ</sub>'')<sup>2.36</sup> (''sd'' = 0.2)
</p>
 
where ''n<sub>β</sub>'', ''n<sub>γ</sub>'' and ''n<sub>δ</sub>'' denote the numbers of atoms in the second, third and fourth spheres of the N-alkyl substituent. The accuracy of the pK<sub>a</sub> calculation for N-t-butylanilinium is ±0.1, whereas without this equation it would be ±2.
 
===Calculation of Charge Effects===
 
In most cases, charge effects have been taken into account by including the constant charged substituent in the definition of the ionization center. For example, the pK<sub>a</sub> values of carboxy groups in α-amino acids are calculated from the equation characterizing the –CH(NH<sub>3</sub><sup>+</sup>)COOH ionization center. When the charged substituent is variable, its effect is calculated from its distance to the ionization center.
 
===Other Effects===
 
ACD/pKa warns you when other effects may appear which affect the experimentally observed pK<sub>a</sub> values. These effects, if not properly taken into account, may cause a large discrepancy between the calculated and experimentally observed pK<sub>a</sub> values.
 
====Tautomeric Equilibria====


For certain compounds, there is a mixture of two or more structurally distinct species which are in rapid equilibrium. Proton transfer is normally involved in tautomeric equilibria. Some of the most common instances of tautomerism are related to the following forms:
* keto-enol;
* phenol-keto;
* nitroso-oxime;
* aliphatic nitro compounds; and
* imine-enamine.


If you are calculating pK<sub>a</sub> values for species which contain these functional groups, after entering the compound structure in '''ACD/Percepta''' you should always choose the appropriate tautomer from the '''Select Tautomeric Form''' dialog box that is automatically shown in such cases. For example, 3 tautomeric forms are possible for the hydroxytriazoliumonate species:
 
[[File:Hydroxytriazoliumonate_tautomers.gif|center]]
 
====Covalent Hydration====
 
If the energy barrier to the addition of water across a double bond is relatively low, this can be a significant complicating factor in the accurate experimental determination of pK<sub>a</sub>; thus, ACD/pKa is designed to flag known cases. For example, for pteridine, a pK<sub>a</sub> calculation will automatically flag the species on the left as undergoing covalent hydration:
 
[[File:pKa_Covalent_Hydration.gif|center]]
 
====Vinylology====
 
Another complicating factor in the calculation and measurement of pK<sub>a</sub> is vinylology. Vinylology occurs due to resonance effects being transmitted through the double bond. In α,β-unsaturated ketones, nitriles, and esters, such as in the following structures
 
[[File:pKa_Vinylology.gif|center]]
the γ-hydrogen acquires a level of acidity normally held by the position α to the carbonyl group. Due to vinylology, alkylation at the α-position competes with alkylation at the γ-position.


ACD/pKa does not explicitly flag cases of vinylology, although a message about tautomeric forms may appear.


====Limitations====


The ACD/pKa Classic algorithm will refuse to predict the pK<sub>a</sub> for structures that:
* Contain more than 255 atoms (note that the program also refuses to predict pK<sub>a</sub> for some cyclic compounds with fewer than 255 atoms, because it uses a cycle-breaking algorithm that increases the number of atoms)
* Do not contain an ionization center
* Contain atoms of non-typical valence
* Contain atoms other than C, H, O, S, P, N, F, Cl, Br, I, Se, Si, Ge, Pb, Sn, As, and B
* Contain two or more fragments
* Contain more than 30 ionizable centers
* Contain d-block or f-block metal atoms
* Contain textual abbreviations which cannot be transformed to structural fragments.


</div>
'''Note:''' There certainly exist some structures that formally meet the aforementioned limitations, but cannot be calculated with the current algorithm.
</div>

Latest revision as of 09:29, 26 July 2023

Overview


The acid dissociation constant, Ka, is a measure of the tendency of a molecule or ion to keep a proton (H+) at its ionization center(s). It is related to the ionization ability of chemical species and is a core property that defines chemical and biological behaviour.

Features

  • Includes two different predictive algorithms – ACD/pKa Classic and ACD/pKa GALAS.
  • Calculates accurate acid and base pKa constants (pKa = -log Ka) under standard conditions (25°C and zero ionic strength) in aqueous solutions for every ionizable group within organic structures.
  • Provides confidence intervals for all estimations indicating their accuracy.
  • Gives an explicit insight into processes running during each ionization stage. Contains a number of other useful features depending on the selected prediction algorithm.


Interface


ACD/pKa Classic


Acdpka classic.png


  1. Ionizable groups are highlighted using color shading (red for acid, blue for base, purple for amphoteric ionization centers). More intensive shading denotes strongest acid and base groups
  2. Select which type of pKa values to predict:
    • Apparent pKa: simulates the actual ionization of the compound in aqueous solution, accounting for the protonation states of other ionizable groups at relevant pH
    • Single pKa: estimates the theoretical pKa that would be observed if the considered ionization center were the only ionizable group in the molecule, so that the remainder of the molecule always stayed electrically neutral
  3. Strongest acid and base pKa values including reliability range in ±log units
  4. List of pKa constants for all stages of ionization
  5. List of dissociation stages (DS) corresponding to different pKa values.
  6. Hover over to see the screentip showing the respective dissociation reaction:
    Acdpka classic screentip.png
  7. Click the appropriate tab to display the protocol, according to which the pKa value for that dissociation stage was calculated.
  8. Click the structure fragment to see it highlighted in the Structure pane.


ACD/pKa GALAS


acdpka galas.png


  1. Ionizable groups are highlighted using color shading (red for acid, blue for base, purple for amphoteric). More intensive shading denotes strongest acid and base groups
  2. Strongest acid and base pKa values including reliability range in ±log units
  3. List of pKa constants for all stages of ionization
  4. List of partial ionization reactions (microstages) responsible for each ionization stage. Contribution of each microstage to the final pKa value is given in percent
  5. Hover over to see the screentip:
    Acdpka ms screentip.png
    a. Color shading marks the ionization center
    b. Dissociation reaction and its pKa microconstant

  6. Click the appropriate tab to select the type of plot to be displayed
  7. Net charge vs. pH plot
  8. Protonation states of the molecule. The selected protonation state (PS2 in this example) is displayed in the screentip with ionized atoms marked by color-shading:
    Acdpka ps screentip.png

  9. Click to view the Net Charge vs. pH table. Fractions of the ionic species having a particular net charge are displayed at selected points on the pH scale including physiologically relevant pH values (1.7, 4.6, 6.5, 7.4)
    acdpka galas charge table.png
  10. Click and drag the slider to see calculated fractions of different ionic forms at precise pH value displayed on the right.
  11. Calculated fractions of different ionic forms at selected pH.


acdpka galas ps plot.png
  1. Protonation State vs. pH plot
  2. Click the label of a protonation state to show / hide its curve on the plot
  3. Fractions of different protonation states at selected pH
  4. Click to view the Protonation State vs. pH table
    acdpka galas ps table.png



acdpka galas ig plot.png
  1. Ionogenic Group State vs. pH plot
  2. Click the label of an ionogenic group to toggle its curve. Hover over the label to view a screentip with the selected ionogenic group shaded (G1 in this example):
    acdpka galas ig screentip.png
  3. TC – total charge of all ionogenic groups in the molecule

  4. Click to view the Ionogenic Group State vs. pH table
    acdpka galas ig table.png

Technical information

Introduction to pKa

The pKa is a measure of the tendency of a molecule or ion to keep a proton, H+, at its ionization center(s). It is related to ionization capabilities of chemical species. The more likely ionization occurs, the more likely a species will be taken up into aqueous solution, because water is a very polar solvent (its dielectric constant, ε20 = 80). If a molecule does not readily ionize, then it will tend to stay in a non-polar solvent such as cyclohexane (ε20 = 2) or octanol (ε20 = 10). In biological terms, pKa is thus an important concept in determining whether a molecule will be taken up by aqueous tissue components or the lipid membranes. It is also closely related to the concepts of pH (the acidity of solution) and logP (the partition coefficient between immiscible liquids).

The equilibrium acid ionization constant, Ka, expresses the ratio of concentrations for the reaction:

HA + H₂O → H₃O⁺ + A⁻
Ka = [H₃O⁺] [A⁻] / [HA]

where, by convention, it is assumed that the concentration of water is constant, and it is absorbed into the Ka definition.

The acid ionization constant varies by orders of magnitude. For example, at 25°C:

  • acetic acid: Ka = 1.8 × 10⁻⁵
  • phenol: Ka = 1.0 × 10⁻¹⁰

It is easier to refer to such extreme numbers on a logarithmic scale and, again by convention, "p" is used to denote the negative logarithm (base 10):

pKa = -log(Ka)

The Ka values of the compounds above are then easily converted to pKa values:

  • acetic acid: pKa = -log(1.8 × 10⁻⁵) = 4.74
  • phenol: pKa = -log(1.0 × 10⁻¹⁰) = 10.0
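
The conversion above can be sketched in a few lines of Python. This is an illustrative helper only, not part of ACD/pKa; the function names `pka_from_ka` and `ka_from_pka` are hypothetical.

```python
import math


def pka_from_ka(ka: float) -> float:
    """Return pKa = -log10(Ka) for a positive dissociation constant."""
    if ka <= 0:
        raise ValueError("Ka must be positive")
    return -math.log10(ka)


def ka_from_pka(pka: float) -> float:
    """Inverse conversion: Ka = 10 ** (-pKa)."""
    return 10.0 ** (-pka)


if __name__ == "__main__":
    # Ka values quoted in the text for 25 °C.
    for name, ka in (("acetic acid", 1.8e-5), ("phenol", 1.0e-10)):
        print(f"{name}: pKa = {pka_from_ka(ka):.2f}")
```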

There is an essential difference between interpreting the pKa values for molecules vs. ions. A molecule which loses a proton ionizes:

HA + H₂O → H₃O⁺ + A⁻

and so a low pKa value denotes good aqueous solubility.

An ion which loses a proton, however, de-ionizes:

HB⁺ + H₂O → H₃O⁺ + B

and so a high pKa value denotes good aqueous solubility.

Note that there is no intrinsic reason to rule out pKa values less than 0 or greater than 14. For example, sulfuric acid, H2SO4, has a negative pKa for the loss of its first proton:

H₂SO₄ → HSO₄⁻ + H⁺ (pKa < 0)

although normally experiment can only measure pKa between 1 and 13.

Ionization Centers

The pKa determination depends on the presence of heteroatoms such as oxygen or nitrogen. Although in principle a pKa value could be calculated for any atomic center, including carbon, in practice the extrapolation is poor for systems which have a very low amount of ionization. For example, the C–H bonds in methane have such highly covalent character that

CH₄ + H₂O → CH₃⁻ + H₃O⁺

has a vanishingly small probability of occurring. Some C-H bonds do have measurable ionic character, and these are calculated by ACD/pKa. For example, the C–H bond of the methylene group at the 2-position in 1,3-cyclopentanedione is highly polarized; its pKa is predicted to be about 8.9:

Cyclopentanedione.gif

Normally, however, a heteroatom is part of the ionization center, and ACD/pKa is designed to test for the presence of heteroatoms which are capable of forming bonds with sufficient ionic character to have measurable pKa values, thus enabling reasonable prediction of pKa for related compounds.


Statistical Factor

An approximate calculation of the constants has to include a statistical factor, which accounts for identical protonation sites. Perrin, Dempsey and Serjeant define the statistical factor as follows:

"When a polybasic acid has n groups, each of which has an equal probability of losing a proton, the observed pKa will be less by (log n) than the pKa of a closely related monobasic acid. This "statistical effect" arises because there are n equivalent ways of losing a proton but only one site to which the proton can be restored. Similarly, for second proton loss, the correction becomes (log((n – 1) / 2), then (log((n – 2) / 3), and so on. Thus, for a molecule such as butanedioic acid (HOOC–CH2–CH2–COOH), which has two identical acidic groups, loss of a proton from either group leads to the same monoanion. The consequence is that the first ionization constant, pKa1, for the dibasic acid is twice as large as that for the closely related monobasic acid, that is, the observed pKa1 is 0.3 (= log2) units less than would be expected from a consideration of factors other than probability. Conversely, the monoanion has only one ionizable proton whereas the dianion has two identical sites for proton addition, so that the second ionization step, pKa2, appears to be weaker by a factor of two, and the observed pKa2 to be greater by 0.3 than anticipated. Similarly, for a base with n basic centers, the measured pKa ["apparent pKa" in ACD/pKa] of greatest magnitude, pKaN, will be greater than anticipated by log n, and so on."

D. D. Perrin, Boyd Dempsey and E. P. Serjeant, pKa Prediction for Organic Acids and Bases, 1981, pp.16–17.
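
As a rough illustration of the quoted rule, the sketch below computes the statistical shift for the k-th proton loss from n equivalent acidic groups, using the corrections log n, log((n – 1)/2), log((n – 2)/3) given in the quotation. The function name `statistical_shift` and the sign convention (value added to the reference pKa) are assumptions made for this example; it is not the routine used by ACD/pKa.

```python
import math


def statistical_shift(n: int, k: int) -> float:
    """Shift to add to the pKa of a related monobasic acid for the k-th
    proton loss (k = 1..n) from an acid with n equivalent groups.

    The quoted corrections log(n), log((n - 1) / 2), log((n - 2) / 3), ...
    are subtracted from the reference pKa, so the shift is
    -log10((n - k + 1) / k).
    """
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    return -math.log10((n - k + 1) / k)


if __name__ == "__main__":
    # Butanedioic acid: two identical carboxy groups.
    for k in (1, 2):
        print(f"step {k}: shift = {statistical_shift(2, k):+.2f}")
```

For n = 2 the first step is lowered and the second raised by the same amount, log 2, in line with the butanedioic acid example in the quotation.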

Experimental Measurement of pKa

When comparing calculated pKa values with experimentally determined data, it is wise to bear in mind how these measurements are carried out.

The determination of pKa is based on pH measurements for a series of mixtures of the acid and its salt. For pKa values in the range 2–12, this is frequently done by titrimetric methods. The pH is converted to proton molality, and then Ka is determined by measuring (or estimating) the activity coefficients of species in solution. Note that the temperature, ionic strength, and reference solutions used in these determinations can influence the measured pKa substantially. For example, benzoic acid was determined to have a pKa of 4.2 by one experimental group and 4.0 by another.

Another standard method is the spectrophotometric determination of pKa. This is particularly recommended for very small quantities of sample, or for poorly soluble sample. A refinement of this method requires an estimate of the spectra for each form from the data. The pKa values are determined by nonlinear curve fitting, assuming good initial estimates can be chosen. In theory, any kind of spectral data can be used—UV-Vis, IR, NMR, etc., provided that the pH of the solution in which the spectrum was obtained can be measured. A plot of absorbance versus pH will show asymptotes at the absorbance of the conjugate acid and base forms of the molecule. Each wavelength gives different asymptotes, but the same inflection point. Data at enough wavelengths will generate the spectra of the conjugate acid and base forms, even if they can't be measured experimentally, say, for molecules with pKa outside of the range 2–12. The (common) inflection point is the pKa. For molecules with multiple ionization sites, a sum of S-shaped curves that need to be deconvolved is obtained. Without good initial estimates, the calculations can be tedious. The better the initial estimate, the faster the convergence. ACD/pKa can provide good initial estimates for these calculations.

Just as there are aspects of experimental design which affect the accuracy of a pKa determination, there are also aspects of the physical solution which can lead to apparent disagreement between the calculated and measured pKa. For example, one factor which may cause a discrepancy between calculated and experimentally measured pKa values is the presence of a non-negligible tautomeric ratio. ACD/Percepta automatically checks for tautomers when a structure is entered in the Prediction module Workspace. To check for tautomers in the Spreadsheet Workspace, choose the Check Tautomers command from the Utilities menu.


Database of Experimental pKa Values

The internal database contains 15,924 structures with more than 31,000 experimental values under different temperatures and ionic strengths in purely aqueous solutions. In ACD/Percepta, the database is directly accessible and searchable as Databases\pKa data source, and each experimental value is provided with a reference to the original literature. No pKa values in organic solvents or aqueous-organic mixtures are included.

Description of ACD/pKa GALAS Algorithm

Estimation of ionization constants using this algorithm is a multi-step procedure involving estimation of pKa microconstants for all possible ionization centers in a hypothetical state of an uncharged molecule ("fundamental microconstants"), numerous corrections of these initial pKa values according to the surrounding of the reaction center and calculation of charge influences of ionized groups to the neighbouring ionization centers. Calculation routine utilizes a database of 4,600 ionization centers, a set of ca. 500 various interaction constants and four interaction calculation methods for different types of interactions, producing a full range of microconstants from which pKa macroconstants are obtained. This allows for a simulation of complete distribution plot of all protonation states of the molecule at different pH conditions. For example, the complete simulated ionization profile for cysteine molecule is illustrated in the following figure:

Cysteine Ionization Profile.png

¹ Experimental pKa values obtained from The Merck Index (see full citation below).

ACD/pKa GALAS algorithm is based on a training set containing 17,593 compounds (>20,000 ionization centers) obtained from various articles in peer-reviewed scientific journals and well-known reference books:

  • The Merck Index. An Encyclopedia of Chemicals, Drugs, and Biologicals, O'Neil, M.J., Smith, A., Heckelman, P.E., Budavari, S., Eds. 13th Edition, Merck & Co., Inc., Whitehouse Station, NJ, 2001
  • Therapeutic Drugs, Dollery, C., Ed. 2nd Edition, Churchill Livingstone, New York, NY, 1999
  • Clarke's Isolation and Identification of Drugs, Moffat, A.C., Jackson, J.V., Moss, M.S., Widdop, B., Eds. 2nd Edition, The Pharmaceutical Press, London, 1986

A specific feature of this algorithm is the graphical/tabular representation of the obtained predictions in the form of pH dependency of:

  • Net molecular charge
  • Distribution of protonation states
  • Average charge of each ionization centre
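
The pH dependencies listed above follow from standard equilibrium expressions once the macroscopic pKa values are known. The sketch below is a minimal illustration of that textbook calculation, not the GALAS algorithm itself, which works from microconstants. The function names and the input convention (a list of macroscopic pKa values plus the charge of the fully protonated form) are assumptions for this example.

```python
from typing import List, Sequence


def species_fractions(pkas: Sequence[float], ph: float) -> List[float]:
    """Fractions of the protonation states at a given pH.

    Index j of the result is the species that has lost j protons relative
    to the fully protonated form; pkas are macroscopic constants.
    """
    log_terms = [0.0]
    for pka in sorted(pkas):
        # Each successive deprotonation multiplies the ratio by 10**(pH - pKa).
        log_terms.append(log_terms[-1] + ph - pka)
    top = max(log_terms)  # rescale to avoid overflow at extreme pH
    weights = [10.0 ** (t - top) for t in log_terms]
    total = sum(weights)
    return [w / total for w in weights]


def net_charge(pkas: Sequence[float], ph: float, charge_fully_protonated: int) -> float:
    """Average net charge: each lost proton lowers the charge by one."""
    fractions = species_fractions(pkas, ph)
    return sum(f * (charge_fully_protonated - j) for j, f in enumerate(fractions))


if __name__ == "__main__":
    # Glycine macroscopic pKa values quoted elsewhere in this document.
    glycine = [2.43, 9.64]
    for ph in (1.7, 4.6, 6.5, 7.4):
        print(ph, [round(f, 4) for f in species_fractions(glycine, ph)],
              round(net_charge(glycine, ph, +1), 4))
```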

Description of ACD/pKa Classic Algorithm

This algorithm uses Hammett-type equations and electronic substituent constants (σ) to predict pKa values for ionizable groups. Effects considered by the software include tautomeric equilibria, covalent hydration, and resonance effects in α,β-unsaturated systems.

Hammett-Type Equations — every ionizable group is characterized by several Hammett-type equations that have been parameterized to cover the most popular ionizable functional groups.

Sigma constants — the internal training set contains >3,000 derived experimental electronic constants. When the required substituent constant is not available from the experimental database, one of four algorithms is used to describe electronic effect transmission through the molecular system.

This method of pKa calculation mimics the experimental situation by "adding" protons to the molecule in the order the molecule would normally be protonated in solution. For example, performing the calculation for a neutral glycine molecule H2N–CH2–COOH will give two values: 9.64 and 2.43. These values are calculated for the actual ionization equilibria:

H₃N⁺–CH₂–COO⁻ → H₂N–CH₂–COO⁻ + H⁺ (pKa = 9.64)
H₃N⁺–CH₂–COOH → H₃N⁺–CH₂–COO⁻ + H⁺ (pKa = 2.43)

The internal training set of ACD/pKa Classic algorithm contains 15,932 molecules representing >30,000 pKa values.

Specific features of this particular algorithm are as follows:

  • A detailed calculation protocol on how the prediction has been carried out is provided for each molecule (including Hammett-type equations, substituent constants, and literature references where available).
  • To improve prediction accuracy and make the model relevant to in-house chemical space or a particular project, the ACD/pKa Classic prediction model offers the ability for training with user provided experimental data. Training is user-friendly, and may be switched on, off, or certain training sets used for different predictions, putting full control in your hands.

Further sections of this document provide more detailed information regarding the various aspects of ACD/pKa Classic algorithm.

Database of Hammett-type Equations

The Hammett-type equations used in ACD/pKa calculations have been parameterized to cover over 1,500 combinations of over 650 of the most popular ionizable functional groups. Each functional group has been characterized by several equations involving different types of substituent constants in order to achieve the most accurate calculation. All equations for a given functional group have been ranked according to their reliability (number of correlated structures, correlation coefficient and standard deviation) and reliability of available substituent constants. For example, the following ranking has been used for calculating pKa values of para-substituted quinolines:

  1. pKa = 5.009 – 5.058*σI – 4.363*σR+ : n = 10, r = 0.9989, sd = 0.13
  2. pKa = 4.874 – 4.561*σI – 5.63*σR : n = 10, r = 0.9878, sd = 0.46
  3. pKa = 5.179 – 5.318*σPara : n = 9, r = 0.9878, sd = 0.42
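
A minimal sketch of how such a ranked set of equations could be evaluated is shown below. The coefficients are the three quinoline equations listed above; the fallback rule (use the highest-ranked equation for which all required σ constants are available), the class and function names, and the dictionary keys such as `sigma_R_plus` are assumptions made for illustration and do not describe the actual ACD/pKa implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class HammettEquation:
    intercept: float
    coefficients: Dict[str, float]  # sigma constant name -> coefficient
    n: int       # number of correlated structures
    r: float     # correlation coefficient
    sd: float    # standard deviation

    def applicable(self, sigmas: Dict[str, float]) -> bool:
        return all(name in sigmas for name in self.coefficients)

    def predict(self, sigmas: Dict[str, float]) -> float:
        return self.intercept + sum(
            coeff * sigmas[name] for name, coeff in self.coefficients.items()
        )


# Equations for para-substituted quinolines, in the ranking given in the text.
QUINOLINE_EQUATIONS: List[HammettEquation] = [
    HammettEquation(5.009, {"sigma_I": -5.058, "sigma_R_plus": -4.363}, 10, 0.9989, 0.13),
    HammettEquation(4.874, {"sigma_I": -4.561, "sigma_R": -5.63}, 10, 0.9878, 0.46),
    HammettEquation(5.179, {"sigma_para": -5.318}, 9, 0.9878, 0.42),
]


def predict_pka(sigmas: Dict[str, float]) -> Optional[Tuple[int, float]]:
    """Return (rank, pKa) from the best-ranked applicable equation, or None."""
    for rank, equation in enumerate(QUINOLINE_EQUATIONS, start=1):
        if equation.applicable(sigmas):
            return rank, equation.predict(sigmas)
    return None


if __name__ == "__main__":
    # Hypothetical substituent constants, chosen only to exercise the code.
    print(predict_pka({"sigma_I": 0.10, "sigma_R_plus": -0.20}))
    print(predict_pka({"sigma_para": 0.10}))
```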

Database of Electronic Substituent Constants (σ)

There are many variants of the original electronic substituent constant, σ. The ACD/pKa database contains constants for over 1,200 substituents with over 3,000 carefully derived experimental electronic constants. The following table summarizes the number of constant values present in the database.

Sigma                     Number in Database
σI                        592
σ* (Taft)                 265
σR                        453
σR⁻                       157
σR⁺                       143
σPara                     585
σMeta                     431
σPara⁻                    142
σPara⁺                    135
σPhosph (P-Acids)         68
σOrtho (Benzoic acid)     41
σOrtho (Phenol)           37
σOrtho (Aniline)          30
σOrtho (Pyridine)         48

Estimation of Electronic Substituent Constants

Although the parameter database contains a wide array of σ values, in some cases no reliable constant is available. When the required substituent constant is not available from the experimental database it can be calculated by one of the algorithms described in this section.

Electronic Effect Transmission through Skeleton

This estimation is based on the following formula:

$$\sigma^{R-G-} = \sigma^{-G-} + \sum z_{I,R,\ldots}^{-G-} \cdot \sigma_{I,R,\ldots}^{R-} + \sum z_{I,R,\ldots}^{-G-} \cdot \left( \sigma_{I}^{R-} \cdot \sigma_{R}^{R-} \right) \ldots,$$

where all $\sigma_{I,R,\ldots}^{R-}$ are the electronic constants of substituent R (inductive, resonance, etc.) and all $z_{I,R,\ldots}^{-G-}$ are the transmission constants of skeleton G. The accuracy of the $\sigma^{R-G-}$ calculation is usually better than ±0.05–0.1. The algorithm contains 42 of the most frequently used skeletons G described by 126 such equations:

σI: 36, σR: 25, σR⁻: 6, σR⁺: 4, σPara: 24, σMeta: 24, σPhosph: 7

For example, the following constants were calculated for species containing the carbamate functional group: σI = 0.45, σR = -0.34, σR⁻ = -0.36, σR⁺ = -0.38, σPara = 0.10, σMeta = 0.32, σPhosph = 0.0238.

pKa Carbamate.gif

Using these parameters, the pKa of 2-ammonio-4-thioxohexanedioate calculated by this method is 7.72 (experimental is 7.90).

2-ammonio-4-thioxohexanedioate.gif

Secondary Algorithm

If the preceding estimate cannot be made, a back-up method is available, based on the following formula:

$$\sigma^{R-G-} = \sigma^{-G-} + z_{I}^{-G-} \cdot \sigma_{I}^{R-}$$

The accuracy of the $\sigma^{R-G-}$ calculation is usually ±0.15–0.20. This is less accurate than the first algorithm, but it can be used to calculate the σI, σ*, σR and σR⁻ electronic constants for any possible substituent.

For example, the constants σI = 0.37, and σR = 0.08 are calculated for N-trifluoromethyl-carbamothioic halides:

N-trifluoromethyl-carbamothioic halide.gif

Transmission through Aliphatic Cycles

This algorithm is based on the modified Exner-Fiedler method. The original Exner-Fiedler method can be used to calculate electronic transmission effects for only a very limited number of aliphatic cycles. The improved ACD/pKa method allows calculation of these effects for any possible aliphatic (poly)cycles.

For example, the calculated transmission factor for variants of bicyclo[1.1.0]butane-1-carboxylic acid is 1.72 (experimental is 1.92).

bicyclobutane-1-carboxylic acid.gif

Transmission through Condensed Polyaromatic Systems

This algorithm is based on the modified Dewar-Grisdale method. The original Dewar-Grisdale method can be used to calculate electronic transmission effects for only a very limited number of condensed polyaromatic systems (Dewar M.J.S., Grisdale P.J., J. Am. Chem. Soc., 1962, 84, 3539). The improved ACD/pKa method allows you to calculate these effects for virtually any polyaromatic system.

For example, the pKa of the 3-amino-5-hydroxynaphthalene-2,7-disulfonate calculated by this method is 8.64 (the experimentally determined value is 8.54):

3-amino-5-hydroxynaphthalene-2,7-disulfonate.gif

Calculation of Steric Effects

In most cases, steric effects have been taken into account by defining the ionization center as an ionizable functional group with a sufficiently large invariable skeleton. In cases where the variable substituents are in close proximity to ionizable groups, steric effects are calculated by the modified branching equations. For example, the pKa values of N-monoalkylanilinium ions are calculated by the following equation:

pKa = 4.85 + 0.27 × (nβ)^1.84 − 0.08 × (nγ)^2.36 + 0.01 × (nδ)^2.36 (sd = 0.2)

where nβ, nγ and nδ denote the numbers of atoms in the second, third and fourth spheres of the N-alkyl substituent. The accuracy of the pKa calculation for N-t-butylanilinium is ±0.1, whereas without this equation it would be ±2.
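
The branching equation can be evaluated directly, as in the sketch below. The function name is hypothetical, and the atom counts used in the example (three atoms in the second sphere of a t-butyl group, none further out) rest on the assumption that only non-hydrogen atoms are counted; the text does not state the counting rule.

```python
def n_monoalkylanilinium_pka(n_beta: int, n_gamma: int, n_delta: int) -> float:
    """Branching equation for N-monoalkylanilinium ions (sd = 0.2).

    n_beta, n_gamma and n_delta are the numbers of atoms in the second,
    third and fourth spheres of the N-alkyl substituent.
    """
    if min(n_beta, n_gamma, n_delta) < 0:
        raise ValueError("atom counts must be non-negative")
    return (
        4.85
        + 0.27 * n_beta ** 1.84
        - 0.08 * n_gamma ** 2.36
        + 0.01 * n_delta ** 2.36
    )


if __name__ == "__main__":
    # N-t-butyl: assumed counts (3, 0, 0), heavy atoms only.
    print(f"N-t-butylanilinium: pKa = {n_monoalkylanilinium_pka(3, 0, 0):.2f}")
```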

Calculation of Charge Effects

In most cases, charge effects have been taken into account by including the constant charged substituent in the definition of the ionization center. For example, the pKa values of carboxy groups in α-amino acids are calculated from the equation characterizing the –CH(NH₃⁺)COOH ionization center. When the charged substituent is variable, its effect is calculated from its distance to the ionization center.

Other Effects

ACD/pKa warns you when other effects may appear which affect the experimentally observed pKa values. These effects, if not properly taken into account, may cause a large discrepancy between the calculated and experimentally observed pKa values.

Tautomeric Equilibria

For certain compounds, there is a mixture of two or more structurally distinct species which are in rapid equilibrium. Proton transfer is normally involved in tautomeric equilibria. Some of the most common instances of tautomerism are related to the following forms:

  • keto-enol;
  • phenol-keto;
  • nitroso-oxime;
  • aliphatic nitro compounds; and
  • imine-enamine.

If you are calculating pKa values for species which contain these functional groups, after entering the compound structure in ACD/Percepta you should always choose the appropriate tautomer from the Select Tautomeric Form dialog box that is automatically shown in such cases. For example, 3 tautomeric forms are possible for the hydroxytriazoliumonate species:

Hydroxytriazoliumonate tautomers.gif

Covalent Hydration

If the energy barrier to the addition of water across a double bond is relatively low, this can be a significant complicating factor in the accurate experimental determination of pKa; thus, ACD/pKa is designed to flag known cases. For example, for pteridine, a pKa calculation will automatically flag the species on the left as undergoing covalent hydration:

pKa Covalent Hydration.gif

Vinylology

Another complicating factor in the calculation and measurement of pKa is vinylology. Vinylology occurs due to resonance effects being transmitted through the double bond. In α,β-unsaturated ketones, nitriles, and esters, such as in the following structures

pKa Vinylology.gif

the γ-hydrogen acquires a level of acidity normally held by the position α to the carbonyl group. Due to vinylology, alkylation at the α-position competes with alkylation at the γ-position.

ACD/pKa does not explicitly flag cases of vinylology, although a message about tautomeric forms may appear.

Limitations

The ACD/pKa Classic algorithm will refuse to predict the pKa for structures that:

  • Contain more than 255 atoms (note that the program also refuses to predict pKa for some cyclic compounds with fewer than 255 atoms, because it uses a cycle-breaking algorithm that increases the number of atoms)
  • Do not contain an ionization center
  • Contain atoms of non-typical valence
  • Contain atoms other than C, H, O, S, P, N, F, Cl, Br, I, Se, Si, Ge, Pb, Sn, As, and B
  • Contain two or more fragments
  • Contain more than 30 ionizable centers
  • Contain d-block or f-block metal atoms
  • Contain textual abbreviations which cannot be transformed to structural fragments.

Note: There certainly exist some structures that formally meet the aforementioned limitations, but cannot be calculated with the current algorithm.
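
Some of the limits listed above can be screened before a structure is submitted. The sketch below is a hypothetical pre-check, not an ACD/Percepta API: it assumes the caller already has the list of element symbols (hydrogens included), the fragment count and the number of ionizable centers. It does not cover the valence, cycle-breaking or abbreviation rules, and d-block and f-block metals are caught only indirectly, because they are outside the supported element list.

```python
from typing import List, Sequence

# Element list and numeric limits taken from the Limitations section.
SUPPORTED_ELEMENTS = frozenset(
    {"C", "H", "O", "S", "P", "N", "F", "Cl", "Br", "I",
     "Se", "Si", "Ge", "Pb", "Sn", "As", "B"}
)
MAX_ATOMS = 255
MAX_IONIZABLE_CENTERS = 30


def precheck(elements: Sequence[str], n_fragments: int,
             n_ionizable_centers: int) -> List[str]:
    """Return a list of reasons why the structure would be rejected."""
    problems: List[str] = []
    if len(elements) > MAX_ATOMS:
        problems.append(f"more than {MAX_ATOMS} atoms ({len(elements)})")
    unsupported = sorted(set(elements) - SUPPORTED_ELEMENTS)
    if unsupported:
        problems.append("unsupported elements: " + ", ".join(unsupported))
    if n_fragments >= 2:
        problems.append("two or more fragments")
    if n_ionizable_centers == 0:
        problems.append("no ionization center")
    elif n_ionizable_centers > MAX_IONIZABLE_CENTERS:
        problems.append(f"more than {MAX_IONIZABLE_CENTERS} ionizable centers")
    return problems


if __name__ == "__main__":
    acetic_acid = ["C", "C", "O", "O", "H", "H", "H", "H"]
    print(precheck(acetic_acid, n_fragments=1, n_ionizable_centers=1))
    print(precheck(["Fe", "C", "O"], n_fragments=2, n_ionizable_centers=0))
```

An empty result only means that none of the screened limits was hit; as the note above states, the algorithm may still be unable to calculate such a structure.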