proteusPy.DisulfideClass_Constructor

DisulfideBond Class Analysis Dictionary creation
Author: Eric G. Suchanek, PhD.
License: BSD
Last Modification: 2025-01-22 00:02:43 -egs-

Disulfide class creation and manipulation. Binary classes using the +/- formalism of Schmidt et al. (Biochem, 2006, 45, 7429-7433) are created for all 32 (2^5) possible classes from the disulfides extracted. Classes are named per the convention of that paper. This approach is extended to create sixfold and eightfold classes by subdividing each dihedral angle, chi1 through chi5, into 6 or 8 equal segments, effectively quantizing them.
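The quantization described above can be sketched in a few lines. This is a minimal, standalone illustration rather than the module's own code: the function names `binary_class_id` and `segment_class_id` are hypothetical, and the segment numbering is an assumption chosen to match the `get_sixth_quadrant` and `get_eighth_quadrant` helpers in the source listing (the segment starting at 0 degrees gets the highest digit, and digits decrease as the normalized angle grows).

```python
def binary_class_id(chis):
    """Return a 5-character ID: '2' for an angle >= 0, '0' for a negative angle."""
    return "".join("2" if chi >= 0 else "0" for chi in chis)


def segment_class_id(chis, base=8):
    """Quantize each angle (degrees) into one of `base` equal segments.

    Assumed numbering: [0, 360/base) maps to `base`, the next segment to
    `base - 1`, and so on down to 1.
    """
    if base not in (6, 8):
        raise ValueError("Base must be either 6 or 8")
    width = 360 / base
    digits = []
    for chi in chis:
        # Normalize to [0, 360); clamp guards against float round-up to 360.0.
        index = min(int((chi % 360) // width), base - 1)
        digits.append(str(base - index))
    return "".join(digits)


# Hypothetical torsion values, all negative.
left_handed = (-60.0, -60.0, -90.0, -60.0, -60.0)
print(binary_class_id(left_handed))           # 00000
print(segment_class_id(left_handed, base=8))  # 22222
print(segment_class_id(left_handed, base=6))
```

With all five angles negative, the binary ID is `00000`, which the class table in the source labels `-LHSpiral`. The finer bases split each sign into several segments, so one binary class fans out into many sixfold or eightfold classes.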

"""
DisulfideBond Class Analysis Dictionary creation
Author: Eric G. Suchanek, PhD.
License: BSD
Last Modification: 2025-01-22 00:02:43 -egs-

Disulfide class creation and manipulation. Binary classes using the +/- formalism of
Schmidt et al. (Biochem, 2006, 45, 7429-7433) are created for all 32 possible classes
from the disulfides extracted. Classes are named per the convention of that paper.
This approach is extended to create sixfold and eightfold classes by subdividing each
dihedral angle, chi1 through chi5, into 6 or 8 equal segments, effectively quantizing them.
"""

# pylint: disable=C0301
# pylint: disable=C0103

import itertools
import pickle
from io import StringIO
from pathlib import Path

import numpy as np
import pandas as pd

from proteusPy import __version__
from proteusPy.DisulfideList import DisulfideList
from proteusPy.logger_config import create_logger
from proteusPy.ProteusGlobals import (
    DATA_DIR,
    SS_CLASS_DEFINITIONS,
    SS_CONSENSUS_BIN_FILE,
    SS_CONSENSUS_OCT_FILE,
)

_logger = create_logger(__name__)
_logger.setLevel("INFO")


class DisulfideClass_Constructor:
    r"""
    This class manages structural classes for the disulfide bonds contained
    in the proteusPy disulfide database.

    The class builds the internal dictionaries mapping class IDs to disulfides.

    Disulfide binary classes are defined using the ± formalism described by
    Schmidt et al. (Biochem, 2006, 45, 7429-7433), across all 32 ($2^5$) possible
    binary sidechain torsional combinations. Classes are named per Schmidt's convention.
    The ``class_id`` represents the sign of each dihedral angle, $\chi_{1}$ through $\chi_{1'}$,
    where *0* represents a *negative* dihedral angle and *2* a *positive* angle.

    |   class_id | SS_Classname   | FXN        |   count |   incidence |   percentage |
    |-----------:|:---------------|:-----------|--------:|------------:|-------------:|
    |      00000 | -LHSpiral      | UNK        |   40943 |  0.23359    |    23.359    |
    |      00002 | 00002          | UNK        |    9391 |  0.0535781  |     5.35781  |
    |      00020 | -LHHook        | UNK        |    4844 |  0.0276363  |     2.76363  |
    |      00022 | 00022          | UNK        |    2426 |  0.0138409  |     1.38409  |
    |      00200 | -RHStaple      | Allosteric |   16146 |  0.092117   |     9.2117   |
    |      00202 | 00202          | UNK        |    1396 |  0.00796454 |     0.796454 |
    |      00220 | 00220          | UNK        |    7238 |  0.0412946  |     4.12946  |
    |      00222 | 00222          | UNK        |    6658 |  0.0379856  |     3.79856  |
    |      02000 | 02000          | UNK        |    7104 |  0.0405301  |     4.05301  |
    |      02002 | 02002          | UNK        |    8044 |  0.0458931  |     4.58931  |
    |      02020 | -LHStaple      | UNK        |    3154 |  0.0179944  |     1.79944  |
    |      02022 | 02022          | UNK        |    1146 |  0.00653822 |     0.653822 |
    |      02200 | -RHHook        | UNK        |    7115 |  0.0405929  |     4.05929  |
    |      02202 | 02202          | UNK        |    1021 |  0.00582507 |     0.582507 |
    |      02220 | -RHSpiral      | UNK        |    8989 |  0.0512845  |     5.12845  |
    |      02222 | 02222          | UNK        |    7641 |  0.0435939  |     4.35939  |
    |      20000 | ±LHSpiral      | UNK        |    5007 |  0.0285662  |     2.85662  |
    |      20002 | +LHSpiral      | UNK        |    1611 |  0.00919117 |     0.919117 |
    |      20020 | ±LHHook        | UNK        |    1258 |  0.00717721 |     0.717721 |
    |      20022 | +LHHook        | UNK        |     823 |  0.00469542 |     0.469542 |
    |      20200 | ±RHStaple      | UNK        |     745 |  0.00425042 |     0.425042 |
    |      20202 | +RHStaple      | UNK        |     538 |  0.00306943 |     0.306943 |
    |      20220 | ±RHHook        | Catalytic  |    1907 |  0.0108799  |     1.08799  |
    |      20222 | 20222          | UNK        |    1159 |  0.00661239 |     0.661239 |
    |      22000 | -/+LHHook      | UNK        |    3652 |  0.0208356  |     2.08356  |
    |      22002 | 22002          | UNK        |    2052 |  0.0117072  |     1.17072  |
    |      22020 | ±LHStaple      | UNK        |    1791 |  0.0102181  |     1.02181  |
    |      22022 | +LHStaple      | UNK        |     579 |  0.00330334 |     0.330334 |
    |      22200 | -/+RHHook      | UNK        |    8169 |  0.0466062  |     4.66062  |
    |      22202 | +RHHook        | UNK        |     895 |  0.0051062  |     0.51062  |
    |      22220 | ±RHSpiral      | UNK        |    3581 |  0.0204305  |     2.04305  |
    |      22222 | +RHSpiral      | UNK        |    8254 |  0.0470912  |     4.70912  |
    """

    def __init__(self, loader, verbose=True) -> None:
        """
        Load the consensus structure lists and build the class dictionaries.

        :param loader: The DisulfideLoader object containing the data.
        :type loader: DisulfideLoader
        :param verbose: If True, log progress messages. Defaults to True.
        :type verbose: bool
        """
        self.verbose = verbose
        self.binaryclass_dict = {}
        self.binaryclass_df = None
        self.eightclass_df = None
        self.eightclass_dict = {}
        self.consensus_binary_list = None
        self.consensus_oct_list = None

        if self.verbose:
            _logger.info(
                "Loading binary consensus structure list from %s", SS_CONSENSUS_BIN_FILE
            )
        self.consensus_binary_list = self.load_consensus_file(oct=False)

        if self.verbose:
            _logger.info(
                "Loading octant consensus structure list from %s", SS_CONSENSUS_OCT_FILE
            )
        self.consensus_oct_list = self.load_consensus_file(oct=True)

        self.build_classes(loader)

    def __getitem__(self, item: str) -> np.ndarray:
        """
        Implements indexing against a class ID string.

        Return an array of disulfide IDs given the input class ID string.

        :param item: The class ID string to index, e.g. '11111', '11111b' or '11111o'.
        :type item: str
        :return: An array of disulfide IDs corresponding to the class ID. The array is
            empty if the class ID is invalid or unknown. None is returned if ``item``
            is neither an integer nor a string.
        :rtype: np.ndarray
        :raises ValueError: If an integer index is provided.
        """
        if isinstance(item, int):
            raise ValueError("Integer indexing not supported. Use a string key.")

        if isinstance(item, str):
            return self.class_to_sslist(item)

        return None

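    def load_consensus_file(self, fpath=Path(DATA_DIR), oct=True) -> DisulfideList:
        """
        Load a pickled consensus structure list from the directory ``fpath``.

        :param fpath: Directory containing the consensus files. Defaults to DATA_DIR.
        :type fpath: Path
        :param oct: If True, load the octant consensus file, otherwise the binary one.
        :type oct: bool
        :return: The unpickled consensus structure list.
        :rtype: DisulfideList
        :raises FileNotFoundError: If the consensus file does not exist.
        """
        if oct:
            fname = fpath / SS_CONSENSUS_OCT_FILE
        else:
            fname = fpath / SS_CONSENSUS_BIN_FILE

        if not fname.exists():
            _logger.error("Cannot find file %s", fname)
            raise FileNotFoundError(f"Cannot find file {fname}")

        with open(fname, "rb") as f:
            return pickle.load(f)
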
    def build_class_df(self, class_df, group_df):
        """
        Build a new DataFrame by appending the 'ss_id' column of ``group_df``
        to the columns of ``class_df``. Rows are aligned on the index.
        """
        ss_id_col = group_df["ss_id"]
        result_df = pd.concat([class_df, ss_id_col], axis=1)
        return result_df

    def class_to_sslist(self, clsid: str, base=8) -> np.ndarray:
        """
        Return the list of disulfides corresponding to the input `clsid`.
        This list is a list of disulfide identifiers, not the Disulfide objects themselves.

        :param clsid: The class name to extract. Must be a string
            in the format '11111' or '11111b' or '11111o'. The suffix 'b' or 'o' indicates
            binary or octant classes, respectively.
        :type clsid: str
        :param base: The base class to use, 2 or 8, when `clsid` has no suffix. Default is 8.
        :type base: int
        :return: The disulfide identifiers in the class. An empty array is returned, and
            an error is logged, if `clsid` or `base` is invalid or the class is not found.
        :rtype: np.ndarray
        """
        # Validate the type before slicing: a non-string such as an int cannot be sliced.
        if not isinstance(clsid, str):
            _logger.error("Invalid class ID: %s", clsid)
            return np.array([])

        cls = clsid[:5]

        match len(clsid):
            case 6:
                match clsid[-1]:
                    case "b":
                        eightorbin = self.binaryclass_dict
                    case "o":
                        eightorbin = self.eightclass_dict
                    case _:
                        _logger.error("Invalid class ID suffix: %s", clsid)
                        return np.array([])

            case 5:
                match base:
                    case 8:
                        eightorbin = self.eightclass_dict
                    case 2:
                        eightorbin = self.binaryclass_dict
                    case _:
                        _logger.error("Invalid base: %s", base)
                        return np.array([])
            case _:
                _logger.error("Invalid class ID length: %s", clsid)
                return np.array([])

        try:
            ss_ids = eightorbin[cls]

        except KeyError:
            _logger.error("Cannot find key %s in SSBond DB", clsid)
            return np.array([])

        return ss_ids

    def list_classes(self, base=2):
        """
        Print the index and class ID of each Disulfide structural class.

        :param base: The base class to use, 2 or 8.
        :type base: int
        :return: None
        :rtype: None
        :raises ValueError: If an invalid base value is provided.
        """
        match base:
            case 2:
                class_dict = self.binaryclass_dict
            case 8:
                class_dict = self.eightclass_dict
            case _:
                raise ValueError("Invalid base. Must be 2 or 8.")

        # Iterating a dict yields its keys, so this prints (index, class_id) pairs.
        for idx, class_id in enumerate(class_dict):
            print(f"Class: |{idx}|, |{class_id}|")

    def concat_dataframes(self, df1, df2):
        """
        Merge two data frames on their shared 'class_id' column
        and return the new result.

        Parameters
        ----------
        df1 : pandas.DataFrame
            The first data frame. Must contain a 'class_id' column.
        df2 : pandas.DataFrame
            The second data frame. Must contain a 'class_id' column.

        Returns
        -------
        pandas.DataFrame
            The merged data frame, containing only the class IDs present in both inputs.

        """
        # Inner merge of the data frames on the 'class_id' column
        result = pd.merge(df1, df2, on="class_id")

        return result

    def binary_to_class(self, class_str: str, base: int = 8) -> list:
        """
        Convert a binary input string to a list of possible class strings based on the specified base.

        Returns a list of all possible combinations of ordinal sections of a unit circle
        divided into the specified number of equal segments, originating at 0 degrees, rotating counterclockwise,
        based on the sign of each angle in the input string.

        :param class_str: A string of length 5, where each character ('0' or '2') represents the sign
            of an angle in the range of -180 to 180 degrees.
        :type class_str: str
        :param base: The base class to use, 6 or 8.
        :type base: int
        :return: A list of strings of length 5, representing all possible class strings.
        :rtype: list
        :raises ValueError: If an invalid base value is provided.
        """
        # NOTE: get_sixth_quadrant and get_eighth_quadrant assign the high digits
        # (4-6 and 5-8) to non-negative angles, the reverse of the mapping used here.
        match base:
            case 6:
                angle_maps = {"0": ["4", "5", "6"], "2": ["1", "2", "3"]}
            case 8:
                angle_maps = {"0": ["5", "6", "7", "8"], "2": ["1", "2", "3", "4"]}
            case _:
                raise ValueError("Invalid base value. Must be 6 or 8.")

        # One list of candidate digits per position; the product enumerates every combination.
        class_lists = [angle_maps[char] for char in class_str]
        class_combinations = itertools.product(*class_lists)
        class_strings = ["".join(combination) for combination in class_combinations]
        return class_strings

    def build_classes(self, loader) -> None:
        """
        Build the internal structures needed for the binary and octant disulfide structural classes
        based on dihedral angle rules.

        :param loader: The DisulfideLoader object containing the data.
        :type loader: DisulfideLoader
        :return: None
        :rtype: None
        """

        self.version = __version__

        tors_df = loader.getTorsions()

        if self.verbose:
            _logger.info("Creating binary SS classes...")

        grouped = self.create_binary_classes(tors_df)

        class_df = pd.read_csv(
            StringIO(SS_CLASS_DEFINITIONS),
            dtype={
                "class_id": "string",
                "FXN": "string",
                "SS_Classname": "string",
            },
        )
        # str.strip() returns a new Series, so the result must be assigned back.
        for col in ["FXN", "SS_Classname", "class_id"]:
            class_df[col] = class_df[col].str.strip()

        merged = self.concat_dataframes(class_df, grouped)
        merged.drop(
            columns=["Idx", "chi1_s", "chi2_s", "chi3_s", "chi4_s", "chi5_s"],
            inplace=True,
        )

        classdict = dict(zip(merged["class_id"], merged["ss_id"]))
        self.binaryclass_dict = classdict
        self.binaryclass_df = merged.copy()

        if self.verbose:
            _logger.info("Creating eightfold SS classes...")

        grouped_eightclass = self.create_classes(tors_df, 8)
        self.eightclass_df = grouped_eightclass.copy()
        self.eightclass_dict = dict(
            zip(grouped_eightclass["class_id"], grouped_eightclass["ss_id"])
        )

        if self.verbose:
            _logger.info("Initialization complete.")

    def create_binary_classes(self, df) -> pd.DataFrame:
        """
        Group the DataFrame by the sign of the chi columns and create a new class ID column for each unique grouping.

        Note that the sign columns and the 'class_id' column are added to ``df`` in place.

        :param df: A pandas DataFrame containing columns 'ss_id', 'chi1', 'chi2', 'chi3', 'chi4', 'chi5', 'ca_distance',
            'cb_distance', 'torsion_length', and 'energy'.
        :return: A pandas DataFrame containing columns 'class_id', 'ss_id', 'count', 'incidence', and 'percentage',
            where 'class_id' is a unique identifier for each grouping of chi signs, 'ss_id' is an array of the unique
            'ss_id' values in that grouping, and 'count' is the number of unique 'ss_id' values in that grouping.
        """
        # Create new columns with the sign of each chi column: 1 if chi >= 0, else -1.
        # np.where is used instead of DataFrame.applymap, which newer pandas deprecates.
        chi_columns = ["chi1", "chi2", "chi3", "chi4", "chi5"]
        sign_columns = [col + "_s" for col in chi_columns]
        for chi_col, sign_col in zip(chi_columns, sign_columns):
            df[sign_col] = np.where(df[chi_col] >= 0, 1, -1)

        # Create a new column with the class ID for each row: sign + 1 gives 0 or 2 per angle
        class_id_column = "class_id"
        df[class_id_column] = (df[sign_columns] + 1).apply(
            lambda x: "".join(x.astype(str)), axis=1
        )

        # Group the DataFrame by the class ID and return the grouped data
        grouped = df.groupby(class_id_column)["ss_id"].unique().reset_index()
        grouped["count"] = grouped["ss_id"].apply(len)
        grouped["incidence"] = grouped["count"] / len(df)
        grouped["percentage"] = grouped["incidence"] * 100

        return grouped

    def create_classes(self, df, base=8) -> pd.DataFrame:
        """
        Create a new DataFrame from the input with a 6-class or 8-class encoding for input 'chi' values.

        The function takes a pandas DataFrame containing the following columns:
        'ss_id', 'chi1', 'chi2', 'chi3', 'chi4', 'chi5', 'ca_distance', 'cb_distance',
        'torsion_length', 'energy', and 'rho', and adds a class ID column based on the following rules:

        1. A new column named `class_id` is added to ``df`` in place, which is the concatenation of the
           individual class IDs per chi.
        2. The DataFrame is grouped by the `class_id` column, and a new DataFrame is returned that shows
           the unique `ss_id` values for each group, the count of unique `ss_id` values, the incidence of
           each group as a proportion of the total DataFrame, and the percentage of incidence.

        :param df: A pandas DataFrame containing columns 'ss_id', 'chi1', 'chi2', 'chi3', 'chi4', 'chi5',
                'ca_distance', 'cb_distance', 'torsion_length', 'energy', and 'rho'
        :param base: The number of segments per angle, 6 or 8. Defaults to 8.
        :return: The grouped DataFrame with the added class column.
        :raises ValueError: If the base is not 6 or 8.
        """

        if base == 6:
            quantize = DisulfideClass_Constructor.get_sixth_quadrant
        elif base == 8:
            quantize = DisulfideClass_Constructor.get_eighth_quadrant
        else:
            raise ValueError("Base must be either 6 or 8")

        # Quantize each chi column into its per-angle class digit.
        _df = pd.DataFrame()
        for col_name in ["chi1", "chi2", "chi3", "chi4", "chi5"]:
            _df[col_name + "_t"] = df[col_name].apply(quantize)

        df["class_id"] = _df[["chi1_t", "chi2_t", "chi3_t", "chi4_t", "chi5_t"]].agg(
            "".join, axis=1
        )

        grouped = df.groupby("class_id").agg({"ss_id": "unique"})
        grouped["count"] = grouped["ss_id"].str.len()
        grouped["incidence"] = grouped["count"] / len(df)
        grouped["percentage"] = grouped["incidence"] * 100
        grouped.reset_index(inplace=True)

        return grouped

    def filter_class_by_percentage(self, cutoff: float, base: int = 8) -> pd.DataFrame:
        """
        Filter the specified class definitions by percentage.

        :param cutoff: A numeric value specifying the minimum percentage required for a row to be included in the output
        :param base: An optional integer specifying the class type to filter, 2 or 8. Defaults to 8
        :return: A new Pandas DataFrame containing only rows where the percentage is greater than or equal to the cutoff
        :rtype: pandas.DataFrame
        :raises ValueError: If the base is not 2 or 8.
        """

        match base:
            case 8:
                df = self.eightclass_df
            case 2:
                df = self.binaryclass_df
            case _:
                raise ValueError("Invalid base. Must be 2 or 8.")

        return df[df["percentage"] >= cutoff].copy()

    @staticmethod
    def get_binary_quadrant(angle_deg):
        """
        Return the binary class digit for an angle in degrees, dividing the unit circle into 2 equal segments.

        :param angle_deg: The angle in degrees.
        :type angle_deg: float or array-like
        :return: '2' for a normalized angle in [0, 180) and '0' for one in [180, 360). For array-like
            input, the per-angle digits are concatenated into a single string.
        :rtype: str
        :raises ValueError: If a scalar angle is not a finite number.
        """
        angle_deg = (
            np.array(angle_deg) % 360
        )  # Normalize the angle to the range [0, 360)

        if np.isscalar(angle_deg):
            if 0 <= angle_deg < 180:
                return str(2)

            if 180 <= angle_deg < 360:
                return str(0)

            # Only reachable when normalization yields NaN.
            raise ValueError("Invalid angle value: angle must be a finite number.")

        quadrants = np.where((angle_deg >= 0) & (angle_deg < 180), "2", "0")
        return "".join(quadrants)

    @staticmethod
    def get_sixth_quadrant(angle_deg):
        """
        Return the sextant in which an angle in degrees lies, dividing the unit circle into 6 equal segments.

        Segments are numbered in descending order: [0, 60) is '6' and [300, 360) is '1'.

        :param angle_deg: The angle in degrees.
        :type angle_deg: float or array-like
        :return: The sextant ('1' to '6') that the angle belongs to. For array-like input,
            the per-angle digits are concatenated into a single string.
        :rtype: str
        :raises ValueError: If a scalar angle is not a finite number.
        """
        angle_deg = (
            np.array(angle_deg) % 360
        )  # Normalize the angle to the range [0, 360)

        if np.isscalar(angle_deg):
            if 0 <= angle_deg < 60:
                return str(6)
            elif 60 <= angle_deg < 120:
                return str(5)
            elif 120 <= angle_deg < 180:
                return str(4)
            elif 180 <= angle_deg < 240:
                return str(3)
            elif 240 <= angle_deg < 300:
                return str(2)
            elif 300 <= angle_deg < 360:
                return str(1)
            else:
                # Only reachable when normalization yields NaN.
                raise ValueError("Invalid angle value: angle must be a finite number.")
        else:
            quadrants = np.empty(angle_deg.shape, dtype=str)
            quadrants[(angle_deg >= 0) & (angle_deg < 60)] = "6"
            quadrants[(angle_deg >= 60) & (angle_deg < 120)] = "5"
            quadrants[(angle_deg >= 120) & (angle_deg < 180)] = "4"
            quadrants[(angle_deg >= 180) & (angle_deg < 240)] = "3"
            quadrants[(angle_deg >= 240) & (angle_deg < 300)] = "2"
            quadrants[(angle_deg >= 300) & (angle_deg < 360)] = "1"
            return "".join(quadrants)

    @staticmethod
    def get_eighth_quadrant(angle_deg):
        """
        Return the octant in which an angle in degrees lies, dividing the unit circle into 8 equal segments.

        Segments are numbered in descending order: [0, 45) is '8' and [315, 360) is '1'.

        :param angle_deg: The angle in degrees.
        :type angle_deg: float or array-like
        :return: The octant ('1' to '8') that the angle belongs to. For array-like input,
            the per-angle digits are concatenated into a single string.
        :rtype: str
        :raises ValueError: If a scalar angle is not a finite number.
        """
        angle_deg = (
            np.array(angle_deg) % 360
        )  # Normalize the angle to the range [0, 360)

        if np.isscalar(angle_deg):
            if 0 <= angle_deg < 45:
                return str(8)
            elif 45 <= angle_deg < 90:
                return str(7)
            elif 90 <= angle_deg < 135:
                return str(6)
            elif 135 <= angle_deg < 180:
                return str(5)
            elif 180 <= angle_deg < 225:
                return str(4)
            elif 225 <= angle_deg < 270:
                return str(3)
            elif 270 <= angle_deg < 315:
                return str(2)
            elif 315 <= angle_deg < 360:
                return str(1)
            else:
                # Only reachable when normalization yields NaN.
                raise ValueError("Invalid angle value: angle must be a finite number.")
        else:
            quadrants = np.empty(angle_deg.shape, dtype=str)
            quadrants[(angle_deg >= 0) & (angle_deg < 45)] = "8"
            quadrants[(angle_deg >= 45) & (angle_deg < 90)] = "7"
            quadrants[(angle_deg >= 90) & (angle_deg < 135)] = "6"
            quadrants[(angle_deg >= 135) & (angle_deg < 180)] = "5"
            quadrants[(angle_deg >= 180) & (angle_deg < 225)] = "4"
            quadrants[(angle_deg >= 225) & (angle_deg < 270)] = "3"
            quadrants[(angle_deg >= 270) & (angle_deg < 315)] = "2"
            quadrants[(angle_deg >= 315) & (angle_deg < 360)] = "1"
            return "".join(quadrants)

    @staticmethod
    def class_string_from_dihedral(*args, base=8) -> str:
        """
        Return the class string for a set of dihedral angles, given the base.

        :param args: One or five dihedral angles, or a single sequence of five angles.
        :param base: The base class to use, 2, 6, or 8. Defaults to 8.
        :return: The class string for the input dihedral angles.
        :rtype: str
        :raises ValueError: If the number of dihedral angles is not 1 or 5, or if the base is not 2, 6, or 8.
        """
        if len(args) not in [1, 5]:
            raise ValueError("You must enter either 1 or 5 dihedral angles.")

        # Select the quantizer once instead of repeating the dispatch per branch.
        match base:
            case 2:
                quantize = DisulfideClass_Constructor.get_binary_quadrant
            case 6:
                quantize = DisulfideClass_Constructor.get_sixth_quadrant
            case 8:
                quantize = DisulfideClass_Constructor.get_eighth_quadrant
            case _:
                raise ValueError("Invalid base. Must be 2, 6, or 8.")

        angles = np.array(args).flatten()

        if len(angles) == 1:
            return quantize(angles[0])

        if len(angles) == 5:
            return quantize(angles)

        # A single sequence argument of the wrong length previously fell through and returned None.
        raise ValueError("You must enter either 1 or 5 dihedral angles.")

    def sslist_from_classid(self, cls: str, base=8) -> np.ndarray | None:
        """
        Return the 'ss_id' value from the class DataFrame for the given base that
        corresponds to the input 'cls' string.

        :param cls: The class ID string to look up.
        :type cls: str
        :param base: The base class to use, 2 or 8. Defaults to 8.
        :type base: int
        :return: The disulfide identifiers for the class, or None if the class is not found.
        :raises ValueError: If the base is not 2 or 8, or if multiple rows match ``cls``.
        """
        if base == 2:
            df = self.binaryclass_df
        elif base == 8:
            df = self.eightclass_df
        else:
            raise ValueError("Invalid base. Must be 2 or 8.")

        filtered_df = df[df["class_id"] == cls]

        if len(filtered_df) == 0:
            return None

        if len(filtered_df) > 1:
            raise ValueError(f"Multiple rows found for class_id '{cls}'")

        return filtered_df.iloc[0]["ss_id"]

    def class_to_binary(self, cls_str, base=8):
        """
        Convert a sixfold or eightfold class string into the corresponding binary class string.

        Each input character is the ordinal section of a unit circle for an angle in the range
        -180 to 180 degrees. Each output character is '0' if the input character represents a
        negative angle or '2' if it represents a positive angle. Characters outside the valid
        range for the base are skipped.

        :param cls_str: A string of length 5 of ordinal sections, one per dihedral angle.
        :type cls_str: str
        :param base: The base of the ordinal section (6 or 8).
        :type base: int
        :return: A string of '0' and '2' characters, representing the sign of each input angle.
        :rtype: str
        :raises ValueError: If the base is not 6 or 8.
        """
        if base not in [6, 8]:
            raise ValueError("Base must be either 6 or 8")

        # NOTE: this mapping mirrors binary_to_class; get_sixth_quadrant and
        # get_eighth_quadrant assign the high digits to non-negative angles instead.
        if base == 6:
            positive, negative = ["1", "2", "3"], ["4", "5", "6"]
        else:
            positive, negative = ["1", "2", "3", "4"], ["5", "6", "7", "8"]

        output_str = ""
        for char in cls_str:
            if char in positive:
                output_str += "2"
            elif char in negative:
                output_str += "0"
        return output_str

    def get_class_df(self, base=8):
        """
        Get the Disulfide structural classes DataFrame.

        :param base: The base class to use, either 2 or 8. Defaults to 8.
        :type base: int
        :return: A DataFrame containing the class_id, count, incidence, and percentage columns.
        :rtype: pandas.DataFrame
        :raises ValueError: If the base is not 2 or 8.
        """
        columns = ["class_id", "count", "incidence", "percentage"]
        match base:
            case 2:
                class_df = self.binaryclass_df
            case 8:
                class_df = self.eightclass_df
            case _:
                raise ValueError("Invalid base. Must be 2 or 8.")

        result_df = class_df[columns]
        return result_df


# class definition ends

# end of file
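The expansion performed by `binary_to_class`, and its inverse `class_to_binary`, can be tried without loading the disulfide database. The sketch below is a standalone approximation, not the library's API: the names `ANGLE_MAPS`, `expand_binary_class`, and `collapse_to_binary` are hypothetical, and the digit mapping is copied from `binary_to_class` as it appears in the listing above.

```python
import itertools

# Digit mapping copied from binary_to_class in the listing:
# sign character -> candidate segment digits, per base.
ANGLE_MAPS = {
    6: {"0": "456", "2": "123"},
    8: {"0": "5678", "2": "1234"},
}


def expand_binary_class(class_str, base=8):
    """List every segment-level class string consistent with a binary class string."""
    if base not in ANGLE_MAPS:
        raise ValueError("Invalid base value. Must be 6 or 8.")
    angle_map = ANGLE_MAPS[base]
    # One set of candidate digits per position; product enumerates all combinations.
    choices = [angle_map[char] for char in class_str]
    return ["".join(combo) for combo in itertools.product(*choices)]


def collapse_to_binary(class_str, base=8):
    """Map a segment-level class string back to its binary class string."""
    if base not in ANGLE_MAPS:
        raise ValueError("Base must be either 6 or 8")
    inverse = {
        digit: sign for sign, digits in ANGLE_MAPS[base].items() for digit in digits
    }
    # Unlike class_to_binary in the listing, an unknown digit raises KeyError here.
    return "".join(inverse[char] for char in class_str)


candidates = expand_binary_class("00200", base=8)
print(len(candidates))                # 4**5 = 1024
print(candidates[0], candidates[-1])  # 55155 88488
print(collapse_to_binary(candidates[0], base=8))  # 00200
```

Each binary class expands to 4^5 = 1024 eightfold strings or 3^5 = 243 sixfold strings, and every expanded string collapses back to the binary string it came from. That makes the pair convenient for checking which fine-grained classes belong to a given binary class.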
 91        self.binaryclass_dict = {}
 92        self.binaryclass_df = None
 93        self.eightclass_df = None
 94        self.eightclass_dict = {}
 95        self.consensus_binary_list = None
 96        self.consensus_oct_list = None
 97
 98        if self.verbose:
 99            _logger.info(
100                "Loading binary consensus structure list from %s", SS_CONSENSUS_BIN_FILE
101            )
102        self.consensus_binary_list = self.load_consensus_file(oct=False)
103
104        if self.verbose:
105            _logger.info(
106                "Loading octant consensus structure list from %s", SS_CONSENSUS_OCT_FILE
107            )
108        self.consensus_oct_list = self.load_consensus_file(oct=True)
109
110        self.build_classes(loader)
111
112    def __getitem__(self, item: str) -> np.ndarray:
113        """
114        Implements indexing against a class ID string.
115
116        Return an array of disulfide IDs given the input Class ID string.
117
118        :param item: The class ID string to index.
119        :type item: str
120        :return: An array of disulfide IDs corresponding to the class ID.
121        :rtype: np.ndarray
122        :raises ValueError: If an integer index is provided.
123        :raises DisulfideException: If the class ID is invalid.
124        """
125        disulfides = None
126
127        if isinstance(item, int):
128            raise ValueError("Integer indexing not supported. Use a string key.")
129
130        if isinstance(item, str):
131            disulfides = self.class_to_sslist(item)
132            return disulfides
133
134        return disulfides
135
136    def load_consensus_file(self, fpath=Path(DATA_DIR), oct=True) -> DisulfideList:
137        """Load the pickled consensus DisulfideList (octant or binary) from the directory `fpath`."""
138
139        res = None
140        if oct:
141            fname = fpath / SS_CONSENSUS_OCT_FILE
142        else:
143            fname = fpath / SS_CONSENSUS_BIN_FILE
144
145        if not fname.exists():
146            _logger.error("Cannot find file %s", fname)
147            raise FileNotFoundError(f"Cannot find file {fname}")
148
149        with open(fname, "rb") as f:
150            res = pickle.load(f)
151        return res
152
153    def build_class_df(self, class_df, group_df):
154        """Build a new DataFrame from the input DataFrames."""
155        ss_id_col = group_df["ss_id"]
156        result_df = pd.concat([class_df, ss_id_col], axis=1)
157        return result_df
158
159    def class_to_sslist(self, clsid: str, base=8) -> np.ndarray:
160        """
161        Return the list of disulfides corresponding to the input `clsid`.
162        This list is a list of disulfide identifiers, not the Disulfide objects themselves.
163
164        :param clsid: The class name to extract. Must be a string
165        in the format '11111' or '11111b' or '11111o'. The suffix 'b' or 'o' indicates
166        binary or octant classes, respectively.
167        :type clsid: str
168        :param base: The base class to use, 2 or 8. Default is 8. Ignored when
169        `clsid` carries a 'b' or 'o' suffix.
170        :type base: int
171        :return: The array of disulfide identifiers in the class (not the Disulfide
172        objects themselves). An empty array is returned, and an error is logged,
173        if `clsid` is not a string, has an invalid length or suffix, if `base`
174        is invalid, or if the class is not found.
175        :rtype: np.ndarray
176        """
177        if not isinstance(clsid, str):
178            _logger.error("Invalid class ID: %s", clsid)
179            return np.array([])
180
181        # Slice only after the type check so non-string input cannot raise here.
182        cls = clsid[:5]
183
184        match len(clsid):
185            case 6:
186                match clsid[-1]:
187                    case "b":
188                        eightorbin = self.binaryclass_dict
189                    case "o":
190                        eightorbin = self.eightclass_dict
191                    case _:
192                        _logger.error("Invalid class ID suffix: %s", clsid)
193                        return np.array([])
194
195            case 5:
196                match base:
197                    case 8:
198                        eightorbin = self.eightclass_dict
199                    case 2:
200                        eightorbin = self.binaryclass_dict
201                    case _:
202                        _logger.error("Invalid base: %d", base)
203                        return np.array([])
204            case _:
205                _logger.error("Invalid class ID length: %s", clsid)
206                return np.array([])
207
208        try:
209            ss_ids = eightorbin[cls]
210
211        except KeyError:
212            _logger.error("Cannot find key %s in SSBond DB", clsid)
213            return np.array([])
214
215        return ss_ids
216
217    def list_classes(self, base=2):
218        """
219        List the Disulfide structural classes.
220
221        :param self: The instance of the DisulfideClass_Constructor class.
222        :type self: DisulfideClass_Constructor
223        :param base: The base class to use, 2 or 8.
224        :type base: int
225        :return: None
226        :rtype: None
227        :raises ValueError: If an invalid base value is provided.
228        """
229        match base:
230            case 2:
231                for k, v in enumerate(self.binaryclass_dict):
232                    print(f"Class: |{k}|, |{v}|")
233            case 8:
234                for k, v in enumerate(self.eightclass_dict):
235                    print(f"Class: |{k}|, |{v}|")
236            case _:
237                raise ValueError("Invalid base. Must be 2 or 8.")
238
239    def concat_dataframes(self, df1, df2):
240        """
241        Concatenates columns from one data frame into the other
242        and returns the new result.
243
244        Parameters
245        ----------
246        df1 : pandas.DataFrame
247            The first data frame.
248        df2 : pandas.DataFrame
249            The second data frame.
250
251        Returns
252        -------
253        pandas.DataFrame
254            The concatenated data frame.
255
256        """
257        # Merge the data frames on the shared 'class_id' column
258        result = pd.merge(df1, df2, on="class_id")
259
260        return result
261
262    def binary_to_class(self, class_str: str, base: int = 8) -> list:
263        """
264        Convert a binary input string to a list of possible class strings based on the specified base.
265
266        Returns a list of all possible combinations of ordinal sections of a unit circle
267        divided into the specified number of equal segments, originating at 0 degrees, rotating counterclockwise,
268        based on the sign of each angle in the input string.
269
270        :param class_str: A string of length 5, where each character represents the sign
271        of an angle in the range of -180-180 degrees.
272        :type class_str: str
273        :param base: The base class to use, 6 or 8.
274        :type base: int
275        :return: A list of strings of length 5, representing all possible class strings.
276        :rtype: list
277        :raises ValueError: If an invalid base value is provided.
278        """
279        match base:
280            case 6:
281                angle_maps = {"0": ["4", "5", "6"], "2": ["1", "2", "3"]}
282            case 8:
283                angle_maps = {"0": ["5", "6", "7", "8"], "2": ["1", "2", "3", "4"]}
284            case _:
285                raise ValueError("Invalid base value. Must be 6 or 8.")
286
287        class_lists = [angle_maps[char] for char in class_str]
288        class_combinations = itertools.product(*class_lists)
289        class_strings = ["".join(combination) for combination in class_combinations]
290        return class_strings
291
292    def build_classes(self, loader) -> None:
293        """
294        Build the internal structures needed for the binary and octant disulfide structural classes
295        based on dihedral angle rules.
296
297        :param loader: The DisulfideLoader object containing the data.
298        :type loader: DisulfideLoader
299        :return: None
300        :rtype: None
301        """
302
303        self.version = __version__
304
305        tors_df = loader.getTorsions()
306
307        if self.verbose:
308            _logger.info("Creating binary SS classes...")
309
310        grouped = self.create_binary_classes(tors_df)
311
312        class_df = pd.read_csv(
313            StringIO(SS_CLASS_DEFINITIONS),
314            dtype={
315                "class_id": "string",
316                "FXN": "string",
317                "SS_Classname": "string",
318            },
319        )
320        class_df["FXN"] = class_df["FXN"].str.strip()
321        class_df["SS_Classname"] = class_df["SS_Classname"].str.strip()
322        class_df["class_id"] = class_df["class_id"].str.strip()
323
324        merged = self.concat_dataframes(class_df, grouped)
325        merged.drop(
326            columns=["Idx", "chi1_s", "chi2_s", "chi3_s", "chi4_s", "chi5_s"],
327            inplace=True,
328        )
329
330        classdict = dict(zip(merged["class_id"], merged["ss_id"]))
331        self.binaryclass_dict = classdict
332        self.binaryclass_df = merged.copy()
333
334        if self.verbose:
335            _logger.info("Creating eightfold SS classes...")
336
337        grouped_eightclass = self.create_classes(tors_df, 8)
338        self.eightclass_df = grouped_eightclass.copy()
339        self.eightclass_dict = dict(
340            zip(grouped_eightclass["class_id"], grouped_eightclass["ss_id"])
341        )
342
343        if self.verbose:
344            _logger.info("Initialization complete.")
345
346        return
347
348    def create_binary_classes(self, df) -> pd.DataFrame:
349        """
350        Group the DataFrame by the sign of the chi columns and create a new class ID column for each unique grouping.
351
352        :param df: A pandas DataFrame containing columns 'ss_id', 'chi1', 'chi2', 'chi3', 'chi4', 'chi5', 'ca_distance',
353        'cb_distance', 'torsion_length', and 'energy'.
354        :return: A pandas DataFrame containing columns 'class_id', 'ss_id', and 'count', where 'class_id'
355         is a unique identifier for each grouping of chi signs, 'ss_id' is a list of all 'ss_id' values in that
356         grouping, and 'count' is the number of rows in that grouping.
357        """
358        # Create new columns with the sign of each chi column
359        chi_columns = ["chi1", "chi2", "chi3", "chi4", "chi5"]
360        sign_columns = [col + "_s" for col in chi_columns]
361        df[sign_columns] = np.where(df[chi_columns] >= 0, 1, -1)
362
363        # Create a new column with the class ID for each row
364        class_id_column = "class_id"
365        df[class_id_column] = (df[sign_columns] + 1).apply(
366            lambda x: "".join(x.astype(str)), axis=1
367        )
368
369        # Group the DataFrame by the class ID and return the grouped data
370        grouped = df.groupby(class_id_column)["ss_id"].unique().reset_index()
371        grouped["count"] = grouped["ss_id"].apply(len)
372        grouped["incidence"] = grouped["count"] / len(df)
373        grouped["percentage"] = grouped["incidence"] * 100
374
375        return grouped
376
377    def create_classes(self, df, base=8) -> pd.DataFrame:
378        """
379        Create a new DataFrame from the input with a sixfold or eightfold class encoding (per `base`) for the input 'chi' values.
380
381        The function takes a pandas DataFrame containing the following columns:
382        'ss_id', 'chi1', 'chi2', 'chi3', 'chi4', 'chi5', 'ca_distance', 'cb_distance',
383        'torsion_length', 'energy', and 'rho', and adds a class ID column based on the following rules:
384
385        1. A new column named `class_id` is added, which is the concatenation of the individual class IDs per Chi.
386        2. The DataFrame is grouped by the `class_id` column, and a new DataFrame is returned that shows the unique `ss_id` values for each group,
387        the count of unique `ss_id` values, the incidence of each group as a proportion of the total DataFrame, and the
388        percentage of incidence.
389
390        :param df: A pandas DataFrame containing columns 'ss_id', 'chi1', 'chi2', 'chi3', 'chi4', 'chi5',
391                'ca_distance', 'cb_distance', 'torsion_length', 'energy', and 'rho'
392        :return: The grouped DataFrame with the added class column.
393        """
394
395        _df = pd.DataFrame()
396        if base == 6:
397            for col_name in ["chi1", "chi2", "chi3", "chi4", "chi5"]:
398                _df[col_name + "_t"] = df[col_name].apply(
399                    DisulfideClass_Constructor.get_sixth_quadrant
400                )
401        elif base == 8:
402            for col_name in ["chi1", "chi2", "chi3", "chi4", "chi5"]:
403                _df[col_name + "_t"] = df[col_name].apply(
404                    DisulfideClass_Constructor.get_eighth_quadrant
405                )
406        else:
407            raise ValueError("Base must be either 6 or 8")
408
409        df["class_id"] = _df[["chi1_t", "chi2_t", "chi3_t", "chi4_t", "chi5_t"]].agg(
410            "".join, axis=1
411        )
412
413        grouped = df.groupby("class_id").agg({"ss_id": "unique"})
414        grouped["count"] = grouped["ss_id"].str.len()
415        grouped["incidence"] = grouped["count"] / len(df)
416        grouped["percentage"] = grouped["incidence"] * 100
417        grouped.reset_index(inplace=True)
418
419        return grouped
420
421    def filter_class_by_percentage(self, cutoff: float, base: int = 8) -> pd.DataFrame:
422        """
423        Filter the specified class definitions by percentage.
424
425        :param cutoff: A numeric value specifying the minimum percentage required for a row to be included in the output
426        :param base: An optional integer specifying the class type to filter, defaults to 8
427        :return: A new Pandas DataFrame containing only rows where the percentage is greater than or equal to the cutoff
428        :rtype: pandas.DataFrame
429        """
430
431        match base:
432            case 8:
433                df = self.eightclass_df
434            case 2:
435                df = self.binaryclass_df
436            case _:
437                raise ValueError("Invalid base. Must be 2 or 8.")
438
439        return df[df["percentage"] >= cutoff].copy()
440
441    @staticmethod
442    def get_binary_quadrant(angle_deg):
443        """
444        Return the binary quadrant in which an angle in degrees lies if the area is described by dividing a unit circle into 2 equal segments.
445
446        :param angle_deg (float or array-like): The angle in degrees.
447
448        Returns:
449        :return str or array-like: The binary quadrant (0 or 2) that the angle belongs to.
450        """
451        angle_deg = (
452            np.array(angle_deg) % 360
453        )  # Normalize the angle to the range [0, 360)
454
455        if np.isscalar(angle_deg):
456            if angle_deg >= 0 and angle_deg < 180:
457                return str(2)
458
459            if angle_deg >= 180 and angle_deg < 360:
460                return str(0)
461
462            raise ValueError(
463                "Invalid angle value: angle must be in the range [-360, 360)."
464            )
465
466        quadrants = np.where((angle_deg >= 0) & (angle_deg < 180), "2", "0")
467        return "".join(quadrants)
468
469    @staticmethod
470    def get_sixth_quadrant(angle_deg):
471        """
472        Return the sixth quadrant in which an angle in degrees lies if the area is described by dividing a unit circle into 6 equal segments.
473
474        :param angle_deg (float or array-like): The angle in degrees.
475
476        Returns:
477        :return str or array-like: The sixth quadrant (1 to 6) that the angle belongs to.
478        """
479        angle_deg = (
480            np.array(angle_deg) % 360
481        )  # Normalize the angle to the range [0, 360)
482
483        if np.isscalar(angle_deg):
484            if angle_deg >= 0 and angle_deg < 60:
485                return str(6)
486            elif angle_deg >= 60 and angle_deg < 120:
487                return str(5)
488            elif angle_deg >= 120 and angle_deg < 180:
489                return str(4)
490            elif angle_deg >= 180 and angle_deg < 240:
491                return str(3)
492            elif angle_deg >= 240 and angle_deg < 300:
493                return str(2)
494            elif angle_deg >= 300 and angle_deg < 360:
495                return str(1)
496            else:
497                raise ValueError(
498                    "Invalid angle value: angle must be in the range [-360, 360)."
499                )
500        else:
501            quadrants = np.empty(angle_deg.shape, dtype=str)
502            quadrants[(angle_deg >= 0) & (angle_deg < 60)] = "6"
503            quadrants[(angle_deg >= 60) & (angle_deg < 120)] = "5"
504            quadrants[(angle_deg >= 120) & (angle_deg < 180)] = "4"
505            quadrants[(angle_deg >= 180) & (angle_deg < 240)] = "3"
506            quadrants[(angle_deg >= 240) & (angle_deg < 300)] = "2"
507            quadrants[(angle_deg >= 300) & (angle_deg < 360)] = "1"
508            return "".join(quadrants)
509
510    @staticmethod
511    def get_eighth_quadrant(angle_deg):
512        """
513        Return the eighth quadrant in which an angle in degrees lies if the area is described by dividing a unit circle into 8 equal segments.
514
515        :param angle_deg (float or array-like): The angle in degrees.
516
517        Returns:
518        :return str or array-like: The eighth quadrant (1 to 8) that the angle belongs to.
519        """
520        angle_deg = (
521            np.array(angle_deg) % 360
522        )  # Normalize the angle to the range [0, 360)
523
524        if np.isscalar(angle_deg):
525            if angle_deg >= 0 and angle_deg < 45:
526                return str(8)
527            elif angle_deg >= 45 and angle_deg < 90:
528                return str(7)
529            elif angle_deg >= 90 and angle_deg < 135:
530                return str(6)
531            elif angle_deg >= 135 and angle_deg < 180:
532                return str(5)
533            elif angle_deg >= 180 and angle_deg < 225:
534                return str(4)
535            elif angle_deg >= 225 and angle_deg < 270:
536                return str(3)
537            elif angle_deg >= 270 and angle_deg < 315:
538                return str(2)
539            elif angle_deg >= 315 and angle_deg < 360:
540                return str(1)
541            else:
542                raise ValueError(
543                    "Invalid angle value: angle must be in the range [-360, 360)."
544                )
545        else:
546            quadrants = np.empty(angle_deg.shape, dtype=str)
547            quadrants[(angle_deg >= 0) & (angle_deg < 45)] = "8"
548            quadrants[(angle_deg >= 45) & (angle_deg < 90)] = "7"
549            quadrants[(angle_deg >= 90) & (angle_deg < 135)] = "6"
550            quadrants[(angle_deg >= 135) & (angle_deg < 180)] = "5"
551            quadrants[(angle_deg >= 180) & (angle_deg < 225)] = "4"
552            quadrants[(angle_deg >= 225) & (angle_deg < 270)] = "3"
553            quadrants[(angle_deg >= 270) & (angle_deg < 315)] = "2"
554            quadrants[(angle_deg >= 315) & (angle_deg < 360)] = "1"
555            return "".join(quadrants)
556
557    @staticmethod
558    def class_string_from_dihedral(*args, base=8) -> str:
559        """
560        Return the class string for a set of dihedral angles, given the base.
561
562        :param args: One or five dihedral angles.
563        :param base: The base class to use, 2, 6, or 8. Defaults to 8.
564        :return: The class string for the input dihedral angles.
565        :rtype: str
566        :raises ValueError: If the number of dihedral angles is not 1 or 5, or if the base is not 2, 6, or 8.
567        """
568        if len(args) not in [1, 5]:
569            raise ValueError("You must enter either 1 or 5 dihedral angles.")
570
571        if base not in [2, 6, 8]:
572            raise ValueError("Invalid base. Must be 2, 6, or 8.")
573
574        angles = np.array(args).flatten()
575
576        if len(angles) == 1:
577            match base:
578                case 2:
579                    return DisulfideClass_Constructor.get_binary_quadrant(angles[0])
580                case 6:
581                    return DisulfideClass_Constructor.get_sixth_quadrant(angles[0])
582                case 8:
583                    return DisulfideClass_Constructor.get_eighth_quadrant(angles[0])
584                case _:
585                    raise ValueError("Invalid base. Must be 2, 6, or 8.")
586
587        elif len(angles) == 5:
588            match base:
589                case 2:
590                    return DisulfideClass_Constructor.get_binary_quadrant(angles)
591                case 6:
592                    return DisulfideClass_Constructor.get_sixth_quadrant(angles)
593                case 8:
594                    return DisulfideClass_Constructor.get_eighth_quadrant(angles)
595                case _:
596                    raise ValueError("Invalid base. Must be 2, 6, or 8.")
597
598    def sslist_from_classid(self, cls: str, base=8) -> pd.DataFrame:
599        """
600        Return the 'ss_id' array for the class whose 'class_id' equals the
601        input 'cls' string, or None if no such class exists.
602        """
603        if base == 2:
604            df = self.binaryclass_df
605        elif base == 8:
606            df = self.eightclass_df
607        else:
608            raise ValueError("Invalid base. Must be 2 or 8.")
609
610        filtered_df = df[df["class_id"] == cls]
611
612        if len(filtered_df) == 0:
613            return None
614
615        if len(filtered_df) > 1:
616            raise ValueError(f"Multiple rows found for class_id '{cls}'")
617
618        return filtered_df.iloc[0]["ss_id"]
619
620    def class_to_binary(self, cls_str, base=8):
621        """
622        Convert a class string of length 5, in which each character is the ordinal section of a unit circle for an angle
623        in the range -180 to 180 degrees, into a string of 5 characters, where each character is '0' if the corresponding
624        input character represents a negative angle or '2' if it represents a positive angle.
625
626        :param cls_str (str): A string of length 5 representing the ordinal section of a unit circle for an angle in range -180-180 degrees.
627        :param base (int): The base of the ordinal section (6 or 8).
628        :return str: A string of length 5, where each character is either '0' or '2', representing the sign of the corresponding input angle.
629        """
630        if base not in [6, 8]:
631            raise ValueError("Base must be either 6 or 8")
632
633        output_str = ""
634        for char in cls_str:
635            if base == 6:
636                if char in ["1", "2", "3"]:
637                    output_str += "2"
638                elif char in ["4", "5", "6"]:
639                    output_str += "0"
640            elif base == 8:
641                if char in ["1", "2", "3", "4"]:
642                    output_str += "2"
643                elif char in ["5", "6", "7", "8"]:
644                    output_str += "0"
645        return output_str
646
647    def get_class_df(self, base=8):
648        """
649        Get the Disulfide structural classes DataFrame.
650
651        :param base: The base class to use, either 2 or 8. Defaults to 8.
652        :type base: int
653        :return: A DataFrame containing the class_id, count, incidence, and percentage columns.
654        :rtype: pandas.DataFrame
655        :raises ValueError: If the base is not 2 or 8.
656        """
657        columns = ["class_id", "count", "incidence", "percentage"]
658        match base:
659            case 2:
660                class_df = self.binaryclass_df
661            case 8:
662                class_df = self.eightclass_df
663            case _:
664                raise ValueError("Invalid base. Must be 2 or 8.")
665
666        result_df = class_df[columns]
667        return result_df

This Class manages structural classes for the disulfide bonds contained in the proteusPy disulfide database.

Class builds the internal dictionary mapping disulfides to class names.

Disulfide binary classes are defined using the ± formalism described by Schmidt et al. (Biochem, 2006, 45, 7429-7433) across all 32 (2^5) possible binary sidechain torsional combinations. Classes are named per Schmidt's convention. The class_id encodes the sign of each dihedral angle, $\chi_{1}$ through $\chi_{1'}$, where 0 represents a negative dihedral angle and 2 a positive angle.

|   class_id | SS_Classname   | FXN        |   count |   incidence |   percentage |
|-----------:|:---------------|:-----------|--------:|------------:|-------------:|
|      00000 | -LHSpiral      | UNK        |   40943 |  0.23359    |    23.359    |
|      00002 | 00002          | UNK        |    9391 |  0.0535781  |     5.35781  |
|      00020 | -LHHook        | UNK        |    4844 |  0.0276363  |     2.76363  |
|      00022 | 00022          | UNK        |    2426 |  0.0138409  |     1.38409  |
|      00200 | -RHStaple      | Allosteric |   16146 |  0.092117   |     9.2117   |
|      00202 | 00202          | UNK        |    1396 |  0.00796454 |     0.796454 |
|      00220 | 00220          | UNK        |    7238 |  0.0412946  |     4.12946  |
|      00222 | 00222          | UNK        |    6658 |  0.0379856  |     3.79856  |
|      02000 | 02000          | UNK        |    7104 |  0.0405301  |     4.05301  |
|      02002 | 02002          | UNK        |    8044 |  0.0458931  |     4.58931  |
|      02020 | -LHStaple      | UNK        |    3154 |  0.0179944  |     1.79944  |
|      02022 | 02022          | UNK        |    1146 |  0.00653822 |     0.653822 |
|      02200 | -RHHook        | UNK        |    7115 |  0.0405929  |     4.05929  |
|      02202 | 02202          | UNK        |    1021 |  0.00582507 |     0.582507 |
|      02220 | -RHSpiral      | UNK        |    8989 |  0.0512845  |     5.12845  |
|      02222 | 02222          | UNK        |    7641 |  0.0435939  |     4.35939  |
|      20000 | ±LHSpiral      | UNK        |    5007 |  0.0285662  |     2.85662  |
|      20002 | +LHSpiral      | UNK        |    1611 |  0.00919117 |     0.919117 |
|      20020 | ±LHHook        | UNK        |    1258 |  0.00717721 |     0.717721 |
|      20022 | +LHHook        | UNK        |     823 |  0.00469542 |     0.469542 |
|      20200 | ±RHStaple      | UNK        |     745 |  0.00425042 |     0.425042 |
|      20202 | +RHStaple      | UNK        |     538 |  0.00306943 |     0.306943 |
|      20220 | ±RHHook        | Catalytic  |    1907 |  0.0108799  |     1.08799  |
|      20222 | 20222          | UNK        |    1159 |  0.00661239 |     0.661239 |
|      22000 | -/+LHHook      | UNK        |    3652 |  0.0208356  |     2.08356  |
|      22002 | 22002          | UNK        |    2052 |  0.0117072  |     1.17072  |
|      22020 | ±LHStaple      | UNK        |    1791 |  0.0102181  |     1.02181  |
|      22022 | +LHStaple      | UNK        |     579 |  0.00330334 |     0.330334 |
|      22200 | -/+RHHook      | UNK        |    8169 |  0.0466062  |     4.66062  |
|      22202 | +RHHook        | UNK        |     895 |  0.0051062  |     0.51062  |
|      22220 | ±RHSpiral      | UNK        |    3581 |  0.0204305  |     2.04305  |
|      22222 | +RHSpiral      | UNK        |    8254 |  0.0470912  |     4.70912  |
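
As a minimal sketch of how a binary class_id follows from the signs of the five dihedral angles, the helper below applies the rule stated in the class description: a negative angle contributes 0 and a non-negative angle contributes 2. The name `binary_class_id` is hypothetical and not part of proteusPy, and the sketch uses plain Python rather than the pandas operations the class applies internally.

```python
def binary_class_id(chis):
    """Return the 5-character binary class ID for five dihedral angles (degrees)."""
    if len(chis) != 5:
        raise ValueError("Expected exactly five dihedral angles.")
    # '2' marks a non-negative angle, '0' a negative one.
    return "".join("2" if chi >= 0 else "0" for chi in chis)


# Five negative dihedrals fall into class 00000 (-LHSpiral in the table).
example_id = binary_class_id([-60.0, -60.0, -90.0, -60.0, -60.0])
print(example_id)
```

Treating an angle of exactly zero as positive matches the `x >= 0` test in `create_binary_classes`.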
DisulfideClass_Constructor(loader, verbose=True)
    def __init__(self, loader, verbose=True) -> None:
        self.verbose = verbose
        self.binaryclass_dict = {}
        self.binaryclass_df = None
        self.eightclass_df = None
        self.eightclass_dict = {}
        self.consensus_binary_list = None
        self.consensus_oct_list = None

        if self.verbose:
            _logger.info(
                "Loading binary consensus structure list from %s", SS_CONSENSUS_BIN_FILE
            )
        self.consensus_binary_list = self.load_consensus_file(oct=False)

        if self.verbose:
            _logger.info(
                "Loading octant consensus structure list from %s", SS_CONSENSUS_OCT_FILE
            )
        self.consensus_oct_list = self.load_consensus_file(oct=True)

        self.build_classes(loader)
verbose
binaryclass_dict
binaryclass_df
eightclass_df
eightclass_dict
consensus_binary_list
consensus_oct_list
def load_consensus_file(self, fpath=Path(DATA_DIR), oct=True) -> proteusPy.DisulfideList.DisulfideList:
    def load_consensus_file(self, fpath=Path(DATA_DIR), oct=True) -> DisulfideList:
        """Load the pickled consensus DisulfideList (octant or binary) from the directory `fpath`."""

        res = None
        if oct:
            fname = fpath / SS_CONSENSUS_OCT_FILE
        else:
            fname = fpath / SS_CONSENSUS_BIN_FILE

        if not fname.exists():
            _logger.error("Cannot find file %s", fname)
            raise FileNotFoundError(f"Cannot find file {fname}")

        with open(fname, "rb") as f:
            res = pickle.load(f)
        return res

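Load the pickled consensus DisulfideList from the directory fpath: the octant file if oct is True, otherwise the binary file. Raises FileNotFoundError if the file does not exist.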
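
The check-then-unpickle pattern used here can be sketched on its own as follows. This is an illustration only: `load_pickled` and the file name `consensus.pkl` are hypothetical, and the real method selects its file name from `SS_CONSENSUS_OCT_FILE` or `SS_CONSENSUS_BIN_FILE`.

```python
import pickle
import tempfile
from pathlib import Path


def load_pickled(directory, filename):
    """Unpickle `filename` from `directory`, failing loudly if it is missing."""
    fname = Path(directory) / filename
    if not fname.exists():
        raise FileNotFoundError(f"Cannot find file {fname}")
    with open(fname, "rb") as f:
        return pickle.load(f)


# Round-trip a placeholder object through a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    with open(Path(tmp) / "consensus.pkl", "wb") as f:
        pickle.dump(["placeholder"], f)
    restored = load_pickled(tmp, "consensus.pkl")
```

Because `pickle.load` can execute arbitrary code, it should only be pointed at files from a trusted source, such as the data directory shipped with the package.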

def build_class_df(self, class_df, group_df):
    def build_class_df(self, class_df, group_df):
        """Build a new DataFrame from the input DataFrames."""
        ss_id_col = group_df["ss_id"]
        result_df = pd.concat([class_df, ss_id_col], axis=1)
        return result_df

Build a new DataFrame from the input DataFrames.

def class_to_sslist(self, clsid: str, base=8) -> numpy.ndarray:
    def class_to_sslist(self, clsid: str, base=8) -> np.ndarray:
        """
        Return the list of disulfides corresponding to the input `clsid`.
        This list is a list of disulfide identifiers, not the Disulfide objects themselves.

        :param clsid: The class name to extract. Must be a string
        in the format '11111' or '11111b' or '11111o'. The suffix 'b' or 'o' indicates
        binary or octant classes, respectively.
        :type clsid: str
        :param base: The base class to use, 2 or 8. Default is 8. Ignored when
        `clsid` carries a 'b' or 'o' suffix.
        :type base: int
        :return: The array of disulfide identifiers in the class (not the Disulfide
        objects themselves). An empty array is returned, and an error is logged,
        if `clsid` is not a string, has an invalid length or suffix, if `base`
        is invalid, or if the class is not found.
        :rtype: np.ndarray
        """
        if not isinstance(clsid, str):
            _logger.error("Invalid class ID: %s", clsid)
            return np.array([])

        # Slice only after the type check so non-string input cannot raise here.
        cls = clsid[:5]

        match len(clsid):
            case 6:
                match clsid[-1]:
                    case "b":
                        eightorbin = self.binaryclass_dict
                    case "o":
                        eightorbin = self.eightclass_dict
                    case _:
                        _logger.error("Invalid class ID suffix: %s", clsid)
                        return np.array([])

            case 5:
                match base:
                    case 8:
                        eightorbin = self.eightclass_dict
                    case 2:
                        eightorbin = self.binaryclass_dict
                    case _:
                        _logger.error("Invalid base: %d", base)
                        return np.array([])
            case _:
                _logger.error("Invalid class ID length: %s", clsid)
                return np.array([])

        try:
            ss_ids = eightorbin[cls]

        except KeyError:
            _logger.error("Cannot find key %s in SSBond DB", clsid)
            return np.array([])

        return ss_ids

Return the list of disulfides corresponding to the input clsid. This is a list of disulfide identifiers, not the Disulfide objects themselves.

Parameters
  • clsid: The class name to extract. Must be a string in the format '11111' or '11111b' or '11111o'. The suffix 'b' or 'o' indicates binary or octant classes, respectively, and takes precedence over base.
  • base: The base class to use, 2 or 8, when clsid has no suffix. Default is 8.
Returns

The disulfide identifiers in the class. If clsid is malformed, base is invalid, or the class is not found, an error is logged and an empty array is returned.
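
The suffix handling described above can be sketched in isolation. This is a minimal illustration, not part of proteusPy: `resolve_class_id` is a hypothetical helper name, and it raises exceptions where the listing logs an error and returns an empty array.

```python
def resolve_class_id(clsid, base=8):
    """Return (class_string, base) for a 5- or 6-character class ID.

    Hypothetical helper for illustration only; it raises on bad input
    instead of logging and returning an empty array.
    """
    if not isinstance(clsid, str):
        raise TypeError(f"class ID must be a string, got {type(clsid).__name__}")

    suffix_to_base = {"b": 2, "o": 8}  # 'b' = binary, 'o' = octant

    if len(clsid) == 6:
        # A suffix, when present, overrides the `base` argument.
        suffix = clsid[-1]
        if suffix not in suffix_to_base:
            raise ValueError(f"invalid class ID suffix: {clsid!r}")
        return clsid[:5], suffix_to_base[suffix]

    if len(clsid) == 5:
        if base not in (2, 8):
            raise ValueError(f"invalid base: {base!r}")
        return clsid, base

    raise ValueError(f"invalid class ID length: {clsid!r}")


print(resolve_class_id("22222b"))  # suffix selects the binary classes
print(resolve_class_id("11111"))   # no suffix, so `base` decides
```

The returned base would then select which class dictionary to index with the 5-character class string.
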
def list_classes(self, base=2):
    def list_classes(self, base=2):
        """
        List the Disulfide structural classes.

        :param self: The instance of the DisulfideClass_Constructor class.
        :type self: DisulfideClass_Constructor
        :param base: The base class to use, 2 or 8.
        :type base: int
        :return: None
        :rtype: None
        :raises ValueError: If an invalid base value is provided.
        """
        match base:
            case 2:
                for k, v in enumerate(self.binaryclass_dict):
                    print(f"Class: |{k}|, |{v}|")
            case 8:
                for k, v in enumerate(self.eightclass_dict):
                    print(f"Class: |{k}|, |{v}|")
            case _:
                raise ValueError("Invalid base. Must be 2 or 8.")

List the Disulfide structural classes.

Parameters
  • self: The instance of the DisulfideClass_Constructor class.
  • base: The base class to use, 2 or 8.
Returns

None

Raises
  • ValueError: If an invalid base value is provided.
def concat_dataframes(self, df1, df2):
    def concat_dataframes(self, df1, df2):
        """
        Merge two data frames on their shared 'class_id' column
        and return the new result.

        Parameters
        ----------
        df1 : pandas.DataFrame
            The first data frame.
        df2 : pandas.DataFrame
            The second data frame.

        Returns
        -------
        pandas.DataFrame
            The merged data frame (inner join on 'class_id').

        """
        # Merge the data frames on the 'class_id' column (inner join by default)
        result = pd.merge(df1, df2, on="class_id")

        return result

Merge two data frames on their shared 'class_id' column and return the new result.

Parameters
  • df1: pandas.DataFrame. The first data frame.
  • df2: pandas.DataFrame. The second data frame.
Returns

pandas.DataFrame. The merged data frame (inner join on 'class_id').
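
A small sketch of what a merge on 'class_id' does to rows that appear in only one frame. The column values below are made up for illustration; only `pd.merge(..., on="class_id")` is taken from the listing.

```python
import pandas as pd

# Hypothetical class definitions: three classes with placeholder names.
class_df = pd.DataFrame(
    {
        "class_id": ["00000", "00002", "22222"],
        "SS_Classname": ["name_a", "name_b", "name_c"],
    }
)

# Hypothetical grouped data: only two of the classes were observed.
grouped = pd.DataFrame({"class_id": ["00000", "22222"], "count": [3, 5]})

# pd.merge defaults to an inner join, so '00002' (absent from `grouped`) is dropped.
merged = pd.merge(class_df, grouped, on="class_id")
print(merged)
```

Because the join is an inner join, a class with no observed disulfides would not appear in the merged frame at all.
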

def binary_to_class(self, class_str: str, base: int = 8) -> list:
    def binary_to_class(self, class_str: str, base: int = 8) -> list:
        """
        Convert a binary input string to a list of possible class strings based on the specified base.

        Returns a list of all possible combinations of ordinal sections of a unit circle
        divided into the specified number of equal segments, originating at 0 degrees, rotating counterclockwise,
        based on the sign of each angle in the input string.

        :param class_str: A string of length 5, where each character represents the sign
        of an angle in the range of -180-180 degrees.
        :type class_str: str
        :param base: The base class to use, 6 or 8.
        :type base: int
        :return: A list of strings of length 5, representing all possible class strings.
        :rtype: list
        :raises ValueError: If an invalid base value is provided.
        """
        match base:
            case 6:
                angle_maps = {"0": ["4", "5", "6"], "2": ["1", "2", "3"]}
            case 8:
                angle_maps = {"0": ["5", "6", "7", "8"], "2": ["1", "2", "3", "4"]}
            case _:
                raise ValueError("Invalid base value. Must be 6 or 8.")

        class_lists = [angle_maps[char] for char in class_str]
        class_combinations = itertools.product(*class_lists)
        class_strings = ["".join(combination) for combination in class_combinations]
        return class_strings

Convert a binary input string to a list of possible class strings based on the specified base.

Returns a list of all possible combinations of ordinal sections of a unit circle divided into the specified number of equal segments, originating at 0 degrees, rotating counterclockwise, based on the sign of each angle in the input string.

Parameters
  • class_str: A string of length 5, where each character represents the sign of an angle in the range of -180-180 degrees.
  • base: The base class to use, 6 or 8.
Returns

A list of strings of length 5, representing all possible class strings.

Raises
  • ValueError: If an invalid base value is provided.
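
The expansion above is a Cartesian product over per-character choices. The sketch below reproduces it as a standalone function; `expand_binary_class` and `ANGLE_MAPS` are hypothetical names, and the digit maps are copied from the listing rather than derived independently.

```python
import itertools

# Per-base maps from a binary character to its candidate segment labels,
# copied from the listing above.
ANGLE_MAPS = {
    6: {"0": ["4", "5", "6"], "2": ["1", "2", "3"]},
    8: {"0": ["5", "6", "7", "8"], "2": ["1", "2", "3", "4"]},
}


def expand_binary_class(class_str, base=8):
    """Return every class string of the given base compatible with `class_str`."""
    if base not in ANGLE_MAPS:
        raise ValueError("Invalid base value. Must be 6 or 8.")
    choices = [ANGLE_MAPS[base][char] for char in class_str]
    return ["".join(combo) for combo in itertools.product(*choices)]


octant_classes = expand_binary_class("00000", base=8)
sixfold_classes = expand_binary_class("22222", base=6)
print(len(octant_classes), len(sixfold_classes))
```

Each of the five positions contributes base/2 choices, so one binary class expands to 4^5 = 1024 octant classes or 3^5 = 243 sixfold classes.
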
def build_classes(self, loader) -> None:
    def build_classes(self, loader) -> None:
        """
        Build the internal structures needed for the binary and octant disulfide structural classes
        based on dihedral angle rules.

        :param loader: The DisulfideLoader object containing the data.
        :type loader: DisulfideLoader
        :return: None
        :rtype: None
        """

        self.version = __version__

        tors_df = loader.getTorsions()

        if self.verbose:
            _logger.info("Creating binary SS classes...")

        grouped = self.create_binary_classes(tors_df)

        class_df = pd.read_csv(
            StringIO(SS_CLASS_DEFINITIONS),
            dtype={
                "class_id": "string",
                "FXN": "string",
                "SS_Classname": "string",
            },
        )
        # str.strip() returns a new Series, so assign the result back.
        class_df["FXN"] = class_df["FXN"].str.strip()
        class_df["SS_Classname"] = class_df["SS_Classname"].str.strip()
        class_df["class_id"] = class_df["class_id"].str.strip()

        merged = self.concat_dataframes(class_df, grouped)
        merged.drop(
            columns=["Idx", "chi1_s", "chi2_s", "chi3_s", "chi4_s", "chi5_s"],
            inplace=True,
        )

        classdict = dict(zip(merged["class_id"], merged["ss_id"]))
        self.binaryclass_dict = classdict
        self.binaryclass_df = merged.copy()

        if self.verbose:
            _logger.info("Creating eightfold SS classes...")

        grouped_eightclass = self.create_classes(tors_df, 8)
        self.eightclass_df = grouped_eightclass.copy()
        self.eightclass_dict = dict(
            zip(grouped_eightclass["class_id"], grouped_eightclass["ss_id"])
        )

        if self.verbose:
            _logger.info("Initialization complete.")

Build the internal structures needed for the binary and octant disulfide structural classes based on dihedral angle rules.

Parameters
  • loader: The DisulfideLoader object containing the data.
Returns

None

def create_binary_classes(self, df) -> pandas.core.frame.DataFrame:
    def create_binary_classes(self, df) -> pd.DataFrame:
        """
        Group the DataFrame by the sign of the chi columns and create a new class ID column for each unique grouping.
        The sign columns and the 'class_id' column are added to the input DataFrame in place.

        :param df: A pandas DataFrame containing columns 'ss_id', 'chi1', 'chi2', 'chi3', 'chi4', 'chi5', 'ca_distance',
        'cb_distance', 'torsion_length', and 'energy'.
        :return: A pandas DataFrame containing columns 'class_id', 'ss_id', 'count', 'incidence', and 'percentage',
         where 'class_id' is a unique identifier for each grouping of chi signs, 'ss_id' is an array of the unique
         'ss_id' values in that grouping, 'count' is the number of unique 'ss_id' values in that grouping,
         'incidence' is 'count' divided by the number of rows in the input, and 'percentage' is 'incidence' times 100.
        """
        # Create new columns with the sign of each chi column
        chi_columns = ["chi1", "chi2", "chi3", "chi4", "chi5"]
        sign_columns = [col + "_s" for col in chi_columns]
        df[sign_columns] = df[chi_columns].applymap(lambda x: 1 if x >= 0 else -1)

        # Create a new column with the class ID for each row
        class_id_column = "class_id"
        df[class_id_column] = (df[sign_columns] + 1).apply(
            lambda x: "".join(x.astype(str)), axis=1
        )

        # Group the DataFrame by the class ID and return the grouped data
        grouped = df.groupby(class_id_column)["ss_id"].unique().reset_index()
        grouped["count"] = grouped["ss_id"].apply(len)
        grouped["incidence"] = grouped["count"] / len(df)
        grouped["percentage"] = grouped["incidence"] * 100

        return grouped

Group the DataFrame by the sign of the chi columns and create a new class ID column for each unique grouping.

Parameters
  • df: A pandas DataFrame containing columns 'ss_id', 'chi1', 'chi2', 'chi3', 'chi4', 'chi5', 'ca_distance', 'cb_distance', 'torsion_length', and 'energy'.
Returns

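A pandas DataFrame containing columns 'class_id', 'ss_id', 'count', 'incidence', and 'percentage', where 'class_id' is a unique identifier for each grouping of chi signs, 'ss_id' is an array of the unique 'ss_id' values in that grouping, 'count' is the number of unique 'ss_id' values in that grouping, 'incidence' is 'count' divided by the number of rows in the input, and 'percentage' is 'incidence' times 100.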
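
The sign-based encoding can be illustrated without pandas. In this sketch the function names, identifiers, and angles are all hypothetical; the rule itself (an angle >= 0 maps to '2', a negative angle to '0') follows the listing, and the sketch assumes each identifier appears only once.

```python
from collections import defaultdict


def binary_class_id(chis):
    """Map five dihedral angles to a string of '2' (angle >= 0) and '0' (angle < 0)."""
    return "".join("2" if chi >= 0 else "0" for chi in chis)


def group_by_binary_class(records):
    """Group (ss_id, chis) pairs by class ID; report count and percentage per class."""
    groups = defaultdict(list)
    for ss_id, chis in records:
        groups[binary_class_id(chis)].append(ss_id)
    total = len(records)
    return {
        class_id: {
            "ss_id": ids,
            "count": len(ids),
            "percentage": 100.0 * len(ids) / total,
        }
        for class_id, ids in groups.items()
    }


# Made-up identifiers and angles, for illustration only.
records = [
    ("ss_a", (-60.0, -60.0, -90.0, -60.0, -60.0)),
    ("ss_b", (-70.0, -50.0, -85.0, -65.0, -55.0)),
    ("ss_c", (60.0, -60.0, 90.0, 0.0, -1.0)),
    ("ss_d", (65.0, 170.0, 95.0, 175.0, 60.0)),
]
summary = group_by_binary_class(records)
for class_id, info in sorted(summary.items()):
    print(class_id, info["count"], info["percentage"])
```

Note that an angle of exactly 0 counts as positive under this rule, which is why the third record yields '2' in its fourth position.
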

def create_classes(self, df, base=8) -> pandas.core.frame.DataFrame:
    def create_classes(self, df, base=8) -> pd.DataFrame:
        """
        Create a new DataFrame from the input with a 6- or 8-class encoding for input 'chi' values.

        The function takes a pandas DataFrame containing the following columns:
        'ss_id', 'chi1', 'chi2', 'chi3', 'chi4', 'chi5', 'ca_distance', 'cb_distance',
        'torsion_length', 'energy', and 'rho', and adds a class ID column based on the following rules:

        1. A new column named `class_id` is added, which is the concatenation of the individual class IDs per Chi.
        2. The DataFrame is grouped by the `class_id` column, and a new DataFrame is returned that shows the unique `ss_id` values for each group,
        the count of unique `ss_id` values, the incidence of each group as a proportion of the total DataFrame, and the
        percentage of incidence.

        :param df: A pandas DataFrame containing columns 'ss_id', 'chi1', 'chi2', 'chi3', 'chi4', 'chi5',
                'ca_distance', 'cb_distance', 'torsion_length', 'energy', and 'rho'
        :param base: The number of segments per dihedral angle, 6 or 8. Defaults to 8.
        :return: The grouped DataFrame with the added class column.
        :raises ValueError: If the base is not 6 or 8.
        """

        _df = pd.DataFrame()
        if base == 6:
            for col_name in ["chi1", "chi2", "chi3", "chi4", "chi5"]:
                _df[col_name + "_t"] = df[col_name].apply(
                    DisulfideClass_Constructor.get_sixth_quadrant
                )
        elif base == 8:
            for col_name in ["chi1", "chi2", "chi3", "chi4", "chi5"]:
                _df[col_name + "_t"] = df[col_name].apply(
                    DisulfideClass_Constructor.get_eighth_quadrant
                )
        else:
            raise ValueError("Base must be either 6 or 8")

        df["class_id"] = _df[["chi1_t", "chi2_t", "chi3_t", "chi4_t", "chi5_t"]].agg(
            "".join, axis=1
        )

        grouped = df.groupby("class_id").agg({"ss_id": "unique"})
        grouped["count"] = grouped["ss_id"].str.len()
        grouped["incidence"] = grouped["count"] / len(df)
        grouped["percentage"] = grouped["incidence"] * 100
        grouped.reset_index(inplace=True)

        return grouped

Create a new DataFrame from the input with a 6- or 8-class encoding for input 'chi' values.

The function takes a pandas DataFrame containing the following columns: 'ss_id', 'chi1', 'chi2', 'chi3', 'chi4', 'chi5', 'ca_distance', 'cb_distance', 'torsion_length', 'energy', and 'rho', and adds a class ID column based on the following rules:

  1. A new column named class_id is added, which is the concatenation of the individual class IDs per Chi.
  2. The DataFrame is grouped by the class_id column, and a new DataFrame is returned that shows the unique ss_id values for each group, the count of unique ss_id values, the incidence of each group as a proportion of the total DataFrame, and the percentage of incidence.
Parameters
  • df: A pandas DataFrame containing columns 'ss_id', 'chi1', 'chi2', 'chi3', 'chi4', 'chi5', 'ca_distance', 'cb_distance', 'torsion_length', 'energy', and 'rho'
Returns

The grouped DataFrame with the added class column.
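
The grouping step (rule 2 above) can be sketched on a toy frame whose `class_id` column is already filled in. The identifiers and class IDs are made up, and the sketch uses the `groupby(...)["ss_id"].unique()` form from `create_binary_classes` rather than the `agg` call in this listing.

```python
import pandas as pd

# Toy input: class IDs are assumed to have been assigned already.
df = pd.DataFrame(
    {
        "ss_id": ["ss_a", "ss_b", "ss_c", "ss_d"],
        "class_id": ["88888", "88888", "11111", "88888"],
    }
)

# One row per class, holding the array of unique identifiers in that class.
grouped = df.groupby("class_id")["ss_id"].unique().reset_index()
grouped["count"] = grouped["ss_id"].apply(len)
grouped["incidence"] = grouped["count"] / len(df)
grouped["percentage"] = grouped["incidence"] * 100

print(grouped[["class_id", "count", "percentage"]])
```

The percentages sum to 100 as long as every identifier is unique, since each row of the input then falls into exactly one class.
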

def filter_class_by_percentage(self, cutoff: float, base: int = 8) -> pandas.core.frame.DataFrame:
    def filter_class_by_percentage(self, cutoff: float, base: int = 8) -> pd.DataFrame:
        """
        Filter the specified class definitions by percentage.

        :param cutoff: A numeric value specifying the minimum percentage required for a row to be included in the output
        :param base: An optional integer specifying the class type to filter, 2 or 8; defaults to 8
        :return: A new Pandas DataFrame containing only rows where the percentage is greater than or equal to the cutoff
        :rtype: pandas.DataFrame
        :raises ValueError: If the base is not 2 or 8.
        """

        match base:
            case 8:
                df = self.eightclass_df
            case 2:
                df = self.binaryclass_df
            case _:
                raise ValueError("Invalid base. Must be 2 or 8.")

        return df[df["percentage"] >= cutoff].copy()

Filter the specified class definitions by percentage.

Parameters
  • cutoff: A numeric value specifying the minimum percentage required for a row to be included in the output
  • base: An optional integer specifying the class type to filter, defaults to 8
Returns

A new Pandas DataFrame containing only rows where the percentage is greater than or equal to the cutoff
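
A minimal sketch of the cutoff filter on a hypothetical class table. The values are invented; the boolean-mask expression is the one used in the listing.

```python
import pandas as pd

# Hypothetical class table with a 'percentage' column.
class_df = pd.DataFrame(
    {
        "class_id": ["11111", "22222", "33333"],
        "percentage": [0.5, 4.0, 12.5],
    }
)


def filter_by_percentage(df, cutoff):
    """Keep rows whose percentage is >= cutoff; return an independent copy."""
    return df[df["percentage"] >= cutoff].copy()


common = filter_by_percentage(class_df, 4.0)
print(common)
```

The comparison is inclusive, so a class sitting exactly at the cutoff is kept, and the `.copy()` means later edits to the result do not touch the stored class table.
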

@staticmethod
def get_binary_quadrant(angle_deg):
    @staticmethod
    def get_binary_quadrant(angle_deg):
        """
        Return the binary quadrant in which an angle in degrees lies if the area is described by dividing a unit circle into 2 equal segments.

        :param angle_deg: The angle in degrees (float or array-like).
        :return: The binary quadrant ('0' or '2') that the angle belongs to, as a string.
        For array-like input, the per-angle characters are joined into one string.
        :rtype: str
        """
        angle_deg = (
            np.array(angle_deg) % 360
        )  # Normalize the angle to the range [0, 360)

        if np.isscalar(angle_deg):
            if angle_deg >= 0 and angle_deg < 180:
                return str(2)

            if angle_deg >= 180 and angle_deg < 360:
                return str(0)

            raise ValueError(
                "Invalid angle value: angle must be in the range [-360, 360)."
            )

        quadrants = np.where((angle_deg >= 0) & (angle_deg < 180), "2", "0")
        return "".join(quadrants)

Return the binary quadrant in which an angle in degrees lies if the area is described by dividing a unit circle into 2 equal segments.

Parameters
  • angle_deg: The angle in degrees (float or array-like).
Returns

The binary quadrant ('0' or '2') that the angle belongs to, as a string. For array-like input, the per-angle characters are joined into one string.

@staticmethod
def get_sixth_quadrant(angle_deg):
    @staticmethod
    def get_sixth_quadrant(angle_deg):
        """
        Return the sixth quadrant in which an angle in degrees lies if the area is described by dividing a unit circle into 6 equal segments.

        :param angle_deg: The angle in degrees (float or array-like).
        :return: The sixth quadrant ('1' to '6') that the angle belongs to, as a string.
        For array-like input, the per-angle characters are joined into one string.
        :rtype: str
        """
        angle_deg = (
            np.array(angle_deg) % 360
        )  # Normalize the angle to the range [0, 360)

        if np.isscalar(angle_deg):
            if angle_deg >= 0 and angle_deg < 60:
                return str(6)
            elif angle_deg >= 60 and angle_deg < 120:
                return str(5)
            elif angle_deg >= 120 and angle_deg < 180:
                return str(4)
            elif angle_deg >= 180 and angle_deg < 240:
                return str(3)
            elif angle_deg >= 240 and angle_deg < 300:
                return str(2)
            elif angle_deg >= 300 and angle_deg < 360:
                return str(1)
            else:
                raise ValueError(
                    "Invalid angle value: angle must be in the range [-360, 360)."
                )
        else:
            quadrants = np.empty(angle_deg.shape, dtype=str)
            quadrants[(angle_deg >= 0) & (angle_deg < 60)] = "6"
            quadrants[(angle_deg >= 60) & (angle_deg < 120)] = "5"
            quadrants[(angle_deg >= 120) & (angle_deg < 180)] = "4"
            quadrants[(angle_deg >= 180) & (angle_deg < 240)] = "3"
            quadrants[(angle_deg >= 240) & (angle_deg < 300)] = "2"
            quadrants[(angle_deg >= 300) & (angle_deg < 360)] = "1"
            return "".join(quadrants)

Return the sixth quadrant in which an angle in degrees lies if the area is described by dividing a unit circle into 6 equal segments.

Parameters
  • angle_deg: The angle in degrees (float or array-like).
Returns

The sixth quadrant ('1' to '6') that the angle belongs to, as a string. For array-like input, the per-angle characters are joined into one string.

@staticmethod
def get_eighth_quadrant(angle_deg):
    @staticmethod
    def get_eighth_quadrant(angle_deg):
        """
        Return the eighth quadrant in which an angle in degrees lies if the area is described by dividing a unit circle into 8 equal segments.

        :param angle_deg: The angle in degrees (float or array-like).
        :return: The eighth quadrant ('1' to '8') that the angle belongs to, as a string.
        For array-like input, the per-angle characters are joined into one string.
        :rtype: str
        """
        angle_deg = (
            np.array(angle_deg) % 360
        )  # Normalize the angle to the range [0, 360)

        if np.isscalar(angle_deg):
            if angle_deg >= 0 and angle_deg < 45:
                return str(8)
            elif angle_deg >= 45 and angle_deg < 90:
                return str(7)
            elif angle_deg >= 90 and angle_deg < 135:
                return str(6)
            elif angle_deg >= 135 and angle_deg < 180:
                return str(5)
            elif angle_deg >= 180 and angle_deg < 225:
                return str(4)
            elif angle_deg >= 225 and angle_deg < 270:
                return str(3)
            elif angle_deg >= 270 and angle_deg < 315:
                return str(2)
            elif angle_deg >= 315 and angle_deg < 360:
                return str(1)
            else:
                raise ValueError(
                    "Invalid angle value: angle must be in the range [-360, 360)."
                )
        else:
            quadrants = np.empty(angle_deg.shape, dtype=str)
            quadrants[(angle_deg >= 0) & (angle_deg < 45)] = "8"
            quadrants[(angle_deg >= 45) & (angle_deg < 90)] = "7"
            quadrants[(angle_deg >= 90) & (angle_deg < 135)] = "6"
            quadrants[(angle_deg >= 135) & (angle_deg < 180)] = "5"
            quadrants[(angle_deg >= 180) & (angle_deg < 225)] = "4"
            quadrants[(angle_deg >= 225) & (angle_deg < 270)] = "3"
            quadrants[(angle_deg >= 270) & (angle_deg < 315)] = "2"
            quadrants[(angle_deg >= 315) & (angle_deg < 360)] = "1"
            return "".join(quadrants)

Return the eighth quadrant in which an angle in degrees lies if the area is described by dividing a unit circle into 8 equal segments.

Parameters
  • angle_deg: The angle in degrees (float or array-like).
Returns

The eighth quadrant ('1' to '8') that the angle belongs to, as a string. For array-like input, the per-angle characters are joined into one string.
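
The range tables in the sixfold and eightfold listings follow one arithmetic rule: normalize the angle to [0, 360), take the zero-based segment index, and subtract it from the base. The sketch below is an alternative formulation under that reading, not the library's implementation; `segment_label` and `class_string` are hypothetical names.

```python
import math


def segment_label(angle_deg, base=8):
    """Return the segment label ('1'..str(base)) for one angle in degrees."""
    if base not in (6, 8):
        raise ValueError("base must be 6 or 8")
    width = 360.0 / base
    normalized = angle_deg % 360.0
    # Floating-point rounding can turn a tiny negative angle into exactly 360.0,
    # so clamp the index to the last segment.
    index = min(math.floor(normalized / width), base - 1)
    # Labels run downward: the segment starting at 0 degrees is labelled `base`.
    return str(base - index)


def class_string(angles, base=8):
    """Concatenate the segment labels of several angles into one class string."""
    return "".join(segment_label(angle, base) for angle in angles)


print(class_string([-60.0, -60.0, -90.0, -60.0, -60.0]))
print(class_string([60.0, 180.0, 90.0, 180.0, 60.0]))
```

With base 8, for example, 30 degrees falls in the first 45-degree segment and gets label '8', while -60 degrees normalizes to 300 degrees and gets label '2', matching the ranges in the listing.
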

@staticmethod
def class_string_from_dihedral(*args, base=8) -> str:
    @staticmethod
    def class_string_from_dihedral(*args, base=8) -> str:
        """
        Return the class string for a set of dihedral angles, given the base.

        :param args: One or five dihedral angles.
        :param base: The base class to use, 2, 6, or 8. Defaults to 8.
        :return: The class string for the input dihedral angles.
        :rtype: str
        :raises ValueError: If the number of dihedral angles is not 1 or 5, or if the base is not 2, 6, or 8.
        """
        if len(args) not in [1, 5]:
            raise ValueError("You must enter either 1 or 5 dihedral angles.")

        if base not in [2, 6, 8]:
            raise ValueError("Invalid base. Must be 2, 6, or 8.")

        angles = np.array(args).flatten()

        if len(angles) == 1:
            match base:
                case 2:
                    return DisulfideClass_Constructor.get_binary_quadrant(angles[0])
                case 6:
                    return DisulfideClass_Constructor.get_sixth_quadrant(angles[0])
                case 8:
                    return DisulfideClass_Constructor.get_eighth_quadrant(angles[0])
                case _:
                    raise ValueError("Invalid base. Must be 2, 6, or 8.")

        elif len(angles) == 5:
            match base:
                case 2:
                    return DisulfideClass_Constructor.get_binary_quadrant(angles)
                case 6:
                    return DisulfideClass_Constructor.get_sixth_quadrant(angles)
                case 8:
                    return DisulfideClass_Constructor.get_eighth_quadrant(angles)
                case _:
                    raise ValueError("Invalid base. Must be 2, 6, or 8.")

        # A single sequence argument of any other length reaches this point;
        # raise instead of silently returning None.
        raise ValueError("You must enter either 1 or 5 dihedral angles.")

Return the class string for a set of dihedral angles, given the base.

Parameters
  • args: One or five dihedral angles.
  • base: The base class to use, 2, 6, or 8. Defaults to 8.
Returns

The class string for the input dihedral angles.

Raises
  • ValueError: If the number of dihedral angles is not 1 or 5, or if the base is not 2, 6, or 8.
def sslist_from_classid(self, cls: str, base=8) -> numpy.ndarray | None:
    def sslist_from_classid(self, cls: str, base=8) -> np.ndarray | None:
        """
        Return the 'ss_id' value for the class whose 'class_id' matches the
        input 'cls' string, or None if no such class exists.

        :param cls: The class ID to look up.
        :param base: The base class to use, 2 or 8. Defaults to 8.
        :return: The disulfide identifiers stored under 'ss_id' for the class, or None if not found.
        :raises ValueError: If the base is not 2 or 8, or if multiple rows match `cls`.
        """
        if base == 2:
            df = self.binaryclass_df
        elif base == 8:
            df = self.eightclass_df
        else:
            raise ValueError("Invalid base. Must be 2 or 8.")

        filtered_df = df[df["class_id"] == cls]

        if len(filtered_df) == 0:
            return None

        if len(filtered_df) > 1:
            raise ValueError(f"Multiple rows found for class_id '{cls}'")

        return filtered_df.iloc[0]["ss_id"]

Return the 'ss_id' value for the class whose 'class_id' matches the input 'cls' string, or None if no such class exists.

def class_to_binary(self, cls_str, base=8):
    def class_to_binary(self, cls_str, base=8):
        """
        Convert a string of length 5 representing the ordinal sections of a unit circle for angles in the range -180 to 180 degrees
        into a string of 5 characters, where each character is either '0' if the corresponding input character represents a
        negative angle or '2' if it represents a positive angle.

        :param cls_str: A string of length 5 representing the ordinal sections of a unit circle for angles in the range -180 to 180 degrees.
        :type cls_str: str
        :param base: The base of the ordinal section (6 or 8).
        :type base: int
        :return: A string of length 5, where each character is either '0' or '2', representing the sign of the corresponding input angle.
        :rtype: str
        :raises ValueError: If the base is not 6 or 8.
        """
        if base not in [6, 8]:
            raise ValueError("Base must be either 6 or 8")

        output_str = ""
        for char in cls_str:
            if base == 6:
                if char in ["1", "2", "3"]:
                    output_str += "2"
                elif char in ["4", "5", "6"]:
                    output_str += "0"
            elif base == 8:
                if char in ["1", "2", "3", "4"]:
                    output_str += "2"
                elif char in ["5", "6", "7", "8"]:
                    output_str += "0"
        return output_str

Convert a string of length 5 representing the ordinal sections of a unit circle for angles in the range -180 to 180 degrees into a string of 5 characters, where each character is either '0' if the corresponding input character represents a negative angle or '2' if it represents a positive angle.

Parameters
  • cls_str: A string of length 5 representing the ordinal sections of a unit circle for angles in the range -180 to 180 degrees.
  • base: The base of the ordinal section (6 or 8).
Returns

A string of length 5, where each character is either '0' or '2', representing the sign of the corresponding input angle.
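
The digit-to-binary collapse can be written compactly by comparing each digit against half the base. This sketch mirrors the digit sets in the listing (labels 1 to base/2 map to '2', the rest to '0'). `collapse_to_binary` is a hypothetical name, and it raises on an out-of-range character where the listing silently skips it.

```python
def collapse_to_binary(cls_str, base=8):
    """Collapse a sixfold or eightfold class string to its binary class string."""
    if base not in (6, 8):
        raise ValueError("Base must be either 6 or 8")

    valid = "123456789"[:base]  # the labels that exist for this base
    half = base // 2
    out = []
    for char in cls_str:
        if char not in valid:
            raise ValueError(f"invalid segment label {char!r} for base {base}")
        label = valid.index(char) + 1
        # Lower half of the labels maps to '2', upper half to '0', as in the listing.
        out.append("2" if label <= half else "0")
    return "".join(out)


print(collapse_to_binary("12345", base=8))
print(collapse_to_binary("45611", base=6))
```

Raising on an unknown character keeps the output at the same length as the input, which the skip-on-unknown behaviour in the listing does not guarantee.
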

def get_class_df(self, base=8):
    def get_class_df(self, base=8):
        """
        Get the Disulfide structural classes DataFrame.

        :param base: The base class to use, either 2 or 8. Defaults to 8.
        :type base: int
        :return: A DataFrame containing the class_id, count, incidence, and percentage columns.
        :rtype: pandas.DataFrame
        :raises ValueError: If the base is not 2 or 8.
        """
        columns = ["class_id", "count", "incidence", "percentage"]
        match base:
            case 2:
                class_df = self.binaryclass_df
            case 8:
                class_df = self.eightclass_df
            case _:
                raise ValueError("Invalid base. Must be 2 or 8.")

        result_df = class_df[columns]
        return result_df

Get the Disulfide structural classes DataFrame.

Parameters
  • base: The base class to use, either 2 or 8. Defaults to 8.
Returns

A DataFrame containing the class_id, count, incidence, and percentage columns.

Raises
  • ValueError: If the base is not 2 or 8.