proteusPy.DisulfideClass_Constructor
DisulfideBond Class Analysis Dictionary creation
Author: Eric G. Suchanek, PhD.
License: BSD
Last Modification: 2025-01-22 00:02:43 -egs-
Disulfide Class creation and manipulation. Binary classes using the +/- formalism of Hogg et al. (Biochem, 2006, 45, 7429-7433) are created for all 32 possible classes from the Disulfides extracted. Classes are named per Hogg's convention. This approach is extended to create sixfold and eightfold classes by subdividing each dihedral angle chi1 - chi5 into 6 or 8 equal segments, effectively quantizing them.
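As a minimal sketch of the two encodings described above, the standalone functions below mirror the conventions visible in this module's source: a binary digit of `0` for a negative angle and `2` for a non-negative one, and an octant digit that runs from `8` for angles in [0, 45) degrees down to `1` for angles in [315, 360) degrees after normalizing to [0, 360). The names `binary_class` and `octant_class` are hypothetical helpers written for illustration and are not part of proteusPy.

```python
# Illustrative sketch only: binary_class and octant_class are hypothetical
# helpers, not part of the proteusPy API.


def binary_class(chis):
    """Encode five dihedral angles (degrees) by sign: '0' negative, '2' otherwise."""
    return "".join("2" if chi >= 0 else "0" for chi in chis)


def octant_class(chis):
    """Encode five dihedral angles (degrees) as digits 1-8, one per 45-degree segment."""
    digits = []
    for chi in chis:
        # Normalize to [0, 360), then count whole 45-degree segments from 0.
        segment = min(int((chi % 360) // 45), 7)  # clamp guards float round-off at 360
        digits.append(str(8 - segment))
    return "".join(digits)


chis = [-60, -60, -90, -60, -60]
print(binary_class(chis))  # 00000
print(octant_class(chis))  # 22222
```

Because each sign covers four 45-degree segments, a single binary class spans 4^5 = 1024 possible eightfold class strings.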
"""
DisulfideBond Class Analysis Dictionary creation
Author: Eric G. Suchanek, PhD.
License: BSD
Last Modification: 2025-01-22 00:02:43 -egs-

Disulfide Class creation and manipulation. Binary classes using the +/- formalism of Hogg et al.
(Biochem, 2006, 45, 7429-7433), are created for all 32 possible classes from the Disulfides
extracted. Classes are named per Hogg's convention. This approach is extended to create
sixfold and eightfold classes by subdividing each dihedral angle chi1 - chi5 into
6 or 8 equal segments, effectively quantizing them.
"""

# pylint: disable=C0301
# pylint: disable=C0103

import itertools
import pickle
from io import StringIO
from pathlib import Path

import numpy as np
import pandas as pd

from proteusPy import __version__
from proteusPy.DisulfideList import DisulfideList
from proteusPy.logger_config import create_logger
from proteusPy.ProteusGlobals import (
    DATA_DIR,
    SS_CLASS_DEFINITIONS,
    SS_CONSENSUS_BIN_FILE,
    SS_CONSENSUS_OCT_FILE,
)

_logger = create_logger(__name__)
_logger.setLevel("INFO")


class DisulfideClass_Constructor:
    r"""
    This Class manages structural classes for the disulfide bonds contained
    in the proteusPy disulfide database.

    Class builds the internal dictionary mapping disulfides to class names.

    Disulfide binary classes are defined using the ± formalism described by
    Schmidt et al. (Biochem, 2006, 45, 7429-7433), across all 32 (2^5), possible
    binary sidechain torsional combinations. Classes are named per Schmidt's convention.
    The ``class_id`` represents the sign of each dihedral angle $\chi_{1} - \chi_{1'}$
    where *0* represents *negative* dihedral angle and *2* a *positive* angle.

    |   class_id | SS_Classname   | FXN        |   count |   incidence |   percentage |
    |-----------:|:---------------|:-----------|--------:|------------:|-------------:|
    |      00000 | -LHSpiral      | UNK        |   40943 |     0.23359 |       23.359 |
    |      00002 | 00002          | UNK        |    9391 |   0.0535781 |      5.35781 |
    |      00020 | -LHHook        | UNK        |    4844 |   0.0276363 |      2.76363 |
    |      00022 | 00022          | UNK        |    2426 |   0.0138409 |      1.38409 |
    |      00200 | -RHStaple      | Allosteric |   16146 |    0.092117 |       9.2117 |
    |      00202 | 00202          | UNK        |    1396 |  0.00796454 |     0.796454 |
    |      00220 | 00220          | UNK        |    7238 |   0.0412946 |      4.12946 |
    |      00222 | 00222          | UNK        |    6658 |   0.0379856 |      3.79856 |
    |      02000 | 02000          | UNK        |    7104 |   0.0405301 |      4.05301 |
    |      02002 | 02002          | UNK        |    8044 |   0.0458931 |      4.58931 |
    |      02020 | -LHStaple      | UNK        |    3154 |   0.0179944 |      1.79944 |
    |      02022 | 02022          | UNK        |    1146 |  0.00653822 |     0.653822 |
    |      02200 | -RHHook        | UNK        |    7115 |   0.0405929 |      4.05929 |
    |      02202 | 02202          | UNK        |    1021 |  0.00582507 |     0.582507 |
    |      02220 | -RHSpiral      | UNK        |    8989 |   0.0512845 |      5.12845 |
    |      02222 | 02222          | UNK        |    7641 |   0.0435939 |      4.35939 |
    |      20000 | ±LHSpiral      | UNK        |    5007 |   0.0285662 |      2.85662 |
    |      20002 | +LHSpiral      | UNK        |    1611 |  0.00919117 |     0.919117 |
    |      20020 | ±LHHook        | UNK        |    1258 |  0.00717721 |     0.717721 |
    |      20022 | +LHHook        | UNK        |     823 |  0.00469542 |     0.469542 |
    |      20200 | ±RHStaple      | UNK        |     745 |  0.00425042 |     0.425042 |
    |      20202 | +RHStaple      | UNK        |     538 |  0.00306943 |     0.306943 |
    |      20220 | ±RHHook        | Catalytic  |    1907 |   0.0108799 |      1.08799 |
    |      20222 | 20222          | UNK        |    1159 |  0.00661239 |     0.661239 |
    |      22000 | -/+LHHook      | UNK        |    3652 |   0.0208356 |      2.08356 |
    |      22002 | 22002          | UNK        |    2052 |   0.0117072 |      1.17072 |
    |      22020 | ±LHStaple      | UNK        |    1791 |   0.0102181 |      1.02181 |
    |      22022 | +LHStaple      | UNK        |     579 |  0.00330334 |     0.330334 |
    |      22200 | -/+RHHook      | UNK        |    8169 |   0.0466062 |      4.66062 |
    |      22202 | +RHHook        | UNK        |     895 |   0.0051062 |      0.51062 |
    |      22220 | ±RHSpiral      | UNK        |    3581 |   0.0204305 |      2.04305 |
    |      22222 | +RHSpiral      | UNK        |    8254 |   0.0470912 |      4.70912 |
    """

    def __init__(self, loader, verbose=True) -> None:
        self.verbose = verbose
        self.binaryclass_dict = {}
        self.binaryclass_df = None
        self.eightclass_df = None
        self.eightclass_dict = {}
        self.consensus_binary_list = None
        self.consensus_oct_list = None

        if self.verbose:
            _logger.info(
                "Loading binary consensus structure list from %s", SS_CONSENSUS_BIN_FILE
            )
        self.consensus_binary_list = self.load_consensus_file(oct=False)

        if self.verbose:
            _logger.info(
                "Loading octant consensus structure list from %s", SS_CONSENSUS_OCT_FILE
            )
        self.consensus_oct_list = self.load_consensus_file(oct=True)

        self.build_classes(loader)

    def __getitem__(self, item: str) -> np.ndarray:
        """
        Implements indexing against a class ID string.

        Return an array of disulfide IDs given the input Class ID string.

        :param item: The class ID string to index.
        :type item: str
        :return: An array of disulfide IDs corresponding to the class ID.
        :rtype: np.ndarray
        :raises ValueError: If an integer index is provided.
        :raises DisulfideException: If the class ID is invalid.
        """
        disulfides = None

        if isinstance(item, int):
            raise ValueError("Integer indexing not supported. Use a string key.")

        if isinstance(item, str):
            disulfides = self.class_to_sslist(item)
            return disulfides

        return disulfides

    def load_consensus_file(self, fpath=Path(DATA_DIR), oct=True) -> DisulfideList:
        """Load the consensus file from the specified file."""

        res = None
        if oct:
            fname = fpath / SS_CONSENSUS_OCT_FILE
        else:
            fname = fpath / SS_CONSENSUS_BIN_FILE

        if not fname.exists():
            _logger.error("Cannot find file %s", fname)
            raise FileNotFoundError(f"Cannot find file {fname}")

        with open(fname, "rb") as f:
            res = pickle.load(f)
        return res

    def build_class_df(self, class_df, group_df):
        """Build a new DataFrame from the input DataFrames."""
        ss_id_col = group_df["ss_id"]
        result_df = pd.concat([class_df, ss_id_col], axis=1)
        return result_df

    def class_to_sslist(self, clsid: str, base=8) -> np.ndarray:
        """
        Return the list of disulfides corresponding to the input `clsid`.
        This list is a list of disulfide identifiers, not the Disulfide objects themselves.

        :param clsid: The class name to extract. Must be a string
        in the format '11111' or '11111b' or '11111o'. The suffix 'b' or 'o' indicates
        binary or octant classes, respectively.
        :type clsid: str
        :param base: The base class to use, 2 or 8. Default is 8. Only used when
        `clsid` has no suffix.
        :type base: int
        :return: The array of disulfide identifiers in the class, not the Disulfide
        objects themselves. An empty array is returned, and an error is logged, if
        the class ID, suffix, or base is invalid or the class is not found.
        :rtype: np.ndarray
        """
        if not isinstance(clsid, str):
            _logger.error("Invalid class ID: %s", clsid)
            return np.array([])

        # Slice only after the type check so non-string input is rejected cleanly.
        cls = clsid[:5]

        match len(clsid):
            case 6:
                match clsid[-1]:
                    case "b":
                        eightorbin = self.binaryclass_dict
                    case "o":
                        eightorbin = self.eightclass_dict
                    case _:
                        _logger.error("Invalid class ID suffix: %s", clsid)
                        return np.array([])

            case 5:
                match base:
                    case 8:
                        eightorbin = self.eightclass_dict
                    case 2:
                        eightorbin = self.binaryclass_dict
                    case _:
                        _logger.error("Invalid base: %d", base)
                        return np.array([])
            case _:
                _logger.error("Invalid class ID length: %s", clsid)
                return np.array([])

        try:
            ss_ids = eightorbin[cls]

        except KeyError:
            _logger.error("Cannot find key %s in SSBond DB", clsid)
            return np.array([])

        return ss_ids

    def list_classes(self, base=2):
        """
        List the Disulfide structural classes.

        :param self: The instance of the DisulfideClass_Constructor class.
        :type self: DisulfideClass_Constructor
        :param base: The base class to use, 2 or 8.
        :type base: int
        :return: None
        :rtype: None
        :raises ValueError: If an invalid base value is provided.
        """
        match base:
            case 2:
                for k, v in enumerate(self.binaryclass_dict):
                    print(f"Class: |{k}|, |{v}|")
            case 8:
                for k, v in enumerate(self.eightclass_dict):
                    print(f"Class: |{k}|, |{v}|")
            case _:
                raise ValueError("Invalid base. Must be 2 or 8.")

    def concat_dataframes(self, df1, df2):
        """
        Concatenates columns from one data frame into the other
        and returns the new result.

        Parameters
        ----------
        df1 : pandas.DataFrame
            The first data frame.
        df2 : pandas.DataFrame
            The second data frame.

        Returns
        -------
        pandas.DataFrame
            The concatenated data frame.

        """
        # Merge the data frames on the shared 'class_id' column
        result = pd.merge(df1, df2, on="class_id")

        return result

    def binary_to_class(self, class_str: str, base: int = 8) -> list:
        """
        Convert a binary input string to a list of possible class strings based on the specified base.

        Returns a list of all possible combinations of ordinal sections of a unit circle
        divided into the specified number of equal segments, originating at 0 degrees, rotating counterclockwise,
        based on the sign of each angle in the input string.

        :param class_str: A string of length 5, where each character represents the sign
        of an angle in the range of -180-180 degrees.
        :type class_str: str
        :param base: The base class to use, 6 or 8.
        :type base: int
        :return: A list of strings of length 5, representing all possible class strings.
        :rtype: list
        :raises ValueError: If an invalid base value is provided.
        """
        match base:
            case 6:
                angle_maps = {"0": ["4", "5", "6"], "2": ["1", "2", "3"]}
            case 8:
                angle_maps = {"0": ["5", "6", "7", "8"], "2": ["1", "2", "3", "4"]}
            case _:
                raise ValueError("Invalid base value. Must be 6 or 8.")

        class_lists = [angle_maps[char] for char in class_str]
        class_combinations = itertools.product(*class_lists)
        class_strings = ["".join(combination) for combination in class_combinations]
        return class_strings

    def build_classes(self, loader) -> None:
        """
        Build the internal structures needed for the binary and octant disulfide structural classes
        based on dihedral angle rules.

        :param loader: The DisulfideLoader object containing the data.
        :type loader: DisulfideLoader
        :return: None
        :rtype: None
        """

        self.version = __version__

        tors_df = loader.getTorsions()

        if self.verbose:
            _logger.info("Creating binary SS classes...")

        grouped = self.create_binary_classes(tors_df)

        class_df = pd.read_csv(
            StringIO(SS_CLASS_DEFINITIONS),
            dtype={
                "class_id": "string",
                "FXN": "string",
                "SS_Classname": "string",
            },
        )
        # Series.str.strip() returns a new Series, so assign the result back.
        class_df["FXN"] = class_df["FXN"].str.strip()
        class_df["SS_Classname"] = class_df["SS_Classname"].str.strip()
        class_df["class_id"] = class_df["class_id"].str.strip()

        merged = self.concat_dataframes(class_df, grouped)
        merged.drop(
            columns=["Idx", "chi1_s", "chi2_s", "chi3_s", "chi4_s", "chi5_s"],
            inplace=True,
        )

        classdict = dict(zip(merged["class_id"], merged["ss_id"]))
        self.binaryclass_dict = classdict
        self.binaryclass_df = merged.copy()

        if self.verbose:
            _logger.info("Creating eightfold SS classes...")

        grouped_eightclass = self.create_classes(tors_df, 8)
        self.eightclass_df = grouped_eightclass.copy()
        self.eightclass_dict = dict(
            zip(grouped_eightclass["class_id"], grouped_eightclass["ss_id"])
        )

        if self.verbose:
            _logger.info("Initialization complete.")

        return

    def create_binary_classes(self, df) -> pd.DataFrame:
        """
        Group the DataFrame by the sign of the chi columns and create a new class ID column for each unique grouping.

        :param df: A pandas DataFrame containing columns 'ss_id', 'chi1', 'chi2', 'chi3', 'chi4', 'chi5', 'ca_distance',
        'cb_distance', 'torsion_length', and 'energy'.
        :return: A pandas DataFrame containing columns 'class_id', 'ss_id', and 'count', where 'class_id'
        is a unique identifier for each grouping of chi signs, 'ss_id' is a list of all 'ss_id' values in that
        grouping, and 'count' is the number of rows in that grouping.
        """
        # Create new columns with the sign of each chi column
        chi_columns = ["chi1", "chi2", "chi3", "chi4", "chi5"]
        sign_columns = [col + "_s" for col in chi_columns]
        df[sign_columns] = df[chi_columns].applymap(lambda x: 1 if x >= 0 else -1)

        # Create a new column with the class ID for each row
        class_id_column = "class_id"
        df[class_id_column] = (df[sign_columns] + 1).apply(
            lambda x: "".join(x.astype(str)), axis=1
        )

        # Group the DataFrame by the class ID and return the grouped data
        grouped = df.groupby(class_id_column)["ss_id"].unique().reset_index()
        grouped["count"] = grouped["ss_id"].apply(len)
        grouped["incidence"] = grouped["count"] / len(df)
        grouped["percentage"] = grouped["incidence"] * 100

        return grouped

    def create_classes(self, df, base=8) -> pd.DataFrame:
        """
        Create a new DataFrame from the input with a 8-class encoding for input 'chi' values.

        The function takes a pandas DataFrame containing the following columns:
        'ss_id', 'chi1', 'chi2', 'chi3', 'chi4', 'chi5', 'ca_distance', 'cb_distance',
        'torsion_length', 'energy', and 'rho', and adds a class ID column based on the following rules:

        1. A new column named `class_id` is added, which is the concatenation of the individual class IDs per Chi.
        2. The DataFrame is grouped by the `class_id` column, and a new DataFrame is returned that shows the unique `ss_id` values for each group,
        the count of unique `ss_id` values, the incidence of each group as a proportion of the total DataFrame, and the
        percentage of incidence.

        :param df: A pandas DataFrame containing columns 'ss_id', 'chi1', 'chi2', 'chi3', 'chi4', 'chi5',
        'ca_distance', 'cb_distance', 'torsion_length', 'energy', and 'rho'
        :return: The grouped DataFrame with the added class column.
        """

        _df = pd.DataFrame()
        if base == 6:
            for col_name in ["chi1", "chi2", "chi3", "chi4", "chi5"]:
                _df[col_name + "_t"] = df[col_name].apply(
                    DisulfideClass_Constructor.get_sixth_quadrant
                )
        elif base == 8:
            for col_name in ["chi1", "chi2", "chi3", "chi4", "chi5"]:
                _df[col_name + "_t"] = df[col_name].apply(
                    DisulfideClass_Constructor.get_eighth_quadrant
                )
        else:
            raise ValueError("Base must be either 6 or 8")

        df["class_id"] = _df[["chi1_t", "chi2_t", "chi3_t", "chi4_t", "chi5_t"]].agg(
            "".join, axis=1
        )

        grouped = df.groupby("class_id").agg({"ss_id": "unique"})
        grouped["count"] = grouped["ss_id"].str.len()
        grouped["incidence"] = grouped["count"] / len(df)
        grouped["percentage"] = grouped["incidence"] * 100
        grouped.reset_index(inplace=True)

        return grouped

    def filter_class_by_percentage(self, cutoff: float, base: int = 8) -> pd.DataFrame:
        """
        Filter the specified class definitions by percentage.

        :param cutoff: A numeric value specifying the minimum percentage required for a row to be included in the output
        :param base: An optional integer specifying the class type to filter, 2 or 8, defaults to 8
        :return: A new Pandas DataFrame containing only rows where the percentage is greater than or equal to the cutoff
        :rtype: pandas.DataFrame
        """

        match base:
            case 8:
                df = self.eightclass_df
            case 2:
                df = self.binaryclass_df
            case _:
                raise ValueError("Invalid base. Must be 2 or 8.")

        return df[df["percentage"] >= cutoff].copy()

    @staticmethod
    def get_binary_quadrant(angle_deg):
        """
        Return the binary quadrant in which an angle in degrees lies if the area is described by dividing a unit circle into 2 equal segments.

        :param angle_deg (float or array-like): The angle in degrees.

        Returns:
        :return str or array-like: The binary quadrant (0 or 2) that the angle belongs to.
        """
        angle_deg = (
            np.array(angle_deg) % 360
        )  # Normalize the angle to the range [0, 360)

        if np.isscalar(angle_deg):
            if angle_deg >= 0 and angle_deg < 180:
                return str(2)

            if angle_deg >= 180 and angle_deg < 360:
                return str(0)

            raise ValueError(
                "Invalid angle value: angle must be in the range [-360, 360)."
            )

        quadrants = np.where((angle_deg >= 0) & (angle_deg < 180), "2", "0")
        return "".join(quadrants)

    @staticmethod
    def get_sixth_quadrant(angle_deg):
        """
        Return the sixth quadrant in which an angle in degrees lies if the area is described by dividing a unit circle into 6 equal segments.

        :param angle_deg (float or array-like): The angle in degrees.

        Returns:
        :return str or array-like: The sixth quadrant (1 to 6) that the angle belongs to.
        """
        angle_deg = (
            np.array(angle_deg) % 360
        )  # Normalize the angle to the range [0, 360)

        if np.isscalar(angle_deg):
            if angle_deg >= 0 and angle_deg < 60:
                return str(6)
            elif angle_deg >= 60 and angle_deg < 120:
                return str(5)
            elif angle_deg >= 120 and angle_deg < 180:
                return str(4)
            elif angle_deg >= 180 and angle_deg < 240:
                return str(3)
            elif angle_deg >= 240 and angle_deg < 300:
                return str(2)
            elif angle_deg >= 300 and angle_deg < 360:
                return str(1)
            else:
                raise ValueError(
                    "Invalid angle value: angle must be in the range [-360, 360)."
                )
        else:
            quadrants = np.empty(angle_deg.shape, dtype=str)
            quadrants[(angle_deg >= 0) & (angle_deg < 60)] = "6"
            quadrants[(angle_deg >= 60) & (angle_deg < 120)] = "5"
            quadrants[(angle_deg >= 120) & (angle_deg < 180)] = "4"
            quadrants[(angle_deg >= 180) & (angle_deg < 240)] = "3"
            quadrants[(angle_deg >= 240) & (angle_deg < 300)] = "2"
            quadrants[(angle_deg >= 300) & (angle_deg < 360)] = "1"
            return "".join(quadrants)

    @staticmethod
    def get_eighth_quadrant(angle_deg):
        """
        Return the eighth quadrant in which an angle in degrees lies if the area is described by dividing a unit circle into 8 equal segments.

        :param angle_deg (float or array-like): The angle in degrees.

        Returns:
        :return str or array-like: The eighth quadrant (1 to 8) that the angle belongs to.
        """
        angle_deg = (
            np.array(angle_deg) % 360
        )  # Normalize the angle to the range [0, 360)

        if np.isscalar(angle_deg):
            if angle_deg >= 0 and angle_deg < 45:
                return str(8)
            elif angle_deg >= 45 and angle_deg < 90:
                return str(7)
            elif angle_deg >= 90 and angle_deg < 135:
                return str(6)
            elif angle_deg >= 135 and angle_deg < 180:
                return str(5)
            elif angle_deg >= 180 and angle_deg < 225:
                return str(4)
            elif angle_deg >= 225 and angle_deg < 270:
                return str(3)
            elif angle_deg >= 270 and angle_deg < 315:
                return str(2)
            elif angle_deg >= 315 and angle_deg < 360:
                return str(1)
            else:
                raise ValueError(
                    "Invalid angle value: angle must be in the range [-360, 360)."
                )
        else:
            quadrants = np.empty(angle_deg.shape, dtype=str)
            quadrants[(angle_deg >= 0) & (angle_deg < 45)] = "8"
            quadrants[(angle_deg >= 45) & (angle_deg < 90)] = "7"
            quadrants[(angle_deg >= 90) & (angle_deg < 135)] = "6"
            quadrants[(angle_deg >= 135) & (angle_deg < 180)] = "5"
            quadrants[(angle_deg >= 180) & (angle_deg < 225)] = "4"
            quadrants[(angle_deg >= 225) & (angle_deg < 270)] = "3"
            quadrants[(angle_deg >= 270) & (angle_deg < 315)] = "2"
            quadrants[(angle_deg >= 315) & (angle_deg < 360)] = "1"
            return "".join(quadrants)

    @staticmethod
    def class_string_from_dihedral(*args, base=8) -> str:
        """
        Return the class string for a set of dihedral angles, given the base.

        :param args: One or five dihedral angles.
        :param base: The base class to use, 2, 6, or 8. Defaults to 8.
        :return: The class string for the input dihedral angles.
        :rtype: str
        :raises ValueError: If the number of dihedral angles is not 1 or 5, or if the base is not 2, 6, or 8.
        """
        if len(args) not in [1, 5]:
            raise ValueError("You must enter either 1 or 5 dihedral angles.")

        if base not in [2, 6, 8]:
            raise ValueError("Invalid base. Must be 2, 6, or 8.")

        angles = np.array(args).flatten()

        if len(angles) == 1:
            match base:
                case 2:
                    return DisulfideClass_Constructor.get_binary_quadrant(angles[0])
                case 6:
                    return DisulfideClass_Constructor.get_sixth_quadrant(angles[0])
                case 8:
                    return DisulfideClass_Constructor.get_eighth_quadrant(angles[0])
                case _:
                    raise ValueError("Invalid base. Must be 2, 6, or 8.")

        elif len(angles) == 5:
            match base:
                case 2:
                    return DisulfideClass_Constructor.get_binary_quadrant(angles)
                case 6:
                    return DisulfideClass_Constructor.get_sixth_quadrant(angles)
                case 8:
                    return DisulfideClass_Constructor.get_eighth_quadrant(angles)
                case _:
                    raise ValueError("Invalid base. Must be 2, 6, or 8.")

    def sslist_from_classid(self, cls: str, base=8) -> pd.DataFrame:
        """
        Return the 'ss_id' value in the given DataFrame that corresponds to the
        input 'cls' string in the class description
        """
        if base == 2:
            df = self.binaryclass_df
        elif base == 8:
            df = self.eightclass_df
        else:
            raise ValueError("Invalid base. Must be 2 or 8.")

        filtered_df = df[df["class_id"] == cls]

        if len(filtered_df) == 0:
            return None

        if len(filtered_df) > 1:
            raise ValueError(f"Multiple rows found for class_id '{cls}'")

        return filtered_df.iloc[0]["ss_id"]

    def class_to_binary(self, cls_str, base=8):
        """
        Convert a class string of length 5, in which each character is the ordinal section of a unit circle
        for an angle in the range -180 to 180 degrees, into a string of 5 characters, where each character is '0'
        if the corresponding input character represents a negative angle or '2' if it represents a positive angle.

        :param cls_str (str): A string of length 5 representing the ordinal section of a unit circle for an angle in range -180-180 degrees.
        :param base (int): The base of the ordinal section (6 or 8).
        :return str: A string of length 5, where each character is either '0' or '2', representing the sign of the corresponding input angle.
        """
        if base not in [6, 8]:
            raise ValueError("Base must be either 6 or 8")

        output_str = ""
        for char in cls_str:
            if base == 6:
                if char in ["1", "2", "3"]:
                    output_str += "2"
                elif char in ["4", "5", "6"]:
                    output_str += "0"
            elif base == 8:
                if char in ["1", "2", "3", "4"]:
                    output_str += "2"
                elif char in ["5", "6", "7", "8"]:
                    output_str += "0"
        return output_str

    def get_class_df(self, base=8):
        """
        Get the Disulfide structural classes DataFrame.

        :param base: The base class to use, either 2 or 8. Defaults to 8.
        :type base: int
        :return: A DataFrame containing the class_id, count, incidence, and percentage columns.
        :rtype: pandas.DataFrame
        :raises ValueError: If the base is not 2 or 8.
        """
        columns = ["class_id", "count", "incidence", "percentage"]
        match base:
            case 2:
                class_df = self.binaryclass_df
            case 8:
                class_df = self.eightclass_df
            case _:
                raise ValueError("Invalid base. Must be 2 or 8.")

        result_df = class_df[columns]
        return result_df


# class definition ends

# end of file
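The grouping step performed by `create_binary_classes` can be illustrated without pandas. The sketch below is an assumption-laden stand-in, not the module's implementation: `group_by_binary_class` is a hypothetical helper, the input is assumed to be a list of `(ss_id, chi_angles)` tuples, and the identifiers in `records` are made up. It follows the same arithmetic as the source: `count` is the number of unique `ss_id` values in a class, `incidence` is `count` divided by the total number of rows, and `percentage` is `incidence * 100`.

```python
from collections import defaultdict

# Illustrative sketch only: group_by_binary_class is a hypothetical helper and
# the ss_id strings below are invented placeholders.


def group_by_binary_class(records):
    """Group (ss_id, chis) records by binary class and summarize each class."""
    groups = defaultdict(list)
    total = 0
    for ss_id, chis in records:
        # '2' for a non-negative angle, '0' for a negative one.
        class_id = "".join("2" if chi >= 0 else "0" for chi in chis)
        if ss_id not in groups[class_id]:  # keep unique identifiers only
            groups[class_id].append(ss_id)
        total += 1

    summary = {}
    for class_id, ids in sorted(groups.items()):
        incidence = len(ids) / total
        summary[class_id] = {
            "ss_id": ids,
            "count": len(ids),
            "incidence": incidence,
            "percentage": incidence * 100,
        }
    return summary


records = [
    ("ss_a", (-60, -60, -90, -60, -60)),
    ("ss_b", (-70, -50, -85, -65, -55)),
    ("ss_c", (-60, -60, 90, -60, -60)),
    ("ss_d", (60, 60, 90, 60, 60)),
]

for class_id, row in group_by_binary_class(records).items():
    print(class_id, row["count"], row["percentage"])
```

With these four made-up records, class `00000` holds two disulfides and therefore accounts for 50 percent of the rows, while `00200` and `22222` hold one each.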
357 """ 358 # Create new columns with the sign of each chi column 359 chi_columns = ["chi1", "chi2", "chi3", "chi4", "chi5"] 360 sign_columns = [col + "_s" for col in chi_columns] 361 df[sign_columns] = df[chi_columns].applymap(lambda x: 1 if x >= 0 else -1) 362 363 # Create a new column with the class ID for each row 364 class_id_column = "class_id" 365 df[class_id_column] = (df[sign_columns] + 1).apply( 366 lambda x: "".join(x.astype(str)), axis=1 367 ) 368 369 # Group the DataFrame by the class ID and return the grouped data 370 grouped = df.groupby(class_id_column)["ss_id"].unique().reset_index() 371 grouped["count"] = grouped["ss_id"].apply(len) 372 grouped["incidence"] = grouped["count"] / len(df) 373 grouped["percentage"] = grouped["incidence"] * 100 374 375 return grouped 376 377 def create_classes(self, df, base=8) -> pd.DataFrame: 378 """ 379 Create a new DataFrame from the input with a 8-class encoding for input 'chi' values. 380 381 The function takes a pandas DataFrame containing the following columns: 382 'ss_id', 'chi1', 'chi2', 'chi3', 'chi4', 'chi5', 'ca_distance', 'cb_distance', 383 'torsion_length', 'energy', and 'rho', and adds a class ID column based on the following rules: 384 385 1. A new column named `class_id` is added, which is the concatenation of the individual class IDs per Chi. 386 2. The DataFrame is grouped by the `class_id` column, and a new DataFrame is returned that shows the unique `ss_id` values for each group, 387 the count of unique `ss_id` values, the incidence of each group as a proportion of the total DataFrame, and the 388 percentage of incidence. 389 390 :param df: A pandas DataFrame containing columns 'ss_id', 'chi1', 'chi2', 'chi3', 'chi4', 'chi5', 391 'ca_distance', 'cb_distance', 'torsion_length', 'energy', and 'rho' 392 :return: The grouped DataFrame with the added class column. 
393 """ 394 395 _df = pd.DataFrame() 396 if base == 6: 397 for col_name in ["chi1", "chi2", "chi3", "chi4", "chi5"]: 398 _df[col_name + "_t"] = df[col_name].apply( 399 DisulfideClass_Constructor.get_sixth_quadrant 400 ) 401 elif base == 8: 402 for col_name in ["chi1", "chi2", "chi3", "chi4", "chi5"]: 403 _df[col_name + "_t"] = df[col_name].apply( 404 DisulfideClass_Constructor.get_eighth_quadrant 405 ) 406 else: 407 raise ValueError("Base must be either 6 or 8") 408 409 df["class_id"] = _df[["chi1_t", "chi2_t", "chi3_t", "chi4_t", "chi5_t"]].agg( 410 "".join, axis=1 411 ) 412 413 grouped = df.groupby("class_id").agg({"ss_id": "unique"}) 414 grouped["count"] = grouped["ss_id"].str.len() 415 grouped["incidence"] = grouped["count"] / len(df) 416 grouped["percentage"] = grouped["incidence"] * 100 417 grouped.reset_index(inplace=True) 418 419 return grouped 420 421 def filter_class_by_percentage(self, cutoff: float, base: int = 8) -> pd.DataFrame: 422 """ 423 Filter the specified class definitions by percentage. 424 425 :param cutoff: A numeric value specifying the minimum percentage required for a row to be included in the output 426 :param base: An optional integer specifying the class type to filter, defaults to 8 427 :return: A new Pandas DataFrame containing only rows where the percentage is greater than or equal to the cutoff 428 :rtype: pandas.DataFrame 429 """ 430 431 match base: 432 case 8: 433 df = self.eightclass_df 434 case 2: 435 df = self.binaryclass_df 436 case _: 437 raise ValueError("Invalid base. Must be 6 or 8.") 438 439 return df[df["percentage"] >= cutoff].copy() 440 441 @staticmethod 442 def get_binary_quadrant(angle_deg): 443 """ 444 Return the binary quadrant in which an angle in degrees lies if the area is described by dividing a unit circle into 2 equal segments. 445 446 :param angle_deg (float or array-like): The angle in degrees. 447 448 Returns: 449 :return str or array-like: The binary quadrant (0 or 2) that the angle belongs to. 
450 """ 451 angle_deg = ( 452 np.array(angle_deg) % 360 453 ) # Normalize the angle to the range [0, 360) 454 455 if np.isscalar(angle_deg): 456 if angle_deg >= 0 and angle_deg < 180: 457 return str(2) 458 459 if angle_deg >= 180 and angle_deg < 360: 460 return str(0) 461 462 raise ValueError( 463 "Invalid angle value: angle must be in the range [-360, 360)." 464 ) 465 466 quadrants = np.where((angle_deg >= 0) & (angle_deg < 180), "2", "0") 467 return "".join(quadrants) 468 469 @staticmethod 470 def get_sixth_quadrant(angle_deg): 471 """ 472 Return the sixth quadrant in which an angle in degrees lies if the area is described by dividing a unit circle into 6 equal segments. 473 474 :param angle_deg (float or array-like): The angle in degrees. 475 476 Returns: 477 :return str or array-like: The sixth quadrant (1 to 6) that the angle belongs to. 478 """ 479 angle_deg = ( 480 np.array(angle_deg) % 360 481 ) # Normalize the angle to the range [0, 360) 482 483 if np.isscalar(angle_deg): 484 if angle_deg >= 0 and angle_deg < 60: 485 return str(6) 486 elif angle_deg >= 60 and angle_deg < 120: 487 return str(5) 488 elif angle_deg >= 120 and angle_deg < 180: 489 return str(4) 490 elif angle_deg >= 180 and angle_deg < 240: 491 return str(3) 492 elif angle_deg >= 240 and angle_deg < 300: 493 return str(2) 494 elif angle_deg >= 300 and angle_deg < 360: 495 return str(1) 496 else: 497 raise ValueError( 498 "Invalid angle value: angle must be in the range [-360, 360)." 
499 ) 500 else: 501 quadrants = np.empty(angle_deg.shape, dtype=str) 502 quadrants[(angle_deg >= 0) & (angle_deg < 60)] = "6" 503 quadrants[(angle_deg >= 60) & (angle_deg < 120)] = "5" 504 quadrants[(angle_deg >= 120) & (angle_deg < 180)] = "4" 505 quadrants[(angle_deg >= 180) & (angle_deg < 240)] = "3" 506 quadrants[(angle_deg >= 240) & (angle_deg < 300)] = "2" 507 quadrants[(angle_deg >= 300) & (angle_deg < 360)] = "1" 508 return "".join(quadrants) 509 510 @staticmethod 511 def get_eighth_quadrant(angle_deg): 512 """ 513 Return the eighth quadrant in which an angle in degrees lies if the area is described by dividing a unit circle into 8 equal segments. 514 515 :param angle_deg (float or array-like): The angle in degrees. 516 517 Returns: 518 :return str or array-like: The eighth quadrant (1 to 8) that the angle belongs to. 519 """ 520 angle_deg = ( 521 np.array(angle_deg) % 360 522 ) # Normalize the angle to the range [0, 360) 523 524 if np.isscalar(angle_deg): 525 if angle_deg >= 0 and angle_deg < 45: 526 return str(8) 527 elif angle_deg >= 45 and angle_deg < 90: 528 return str(7) 529 elif angle_deg >= 90 and angle_deg < 135: 530 return str(6) 531 elif angle_deg >= 135 and angle_deg < 180: 532 return str(5) 533 elif angle_deg >= 180 and angle_deg < 225: 534 return str(4) 535 elif angle_deg >= 225 and angle_deg < 270: 536 return str(3) 537 elif angle_deg >= 270 and angle_deg < 315: 538 return str(2) 539 elif angle_deg >= 315 and angle_deg < 360: 540 return str(1) 541 else: 542 raise ValueError( 543 "Invalid angle value: angle must be in the range [-360, 360)." 
544 ) 545 else: 546 quadrants = np.empty(angle_deg.shape, dtype=str) 547 quadrants[(angle_deg >= 0) & (angle_deg < 45)] = "8" 548 quadrants[(angle_deg >= 45) & (angle_deg < 90)] = "7" 549 quadrants[(angle_deg >= 90) & (angle_deg < 135)] = "6" 550 quadrants[(angle_deg >= 135) & (angle_deg < 180)] = "5" 551 quadrants[(angle_deg >= 180) & (angle_deg < 225)] = "4" 552 quadrants[(angle_deg >= 225) & (angle_deg < 270)] = "3" 553 quadrants[(angle_deg >= 270) & (angle_deg < 315)] = "2" 554 quadrants[(angle_deg >= 315) & (angle_deg < 360)] = "1" 555 return "".join(quadrants) 556 557 @staticmethod 558 def class_string_from_dihedral(*args, base=8) -> str: 559 """ 560 Return the class string for a set of dihedral angles, given the base. 561 562 :param args: One or five dihedral angles. 563 :param base: The base class to use, 2, 6, or 8. Defaults to 8. 564 :return: The class string for the input dihedral angles. 565 :rtype: str 566 :raises ValueError: If the number of dihedral angles is not 1 or 5, or if the base is not 2, 6, or 8. 567 """ 568 if len(args) not in [1, 5]: 569 raise ValueError("You must enter either 1 or 5 dihedral angles.") 570 571 if base not in [2, 6, 8]: 572 raise ValueError("Invalid base. Must be 2, 6, or 8.") 573 574 angles = np.array(args).flatten() 575 576 if len(angles) == 1: 577 match base: 578 case 2: 579 return DisulfideClass_Constructor.get_binary_quadrant(angles[0]) 580 case 6: 581 return DisulfideClass_Constructor.get_sixth_quadrant(angles[0]) 582 case 8: 583 return DisulfideClass_Constructor.get_eighth_quadrant(angles[0]) 584 case _: 585 raise ValueError("Invalid base. Must be 2, 6, or 8.") 586 587 elif len(angles) == 5: 588 match base: 589 case 2: 590 return DisulfideClass_Constructor.get_binary_quadrant(angles) 591 case 6: 592 return DisulfideClass_Constructor.get_sixth_quadrant(angles) 593 case 8: 594 return DisulfideClass_Constructor.get_eighth_quadrant(angles) 595 case _: 596 raise ValueError("Invalid base. 
Must be 2, 6, or 8.") 597 598 def sslist_from_classid(self, cls: str, base=8) -> pd.DataFrame: 599 """ 600 Return the 'ss_id' value in the given DataFrame that corresponds to the 601 input 'cls' string in the class description 602 """ 603 if base == 2: 604 df = self.binaryclass_df 605 elif base == 8: 606 df = self.eightclass_df 607 else: 608 raise ValueError("Invalid base. Must be 2 or 8.") 609 610 filtered_df = df[df["class_id"] == cls] 611 612 if len(filtered_df) == 0: 613 return None 614 615 if len(filtered_df) > 1: 616 raise ValueError(f"Multiple rows found for class_id '{cls}'") 617 618 return filtered_df.iloc[0]["ss_id"] 619 620 def class_to_binary(self, cls_str, base=8): 621 """ 622 Return a string of length 5 representing the ordinal section of a unit circle for an angle in range -180-180 degrees 623 into a string of 5 characters, where each character is either '0' if the corresponding input character represents a 624 negative angle or '2' if it represents a positive angle. 625 626 :param cls_str (str): A string of length 5 representing the ordinal section of a unit circle for an angle in range -180-180 degrees. 627 :param base (int): The base of the ordinal section (6 or 8). 628 :return str: A string of length 5, where each character is either '0' or '2', representing the sign of the corresponding input angle. 629 """ 630 if base not in [6, 8]: 631 raise ValueError("Base must be either 6 or 8") 632 633 output_str = "" 634 for char in cls_str: 635 if base == 6: 636 if char in ["1", "2", "3"]: 637 output_str += "2" 638 elif char in ["4", "5", "6"]: 639 output_str += "0" 640 elif base == 8: 641 if char in ["1", "2", "3", "4"]: 642 output_str += "2" 643 elif char in ["5", "6", "7", "8"]: 644 output_str += "0" 645 return output_str 646 647 def get_class_df(self, base=8): 648 """ 649 Get the Disulfide structural classes DataFrame. 650 651 :param base: The base class to use, either 2 or 8. Defaults to 8. 
652 :type base: int 653 :return: A DataFrame containing the class_id, count, incidence, and percentage columns. 654 :rtype: pandas.DataFrame 655 :raises ValueError: If the base is not 2 or 8. 656 """ 657 columns = ["class_id", "count", "incidence", "percentage"] 658 match base: 659 case 2: 660 class_df = self.binaryclass_df 661 case 8: 662 class_df = self.eightclass_df 663 case _: 664 raise ValueError("Invalid base. Must be 2, or 8.") 665 666 result_df = class_df[columns] 667 return result_df
This class manages the structural classes for the disulfide bonds contained in the proteusPy disulfide database. It builds the internal dictionaries that map class IDs to the disulfides in each class.
Disulfide binary classes are defined using the ± formalism described by Schmidt et al. (Biochem, 2006, 45, 7429-7433) across all 32 (2^5) possible binary sidechain torsional combinations. Classes are named per Schmidt's convention. The class_id encodes the sign of each of the five dihedral angles, $\chi_{1}$ through $\chi_{1'}$, where 0 represents a negative dihedral angle and 2 a positive angle.
class_id | SS_Classname | FXN | count | incidence | percentage |
---|---|---|---|---|---|
00000 | -LHSpiral | UNK | 40943 | 0.23359 | 23.359 |
00002 | 00002 | UNK | 9391 | 0.0535781 | 5.35781 |
00020 | -LHHook | UNK | 4844 | 0.0276363 | 2.76363 |
00022 | 00022 | UNK | 2426 | 0.0138409 | 1.38409 |
00200 | -RHStaple | Allosteric | 16146 | 0.092117 | 9.2117 |
00202 | 00202 | UNK | 1396 | 0.00796454 | 0.796454 |
00220 | 00220 | UNK | 7238 | 0.0412946 | 4.12946 |
00222 | 00222 | UNK | 6658 | 0.0379856 | 3.79856 |
02000 | 02000 | UNK | 7104 | 0.0405301 | 4.05301 |
02002 | 02002 | UNK | 8044 | 0.0458931 | 4.58931 |
02020 | -LHStaple | UNK | 3154 | 0.0179944 | 1.79944 |
02022 | 02022 | UNK | 1146 | 0.00653822 | 0.653822 |
02200 | -RHHook | UNK | 7115 | 0.0405929 | 4.05929 |
02202 | 02202 | UNK | 1021 | 0.00582507 | 0.582507 |
02220 | -RHSpiral | UNK | 8989 | 0.0512845 | 5.12845 |
02222 | 02222 | UNK | 7641 | 0.0435939 | 4.35939 |
20000 | ±LHSpiral | UNK | 5007 | 0.0285662 | 2.85662 |
20002 | +LHSpiral | UNK | 1611 | 0.00919117 | 0.919117 |
20020 | ±LHHook | UNK | 1258 | 0.00717721 | 0.717721 |
20022 | +LHHook | UNK | 823 | 0.00469542 | 0.469542 |
20200 | ±RHStaple | UNK | 745 | 0.00425042 | 0.425042 |
20202 | +RHStaple | UNK | 538 | 0.00306943 | 0.306943 |
20220 | ±RHHook | Catalytic | 1907 | 0.0108799 | 1.08799 |
20222 | 20222 | UNK | 1159 | 0.00661239 | 0.661239 |
22000 | -/+LHHook | UNK | 3652 | 0.0208356 | 2.08356 |
22002 | 22002 | UNK | 2052 | 0.0117072 | 1.17072 |
22020 | ±LHStaple | UNK | 1791 | 0.0102181 | 1.02181 |
22022 | +LHStaple | UNK | 579 | 0.00330334 | 0.330334 |
22200 | -/+RHHook | UNK | 8169 | 0.0466062 | 4.66062 |
22202 | +RHHook | UNK | 895 | 0.0051062 | 0.51062 |
22220 | ±RHSpiral | UNK | 3581 | 0.0204305 | 2.04305 |
22222 | +RHSpiral | UNK | 8254 | 0.0470912 | 4.70912 |
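To make the class_id encoding in the table concrete, here is a minimal standalone sketch. The helper `binary_class_id` and the two-entry `NAMES` lookup are hypothetical names introduced only for illustration (the names themselves are copied from the table above); the sketch assumes angles are given in degrees in the range -180 to 180 and follows the sign rule used in `create_binary_classes` (non-negative maps to 2).

```python
# Hypothetical helper, not part of proteusPy: encode five dihedral angles
# (degrees, assumed in [-180, 180]) as a binary class ID.
NAMES = {"00000": "-LHSpiral", "00200": "-RHStaple"}  # two rows from the table


def binary_class_id(chis):
    """Return the five-character class ID for five dihedral angles."""
    if len(chis) != 5:
        raise ValueError("Expected exactly five dihedral angles.")
    # '2' for a non-negative angle, '0' for a negative one.
    return "".join("2" if chi >= 0 else "0" for chi in chis)


all_negative = binary_class_id([-60.0, -60.0, -90.0, -60.0, -60.0])
positive_chi3 = binary_class_id([-60.0, -60.0, 90.0, -60.0, -60.0])
print(all_negative, NAMES[all_negative])
print(positive_chi3, NAMES[positive_chi3])
```

Flipping the sign of a single angle changes exactly one character of the ID, which is why the five angles yield 32 classes.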
    def __init__(self, loader, verbose=True) -> None:
        self.verbose = verbose
        self.binaryclass_dict = {}
        self.binaryclass_df = None
        self.eightclass_df = None
        self.eightclass_dict = {}
        self.consensus_binary_list = None
        self.consensus_oct_list = None

        if self.verbose:
            _logger.info(
                "Loading binary consensus structure list from %s", SS_CONSENSUS_BIN_FILE
            )
        self.consensus_binary_list = self.load_consensus_file(oct=False)

        if self.verbose:
            _logger.info(
                "Loading octant consensus structure list from %s", SS_CONSENSUS_OCT_FILE
            )
        self.consensus_oct_list = self.load_consensus_file(oct=True)

        self.build_classes(loader)
    def load_consensus_file(self, fpath=Path(DATA_DIR), oct=True) -> DisulfideList:
        """Load the consensus file from the specified file."""

        res = None
        if oct:
            fname = fpath / SS_CONSENSUS_OCT_FILE
        else:
            fname = fpath / SS_CONSENSUS_BIN_FILE

        if not fname.exists():
            _logger.error("Cannot find file %s", fname)
            raise FileNotFoundError(f"Cannot find file {fname}")

        with open(fname, "rb") as f:
            res = pickle.load(f)
        return res
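Load the pickled consensus structure list from the directory fpath: the octant file if oct is True, otherwise the binary file. Raises FileNotFoundError if the file does not exist.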
    def build_class_df(self, class_df, group_df):
        """Build a new DataFrame from the input DataFrames."""
        ss_id_col = group_df["ss_id"]
        result_df = pd.concat([class_df, ss_id_col], axis=1)
        return result_df
Build a new DataFrame by appending the ss_id column of group_df to class_df column-wise.
    def class_to_sslist(self, clsid: str, base=8) -> np.ndarray:
        """
        Return the list of disulfides corresponding to the input `clsid`.
        This list is a list of disulfide identifiers, not the Disulfide objects themselves.

        :param clsid: The class name to extract. Must be a string
        in the format '11111' or '11111b' or '11111o'. The suffix 'b' or 'o' indicates
        binary or octant classes, respectively.
        :type clsid: str
        :param base: The base class to use, 2 or 8. Default is 8. Only used when
        `clsid` has no suffix.
        :type base: int
        :return: The disulfide identifiers in the class, not the Disulfide objects
        themselves. An empty array is returned, and the error logged, if the class ID
        or base is invalid or the class is not found.
        :rtype: np.ndarray
        """
        # Validate the type before slicing so non-string input is rejected cleanly.
        if not isinstance(clsid, str):
            _logger.error("Invalid class ID: %s", clsid)
            return np.array([])

        cls = clsid[:5]

        match len(clsid):
            case 6:
                match clsid[-1]:
                    case "b":
                        eightorbin = self.binaryclass_dict
                    case "o":
                        eightorbin = self.eightclass_dict
                    case _:
                        _logger.error("Invalid class ID suffix: %s", clsid)
                        return np.array([])

            case 5:
                match base:
                    case 8:
                        eightorbin = self.eightclass_dict
                    case 2:
                        eightorbin = self.binaryclass_dict
                    case _:
                        _logger.error("Invalid base: %d", base)
                        return np.array([])
            case _:
                _logger.error("Invalid class ID length: %s", clsid)
                return np.array([])

        try:
            ss_ids = eightorbin[cls]

        except KeyError:
            _logger.error("Cannot find key %s in SSBond DB", clsid)
            return np.array([])

        return ss_ids
Return the list of disulfides corresponding to the input clsid. This is a list of disulfide identifiers, not the Disulfide objects themselves.
Parameters
- clsid: The class name to extract. Must be a string in the format '11111' or '11111b' or '11111o'. The suffix 'b' or 'o' indicates binary or octant classes, respectively.
- base: The base class to use, 2 or 8. Default is 8. Only consulted when clsid has no suffix.
Returns
The list of disulfide identifiers in the class, not the Disulfide objects themselves. If the class ID has an invalid length or suffix, the base is invalid, or the class is not found, the error is logged and an empty array is returned.
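The suffix handling described above can be sketched in isolation. The function `resolve_class_key` below is a hypothetical helper, not part of the proteusPy API; it only mirrors the length and suffix rules visible in the `class_to_sslist` listing and returns `None` where the method would log an error and return an empty array.

```python
# Hypothetical sketch of the class ID parsing rules; not a proteusPy function.
def resolve_class_key(clsid, base=8):
    """Return (five_character_key, base) for a class ID, or None if invalid."""
    if not isinstance(clsid, str):
        return None

    if len(clsid) == 6:
        # An explicit suffix overrides the `base` argument.
        suffix_to_base = {"b": 2, "o": 8}
        suffix = clsid[-1]
        if suffix not in suffix_to_base:
            return None
        return clsid[:5], suffix_to_base[suffix]

    if len(clsid) == 5:
        # No suffix: fall back to the `base` argument.
        if base not in (2, 8):
            return None
        return clsid, base

    return None


print(resolve_class_key("00200b"))
print(resolve_class_key("11111"))
print(resolve_class_key("11111x"))
```

Returning the resolved base alongside the key keeps the choice of dictionary (binary or octant) a separate step, which makes the parsing rules easy to test on their own.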
    def list_classes(self, base=2):
        """
        List the Disulfide structural classes by printing each class's index and class ID.

        :param base: The base class to use, 2 or 8.
        :type base: int
        :return: None
        :rtype: None
        :raises ValueError: If an invalid base value is provided.
        """
        match base:
            case 2:
                # Iterating a dict yields its keys, so each item is a class ID.
                for idx, class_id in enumerate(self.binaryclass_dict):
                    print(f"Class: |{idx}|, |{class_id}|")
            case 8:
                for idx, class_id in enumerate(self.eightclass_dict):
                    print(f"Class: |{idx}|, |{class_id}|")
            case _:
                raise ValueError("Invalid base. Must be 2 or 8.")
List the Disulfide structural classes by printing the index and class ID of each class.
Parameters
- base: The base class to use, 2 or 8. Defaults to 2.
Returns
None
Raises
- ValueError: If an invalid base value is provided.
    def concat_dataframes(self, df1, df2):
        """
        Merge two data frames on their shared 'class_id' column
        and return the new result.

        Parameters
        ----------
        df1 : pandas.DataFrame
            The first data frame.
        df2 : pandas.DataFrame
            The second data frame.

        Returns
        -------
        pandas.DataFrame
            The merged data frame.

        """
        # Merge the data frames based on the 'class_id' column
        result = pd.merge(df1, df2, on="class_id")

        return result
Merge two data frames on their shared class_id column and return the new result.
Parameters
- df1 (pandas.DataFrame): The first data frame.
- df2 (pandas.DataFrame): The second data frame.
Returns
pandas.DataFrame: The merged data frame.
    def binary_to_class(self, class_str: str, base: int = 8) -> list:
        """
        Convert a binary input string to a list of possible class strings based on the specified base.

        Returns a list of all possible combinations of ordinal sections of a unit circle
        divided into the specified number of equal segments, originating at 0 degrees, rotating counterclockwise,
        based on the sign of each angle in the input string.

        :param class_str: A string of length 5, where each character represents the sign
        of an angle in the range of -180-180 degrees.
        :type class_str: str
        :param base: The base class to use, 6 or 8.
        :type base: int
        :return: A list of strings of length 5, representing all possible class strings.
        :rtype: list
        :raises ValueError: If an invalid base value is provided.
        """
        match base:
            case 6:
                angle_maps = {"0": ["4", "5", "6"], "2": ["1", "2", "3"]}
            case 8:
                angle_maps = {"0": ["5", "6", "7", "8"], "2": ["1", "2", "3", "4"]}
            case _:
                raise ValueError("Invalid base value. Must be 6 or 8.")

        class_lists = [angle_maps[char] for char in class_str]
        class_combinations = itertools.product(*class_lists)
        class_strings = ["".join(combination) for combination in class_combinations]
        return class_strings
Convert a binary input string to a list of possible class strings based on the specified base.
Returns a list of all possible combinations of ordinal sections of a unit circle divided into the specified number of equal segments, originating at 0 degrees, rotating counterclockwise, based on the sign of each angle in the input string.
Parameters
- class_str: A string of length 5, where each character represents the sign of an angle in the range of -180-180 degrees.
- base: The base class to use, 6 or 8.
Returns
A list of strings of length 5, representing all possible class strings.
Raises
- ValueError: If an invalid base value is provided.
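As an illustration of how one binary class expands into its sixfold or eightfold candidates, here is a standalone sketch. The function name `expand_binary_class` and the `ANGLE_MAPS` table are hypothetical; the digit maps are copied from the `binary_to_class` listing, and the expansion uses `itertools.product` in the same way.

```python
import itertools

# Digit maps copied from the binary_to_class listing, keyed by base.
ANGLE_MAPS = {
    6: {"0": ["4", "5", "6"], "2": ["1", "2", "3"]},
    8: {"0": ["5", "6", "7", "8"], "2": ["1", "2", "3", "4"]},
}


def expand_binary_class(class_str, base=8):
    """Return every base-6 or base-8 class string compatible with a binary class."""
    if base not in ANGLE_MAPS:
        raise ValueError("Invalid base value. Must be 6 or 8.")
    maps = ANGLE_MAPS[base]
    choices = [maps[char] for char in class_str]
    # Cartesian product: one candidate segment per position.
    return ["".join(combo) for combo in itertools.product(*choices)]


octant_candidates = expand_binary_class("00000", base=8)
sextant_candidates = expand_binary_class("00200", base=6)
print(len(octant_candidates), octant_candidates[0], octant_candidates[-1])
print(len(sextant_candidates), sextant_candidates[0])
```

Each of the five positions contributes base/2 choices, so a binary class expands to 4^5 = 1024 candidates for base 8 and 3^5 = 243 for base 6.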
    def build_classes(self, loader) -> None:
        """
        Build the internal structures needed for the binary and octant disulfide structural classes
        based on dihedral angle rules.

        :param loader: The DisulfideLoader object containing the data.
        :type loader: DisulfideLoader
        :return: None
        :rtype: None
        """

        self.version = __version__

        tors_df = loader.getTorsions()

        if self.verbose:
            _logger.info("Creating binary SS classes...")

        grouped = self.create_binary_classes(tors_df)

        class_df = pd.read_csv(
            StringIO(SS_CLASS_DEFINITIONS),
            dtype={
                "class_id": "string",
                "FXN": "string",
                "SS_Classname": "string",
            },
        )
        # str.strip() returns a new Series, so assign the result back.
        class_df["FXN"] = class_df["FXN"].str.strip()
        class_df["SS_Classname"] = class_df["SS_Classname"].str.strip()
        class_df["class_id"] = class_df["class_id"].str.strip()

        merged = self.concat_dataframes(class_df, grouped)
        merged.drop(
            columns=["Idx", "chi1_s", "chi2_s", "chi3_s", "chi4_s", "chi5_s"],
            inplace=True,
        )

        classdict = dict(zip(merged["class_id"], merged["ss_id"]))
        self.binaryclass_dict = classdict
        self.binaryclass_df = merged.copy()

        if self.verbose:
            _logger.info("Creating eightfold SS classes...")

        grouped_eightclass = self.create_classes(tors_df, 8)
        self.eightclass_df = grouped_eightclass.copy()
        self.eightclass_dict = dict(
            zip(grouped_eightclass["class_id"], grouped_eightclass["ss_id"])
        )

        if self.verbose:
            _logger.info("Initialization complete.")
Build the internal structures needed for the binary and octant disulfide structural classes based on dihedral angle rules.
Parameters
- loader: The DisulfideLoader object containing the data.
Returns
None
    def create_binary_classes(self, df) -> pd.DataFrame:
        """
        Group the DataFrame by the sign of the chi columns and create a new class ID column for each unique grouping.

        :param df: A pandas DataFrame containing columns 'ss_id', 'chi1', 'chi2', 'chi3', 'chi4', 'chi5', 'ca_distance',
        'cb_distance', 'torsion_length', and 'energy'.
        :return: A pandas DataFrame containing columns 'class_id', 'ss_id', 'count', 'incidence', and 'percentage',
        where 'class_id' is a unique identifier for each grouping of chi signs, 'ss_id' holds the unique 'ss_id'
        values in that grouping, 'count' is the number of those values, 'incidence' is 'count' divided by the
        number of input rows, and 'percentage' is 'incidence' times 100.
        """
        # Create new columns with the sign of each chi column (1 if >= 0, else -1).
        # np.where avoids DataFrame.applymap, which is deprecated in recent pandas.
        chi_columns = ["chi1", "chi2", "chi3", "chi4", "chi5"]
        sign_columns = [col + "_s" for col in chi_columns]
        df[sign_columns] = np.where(df[chi_columns] >= 0, 1, -1)

        # Create a new column with the class ID for each row
        class_id_column = "class_id"
        df[class_id_column] = (df[sign_columns] + 1).apply(
            lambda x: "".join(x.astype(str)), axis=1
        )

        # Group the DataFrame by the class ID and return the grouped data
        grouped = df.groupby(class_id_column)["ss_id"].unique().reset_index()
        grouped["count"] = grouped["ss_id"].apply(len)
        grouped["incidence"] = grouped["count"] / len(df)
        grouped["percentage"] = grouped["incidence"] * 100

        return grouped
Group the DataFrame by the sign of the chi columns and create a new class ID column for each unique grouping.
Parameters
- df: A pandas DataFrame containing columns 'ss_id', 'chi1', 'chi2', 'chi3', 'chi4', 'chi5', 'ca_distance', 'cb_distance', 'torsion_length', and 'energy'.
Returns
A pandas DataFrame containing columns 'class_id', 'ss_id', 'count', 'incidence', and 'percentage', where 'class_id' is a unique identifier for each grouping of chi signs, 'ss_id' holds the unique 'ss_id' values in that grouping, 'count' is the number of those values, 'incidence' is 'count' divided by the number of input rows, and 'percentage' is 'incidence' times 100.
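The grouping and the derived statistics can be sketched without pandas. The version below is a plain-Python analogue rather than the method itself: `group_by_sign` is a hypothetical helper, the records are made-up toy data with placeholder identifiers, and a dictionary stands in for the grouped DataFrame.

```python
from collections import defaultdict


def group_by_sign(records):
    """Group (ss_id, chis) records by binary class ID and summarize each group.

    Plain-Python stand-in for the pandas groupby; hypothetical helper.
    """
    groups = defaultdict(list)
    for ss_id, chis in records:
        class_id = "".join("2" if chi >= 0 else "0" for chi in chis)
        if ss_id not in groups[class_id]:  # keep unique identifiers only
            groups[class_id].append(ss_id)

    total = len(records)
    summary = {}
    for class_id, ids in groups.items():
        incidence = len(ids) / total
        summary[class_id] = {
            "ss_id": ids,
            "count": len(ids),
            "incidence": incidence,
            "percentage": incidence * 100,
        }
    return summary


# Toy records with placeholder identifiers, for illustration only.
toy_records = [
    ("toy_1", [-60, -60, -90, -60, -60]),
    ("toy_2", [-70, -80, -85, -65, -55]),
    ("toy_3", [-60, -60, 90, -60, -60]),
    ("toy_4", [-50, -75, -95, -60, -60]),
]
summary = group_by_sign(toy_records)
print(summary["00000"]["count"], summary["00000"]["percentage"])
print(summary["00200"]["count"], summary["00200"]["percentage"])
```

As in the method, incidence is computed against the total number of input rows, so the percentages across all classes sum to 100 when every identifier is unique.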
    def create_classes(self, df, base=8) -> pd.DataFrame:
        """
        Create a new DataFrame from the input with a 6- or 8-class encoding for input 'chi' values.

        The function takes a pandas DataFrame containing the following columns:
        'ss_id', 'chi1', 'chi2', 'chi3', 'chi4', 'chi5', 'ca_distance', 'cb_distance',
        'torsion_length', 'energy', and 'rho', and adds a class ID column based on the following rules:

        1. A new column named `class_id` is added, which is the concatenation of the individual class IDs per Chi.
        2. The DataFrame is grouped by the `class_id` column, and a new DataFrame is returned that shows the unique `ss_id` values for each group,
        the count of unique `ss_id` values, the incidence of each group as a proportion of the total DataFrame, and the
        percentage of incidence.

        :param df: A pandas DataFrame containing columns 'ss_id', 'chi1', 'chi2', 'chi3', 'chi4', 'chi5',
        'ca_distance', 'cb_distance', 'torsion_length', 'energy', and 'rho'
        :param base: The number of segments per dihedral angle, 6 or 8. Defaults to 8.
        :return: The grouped DataFrame with the added class column.
        :raises ValueError: If the base is not 6 or 8.
        """

        _df = pd.DataFrame()
        if base == 6:
            for col_name in ["chi1", "chi2", "chi3", "chi4", "chi5"]:
                _df[col_name + "_t"] = df[col_name].apply(
                    DisulfideClass_Constructor.get_sixth_quadrant
                )
        elif base == 8:
            for col_name in ["chi1", "chi2", "chi3", "chi4", "chi5"]:
                _df[col_name + "_t"] = df[col_name].apply(
                    DisulfideClass_Constructor.get_eighth_quadrant
                )
        else:
            raise ValueError("Base must be either 6 or 8")

        df["class_id"] = _df[["chi1_t", "chi2_t", "chi3_t", "chi4_t", "chi5_t"]].agg(
            "".join, axis=1
        )

        grouped = df.groupby("class_id").agg({"ss_id": "unique"})
        grouped["count"] = grouped["ss_id"].str.len()
        grouped["incidence"] = grouped["count"] / len(df)
        grouped["percentage"] = grouped["incidence"] * 100
        grouped.reset_index(inplace=True)

        return grouped
Create a new DataFrame from the input with a 6- or 8-class encoding for input 'chi' values.
The function takes a pandas DataFrame containing the following columns: 'ss_id', 'chi1', 'chi2', 'chi3', 'chi4', 'chi5', 'ca_distance', 'cb_distance', 'torsion_length', 'energy', and 'rho', and adds a class ID column based on the following rules:
- A new column named class_id is added, which is the concatenation of the individual class IDs per chi.
- The DataFrame is grouped by the class_id column, and a new DataFrame is returned that shows the unique ss_id values for each group, the count of unique ss_id values, the incidence of each group as a proportion of the total DataFrame, and the percentage of incidence.
Parameters
- df: A pandas DataFrame containing columns 'ss_id', 'chi1', 'chi2', 'chi3', 'chi4', 'chi5', 'ca_distance', 'cb_distance', 'torsion_length', 'energy', and 'rho'
- base: The number of segments per dihedral angle, 6 or 8. Defaults to 8. Any other value raises ValueError.
Returns
The grouped DataFrame with the added class column.
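For the eightfold case, the per-angle segment can also be computed arithmetically. The sketch below is an equivalent reformulation of the if/elif chain in the `get_eighth_quadrant` listing, not the library's own code; `octant` and `octant_class_id` are hypothetical names. It assumes scalar angles in degrees.

```python
# Hypothetical arithmetic equivalent of the 45-degree segment lookup.
def octant(angle_deg):
    """Return the octant digit '1'..'8' for one angle in degrees."""
    normalized = angle_deg % 360  # map to [0, 360)
    # Clamp guards against a float result of exactly 360.0 for tiny negatives.
    index = min(int(normalized // 45), 7)
    # Segment [0, 45) is labeled 8, counting down to 1 for [315, 360).
    return str(8 - index)


def octant_class_id(chis):
    """Concatenate the octant digits of five dihedral angles."""
    return "".join(octant(chi) for chi in chis)


print(octant(0), octant(90), octant(-60), octant(-180))
print(octant_class_id([-60, -60, -90, -60, -60]))
```

Note the numbering direction: after normalization, angles just above 0 degrees receive the highest label (8) and angles just below 360 degrees receive 1.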
    def filter_class_by_percentage(self, cutoff: float, base: int = 8) -> pd.DataFrame:
        """
        Filter the specified class definitions by percentage.

        :param cutoff: A numeric value specifying the minimum percentage required for a row to be included in the output
        :param base: An optional integer specifying the class type to filter, 2 or 8; defaults to 8
        :return: A new Pandas DataFrame containing only rows where the percentage is greater than or equal to the cutoff
        :rtype: pandas.DataFrame
        :raises ValueError: If the base is not 2 or 8.
        """

        match base:
            case 8:
                df = self.eightclass_df
            case 2:
                df = self.binaryclass_df
            case _:
                # Only the binary (2) and eightfold (8) class tables are stored.
                raise ValueError("Invalid base. Must be 2 or 8.")

        return df[df["percentage"] >= cutoff].copy()
Filter the specified class definitions by percentage.
Parameters
- cutoff: A numeric value specifying the minimum percentage required for a row to be included in the output
- base: An optional integer specifying the class type to filter, 2 or 8. Defaults to 8
Returns
A new Pandas DataFrame containing only rows where the percentage is greater than or equal to the cutoff
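As a rough illustration of the cutoff filter, the sketch below applies the same boolean mask to an invented class table. The function name `filter_by_percentage` and the numbers are assumptions made for the example, not part of proteusPy.

```python
import pandas as pd


def filter_by_percentage(class_df: pd.DataFrame, cutoff: float) -> pd.DataFrame:
    """Keep rows whose 'percentage' is at least `cutoff`; return an independent copy."""
    return class_df[class_df["percentage"] >= cutoff].copy()


# Invented class table with the columns the filter relies on.
toy_classes = pd.DataFrame(
    {
        "class_id": ["22222", "77677", "22322"],
        "count": [70, 25, 5],
        "percentage": [70.0, 25.0, 5.0],
    }
)

common = filter_by_percentage(toy_classes, cutoff=10.0)
print(common)
```

Returning a copy means later edits to the filtered table do not touch the original class DataFrame.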
    @staticmethod
    def get_binary_quadrant(angle_deg):
        """
        Return the binary quadrant in which an angle in degrees lies if the area is described by dividing a unit circle into 2 equal segments.

        :param angle_deg (float or array-like): The angle in degrees.

        Returns:
        :return str or array-like: The binary quadrant (0 or 2) that the angle belongs to.
        """
        angle_deg = (
            np.array(angle_deg) % 360
        )  # Normalize the angle to the range [0, 360)

        if np.isscalar(angle_deg):
            if angle_deg >= 0 and angle_deg < 180:
                return str(2)

            if angle_deg >= 180 and angle_deg < 360:
                return str(0)

            raise ValueError(
                "Invalid angle value: angle must be in the range [-360, 360)."
            )

        quadrants = np.where((angle_deg >= 0) & (angle_deg < 180), "2", "0")
        return "".join(quadrants)
Return the binary quadrant in which an angle in degrees lies if the area is described by dividing a unit circle into 2 equal segments.
Parameters
- angle_deg (float or array-like): The angle in degrees.
Returns
The binary quadrant that the angle belongs to: '2' for angles in [0, 180) and '0' for angles in [180, 360), after normalization to [0, 360).
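The sign convention can be shown with a short pure-Python sketch. The names `binary_label` and `binary_class_string` are hypothetical; the sketch only mirrors the rule stated here (normalize with modulo 360, then split the circle at 180 degrees).

```python
def binary_label(angle_deg: float) -> str:
    """Return '2' for a normalized angle in [0, 180) and '0' for [180, 360)."""
    return "2" if (angle_deg % 360) < 180 else "0"


def binary_class_string(angles) -> str:
    """Concatenate the per-angle labels, one character per dihedral."""
    return "".join(binary_label(a) for a in angles)


all_negative = binary_class_string([-60, -60, -90, -60, -60])
alternating = binary_class_string([60, -60, 90, -60, 60])
print(all_negative, alternating)
```

Note that an angle of exactly 180 or -180 degrees normalizes to 180 and therefore falls in the '0' segment.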
    @staticmethod
    def get_sixth_quadrant(angle_deg):
        """
        Return the sixth quadrant in which an angle in degrees lies if the area is described by dividing a unit circle into 6 equal segments.

        :param angle_deg (float or array-like): The angle in degrees.

        Returns:
        :return str or array-like: The sixth quadrant (1 to 6) that the angle belongs to.
        """
        angle_deg = (
            np.array(angle_deg) % 360
        )  # Normalize the angle to the range [0, 360)

        if np.isscalar(angle_deg):
            if angle_deg >= 0 and angle_deg < 60:
                return str(6)
            elif angle_deg >= 60 and angle_deg < 120:
                return str(5)
            elif angle_deg >= 120 and angle_deg < 180:
                return str(4)
            elif angle_deg >= 180 and angle_deg < 240:
                return str(3)
            elif angle_deg >= 240 and angle_deg < 300:
                return str(2)
            elif angle_deg >= 300 and angle_deg < 360:
                return str(1)
            else:
                raise ValueError(
                    "Invalid angle value: angle must be in the range [-360, 360)."
                )
        else:
            quadrants = np.empty(angle_deg.shape, dtype=str)
            quadrants[(angle_deg >= 0) & (angle_deg < 60)] = "6"
            quadrants[(angle_deg >= 60) & (angle_deg < 120)] = "5"
            quadrants[(angle_deg >= 120) & (angle_deg < 180)] = "4"
            quadrants[(angle_deg >= 180) & (angle_deg < 240)] = "3"
            quadrants[(angle_deg >= 240) & (angle_deg < 300)] = "2"
            quadrants[(angle_deg >= 300) & (angle_deg < 360)] = "1"
            return "".join(quadrants)
Return the sixth quadrant in which an angle in degrees lies if the area is described by dividing a unit circle into 6 equal segments.
Parameters
- angle_deg (float or array-like): The angle in degrees.
Returns
The sixth quadrant (1 to 6) that the angle belongs to.
    @staticmethod
    def get_eighth_quadrant(angle_deg):
        """
        Return the eighth quadrant in which an angle in degrees lies if the area is described by dividing a unit circle into 8 equal segments.

        :param angle_deg (float or array-like): The angle in degrees.

        Returns:
        :return str or array-like: The eighth quadrant (1 to 8) that the angle belongs to.
        """
        angle_deg = (
            np.array(angle_deg) % 360
        )  # Normalize the angle to the range [0, 360)

        if np.isscalar(angle_deg):
            if angle_deg >= 0 and angle_deg < 45:
                return str(8)
            elif angle_deg >= 45 and angle_deg < 90:
                return str(7)
            elif angle_deg >= 90 and angle_deg < 135:
                return str(6)
            elif angle_deg >= 135 and angle_deg < 180:
                return str(5)
            elif angle_deg >= 180 and angle_deg < 225:
                return str(4)
            elif angle_deg >= 225 and angle_deg < 270:
                return str(3)
            elif angle_deg >= 270 and angle_deg < 315:
                return str(2)
            elif angle_deg >= 315 and angle_deg < 360:
                return str(1)
            else:
                raise ValueError(
                    "Invalid angle value: angle must be in the range [-360, 360)."
                )
        else:
            quadrants = np.empty(angle_deg.shape, dtype=str)
            quadrants[(angle_deg >= 0) & (angle_deg < 45)] = "8"
            quadrants[(angle_deg >= 45) & (angle_deg < 90)] = "7"
            quadrants[(angle_deg >= 90) & (angle_deg < 135)] = "6"
            quadrants[(angle_deg >= 135) & (angle_deg < 180)] = "5"
            quadrants[(angle_deg >= 180) & (angle_deg < 225)] = "4"
            quadrants[(angle_deg >= 225) & (angle_deg < 270)] = "3"
            quadrants[(angle_deg >= 270) & (angle_deg < 315)] = "2"
            quadrants[(angle_deg >= 315) & (angle_deg < 360)] = "1"
            return "".join(quadrants)
Return the eighth quadrant in which an angle in degrees lies if the area is described by dividing a unit circle into 8 equal segments.
Parameters
- angle_deg (float or array-like): The angle in degrees.
Returns
The eighth quadrant (1 to 8) that the angle belongs to.
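The sixfold and eightfold labelings follow the same pattern: normalize the angle to [0, 360), find which equal-width segment it falls in, and number the segments downward from n to 1. The sketch below expresses that as one formula. `segment_label` is a hypothetical helper, not a proteusPy function, and the clamp on the index is an added safeguard for floating-point results that round up to 360.

```python
import math


def segment_label(angle_deg: float, n_segments: int = 8) -> str:
    """Label the segment containing angle_deg; segment [0, width) gets label n_segments."""
    width = 360.0 / n_segments
    index = int(math.floor((angle_deg % 360) / width))
    index = min(index, n_segments - 1)  # guard: a tiny negative angle can normalize to 360.0
    return str(n_segments - index)


def class_string(angles, n_segments: int = 8) -> str:
    """Concatenate the labels for a sequence of dihedral angles."""
    return "".join(segment_label(a, n_segments) for a in angles)


angles = [-60.0, -60.0, -90.0, -60.0, -60.0]
octant_string = class_string(angles, n_segments=8)
sextant_string = class_string(angles, n_segments=6)
print(octant_string, sextant_string)
```

For example, -60 degrees normalizes to 300, which lies in [270, 315) for eight segments (label 2) and in [300, 360) for six segments (label 1).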
    @staticmethod
    def class_string_from_dihedral(*args, base=8) -> str:
        """
        Return the class string for a set of dihedral angles, given the base.

        :param args: One or five dihedral angles.
        :param base: The base class to use, 2, 6, or 8. Defaults to 8.
        :return: The class string for the input dihedral angles.
        :rtype: str
        :raises ValueError: If the number of dihedral angles is not 1 or 5, or if the base is not 2, 6, or 8.
        """
        if len(args) not in [1, 5]:
            raise ValueError("You must enter either 1 or 5 dihedral angles.")

        if base not in [2, 6, 8]:
            raise ValueError("Invalid base. Must be 2, 6, or 8.")

        angles = np.array(args).flatten()

        if len(angles) == 1:
            match base:
                case 2:
                    return DisulfideClass_Constructor.get_binary_quadrant(angles[0])
                case 6:
                    return DisulfideClass_Constructor.get_sixth_quadrant(angles[0])
                case 8:
                    return DisulfideClass_Constructor.get_eighth_quadrant(angles[0])
                case _:
                    raise ValueError("Invalid base. Must be 2, 6, or 8.")

        elif len(angles) == 5:
            match base:
                case 2:
                    return DisulfideClass_Constructor.get_binary_quadrant(angles)
                case 6:
                    return DisulfideClass_Constructor.get_sixth_quadrant(angles)
                case 8:
                    return DisulfideClass_Constructor.get_eighth_quadrant(angles)
                case _:
                    raise ValueError("Invalid base. Must be 2, 6, or 8.")
Return the class string for a set of dihedral angles, given the base.
Parameters
- args: One or five dihedral angles.
- base: The base class to use, 2, 6, or 8. Defaults to 8.
Returns
The class string for the input dihedral angles.
Raises
- ValueError: If the number of dihedral angles is not 1 or 5, or if the base is not 2, 6, or 8.
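The argument handling can be sketched without the library. The version below accepts either five separate angles or a single argument (one angle or one sequence of angles), validates the base, and builds the string one character per angle. `class_string_sketch` and `_label` are hypothetical names, and raising an error when the flattened input has neither 1 nor 5 angles is an assumption of this sketch.

```python
def _label(angle_deg: float, base: int) -> str:
    """Hypothetical per-angle labeler following the conventions described in this section."""
    a = angle_deg % 360
    if base == 2:
        return "2" if a < 180 else "0"
    width = 360.0 / base
    index = min(int(a // width), base - 1)  # clamp guards against rounding up to 360
    return str(base - index)


def class_string_sketch(*args, base: int = 8) -> str:
    """Return a class string for 1 or 5 dihedral angles in base 2, 6, or 8."""
    if len(args) not in (1, 5):
        raise ValueError("You must enter either 1 or 5 dihedral angles.")
    if base not in (2, 6, 8):
        raise ValueError("Invalid base. Must be 2, 6, or 8.")

    # Flatten so that f(a, b, c, d, e) and f([a, b, c, d, e]) behave the same.
    angles = []
    for arg in args:
        if isinstance(arg, (list, tuple)):
            angles.extend(arg)
        else:
            angles.append(arg)

    if len(angles) not in (1, 5):
        # Assumption of this sketch: reject other lengths explicitly.
        raise ValueError("Expected 1 or 5 angles after flattening.")

    return "".join(_label(float(a), base) for a in angles)


five_args = class_string_sketch(-60, -60, -90, -60, -60)
one_list = class_string_sketch([-60, -60, -90, -60, -60], base=2)
single = class_string_sketch(10.0, base=6)
print(five_args, one_list, single)
```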
    def sslist_from_classid(self, cls: str, base=8) -> pd.DataFrame:
        """
        Return the 'ss_id' value in the given DataFrame that corresponds to the
        input 'cls' string in the class description
        """
        if base == 2:
            df = self.binaryclass_df
        elif base == 8:
            df = self.eightclass_df
        else:
            raise ValueError("Invalid base. Must be 2 or 8.")

        filtered_df = df[df["class_id"] == cls]

        if len(filtered_df) == 0:
            return None

        if len(filtered_df) > 1:
            raise ValueError(f"Multiple rows found for class_id '{cls}'")

        return filtered_df.iloc[0]["ss_id"]
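Return the 'ss_id' value from the class DataFrame for the given base (2 or 8) whose 'class_id' matches the input 'cls' string. Returns None if no row matches, and raises ValueError if more than one row matches or if the base is not 2 or 8.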
    def class_to_binary(self, cls_str, base=8):
        """
        Convert a string of length 5 representing the ordinal section of a unit circle for an angle in range -180 to 180 degrees
        into a string of 5 characters, where each character is either '0' if the corresponding input character represents a
        negative angle or '2' if it represents a positive angle.

        :param cls_str (str): A string of length 5 representing the ordinal section of a unit circle for an angle in range -180 to 180 degrees.
        :param base (int): The base of the ordinal section (6 or 8).
        :return str: A string of length 5, where each character is either '0' or '2', representing the sign of the corresponding input angle.
        """
        if base not in [6, 8]:
            raise ValueError("Base must be either 6 or 8")

        output_str = ""
        for char in cls_str:
            if base == 6:
                if char in ["1", "2", "3"]:
                    output_str += "2"
                elif char in ["4", "5", "6"]:
                    output_str += "0"
            elif base == 8:
                if char in ["1", "2", "3", "4"]:
                    output_str += "2"
                elif char in ["5", "6", "7", "8"]:
                    output_str += "0"
        return output_str
Convert a string of length 5, in which each character is the ordinal section of a unit circle for an angle in the range -180 to 180 degrees, into a string of 5 characters, where each character is either '0' if the corresponding input character represents a negative angle or '2' if it represents a positive angle.
Parameters
- cls_str (str): A string of length 5 representing the ordinal section of a unit circle for an angle in the range -180 to 180 degrees.
- base (int): The base of the ordinal section (6 or 8).
Returns
A string of length 5, where each character is either '0' or '2', representing the sign of the corresponding input angle.
    def get_class_df(self, base=8):
        """
        Get the Disulfide structural classes DataFrame.

        :param base: The base class to use, either 2 or 8. Defaults to 8.
        :type base: int
        :return: A DataFrame containing the class_id, count, incidence, and percentage columns.
        :rtype: pandas.DataFrame
        :raises ValueError: If the base is not 2 or 8.
        """
        columns = ["class_id", "count", "incidence", "percentage"]
        match base:
            case 2:
                class_df = self.binaryclass_df
            case 8:
                class_df = self.eightclass_df
            case _:
                raise ValueError("Invalid base. Must be 2 or 8.")

        result_df = class_df[columns]
        return result_df
Get the Disulfide structural classes DataFrame.
Parameters
- base: The base class to use, either 2 or 8. Defaults to 8.
Returns
A DataFrame containing the class_id, count, incidence, and percentage columns.
Raises
- ValueError: If the base is not 2 or 8.