Designing a database to manage references for a protein structure data set using MySQL
The database should support frequent searches on structure resolution, author name and initials, institution name, and structure release date. It should also support frequent retrieval of the journal articles, PDB code, and name associated with a given protein structure. Results should be returned in a specified order: for example, by resolution, by number of articles and institution, by resolution together with number of articles and institution, or restricted to a specified institution.
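As a rough illustration of the searches and orderings listed above, the sketch below uses Python's built-in sqlite3 module as a stand-in for a MySQL server, so that it runs without any setup. The table name structure, its columns, the helper function names, and the sample rows are all hypothetical placeholders and are not taken from the actual data set.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption for runnability).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE structure (          -- hypothetical table and column names
        pdb_code     TEXT PRIMARY KEY,
        name         TEXT NOT NULL,
        resolution   REAL,            -- in angstroms
        release_date TEXT,            -- ISO date string
        institution  TEXT
    )
    """
)

# Made-up placeholder rows, not real PDB entries.
rows = [
    ("0AA1", "Example protein A", 2.10, "2001-05-14", "Institute X"),
    ("0AA2", "Example protein B", 1.50, "2003-11-02", "Institute Y"),
    ("0AA3", "Example protein C", 2.80, "1999-01-20", "Institute X"),
]
conn.executemany("INSERT INTO structure VALUES (?, ?, ?, ?, ?)", rows)


def find_by_max_resolution(conn, max_resolution):
    """Return (pdb_code, resolution) pairs at or below the cutoff, lowest value first."""
    cur = conn.execute(
        "SELECT pdb_code, resolution FROM structure "
        "WHERE resolution <= ? ORDER BY resolution ASC",
        (max_resolution,),
    )
    return cur.fetchall()


def structures_per_institution(conn):
    """Return (institution, count) pairs, largest count first."""
    cur = conn.execute(
        "SELECT institution, COUNT(*) AS n FROM structure "
        "GROUP BY institution ORDER BY n DESC, institution ASC"
    )
    return cur.fetchall()


print(find_by_max_resolution(conn, 2.5))
print(structures_per_institution(conn))
```

The SQL itself should carry over to MySQL largely unchanged. With a MySQL driver the parameter placeholder is typically %s rather than ?.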
The database stores and manages data on macromolecular structures, partly derived from the Protein Data Bank (PDB). The data is provided by organizations that deposit, process, and distribute information about protein sequences. Because this data is complex, care must be taken to minimize data errors such as missing data and problems of size, alignment, propagation, ambiguity, and labeling.
At the user level, the application should provide data management across the shared database domains, using a schema that supports the required data processing. Data is retrieved from, modified in, and saved to tables using queries, and is manipulated through applications that access the database via the database management system. The data model of the database defines the structure and behavior of the data.
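The retrieve, modify, and save cycle described here can be sketched from the application side as follows. This is only a minimal example under stated assumptions: Python's built-in sqlite3 module stands in for MySQL, and the article table, its columns, the function names, and the sample values are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE article ("          # hypothetical table for illustration
    "article_id INTEGER PRIMARY KEY, "
    "title TEXT NOT NULL, "
    "journal TEXT, "
    "pdb_code TEXT)"
)


def add_article(conn, title, journal, pdb_code):
    """Save a new article row and return its generated id."""
    cur = conn.execute(
        "INSERT INTO article (title, journal, pdb_code) VALUES (?, ?, ?)",
        (title, journal, pdb_code),
    )
    conn.commit()
    return cur.lastrowid


def rename_journal(conn, article_id, new_journal):
    """Modify the journal of one article; return the number of rows changed."""
    cur = conn.execute(
        "UPDATE article SET journal = ? WHERE article_id = ?",
        (new_journal, article_id),
    )
    conn.commit()
    return cur.rowcount


def get_article(conn, article_id):
    """Retrieve (title, journal, pdb_code) for one article, or None if absent."""
    return conn.execute(
        "SELECT title, journal, pdb_code FROM article WHERE article_id = ?",
        (article_id,),
    ).fetchone()


article_id = add_article(conn, "Placeholder title", "Journal A", "0AA1")
rename_journal(conn, article_id, "Journal B")
print(get_article(conn, article_id))
```

Passing values as query parameters, rather than building SQL strings by hand, keeps user input out of the statement text.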
Different aspects of the database are considered when creating models such as the logical and physical model diagrams. The design also gives detailed specifications of the attributes, rows, and columns of the tables, and of the files used to populate the database.
2. Logical schema of the database
The logical model documents the data. The schema components it defines determine how the schema diagram is navigated. The logical schema is constructed as a model independent of the database management system and of other physical considerations.
The logical schema for this relational database can be derived by normalization. Applying the first, second, and third normal forms (1NF, 2NF, and 3NF) in turn yields the set of tables from which the physical schema is then built.
2.1 The Tables and their normalization process
I. 1st Normal form (1NF):
Table : Article
Here, the entity Article has multiple authors, since more than one person can author an article about various protein data structures. Therefore, to reduce this redundancy, the table is normalized to 2NF.
II. 2nd Normal form (2NF):
Table : Article
Here, the entity Authors, with its attributes, has been created to hold the authors of the articles. To reduce redundancy further, the tables are normalized to 3NF.
III. 3rd Normal form (3NF):
Table : Article
Here, the Protein table has been created for the entity Protein to reduce redundancy. This was necessary because the Authors attributes repeated information about the protein structure: many authors can work on the same protein structure.
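One possible shape for the tables that result from these three steps is sketched below. It is not the essay's actual schema: the column names, the article_author linking table (used here to represent the many-to-many relationship between articles and authors), the assumption that each article refers to exactly one protein, and the sample rows are all hypothetical. Python's built-in sqlite3 module again stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(
    """
    -- Hypothetical normalized tables: one per entity, plus a linking table.
    CREATE TABLE protein (
        pdb_code   TEXT PRIMARY KEY,
        name       TEXT NOT NULL
    );
    CREATE TABLE article (
        article_id INTEGER PRIMARY KEY,
        title      TEXT NOT NULL,
        pdb_code   TEXT NOT NULL REFERENCES protein(pdb_code)
    );
    CREATE TABLE author (
        author_id  INTEGER PRIMARY KEY,
        surname    TEXT NOT NULL,
        initials   TEXT
    );
    CREATE TABLE article_author (
        article_id INTEGER NOT NULL REFERENCES article(article_id),
        author_id  INTEGER NOT NULL REFERENCES author(author_id),
        PRIMARY KEY (article_id, author_id)
    );
    """
)

# Made-up placeholder rows: one protein, one article, two authors.
conn.execute("INSERT INTO protein VALUES ('0AA1', 'Example protein A')")
conn.execute("INSERT INTO article VALUES (1, 'Placeholder article', '0AA1')")
conn.executemany(
    "INSERT INTO author VALUES (?, ?, ?)",
    [(1, "Surname1", "A."), (2, "Surname2", "B.")],
)
conn.executemany("INSERT INTO article_author VALUES (?, ?)", [(1, 1), (1, 2)])


def authors_for_protein(conn, pdb_code):
    """Return (surname, initials) of every author linked to a protein via its articles."""
    cur = conn.execute(
        """
        SELECT DISTINCT au.surname, au.initials
        FROM protein p
        JOIN article ar        ON ar.pdb_code = p.pdb_code
        JOIN article_author aa ON aa.article_id = ar.article_id
        JOIN author au         ON au.author_id = aa.author_id
        WHERE p.pdb_code = ?
        ORDER BY au.surname
        """,
        (pdb_code,),
    )
    return cur.fetchall()


print(authors_for_protein(conn, "0AA1"))
```

Because each protein, article, and author is stored once, the protein name is no longer repeated for every author who worked on the structure.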
The logical structure of a database can be shown graphically using an entity-relationship (E-R) diagram.