Security Methods for Statistical Databases

Introduction
Statistical databases containing medical information are often used for research
Some of the data is protected by laws that safeguard the privacy of the patient
Proper security precautions must be implemented to comply with laws and respect the sensitivity of the data

Accuracy vs. Confidentiality
Accuracy – Researchers want to extract accurate and meaningful data
Confidentiality – Patients, laws and database administrators want to maintain the privacy of patients and the confidentiality of their information

Laws
Health Insurance Portability and Accountability Act – HIPAA (Privacy Rule)
Covered organizations must comply by April 14, 2003
Designed to improve the efficiency of the healthcare system by using electronic exchange of data while maintaining security
Covered entities (health plans, healthcare clearinghouses, healthcare providers) may not use or disclose protected information except as permitted or required
The Privacy Rule establishes a “minimum necessary standard” for the purpose of making covered entities evaluate their current regulations and security precautions

HIPAA Compliance
Companies offer 3rd Party Certification of covered entities
Such companies will check your company and associated companies for compliance with HIPAA
Can help with rapid implementation of, and compliance with, HIPAA regulations

Types of Statistical Databases
Static – a static database is made once and never changes
  Example: U.S. Census
Dynamic – changes continuously to reflect real-time data
  Example: most online research databases

Security Methods
Access Restriction
Query Set Restriction
Microaggregation
Data Perturbation
Output Perturbation
Auditing
Random Sampling

Access Restriction
Databases normally have different access levels for different types of users
User IDs and passwords are the most common methods for restricting access
In a medical database:
  Doctors/Healthcare Representatives – full access to information
  Researchers – access to partial information only (e.g. aggregate information)

Query Set Restriction
A query-set size control sets a limit on the number of records that must be in the result set
Query results are displayed only if the size of the query set satisfies the condition
Setting a minimum query-set size can help protect against the disclosure of individual data
Let K represent the minimum number of records that must be present in the query set
Let R represent the size of the query set
The query set can only be displayed if K ≤ R

Microaggregation
Raw (individual) data is grouped into small aggregates before publication
The average value of the group replaces each individual value
Records with the most similarities are grouped together to maintain data accuracy
Helps to prevent disclosure of individual data
Example: the National Agricultural Statistics Service (NASS) publishes data about farms
  To protect against data disclosure, data is only released at the county level
  Farms in each county are averaged together to maintain as much purity as possible while still protecting against disclosure

Data Perturbation
Perturbed data is raw data with noise added
Pro: With perturbed databases, if unauthorized data is accessed, the true value is not disclosed
Con: Data perturbation runs the risk of presenting biased data

Output Perturbation
Instead of the raw data being transformed as in Data Perturbation, only the output (the query results) is perturbed
The bias problem is less severe than with data perturbation
[Figure: Output Perturbation; labels: Query, Results]

Auditing
Auditing is the process of keeping track of all queries made by each user
Usually done with up-to-date logs
Each time a user issues a query, the log is checked to see if the user is querying the database maliciously

Random Sampling
Only a sample of the records meeting the requirements of the query is shown
Must maintain consistency by giving exactly the same results to the same query
Weakness – Logically equivalent queries can result in a different query set

Comparison Methods
The following criteria are used to determine the most effective methods of statistical database security:
Security – possibility of exact disclosure, partial disclosure, robustness
Richness of Information – amount of non-confidential information eliminated, bias, precision, consistency
Costs – initial implementation cost, processing overhead per query, user education

A Comparison of Methods
Method                  Security       Richness of Information   Costs
Query-set Restriction   Low            Low (1)                   Low
Microaggregation        Moderate       Moderate                  Moderate
Data Perturbation       High           High-Moderate             Low
Output Perturbation     Moderate       Moderate-Low              Low
Auditing                Moderate-Low   Moderate                  High
Sampling                Moderate       Moderate-Low              Moderate
(1) Quality is low because a lot of information can be eliminated if the query does not meet the requirements

Sources
Adam, Nabil R.; Wortmann, John C.; Security-Control Methods for Statistical Databases: A Comparative Study; ACM Computing Surveys, Vol. 21, No. 4, December 1989
HIPAA
Bernstein, Stephen W.; Impact of HIPAA on BioTech/Pharma Research: Rules of the Road
Bureau; 3rd Party Testing
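
Example: Query Set Restriction
The query-set size rule from the Query Set Restriction slides (a result is displayed only if K ≤ R) can be sketched in a few lines of Python. This is a minimal illustration, not a production access layer: the record fields, the sample data, the value K = 3 and the function name restricted_mean are all assumptions made for the example and do not come from the slides.

```python
from statistics import mean

# Hypothetical patient records; the field names are illustrative only.
RECORDS = [
    {"diagnosis": "asthma", "cost": 1200.0},
    {"diagnosis": "asthma", "cost": 800.0},
    {"diagnosis": "asthma", "cost": 1000.0},
    {"diagnosis": "diabetes", "cost": 2500.0},
    {"diagnosis": "diabetes", "cost": 2700.0},
]

K = 3  # assumed minimum query-set size


def restricted_mean(records, predicate, field, k=K):
    """Return the mean of `field` over the query set, or None if K <= R fails."""
    query_set = [r for r in records if predicate(r)]
    r_size = len(query_set)  # R: size of the query set
    if r_size < k:
        # Too few records match: answering could disclose individual data.
        return None
    return mean(r[field] for r in query_set)


if __name__ == "__main__":
    # Three asthma records satisfy K <= R, so the aggregate is released.
    print(restricted_mean(RECORDS, lambda r: r["diagnosis"] == "asthma", "cost"))
    # Only two diabetes records match, so the query is refused.
    print(restricted_mean(RECORDS, lambda r: r["diagnosis"] == "diabetes", "cost"))
```

Returning None for a refused query also shows why the comparison table rates this method low on richness of information: every query whose result set is smaller than K yields no answer at all.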
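
Example: Microaggregation
The microaggregation idea (group the most similar records, then replace each individual value with its group average) can be illustrated as follows. This is a simplified single-attribute sketch under stated assumptions: similarity is taken to mean closeness in sorted order, the group size k = 3 is arbitrary, and the function name microaggregate is hypothetical. It is not the procedure used by NASS or any particular agency.

```python
from statistics import mean


def microaggregate(values, k=3):
    """Replace each value with the mean of a group of at least k similar values.

    Values are grouped by sorted order; a tail shorter than k is folded
    into the preceding group so that no group has fewer than k members
    (unless the whole input is shorter than k).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    n = len(values)
    # Indices of the values in ascending order, so similar values are adjacent.
    order = sorted(range(n), key=lambda i: values[i])
    result = [0.0] * n
    start = 0
    while start < n:
        end = start + k
        if n - end < k:
            end = n  # fold a short tail into this final group
        group = order[start:end]
        group_mean = mean(values[i] for i in group)
        for i in group:
            result[i] = group_mean  # the individual value is hidden
        start = end
    return result


if __name__ == "__main__":
    farm_sizes = [10, 200, 12, 190, 11, 210, 205]  # hypothetical data
    print(microaggregate(farm_sizes, k=3))
```

Because every value is replaced by its group mean, the overall total (and therefore the overall mean) of the published data equals that of the raw data, while no single original value is released.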