Thursday, July 18, 2019

Record Merging and Matching and Data Mining

Databases containing personal information are frequently combined to produce new data structures. Such combinations may be made in two ways.

First, the records in two databases may be merged to produce new composite records. For instance, a credit card company may request information about its prospective customers from various databases (e.g., financial, medical, insurance), which is then combined into one large record. This combined record is clearly much more privacy-sensitive than the records that compose it, as it may generate perceptions and suggest actions that would not have resulted from any of the individual records alone.

Second, records in databases may be matched. Computer matching is the cross-checking of two or more unrelated databases for information that fits a certain profile in order to produce matching records or “hits”. Government agencies often use computer matching to detect possible instances of fraud or other crimes. For instance, ownership records of homes or motorized vehicles may be matched with records of welfare recipients to detect possible instances of welfare fraud. Computer matching has raised privacy concerns because it is normally done without the consent of the persons whose information is involved. Moreover, matches rarely prove facts about persons but rather generate suspicions that require further investigation. In this way, record matching could promote stereotyping and lead to intrusive investigations.

Data mining is a technique that is usually defined over a single database. It is the process of automatically searching large volumes of data for patterns, using techniques like statistical analysis, machine learning and pattern recognition. When data mining takes place in databases containing personal information, the new information thus gained may be privacy-sensitive or confidential even when the old information is not. It may, for instance, uncover patterns of behavior of persons that were not previously visible. Data mining may also be used to stereotype whole categories of individuals. For instance, a credit card company may use data mining on its customer database to discover that certain zip codes correlate strongly with loan defaults. It may then decide to stop extending credit to customers with these zip codes. In summary, data mining may violate individual privacy and may be used to stereotype whole categories of individuals. Ethical policies are needed to prevent this from happening.
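To make the mechanics concrete, here is a minimal Python sketch of two of the operations described above: computer matching across two unrelated record sets, and a very simple form of data mining that computes loan default rates per zip code. Everything in it is an assumption made for illustration: the function names `match_records` and `default_rate_by_zip`, the field names, and the toy records are hypothetical and do not come from any real system.

```python
from collections import defaultdict

# Hypothetical toy data; all identifiers and values are invented.
vehicle_owners = [
    {"person_id": "A1", "vehicle": "sedan"},
    {"person_id": "B2", "vehicle": "truck"},
]
welfare_recipients = [
    {"person_id": "B2", "benefit": "housing"},
    {"person_id": "C3", "benefit": "food"},
]
loans = [
    {"zip": "11111", "defaulted": True},
    {"zip": "11111", "defaulted": True},
    {"zip": "11111", "defaulted": False},
    {"zip": "22222", "defaulted": False},
    {"zip": "22222", "defaulted": False},
]


def match_records(left, right, key):
    """Computer matching: pair up records from two sources that share `key`."""
    # Index the right-hand records by key so each left record is looked up once.
    index = defaultdict(list)
    for rec in right:
        index[rec[key]].append(rec)
    # Every (left, right) pair with the same key value is a "hit".
    return [(l, r) for l in left for r in index.get(l[key], [])]


def default_rate_by_zip(records):
    """Simple pattern search: fraction of loans that defaulted, per zip code."""
    totals = defaultdict(int)
    defaults = defaultdict(int)
    for rec in records:
        totals[rec["zip"]] += 1
        defaults[rec["zip"]] += int(rec["defaulted"])
    return {z: defaults[z] / totals[z] for z in totals}


hits = match_records(vehicle_owners, welfare_recipients, "person_id")
rates = default_rate_by_zip(loans)
```

In this toy data, `hits` contains only the pair for person `B2`, and `rates` assigns a higher default rate to zip code `11111` than to `22222`. Note that a hit only flags a case for further investigation, and a per-zip rate describes a group rather than any individual customer, which is exactly where the stereotyping concern arises.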
