Private data discovery for privacy compliance in collaborative environments

Type: Book Chapter
Proceedings title: Cooperative Design, Visualization, and Engineering: 5th International Conference, CDVE 2008, Calvià, Mallorca, Spain, September 21-25, 2008, Proceedings
Series title: Lecture Notes in Computer Science; Volume 5220
Conference: Cooperative Design, Visualization, and Engineering, 5th International Conference (CDVE 2008), September 21-25, 2008, Palma de Mallorca, Spain
Pages: 142-150; # of pages: 9
Subject: collaborative computing; privacy; compliance; text mining; machine learning; privacy management; personally identifiable information
Abstract: With the growing use of computers and the Internet, it has become difficult for organizations to locate and effectively manage sensitive personally identifiable information (PII). This problem is even more evident in collaborative computing environments. PII may be hidden anywhere within a computer's file system. Moreover, in the course of various activities, collaborative or not, PII may migrate from computer to computer. This makes meeting organizational privacy requirements all the more complex. Our particular interest is in developing technology that automatically discovers workflows across organizational collaborators that involve private data. Because in this context it is important to understand where and when private data is discovered, this paper focuses on PII discovery, i.e., automatically identifying private data in semi-structured and unstructured (free-text) documents. The first part of the process identifies PII via named entity recognition. The second part determines relationships between those entities using a supervised machine learning method. We present test results for our methods on publicly available data generated from various collaborative activities, to assess their scalability in cooperative computing environments.
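The abstract describes a two-stage pipeline: named entity recognition to find PII mentions, then a supervised classifier that decides which mentions are related. The following is a minimal sketch of that shape under stated assumptions, not the authors' system. Stage 1 substitutes a hypothetical regex-and-gazetteer recognizer for the paper's named entity recognizer, and stage 2 uses a plain perceptron over hand-picked features (entity-type pair, words between the two mentions, a distance flag) as one example of a supervised relation classifier. The names `find_entities`, `features`, `make_examples`, `train`, `relations`, `NAMES`, and `TRAIN`, and all of the toy training sentences, are invented for illustration.

```python
import re
from dataclasses import dataclass
from itertools import combinations

# Stage 1 (hypothetical stand-in for the paper's NER): regexes for two
# PII types plus a tiny hard-coded name gazetteer.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}
NAMES = {"Alice Smith", "Bob Jones"}  # illustrative gazetteer, not real data


@dataclass(frozen=True)
class Entity:
    text: str
    label: str
    start: int
    end: int


def find_entities(doc: str) -> list[Entity]:
    """Return all recognized PII mentions, ordered by position in the text."""
    ents = [Entity(m.group(), label, m.start(), m.end())
            for label, pat in PATTERNS.items() for m in pat.finditer(doc)]
    ents += [Entity(name, "PERSON", m.start(), m.end())
             for name in NAMES for m in re.finditer(re.escape(name), doc)]
    return sorted(ents, key=lambda e: e.start)


# Stage 2: a supervised relation classifier (here, a plain perceptron) that
# scores each ordered pair of entities from sparse binary features.
def features(doc: str, a: Entity, b: Entity) -> set[str]:
    between = doc[a.end:b.start].lower()
    feats = {"bias", f"types={a.label}>{b.label}"}
    feats |= {f"w={w}" for w in re.findall(r"[a-z]+", between)}
    if len(between) > 40:  # arbitrary distance cutoff chosen for the sketch
        feats.add("far")
    return feats


def make_examples(labeled_docs):
    """Turn (doc, gold_pairs) into (features, +1/-1) training examples."""
    examples = []
    for doc, gold in labeled_docs:
        for a, b in combinations(find_entities(doc), 2):
            label = 1 if (a.text, b.text) in gold else -1
            examples.append((features(doc, a, b), label))
    return examples


def train(examples, epochs: int = 10) -> dict[str, float]:
    """Classic perceptron: add the label to each feature weight on a mistake."""
    weights: dict[str, float] = {}
    for _ in range(epochs):
        for feats, label in examples:
            score = sum(weights.get(f, 0.0) for f in feats)
            if (1 if score > 0 else -1) != label:
                for f in feats:
                    weights[f] = weights.get(f, 0.0) + label
    return weights


def relations(doc: str, weights: dict[str, float]) -> list[tuple[str, str]]:
    """Return (first, second) entity-text pairs the model links together."""
    ents = find_entities(doc)
    return [(a.text, b.text) for a, b in combinations(ents, 2)
            if sum(weights.get(f, 0.0) for f in features(doc, a, b)) > 0]


# Toy training data invented for illustration only.
TRAIN = [
    ("Alice Smith can be reached at alice@example.com.",
     {("Alice Smith", "alice@example.com")}),
    ("Bob Jones, phone 555-123-4567, approved the budget.",
     {("Bob Jones", "555-123-4567")}),
    ("Alice Smith met Bob Jones yesterday.", set()),
    ("Bob Jones wrote the report; questions about the survey go to "
     "alice@example.com.", set()),
]

if __name__ == "__main__":
    w = train(make_examples(TRAIN))
    print(relations("Contact Bob Jones at bob@example.org or 555-987-6543.", w))
    # prints [('Bob Jones', 'bob@example.org'), ('Bob Jones', '555-987-6543')]
```

Because this perceptron sees only entity types and the words between two mentions, it generalizes only as far as those cues allow; the actual recognizer, features, and learner used in the paper are not given in this record.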
Publisher: Springer Berlin Heidelberg
Affiliation: NRC Institute for Information Technology; National Research Council Canada
Peer reviewed: Yes
NRC number: 50386
NPARC number: 8914078
Record identifier: 5007fa13-e850-4388-a48c-7a4fb76ccedb
Record created: 2009-04-22
Record modified: 2016-07-15