Relevant Attribute Discovery in High Dimensional Data: Application to Breast Cancer Gene Expressions

Conference: First International Conference on Rough Sets and Knowledge Technology (RSKT 2006), July 24-26, 2006.
Abstract: In many domains, data objects are described in terms of a large number of features. The pipelined data mining approach introduced in [12], which uses two clustering algorithms in combination with rough sets and is extended with genetic programming, is investigated with the purpose of discovering important subsets of attributes in high-dimensional data. The classification ability of these subsets is described in terms of both collections of rules and analytic functions obtained by genetic programming (gene expression programming). The Leader algorithm and several k-means algorithms are used to simplify the attribute sets of the information systems that are later presented to rough sets algorithms. Visual data mining techniques, including virtual reality, were used for inspecting results. The data mining process is set up using high-throughput distributed computing techniques. This approach was applied to breast cancer gene expression data, and it led to subsets of genes with high discrimination power with respect to the decision classes.
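As an illustration of the attribute-simplification step mentioned in the abstract, the following is a minimal sketch of a Leader-style clustering pass over attributes (genes), in which each cluster leader would stand in for its similar attributes before any rough sets processing. It is not the authors' implementation: the function names `pearson_distance` and `leader_cluster`, the use of 1 - |Pearson correlation| as the dissimilarity, the threshold value, and the toy data are all assumptions made for the example.

```python
import math


def pearson_distance(a, b):
    """Assumed dissimilarity: 1 - |Pearson correlation| of two profiles."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 1.0  # constant profile: treat as maximally dissimilar
    return 1.0 - abs(cov / math.sqrt(var_a * var_b))


def leader_cluster(attributes, threshold, distance=pearson_distance):
    """Single-pass Leader clustering.

    Each attribute joins the first existing leader within `threshold`;
    otherwise it becomes a new leader. Returns the leader indices and a
    mapping from each leader index to the indices of its members.
    """
    leaders = []
    members = {}
    for idx, profile in enumerate(attributes):
        for lead in leaders:
            if distance(profile, attributes[lead]) <= threshold:
                members[lead].append(idx)
                break
        else:
            # No leader was close enough, so this attribute starts a cluster.
            leaders.append(idx)
            members[idx] = [idx]
    return leaders, members


# Hypothetical toy data: one row per gene, one column per sample.
genes = [
    [1.0, 2.0, 3.0, 4.0],  # gene 0
    [2.0, 4.0, 6.0, 8.1],  # gene 1, nearly proportional to gene 0
    [4.0, 1.0, 3.0, 2.0],  # gene 2
    [8.0, 2.1, 6.0, 4.0],  # gene 3, nearly proportional to gene 2
]
leaders, members = leader_cluster(genes, threshold=0.05)
print(leaders, members)
```

The result of a Leader pass depends on the order in which attributes are presented and on the threshold, so in practice both would need to be chosen with care; the reduced set of leader attributes is what a downstream step would then receive.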
Affiliation: NRC Institute for Information Technology; National Research Council Canada
Peer reviewed: No
NRC number: 48721
NPARC number: 8913666
Record identifier: 951fba9e-760b-4970-95b5-e82fc7497ad6
Record created: 2009-04-22
Record modified: 2016-05-09