7.8 Exercises
1. Consider the traffic accident data set shown in Table 7.10.
Table 7.10. Traffic accident data set.
Driver's Condition   Traffic Violation        Seat Belt   Crash Severity
Alcohol-impaired     Exceed speed limit       No          Major
Sober                None                     Yes         Minor
Sober                Disobey stop sign        Yes         Minor
Sober                Exceed speed limit       Yes         Major
Sober                Disobey traffic signal   No          Major
Alcohol-impaired     Disobey stop sign        Yes         Minor
Alcohol-impaired     None                     Yes         Major
Sober                Disobey traffic signal   Yes         Major
Alcohol-impaired     None                     No          Major
Sober                Disobey traffic signal   No          Major
Alcohol-impaired     Exceed speed limit       Yes         Major
Sober                Disobey stop sign        Yes         Minor
The Weather Condition column is incomplete in this copy: seven Good entries survive, and the five remaining entries and all row positions are missing.
(a) Show a binarized version of the data set.
(b) What is the maximum width of each transaction in the binarized data?
(c) Assuming that support threshold is 30%, how many candidate and frequent itemsets will be generated?
(d) Create a data set that contains only the following asymmetric binary
attributes: (Weather = Bad, Driver’s condition = Alcohol-impaired,
Traffic violation = Yes, Seat Belt = No, Crash Severity = Major).
For Traffic violation, only None has a value of 0. The rest of the
attribute values are assigned to 1. Assuming that support threshold is
30%, how many candidate and frequent itemsets will be generated?
(e) Compare the number of candidate and frequent itemsets generated in
parts (c) and (d).
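The binarization asked for in part (a) can be sketched in a few lines. This is a minimal illustration, assuming each (attribute, value) pair becomes one binary item; the function name `binarize`, the shortened attribute names, and the three sample records (written in the style of Table 7.10, not the full table) are chosen here for illustration only.

```python
def binarize(records):
    """Map categorical records to 0/1 rows, one item per (attribute, value) pair."""
    # Collect every distinct (attribute, value) pair seen in the data.
    items = sorted({(attr, val) for rec in records for attr, val in rec.items()})
    # A row has a 1 for each pair that the record actually contains.
    rows = [[1 if rec.get(attr) == val else 0 for attr, val in items]
            for rec in records]
    return items, rows


# Illustrative records only (hypothetical attribute names, partial columns).
records = [
    {"Driver": "Alcohol-impaired", "Violation": "Exceed speed limit",
     "SeatBelt": "No", "Severity": "Major"},
    {"Driver": "Sober", "Violation": "None",
     "SeatBelt": "Yes", "Severity": "Minor"},
    {"Driver": "Sober", "Violation": "Disobey stop sign",
     "SeatBelt": "Yes", "Severity": "Minor"},
]

items, rows = binarize(records)
# Width of a transaction = number of items present in it.
widths = [sum(row) for row in rows]
```

Because each record takes exactly one value per attribute, every binarized transaction in this sketch has a width equal to the number of attributes, while the number of items grows with the number of distinct values.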
2. (a) Consider the data set shown in Table 7.11. Suppose we apply the following
discretization strategies to the continuous attributes of the data set.
D1: Partition the range of each continuous attribute into 3 equal-sized
bins.
D2: Partition the range of each continuous attribute into 3 bins, where
each bin contains an equal number of transactions.
474 Chapter 7 Association Analysis: Advanced Concepts
Table 7.11. Data set for Exercise 2.
TID  Temperature  Pressure  Alarm 1  Alarm 2  Alarm 3
1    95           1105
2    85           1040
3    103          1090
4    97           1084
5    80           1038
6    100          1080
7    83           1025
8    86           1030
9    101          1100
The Alarm 1, Alarm 2, and Alarm 3 entries are incomplete in this copy: eighteen 1 entries survive, without the 0 entries or their row positions.
For each strategy, answer the following questions:
i. Construct a binarized version of the data set.
ii. Derive all the frequent itemsets having support > 30%.
(b) The continuous attribute can also be discretized using a clustering approach.
i. Plot a graph of temperature versus pressure for the data points shown
in Table 7.11.
ii. How many natural clusters do you observe from the graph? Assign
a label (C1, C2, etc.) to each cluster in the graph.
iii. What type of clustering algorithm do you think can be used to identify the clusters? State your reasons clearly.
iv. Replace the temperature and pressure attributes in Table 7.11 with
asymmetric binary attributes C1, C2, etc. Construct a transaction matrix using the new attributes (along with attributes Alarm1,
Alarm2, and Alarm3).
v. Derive all the frequent itemsets having support > 30% from the binarized data.
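The two discretization strategies in part (a) of this exercise, equal-width bins (D1) and equal-frequency bins (D2), can give quite different partitions of the same attribute. The sketch below illustrates the difference on made-up readings. The function names and sample values are hypothetical, the equal-width version assumes the attribute is not constant, and the equal-frequency version breaks ties by sort order, which is only one of several reasonable conventions.

```python
def equal_width_bins(values, k=3):
    """Assign each value to one of k bins of equal width over [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k  # assumes hi > lo (attribute is not constant)
    # The maximum value would land in bin k, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_frequency_bins(values, k=3):
    """Assign each value to one of k bins holding (nearly) equal counts."""
    # Rank the positions by value, then cut the ranking into k equal parts.
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins


# Hypothetical skewed readings, not taken from Table 7.11.
readings = [10, 12, 13, 20, 40, 100]
width_bins = equal_width_bins(readings)      # most values share the first bin
freq_bins = equal_frequency_bins(readings)   # two values per bin
```

On skewed data such as this, equal-width binning crowds most transactions into one bin, which changes which binarized items can reach a given support threshold.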
3. Consider the data set shown in Table 7.12. The first attribute is continuous,
while the remaining two attributes are asymmetric binary. A rule is considered
to be strong if its support exceeds 15% and its confidence exceeds 60%. The
data given in Table 7.12 supports the following two strong rules:
(i) {(1 < A < 2), B = 1} → {C = 1}
(ii) {(5 < A < 8), B = 1} → {C = 1}
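As a reminder of how support and confidence are counted for rules of this shape, here is a minimal sketch. The rows are hypothetical (A, B, C) tuples, not the contents of Table 7.12, the helper name `rule_stats` is made up, and the interval bounds are treated as inclusive purely as an assumption for the example.

```python
def rule_stats(rows, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    n_ante = sum(1 for r in rows if antecedent(r))
    n_both = sum(1 for r in rows if antecedent(r) and consequent(r))
    support = n_both / len(rows)                       # fraction of all rows
    confidence = n_both / n_ante if n_ante else 0.0    # fraction of matching rows
    return support, confidence


# Hypothetical (A, B, C) rows for illustration only.
rows = [(1, 1, 1), (2, 1, 1), (3, 1, 0), (6, 1, 1), (7, 0, 1)]

support, confidence = rule_stats(
    rows,
    antecedent=lambda r: 1 <= r[0] <= 2 and r[1] == 1,  # inclusive bounds assumed
    consequent=lambda r: r[2] == 1,
)
# "Strong" as defined in this exercise: support above 15%, confidence above 60%.
is_strong = support > 0.15 and confidence > 0.60
```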
(a) Compute the support and confidence for both rules.
(b) To find the rules using the traditional Apriori algorithm, we need to
discretize the continuous attribute A. Suppose we apply the equal width
