Each layer uses the same sigmoid function as logistic regression classification. Each node in the network represents an activation function that takes the outputs of the previous layer's nodes as inputs, weighted by a separate matrix of parameters, $$\Theta$$ (capital theta).
Each layer also adds a bias node, $$a_{0}^{(i)} = 1$$, where $$i$$ is the index of the layer in the network.
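
As a minimal sketch of the computation described above, the following pure-Python snippet prepends the bias node to the previous layer's outputs, weights them by one row of $$\Theta$$ per node, and applies the sigmoid. The function names (`sigmoid`, `layer_forward`), the network shape, and the weight values in `theta1` and `theta2` are illustrative assumptions, not taken from the text.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(theta, a_prev):
    """Compute the activations of one layer.

    theta:  list of rows, one row per node in this layer; each row holds
            len(a_prev) + 1 weights, the first one multiplying the bias node.
    a_prev: outputs of the previous layer, without the bias node.
    """
    a_with_bias = [1.0] + list(a_prev)  # prepend the bias node a_0 = 1
    return [
        sigmoid(sum(w * a for w, a in zip(row, a_with_bias)))
        for row in theta
    ]


# Hypothetical weights: 2 inputs -> 2 hidden nodes -> 1 output node.
theta1 = [[0.0, 1.0, -1.0],
          [0.5, 0.0, 0.0]]
theta2 = [[0.0, 1.0, 1.0]]

x = [2.0, 2.0]
a2 = layer_forward(theta1, x)   # hidden-layer activations
a3 = layer_forward(theta2, a2)  # output-layer activation
print(a2, a3)
```

Each row of a weight matrix has one more entry than the previous layer has nodes, because the bias node is added before the weighted sum is taken.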