| 1 | initial version |
As I understand your problem, you have:

If you know exactly the 3D coordinates of each chessboard corner with respect to the AR marker frame, and if the chessboard doesn't move, I don't see why it wouldn't work.
If either of these two conditions does not hold, the setup should, in my opinion, be:

To obtain the transformation between the first camera and the marker:

Anyway, a figure would help if I have misunderstood your problem.
| 2 | No.2 Revision |
(Note: this answer was written before you added all your edit remarks.)
I think that a figure showing all the transformation matrices and all the frames could really help (you and/or us).
As I understand your problem, you have:

If you know exactly the 3D coordinates of each chessboard corner with respect to the AR marker frame, and if the chessboard doesn't move, I don't see why it wouldn't work.
If either of these two conditions does not hold, the setup should, in my opinion, be:

To obtain the transformation between the first camera and the marker:

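To make the idea of chaining transformations concrete, here is a minimal NumPy sketch. It rests on an assumption about your setup that I cannot confirm: both cameras see the same static chessboard, and only the second camera sees the marker. The names `c1_T_b`, `c2_T_b`, `c2_T_m` and all the numeric poses are hypothetical placeholders; in practice they would come from your own pose estimation for each camera.

```python
import numpy as np

def to_homogeneous(R, t):
    """Build a 4x4 homogeneous transformation from a 3x3 rotation and a translation."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = np.asarray(t, dtype=float).ravel()
    return T

def invert(T):
    """Invert a rigid transformation using R^T and -R^T t."""
    R = T[:3, :3]
    t = T[:3, 3]
    T_inv = np.eye(4)
    T_inv[:3, :3] = R.T
    T_inv[:3, 3] = -R.T @ t
    return T_inv

def rot_z(angle):
    """Rotation of `angle` radians about the z axis (only used to make example poses)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Hypothetical poses (assumed values, not taken from the question):
c1_T_b = to_homogeneous(rot_z(0.3), [0.1, 0.0, 1.0])     # chessboard in camera 1 frame
c2_T_b = to_homogeneous(rot_z(-0.2), [-0.2, 0.05, 1.2])  # chessboard in camera 2 frame
c2_T_m = to_homogeneous(rot_z(0.5), [0.0, 0.1, 0.8])     # marker in camera 2 frame

# Camera 2 frame expressed in camera 1 frame, going through the shared chessboard.
c1_T_c2 = c1_T_b @ invert(c2_T_b)

# Marker frame expressed in camera 1 frame.
c1_T_m = c1_T_c2 @ c2_T_m
print(c1_T_m)
```

The naming convention `a_T_b` means "frame b expressed in frame a", so adjacent indices cancel when the matrices are multiplied. If your frames are defined the other way round, the inversions move accordingly.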
Anyway, a figure would help if I have misunderstood your problem.