Javier AGX Flashing - Board ID not recognized
Context
Our jetson device is a Xavier AGX system (Javier AGX DevKit 32GB) and we entered a bad state for our OS. We had some issues with the apt package manager, and had several other issues with the device related to this, like faulty drivers and other system issues. We decided to reflash the device to start from scratch and fix the issues.
We stared trying to reflash the device using the SDK Manager, but we encountered an error during the linux flashing
step.
Problem
When trying to flash our Xavier AGX system (Javier AGX DevKit 32GB) using the SDK Manager, the linux flashing
step fails with the following error:
This error doesnt specify the root cause of the problem, and since we had a bad fs state on the device, we suspected that the error was related to the previous problems. Also, other related users reported this error and relied on introspection with the EEPROM values after relying on the manual fs flashing process. First of all, in order to modify the EEPROM values, we needed a working system to access the i2c bus and modify the EEPROM values. Second, we didnt want to rely on modifying these values, since we could brick the device if we did something wrong.
Warning: THIS IS NOT A TUTORIAL. This is a posmortem document of the steps we followed to fix the problem. This is not a guide to fix the problem, but a guide to understand the problem and the steps we followed to fix it.
Step 1: Get a stable system working
If we were able to flash the device and get a stable jetpack running that was all we needed. We didnt even need to modify the EEPROM values.
The first conversation followed was the next one
This and reading further more about manually flashing the sample Tegra fs with the jetson linux in the device, we decided to try to flash the device manually. - https://docs.nvidia.com/jetson/archives/r35.5.0/DeveloperGuide/IN/QuickStart.html
Now, the hypothesis here was that if we could flash the device with a working system in the eMMC or the nvme, we could then use the SDK Manager to flash the device with the wanted jetpack. (We later found out that this was the incorrect approach, since the SDK Manager's error was actually related to the EEPROM values not being recognized)
Following the steps in this page with the following resources: - Jetson linux release - Tegra sample fs
We were able to successfully flash the device with a working system in the eMMC.
Observations:
-
You might see how in the tutorial steps, the last step is to use the
After the successful flashing, we thought that if there's a way to flash the "internal" storage, there might be a way to flash the nvme storage. After further reading, we found out that theflash.sh
script to flash the device, like so:flash.sh
can receive alsoexternal
andnvme0n1p1
. Which we saw as a miracle that we could later try if the SDK Manager failed again. -
Again follow with attention the command being used:
Thejetson-agx-xavier-devkit
is the board ID that the script uses to flash the device. This means that the script doesn't rely on anything else but user input to flash the device. This might have been a nice hint for use related to the SDK Manager error.
Step 2: Flash the device with the SDK Manager
After successfully flashing the device with a working system in the eMMC, we tried to flash the device with the SDK Manager again. As you might have guessed, the error persisted.
We then tried to flash the device with the flash.sh
script using the external
and nvme0n1p1
options. The flashing was "successful", but the SDK Manager still failed with the same error and the device was not booting. We learned for the helpfull comment in this conversation that the flash.sh
script DOESN'T flash the device with other storage than the eMMC.
Step 3: Just write the EEPROM values
We followed these two conversations with related issues:
And this EEPROM layout page
Now, at first we have to state two clear problems:
1. There's no actual way to know where the EEPROM corruption is. We knew that this COULD be the problem, but we didn't know for sure and we didn't knew where exactly in our EEPROM values the corruption was other than go through the whole EEPROM values and check them one by one.
2. The only way to modify the EEPROM values is to use the i2c
bus. This means that we needed a working system to access the EEPROM values and modify them.
Both of these problems meant that in order to fix the issue we needed to flash the device with a working system in the eMMC, change the EEPROM values and then try to flash the device with the SDK Manager.
Also, a nice resource given is the how to obtain the CRC Value of the EEPROM values here
def AddToCRC(b, crc):
b2 = b
if (b < 0):
b2 = b + 256
for i in range(8):
odd = ((b2^crc) & 1) == 1
crc >>= 1
b2 >>= 1
if (odd):
crc ^= 0x8C # This means crc ^= 140.
return crc
First of all we retreived the EEPROM values using i2cdump -y 0 0x50
and then we parsed the values and calculated the CRC value.
nvidia@nvidia:~$ sudo i2cdump -y 0 0x50
No size specified (using byte-data access)
0 1 2 3 4 5 6 7 8 9 a b c d e f 0123456789abcdef
00: 01 00 ff 00 48 0b 04 00 04 4d 00 00 00 00 00 00 ?...H??.?M......
10: 00 00 00 00 36 39 39 2d 38 32 38 38 38 2d 30 30 ....699-82888-00
20: 30 34 2d 34 30 30 20 4d 2e 30 00 00 00 00 00 00 04-400 M.0......
30: 00 00 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................
40: ff ff ff ff d6 a4 3a 2d b0 48 31 35 36 31 32 32 ....??:-?H156122
50: 31 30 30 34 36 38 36 00 00 00 00 00 00 00 00 00 1004686.........
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
90: 00 00 00 00 00 00 4e 56 43 42 1c 00 4d 31 00 00 ......NVCB?.M1..
a0: ff ff ff ff ff ff ff ff ff ff ff ff d6 a4 3a 2d ............??:-
b0: b0 48 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ?H..............
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f0 ...............?
We made a simple script to parse the values from the EEPROM and calculate the CRC value.
def AddToCRC(b, crc):
b2 = b
if (b < 0):
b2 = b + 256
for i in range(8):
odd = ((b2 ^ crc) & 1) == 1
crc >>= 1
b2 >>= 1
if (odd):
crc ^= 0x8C # This means crc ^= 140.
return crc
dump = """00: 01 00 ff 00 48 0b 04 00 04 4d 00 00 00 00 00 00 ?...H??.?M......
10: 00 00 00 00 36 39 39 2d 38 32 38 38 38 2d 30 30 ....699-82888-00
20: 30 34 2d 34 30 30 20 4d 2e 30 00 00 00 00 00 00 04-400 M.0......
30: 00 00 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................
40: ff ff ff ff d6 a4 3a 2d b0 48 31 35 36 31 32 32 ....??:-?H156122
50: 31 30 30 34 36 38 36 00 00 00 00 00 00 00 00 00 1004686.........
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
90: 00 00 00 00 00 00 4e 56 43 42 1c 00 4d 31 00 00 ......NVCB?.M1..
a0: ff ff ff ff ff ff ff ff ff ff ff ff d6 a4 3a 2d ............??:-
b0: b0 48 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ?H..............
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f0 ...............?""" # given by `i2cdump -y 0 0x50`
crc = 0
count = 0
for line in dump.split("\n"):
for byte in line.split(" ")[1:17]:
if count == 255:
break
print(f"{byte}", end=" ")
crc = AddToCRC(int(byte, 16), crc)
count += 1
print()
print(f"{crc:x}")
Now, you might see that the CRC value is f0
. And after running the script, we found out that the CRC value was f0
. We dont really know the values were corrupted but the CRC value was correct. But we knew that the EEPROM values were corrupted.
After a simple lookup in the EEPROM layout page we understood the layout and the values given. We saw that the board information was correctly stored (699-82888-0004-400 M.0
) and the CRC value was correct. We quickly saw that the https://docs.nvidia.com/jetson/archives/r34.1/DeveloperGuide/text/HR/JetsonEepromLayout.html#value-of-the-crc-8-byte:~:text=2%E2%80%933,of%20EEPROM%20data. values for the "Lenght of module ID" (2–3 254 Length of module ID data starting from this field to the end of EEPROM data.
) was 0xff
in our EEPROM values and should be 254 = 0xfe
.
Warning: This (
0xfe
) isnt the correct value for the "Lenght of module ID" field. This is just us following the layout table and the value given in it. THIS ISNT CORRECT.
We then modified the EEPROM values using the i2cset
command.
We then calculated the CRC value again and it was 0xe9
. We then modified the CRC value in the EEPROM values.
We then tried to flash the device with the SDK Manager again and it gave us the same error again.
After all of this, we thought that this was a mayor error with our board. But kept looking and investigating. We dig deep into what the sdk manager scripts for the jetpack instalation and saw that the error was related to reading the eeprom values. We even modified the scripts in order to asume that the board that we were flashing was the jetson-agx-xavier-devkit
board. After doing this, the SDK Manager was able to go forward with the flashing process, so we knew that the problem was related to the EEPROM values not being recognized by the SDK Manager.
As you might have seen in the Cannot reflash AGX device - incorrect EEPROM values? conversation. This user was able to do similar steps to the ones we did, with a simple difference. the value for the "Lenght of module ID" field was 0xfc
and not 0xfe
.
We repeated the steps changing the value of 0x02
to 0xfc
and the CRC value to 0xdb
. After doing this, the SDK Manager was able to flash the device successfully.
Thanks to the internal response
- Leonardo Llanas.
- Diego Hernandez.
- Gerardo Fregoso.
- Oscar Arreola.
Written by: Ivan Romero.