EDM4hep.jl EDM I/O Tutorial
This tutorial shows how to read EDM4hep data files and perform some simple analysis
You can also download this tutorial as a Jupyter notebook and a plain Julia source file.
Table of contents
Loading the necessary modules
- We will use the
EDM4hepmodule that defines all the EDM4hep types and its relations and links. - We will use the
FHistmodule to create histograms - We will use the
Plotsmodule to plot the histograms
If these modules are not installed, you can install them by running the following commands:
using Pkg
Pkg.add("EDM4hep")
Pkg.add("FHist")
Pkg.add("Plots")using EDM4hep
using EDM4hep.RootIO
using Base.Iterators: partition, take
using FHist
using Plots: plot, scatter, plot!, themeReading an EDM4hep file
We will read an EDM4hep file with the RootIO.Reader function. This function returns a reader object that can be used to access the events in the file. The input file is a ROOT file with the EDM4hep data model and is located to the path ttbar_edm4hep_digi.root.
finput = joinpath(@__DIR__,"../../examples" ,"ttbar_edm4hep_digi.root")"/home/runner/work/EDM4hep.jl/EDM4hep.jl/docs/build/../../examples/ttbar_edm4hep_digi.root"We create a reader object to access the events in the file. The object displays a summary table of the content of the file.
reader = RootIO.Reader(finput)┌────────────────┬──────────────────────────────────────────────────────────────
│ Attribute │ Value ⋯
├────────────────┼──────────────────────────────────────────────────────────────
│ File Name(s) │ /home/runner/work/EDM4hep.jl/EDM4hep.jl/docs/build/../../ex ⋯
│ # of events │ 25 ⋯
│ IO Format │ TTree ⋯
│ EDM4hep schema │ 1.0.0 ⋯
│ PODIO version │ 0.17.1 ⋯
│ ROOT version │ 6.28.4 ⋯
└────────────────┴──────────────────────────────────────────────────────────────
1 column omitted
Accessing the events
The TTree called events contains the events in the file. We can access the events by using the RootIO.get function with the reader object and the name of the TTree.
events = RootIO.get(reader, "events");┌ Warning: Type MCRecoTrackerHitPlaneAssociation not found in EDM4hep module
└ @ EDM4hep.RootIO ~/work/EDM4hep.jl/EDM4hep.jl/src/RootIO.jl:335If you get warnings is because there is a mismatch between the current schema version of EDM4hep and version when the file was written. The default behavior is ignore old types and set to zero the attributes that do not exists in the file.
We can access the first event in the file by using the index 1 in the events array.
evt = events[1];the evt object is a UnROOT.LazyEvent. Each leaf (column) can be accessed directlyt if you know the name of the leaf. Typically the name is of the form <collection_name>_<field_name>. The full list of names can be obtained by calling the names function on the events object. For example:
for n in names(events) |> sort!
startswith(n, "ECalBarrelCollection") && println(n)
endECalBarrelCollection_cellID
ECalBarrelCollection_contributions_begin
ECalBarrelCollection_contributions_end
ECalBarrelCollection_energy
ECalBarrelCollection_position_x
ECalBarrelCollection_position_y
ECalBarrelCollection_position_zThere is not need to access individual columns like this. The RootIO.get function can be used to access the collections directly.
Accessing the collections
The available collections in the event can be obtained by displaying the reader object.
show(reader)┌────────────────┬──────────────────────────────────────────────────────────────
│ Attribute │ Value ⋯
├────────────────┼──────────────────────────────────────────────────────────────
│ File Name(s) │ /home/runner/work/EDM4hep.jl/EDM4hep.jl/docs/build/../../ex ⋯
│ # of events │ 25 ⋯
│ IO Format │ TTree ⋯
│ EDM4hep schema │ 1.0.0 ⋯
│ PODIO version │ 0.17.1 ⋯
│ ROOT version │ 6.28.4 ⋯
└────────────────┴──────────────────────────────────────────────────────────────
1 column omitted
┌─────────────────────────────────┬─────────────────────────────┬──────────────┐
│ BranchName │ Type │ CollectionID │
├─────────────────────────────────┼─────────────────────────────┼──────────────┤
│ AllCaloHitContributionsCombined │ edm4hep!CaloHitContribution │ 0x3a25675d │
│ BeamCalCollection │ edm4hep!SimCalorimeterHit │ 0xc298a348 │
│ ECalBarrelCollection │ edm4hep!SimCalorimeterHit │ 0x407eb17b │
│ ECalEndcapCollection │ edm4hep!SimCalorimeterHit │ 0x0c09a156 │
│ ECalPlugCollection │ edm4hep!SimCalorimeterHit │ 0xd36e0cf8 │
│ EventHeader │ edm4hep!EventHeader │ 0xd793ab91 │
│ HCalBarrelCollection │ edm4hep!SimCalorimeterHit │ 0x6daf3b41 │
│ HCalEndcapCollection │ edm4hep!SimCalorimeterHit │ 0x50d90aed │
│ HCalRingCollection │ edm4hep!SimCalorimeterHit │ 0xad7abfca │
│ InnerTrackerBarrelCollection │ edm4hep!SimTrackerHit │ 0xf57448f3 │
│ InnerTrackerEndcapCollection │ edm4hep!SimTrackerHit │ 0x6d724693 │
│ LumiCalCollection │ edm4hep!SimCalorimeterHit │ 0x45759015 │
│ MCParticle │ edm4hep!MCParticle │ 0x0620d570 │
│ OuterTrackerBarrelCollection │ edm4hep!SimTrackerHit │ 0x083cc03d │
│ OuterTrackerEndcapCollection │ edm4hep!SimTrackerHit │ 0x1450dff0 │
│ VXDTrackerHits │ edm4hep!TrackerHitPlane │ 0xdb43b881 │
│ ⋮ │ ⋮ │ ⋮ │
└─────────────────────────────────┴─────────────────────────────┴──────────────┘
4 rows omittedThe RootIO.get function can be used to access the collections in the event. The function takes the reader object, the event object and the name of the collection as arguments.
calo = RootIO.get(reader, evt, "AllCaloHitContributionsCombined");
hits = RootIO.get(reader, evt, "ECalBarrelCollection");
mcps = RootIO.get(reader, evt, "MCParticle");┌ Warning: Column AllCaloHitContributionsCombined_stepLength not found in the tree
└ @ EDM4hep.RootIO ~/work/EDM4hep.jl/EDM4hep.jl/src/RootIO.jl:196
┌ Warning: Column MCParticle_helicity not found in the tree
└ @ EDM4hep.RootIO ~/work/EDM4hep.jl/EDM4hep.jl/src/RootIO.jl:196Relationships will automatically be setup if the related collections will be also be accessed. For example, in this case the relation of SimCalorimeterHits from the collection ECalBarrelCollection to CaloHitContributions from AllCaloHitContributionsCombined will be filled.
The calo object is a EDM4hep.EDCollection object that contains the calorimeter hit contributions. The hits object is a EDM4hep.EDCollection object that contains the hits in the ECal barrel. The mcps object is a EDM4hep.EDCollection object that contains the MCParticles.
We can now print some information about the collections.
for hit in take(hits, 20)
println("ECAL Hit $(hit.index) has energy $(hit.energy) at position $(hit.position) with $(length(hit.contributions)) contributions")
endECAL Hit #1 has energy 0.0002169429 at position (-1487.2666, 428.77875, -1224.0) with 1 contributions
ECAL Hit #2 has energy 0.00031143325 at position (-1489.09, 435.72046, -1229.1) with 1 contributions
ECAL Hit #3 has energy 0.00032750145 at position (-1490.9135, 442.6622, -1234.2) with 1 contributions
ECAL Hit #4 has energy 0.0010127309 at position (-1490.9135, 442.6622, -1239.3) with 1 contributions
ECAL Hit #5 has energy 0.0010511694 at position (-1486.54, 440.1372, -1244.4) with 1 contributions
ECAL Hit #6 has energy 0.0012481669 at position (-1482.1666, 437.6122, -1249.5) with 1 contributions
ECAL Hit #7 has energy 0.0011303278 at position (-1495.2869, 445.1872, -1234.2) with 1 contributions
ECAL Hit #8 has energy 0.0015850706 at position (-1499.6603, 447.7122, -1239.3) with 1 contributions
ECAL Hit #9 has energy 0.0027348632 at position (-1501.4838, 454.65393, -1244.4) with 1 contributions
ECAL Hit #10 has energy 2.1748384e-7 at position (-1538.6509, 440.77856, -1208.7) with 1 contributions
ECAL Hit #11 has energy 0.00033324969 at position (-1607.1865, 574.5712, -1290.3) with 11 contributions
ECAL Hit #12 has energy 0.00017833424 at position (-1604.6366, 578.988, -1290.3) with 6 contributions
ECAL Hit #13 has energy 0.00012392712 at position (-1624.6803, 584.6712, -1315.8) with 1 contributions
ECAL Hit #14 has energy 2.5159894e-5 at position (-1516.0571, 439.51202, -1234.2) with 1 contributions
ECAL Hit #15 has energy 8.835346e-6 at position (-1676.4069, 484.97818, -1259.7) with 1 contributions
ECAL Hit #16 has energy 1.932569e-5 at position (-1481.44, 448.97067, -1264.8) with 1 contributions
ECAL Hit #17 has energy 3.2642229e-6 at position (1517.5103, -416.7951, 928.2) with 1 contributions
ECAL Hit #18 has energy 4.4099096e-8 at position (-1471.254, 537.31335, -1295.4) with 1 contributions
ECAL Hit #19 has energy 0.00010116091 at position (-1536.8274, 433.83682, -1300.5) with 1 contributions
ECAL Hit #20 has energy 0.00034957606 at position (1552.9, 321.3, -372.3) with 1 contributionsLets check whether the total energy in the hits is the same as the sum of all the calorimeter contributions. This an example to show the expressivity of the Julia language. We can construct a StructArray with all the related hit contributions and use the sum function to sum the energy of all the contributions as a column of the constructed SoA.
for hit in hits
StructArray(hit.contributions).energy |> sum |> c -> c ≈ hit.energy || println("Hit $(hit.index) has energy $(hit.energy) and contributions $(c)")
endDrawing the EDMM4hep data
The following shows how easy is to draw EDM4hep event using the Plots module. In this example we want to plot the calorimeter hits in the space with dots with the size proportional to the energy of the hit.
maxE = hits.energy |> maximum
scatter(hits.position.x, hits.position.y, hits.position.z,
markersize = (hits.energy/maxE)*10, color = :blue)The projection in the X-Y plane is shown below.
scatter(hits.position.x, hits.position.y, markersize = (hits.energy/maxE)*10, color = :blue)Accessing the hit attributes as columns of the hits object is very efficient. Lets verify it. Using the @btime macro from the BenchmarkTools module we can measure the time to access the energy column.
using BenchmarkTools
@benchmark hits.energy |> maximumBenchmarkTools.Trial: 10000 samples with 204 evaluations per sample.
Range (min … max): 379.333 ns … 186.035 μs ┊ GC (min … max): 0.00% … 99.58%
Time (median): 387.093 ns ┊ GC (median): 0.00%
Time (mean ± σ): 471.092 ns ± 2.062 μs ┊ GC (mean ± σ): 9.52% ± 3.75%
▆█▆▃▁▁ ▃▃▁ ▂▃▂▃▄▄▃▄▂ ▂
██████▇▆▄▁▁▁▃▇███▇▇██▆▅▅▄▄▄▄▄▄▁▄█████████▇▆▄▅▅█▇▇██▇█▇▆▅▅▄▄▃▅ █
379 ns Histogram: log(frequency) by time 623 ns <
Memory estimate: 400 bytes, allocs estimate: 3.The time to access the energy column is very small.
@benchmark begin
_max = 0.0f0
for h in hits
_max = max(_max, h.energy)
end
_max
endBenchmarkTools.Trial: 6831 samples with 1 evaluation per sample.
Range (min … max): 516.174 μs … 43.866 ms ┊ GC (min … max): 0.00% … 98.40%
Time (median): 548.865 μs ┊ GC (median): 0.00%
Time (mean ± σ): 729.496 μs ± 1.349 ms ┊ GC (mean ± σ): 14.03% ± 8.47%
▅▇█▇▅▂▁ ▁▁▁▁▁ ▃▅▆▃ ▂
███████████████▇▆▇▆▇▅▆▆▆▅▅▅▅▅▆▅▆▅▆██████▇▆▇▆▆▅▃▁▁▁▁▃▁▁▁▁▄▃▁▃ █
516 μs Histogram: log(frequency) by time 1.15 ms <
Memory estimate: 955.98 KiB, allocs estimate: 21594.The time to access the energy column using a loop is much larger.
This page was generated using Literate.jl.